# Explaining "Black-box" models with LIME and ELI5

Some models are called "black-box" models because they are so complex that we have no good understanding of how they arrive at an outcome.

Not understanding how a model makes its decisions can pose risks in production. It also makes it hard to know where the model needs to improve and what its current weak points are.

We'll look at tools that try to give us more insight into such black boxes. We start with ELI5's explanations of a text classifier trained on the 20 Newsgroups dataset.

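```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline

categories = ['alt.atheism', 'soc.religion.christian',
              'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(
    subset='train',
    categories=categories,
    shuffle=True,
    random_state=42
)
twenty_test = fetch_20newsgroups(
    subset='test',
    categories=categories,
    shuffle=True,
    random_state=42
)

# Bag-of-words features followed by a cross-validated logistic regression.
# CountVectorizer keeps a vocabulary, so feature indices can later be mapped
# back to words (a plain HashingVectorizer stores no such mapping).
vec = CountVectorizer()
clf = LogisticRegressionCV()
pipe = Pipeline([('vec', vec), ('clf', clf)])

pipe.fit(twenty_train.data, twenty_train.target)
```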

```python
from sklearn import metrics

def print_report(pipe):
    y_test = twenty_test.target
    y_pred = pipe.predict(twenty_test.data)
    report = metrics.classification_report(y_test, y_pred,
        target_names=twenty_test.target_names)
    print(report)
    print("accuracy: {:0.3f}".format(metrics.accuracy_score(y_test, y_pred)))

print_report(pipe)
```

```
                        precision    recall  f1-score   support

           alt.atheism       0.93      0.79      0.86       319
         comp.graphics       0.87      0.96      0.91       389
               sci.med       0.94      0.81      0.87       396
soc.religion.christian       0.85      0.97      0.91       398

           avg / total       0.89      0.89      0.89      1502

accuracy: 0.889
```
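Each row of the report comes from three counts for that class: true positives (TP), false positives (FP) and false negatives (FN). From these:

* precision = TP / (TP + FP)
* recall = TP / (TP + FN)
* F1 is the harmonic mean of precision and recall
* support is the number of samples that truly belong to the class

The pure-Python sketch below applies these formulas to made-up labels, not the newsgroups data. It only illustrates what each column means.

```python
def per_class_report(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = tp + fn  # number of samples whose true label is `label`
    return precision, recall, f1, support

# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

for label in sorted(set(y_true)):
    prec, rec, f1, sup = per_class_report(y_true, y_pred, label)
    print(f"class {label}: precision={prec:.2f} recall={rec:.2f} "
          f"f1={f1:.2f} support={sup}")

# Accuracy: fraction of all predictions that are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.3f}")
```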


```python
import eli5

eli5.show_weights(clf, top=10)
```

y=0 top features

| Weight | Feature |
|---|---|
| +1.992 | x21167 |
| +1.931 | x19218 |
| +1.833 | x5714 |
| +1.814 | x23677 |
| +1.697 | x26415 |
| +1.695 | x15511 |
| +1.611 | x6440 |
| +1.593 | x26412 |
| … 10720 more positive … | |
| … 25059 more negative … | |

y=1 top features

| Weight | Feature |
|---|---|
| +1.702 | x15699 |
| +0.825 | x17366 |
| +0.798 | x14281 |
| +0.786 | x30117 |
| +0.779 | x14277 |
| +0.773 | x17356 |
| +0.729 | x24267 |
| +0.724 | x7874 |
| +0.702 | x2148 |
| … 11358 more positive … | |

y=2 top features

| Weight | Feature |
|---|---|
| +2.014 | x25234 |
| +1.950 | x12026 |
| +1.758 | x17854 |
| +1.696 | x11729 |
| +1.653 | x32847 |
| +1.521 | x22379 |
| +1.518 | x16328 |
| … 12591 more positive … | |
| … 23188 more negative … | |
| -1.766 | x15521 |

y=3 top features

| Weight | Feature |
|---|---|
| +1.193 | x28473 |
| +1.029 | x8609 |
| +1.021 | x8559 |
| +0.946 | x8798 |
| +0.899 | x8544 |
| +0.796 | x8553 |
| … 10961 more positive … | |
| … 24818 more negative … | |
| -0.852 | x15699 |
| -0.893 | x25663 |
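For a linear classifier such as logistic regression, this table is essentially a per-class ranking of the rows of `clf.coef_`. Without feature names, column `j` can only be labelled by position (`x<j>`), which is why the table above is hard to read.

The numpy-only sketch below illustrates the idea:

* It ranks each class's weights by absolute value.
* It then lists positive weights before negative ones.
* A small random matrix stands in for a fitted model's coefficients.

`top_features` is a hypothetical helper. It is not meant to reproduce eli5's exact row selection or its "… more positive …" summary rows.

```python
import numpy as np

def top_features(coef, k, feature_names=None):
    """For each class row of coef, return the k (weight, name) pairs with
    the largest absolute weight, ordered from most positive to most negative."""
    n_features = coef.shape[1]
    if feature_names is None:
        # Without a vocabulary, fall back to positional names x0, x1, ...
        feature_names = [f"x{j}" for j in range(n_features)]
    tables = []
    for row in coef:
        idx = np.argsort(-np.abs(row))[:k]   # k strongest features
        idx = idx[np.argsort(-row[idx])]     # positive weights first
        tables.append([(float(row[j]), feature_names[j]) for j in idx])
    return tables

# Hypothetical stand-in for clf.coef_: 2 classes x 6 features.
rng = np.random.default_rng(0)
coef = rng.normal(size=(2, 6))
for cls, table in enumerate(top_features(coef, k=3)):
    print(f"y={cls} top features:", table)
```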


```python
eli5.show_weights(clf, vec=vec, top=10,
                  target_names=twenty_test.target_names)
```

y=alt.atheism top features

| Weight | Feature |
|---|---|
| +1.992 | mathew |
| +1.931 | keith |
| +1.833 | atheism |
| +1.814 | okcforum |
| +1.697 | psuvm |
| +1.695 | go |
| +1.611 | believing |
| +1.593 | psu |
| … 10720 more positive … | |
| … 25059 more negative … | |

y=comp.graphics top features

| Weight | Feature |
|---|---|
| +1.702 | graphics |
| +0.825 | images |
| +0.798 | files |
| +0.786 | software |
| +0.779 | file |
| +0.773 | image |
| +0.729 | package |
| +0.724 | card |
| +0.702 | 3d |
| … 11358 more positive … | |

y=sci.med top features

| Weight | Feature |
|---|---|
| +2.014 | pitt |
| +1.950 | doctor |
| +1.758 | information |
| +1.696 | disease |
| +1.653 | treatment |
| +1.521 | msg |
| +1.518 | health |
| … 12591 more positive … | |
| … 23188 more negative … | |
| -1.766 | god |

y=soc.religion.christian top features

| Weight | Feature |
|---|---|
| +1.193 | rutgers |
| +1.029 | church |
| +1.021 | christians |
| +0.946 | clh |
| +0.899 | christ |
| +0.796 | christian |
| … 10961 more positive … | |
| … 24818 more negative … | |
| -0.852 | graphics |
| -0.893 | posting |


```python
eli5.show_prediction(clf, twenty_test.data[0], vec=vec,
                     target_names=twenty_test.target_names)
```

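y=alt.atheism

| Contribution | Feature |
|---|---|
| 1.657 | Highlighted in text (sum) |
| -10.368 | &lt;BIAS&gt; |

y=comp.graphics

| Contribution | Feature |
|---|---|
| -1.379 | &lt;BIAS&gt; |
| -3.212 | Highlighted in text (sum) |

y=sci.med

| Contribution | Feature |
|---|---|
| 8.786 | Highlighted in text (sum) |
| -4.846 | &lt;BIAS&gt; |

y=soc.religion.christian

| Contribution | Feature |
|---|---|
| -0.264 | &lt;BIAS&gt; |
| -6.885 | Highlighted in text (sum) |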
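For a linear model, a prediction explanation splits each class score into two parts:

* per-feature contributions, each equal to the feature value times the class coefficient;
* the intercept, shown as `<BIAS>`.

The class score is the sum of the two. That is why each table above has a "Highlighted in text (sum)" row and a `<BIAS>` row.

The sketch below checks this decomposition against scikit-learn's `decision_function`. The three-class corpus and the test document are hypothetical. Plain `LogisticRegression` stands in for `LogisticRegressionCV`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical three-class toy corpus (0 = religion, 1 = graphics, 2 = medicine).
docs = [
    "god church christ", "church god faith",
    "graphics image card", "image files graphics",
    "doctor disease treatment", "treatment health doctor",
]
labels = [0, 0, 1, 1, 2, 2]

vec = CountVectorizer().fit(docs)
clf = LogisticRegression().fit(vec.transform(docs), labels)

doc = "the doctor said church helps"   # words outside the vocabulary count as 0
x = vec.transform([doc]).toarray()[0]  # feature values for this one document

contributions = clf.coef_ * x          # shape (n_classes, n_features)
scores = contributions.sum(axis=1) + clf.intercept_  # per-class score incl. bias

# The decomposition reproduces the model's raw class scores.
assert np.allclose(scores, clf.decision_function(vec.transform([doc]))[0])
print("scores:", scores, "predicted class:", int(np.argmax(scores)))
```

The same arithmetic can be applied to the four tables above. Their scores are:

* 1.657 - 10.368 = -8.711
* -1.379 - 3.212 = -4.591
* 8.786 - 4.846 = 3.940
* -0.264 - 6.885 = -7.149

The third table has by far the highest score. If the tables follow the order of `target_names`, that table is sci.med, so sci.med is the class predicted for this document.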
