Credit: This notebook is based on https://github.com/TeamHG-Memex/eli5/blob/master/notebooks/TextExplainer.ipynb

# Debugging a black-box text classifier

Let’s look at the 20 Newsgroups dataset, a collection of messages posted to 20 Usenet newsgroups. We use four of those newsgroups here.

In [None]:
from sklearn.datasets import fetch_20newsgroups
categories = ['alt.atheism', 
              'soc.religion.christian', 
              'comp.graphics', 
              'sci.med']
twenty_train = fetch_20newsgroups(subset='train', 
                                  categories=categories, 
                                  shuffle=True,
                                  random_state=42, 
                                  remove=('headers', 'footers'))
twenty_test = fetch_20newsgroups(subset='test', 
                                 categories=categories, 
                                 shuffle=True,
                                 random_state=42, 
                                 remove=('headers', 'footers'))

In [None]:
i = 125
print("Class: {}".format(twenty_train.target_names[twenty_train.target[i]]))
print("-" * 20)
print()
sample = twenty_train.data[i]
print(sample)


In [None]:
import numpy as np
from scipy.spatial import distance
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

As a black-box classifier we use a kernel SVM with LSA features. The RBF kernel makes the model non-linear in those features, so it is hard to interpret directly.

In [None]:
# LSA features
vec = TfidfVectorizer(min_df=3, stop_words='english', ngram_range=(1, 2))
svd = TruncatedSVD(n_components=100, n_iter=7, random_state=42)
lsa = make_pipeline(vec, svd)

# SVM with RBF kernel; probability=True enables predict_proba
clf = SVC(C=150, gamma=2e-2, probability=True, kernel="rbf")

pipe = make_pipeline(lsa, clf)
pipe.fit(twenty_train.data, twenty_train.target)
pipe.score(twenty_test.data, twenty_test.target)

Each document's TF-IDF vector is reduced to 100 dimensions with truncated SVD, and a kernel SVM then classifies the reduced vectors.
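To make the non-linearity concrete, here is a minimal sketch of the RBF kernel value between two feature vectors, exp(-gamma * ||x - y||^2), which is the form scikit-learn documents for `kernel="rbf"`. The helper name `rbf_kernel_value` and the two-dimensional toy vectors are assumptions for illustration only; the real pipeline works on 100-dimensional LSA vectors.

```python
import math


def rbf_kernel_value(x, y, gamma=2e-2):
    """Return exp(-gamma * squared Euclidean distance) for two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical vectors get similarity 1; the value decays with distance.
same = rbf_kernel_value([1.0, 0.0], [1.0, 0.0])
far = rbf_kernel_value([1.0, 0.0], [-9.0, 0.0])
print(same, far)
```

Because the SVM decision function is a weighted sum of such kernel values over the support vectors, no single per-word weight can be read off the model.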

This is what the pipeline returns for a document. It is fairly confident that the first message in the test data belongs to sci.med:

In [None]:
def print_prediction(doc):
    y_pred = pipe.predict_proba([doc])[0]
    for target, prob in zip(twenty_train.target_names, y_pred):
        print("{:.3f} {}".format(prob, target))    

doc = twenty_test.data[0]
print_prediction(doc)


We can use `eli5.lime.TextExplainer` to debug the prediction, that is, to check what in the document was important for this decision.

One caveat: the underlying algorithm **cannot provide a good explanation for a black-box classifier which works on character level or uses features that are not directly related to tokens**, depending on the interpretable representation chosen.

Create a `TextExplainer` instance, pass the document to explain and a black-box classifier (a function which returns probabilities) to the `TextExplainer.fit` method, then check the explanation:

In [None]:
!pip install eli5

In [None]:
import eli5
from eli5.lime import TextExplainer

te = TextExplainer(random_state=42)
te.fit(doc, pipe.predict_proba)
te.show_prediction(target_names=twenty_train.target_names)

The explanation makes sense: we expect a reasonable classifier to take the highlighted words into account. But how can we be sure this is how the pipeline works, and not just a nice-looking lie? A simple sanity check is to remove or change the highlighted words and confirm that the outcome changes:

In [None]:
import re
doc2 = re.sub(r'(recall|kidney|stones|medication|pain|tech)', '', doc, flags=re.I)
print_prediction(doc2)

The predicted probabilities did change a lot.
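The pattern in the cell above has no word boundaries, so a short entry such as `tech` also deletes part of longer words. If that is not wanted, a whole-word variant can be sketched as follows. The helper `remove_words` and the example sentence are hypothetical and not part of eli5 or the original notebook.

```python
import re


def remove_words(text, words):
    """Delete whole-word, case-insensitive matches of `words` from `text`."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse runs of spaces left by the deletions; newlines are kept.
    return re.sub(r"[ \t]{2,}", " ", stripped).strip()


example = "Kidney stones cause pain, said the technician."
cleaned = remove_words(example, ["kidney", "stones", "pain", "tech"])
print(cleaned)
```

In the notebook it could be used as `print_prediction(remove_words(doc, [...]))` with the highlighted words as the list.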

And in fact, `TextExplainer` did something similar to get the explanation. `TextExplainer` generated a lot of texts similar to the document (by removing some of the words), and then trained a white-box classifier which predicts the output of the black-box classifier (not the true labels!). The explanation we saw is for this white-box classifier.

This approach follows the LIME algorithm; for text data the algorithm is actually pretty straightforward:

1. generate distorted versions of the text;
2. predict probabilities for these distorted texts using the black-box classifier;
3. train another classifier (one of those eli5 supports) which tries to predict the output of the black-box classifier on these texts.

The algorithm works because even though it could be hard or impossible to approximate a black-box classifier globally (for every possible text), approximating it in a small neighbourhood near a given text often works well, even with simple white-box classifiers.
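The three steps listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not eli5's implementation: `black_box_proba` is a hypothetical keyword-counting classifier, and step 3 is simplified to a difference of mean predicted probabilities (token kept versus token dropped) instead of training a real white-box classifier.

```python
import random

KEYWORDS = {"kidney", "pain"}  # assumption: what the toy black box reacts to


def black_box_proba(text):
    """Hypothetical black box: probability grows with keyword hits."""
    hits = sum(1 for w in text.lower().split() if w in KEYWORDS)
    return min(1.0, hits / len(KEYWORDS))


def explain(text, predict, n_samples=1000, seed=0):
    """Return (token, score) pairs approximating `predict` near `text`."""
    rng = random.Random(seed)
    tokens = text.split()
    # Step 1: distorted texts, encoded as keep/drop masks over the tokens.
    masks = [[rng.random() < 0.5 for _ in tokens] for _ in range(n_samples)]
    # Step 2: query the black box on every distorted text.
    probs = [
        predict(" ".join(t for t, keep in zip(tokens, mask) if keep))
        for mask in masks
    ]
    # Step 3 (simplified): score = mean prob when kept - mean prob when dropped.
    scores = []
    for i, tok in enumerate(tokens):
        kept = [p for m, p in zip(masks, probs) if m[i]]
        dropped = [p for m, p in zip(masks, probs) if not m[i]]
        if kept and dropped:
            score = sum(kept) / len(kept) - sum(dropped) / len(dropped)
        else:
            score = 0.0
        scores.append((tok, score))
    return scores


scores = explain("my kidney pain got worse", black_box_proba)
for tok, score in sorted(scores, key=lambda pair: -pair[1]):
    print("{:+.3f} {}".format(score, tok))
```

The keyword tokens should come out with clearly positive scores and the other tokens near zero, which mirrors how the highlighted words in the `TextExplainer` output are meant to be read.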

