# Anchor explanations for movie sentiment

In this example, we will explain why a certain sentence is classified by a logistic regression as having negative or positive sentiment. The logistic regression is trained on negative and positive movie reviews.

In [1]:
import os
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import spacy
import string
from alibi.explainers import AnchorText
from alibi.datasets import fetch_movie_sentiment
from alibi.utils.download import spacy_model
from alibi.utils.lang_model import DistilbertBaseUncased, BertBaseUncased, RobertaBase

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

%load_ext autoreload
%autoreload 2



### Load movie review dataset

The `fetch_movie_sentiment` function returns a `Bunch` object containing the features, the targets and the target names for the dataset.

In [2]:
movies = fetch_movie_sentiment()
movies.keys()

dict_keys(['data', 'target', 'target_names'])

In [3]:
data = movies.data
labels = movies.target
target_names = movies.target_names

Define shuffled training, validation and test sets:

In [4]:
train, test, train_labels, test_labels = train_test_split(data, labels, test_size=.2, random_state=42)
train, val, train_labels, val_labels = train_test_split(train, train_labels, test_size=.1, random_state=42)
train_labels = np.array(train_labels)
test_labels = np.array(test_labels)
val_labels = np.array(val_labels)
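
The two calls above first hold out 20% of the data for testing and then 10% of the remainder for validation. As a quick sanity check, the resulting shares of the full dataset can be worked out with plain arithmetic. The variable names below are illustrative only, and the exact sample counts may differ slightly because of rounding.

```python
from fractions import Fraction

# First split: test_size=.2 of the full dataset goes to the test set.
test_frac = Fraction(1, 5)

# Second split: test_size=.1 of the remaining 80% goes to the validation set.
val_frac = Fraction(1, 10) * (1 - test_frac)

# Whatever is left is used for training.
train_frac = 1 - test_frac - val_frac

print(float(train_frac), float(val_frac), float(test_frac))  # 0.72 0.08 0.2
```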

### Apply CountVectorizer to training set

In [5]:
vectorizer = CountVectorizer(min_df=1)
vectorizer.fit(train)

CountVectorizer()

### Fit model

In [6]:
np.random.seed(0)
clf = LogisticRegression(solver='liblinear')
clf.fit(vectorizer.transform(train), train_labels)

LogisticRegression(solver='liblinear')

### Define prediction function

In [7]:
predict_fn = lambda x: clf.predict(vectorizer.transform(x))

### Make predictions on train, validation and test sets

In [8]:
preds_train = predict_fn(train)
preds_val = predict_fn(val)
preds_test = predict_fn(test)
print('Train accuracy: %.3f' % accuracy_score(train_labels, preds_train))
print('Validation accuracy: %.3f' % accuracy_score(val_labels, preds_val))
print('Test accuracy: %.3f' % accuracy_score(test_labels, preds_test))

Train accuracy: 0.980
Validation accuracy: 0.754
Test accuracy: 0.759


### Load spaCy model

The `en_core_web_md` model is an English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. It assigns word vectors, context-specific token vectors, POS tags, dependency parses and named entities.

In [9]:
model = 'en_core_web_md'
spacy_model(model=model)
nlp = spacy.load(model)

### Instance to be explained

In [10]:
class_names = movies.target_names

# select instance to be explained
text = data[4]
print("* Text: %s" % text)

# compute class prediction
pred = class_names[predict_fn([text])[0]]
alternative = class_names[1 - predict_fn([text])[0]]
print("* Prediction: %s" % pred)

* Text: a visually flashy but narratively opaque and emotionally vapid exercise in style and mystification .
* Prediction: negative


### Initialize anchor text explainer with `unknown` sampling

* `sampling_method='unknown'` means we will perturb examples by replacing words with UNKs. Let us now take a look at the anchor. In the run shown below, the word 'flashy' on its own all but guarantees a negative prediction.

In [11]:
explainer = AnchorText(
    predictor=predict_fn, 
    sampling_method='unknown',
    nlp=nlp,
)
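
To make the idea of UNK perturbation concrete, here is a minimal sketch in plain Python. It is not alibi's implementation: the helper `perturb_with_unk`, the choice of anchor position and the default `sample_proba` are assumptions made for illustration. Words that belong to the candidate anchor stay fixed, and every other word is replaced by `UNK` with some probability.

```python
import random

def perturb_with_unk(words, anchor_positions, sample_proba=0.5, rng=None):
    """Replace each word outside the anchor by 'UNK' with probability sample_proba."""
    if rng is None:
        rng = random.Random(0)
    perturbed = []
    for i, word in enumerate(words):
        if i not in anchor_positions and rng.random() < sample_proba:
            perturbed.append("UNK")
        else:
            perturbed.append(word)
    return perturbed

words = "a visually flashy but narratively opaque and emotionally vapid exercise".split()
anchor_positions = {2}   # hypothetical candidate anchor: keep 'flashy' fixed
rng = random.Random(0)   # seeded so the sketch is reproducible
samples = [perturb_with_unk(words, anchor_positions, rng=rng) for _ in range(5)]
for sample in samples:
    print(" ".join(sample))
```

Every printed sentence keeps 'flashy' in place, which is what "the anchor applies" means in the example listings further down.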

### Explanation

In [12]:
explanation = explainer.explain(text, threshold=0.95)
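
The `threshold` is the minimum precision an anchor has to reach. Precision can be read as the fraction of perturbed sentences that still contain the anchor and for which the model keeps its original prediction. The sketch below estimates it by simple Monte Carlo sampling. The classifier `toy_predict` and the helper `estimate_precision` are hypothetical stand-ins, and alibi's actual search procedure is more involved than this.

```python
import random

def toy_predict(words):
    # Hypothetical stand-in for predict_fn: 0 (negative) if 'vapid' survives, else 1.
    return 0 if "vapid" in words else 1

def estimate_precision(words, anchor_positions, predict, n_samples=1000,
                       sample_proba=0.5, seed=0):
    """Fraction of perturbed samples (anchor kept fixed) that keep the original label."""
    rng = random.Random(seed)
    original_label = predict(words)
    same = 0
    for _ in range(n_samples):
        # Anchor words are kept; every other word becomes UNK with prob. sample_proba.
        sample = [
            w if i in anchor_positions or rng.random() >= sample_proba else "UNK"
            for i, w in enumerate(words)
        ]
        same += predict(sample) == original_label
    return same / n_samples

words = "a visually flashy but emotionally vapid exercise".split()
threshold = 0.95
prec_with_anchor = estimate_precision(words, {words.index("vapid")}, toy_predict)
prec_without = estimate_precision(words, set(), toy_predict)
print(prec_with_anchor >= threshold, prec_without >= threshold)
```

With the toy classifier, fixing 'vapid' keeps the prediction stable in every sample, whereas the empty anchor falls well short of the threshold.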

In [13]:
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))

Anchor: flashy
Precision: 0.99

Examples where anchor applies and model predicts negative:
a UNK flashy UNK UNK opaque and emotionally vapid exercise in style UNK mystification .
a UNK flashy UNK UNK UNK and emotionally UNK exercise UNK UNK and UNK UNK
a UNK flashy UNK narratively opaque UNK UNK UNK exercise in style and UNK UNK
UNK visually flashy UNK narratively UNK and emotionally UNK UNK UNK UNK UNK mystification .
UNK UNK flashy UNK UNK opaque and emotionally UNK UNK in UNK and UNK .
a visually flashy but UNK UNK and UNK UNK UNK in style UNK mystification .
a visually flashy but UNK opaque UNK emotionally vapid UNK in UNK and mystification .
a UNK flashy but narratively UNK UNK emotionally vapid exercise in style UNK mystification UNK
a UNK flashy but narratively opaque UNK emotionally vapid exercise in style and mystification .
a visually flashy UNK UNK opaque UNK UNK UNK exercise in UNK UNK UNK .

Examples where anchor applies and model predicts positive:
UNK UNK flashy but narr

### Initialize anchor text explainer with word `similarity` sampling

Let's try this with another perturbation distribution, namely one that replaces words by similar words instead of UNKs.

In [None]:
explainer = AnchorText(
    predictor=predict_fn, 
    sampling_method='similarity',     # replace masked words by similar words
    nlp=nlp,                          # spacy object
    sample_proba=0.5,                 # probability of a word being masked and replaced by a similar word
)

In [None]:
explanation = explainer.explain(text, threshold=0.95)

The anchor now shows that we need more to guarantee the negative prediction:

In [None]:
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))

We can make the perturbation distribution sample words that are more similar to the original word via the `top_n` argument. Smaller values of `top_n` (the default is 100) should yield more coherent sentences that lie closer to the distribution of natural language, which can influence the returned anchor. Setting `use_proba=True` makes the sampling distribution for perturbed tokens proportional to the similarity score between each candidate replacement and the original word. The `temperature` argument puts additional weight on similar words: lower values of `temperature` increase the sampling weight of the more similar words. The following example perturbs each token in the original sentence with probability `sample_proba` and draws its replacement from the `top_n` most similar words, weighted by similarity.
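
The interplay of `top_n`, `use_proba` and `temperature` can be illustrated with a small sketch. It uses a temperature-scaled softmax over the similarity scores, which is one common way to realise this kind of weighting and is an assumption here rather than a statement about alibi's internals. The similarity scores and the helper `sampling_weights` are made up for illustration.

```python
import math

def sampling_weights(similarities, top_n=100, temperature=1.0, use_proba=True):
    """Return the top_n most similar words and their sampling probabilities."""
    top = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    words = [w for w, _ in top]
    if not use_proba:
        # Uniform sampling over the retained candidates.
        return words, [1.0 / len(words)] * len(words)
    # Assumed weighting: softmax of similarity / temperature.
    scores = [math.exp(s / temperature) for _, s in top]
    total = sum(scores)
    return words, [s / total for s in scores]

# Hypothetical similarity scores for replacements of the word 'flashy'.
sims = {"showy": 0.8, "glitzy": 0.7, "stylish": 0.6, "loud": 0.3}

for temperature in (1.0, 0.2):
    words, probs = sampling_weights(sims, top_n=3, temperature=temperature)
    print(temperature, dict(zip(words, (round(p, 3) for p in probs))))
```

Lowering the temperature concentrates probability mass on the most similar candidate, while reducing `top_n` removes the least similar candidates altogether.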

In [None]:
explainer = AnchorText(
    predictor=predict_fn, 
    sampling_method='similarity',   # replace masked words by similar words
    nlp=nlp,                        # spacy object
    use_proba=True,                 # sample according to the similarity distribution
    sample_proba=0.5,               # probability of a word being masked and replaced by a similar word
    top_n=20,                       # consider only the 20 most similar words
    temperature=0.2                 # higher temperature implies more randomness when sampling
)

In [None]:
explanation = explainer.explain(text, threshold=0.95)

In [None]:
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))

### Initialize language model

In [14]:
# initialize model (any of the following 3 is supported)
# language_model = RobertaBase()
# language_model = BertBaseUncased()
language_model = DistilbertBaseUncased()

Some layers from the model checkpoint at distilbert-base-uncased were not used when initializing TFDistilBertForMaskedLM: ['activation_13']
- This IS expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertForMaskedLM were initialized from the model checkpoint at distilbert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForMaskedLM for predictions without further training.


### Initialize anchor text explainer with `language_model` sampling (`parallel` filling method)

* `sampling_method='language_model'` means that the words will be sampled according to the output distribution predicted by the language model

* `filling_method='parallel'` means that only one forward pass is performed. The words are then sampled independently of one another.
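
The parallel filling strategy can be sketched with a toy masked language model. `ToyMaskedLM` and `fill_parallel` are hypothetical names, and the fixed candidate distribution is invented for illustration. The point is only that a single forward pass yields a distribution for every masked position, and each position is then sampled without looking at the others.

```python
import random

MASK = "[MASK]"

class ToyMaskedLM:
    """Hypothetical stand-in for a masked language model; counts forward passes."""

    def __init__(self):
        self.forward_passes = 0

    def predict(self, tokens):
        # One "forward pass": a (candidates, probabilities) pair per masked position.
        self.forward_passes += 1
        return {
            i: (["stylish", "dull", "loud"], [0.5, 0.3, 0.2])
            for i, tok in enumerate(tokens) if tok == MASK
        }

def fill_parallel(tokens, lm, rng):
    dists = lm.predict(tokens)  # single pass for all masked positions
    filled = list(tokens)
    for i, (candidates, probs) in dists.items():
        # Each position is sampled independently of the other fills.
        filled[i] = rng.choices(candidates, weights=probs, k=1)[0]
    return filled

lm = ToyMaskedLM()
filled = fill_parallel(["a", MASK, MASK, "exercise"], lm, random.Random(0))
print(" ".join(filled), "| forward passes:", lm.forward_passes)
```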

In [28]:
# initialize explainer
explainer = AnchorText(
    predictor=predict_fn,
    sampling_method="language_model",    # use language model to predict the masked words
    language_model=language_model,       # language model to be used
    filling_method="parallel",           # just one pass through the transformer
    sample_proba=0.5,                    # probability of masking a word
    frac_mask_templates=0.1,             # fraction of masking templates (smaller value -> faster, less diverse)
    use_proba=True,                      # use words distribution when sampling (if false sample uniform)
    top_n=50,                            # consider the first 50 most likely words
    temperature=1.0,                     # higher temperature implies more randomness when sampling
    stopwords=['and', 'a', 'but', 'in'], # those words will not be sampled
    batch_size_lm=128,
)

In [29]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, coverage_samples=1000)

In [17]:
print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))

Anchor: vapid AND flashy AND emotionally AND exercise
Precision: 0.98

Examples where anchor applies and model predicts negative:
a less flashy but emotionally determined and emotionally vapid exercise in swimming and gymnastics.
a fairly flashy but often emotional and emotionally vapid exercise in solitude and meditation.
a surprisingly flashy but visually physical and emotionally vapid exercise in biology and conditioning.
a much flashy but highly fast and emotionally vapid exercise in comedy and discipline.
a surprisingly flashy but surprisingly lively and emotionally vapid exercise in golf and discipline.
a decidedly flashy but often emotionally and emotionally vapid exercise in dungeons and yoga.
a naturally flashy but sometimes competitive and emotionally vapid exercise in science and safety.
a typical flashy but emotionally dramatic and emotionally vapid exercise in cooking and dreams.
a rather flashy but socially shy and emotionally vapid exercise in mind and endurance.
a fairl

### Initialize anchor text explainer with `language_model` sampling (`autoregressive` filling method)

* `filling_method='autoregressive'` means that the words are sampled one at a time (autoregressively). Thus, each subsequent word is predicted conditioned on the previously generated words.
* `frac_mask_templates` is fixed to 1 in this mode; any other value passed by the user is ignored.
* **This procedure is computationally expensive**.
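
A toy sketch of autoregressive filling shows both the conditioning and the cost. `ToyMaskedLM` and `fill_autoregressive` are hypothetical, the left-to-right filling order is an assumption of this sketch, and the toy model simply refuses to repeat the word immediately before a mask so that the effect of conditioning is visible. One forward pass is needed per masked word, which is why this mode is more expensive.

```python
import random

MASK = "[MASK]"
VOCAB = ["stylish", "dull", "loud"]

class ToyMaskedLM:
    """Hypothetical masked LM whose prediction depends on the preceding token."""

    def __init__(self):
        self.forward_passes = 0

    def predict(self, tokens):
        self.forward_passes += 1
        dists = {}
        for i, tok in enumerate(tokens):
            if tok == MASK:
                prev = tokens[i - 1] if i > 0 else None
                # Toy conditioning rule: never repeat the preceding word.
                candidates = [w for w in VOCAB if w != prev]
                dists[i] = (candidates, [1.0 / len(candidates)] * len(candidates))
        return dists

def fill_autoregressive(tokens, lm, rng):
    filled = list(tokens)
    masked_positions = [i for i, tok in enumerate(tokens) if tok == MASK]
    for i in masked_positions:
        # New forward pass on the partially filled sentence for every masked word.
        candidates, probs = lm.predict(filled)[i]
        filled[i] = rng.choices(candidates, weights=probs, k=1)[0]
    return filled

lm = ToyMaskedLM()
filled = fill_autoregressive(["a", MASK, MASK, "exercise"], lm, random.Random(0))
print(" ".join(filled), "| forward passes:", lm.forward_passes)
```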

In [None]:
# initialize explainer
explainer = AnchorText(
    predictor=predict_fn,
    sampling_method="language_model",   # use language model to predict the masked words
    language_model=language_model,      # language model to be used
    filling_method="autoregressive",    # fill masked words one at a time, conditioning on earlier fills
    sample_proba=0.5,                   # probability of masking a word
    use_proba=True,                     # use words distribution when sampling (if false sample uniform)
    top_n=50,                           # consider the first 50 most likely words
    stopwords=['and', 'a', 'but', 'in'] # those words will not be sampled
)

In [None]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, batch_size=10, coverage_samples=100)

print('Anchor: %s' % (' AND '.join(explanation.anchor)))
print('Precision: %.2f' % explanation.precision)
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x for x in explanation.raw['examples'][-1]['covered_false']]))