# Anchor explanations for movie sentiment

In this example, we will explain why a certain sentence is classified by a transformer model as having negative or positive sentiment. The model is trained on negative and positive movie reviews.

In [1]:
import numpy as np
import pandas as pd

import spacy
from alibi.explainers import AnchorText
from alibi.datasets import fetch_movie_sentiment
from alibi.utils.download import spacy_model

import torch

from simpletransformers.model import TransformerModel

### Load movie review dataset

The `fetch_movie_sentiment` function returns a `Bunch` object containing the features, the targets and the target names for the dataset.

In [2]:
movies = fetch_movie_sentiment()
movies.keys()

dict_keys(['data', 'target', 'target_names'])

In [3]:
data = movies.data
labels = movies.target
target_names = movies.target_names

### Load spaCy model

`en_core_web_md` is an English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. It assigns word vectors, context-specific token vectors, POS tags, dependency parses and named entities.

In [4]:
model = 'en_core_web_md'
spacy_model(model=model)
nlp = spacy.load(model)

### Load pre-trained model

In [5]:
model = TransformerModel('roberta', 'roberta-base', args={'fp16': False})

In [6]:
model.model.load_state_dict(torch.load('outputs/pytorch_model.bin'))

<All keys matched successfully>

### Initialize anchor text explainer

In [7]:
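# model.predict returns (predictions, raw model outputs); take the argmax of the raw outputs as the class index
predict_fn = lambda x: model.predict(x)[1].argmax(axis=1)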

In [8]:
explainer = AnchorText(nlp, predict_fn)

### Explain a prediction

In [9]:
class_names = movies.target_names

In [10]:
text = data[4]
print(text)

a visually flashy but narratively opaque and emotionally vapid exercise in style and mystification .


Prediction:

In [11]:
pred_idx = predict_fn([text])[0]  # run the model once and reuse the predicted class index
pred = class_names[pred_idx]
alternative = class_names[1 - pred_idx]
print('Prediction: %s' % pred)

Prediction: negative





Explanation:

In [12]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_unk=True)

`use_unk=True` means we perturb examples by replacing words with UNK tokens. Let us now take a look at the anchor. The word 'vapid' basically guarantees a negative prediction.
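To make the UNK perturbation and the reported precision more concrete, here is a minimal sketch that uses only the standard library. It is not alibi's implementation: the helpers `perturb_with_unk`, `estimate_precision` and the stand-in classifier `toy_predict` are hypothetical names introduced for illustration, and the sketch assumes each non-anchor token is masked independently with probability `sample_proba`.

```python
import random


def perturb_with_unk(tokens, anchor_idx, sample_proba=0.5, rng=None):
    """Replace each non-anchor token with 'UNK' with probability sample_proba."""
    rng = rng or random.Random()
    return [
        tok if i in anchor_idx or rng.random() >= sample_proba else "UNK"
        for i, tok in enumerate(tokens)
    ]


def estimate_precision(tokens, anchor_idx, predict, n_samples=200, seed=0):
    """Fraction of perturbed samples that keep the original prediction."""
    rng = random.Random(seed)
    target = predict(tokens)
    hits = sum(
        predict(perturb_with_unk(tokens, anchor_idx, rng=rng)) == target
        for _ in range(n_samples)
    )
    return hits / n_samples


def toy_predict(tokens):
    """Hypothetical stand-in for the model: negative (0) iff 'vapid' is present."""
    return 0 if "vapid" in tokens else 1


sentence = "a visually flashy but narratively opaque and emotionally vapid exercise".split()
anchor = {sentence.index("vapid")}

# The anchor token is never masked, so the toy classifier never changes its mind.
precision_with_anchor = estimate_precision(sentence, anchor, toy_predict)
# Without an anchor, 'vapid' is masked in roughly half of the samples.
precision_without_anchor = estimate_precision(sentence, set(), toy_predict)
print(precision_with_anchor, precision_without_anchor)
```

With the toy classifier, fixing 'vapid' yields a precision of 1.0 by construction, while the empty anchor falls well below the 0.95 threshold. The real explainer searches for the shortest set of words whose estimated precision clears that threshold.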

In [13]:
print('Anchor: %s' % explanation['names'])
print('Precision: %.2f' % explanation['precision'])
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_false']]))

Anchor: ['vapid']
Precision: 1.00

Examples where anchor applies and model predicts negative:
a UNK UNK but UNK opaque and emotionally vapid exercise in UNK UNK UNK .
UNK visually UNK UNK UNK UNK UNK emotionally vapid exercise in style and mystification .
a visually UNK but narratively UNK and emotionally vapid UNK UNK style UNK UNK .
a UNK flashy UNK UNK UNK UNK UNK vapid UNK UNK style and UNK UNK
a visually UNK UNK narratively opaque and UNK vapid UNK UNK style and UNK .
UNK visually flashy UNK narratively opaque UNK UNK vapid UNK UNK UNK and mystification .
a UNK flashy but UNK opaque UNK UNK vapid UNK UNK style UNK mystification .
a visually UNK but UNK UNK and UNK vapid exercise in style and mystification UNK
UNK UNK flashy but UNK UNK and emotionally vapid UNK UNK style and mystification UNK
UNK UNK flashy but narratively opaque UNK emotionally vapid UNK UNK UNK UNK mystification .

Examples where anchor applies and model predicts positive:



### Changing the perturbation distribution
Let's try this with another perturbation distribution, namely one that replaces words by similar words instead of UNKs.

Explanation:

In [14]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_unk=False, sample_proba=0.5)

With this perturbation distribution, the anchor is still the single word 'vapid', although its precision drops slightly to 0.99:

In [15]:
print('Anchor: %s' % explanation['names'])
print('Precision: %.2f' % explanation['precision'])
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_false']]))

Anchor: ['vapid']
Precision: 0.99

Examples where anchor applies and model predicts negative:
each visually flashy but physically opaque and tactically vapid exercise before brevity and mystification .
a technically flashy but narratively opaque and terminally vapid strenght in comfort and denial .
a tremendously flashy but narratively acrylic and stylistically vapid handstand in minimalist and uselessness .
this mechanically flashy but truely opaque and militarily vapid exercise in designer and hysteria .
a brilliantly flashy but narratively opaque and emotionally vapid exercise as style and mystification .
both visually skimp but nonetheless opaque and emotionally vapid trampoline among style and wrongness .
a visually flashy but narratively outer and emotionally vapid learner in choice and mystification .
this visually glamorous but distinctly opaque and emotionally vapid exercise with style and mystification .
a perfectly flashy but spiritually opaque and emotionally vapid exercise

We can make the token perturbation distribution sample words that are more similar to the ground truth word via the `top_n` argument. Smaller values (default=100) should result in sentences that are more coherent and thus closer to the distribution of natural language, which could influence the returned anchor. By setting `use_similarity_proba` to True, the sampling distribution for perturbed tokens becomes proportional to the similarity score between each candidate replacement and the original word. We can also put more weight on similar words via the `temperature` argument: lower values of `temperature` increase the sampling weight of more similar words. The following example perturbs each token in the original sentence with probability `sample_proba` and samples its replacement from the `top_n` most similar words, weighted by similarity.
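The effect of `top_n` and `temperature` can be illustrated with a small standard-library sketch. It assumes the weights are a softmax over similarity divided by temperature, which is one common way to implement this kind of weighting; the library's exact formula may differ. The helper names and the similarity scores below are hypothetical and chosen only for illustration.

```python
import math
import random


def sampling_weights(similarities, temperature=1.0):
    """Softmax over similarity / temperature (assumed weighting scheme)."""
    scaled = [s / temperature for s in similarities]
    offset = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - offset) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_replacement(candidates, top_n, temperature, rng):
    """Pick a replacement among the top_n most similar candidate words."""
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    words = [word for word, _ in top]
    weights = sampling_weights([sim for _, sim in top], temperature)
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical similarity scores for replacements of 'flashy'.
candidates = [("gaudy", 0.8), ("snazzy", 0.7), ("blocky", 0.4), ("skimp", 0.2)]
sims = [sim for _, sim in candidates]

flat = sampling_weights(sims, temperature=1.0)
sharp = sampling_weights(sims, temperature=0.2)
print([round(w, 3) for w in flat])
print([round(w, 3) for w in sharp])

rng = random.Random(0)
print(sample_replacement(candidates, top_n=2, temperature=0.2, rng=rng))
```

Lowering the temperature concentrates the probability mass on the most similar words, and lowering `top_n` removes the least similar candidates altogether. Both push the perturbed sentences closer to the original.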

In [16]:
np.random.seed(0)
explanation = explainer.explain(text, threshold=0.95, use_similarity_proba=True, sample_proba=0.5,
                                use_unk=False, top_n=20, temperature=.2)

In [17]:
print('Anchor: %s' % explanation['names'])
print('Precision: %.2f' % explanation['precision'])
print('\nExamples where anchor applies and model predicts %s:' % pred)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_true']]))
print('\nExamples where anchor applies and model predicts %s:' % alternative)
print('\n'.join([x[0] for x in explanation['raw']['examples'][-1]['covered_false']]))

Anchor: ['vapid']
Precision: 1.00

Examples where anchor applies and model predicts negative:
a visually flashy but visually opaque and physically vapid exercise throughout style and paranoia .
another visually gaudy but philosophically opaque and emotionally vapid excercise in style and mystification .
a visually gaudy but narratively opaque and emotionally vapid exercise in retro and mystification .
a graphically snazzy but narratively opaque and spiritually vapid weightloss in style and mystification .
a subtly flashy but narratively transparent and emotionally vapid treadmill in minimalism and mystification .
a visually flashy but graphically translucent and terminally vapid exercise into fashion and mystification .
a visually snazzy but stylistically opaque and emotionally vapid excercise in style and mystification .
a anatomically blocky but narratively opaque and emotionally vapid exercise in practicality and mystification .
a visually flashy but narratively translucent and emot