## 1. Classifier visualization

In [None]:
#install the necessary libraries 
!pip install shap
!pip install transformers
!pip install datasets

We start with a simple classifier built with the `transformers` `pipeline` helper.

In [None]:
import transformers
import torch
import datasets
import numpy as np
import scipy as sp
import pandas as pd
import shap 
import sklearn

# load a transformers pipeline model
model = transformers.pipeline('sentiment-analysis', return_all_scores=True)

# build an explainer for the pipeline model
explainer = shap.Explainer(model)

In [None]:

# explain the model on a single sample input
shap_values = explainer(["I love Burundian coffee! let's #Visit Burundi."])

# visualize the first prediction's explanation for the POSITIVE output class
shap.plots.text(shap_values[0, :, "POSITIVE"])



It's obvious that "love" contributes the most to the POSITIVE prediction in this example.
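
The `"POSITIVE"` index used in the plot selects one of the output classes that the pipeline scores. As a minimal sketch, assuming the pipeline returns one list of `{"label", "score"}` dictionaries per input, the score for a class can be looked up as shown here. The helper `score_for` is illustrative and the scores are hypothetical placeholders, not real model output.

```python
def score_for(label, class_scores):
    """Return the score stored under `label` in one input's list of class dicts."""
    for entry in class_scores:
        if entry["label"] == label:
            return entry["score"]
    raise KeyError(f"label {label!r} not found")


# Hypothetical pipeline output for a single input (placeholder scores)
example_output = [
    {"label": "NEGATIVE", "score": 0.02},
    {"label": "POSITIVE", "score": 0.98},
]

positive_score = score_for("POSITIVE", example_output)
print(positive_score)
```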

### Custom tokenizer and prediction function

Load a pretrained sentiment classifier from the transformers library, along with the IMDB review dataset. Because the dataset is large, we explain only 30 reviews and run the model on a GPU.

In [None]:
import transformers
import torch
import datasets
import numpy as np
import scipy as sp
import pandas as pd
import shap 
import sklearn

# load a DistilBERT sentiment analysis model
tokenizer = transformers.DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = transformers.DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
).cuda()

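# define a prediction function that maps raw strings to POSITIVE-class log-odds
def f(x):
    # tokenize each string, padding or truncating to 500 tokens, and move the batch to the GPU
    tv = torch.tensor([tokenizer.encode(v, padding='max_length', max_length=500, truncation=True) for v in x]).cuda()
    # raw class logits as a NumPy array
    outputs = model(tv)[0].detach().cpu().numpy()
    # softmax over the class dimension
    scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T
    val = sp.special.logit(scores[:,1]) # use one vs rest logit units
    return val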

# build an explainer using a token masker
explainer = shap.Explainer(f, tokenizer)

# explain the model's predictions on IMDB reviews
imdb_train = datasets.load_dataset("imdb")["train"]
shap_values = explainer(imdb_train[:30], fixed_context=1, batch_size=2)
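
The prediction function `f` returns log-odds rather than probabilities: it applies a softmax to the class logits and then a logit transform to the POSITIVE probability. For a two-class model this reduces to the difference between the two logits. The pure-Python sketch below checks that on made-up logits. The helper names `softmax`, `logit`, and `positive_log_odds` are illustrative, and subtracting the maximum for numerical stability is an addition that `f` does not make.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def positive_log_odds(logits):
    """Log-odds of class index 1, mirroring what `f` computes per input."""
    return logit(softmax(logits)[1])


# Made-up logits in the order [NEGATIVE, POSITIVE]
log_odds = positive_log_odds([-1.5, 2.0])
print(log_odds)  # approximately 3.5, the logit difference 2.0 - (-1.5)
```

Working in log-odds keeps the contributions additive on an unbounded scale, whereas probabilities saturate near 0 and 1.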

For a sample review, we visualize the sentiment predicted by the model and the parts of the text that contributed most to that prediction.

In [None]:
# plot a sentence's explanation
shap.plots.text(shap_values[27])

In [None]:
shap.plots.text(shap_values[2])

As you can see by hovering over the text, SHAP shows the contribution of each word to the final result. A negative contribution pushes the prediction toward the "negative" sentiment, while a positive contribution gives more weight to the "positive" sentiment.
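
The same information can be read numerically. SHAP contributions are additive: the base value plus the sum of the per-token contributions gives the model output for that input. The sketch below ranks tokens by absolute contribution and rebuilds the output. The numbers are hypothetical and chosen only for illustration; with a real explanation they would come from the `values`, `data`, and `base_values` attributes of one entry of `shap_values`. The helper `summarize_contributions` is an illustrative name.

```python
def summarize_contributions(tokens, values, base_value, top_k=3):
    """Rank tokens by absolute contribution and rebuild the model output."""
    ranked = sorted(zip(tokens, values), key=lambda tv: abs(tv[1]), reverse=True)
    output = base_value + sum(values)
    return ranked[:top_k], output


# Hypothetical tokens and contributions, not produced by the model above
tokens = ["I", "love", "this", "boring", "film"]
values = [0.1, 2.0, 0.05, -1.5, 0.2]

top, output = summarize_contributions(tokens, values, base_value=0.3)
for token, value in top:
    direction = "positive" if value > 0 else "negative"
    print(f"{token!r}: {value:+.2f} (pushes toward {direction})")
print(f"reconstructed output: {output:.2f}")
```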

## 2. Translation

Translation is a more intuitive case: the explanation shows which input tokens contribute to each output token, which matches our own understanding of how translation works.

In [None]:
!pip install sentencepiece



In [None]:
import numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import shap
import torch

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr").cuda()

s = ["In my family, we are six: my father, my mother, my elder sister, my younger brother and sister"]

explainer = shap.Explainer(model, tokenizer)

shap_values = explainer(s)


In [None]:
shap.plots.text(shap_values)
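
For a translation model, each explanation can be thought of as a matrix with one row per input token and one column per output token. As a rough sketch under that assumption, the code below picks, for every output token, the input token with the largest absolute contribution. The tokens and numbers are hypothetical and the helper `strongest_source_per_target` is an illustrative name, not part of SHAP.

```python
def strongest_source_per_target(source_tokens, target_tokens, attributions):
    """Pair each target token with its most influential source token.

    attributions[i][j] is the contribution of source token i to target token j.
    """
    pairs = []
    for j, target in enumerate(target_tokens):
        column = [row[j] for row in attributions]
        best = max(range(len(column)), key=lambda i: abs(column[i]))
        pairs.append((target, source_tokens[best]))
    return pairs


# Hypothetical attribution matrix (rows: source tokens, columns: target tokens)
source = ["my", "family"]
target = ["ma", "famille"]
attributions = [
    [1.8, 0.2],
    [0.3, 2.1],
]

pairs = strongest_source_per_target(source, target, attributions)
print(pairs)
```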