In [2]:
from transformers import pipeline
from alibi.explainers import AnchorText
import spacy
from alibi.utils import DistilbertBaseUncased
import numpy as np

In [3]:
pp = pipeline(
    "text-classification",
    device=-1,  # -1 runs the pipeline on CPU
)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [4]:
pp(['hello world'])

[{'label': 'POSITIVE', 'score': 0.9997522234916687}]

In [4]:
import spacy
from alibi.utils import spacy_model

model = 'en_core_web_md'
# download the spaCy model if it is not already installed, then load it
spacy_model(model=model)
nlp = spacy.load(model)

In [23]:
import spacy
# load spaCy's medium English model and collect its stop words
en = spacy.load('en_core_web_md')
stopwords = list(en.Defaults.stop_words)

In [5]:
def predict_fn(x):
    # map each pipeline result to a class index: 1 for POSITIVE, 0 otherwise
    r = pp(x)
    res = []
    for j in r:
        if j["label"] == "POSITIVE":
            res.append(1)
        else:
            res.append(0)
    return np.array(res)

In [13]:
predict_fn(["cambridge is great", "Oxford is awlful"])

array([1, 0])
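
The loop in `predict_fn` can also be written as a single comprehension. This is a minimal sketch, assuming the pipeline returns one dict with a "label" key per input, as in the output of `pp(['hello world'])` above. `labels_to_array` is a hypothetical helper name, and the records below are hand-written stand-ins rather than model output.

```python
import numpy as np

def labels_to_array(records):
    """Map pipeline-style records to 1 (POSITIVE) or 0 (anything else)."""
    return np.array([int(r["label"] == "POSITIVE") for r in records])

# Hand-written records in the same shape as the pipeline output above.
fake = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
]
print(labels_to_array(fake))
```

With the real pipeline, `predict_fn` would reduce to `return labels_to_array(pp(x))`.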

In [25]:
language_model = DistilbertBaseUncased()
explainer = AnchorText(
    predictor=predict_fn,
    sampling_strategy="language_model",   # use language model to predict the masked words
    language_model=language_model,        # language model to be used
    filling="parallel",                   # just one pass through the transformer
    sample_proba=0.5,                     # probability of masking a word
    frac_mask_templates=0.1,              # fraction of masking templates (smaller value -> faster, less diverse)
    use_proba=True,                       # use words distribution when sampling (if False sample uniform)
    top_n=20,                             # consider the first 20 most likely words
    temperature=1.0,                      # higher temperature implies more randomness when sampling
    stopwords=stopwords,                  # those words will not be sampled
    batch_size_lm=32,                     # language model maximum batch size
)

Some layers from the model checkpoint at distilbert-base-uncased were not used when initializing TFDistilBertForMaskedLM: ['activation_13']
- This IS expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertForMaskedLM were initialized from the model checkpoint at distilbert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForMaskedLM for predictions without further training.
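
The comments on `top_n`, `temperature` and `use_proba` describe a top-n sampling scheme. The sketch below illustrates that idea with plain NumPy. It is not alibi's internal code: `sample_top_n` is a hypothetical function, and the logits are made-up scores for four candidate words.

```python
import numpy as np

def sample_top_n(logits, top_n=20, temperature=1.0, use_proba=True, rng=None):
    """Pick one candidate index among the top_n highest-scoring ones."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    top_n = min(top_n, logits.size)
    # Indices of the top_n largest logits.
    top_idx = np.argsort(logits)[-top_n:]
    if not use_proba:
        # Uniform choice over the retained candidates.
        return int(rng.choice(top_idx))
    # Temperature-scaled softmax over the retained candidates.
    scaled = logits[top_idx] / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))

rng = np.random.default_rng(0)
logits = [0.1, 2.0, 5.0, 1.0]  # made-up scores for four candidate words
picks = [sample_top_n(logits, top_n=2, temperature=1.0, rng=rng) for _ in range(200)]
print(sorted(set(picks)))
```

Only the two highest-scoring indices can ever be drawn here. Raising `temperature` flattens the distribution over them, while lowering it concentrates the draws on the highest-scoring candidate.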


In [28]:
explanation = explainer.explain("a visually exquisite but narratively opaque and emotionally vapid experience of style and mystification", threshold=0.95)

In [29]:
explanation

Explanation(meta={
  'name': 'AnchorText',
  'type': ['blackbox'],
  'explanations': ['local'],
  'params': {
              'seed': 0,
              'filling': 'parallel',
              'sample_proba': 0.5,
              'top_n': 20,
              'temperature': 1.0,
              'use_proba': True,
              'frac_mask_templates': 0.1,
              'batch_size_lm': 32,
              'punctuation': '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~',
              'stopwords': ['made', '’s', 'onto', 'him', 'too', 'sixty', 'my', 'herein', 'had', 'thereby', 'your', 'front', 'to', 'and', 'somewhere', 'thereupon', 'regarding', 'latter', 'along', 'what', 'those', 'between', 'somehow', 'for', 'first', 'mostly', 'various', 'indeed', 'do', 'enough', "n't", 'up', "'m", 'back', 'others', 'whom', 'almost', 'further', '’d', 'it', 'of', 'whither', 'thereafter', 'make', '‘d', 'did', 'twenty', "'ve", 'cannot', 'go', 'her', 'same', 'neither', 'when', 'doing', 'noone', 'well', 'while', 'off', 'everyone', 'perhap
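
The full repr is dominated by the stop word list, so it can be easier to read the key fields directly. This is a minimal sketch, assuming the explanation exposes `anchor`, `precision` and `coverage` attributes as alibi's anchor explanations do. `summarize` is a hypothetical helper, and the stand-in object below carries placeholder values, not results from this notebook.

```python
from types import SimpleNamespace

def summarize(explanation):
    """Format the anchor, precision and coverage of an explanation object."""
    anchor = " AND ".join(explanation.anchor)
    return (
        f"Anchor: {anchor} | "
        f"precision: {explanation.precision:.2f} | "
        f"coverage: {explanation.coverage:.2f}"
    )

# Stand-in object with placeholder values, used only to show the call shape.
demo = SimpleNamespace(anchor=["word_a", "word_b"], precision=0.97, coverage=0.12)
print(summarize(demo))
```

In the notebook, the call would be `print(summarize(explanation))`.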

In [30]:
from alibi.saving import save_explainer
save_explainer(explainer, "./explainer/data")

In [6]:
from alibi.saving import load_explainer
# keep a reference to the restored explainer so it can be reused
explainer = load_explainer(path="./explainer/data", predictor=predict_fn)
explainer

2022-10-20 12:55:49.880324: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
All model checkpoint layers were used when initializing TFDistilBertForMaskedLM.

All the layers of TFDistilBertForMaskedLM were initialized from the model checkpoint at explainer/data/language_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForMaskedLM for predictions without further training.


AnchorText(meta={
  'name': 'AnchorText',
  'type': ['blackbox'],
  'explanations': ['local'],
  'params': {
              'seed': 0,
              'filling': 'parallel',
              'sample_proba': 0.5,
              'top_n': 20,
              'temperature': 1.0,
              'use_proba': True,
              'frac_mask_templates': 0.1,
              'batch_size_lm': 32,
              'punctuation': '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~',
              'stopwords': ['made', '’s', 'onto', 'him', 'too', 'sixty', 'my', 'herein', 'had', 'thereby', 'your', 'front', 'to', 'and', 'somewhere', 'thereupon', 'regarding', 'latter', 'along', 'what', 'those', 'between', 'somehow', 'for', 'first', 'mostly', 'various', 'indeed', 'do', 'enough', "n't", 'up', "'m", 'back', 'others', 'whom', 'almost', 'further', '’d', 'it', 'of', 'whither', 'thereafter', 'make', '‘d', 'did', 'twenty', "'ve", 'cannot', 'go', 'her', 'same', 'neither', 'when', 'doing', 'noone', 'well', 'while', 'off', 'everyone', 'perhaps