# Integrated gradients for transformers models

In this example, we apply the integrated gradients method to two different sentiment analysis models. The first model is a pretrained sentiment analysis model from the  [transformers](https://github.com/huggingface/transformers) library. The second model is a combination of a pretrained BERT model and a simple feed forward network. The feed forward network is trained on the IMDB dataset using the BERT output embeddings as features. 

In text classification models, integrated gradients define an attribution value for each word in the input sentence. The attributions are calculated by integrating the model gradients with respect to the word embedding layer along a straight path from a baseline instance $x^\prime$ to the input instance $x$. A description of the method can be found [here](https://docs.seldon.io/projects/alibi/en/latest/methods/IntegratedGradients.html). Integrated gradients was originally proposed in Sundararajan et al., ["Axiomatic Attribution for Deep Networks"](https://arxiv.org/abs/1703.01365).
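As a minimal sketch of the definition above (not alibi's implementation), the snippet below approximates the path integral with a midpoint Riemann sum for a toy function whose gradient is known analytically. The names `f`, `grad_f` and `integrated_gradients` are illustrative assumptions. The final print illustrates the completeness property from the paper: the attributions sum to $f(x) - f(x^\prime)$.

```python
def f(x):
    """Toy differentiable model: sum of squares."""
    return sum(v * v for v in x)


def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * v for v in x]


def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        for j, g in enumerate(grad_f(point)):
            avg_grad[j] += g / n_steps
    # attribution_j = (x_j - x'_j) * average gradient along the path
    return [di * gi for di, gi in zip(diff, avg_grad)]


x = [1.0, -2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

print(attributions)
print(sum(attributions), f(x) - f(baseline))
```

For this quadratic toy function the gradient is linear along the path, so the midpoint rule is exact up to floating point error.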

In [2]:
import re
import os
import tensorflow as tf
import numpy as np
import torch
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
from transformers import BertTokenizerFast, TFBertModel, BertConfig
from alibi.explainers import IntegratedGradients
from tensorflow.keras.datasets import imdb
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.regularizers import l1, l2, l1_l2
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.initializers import RandomUniform

Here we define some functions needed to process the data. For consistency with other [text examples](https://github.com/SeldonIO/alibi/blob/master/examples/integrated_gradients_imdb.ipynb) in alibi, we will use the IMDB dataset provided by keras. Since the dataset consists of reviews that are already tokenized, we need to decode each sentence and re-convert them into tokens using the BERT tokenizer.

In [3]:
def decode_sentence(x, reverse_index):
    """Decodes the tokenized sentences from keras IMDB dataset into plain text.
    """
    # the `-3` offset is due to the special tokens used by keras
    # see https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset
    return " ".join([reverse_index.get(i - 3, 'UNK') for i in x])

def preprocess_reviews(reviews):
    """Preprocess the text.
    """
    # raw strings avoid invalid escape sequence warnings in the patterns
    REPLACE_NO_SPACE = re.compile(r"[.;:,!'?\"()\[\]]")
    REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")
    
    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]
    
    return reviews

def process_sentences(sentence, 
                      tokenizer, 
                      max_len):
    """Tokenize the text sentences.
    """
    z = tokenizer(sentence, 
                  add_special_tokens = False, 
                  padding = 'max_length', 
                  max_length = max_len, 
                  truncation = True,
                  return_token_type_ids=True, 
                  return_attention_mask = True,  
                  return_tensors = 'np')
    return z
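To make the decoding and cleaning steps concrete, here is a small self-contained sketch. The `toy_reverse_index` dictionary and the example strings are made up for illustration; the real index comes from `imdb.get_word_index()`.

```python
import re

# Toy stand-in for the inverse of `imdb.get_word_index()` (rank -> word).
toy_reverse_index = {1: "the", 2: "movie", 3: "great"}


def decode_sentence(x, reverse_index):
    # Keras shifts every word rank by 3: ids 0, 1 and 2 are reserved for
    # padding, start-of-sequence and out-of-vocabulary tokens.
    return " ".join(reverse_index.get(i - 3, "UNK") for i in x)


encoded = [1, 4, 5, 6]  # start token followed by "the movie great"
decoded = decode_sentence(encoded, toy_reverse_index)
print(decoded)  # UNK the movie great

# Same patterns as in `preprocess_reviews`, written as raw strings.
REPLACE_NO_SPACE = re.compile(r"[.;:,!'?\"()\[\]]")
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")

raw = "Great movie!<br /><br />Well-acted, really."
clean = REPLACE_WITH_SPACE.sub(" ", REPLACE_NO_SPACE.sub("", raw.lower()))
print(clean)  # great movie well acted really
```

Note that reserved ids (padding, start and out-of-vocabulary) all decode to `UNK`, so padded reviews contain runs of `UNK` before the actual text.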

# Automodel

In this section, we will use the tensorflow auto model for sequence classification provided by the [transformers](https://github.com/huggingface/transformers) library. 

The model is pre-trained on the [Stanford Sentiment Treebank (SST)](https://huggingface.co/datasets/sst) dataset. The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.

Each phrase is labelled as either negative, somewhat negative, neutral, somewhat positive or positive. The corpus with all 5 labels is referred to as SST-5 or SST fine-grained. Binary classification experiments on full sentences (negative or somewhat negative vs somewhat positive or positive with neutral sentences discarded) refer to the dataset as SST-2 or SST binary.  In this example, we will use a text classifier pre-trained on the SST-2 dataset.

In [4]:
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
auto_model_bert = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


The automodel output is a custom object containing the output logits. We use a wrapper to transform the output into a tensor and apply a softmax function to the logits.

In [5]:
class AutoModelWrapper(tf.keras.Model):

    def __init__(self, model_bert, **kwargs):
        super().__init__()
        self.model_bert = model_bert

    def call(self, inputs, attention_mask=None):
        out = self.model_bert(inputs, 
                              attention_mask=attention_mask)
        return tf.nn.softmax(out.logits)
    
    def get_config(self):
        return {}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

In [6]:
auto_model = AutoModelWrapper(auto_model_bert)
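The softmax applied by the wrapper turns the two output logits into class probabilities. A minimal pure-Python sketch of that step, with made-up logit values, could look like this:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]  # hypothetical [negative, positive] logits
probs = softmax(logits)
predicted = probs.index(max(probs))

print(probs)
print(predicted)  # 1
```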

# Calculate integrated gradients

In [7]:
max_features = 10000
max_len = 100

Here we consider some simple sentences such as "I love you, I like you" and "I love you, I like you, but I also kind of dislike you".

In [8]:
z_test_sample = ['I love you, I like you',
                'I love you, I like you, but I also kind of dislike you']
z_test_sample = preprocess_reviews(z_test_sample)
z_test_sample = process_sentences(z_test_sample, 
                                   tokenizer, 
                                   max_len)
x_test_sample = z_test_sample['input_ids']
kwargs = {k:v for k,v in z_test_sample.items() if k == 'attention_mask'}

The auto model consists of a main DistilBERT layer (layer 0) followed by two dense layers and a dropout layer. 
We compute the attributions with respect to one of the transformer blocks in the main layer.

In [9]:
auto_model.layers[0].layers

[<transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertMainLayer at 0x7f1b51de3f50>,
 <tensorflow.python.keras.layers.core.Dense at 0x7f1b5003f350>,
 <tensorflow.python.keras.layers.core.Dense at 0x7f1b5003f710>,
 <tensorflow.python.keras.layers.core.Dropout at 0x7f1b5003f950>]

In [10]:
# Extracting a transformer block (index 1, i.e. the second block of the main layer)
bl = auto_model.layers[0].layers[0].transformer.layer[1]

In [11]:
n_steps = 5
method = "gausslegendre"
internal_batch_size = 5
ig  = IntegratedGradients(auto_model,
                          layer=bl,
                          n_steps=n_steps, 
                          method=method,
                          internal_batch_size=internal_batch_size)
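The `gausslegendre` option refers to Gauss-Legendre quadrature for approximating the path integral. The sketch below shows how five quadrature nodes can be placed on the interval $[0, 1]$ using NumPy. The rescaling from $[-1, 1]$ is the standard change of variables and is an assumption about how such a rule is typically applied, not a copy of alibi's internals.

```python
import numpy as np

n_steps = 5

# Nodes and weights of the Gauss-Legendre rule on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(n_steps)

# Change of variables to the integration path alpha in [0, 1].
alphas = 0.5 * (nodes + 1.0)
step_sizes = 0.5 * weights  # rescaled weights sum to 1

# Integrate g(alpha) = alpha**4 over [0, 1]; the exact value is 1/5.
approx = float(np.sum(step_sizes * alphas ** 4))

print(alphas)
print(approx)
```

An n-point Gauss-Legendre rule integrates polynomials up to degree 2n - 1 exactly, which is why a small `n_steps` can already give a reasonable approximation for smooth integrands.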

In [12]:
predictions = auto_model(x_test_sample, **kwargs).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample, 
                         forward_kwargs=kwargs,
                         baselines=None, 
                         target=predictions)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


In [13]:
# Get attributions values from the explanation object
attrs = explanation.attributions[0]
print('Attributions shape:', attrs.shape)

Attributions shape: (2, 100, 768)


In [14]:
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)

Attributions shape: (2, 100)
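Summing over the last axis collapses the attribution of each embedding dimension into a single score per token. A toy illustration with random values (the array sizes here are arbitrary, not those of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# (batch, tokens, embedding_dim) attributions for a toy example.
toy_attrs = rng.normal(size=(2, 4, 3))

# One scalar attribution per token: sum over the embedding dimension.
token_attrs = toy_attrs.sum(axis=2)

print(token_attrs.shape)  # (2, 4)
```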


In [15]:
i = 1
x_i = x_test_sample[i]
attrs_i = attrs[i]
pred = predictions[i]
pred_dict = {1: 'Positive review', 0: 'Negative review'}

In [16]:
from IPython.display import HTML
def  hlstr(string, color='white'):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f"<mark style=background-color:{color}>{string} </mark>"

In [17]:
def colorize(attrs, cmap='PiYG'):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = mpl.cm.get_cmap(cmap)
    
    # now compute hex values of colors
    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors

In [18]:
words = tokenizer.decode(x_i).split()
colors = colorize(attrs_i)

In [19]:
print('Predicted label =  {}: {}'.format(pred, pred_dict[pred]))

Predicted label =  1: Positive review


In [20]:
HTML("".join(list(map(hlstr, words, colors))))
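The symmetric normalization used in `colorize` maps attributions to positions on the colormap so that zero always falls in the middle of the diverging scale. The sketch below reproduces that mapping in plain Python with made-up attribution values; `symmetric_normalize` is a hypothetical helper name.

```python
def symmetric_normalize(attrs):
    """Map attributions to [0, 1] with zero attribution at 0.5."""
    bound = max(abs(a) for a in attrs)
    # same mapping as Normalize(vmin=-bound, vmax=bound)
    return [(a + bound) / (2.0 * bound) for a in attrs]


toy_attrs = [-0.5, 0.0, 0.25, 1.0]
positions = symmetric_normalize(toy_attrs)

print(positions)  # [0.25, 0.5, 0.625, 1.0]
```

With the `PiYG` colormap, positions below 0.5 fall on the pink side and positions above 0.5 on the green side.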

# Sentiment analysis on IMDB with fine-tuned model head.

We consider a text classifier fine-tuned on the IMDB dataset. We train a feed forward network which uses the pooled output embedding of a pretrained BERT model as input features. The BERT model and the trained feed forward network are combined to obtain an end-to-end text classifier.

In [469]:
def get_embeddings(X_train, model, batch_size=50):
    """Extract the pooled BERT output embeddings in batches.
    """
    args = X_train['input_ids']
    kwargs = {k:v for k, v in  X_train.items() if k != 'input_ids'}
    
    dataset = tf.data.Dataset.from_tensor_slices((args, kwargs)).batch(batch_size)
    dataset = dataset.as_numpy_iterator()
    
    embeddings = []
    for X_batch in dataset:
        args_b, kwargs_b = X_batch
        batch_embeddings = model(args_b, **kwargs_b)
        embeddings.append(batch_embeddings.pooler_output.numpy())

    return np.concatenate(embeddings, axis=0)

## Load and process data

Loading the IMDB dataset. 

In [470]:
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
test_labels = y_test.copy()
train_labels = y_train.copy()
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

index = imdb.get_word_index()
reverse_index = {value: key for (key, value) in index.items()} 

Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 100)
x_test shape: (25000, 100)
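With its default arguments, `pad_sequences` pads and truncates at the beginning of each sequence. The helper below is a pure-Python imitation of that default behaviour for a single sequence (`pad_pre` is a hypothetical name, not part of keras):

```python
def pad_pre(seq, maxlen, value=0):
    """Imitate pre-padding and pre-truncation for one sequence."""
    seq = list(seq)[-maxlen:]  # keep only the last `maxlen` ids
    return [value] * (maxlen - len(seq)) + seq


short = pad_pre([5, 6, 7], 5)
long = pad_pre([1, 2, 3, 4, 5, 6], 4)

print(short)  # [0, 0, 5, 6, 7]
print(long)   # [3, 4, 5, 6]
```

As a consequence, reviews longer than `max_len` keep only their last `max_len` tokens.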


## Extract embeddings for training

In order to speed up the training, the BERT embeddings are pre-extracted and used as features by the feed forward network.

In [471]:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
config = BertConfig.from_pretrained("bert-base-uncased")
modelBert = TFBertModel.from_pretrained("bert-base-uncased", config=config)

modelBert.trainable=False

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [472]:
batch_size_emb = 50

Decoding each sentence in the keras IMDB tokenized dataset to obtain the corresponding plain text.

In [473]:
X_train, X_test = [], []
for i in range(len(x_train)):
    tr_sentence = decode_sentence(x_train[i], reverse_index)
    X_train.append(tr_sentence)
    te_sentence = decode_sentence(x_test[i], reverse_index)
    X_test.append(te_sentence)

Re-tokenizing the plain text using the BERT tokenizer.

In [474]:
X_train = preprocess_reviews(X_train)
X_train = process_sentences(X_train, tokenizer, max_len)
X_test = preprocess_reviews(X_test)
X_test = process_sentences(X_test, tokenizer, max_len)

Extracting the BERT embeddings.

In [475]:
train_embbedings = get_embeddings(X_train, 
                                  modelBert, 
                                  batch_size=batch_size_emb)
test_embbedings = get_embeddings(X_test,
                                 modelBert, 
                                 batch_size=batch_size_emb)
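The batching pattern used for the extraction can be sketched without BERT. In the snippet below, `toy_encoder` is a made-up stand-in for a frozen encoder and `extract_features` is a hypothetical helper; the point is only that features are computed once, batch by batch, and then concatenated.

```python
import numpy as np


def toy_encoder(batch):
    """Hypothetical frozen encoder: two summary features per row."""
    return np.stack([batch.mean(axis=1), batch.max(axis=1)], axis=1)


def extract_features(inputs, encoder, batch_size=4):
    """Run the encoder batch by batch and concatenate the results."""
    chunks = []
    for start in range(0, len(inputs), batch_size):
        chunks.append(encoder(inputs[start:start + batch_size]))
    return np.concatenate(chunks, axis=0)


token_ids = np.arange(30).reshape(10, 3)
features = extract_features(token_ids, toy_encoder, batch_size=4)

print(features.shape)  # (10, 2)
```

Because the encoder is frozen, the features do not change during training, so they only need to be computed once.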

## Train model

Here we train the model head using the BERT pooled output embeddings as features. The pooled output embeddings are vectors of dimension 768 and each of them represents a full review. The model head consists of one dense layer with 768 hidden units, followed by a dropout layer and a two-unit output layer with softmax activation. 

In [477]:
dropout = 0.1
hidden_dims = 768

In [478]:
class ModelOut(tf.keras.Model):

    def __init__(self, 
                 dropout=0.2, 
                 hidden_dims=256):
        super().__init__()
        
        self.dropout = dropout
        self.hidden_dims = hidden_dims
        
        self.dense_1 =  tf.keras.layers.Dense(hidden_dims, 
                                              activation='relu')
        self.dropoutl = tf.keras.layers.Dropout(dropout)
        self.dense_2 = tf.keras.layers.Dense(2, 
                                             activation='softmax')

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.dropoutl(x)
        x = self.dense_2(x)
        return x
    
    def get_config(self):
        return {"dropout": self.dropout, 
                "hidden_dims": self.hidden_dims}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

In [479]:
model_out = ModelOut(dropout=dropout,
                     hidden_dims=hidden_dims)
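To illustrate the shapes involved in the model head, the following NumPy sketch runs the same sequence of operations (dense with ReLU, then dense with softmax) on fake pooled embeddings. The weights are random placeholders rather than trained parameters, and `head_forward` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, hidden_dims, n_classes = 768, 768, 2

# Randomly initialised weights, standing in for trained parameters.
W1 = rng.normal(scale=0.02, size=(emb_dim, hidden_dims))
b1 = np.zeros(hidden_dims)
W2 = rng.normal(scale=0.02, size=(hidden_dims, n_classes))
b2 = np.zeros(n_classes)


def head_forward(x):
    """Inference-time forward pass (dropout is inactive at inference)."""
    h = np.maximum(x @ W1 + b1, 0.0)  # dense layer with ReLU
    logits = h @ W2 + b2              # two-unit output layer
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # softmax


pooled = rng.normal(size=(4, emb_dim))  # four fake pooled embeddings
probs = head_forward(pooled)

print(probs.shape)  # (4, 2)
```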

Training the model. If the model has already been trained, it can be loaded from the checkpoint directory by setting `load_model=True`.

In [481]:
load_model = False
batch_size = 128
epochs = 30

In [482]:
filepath = './model_transformers/'  # change to desired save directory

model_out.compile(optimizer=Adam(1e-4), 
                  loss='categorical_crossentropy', 
                  metrics=['accuracy'])

if not load_model:
    
    checkpoint_path = os.path.join(filepath, "training/cp-{epoch:04d}.ckpt")
    checkpoint_dir = os.path.dirname(checkpoint_path)

    # Create a callback that saves the model's weights every epoch
    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path, 
        verbose=1, 
        save_weights_only=True,
        save_freq='epoch')

    model_out.fit(train_embbedings, y_train, 
                  validation_data=(test_embbedings, y_test),
                  epochs=epochs, 
                  batch_size=batch_size,
                  callbacks=[cp_callback],
                  verbose=1)
else:
    epoch = 3
    load_path = os.path.join(filepath, f"training/cp-{epoch:04d}.ckpt")
    model_out.load_weights(load_path)

Epoch 1/30
Epoch 00001: saving model to ./model_transformers/training/cp-0001.ckpt
Epoch 2/30
Epoch 00002: saving model to ./model_transformers/training/cp-0002.ckpt
Epoch 3/30
Epoch 00003: saving model to ./model_transformers/training/cp-0003.ckpt
Epoch 4/30
Epoch 00004: saving model to ./model_transformers/training/cp-0004.ckpt
Epoch 5/30
Epoch 00005: saving model to ./model_transformers/training/cp-0005.ckpt
Epoch 6/30
Epoch 00006: saving model to ./model_transformers/training/cp-0006.ckpt
Epoch 7/30
Epoch 00007: saving model to ./model_transformers/training/cp-0007.ckpt
Epoch 8/30
Epoch 00008: saving model to ./model_transformers/training/cp-0008.ckpt
Epoch 9/30
Epoch 00009: saving model to ./model_transformers/training/cp-0009.ckpt
Epoch 10/30
Epoch 00010: saving model to ./model_transformers/training/cp-0010.ckpt
Epoch 11/30
Epoch 00011: saving model to ./model_transformers/training/cp-0011.ckpt
Epoch 12/30
Epoch 00012: saving model to ./model_transformers/training/cp-0012.ckpt
...

Epoch 29/30
Epoch 00029: saving model to ./model_transformers/training/cp-0029.ckpt
Epoch 30/30
Epoch 00030: saving model to ./model_transformers/training/cp-0030.ckpt
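The checkpoint file names in the log come from the `{epoch:04d}` placeholder in the checkpoint path, which is filled in with the epoch number using standard Python string formatting. A quick way to see which file a given epoch maps to:

```python
import os

filepath = "./model_transformers/"
template = os.path.join(filepath, "training/cp-{epoch:04d}.ckpt")

# The epoch number is zero-padded to four digits.
path_epoch_3 = template.format(epoch=3)
print(path_epoch_3)
```

The same formatting is used in the loading branch, where `epoch` selects which saved weights to restore.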


## Combine BERT and feed forward network

Here we combine the BERT model with the model head to obtain an end-to-end text classifier. 

In [483]:
class TextClassifier(tf.keras.Model):

    def __init__(self, model_bert, model_out, **kwargs):
        super().__init__()
        self.model_bert = model_bert
        self.model_out = model_out

    def call(self, inputs, attention_mask=None):
        out = self.model_bert(inputs, attention_mask=attention_mask)
        out = self.model_out(out.pooler_output)
        return out
    
    def get_config(self):
        return {}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

In [484]:
text_classifier = TextClassifier(modelBert, model_out)

# Calculate integrated gradients

We pick the first 10 sentences of the test set as examples.

In [485]:
z_test_sample = [decode_sentence(x_test[i], reverse_index) for i in range(10)]
z_test_sample = preprocess_reviews(z_test_sample)
z_test_sample = process_sentences(z_test_sample, tokenizer, max_len)

x_test_sample = z_test_sample['input_ids']
kwargs = {k:v for k,v in z_test_sample.items() if k == 'attention_mask'}

We calculate the attributions with respect to the first transformer block (layer 0) of the BERT encoder.

In [486]:
bl = text_classifier.layers[0].bert.encoder.layer[0]

In [487]:
n_steps = 5
method = "gausslegendre"
internal_batch_size = 5
ig  = IntegratedGradients(text_classifier,
                          layer=bl,
                          n_steps=n_steps, 
                          method=method,
                          internal_batch_size=internal_batch_size)

In [488]:
predictions = text_classifier(x_test_sample, **kwargs).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample, 
                         forward_kwargs=kwargs,
                         baselines=None, 
                         target=predictions)



In [489]:
# Get attributions values from the explanation object
attrs = explanation.attributions[0]
print('Attributions shape:', attrs.shape)

Attributions shape: (10, 100, 768)


In [490]:
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)

Attributions shape: (10, 100)


In [491]:
i = 1
x_i = x_test_sample[i]
attrs_i = attrs[i]
pred = predictions[i]
pred_dict = {1: 'Positive review', 0: 'Negative review'}

In [492]:
from IPython.display import HTML
def  hlstr(string, color='white'):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f"<mark style=background-color:{color}>{string} </mark>"

In [493]:
def colorize(attrs, cmap='PiYG'):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = mpl.cm.get_cmap(cmap)
    
    # now compute hex values of colors
    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors

In [494]:
words = tokenizer.decode(x_i).split()
colors = colorize(attrs_i)

In [495]:
print('Predicted label =  {}: {}'.format(pred, pred_dict[pred]))

Predicted label =  1: Positive review


In [496]:
HTML("".join(list(map(hlstr, words, colors))))