In [1]:
%set_env CUDA_VISIBLE_DEVICES=2

env: CUDA_VISIBLE_DEVICES=2


# Integrated gradients for text classification on the IMDB dataset using transformers

Dependencies: tensorflow_datasets, transformers

In this example, we apply the integrated gradients method to a transformer model fine-tuned for sentiment analysis on the IMDB dataset. In text classification models, integrated gradients define an attribution value for each word in the input sentence. The attributions are calculated by integrating the gradients of the model output with respect to the input word embedding layer of the transformer along a straight path from a baseline instance $x^\prime$ to the input instance $x$. A description of the method can be found [here](https://docs.seldon.io/projects/alibi/en/latest/methods/IntegratedGradients.html). Integrated gradients was originally proposed in Sundararajan et al., ["Axiomatic Attribution for Deep Networks"](https://arxiv.org/abs/1703.01365).
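
To make the path integral concrete, here is a minimal NumPy sketch on a toy differentiable function rather than the transformer used in this notebook. The names `f`, `grad_f` and `integrated_gradients` are hypothetical helpers introduced only for illustration, and the integral is approximated with a plain midpoint Riemann sum instead of the quadrature rule used later.

```python
import numpy as np

def f(x):
    # Toy scalar "model" (an assumption for illustration): f(x) = sum(x_i ** 2)
    return np.sum(x ** 2)

def grad_f(x):
    # Analytic gradient of the toy model
    return 2.0 * x

def integrated_gradients(x, baseline, n_steps=50):
    # Midpoint Riemann approximation of the path integral along the
    # straight line from `baseline` to `x`
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    path = baseline + alphas[:, None] * (x - baseline)   # (n_steps, d)
    grads = np.stack([grad_f(p) for p in path])          # (n_steps, d)
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline)

# Completeness: the attributions sum to f(x) - f(baseline)
print(attributions, attributions.sum(), f(x) - f(baseline))
```

For this toy function the attributions sum to the difference between the model output at the input and at the baseline, which is the completeness property that the `deltas` field of an explanation measures deviations from.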

The IMDB dataset contains 50K movie reviews labelled as positive or negative.
We train a convolutional neural network classifier with a single 1-d convolutional layer followed by a fully connected layer on top of a transformer model. In other words, the output embeddings of the transformer are used as features by the convolutional network. The reviews in the dataset are truncated at 100 tokens and each token is represented by a 768-dimensional embedding vector. We calculate attributions for the elements of the input embedding layer of the transformer.

In [2]:
import tensorflow as tf
import numpy as np
import os
import pandas as pd
import re
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, Conv1D, GlobalMaxPooling1D, Dropout 
from tensorflow.keras.utils import to_categorical
import tensorflow_datasets as tfds
from transformers import BertTokenizerFast, TFBertModel, BertConfig
from alibi.explainers import IntegratedGradients
import matplotlib.pyplot as plt
print('TF version: ', tf.__version__)
print('Eager execution enabled: ', tf.executing_eagerly()) # True

TF version:  2.3.1
Eager execution enabled:  True


In [3]:
def preprocess_reviews(reviews):
    
    # raw strings avoid invalid escape sequence warnings for the regex patterns
    REPLACE_NO_SPACE = re.compile(r"[.;:,!'?\"()\[\]]")
    REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")
    
    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]
    
    return reviews

def process_sentences(sentence1, tokenizer, max_len):
    
    z = tokenizer(sentence1, 
                  add_special_tokens = True, 
                  padding = 'max_length', 
                  max_length = max_len, truncation = True,
                  return_token_type_ids=True, 
                  return_attention_mask = True,  
                  return_tensors = 'np')
    
    return [z['input_ids'], z['attention_mask']]

def get_tokens_labels(train_test,
                      tokenizer,
                      dataset='imdb_reviews/plain_text',
                      shuffle_files=True,
                      max_len=100):
    
    ds = tfds.load(dataset, 
                     split=train_test, 
                     shuffle_files=shuffle_files)
    df = tfds.as_dataframe(ds)
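    # tfds returns the review text as bytes; decode it instead of slicing the
    # "b'...'" string representation
    df['text'] = df['text'].apply(
        lambda x: x.decode('utf-8') if isinstance(x, bytes) else str(x))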
    X = df['text'].tolist()
    X = preprocess_reviews(X)
    X = process_sentences(X, tokenizer, max_len)
    y = to_categorical(df['label'].values)
    
    return X, y

## Load data

Loading the IMDB dataset and transforming the plain text into token representations using the BERT tokenizer.
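
As a rough illustration of what `padding='max_length'` together with `truncation=True` produces, the sketch below pads or truncates a list of token ids and builds the matching attention mask. `pad_and_mask` is a hypothetical helper, not part of the tokenizer API, and the word-piece ids in the example are arbitrary placeholders; only the special-token ids (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102) are those of the `bert-base-uncased` vocabulary.

```python
import numpy as np

# Special-token ids of the bert-base-uncased vocabulary
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102

def pad_and_mask(token_ids, max_len):
    # Hypothetical helper mimicking padding='max_length' with truncation=True
    body = list(token_ids)[: max_len - 2]     # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                     # 1 marks real tokens
    n_pad = max_len - len(ids)
    return np.array(ids + [PAD_ID] * n_pad), np.array(mask + [0] * n_pad)

# Placeholder word-piece ids standing in for a short review
input_ids, attention_mask = pad_and_mask([2023, 3185, 2001, 2307], max_len=8)
print(input_ids)
print(attention_mask)
```

The attention mask is 0 exactly where the input ids hold the padding token, so the transformer ignores those positions.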

In [4]:
max_features = 10000
max_len = 100

In [5]:
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased", 
                                              do_lower_case=True)

X_train, y_train = get_tokens_labels('train',
                                     tokenizer, 
                                     max_len=max_len)
X_test, y_test = get_tokens_labels('test', 
                                   tokenizer, 
                                   max_len=max_len)

INFO:absl:Load dataset info from /home/gio/tensorflow_datasets/imdb_reviews/plain_text/1.0.0
INFO:absl:Reusing dataset imdb_reviews (/home/gio/tensorflow_datasets/imdb_reviews/plain_text/1.0.0)
INFO:absl:Constructing tf.data.Dataset for split train, from /home/gio/tensorflow_datasets/imdb_reviews/plain_text/1.0.0
INFO:absl:Load dataset info from /home/gio/tensorflow_datasets/imdb_reviews/plain_text/1.0.0
INFO:absl:Reusing dataset imdb_reviews (/home/gio/tensorflow_datasets/imdb_reviews/plain_text/1.0.0)
INFO:absl:Constructing tf.data.Dataset for split test, from /home/gio/tensorflow_datasets/imdb_reviews/plain_text/1.0.0


## Extract embeddings

Extracting the output embeddings of the transformer model. These embeddings will be used as input features to train the convolutional network sentiment classifier.

In [46]:
def train_generator():
    for s1, s2, l in zip(X_train[0], X_train[1], y_train):
        yield {'input_ids': s1, 'attention_mask': s2}, l
        
def test_generator():
    for s1, s2, l in zip(X_test[0], X_test[1], y_test):
        yield {'input_ids': s1, 'attention_mask': s2}, l

def get_embeddings(generator, batch_size=50):
    dataset = tf.data.Dataset.from_generator(generator,
                                             output_types=({'input_ids': tf.int64,
                                                              'attention_mask': tf.int64}, 
                                                             tf.int64))
    dataset = dataset.batch(batch_size)
    embeddings = []

    for X_batch in dataset:
        # X_batch is a (features, labels) tuple; only the features are fed to BERT
        batch_embeddings = modelBert(X_batch[0])
        embeddings.append(batch_embeddings.last_hidden_state.numpy())
    
    return np.concatenate(embeddings, axis=0)

In [47]:
config = BertConfig.from_pretrained("bert-base-uncased", 
                                    output_hidden_states=True)
modelBert = TFBertModel.from_pretrained('bert-base-uncased', 
                                        config=config)
modelBert.trainable=False

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [48]:
train_embbedings = get_embeddings(train_generator)
test_embbedings = get_embeddings(test_generator)

## Train Model

The `ModelOut` subclassed model includes one convolutional layer and is trained on the output embeddings of the BERT model. After training, the BERT model and the convolutional model are combined into an end-to-end functional Keras model. The BERT model is frozen and is used merely as a feature extractor.

If `save_model = True`, a local folder `./model_imdb` will be created and the trained model will be saved in that folder. If the model was previously saved, it can be loaded by setting `load_model = True`.

In [11]:
load_model = False
save_model = True

In [12]:
nb_filters=32
dropout_1=0.5
dropout_2=0.5 
hidden_dims=32
batch_size = 128
epochs = 20

In [13]:
class ModelOut(tf.keras.Model):

    def __init__(self, 
                 nb_filters=32,
                 dropout_1=0.2,
                 dropout_2=0.2, 
                 hidden_dims=32):
        super(ModelOut, self).__init__()
        
        self.nb_filters = nb_filters
        self.dropout_1 = dropout_1
        self.dropout_2 = dropout_2
        self.hidden_dims = hidden_dims
        
        self.conv = tf.keras.layers.Conv1D(nb_filters, 
                                           kernel_size=4, 
                                           padding="valid")
        self.maxpool = tf.keras.layers.GlobalMaxPool1D()
        self.dropoutl_1 = tf.keras.layers.Dropout(dropout_1)
        self.flat = tf.keras.layers.Flatten()
        self.dense_1 =  tf.keras.layers.Dense(hidden_dims, 
                                              activation='relu', 
                                              kernel_initializer='normal')
        self.dropoutl_2 = tf.keras.layers.Dropout(dropout_2)
        self.dense_2 = tf.keras.layers.Dense(2, 
                                             activation='softmax', 
                                             kernel_initializer='normal')

    def call(self, inputs):
        x = self.conv(inputs)
        x = self.maxpool(x)
        x = self.dropoutl_1(x)
        x = self.flat(x)
        x = self.dense_1(x)
        x = self.dropoutl_2(x)
        x = self.dense_2(x)
        return x
    
    def get_config(self):
        return {"nb_filters": self.nb_filters,
                "dropout_1": self.dropout_1,
                "dropout_2": self.dropout_2, 
                "hidden_dims": self.hidden_dims}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

Training the convolutional head on the extracted embeddings

In [14]:
filepath = './model_imdb/'  # directory the trained model is saved to and loaded from

if load_model:
    model_out = tf.keras.models.load_model(
        filepath, custom_objects={"ModelOut": ModelOut})
else:
    model_out = ModelOut(nb_filters=nb_filters,
                         dropout_1=dropout_1,
                         dropout_2=dropout_2, 
                         hidden_dims=hidden_dims)
    
    model_out.compile(optimizer='adam', 
                      loss='categorical_crossentropy', 
                      metrics=['accuracy'])
    
    model_out.fit(train_embbedings, y_train, 
                  validation_data=(test_embbedings, y_test),
                  epochs=epochs, 
                  batch_size=batch_size, 
                  verbose=1)
    if save_model:
        model_out.save(filepath)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.


INFO:tensorflow:Assets written to: /home/gio/intgrads_transformers/model/assets


Combining the BERT model with the convolutional head

In [15]:
input_ids_in = tf.keras.layers.Input(shape=(max_len,), 
                                     name='input_ids', 
                                     dtype=tf.int32)
attention_masks_in = tf.keras.layers.Input(shape=(max_len,), 
                                           name='attention_mask', 
                                           dtype=tf.int32)
X = modelBert([input_ids_in, attention_masks_in])[0]
X = model_out(X)
frozenModelOut = tf.keras.Model(inputs=[input_ids_in, 
                                        attention_masks_in], 
                                outputs=X)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


In [16]:
frozenModelOut.summary()

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_ids (InputLayer)          [(None, 100)]        0                                            
__________________________________________________________________________________________________
attention_mask (InputLayer)     [(None, 100)]        0                                            
__________________________________________________________________________________________________
tf_bert_model (TFBertModel)     TFBaseModelOutputWit 109482240   input_ids[0][0]                  
                                                                 attention_mask[0][0]             
__________________________________________________________________________________________________
model_out (ModelOut)            (None, 2)            99458       tf_bert_model[0][13]  
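
The parameter count reported for `model_out` can be checked by hand from the hyperparameters used above. The sketch below is plain arithmetic, assuming the standard weight-plus-bias layout of `Conv1D` and `Dense` layers; the variable names are introduced only for this check.

```python
# Shapes and parameter counts of the convolutional head, computed by hand
max_len, emb_dim = 100, 768          # sequence length and BERT hidden size
nb_filters, kernel_size = 32, 4
hidden_dims, n_classes = 32, 2

conv_out_len = max_len - kernel_size + 1                        # 'valid' padding
conv_params = kernel_size * emb_dim * nb_filters + nb_filters   # weights + biases
dense_1_params = nb_filters * hidden_dims + hidden_dims
dense_2_params = hidden_dims * n_classes + n_classes
total = conv_params + dense_1_params + dense_2_params

print(conv_out_len, conv_params, dense_1_params, dense_2_params, total)
```

The total agrees with the 99458 parameters listed in the summary; pooling, dropout and flatten layers contribute no parameters.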

## Calculate integrated gradients

The integrated gradients attributions are calculated with respect to the input embedding layer of the BERT model for 10 samples from the test set. Since BERT uses embeddings with a dimensionality of 768 and we have chosen a sequence length of 100 tokens, the dimensionality of the attributions is (10, 100, 768). In order to obtain a single attribution value for each token, we sum the attribution values over the 768 elements of each token's vector representation.
 
The default baseline is used in this example; it is defined internally as a sequence of zeros. In this case, this corresponds to a sequence of padding tokens (**NB:** in general, the numerical value corresponding to a "non-informative" baseline such as the PAD token depends on the tokenizer used; make sure that the numerical value of the baseline corresponds to your desired token to avoid surprises). The path integral is defined along a straight line from the baseline to the input instance. The path is approximated by choosing 50 discrete steps according to the Gauss-Legendre method.
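
For intuition about the Gauss-Legendre discretisation, the following sketch builds 50 quadrature nodes and weights with NumPy and rescales them from $[-1, 1]$ to the interpolation interval $[0, 1]$. It is an illustration under stated assumptions, not the library's internal code: the integrand `g` is a toy stand-in for a gradient evaluated along the path.

```python
import numpy as np

n_steps = 50
# Nodes and weights on [-1, 1], rescaled to the interpolation interval [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(n_steps)
alphas = 0.5 * (nodes + 1.0)
weights = 0.5 * weights

def g(alpha):
    # Toy integrand (an assumption for illustration)
    return 3.0 * alpha ** 2

# The exact integral of 3 * alpha^2 over [0, 1] is 1
approx = np.sum(weights * g(alphas))
print(alphas.min(), alphas.max(), weights.sum(), approx)
```

Each `alpha` selects one interpolation point between the baseline and the input, and the weighted sum of the gradients at those points approximates the integral.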

In [18]:
# extracting the input embeddings layer from the bert model. 
bl = frozenModelOut.layers[2].get_input_embeddings()

In [20]:
n_steps = 50
method = "gausslegendre"
internal_batch_size = 5
nb_samples = 10
ig  = IntegratedGradients(frozenModelOut,
                          layer=bl,
                          n_steps=n_steps, 
                          method=method,
                          internal_batch_size=internal_batch_size)

In [21]:
x_test_sample = [X_test[0][:nb_samples], X_test[1][:nb_samples]]
predictions = frozenModelOut(x_test_sample).numpy().argmax(axis=1)
explanation = ig.explain(x_test_sample, 
                         baselines=None, 
                         target=predictions)

In [22]:
# Metadata from the explanation object
explanation.meta

{'name': 'IntegratedGradients',
 'type': ['whitebox'],
 'explanations': ['local'],
 'params': {'method': 'gausslegendre',
  'n_steps': 50,
  'internal_batch_size': 5,
  'layer': None}}

In [23]:
# Data fields from the explanation object
explanation.data.keys()

dict_keys(['attributions', 'X', 'baselines', 'predictions', 'deltas', 'target'])

In [24]:
# Get attributions values from the explanation object
attrs = explanation.attributions[0]
print('Attributions shape:', attrs.shape)

Attributions shape: (10, 100, 768)


## Sum attributions

In [25]:
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)

Attributions shape: (10, 100)
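
The same reduction can be tried on a small random array. In this sketch `toy_attrs` and `toy_tokens` are made-up placeholders, not values from the model; the tokens of one instance are then ranked by absolute attribution as a text alternative to a colour-coded view.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical attributions: 2 instances, 4 tokens, 3 embedding dimensions
toy_attrs = rng.normal(size=(2, 4, 3))
toy_tokens = ['[CLS]', 'great', 'movie', '[SEP]']

# Sum over the embedding axis to get one value per token
token_attrs = toy_attrs.sum(axis=2)              # shape (2, 4)

# Rank the tokens of the first instance, most influential first
order = np.argsort(-np.abs(token_attrs[0]))
for j in order:
    print(f'{toy_tokens[j]:>6s} {token_attrs[0, j]:+.3f}')
```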


## Visualize attributions

In [26]:
i = 0
x_i = x_test_sample[0][i]
attrs_i = attrs[i]
pred = predictions[i]
pred_dict = {1: 'Positive review', 0: 'Negative review'}

In [27]:
print('Predicted label =  {}: {}'.format(pred, pred_dict[pred]))

Predicted label =  1: Positive review


We can visualize the attributions for the text instance by mapping the values of the attributions onto a matplotlib colormap. Below we define some utility functions for doing this.

In [28]:
from IPython.display import HTML
def hlstr(string, color='white'):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f"<mark style=background-color:{color}>{string} </mark>"

In [29]:
def colorize(attrs, cmap='PiYG'):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl
    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    cmap = plt.get_cmap(cmap)  # mpl.cm.get_cmap is deprecated in recent matplotlib releases
    
    # now compute hex values of colors
    colors = list(map(lambda x: mpl.colors.rgb2hex(cmap(norm(x))), attrs))
    return colors
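
The symmetric normalization in `colorize` maps an attribution of zero to the midpoint of the diverging colormap. The NumPy-only sketch below reproduces that scaling; `symmetric_normalize` is a hypothetical helper, and it assumes at least one attribution is non-zero, since otherwise the bound is zero and the division is undefined.

```python
import numpy as np

def symmetric_normalize(values):
    # Hypothetical helper: same scaling as Normalize(vmin=-b, vmax=b)
    # with b = max |value|, mapping [-b, b] linearly onto [0, 1]
    values = np.asarray(values, dtype=float)
    bound = np.abs(values).max()
    return (values + bound) / (2.0 * bound)

scaled = symmetric_normalize([-0.5, 0.0, 0.25, 1.0])
print(scaled)
```

Values below 0.5 fall on the pink half of `PiYG` and values above 0.5 on the green half, so the sign of an attribution is always readable from the hue.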

Below we highlight each word in the text according to its attribution value. Words with positive attribution are highlighted in shades of green and words with negative attribution in shades of pink. Stronger shading corresponds to larger absolute attribution values. Positive attributions can be interpreted as an increase in the probability of the predicted class ("Positive sentiment"), while negative attributions correspond to a decrease in the probability of the predicted class.

In [30]:
# one string per token id, so that words and attributions stay aligned
words = tokenizer.convert_ids_to_tokens(x_i.tolist())
colors = colorize(attrs_i)

In [31]:
HTML("".join(list(map(hlstr, words, colors))))