# Calculating `Integrated Gradients` in Keras Language Models

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**`Integrated Gradients` is a method for making a classification model interpretable, proposed in Sundararajan et al., [“Axiomatic Attribution for Deep Networks”](https://arxiv.org/abs/1703.01365). The method uses gradients to determine what _influence_ the individual inputs (_like words in a sentence_) have on the output of a model.**

**This is similar to the concept of saliency maps, something we covered in [this](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/c603bb323ace7876d48abf8234d3f181c4b5d744/ML%20Explainability/CV%20Interpreter/CNN_attribution_maps_with_LIME.ipynb) (Grad-Cam) and [this](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/87b1e85fa62074af1459e863cac986dc973b4666/ML%20Explainability/CV%20Interpreter%20&%20Adversarial/lime_for_CV.ipynb) (LIME) notebook. `Integrated Gradients` is the same method we used to interpret the generated images of the `stable-diffusion` model in [this](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/cd18a9269ab705544f935dcf0a492f3d9222431e/ML%20Explainability/CV%20Interpreter/diffusion_interpreter.ipynb) notebook.**

**We will be using the `alibi` library for this. [Alibi](https://docs.seldon.io/projects/alibi) is _"an open source Python library aimed at machine learning model inspection and interpretation."_**

**In this notebook, we apply the integrated gradients method to a sentiment analysis model trained on the dataset we used in the `adversarial_text_attack.ipynb` notebook, found [here](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/cd18a9269ab705544f935dcf0a492f3d9222431e/ML%20Adversarial/adversarial_text_attack.ipynb).**

**The pre-trained model and tokenizer can be found in the `models` folder. The model comes in two versions: one with a sigmoid output and the other with a softmax output. Here we show how to use `IntegratedGradients` on both types of models.**

In [1]:
import json
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from keras_preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer, tokenizer_from_json

# Forward slashes work on every platform and avoid invalid escape sequences such as '\s'
model_path = 'models/senti_model_sigmoid.h5'
#model_path = 'models/senti_model_softmax.h5'

tokenizer_path = 'models/tokenizer_senti_model_en.json'

model = keras.models.load_model(model_path) 

with open(tokenizer_path) as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
    word_index = tokenizer.word_index



strings = [
    'is hard to say something about a model so simple',
    'this model is garbage, i wont my money back',
    'is nice to see philosophers doing machine learning',
    'this is a great and wonderful example of NLP',
]

preds = model.predict(
        keras.preprocessing.sequence.pad_sequences(
                                                    tokenizer.texts_to_sequences(strings),
                                                    maxlen=256,
                                                    truncating='post'
                                                ),
    verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    
    # for sigmoid model
    print(f'Negative Sentiment 😔 {round((1 - preds[i][0]) * 100)}% | Positive Sentiment 😊 {round(preds[i][0] * 100)}%\n{"*" * 50}')

    # for softmax model
    #print(f'Negative Sentiment 😔 {round((preds[i][0]) * 100)}% | Positive Sentiment 😊 {round(preds[i][1] * 100)}%\n{"*" * 50}')

is hard to say something about a model so simple

Negative Sentiment 😔 100% | Positive Sentiment 😊 0%
**************************************************
this model is garbage, i wont my money back

Negative Sentiment 😔 100% | Positive Sentiment 😊 0%
**************************************************
is nice to see philosophers doing machine learning

Negative Sentiment 😔 0% | Positive Sentiment 😊 100%
**************************************************
this is a great and wonderful example of NLP

Negative Sentiment 😔 0% | Positive Sentiment 😊 100%
**************************************************


**In language models, like text classification models, integrated gradients defines an attribution value for each word in the input sentence. The attributions are calculated by integrating the model gradients with respect to the word embedding layer along a straight path from a baseline instance $x'$ to the input instance $x$.**
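
**For reference, the attribution of the $i$-th input feature, as defined in the paper cited above, can be written as follows. Here $F$ is the model, $x$ the input, $x'$ the baseline, and $\alpha$ the interpolation coefficient along the straight path. The symbol $A_i$ is our own notation, and in this notebook $x$ and $x'$ should be read as the embedded representations of the token sequences:**

$$
A_i(x, x') = (x_i - x'_i) \int_{0}^{1} \frac{\partial F\left(x' + \alpha (x - x')\right)}{\partial x_i} \, d\alpha
$$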

**Integrated gradients satisfies the _completeness_ axiom: the attributions $A_i(x, x')$, summed over all input features $i$, are equal to the difference between the model output at the instance $x$ and the model output at the baseline $x'$:**

$$
\sum_i A_i(x, x') = F(x) - F(x')
$$
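
**The relation above can be checked numerically on a toy example. The sketch below is not how `alibi` computes attributions internally: it assumes a made-up model $F(x) = \sigma(w \cdot x)$ with hand-picked weights, an all-zeros baseline, and a simple midpoint Riemann sum. The names `F`, `grad_F`, and `integrated_gradients` are hypothetical helpers written for this illustration.**

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy differentiable "model": F(x) = sigmoid(w . x). The weights are made up.
w = np.array([1.5, -2.0, 0.5])

def F(x):
    return sigmoid(np.dot(w, x))

def grad_F(x):
    # Analytic gradient of sigmoid(w . x) with respect to x
    s = F(x)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, n_steps=50):
    # Midpoint Riemann sum along the straight path from the baseline to x
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    grads = np.array([grad_F(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline)

print('Attributions per feature:', attributions)
print('Sum of attributions:     ', attributions.sum())
print('F(x) - F(baseline):      ', F(x) - F(baseline))
```

**The last two printed values should agree up to the error of the numerical integration, which shrinks as `n_steps` grows.**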

**To use the `IntegratedGradients` class from `alibi`, we need to set some arguments first:**

-   `model`: TensorFlow or Keras model.
-   `layer`: Layer with respect to which the gradients are calculated. In the case of our language model, this is the `Embedding` layer.
-   `target_fn`: A scalar function that is applied to the predictions of the model (like `np.argmax(predictions, axis=1)`).
-   `method`: Method for the integral approximation (`riemann_left`, `riemann_right`, `riemann_middle`, `riemann_trapezoid`, `gausslegendre`).
-   `n_steps`: Number of steps in the path integral approximation from the baseline to the input instance.
-   `internal_batch_size`: Batch size for the internal batching.
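
**To get a feeling for what the `method` and `n_steps` arguments control, the sketch below compares the quadrature rules named in the list on a simple one-dimensional integral. This is only an illustration with a stand-in integrand (`np.exp`), not the code `alibi` runs; the dictionary keys reuse the option names from the list above purely as labels.**

```python
import numpy as np

def f(a):
    # Smooth stand-in integrand; its exact integral over [0, 1] is e - 1
    return np.exp(a)

exact = np.e - 1.0
n_steps = 10
h = 1.0 / n_steps
k = np.arange(n_steps)

left = f(k * h).sum() * h            # sample at the left edge of each step
right = f((k + 1) * h).sum() * h     # sample at the right edge of each step
middle = f((k + 0.5) * h).sum() * h  # sample at the midpoint of each step
trapezoid = 0.5 * (left + right)     # average of the left and right sums

# Gauss-Legendre nodes and weights are defined on [-1, 1]; rescale to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(n_steps)
gauss = (0.5 * weights * f(0.5 * (nodes + 1.0))).sum()

estimates = {
    'riemann_left': left,
    'riemann_right': right,
    'riemann_middle': middle,
    'riemann_trapezoid': trapezoid,
    'gausslegendre': gauss,
}
errors = {name: abs(value - exact) for name, value in estimates.items()}

for name, err in errors.items():
    print(f'{name:>18}: absolute error = {err:.2e}')
```

**For a smooth integrand like this one, the Gauss-Legendre rule reaches a much smaller error with the same number of steps, which is one reason to prefer it when each step costs a gradient evaluation.**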


**Since the model uses a word embedding with a vector dimensionality of 50 and a sequence length of 256 tokens, the attributions have shape `(len(x_test_sample), 256, 50)`. To obtain a single attribution value for each word, we sum the attribution values over the 50 elements of each word’s vector representation.**

**Below we create an `IntegratedGradients` object (`ig`) with these parameters.**

In [61]:

layer = model.layers[1]
n_steps = 50
internal_batch_size = 256

from alibi.explainers import IntegratedGradients

ig  = IntegratedGradients(model,
                        target_fn=None,
                        layer=layer,
                        n_steps=n_steps,
                        method="gausslegendre",
                        internal_batch_size=internal_batch_size)



**The integrated gradient attributions are calculated with respect to the embedding layer for the samples in our `x_test_sample` list. This could also be a partition of your testing set.**

**With these samples, we use our model to generate a prediction array (`preds`), calling all the functions that turn/pad a `string` into a `sequence of tokens`.**

**`ig.explain` (the actual explanation of our model) needs to know which element of the model's output the gradients should be computed for in each sample. This is the `target` argument, typically the list of predicted classes. We can obtain it by "argmaxing" the `preds` array and passing the result as `target`, or by supplying an equivalent function as `target_fn` when creating the explainer.**

**Here we are using the default baseline (`None`), which equates to a sequence of zeros (_this corresponds to a sequence of padding tokens, a.k.a. no input_). The path integral is defined along a straight line from the baseline ($x'$) to the input sample ($x$).**

**If you are using a model with a `softmax` output, you can set the `target` parameter to something like `preds.argmax(axis=1)`. If you are using a model with a `sigmoid` output, a basic list comprehension (`[0 if preds[i][0] < 0.5 else 1 for i in range(len(preds))]`) gives you a list of predicted classes for your samples.**
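
**The two cases can be folded into one small helper. The function below, `predicted_classes`, is a hypothetical convenience written for this illustration (it is not part of `alibi` or Keras). It assumes a sigmoid model returns an array of shape `(n, 1)` and a softmax model an array of shape `(n, n_classes)`, and it applies the same 0.5 threshold as the list comprehension in the text. The prediction values are made up.**

```python
import numpy as np

def predicted_classes(preds, threshold=0.5):
    """Return one predicted class index per sample (hypothetical helper)."""
    preds = np.asarray(preds)
    if preds.ndim == 2 and preds.shape[1] == 1:
        # Sigmoid head: a single probability for the positive class
        return (preds[:, 0] >= threshold).astype(int)
    # Softmax head: one probability per class
    return preds.argmax(axis=1)

# Made-up predictions for two samples
sigmoid_preds = np.array([[0.02], [0.97]])
softmax_preds = np.array([[0.98, 0.02], [0.03, 0.97]])

print(predicted_classes(sigmoid_preds))  # [0 1]
print(predicted_classes(softmax_preds))  # [0 1]
```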

In [62]:
x_test_sample = [
    'One of the weakest entries in the J-horror remake sweepstakes, One Missed Call is undone by bland performances and shopworn shocks.'
]
# Tokenize and pad once, then reuse the same array for prediction and explanation
X = keras.preprocessing.sequence.pad_sequences(
    tokenizer.texts_to_sequences(x_test_sample),
    maxlen=256,
    truncating='post'
)
preds = model.predict(X, verbose=0)

# Predicted class per sample (sigmoid model)
target_function = [0 if preds[i][0] < 0.5 else 1 for i in range(len(preds))]
# For the softmax model, use this instead:
#target_function = preds.argmax(axis=1)

explanation = ig.explain(X,
                         baselines=None,
                         target=target_function,
                         attribute_to_layer_inputs=False)
explanation.meta
explanation.meta

{'name': 'IntegratedGradients',
 'type': ['whitebox'],
 'explanations': ['local'],
 'params': {'target_fn': None,
  'method': 'gausslegendre',
  'n_steps': 50,
  'internal_batch_size': 256,
  'layer': 1},
 'version': '0.8.0'}

**From this explanation object, we can recover a lot of useful information.**

In [63]:
print('Target Classes')
print(explanation.target)
print('\nEmbedded inputs')
print(explanation.X)
print('\nBaselines')
print(explanation.baselines)
print('\nThe predictions of the model')
print(explanation.predictions)

Target Classes
[0]

Embedded inputs
[[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0  

**The attributions are what we are really interested in, and we can retrieve them from our `explanation` object as well. Since each token receives one attribution value per embedding dimension, we sum over the embedding axis to get a single score per token.**

In [66]:
attrs = explanation.attributions[0]
attrs = attrs.sum(axis=2)
print('Attributions shape:', attrs.shape)

Attributions shape: (1, 256)


**Now we get rid of the padding tokens and take only the attributions that relate to the words in our input. This input (_given by the way the model was constructed_) has a limit of 256 tokens (_which in this case are words_).**

**Since our testing_set has only one sample (`x_test_sample`), the index of this sample is 0.**
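
**The reason the word attributions sit at the _end_ of the 256 positions is that `pad_sequences` pads at the front by default (`padding='pre'`). The sketch below mimics that behaviour in plain Python with a hypothetical helper, `pad_pre_truncate_post`, and made-up token ids and scores, just to show why slicing from the end recovers the real tokens. Note that splitting a sentence on whitespace may not always give the same count as the tokenizer (for example, when punctuation splits a word or a word is out of vocabulary), so treating the two as aligned is an assumption.**

```python
def pad_pre_truncate_post(token_ids, maxlen, value=0):
    """Keep the first `maxlen` ids and left-pad with `value` (hypothetical helper)."""
    kept = token_ids[:maxlen]
    return [value] * (maxlen - len(kept)) + kept

maxlen = 8
token_ids = [12, 7, 431, 9]  # made-up ids for a four-token sentence
padded = pad_pre_truncate_post(token_ids, maxlen)
print(padded)  # [0, 0, 0, 0, 12, 7, 431, 9]

# One made-up attribution score per position of the padded sequence
position_attrs = [0.0, 0.0, 0.0, 0.0, 0.3, -0.1, 0.8, 0.05]

# The real tokens occupy the last len(token_ids) positions
token_attrs = position_attrs[-len(token_ids):]
print(token_attrs)  # [0.3, -0.1, 0.8, 0.05]
```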

In [67]:
sample = 0
words = x_test_sample[sample].split()

if len(words) < len(attrs[sample]):
    atributions = attrs[sample][-len(words):]
else:
    atributions = attrs[sample]
    words = words[-len(atributions):]

len(words), len(atributions)

(21, 21)

**To create a visually intuitive way of interpreting this model's output, we will assign a color to each of the attribution scores, taking the `max` and `min` values to set a range of predefined colors.**
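
**The idea behind the color mapping is a min-max normalization followed by a lookup on a color scale. The sketch below shows it without `matplotlib`, using a simplified two-color linear blend as a stand-in for a real colormap. The helpers `min_max_normalize` and `lerp_hex` are hypothetical, the scores are made up, and the two end colors are simply the lightest and darkest hex values that appear in the notebook's own color output.**

```python
def min_max_normalize(values):
    """Rescale values to the [0, 1] range (hypothetical helper)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All scores are identical: map everything to the low end
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def lerp_hex(t, start=(255, 255, 204), end=(128, 0, 38)):
    """Blend linearly between two RGB colors and return a hex string."""
    rgb = [round(s + t * (e - s)) for s, e in zip(start, end)]
    return '#{:02x}{:02x}{:02x}'.format(*rgb)

scores = [-0.2, 0.0, 0.1, 0.9]  # made-up attribution scores
colors = [lerp_hex(t) for t in min_max_normalize(scores)]
print(colors)
```

**The lowest score maps to the pale yellow end and the highest score to the dark red end. A real colormap such as `YlOrRd` passes through intermediate hues that this two-color blend does not reproduce.**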

In [70]:
import matplotlib
import matplotlib.cm as cm
import matplotlib.colors as mcolors
from IPython.display import HTML

minima = min(atributions)
maxima = max(atributions)

norm = matplotlib.colors.Normalize(vmin=minima, vmax=maxima, clip=True)
mapper = cm.ScalarMappable(norm=norm, cmap=cm.YlOrRd)

colors = [mcolors.to_hex(mapper.to_rgba(v)) for v in atributions]

colors

['#fffdc6',
 '#fffdc6',
 '#fffdc6',
 '#fffdc6',
 '#fffdc6',
 '#fffdc6',
 '#fffdc6',
 '#fffdc6',
 '#ffffcc',
 '#fffdc8',
 '#fffcc4',
 '#fffbc2',
 '#fffec9',
 '#fff7b7',
 '#fff9be',
 '#ffeea3',
 '#cd0b22',
 '#800026',
 '#feb852',
 '#febb56',
 '#febb56']

**Below we map the generated colors to each word in the text sample, generating a colorful HTML representation of the attributions given to each word. Since we are using the `YlOrRd` color scale, words with _high positive attribution_ are colored in shades of _red_. Words with _middling attributions_ are colored _orange_, while _low attributions_ receive a _pale yellow_.**
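
**If the text being explained can contain characters such as `<` or `&`, it may be worth escaping each word before embedding it in HTML. The sketch below wraps the same span-building idea in a hypothetical helper, `colorize`, that uses `html.escape` from the standard library. The words and colors are made up for the example.**

```python
from html import escape

def colorize(words, colors):
    """Wrap each word in a colored, bold <span> (hypothetical helper)."""
    if len(words) != len(colors):
        raise ValueError('words and colors must have the same length')
    spans = [
        f'<span style="color:{color}"><b>{escape(word)}</b></span>'
        for word, color in zip(words, colors)
    ]
    return ' '.join(spans)

markup = colorize(['bland', '<performances>'], ['#cd0b22', '#800026'])
print(markup)
```

**In a notebook, the resulting string can be rendered with `display(HTML(markup))`, as in the cell that follows.**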

In [71]:
text_with_attributions = ' '.join([f'''<span style="color:{colors[i]}"><b>{words[i]}</b></span>''' for i in range(len(words))])

print(f'Sample\n{"*" * 50}\n\n{x_test_sample[sample]}\n')
print(f'Prediction\n{"*" * 50}\n')

# for softmax model
#print(f'Negative Sentiment 😔 {round((explanation.predictions[sample][0]) * 100)}% | Positive Sentiment 😊 {round(explanation.predictions[sample][1] * 100)}%\n\n{"*" * 50}')

# for sigmoid model
print(f'Negative Sentiment 😔 {round((1 - explanation.predictions[sample][0]) * 100)}% | Positive Sentiment 😊 {round(explanation.predictions[sample][0] * 100)}%\n{"*" * 50}')
display(HTML(f'Attributions: {text_with_attributions}.'))

Sample
**************************************************

One of the weakest entries in the J-horror remake sweepstakes, One Missed Call is undone by bland performances and shopworn shocks.

Prediction
**************************************************

Negative Sentiment 😔 100% | Positive Sentiment 😊 0%
**************************************************


**Above you can see which tokens had _the greatest influence on the prediction of our sentiment classifier_.** 🎭
