In [3]:
%load_ext autoreload
%autoreload 2

# Explaining Keras text classifier predictions with Grad-CAM

We will explain text classification predictions using Grad-CAM.

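Grad-CAM shows which parts of the input are important for a prediction. To do so it uses the activations of a hidden layer and the gradients of a target class score with respect to that layer.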
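
To make the idea concrete, here is a minimal sketch of the Grad-CAM arithmetic for a 1D (temporal) hidden layer, in plain Python. The `activations` and `gradients` values are made up for illustration. In a real model they would come from the chosen hidden layer and from backpropagating the target class score. This is not ELI5's implementation.

```python
# Toy Grad-CAM over a 1D (temporal) hidden layer.
# All numbers below are invented for illustration.

# activations[t][k]: hidden layer output at time step t, channel k
activations = [
    [0.5, 1.0],
    [2.0, 0.0],
    [1.0, 3.0],
]
# gradients[t][k]: d(class score) / d(activations[t][k])
gradients = [
    [0.2, -0.4],
    [0.4, -0.2],
    [0.6, -0.6],
]

n_steps = len(activations)
n_channels = len(activations[0])

# 1. One weight per channel: the gradient averaged over all time steps.
weights = [
    sum(gradients[t][k] for t in range(n_steps)) / n_steps
    for k in range(n_channels)
]

# 2. Weighted sum of the channels at each time step.
raw_heatmap = [
    sum(weights[k] * activations[t][k] for k in range(n_channels))
    for t in range(n_steps)
]

# 3. ReLU keeps only the steps that push the class score up.
heatmap = [max(0.0, h) for h in raw_heatmap]

print(weights)
print(raw_heatmap)
print(heatmap)
```

Only the middle time step survives the ReLU here, so it would be the only highlighted position.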

## Set up

First, some imports.

In [4]:
import os

import numpy as np
import pandas as pd
from IPython.display import display, HTML

# you may want to keep logging enabled when doing your own work
import logging
import tensorflow as tf
tf.get_logger().setLevel(logging.ERROR) # disable Tensorflow warnings for this tutorial
import warnings
warnings.simplefilter("ignore") # disable Keras warnings for this tutorial
import keras

import eli5

Using TensorFlow backend.


The rest of what we need in this tutorial is stored in the `tests/estimators` package that you can check for reference. You may need extra steps here to load your custom model and data.

In [5]:
# we need to go back up to top level to import some local ELI5 modules

old = os.getcwd()
os.chdir('..')

## Explaining binary (sentiment) classifications

This is a binary classification task with a single output: a high score (close to 1) means positive sentiment and a low score (close to 0) means negative sentiment. We will use the IMDB dataset and a recurrent model with word-level tokenization.

Load our model

In [6]:
model = keras.models.load_model('tests/estimators/keras_sentiment_classifier/keras_sentiment_classifier.h5')
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 8)           80000     
_________________________________________________________________
masking_1 (Masking)          (None, None, 8)           0         
_________________________________________________________________
masking_2 (Masking)          (None, None, 8)           0         
_________________________________________________________________
masking_3 (Masking)          (None, None, 8)           0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 128)         37376     
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 64)          41216     
_________________________________________________________________
bidirectional_3 (Bidirection (None, 32)                10368     
__________

Load some sample data. We have a module that does the preprocessing for us. Check the relevant package to learn more. For your own models you will have to do your own preprocessing.

In [7]:
import tests.estimators.keras_sentiment_classifier.keras_sentiment_classifier \
as keras_sentiment_classifier

In [8]:
(x_train, y_train), (x_test, y_test) = keras_sentiment_classifier.prepare_train_test_dataset()

Confirm the accuracy of the model

In [9]:
print(model.metrics_names)
model.evaluate(x_test, y_test)

['loss', 'acc']


[0.4319177031707764, 0.81504]

Looks good? Let's go on and check one of the test samples.

In [10]:
doc = x_test[0:1]
print(doc)

tokens = keras_sentiment_classifier.vectorized_to_tokens(doc)
print(tokens)

[[   1  591  202   14   31    6  717   10   10    2    2    5    4  360
     7    4  177 5760  394  354    4  123    9 1035 1035 1035   10   10
    13   92  124   89  488 7944  100   28 1668   14   31   23   27 7479
    29  220  468    8  124   14  286  170    8  157   46    5   27  239
    16  179    2   38   32   25 7944  451  202   14    6  717    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0]]
[['<START>', 'please', 'give', 'this', 'one', 'a', 'miss', 'br', 'br', '<OOV>', '<OOV>', 'and', 'the', 'rest', 'of', 'the', 'cast', 'rendered', 'terrible', 'performances', 'the', 'show', 'is', 'flat', 'flat', 'flat', 'br', 'br', 'i', "don't", 'know', 'how', 'michael', 'madison', 'could', 'have', 'allowed', 'this', 'one', 'on', 'his', 'p

Check the prediction

In [11]:
model.predict(doc)

array([[0.1622659]], dtype=float32)

As expected, the score is low (about 0.16), so the model considers this review negative.
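
As a small illustration of how to read the single output, the sketch below turns a score into a label. The `0.5` cut-off and the `sentiment_label` helper are assumptions for this example. They are not part of the model or of ELI5.

```python
def sentiment_label(score, threshold=0.5):
    """Map a score in [0, 1] to a label (hypothetical helper, assumed cut-off)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return "positive" if score >= threshold else "negative"


# the score predicted for the test sample above
print(sentiment_label(0.1622659))  # negative
```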

Now let's explain what got us this result with ELI5. We need to pass the model, the input, and the associated tokens that will be highlighted.

In [10]:
eli5.show_prediction(model, doc, tokens=tokens)

What we are seeing is what makes the prediction "go up", i.e. the "positive" words (check the next section to see how to show positive AND negative words with the `relu` argument).

Let's try a custom input

In [16]:
s = "hello this is great but not so great"
doc_s, tokens_s = keras_sentiment_classifier.string_to_vectorized(s)
print(doc_s, tokens_s)

[[   1 4825   14    9   87   21   24   38   87]] [['<START>' 'hello' 'this' 'is' 'great' 'but' 'not' 'so' 'great']]


Notice that this model does not require fixed length input. We do not need to pad this sample.

In [12]:
model.predict(doc_s)

array([[0.5912496]], dtype=float32)

In [13]:
eli5.show_prediction(model, doc_s, tokens=tokens_s)

## Modify explanations with the `relu` and `counterfactual` arguments

What did we see in the last section? Grad-CAM shows what makes a class score "go up". So we are only seeing the "positive" parts.

To "fix" this, we can pass two boolean arguments.

`relu` filters out the negative scores and only shows what makes the predicted score go up (set to `False` to disable).

In [14]:
eli5.show_prediction(model, doc_s, tokens=tokens_s, relu=False)

For the test sample

In [16]:
eli5.show_prediction(model, doc, tokens=tokens, relu=False)

Green is positive, red is negative, white is neutral. We can see what made the network decide that this is a negative example.

`counterfactual` shows the "opposite", what makes the score "go down" (set to `True` to enable).

In [15]:
eli5.show_prediction(model, doc, tokens=tokens, counterfactual=True)

What happens if we pass both `counterfactual=True` and `relu=False`?

In [17]:
eli5.show_prediction(model, doc, tokens=tokens, relu=False, counterfactual=True)

Notice how the colors (green and red) are inverted.
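
The effect of the two flags can be sketched with a toy Grad-CAM function. We assume here that `counterfactual` negates the gradients and that `relu` clips negative heatmap values to zero. That matches the behaviour described in this section, but the function and its inputs are illustrative only and are not ELI5's code.

```python
def gradcam_1d(activations, gradients, relu=True, counterfactual=False):
    """Toy Grad-CAM for one sample; an illustrative sketch only."""
    n_steps = len(activations)
    n_channels = len(activations[0])
    if counterfactual:
        # Negate the gradients: look for what pushes the score down.
        gradients = [[-g for g in row] for row in gradients]
    # One weight per channel: the gradient averaged over time steps.
    weights = [
        sum(row[k] for row in gradients) / n_steps for k in range(n_channels)
    ]
    # Weighted sum of channels at each time step.
    heatmap = [sum(w * a for w, a in zip(weights, row)) for row in activations]
    if relu:
        # Drop negative contributions.
        heatmap = [max(0.0, h) for h in heatmap]
    return heatmap


# made-up activations and gradients for three time steps, two channels
acts = [[0.5, 1.0], [2.0, 0.0], [1.0, 3.0]]
grads = [[0.2, -0.4], [0.4, -0.2], [0.6, -0.6]]

for relu in (True, False):
    for counterfactual in (False, True):
        result = gradcam_1d(acts, grads, relu=relu, counterfactual=counterfactual)
        print(relu, counterfactual, [round(h, 3) for h in result])
```

With `relu=False`, switching `counterfactual` on flips the sign of every value, which is why the green and red colours swap.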

## Removing padding with `pad_value` and `padding` arguments

Often when working with text, each example is padded, whether because the model expects input with a certain length, or to have all samples be the same length to put them in a batch.

We can remove padding by specifying two arguments. The first is `pad_value`, the padding token such as `<PAD>` or a numeric value such as `0` for `doc`. The second argument is `padding`, which should be set to either `pre` (padding is done before actual text) or `post` (padding is done after actual text).

In [18]:
eli5.show_prediction(model, doc, tokens=tokens, relu=False, pad_value='<PAD>', padding='post')

Now the explanation is shorter. This is useful if the input has a lot of padding.
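
Conceptually, removing padding means dropping the padded positions from both the tokens and the heatmap. The sketch below shows one way to do that. `strip_padding` is a hypothetical helper written for this example, and ELI5 may locate the padding differently.

```python
def strip_padding(tokens, heatmap, pad_value="<PAD>", padding="post"):
    """Drop padded positions from tokens and heatmap (hypothetical helper)."""
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    if len(tokens) != len(heatmap):
        raise ValueError("tokens and heatmap must have the same length")
    # Count the run of pad values at the end ('post') or at the start ('pre').
    ordered = reversed(tokens) if padding == "post" else iter(tokens)
    n_pad = 0
    for token in ordered:
        if token != pad_value:
            break
        n_pad += 1
    if padding == "post":
        keep = slice(0, len(tokens) - n_pad)
    else:
        keep = slice(n_pad, len(tokens))
    return tokens[keep], heatmap[keep]


# post-padding with a string pad token
print(strip_padding(["<START>", "hello", "<PAD>", "<PAD>"], [0.1, 0.9, 0.0, 0.0]))
# pre-padding with a numeric pad value
print(strip_padding([0, 0, 1, 591], [0.0, 0.0, 0.1, 0.9], pad_value=0, padding="pre"))
```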

We can also set `pad_value` to the number that is used for padding in `doc`.

In [12]:
print(doc)

[[   1  591  202   14   31    6  717   10   10    2    2    5    4  360
     7    4  177 5760  394  354    4  123    9 1035 1035 1035   10   10
    13   92  124   89  488 7944  100   28 1668   14   31   23   27 7479
    29  220  468    8  124   14  286  170    8  157   46    5   27  239
    16  179    2   38   32   25 7944  451  202   14    6  717    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0]]


Notice the number used for padding is '0'.

In [13]:
eli5.show_prediction(model, doc, tokens=tokens, relu=False, pad_value=0, padding='post')

Let's try pre-padding.

In [19]:
from keras.preprocessing.sequence import pad_sequences

In [25]:
tokens_s_padded = pad_sequences(tokens_s, value='<PAD>', padding='pre', maxlen=30, dtype=object)
print(tokens_s_padded)

[['<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<START>' 'hello' 'this' 'is' 'great' 'but'
  'not' 'so' 'great']]


In [27]:
# numericalize the tokens
doc_s_padded = keras_sentiment_classifier.tokens_to_vectorized(tokens_s_padded)
print(doc_s_padded)

[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4825, 14, 9, 87, 21, 24, 38, 87]]


In [31]:
eli5.show_prediction(model, np.array(doc_s_padded), tokens=tokens_s_padded, relu=False, pad_value='<PAD>', padding='pre')

## Choosing a hidden layer to do Grad-CAM on with `layer`

Grad-CAM requires a hidden layer to do its calculations on. This is controlled by the `layer` argument. We can pass the layer (as an int index, string name, or a keras Layer instance) explicitly, or let ELI5 attempt to find a good layer to do Grad-CAM on automatically.

In [19]:
for layer in model.layers:
    name = layer.name
    print(name)
    if 'masking' not in layer.name:
        e = eli5.show_prediction(model,
                                 doc,
                                 tokens=tokens,
                                 layer=layer,
                                 relu=False, 
                                 pad_value='<PAD>', 
                                 padding='post')
        display(e)  # inside a loop the result is not rendered automatically, so we call display() explicitly

embedding_1


masking_1
masking_2
masking_3
bidirectional_1


bidirectional_2


bidirectional_3


dense_1


dense_2


If you don't get good explanations from ELI5 out of the box, it may be worth looking into this parameter. We advise picking layers that contain "spatial or temporal" information, i.e. NOT dense/fully-connected or merge layers.
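
One simple way to apply this advice is to look at output shapes: a layer whose output has three dimensions (batch, steps, channels) still has a value per time step. The sketch below walks backwards through the layers listed in the model summary above and picks the last such layer. This is only one possible heuristic, not necessarily the one ELI5 uses, and the dense layers are left out because their shapes are not shown in the summary.

```python
# (name, output_shape) pairs copied from the sentiment model summary
layers = [
    ("embedding_1", (None, None, 8)),
    ("masking_1", (None, None, 8)),
    ("masking_2", (None, None, 8)),
    ("masking_3", (None, None, 8)),
    ("bidirectional_1", (None, None, 128)),
    ("bidirectional_2", (None, None, 64)),
    ("bidirectional_3", (None, 32)),
]


def has_time_axis(output_shape):
    """True for (batch, steps, channels) outputs, which keep per-token values."""
    return len(output_shape) == 3


def pick_layer(layers):
    """Return the name of the last non-masking layer that has a time axis."""
    for name, shape in reversed(layers):
        if has_time_axis(shape) and "masking" not in name:
            return name
    raise ValueError("no layer with a time axis found")


print(pick_layer(layers))  # bidirectional_2
```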

Notice that when explaining the final dense layer node (there is only 1 output), we get an "all green" explanation. You need to hover over the explanation to see the actual value. It seems off because there are no "negative" values here and the colouring is not gradual.

## Explaining multiclass predictions

Here we use a multi-class model trained on the financial dataset. It is a convolutional network with character-level tokenization.

In [32]:
# multiclass model (*target, layer - conv/others, diff. types of expls, padding and its effect)

In [33]:
model2 = keras.models.load_model('tests/estimators/keras_multiclass_text_classifier/keras_multiclass_text_classifier.h5')
model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 3193, 8)           816       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 3179, 128)         15488     
_________________________________________________________________
dropout_1 (Dropout)          (None, 3179, 128)         0         
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1589, 128)         0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1580, 128)         163968    
_________________________________________________________________
dropout_2 (Dropout)          (None, 1580, 128)         0         
_________________________________________________________________
average_pooling1d_1 (Average (None, 790, 128)          0         
__________

In [34]:
import tests.estimators.keras_multiclass_text_classifier.keras_multiclass_text_classifier \
as keras_multiclass_text_classifier

In [35]:
(x_train, x_test), (y_train, y_test) = keras_multiclass_text_classifier.prepare_train_test_dataset()

Possible classes

In [36]:
keras_multiclass_text_classifier.labels_index

{'Debt collection': 0,
 'Consumer Loan': 1,
 'Mortgage': 2,
 'Credit card': 3,
 'Credit reporting': 4,
 'Student loan': 5,
 'Bank account or service': 6,
 'Payday loan': 7,
 'Money transfers': 8,
 'Other financial service': 9,
 'Prepaid card': 10}

Again check the metrics.

In [37]:
print(model2.metrics_names)
model2.evaluate(x_test, y_test)

['loss', 'acc']


[0.6319513120651246, 0.7999999990463257]

Let's explain one of the test samples

In [38]:
doc = x_test[0:1]
tokens = keras_multiclass_text_classifier.vectorized_to_tokens(doc)
s = keras_multiclass_text_classifier.tokens_to_string(tokens)

print(len(doc[0]))
limit = 150
print(doc[0, :limit])
print(tokens[0, :limit])
print(s[0][:limit+800])

3193
[38 15 21  3  7  2 20  8  7  5  7 15  8  5 14  2 11  3  9 25  8 15  3 11
  2 15 14  5  8 16 11  2 11  8 16 17 14  4  5  7  3  6 17 11 14 18  2  4
  6  2 12  5 25  3  2  5  2 14  6  5  7  2 21  8  4 12  2 16  3  2 58  2
 13  3 11 19  8  4  3  2 16 18  2  7  3 25  3  9  2 12  5 25  8  7 22  2
 13  6  7  3  2 24 17 11  8  7  3 11 11  2 21  8  4 12  2  4 12  3 16  2
  6  9  2 12  5 25  8  7 22  2 24  3  3  7  2  7  6  4  8 20  8  3 13  2
  6 20  2 11  5  8]
['O' 'c' 'w' 'e' 'n' ' ' 'f' 'i' 'n' 'a' 'n' 'c' 'i' 'a' 'l' ' ' 's' 'e'
 'r' 'v' 'i' 'c' 'e' 's' ' ' 'c' 'l' 'a' 'i' 'm' 's' ' ' 's' 'i' 'm' 'u'
 'l' 't' 'a' 'n' 'e' 'o' 'u' 's' 'l' 'y' ' ' 't' 'o' ' ' 'h' 'a' 'v' 'e'
 ' ' 'a' ' ' 'l' 'o' 'a' 'n' ' ' 'w' 'i' 't' 'h' ' ' 'm' 'e' ' ' '(' ' '
 'd' 'e' 's' 'p' 'i' 't' 'e' ' ' 'm' 'y' ' ' 'n' 'e' 'v' 'e' 'r' ' ' 'h'
 'a' 'v' 'i' 'n' 'g' ' ' 'd' 'o' 'n' 'e' ' ' 'b' 'u' 's' 'i' 'n' 'e' 's'
 's' ' ' 'w' 'i' 't' 'h' ' ' 't' 'h' 'e' 'm' ' ' 'o' 'r' ' ' 'h' 'a' 'v'
 'i' 'n' 'g' ' ' 'b' 'e' '

Notice that the padding length is quite long. We are also dealing with character-level tokenization - our tokens are single characters, not words.

Let's check what the model predicts (to which category the financial complaint belongs).

In [39]:
preds = model2.predict(doc)
print(preds)
y = np.argmax(preds)
print(y)
keras_multiclass_text_classifier.decode_output(y)

[[7.4966592e-03 9.7562626e-08 9.9250317e-01 9.1982411e-12 5.3569739e-08
  4.8417964e-10 9.6964792e-10 4.0114050e-09 5.9291594e-10 3.4063903e-13
  3.9474773e-19]]
2


'Mortgage'

And the ground truth:

In [40]:
y_truth = y_test[0]
print(y_truth)
keras_multiclass_text_classifier.decode_output(y_truth)

[0 0 1 0 0 0 0 0 0 0 0]


'Mortgage'

Now let's explain this prediction with ELI5. We keep `relu` enabled (the default) so that we only see what supports the predicted class, not what points to other classes.

In [41]:
eli5.show_prediction(model2, doc, tokens=tokens, pad_value='<PAD>', padding='post')

Our own example

In [42]:
s = "the IRS is afterr my car loan"
doc_s, tokens_s = keras_multiclass_text_classifier.string_to_vectorized(s)
print(doc_s)
print(tokens_s[0, :50]) # note that this model requires fixed length input

[[ 4 12  3 ...  0  0  0]]
['t' 'h' 'e' ' ' 'I' 'R' 'S' ' ' 'i' 's' ' ' 'a' 'f' 't' 'e' 'r' 'r' ' '
 'm' 'y' ' ' 'c' 'a' 'r' ' ' 'l' 'o' 'a' 'n' '<PAD>' '<PAD>' '<PAD>'
 '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
 '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>']


In [43]:
preds = model2.predict(doc_s)
print(preds)
keras_multiclass_text_classifier.decode_output(preds)

[[0.09576575 0.27872923 0.10852851 0.03327851 0.11653358 0.1867436
  0.02678595 0.13854526 0.00900717 0.00178243 0.00429991]]


'Consumer Loan'
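
To show how the probabilities map to a label, here is a sketch that takes the index of the largest probability and looks it up in the label index shown earlier. We assume this is roughly what `decode_output` does. Its actual implementation is not shown in this tutorial.

```python
labels_index = {
    "Debt collection": 0,
    "Consumer Loan": 1,
    "Mortgage": 2,
    "Credit card": 3,
    "Credit reporting": 4,
    "Student loan": 5,
    "Bank account or service": 6,
    "Payday loan": 7,
    "Money transfers": 8,
    "Other financial service": 9,
    "Prepaid card": 10,
}
# invert the mapping: class index -> label
index_to_label = {idx: label for label, idx in labels_index.items()}


def decode(probabilities):
    """Return (class index, label) of the most probable class."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, index_to_label[best]


# probabilities printed for our own example above
preds = [0.09576575, 0.27872923, 0.10852851, 0.03327851, 0.11653358,
         0.1867436, 0.02678595, 0.13854526, 0.00900717, 0.00178243,
         0.00429991]
print(decode(preds))  # (1, 'Consumer Loan')
```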

In [44]:
eli5.show_prediction(model2, doc_s, tokens=tokens_s, pad_value='<PAD>', padding='post')

# TODO: would be good to show predicted label

## Choosing a classification target to focus on via `targets`

In [45]:
debt_idx = 0
loan_idx = 1

In [46]:
eli5.show_prediction(model2, doc_s, tokens=tokens_s, pad_value='<PAD>', padding='post', targets=[debt_idx])

Sensible?
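
The `targets` argument decides which output node is explained. The sketch below illustrates the idea under the assumption that the predicted (highest-scoring) class is used when no target is given. `select_target` is a hypothetical helper, not an ELI5 function.

```python
def select_target(probabilities, targets=None):
    """Return (index, score) of the output node to explain (illustrative)."""
    if targets is None:
        # no explicit target: fall back to the predicted class
        idx = max(range(len(probabilities)), key=probabilities.__getitem__)
    else:
        idx = targets[0]
        if not 0 <= idx < len(probabilities):
            raise ValueError("target index out of range")
    return idx, probabilities[idx]


# probabilities printed for our own example above
preds = [0.09576575, 0.27872923, 0.10852851, 0.03327851, 0.11653358,
         0.1867436, 0.02678595, 0.13854526, 0.00900717, 0.00178243,
         0.00429991]
debt_idx = 0

print(select_target(preds))                      # (1, 0.27872923)
print(select_target(preds, targets=[debt_idx]))  # (0, 0.09576575)
```

Explaining `debt_idx` therefore asks what speaks for "Debt collection", even though the model gives that class a much lower score than "Consumer Loan".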

In [52]:
from keras.layers import (
    Embedding,
    Conv1D,
    MaxPool1D,
    AveragePooling1D,
    GlobalAveragePooling1D,
)

In [59]:
for layer in model2.layers:
    print(layer.name, layer.output_shape)
    if isinstance(layer, (Embedding, Conv1D, MaxPool1D, AveragePooling1D, GlobalAveragePooling1D)):
        e = eli5.show_prediction(model2,
                                 doc,
                                 tokens=tokens,
                                 layer=layer,
                                 relu=False, 
                                 pad_value='<PAD>', 
                                 padding='post')
        display(e)  # inside a loop the result is not rendered automatically, so we call display() explicitly

embedding_1 (None, 3193, 8)


conv1d_1 (None, 3179, 128)


dropout_1 (None, 3179, 128)
max_pooling1d_1 (None, 1589, 128)


conv1d_2 (None, 1580, 128)


dropout_2 (None, 1580, 128)
average_pooling1d_1 (None, 790, 128)


conv1d_3 (None, 786, 128)


dropout_3 (None, 786, 128)
max_pooling1d_2 (None, 393, 128)


global_average_pooling1d_1 (None, 128)


dense_1 (None, 32)
dense_2 (None, 11)


In [None]:
# The results are not so good.

## Resizing the heatmap with the `interpolation_kind` argument

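The heatmap usually does not have the same length as the tokens, because the hidden layer can have fewer time steps than the input. The heatmap therefore has to be resized, and `interpolation_kind` controls how that resizing is done.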

Getting back to sentiment classification

In [56]:
E = eli5.explain_prediction(model, doc_s, tokens=tokens_s, pad_value='<PAD>', padding='post')

In [57]:
heatmap = E.targets[0].heatmap

In [58]:
print(tokens.shape, len(heatmap))

(1, 3193) 29


In [None]:
model.get_layer(index=3).output_shape

In [None]:
kinds = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', 'next']

In [None]:
for kind in kinds:
    print(kind)
    H = eli5.show_prediction(model2, doc_s, tokens=tokens_s, pad_value='<PAD>', padding='post', 
                             interpolation_kind=kind,
                             )
    display(H)

The results are roughly the same. If highlighting seems off this argument may be a thing to try.
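
To illustrate what resizing does, the sketch below stretches a short heatmap to a longer length with two of the kinds, `linear` and `nearest`, in plain Python. The names in `kinds` look like the options of `scipy.interpolate.interp1d`, but that is an assumption on our part, and `resize_heatmap` is a hypothetical helper rather than ELI5's code.

```python
def resize_heatmap(heatmap, new_length, kind="linear"):
    """Resample a 1D heatmap to new_length points (toy version)."""
    if kind not in ("linear", "nearest"):
        raise ValueError("this sketch only supports 'linear' and 'nearest'")
    if not heatmap or new_length < 1:
        raise ValueError("heatmap must be non-empty and new_length positive")
    old_length = len(heatmap)
    if old_length == 1 or new_length == 1:
        return [heatmap[0]] * new_length
    resized = []
    for i in range(new_length):
        # position of output point i on the original index axis
        pos = i * (old_length - 1) / (new_length - 1)
        lo = int(pos)
        hi = min(lo + 1, old_length - 1)
        frac = pos - lo
        if kind == "linear":
            value = heatmap[lo] * (1 - frac) + heatmap[hi] * frac
        else:
            # ties at exactly one half go to the lower neighbour
            value = heatmap[lo] if frac <= 0.5 else heatmap[hi]
        resized.append(value)
    return resized


print(resize_heatmap([0.0, 1.0], 5, kind="linear"))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(resize_heatmap([0.0, 1.0], 5, kind="nearest"))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Linear interpolation gives a gradual transition between tokens, while nearest gives hard blocks of identical values.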

## How it works - `explain_prediction()` and `format_as_html()`

In [None]:
# heatmap, tokens, weighted_spans, interpolation_kind, etc.

In [None]:
E = eli5.explain_prediction(model2, doc_s, tokens=tokens_s, pad_value='<PAD>', padding='post')

Looking at the `Explanation` object

In [None]:
repr(E)

We can get the predicted class and the value for the prediction

In [None]:
target = E.targets[0]
print(target.target, target.score)

The highlighting for each token is stored in a `WeightedSpans` object (specifically the `DocWeightedSpans` object)

In [None]:
weighted_spans = target.weighted_spans
print(weighted_spans)

doc_ws = weighted_spans.docs_weighted_spans[0]
print(doc_ws)

Observe the `document` attribute and `spans`

In [None]:
print(doc_ws.document)
print(doc_ws.spans)

The `document` is the "stringified" version of `tokens`. If you have a custom "tokens -> string" algorithm you may want to set this attribute yourself.

The `spans` object is a list of weights for each character in `document`. We use the indices in `document` string to indicate which characters should be weighted with a specific value.

The weights come from the `heatmap` object found on each item in `targets`.

In [None]:
heatmap = target.heatmap
print(heatmap)
print(len(heatmap))

print(len(doc_ws.spans))

You can think of this as an array of "importances" in the tokens array (after padding is removed).
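
The relationship between tokens, heatmap and spans can be sketched as follows. We join the tokens into a document string and record, for each token, the character range it occupies together with its weight. The `(token, [(start, end)], weight)` layout and the single-space separator are assumptions made for this example, and `tokens_to_spans` is a hypothetical helper.

```python
def tokens_to_spans(tokens, heatmap, sep=" "):
    """Build a document string and one weighted character span per token."""
    if len(tokens) != len(heatmap):
        raise ValueError("tokens and heatmap must have the same length")
    document = sep.join(tokens)
    spans = []
    start = 0
    for token, weight in zip(tokens, heatmap):
        end = start + len(token)
        # characters document[start:end] get this token's weight
        spans.append((token, [(start, end)], weight))
        start = end + len(sep)
    return document, spans


document, spans = tokens_to_spans(
    ["hello", "this", "is", "great"],
    [0.1, 0.0, 0.0, 0.9],  # made-up weights
)
print(document)
for token, ranges, weight in spans:
    start, end = ranges[0]
    print(token, (start, end), weight, repr(document[start:end]))
```

For character-level tokens, as in the multiclass model, `sep=""` would be the natural choice.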

Let's format the explanation. The HTML formatter is the one to use here.

In [None]:
import eli5.formatters.fields as fields
F = eli5.format_as_html(E, show=fields.WEIGHTS)

We pass a `show` argument to not display the method name or its description (Grad-CAM). See `eli5.format_as_html()` for a list of all supported arguments.

The output is an HTML-encoded string.

In [None]:
repr(F)

Display it in an IPython notebook

In [None]:
display(HTML(F))
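
To give a feel for what such an HTML string contains, here is a minimal stand-alone renderer that colours tokens green for positive weights and red for negative ones, and puts the value in a tooltip. It is a sketch under our own assumptions about colours and markup. The HTML produced by `eli5.format_as_html()` is different and more complete.

```python
import html


def render_tokens(tokens, weights):
    """Return an HTML string with one coloured <span> per token (toy version)."""
    # scale opacities by the largest absolute weight; avoid division by zero
    max_abs = max((abs(w) for w in weights), default=0.0) or 1.0
    parts = []
    for token, weight in zip(tokens, weights):
        if weight > 0:
            rgb = "0, 160, 0"      # green: pushes the score up
        elif weight < 0:
            rgb = "200, 0, 0"      # red: pushes the score down
        else:
            rgb = "255, 255, 255"  # white: neutral
        opacity = abs(weight) / max_abs
        parts.append(
            '<span style="background-color: rgba({}, {:.2f})" title="{:.3f}">{}</span>'
            .format(rgb, opacity, weight, html.escape(token))
        )
    return " ".join(parts)


snippet = render_tokens(["<START>", "great", "terrible"], [0.0, 0.8, -0.4])
print(snippet)
```

In a notebook, `display(HTML(snippet))` would render the result. Note that tokens such as `<START>` have to be escaped, otherwise the browser would treat them as tags.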

## Notes on results

### Multi-label classification

Multi-label classification did not really work in our manual testing.