In [2]:
%load_ext autoreload
%autoreload 2

# Explaining Keras text classifier predictions with Grad-CAM

We can use ELI5 to explain text-based classifiers, i.e. models that take in a text and assign it to some class. Common examples include sentiment classification, labelling into categories, etc.

The underlying method used is 'Grad-CAM' (https://arxiv.org/abs/1610.02391). This technique shows what parts of the input are the most important to the predicted result, by overlaying a "heatmap" over the original input.
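To make the idea concrete, here is a minimal NumPy sketch of the Grad-CAM computation for a single sequence. It assumes we already have a hidden layer's activations and the gradients of the class score with respect to them, both of shape `(timesteps, channels)`. The function name `grad_cam_1d` and the toy arrays are made up for illustration and are not ELI5 internals.

```python
import numpy as np

def grad_cam_1d(activations, gradients, relu=True):
    """Toy Grad-CAM for one sequence.

    activations, gradients: arrays of shape (timesteps, channels).
    Returns a heatmap of shape (timesteps,).
    """
    # One weight per channel: the gradient averaged over all timesteps.
    channel_weights = gradients.mean(axis=0)
    # Weighted sum of the channels at every timestep.
    heatmap = activations @ channel_weights
    if relu:
        # Keep only the evidence that pushes the class score up.
        heatmap = np.maximum(heatmap, 0)
    return heatmap

# Made-up values: 4 timesteps, 2 channels.
activations = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [2.0, 1.0]])
gradients = np.array([[0.2, -0.4]] * 4)
heatmap = grad_cam_1d(activations, gradients)
print(heatmap)
```

Each entry of the heatmap corresponds to one timestep of the hidden layer, which is later mapped back onto the input tokens.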

See also the tutorial for images (https://eli5.readthedocs.io/en/latest/tutorials/keras-image-classifiers.html). Certain sections such as 'removing softmax' and 'comparing different models' are relevant for text as well.

## Set up

First some imports

In [3]:
import os
import sys

import numpy as np
import pandas as pd
from IPython.display import display, HTML  # our explanations will be formatted in HTML

# you may want to keep logging enabled when doing your own work
import logging
import tensorflow as tf
tf.get_logger().setLevel(logging.ERROR) # disable Tensorflow warnings for this tutorial
import warnings
warnings.simplefilter("ignore") # disable Keras warnings for this tutorial
import keras
from keras.preprocessing.sequence import pad_sequences

import eli5

Using TensorFlow backend.


In [4]:
# for reproducibility, the tutorial was run with these Python and package versions
print(sys.version_info, keras.__version__, tf.__version__, sep='\n')

sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
2.2.4
1.14.0


The rest of what we need in this tutorial is stored in the `tests/estimators` package, whose source you can check for your own reference. You may need extra steps here to load your custom model and data.

## Explaining binary (sentiment) classifications

In binary classification there is only one class, and a piece of text either belongs to it or not. In sentiment classification, a text belongs to the class if it is "positive" and does not belong to it if it is "negative".

In this example we will have a recurrent model with word level tokenization, trained on the IMDB dataset (https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification). The model has one output node that gives probabilities. Output close to 1 is positive, and close to 0 is negative.
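As a small illustration of how such an output can be read, the sketch below maps a probability to a label. The `sentiment_label` helper and the cut-off of 0.5 are assumptions made for this example. They are not part of the model or of ELI5.

```python
def sentiment_label(probability, threshold=0.5):
    """Map the single output probability to a readable label.

    The 0.5 threshold is an assumed, conventional cut-off.
    """
    return "positive" if probability >= threshold else "negative"

print(sentiment_label(0.16))  # low score
print(sentiment_label(0.93))  # high score
```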

See https://www.tensorflow.org/beta/tutorials/text/text_classification_rnn for a simple example of how to build such a model and prepare its input.

For exact details of how we trained our model and what data we used see https://www.kaggle.com/tobalt/keras-text-model-sentiment or the `tests/estimators/keras_sentiment_classifier/keras_sentiment_classifier.ipynb` file in the ELI5 repo.

In [5]:
import tests.estimators.keras_sentiment_classifier.keras_sentiment_classifier \
    as keras_sentiment_classifier

Let's load our pre-trained model

In [73]:
binary_model = keras.models.load_model(keras_sentiment_classifier.MODEL)
binary_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 8)           80000     
_________________________________________________________________
masking_1 (Masking)          (None, None, 8)           0         
_________________________________________________________________
masking_2 (Masking)          (None, None, 8)           0         
_________________________________________________________________
masking_3 (Masking)          (None, None, 8)           0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 128)         37376     
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 64)          41216     
_________________________________________________________________
bidirectional_3 (Bidirection (None, 32)                10368     
__________

Load our test and train data. We have a module that will do preprocessing for us. For your own usage you may have to do your own preprocessing.

In [7]:
(x_train, y_train), (x_test, y_test) = keras_sentiment_classifier.prepare_train_test_dataset()

Confirm the accuracy of the model

In [8]:
# print(binary_model.metrics_names)
# loss, acc = binary_model.evaluate(x_test, y_test)
# print(loss, acc)

# print('Accuracy: ', acc)

Looks good? Let's go on and check one of the test samples.

In [9]:
test_review = x_test[0:1]
print(test_review)

test_review_t = keras_sentiment_classifier.vectorized_to_tokens(test_review)
print(test_review_t)

[[   1  591  202   14   31    6  717   10   10    2    2    5    4  360
     7    4  177 5760  394  354    4  123    9 1035 1035 1035   10   10
    13   92  124   89  488 7944  100   28 1668   14   31   23   27 7479
    29  220  468    8  124   14  286  170    8  157   46    5   27  239
    16  179    2   38   32   25 7944  451  202   14    6  717    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0]]
[['<START>', 'please', 'give', 'this', 'one', 'a', 'miss', 'br', 'br', '<OOV>', '<OOV>', 'and', 'the', 'rest', 'of', 'the', 'cast', 'rendered', 'terrible', 'performances', 'the', 'show', 'is', 'flat', 'flat', 'flat', 'br', 'br', 'i', "don't", 'know', 'how', 'michael', 'madison', 'could', 'have', 'allowed', 'this', 'one', 'on', 'his', 'p

Check the prediction

In [10]:
binary_model.predict(test_review)

array([[0.1622659]], dtype=float32)

As expected, the score is quite low.

Now let's explain what got us this result with ELI5. We need to pass the model, the input, and the associated tokens that will be highlighted.

In [11]:
eli5.show_prediction(binary_model, test_review, tokens=test_review_t)

What we are seeing is what makes the prediction "go up", i.e. the "positive" words (check the next section to see how to show positive AND negative words with the `relu` argument).

Hover over the highlighted words to see their "weight".

Let's try a custom input

In [16]:
s = "hello this is great but not so great"
# s = 'good and bad'
review, review_t = keras_sentiment_classifier.string_to_vectorized(s)
print(review, review_t, sep='\n')

[[   1 4825   14    9   87   21   24   38   87]]
[['<START>' 'hello' 'this' 'is' 'great' 'but' 'not' 'so' 'great']]


Notice that this model does not require fixed length input. We do not need to pad this sample.

In [13]:
binary_model.predict(review)

array([[0.4038432]], dtype=float32)

Neutral as expected.

What makes the score go up?

In [74]:
eli5.show_prediction(binary_model, review, tokens=review_t)

bidirectional_2


Let's try to add padding to the sample and explain that

In [75]:
review_t_padded = pad_sequences(review_t, maxlen=128, value='<PAD>', dtype=object)
review_padded = keras_sentiment_classifier.tokens_to_vectorized(review_t_padded)
print(review_t_padded, review_padded)

eli5.show_prediction(binary_model, review_padded, tokens=review_t_padded)

[['<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<START>' 'hello' 't

As expected, special tokens like `<PAD>` should not have an effect on the explanation.

## Modify explanations with the `relu` and `counterfactual` arguments

In the last section we only saw the "positive" words in our input, i.e. what made the class score "go up". To "fix" this and see the "negative" words too, we can pass two boolean arguments.

`relu` (default `True`) only shows what makes the predicted score go up and discards the effect of counter-evidence or other classes in case of multiclass classification (set to `False` to disable). Under the hood, this discards negative gradients / negative pixels (which are likely to belong to other classes according to the Grad-CAM paper (https://arxiv.org/abs/1610.02391)).

In [76]:
eli5.show_prediction(binary_model, review, tokens=review_t, relu=False)

bidirectional_2


Green is positive, red is negative, white is neutral. We see how the input has conflicting sentiment and thus the model sensibly predicted a score close to 0.5.

And for the test sample

In [77]:
eli5.show_prediction(binary_model, test_review, tokens=test_review_t, relu=False)

bidirectional_2


We can see what made the network decide that this is a negative example.

Another argument `counterfactual` (default `False`) highlights the counter-evidence for a class, or what makes the score "go down" (set to `True` to enable). This is mentioned in the Grad-CAM paper (https://arxiv.org/abs/1610.02391).

In [16]:
eli5.show_prediction(binary_model, test_review, tokens=test_review_t, counterfactual=True)

This shows the "negative" words in green, i.e. inverts the classes.

What happens if we pass both `counterfactual=True` and `relu=False`?

In [17]:
eli5.show_prediction(binary_model, test_review, tokens=test_review_t, relu=False, counterfactual=True)

Notice how the colors (green and red) are inverted.
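The interplay of the two flags can be summarised with a simplified sketch that treats the explanation as one signed score per token. Negating the gradients in Grad-CAM flips the sign of every score, and `relu` then clips the negative ones. The `postprocess` function and the toy scores are hypothetical, and ELI5's actual implementation may differ in its details.

```python
import numpy as np

def postprocess(raw_scores, relu=True, counterfactual=False):
    """Apply the two flags to signed per-token scores (simplified view)."""
    scores = np.asarray(raw_scores, dtype=float)
    if counterfactual:
        # Counter-evidence: flip the sign so "negative" tokens become positive.
        scores = -scores
    if relu:
        # Discard whatever is negative after the optional flip.
        scores = np.maximum(scores, 0.0)
    return scores

raw = np.array([0.3, -0.2, 0.1])  # made-up scores for three tokens
for relu in (True, False):
    for counterfactual in (False, True):
        print(relu, counterfactual, postprocess(raw, relu, counterfactual))
```

With `relu=False` and `counterfactual=True` every score simply changes sign, which is why green and red swap places.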

## Removing padding with `pad_value` or `pad_token` arguments

When working with text, often sample input is padded or truncated to a certain length, whether because the model only takes fixed-length input, or because we want to put all the samples in a batch.

We can remove padding by specifying the value used for the padding symbol. We can either specify `pad_value`, a numeric value such as `0` for `doc` input, or `pad_token`, the padding token such as `<PAD>` in `tokens`.
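Conceptually, removing padding means dropping the padded positions from both the tokens and their weights. The sketch below shows one way this could be done, assuming there is already one weight per token. The `strip_padding` helper is hypothetical and only illustrates the idea. It is not how ELI5 implements `pad_token`.

```python
import numpy as np

def strip_padding(tokens, weights, pad_token="<PAD>"):
    """Drop padded positions from tokens and their weights."""
    # True for every position that holds a real token.
    keep = np.array([token != pad_token for token in tokens])
    tokens = np.asarray(tokens, dtype=object)[keep]
    weights = np.asarray(weights, dtype=float)[keep]
    return tokens, weights

# Made-up example; works whether padding comes before or after the text.
tokens = ["<PAD>", "<PAD>", "<START>", "hello"]
weights = [0.0, 0.0, 0.4, 0.1]
kept_tokens, kept_weights = strip_padding(tokens, weights)
print(kept_tokens, kept_weights)
```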

In [18]:
print(test_review_t)

[['<START>', 'please', 'give', 'this', 'one', 'a', 'miss', 'br', 'br', '<OOV>', '<OOV>', 'and', 'the', 'rest', 'of', 'the', 'cast', 'rendered', 'terrible', 'performances', 'the', 'show', 'is', 'flat', 'flat', 'flat', 'br', 'br', 'i', "don't", 'know', 'how', 'michael', 'madison', 'could', 'have', 'allowed', 'this', 'one', 'on', 'his', 'plate', 'he', 'almost', 'seemed', 'to', 'know', 'this', "wasn't", 'going', 'to', 'work', 'out', 'and', 'his', 'performance', 'was', 'quite', '<OOV>', 'so', 'all', 'you', 'madison', 'fans', 'give', 'this', 'a', 'miss', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PAD>', '<PA

Notice that the padding word used here is `<PAD>` and that it comes after the text.

In [19]:
eli5.show_prediction(binary_model, test_review, tokens=test_review_t, 
                    pad_token='<PAD>', relu=False)

Now the explanation is shorter. This is useful if the input has a lot of padding.

We can also pass padding as a number into our input `doc`.

In [21]:
print(test_review)

[[   1  591  202   14   31    6  717   10   10    2    2    5    4  360
     7    4  177 5760  394  354    4  123    9 1035 1035 1035   10   10
    13   92  124   89  488 7944  100   28 1668   14   31   23   27 7479
    29  220  468    8  124   14  286  170    8  157   46    5   27  239
    16  179    2   38   32   25 7944  451  202   14    6  717    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0]]


Notice the number used for padding is `0`.

In [22]:
eli5.show_prediction(binary_model, test_review, tokens=test_review_t, 
                     pad_value=0, relu=False)

Let's try our pre-padded sample

In [23]:
print(review_t_padded)
eli5.show_prediction(binary_model, review_padded, tokens=review_t_padded, 
                     relu=False, pad_token='<PAD>')

[['<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
  '<PAD>' '<PAD>' '<START>' 'hello' 't

Useful!

## Explaining multiclass model predictions

In multiclass classification tasks a piece of text is classified into a single class (we still have only one predicted label) chosen from a number of classes (not just one as in binary classification).

In this tutorial we have a multiclass model trained on the US consumer financial complaints dataset (https://www.kaggle.com/cfpb/us-consumer-finance-complaints). We have used character-level tokenization and a convolutional network that takes fixed-length input. For this model the output will be a vector (since we have many classes). The entry with the highest value will be the "predicted" class.

For full details of how we trained the model and the data check https://www.kaggle.com/tobalt/keras-text-model-multiclass or the `tests/estimators/keras_multiclass_text_classifier/keras_multiclass_text_classifier.ipynb` file in the ELI5 repo.

In [19]:
import tests.estimators.keras_multiclass_text_classifier.keras_multiclass_text_classifier \
    as keras_multiclass_text_classifier

Load the model

In [20]:
multicls_model = keras.models.load_model(keras_multiclass_text_classifier.MODEL)
multicls_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 3193, 8)           816       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 3179, 128)         15488     
_________________________________________________________________
dropout_1 (Dropout)          (None, 3179, 128)         0         
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1589, 128)         0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1580, 128)         163968    
_________________________________________________________________
dropout_2 (Dropout)          (None, 1580, 128)         0         
_________________________________________________________________
average_pooling1d_1 (Average (None, 790, 128)          0         
__________

In [21]:
(x_train, x_test), (y_train, y_test) = keras_multiclass_text_classifier.prepare_train_test_dataset()

Again check the metrics.

In [22]:
print(multicls_model.metrics_names)
loss, acc = multicls_model.evaluate(x_test, y_test)
print(loss, acc)

print('Accuracy:', acc)

['loss', 'acc']
0.6319513120651246 0.7999999990463257
Accuracy: 0.7999999990463257


Let's see the possible classes that consumer complaint narratives can fall into

In [23]:
keras_multiclass_text_classifier.labels_index

{'Debt collection': 0,
 'Consumer Loan': 1,
 'Mortgage': 2,
 'Credit card': 3,
 'Credit reporting': 4,
 'Student loan': 5,
 'Bank account or service': 6,
 'Payday loan': 7,
 'Money transfers': 8,
 'Other financial service': 9,
 'Prepaid card': 10}

Let's explain one of the test samples

In [24]:
test_complaint = x_test[0:1]  # we need to keep the batch dimension
test_complaint_t = keras_multiclass_text_classifier.vectorized_to_tokens(test_complaint)
s = keras_multiclass_text_classifier.tokens_to_string(test_complaint_t)

print(len(test_complaint[0]))
limit = 150  # the input is quite long so just print the beginning
print(test_complaint[0, :limit])
print(test_complaint_t[0, :limit])
print(s[0][:limit+800])

3193
[38 15 21  3  7  2 20  8  7  5  7 15  8  5 14  2 11  3  9 25  8 15  3 11
  2 15 14  5  8 16 11  2 11  8 16 17 14  4  5  7  3  6 17 11 14 18  2  4
  6  2 12  5 25  3  2  5  2 14  6  5  7  2 21  8  4 12  2 16  3  2 58  2
 13  3 11 19  8  4  3  2 16 18  2  7  3 25  3  9  2 12  5 25  8  7 22  2
 13  6  7  3  2 24 17 11  8  7  3 11 11  2 21  8  4 12  2  4 12  3 16  2
  6  9  2 12  5 25  8  7 22  2 24  3  3  7  2  7  6  4  8 20  8  3 13  2
  6 20  2 11  5  8]
['O' 'c' 'w' 'e' 'n' ' ' 'f' 'i' 'n' 'a' 'n' 'c' 'i' 'a' 'l' ' ' 's' 'e'
 'r' 'v' 'i' 'c' 'e' 's' ' ' 'c' 'l' 'a' 'i' 'm' 's' ' ' 's' 'i' 'm' 'u'
 'l' 't' 'a' 'n' 'e' 'o' 'u' 's' 'l' 'y' ' ' 't' 'o' ' ' 'h' 'a' 'v' 'e'
 ' ' 'a' ' ' 'l' 'o' 'a' 'n' ' ' 'w' 'i' 't' 'h' ' ' 'm' 'e' ' ' '(' ' '
 'd' 'e' 's' 'p' 'i' 't' 'e' ' ' 'm' 'y' ' ' 'n' 'e' 'v' 'e' 'r' ' ' 'h'
 'a' 'v' 'i' 'n' 'g' ' ' 'd' 'o' 'n' 'e' ' ' 'b' 'u' 's' 'i' 'n' 'e' 's'
 's' ' ' 'w' 'i' 't' 'h' ' ' 't' 'h' 'e' 'm' ' ' 'o' 'r' ' ' 'h' 'a' 'v'
 'i' 'n' 'g' ' ' 'b' 'e' '

Let's check what the model predicts (to which category the financial complaint belongs)

In [25]:
preds = multicls_model.predict(test_complaint)
print(preds)  # score for each class
y = np.argmax(preds)  # take the maximum class
print(y)
keras_multiclass_text_classifier.decode_output(y)

[[7.4966592e-03 9.7562626e-08 9.9250317e-01 9.1982411e-12 5.3569739e-08
  4.8417964e-10 9.6964792e-10 4.0114050e-09 5.9291594e-10 3.4063903e-13
  3.9474773e-19]]
2


'Mortgage'

And the ground truth

In [26]:
y_truth = y_test[0]
print(y_truth)
keras_multiclass_text_classifier.decode_output(y_truth)

[0 0 1 0 0 0 0 0 0 0 0]


'Mortgage'

Seems reasonable!
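Turning a class index, a one-hot vector, or a row of scores into a label can be sketched as follows. The `decode` function is a hypothetical stand-in and is not the actual `decode_output` helper from the `tests/estimators` package. The label mapping is copied from the output shown earlier.

```python
import numpy as np

labels_index = {
    'Debt collection': 0, 'Consumer Loan': 1, 'Mortgage': 2,
    'Credit card': 3, 'Credit reporting': 4, 'Student loan': 5,
    'Bank account or service': 6, 'Payday loan': 7, 'Money transfers': 8,
    'Other financial service': 9, 'Prepaid card': 10,
}
# Invert the mapping so we can look labels up by index.
index_to_label = {index: label for label, index in labels_index.items()}

def decode(output):
    """Accept a class index, a one-hot vector, or a (1, n_classes) score array."""
    output = np.asarray(output)
    # A scalar is already an index; otherwise take the position of the maximum.
    index = int(output) if output.ndim == 0 else int(np.argmax(output))
    return index_to_label[index]

print(decode(2))
print(decode([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```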

Now let's explain this prediction with ELI5.

In [58]:
eli5.show_prediction(multicls_model, test_complaint, tokens=test_complaint_t, pad_token='<PAD>')

Note that we do not set `relu` to `False` because then we would see other classes.

Our own example

In [59]:
# s = """first of all I should not be charged and debted for the private car loan"""
s = "mortgage interest and credit card"
complaint, complaint_t = keras_multiclass_text_classifier.string_to_vectorized(s)
print(complaint)
print(complaint_t[0, :50])  # note that this model requires fixed length input

[[16  6  9 ...  0  0  0]]
['m' 'o' 'r' 't' 'g' 'a' 'g' 'e' ' ' 'i' 'n' 't' 'e' 'r' 'e' 's' 't' ' '
 'a' 'n' 'd' ' ' 'c' 'r' 'e' 'd' 'i' 't' ' ' 'c' 'a' 'r' 'd' '<PAD>'
 '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>'
 '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>' '<PAD>']


In [71]:
preds = multicls_model.predict(complaint)
print(preds)
print(keras_multiclass_text_classifier.decode_output(preds))

eli5.show_prediction(multicls_model, complaint, tokens=complaint_t, pad_token='<PAD>', )

[[0.18202755 0.05491637 0.01783363 0.37470257 0.25161844 0.00713009
  0.05360746 0.01181439 0.01064195 0.00603918 0.02966846]]
Credit card
conv1d_3


In [30]:
# TODO: would be good to show predicted label

## Choosing a classification target to focus on via `targets`

We saw that the last text could plausibly be classified into more than one category.

We can use ELI5 to "force" the network to classify the input into a certain category, and then highlight evidence for that category.

We use the `targets` argument for this. We pass a list that contains integer indices. Those indices represent a class in the final output layer.
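One way to choose which indices to look at is to rank the classes by predicted score. The sketch below does this for the scores printed for our custom complaint. The `top_k_classes` helper is an assumption made for illustration.

```python
import numpy as np

# Scores printed above for the custom complaint.
preds = np.array([[0.18202755, 0.05491637, 0.01783363, 0.37470257, 0.25161844,
                   0.00713009, 0.05360746, 0.01181439, 0.01064195, 0.00603918,
                   0.02966846]])

def top_k_classes(preds, k=3):
    """Return the indices of the k highest-scoring classes, best first."""
    # argsort is ascending, so reverse it and keep the first k entries.
    return np.argsort(preds[0])[::-1][:k].tolist()

top = top_k_classes(preds)
print(top)
```

Each of these indices could then be passed on its own, e.g. `targets=[top[1]]`, to see the evidence for the runner-up class.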

Let's check two sensible categories

In [31]:
debt_idx = 0  # we get this from the labels index
loan_idx = 1

In [69]:
print('debt collection')
display(eli5.show_prediction(multicls_model, complaint, tokens=complaint_t, 
                             targets=[debt_idx], pad_token='<PAD>'))

print('consumer loan')
display(eli5.show_prediction(multicls_model, complaint, tokens=complaint_t, 
                             targets=[loan_idx], pad_token='<PAD>'))

debt collection
conv1d_3


consumer loan
conv1d_3


Sensible at least a little bit?

Note that we can use the IPython `display()` call to render HTML if it is not the last value in a cell.

## Choosing a hidden layer to do Grad-CAM on with `layer`

Grad-CAM requires a hidden layer to do its calculations on and produce a heatmap. This is controlled by the `layer` argument. We can pass the layer (as an int index, string name, or a keras Layer instance) explicitly, or let ELI5 attempt to find a good layer for us automatically.

In [33]:
from keras.layers import (  # some of the layers we may want to check
    Embedding,
    Conv1D,
    MaxPool1D,
    AveragePooling1D,
    GlobalAveragePooling1D,
    Dense,
)

In [70]:
desired = (Embedding, Conv1D, MaxPool1D, AveragePooling1D, GlobalAveragePooling1D)

for layer in multicls_model.layers:
    print(layer.name, layer.output_shape)
    if isinstance(layer, desired):
        html = eli5.show_prediction(multicls_model, complaint, tokens=complaint_t, 
                                    layer=layer, pad_token='<PAD>')
        display(html)  # if using a loop we also need a display call

embedding_1 (None, 3193, 8)
embedding_1


conv1d_1 (None, 3179, 128)
conv1d_1


dropout_1 (None, 3179, 128)
max_pooling1d_1 (None, 1589, 128)
max_pooling1d_1


conv1d_2 (None, 1580, 128)
conv1d_2


dropout_2 (None, 1580, 128)
average_pooling1d_1 (None, 790, 128)
average_pooling1d_1


conv1d_3 (None, 786, 128)
conv1d_3


dropout_3 (None, 786, 128)
max_pooling1d_2 (None, 393, 128)
max_pooling1d_2


global_average_pooling1d_1 (None, 128)
global_average_pooling1d_1


dense_1 (None, 32)
dense_2 (None, 11)


Now this looks better. It makes sense that in a convolutional network later layers pick up "higher level" information than earlier "lower level" layers. If you don't get good explanations from ELI5 out of the box, it may be worth looking into this parameter. We advise picking layers that contain "spatial or temporal" information, i.e. NOT dense/fully-connected or merge layers, but recurrent, convolutional, or embedding layers.
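A quick way to shortlist such layers is to keep those whose output still has a time axis, i.e. a shape of the form `(batch, timesteps, channels)`. The sketch below applies this rule to the names and shapes printed above. The rank-3 rule and the `temporal_layers` helper are our own rough filter, not ELI5's layer-selection heuristic.

```python
# (name, output_shape) pairs copied from the printed output above.
layer_shapes = [
    ("embedding_1", (None, 3193, 8)),
    ("conv1d_1", (None, 3179, 128)),
    ("dropout_1", (None, 3179, 128)),
    ("max_pooling1d_1", (None, 1589, 128)),
    ("conv1d_2", (None, 1580, 128)),
    ("dropout_2", (None, 1580, 128)),
    ("average_pooling1d_1", (None, 790, 128)),
    ("conv1d_3", (None, 786, 128)),
    ("dropout_3", (None, 786, 128)),
    ("max_pooling1d_2", (None, 393, 128)),
    ("global_average_pooling1d_1", (None, 128)),
    ("dense_1", (None, 32)),
    ("dense_2", (None, 11)),
]

def temporal_layers(layer_shapes):
    """Keep layers whose output is (batch, timesteps, channels)."""
    return [name for name, shape in layer_shapes if len(shape) == 3]

candidates = temporal_layers(layer_shapes)
print(candidates)
```

For a real model the pairs could be built with `[(l.name, l.output_shape) for l in model.layers]`, as in the loop above.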

Let's check the last two dense layers

In [35]:
for layer in multicls_model.layers[-2:]:
    print(layer.name)
    html = eli5.show_prediction(multicls_model, complaint, tokens=complaint_t, layer=layer)
    display(html)

dense_1


dense_2


What's up with the final dense layers? They do not have spatial information so it's mostly a visualization of the activations of each node, ignoring the underlying tokens. Hover over to see the actual values (though some parts seem bright green, they may not have a high weight - the color scale is "relative").

If we do not specify a `layer`, by default ELI5 searches through the flattened list of layers going backwards from the output layer.

## How it works - `explain_prediction()` and `format_as_html()`.

What we have seen so far is calls to `show_prediction()`. What this function actually does is call `explain_prediction()` to produce an `Explanation` object, and then passes that object to `format_as_html()` to produce highlighted HTML.

Let's check each of these steps

In [36]:
E = eli5.explain_prediction(binary_model, review, tokens=review_t)

This is an `Explanation` object

In [37]:
repr(E)

"Explanation(estimator='sequential_1', description='\\nGrad-CAM visualization for classification tasks; \\noutput is explanation object that contains a heatmap.\\n', error='', method='Grad-CAM', is_regression=False, targets=[TargetExplanation(target=0, feature_weights=None, proba=None, score=0.5912496, weighted_spans=WeightedSpans(docs_weighted_spans=[DocWeightedSpans(document='<START> hello this is great but not so great', spans=[('<START>', [(0, 7)], 0.01457466185092926), ('hello', [(8, 13)], 0.0), ('this', [(14, 18)], 0.0030298803467303497), ('is', [(19, 21)], 0.0), ('great', [(22, 27)], 0.002680395031347871), ('but', [(28, 31)], 0.034321276005357504), ('not', [(32, 35)], 0.01620902307331562), ('so', [(36, 38)], 0.034717236703727394), ('great', [(39, 44)], 0.0)], preserve_density=None, vec_name=None)], other=None), heatmap=array([0.01457466, 0.        , 0.00302988, 0.        , 0.0026804 ,\n       0.03432128, 0.01620902, 0.03471724, 0.        ]))], feature_importances=None, decision_

We can check the name of the hidden layer that was used for producing the heatmap

In [38]:
E.layer

'bidirectional_3'

We can get the predicted class and the value for the prediction

In [39]:
target = E.targets[0]
print(target.target, target.score)

0 0.5912496


We can also check the produced Grad-CAM `heatmap` found on each item in `targets`. You can think of this as an array of "importances" for tokens (after padding is removed and the heatmap is resized).
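The resizing step matters when the hidden layer has fewer timesteps than there are tokens, as happens after convolution or pooling. A minimal sketch of one possible approach is shown below, using linear interpolation with `np.interp`. The choice of linear interpolation and the `resize_heatmap` helper are assumptions, and ELI5 may resize differently.

```python
import numpy as np

def resize_heatmap(heatmap, num_tokens):
    """Stretch or shrink a 1D heatmap to have one value per token."""
    heatmap = np.asarray(heatmap, dtype=float)
    if len(heatmap) == 1:
        # A single value cannot be interpolated; repeat it for every token.
        return np.full(num_tokens, heatmap[0])
    # Place old and new samples on the same [0, 1] axis and interpolate.
    old_positions = np.linspace(0.0, 1.0, num=len(heatmap))
    new_positions = np.linspace(0.0, 1.0, num=num_tokens)
    return np.interp(new_positions, old_positions, heatmap)

coarse = [0.0, 1.0, 0.0]           # made-up heatmap with 3 timesteps
fine = resize_heatmap(coarse, 5)   # stretched to 5 tokens
print(fine)
```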

In [40]:
heatmap = target.heatmap
print(heatmap)
print(len(heatmap))

[0.01457466 0.         0.00302988 0.         0.0026804  0.03432128
 0.01620902 0.03471724 0.        ]
9


The highlighting for each token is stored in a `WeightedSpans` object (specifically the `DocWeightedSpans` object)

In [41]:
weighted_spans = target.weighted_spans
print(weighted_spans)

doc_ws = weighted_spans.docs_weighted_spans[0]
print(doc_ws)

WeightedSpans(docs_weighted_spans=[DocWeightedSpans(document='<START> hello this is great but not so great', spans=[('<START>', [(0, 7)], 0.01457466185092926), ('hello', [(8, 13)], 0.0), ('this', [(14, 18)], 0.0030298803467303497), ('is', [(19, 21)], 0.0), ('great', [(22, 27)], 0.002680395031347871), ('but', [(28, 31)], 0.034321276005357504), ('not', [(32, 35)], 0.01620902307331562), ('so', [(36, 38)], 0.034717236703727394), ('great', [(39, 44)], 0.0)], preserve_density=None, vec_name=None)], other=None)
DocWeightedSpans(document='<START> hello this is great but not so great', spans=[('<START>', [(0, 7)], 0.01457466185092926), ('hello', [(8, 13)], 0.0), ('this', [(14, 18)], 0.0030298803467303497), ('is', [(19, 21)], 0.0), ('great', [(22, 27)], 0.002680395031347871), ('but', [(28, 31)], 0.034321276005357504), ('not', [(32, 35)], 0.01620902307331562), ('so', [(36, 38)], 0.034717236703727394), ('great', [(39, 44)], 0.0)], preserve_density=None, vec_name=None)


Observe the `document` attribute and `spans`

In [42]:
print(doc_ws.document)
print(doc_ws.spans)

<START> hello this is great but not so great
[('<START>', [(0, 7)], 0.01457466185092926), ('hello', [(8, 13)], 0.0), ('this', [(14, 18)], 0.0030298803467303497), ('is', [(19, 21)], 0.0), ('great', [(22, 27)], 0.002680395031347871), ('but', [(28, 31)], 0.034321276005357504), ('not', [(32, 35)], 0.01620902307331562), ('so', [(36, 38)], 0.034717236703727394), ('great', [(39, 44)], 0.0)]


The `document` is the "stringified" version of `tokens`. If you have a custom "tokens -> string" algorithm you may want to set this attribute yourself.

The `spans` object is a list with one entry per token: the token itself, the character ranges it occupies in `document`, and its weight. The indices into the `document` string indicate which characters should be highlighted with that weight.
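The relationship between `tokens`, `document`, and `spans` can be reproduced with a few lines of plain Python. The sketch below assumes tokens are joined with a single space, which matches the output above. The `build_spans` helper is hypothetical and is not part of ELI5.

```python
def build_spans(tokens, weights, sep=" "):
    """Join tokens into a document and record each token's character range."""
    document = sep.join(tokens)
    spans = []
    start = 0
    for token, weight in zip(tokens, weights):
        end = start + len(token)
        # (token, list of (start, end) character ranges, weight)
        spans.append((token, [(start, end)], float(weight)))
        # Skip over the separator before the next token.
        start = end + len(sep)
    return document, spans

tokens = ['<START>', 'hello', 'this', 'is', 'great', 'but', 'not', 'so', 'great']
weights = [0.01457466, 0.0, 0.00302988, 0.0, 0.0026804,
           0.03432128, 0.01620902, 0.03471724, 0.0]
document, spans = build_spans(tokens, weights)
print(document)
print(spans[:2])
```

If your tokens need a different separator (for example an empty string for character-level tokens), pass it as `sep`.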

Let's format this. The HTML formatter should be used here.

In [43]:
import eli5.formatters.fields as fields
F = eli5.format_as_html(E, show=fields.WEIGHTS)

We pass a `show` argument to not display the method name or its description ("Grad-CAM"). See `eli5.format_as_html()` for a list of all supported arguments.

The output is an HTML-encoded string.

In [44]:
repr(F)

'\'\\n    <style>\\n    table.eli5-weights tr:hover {\\n        filter: brightness(85%);\\n    }\\n</style>\\n\\n\\n\\n    \\n\\n    \\n\\n    \\n\\n    \\n\\n    \\n\\n    \\n\\n\\n    \\n\\n    \\n\\n    \\n\\n    \\n        \\n\\n    \\n\\n        \\n            \\n                \\n                \\n            \\n        \\n\\n        \\n\\n\\n    <p style="margin-bottom: 2.5em; margin-top:-0.5em;">\\n        <span style="background-color: hsl(120, 100.00%, 78.21%); opacity: 0.88" title="0.015">&lt;START&gt;</span><span style="opacity: 0.80"> hello </span><span style="background-color: hsl(120, 100.00%, 92.74%); opacity: 0.82" title="0.003">this</span><span style="opacity: 0.80"> is </span><span style="background-color: hsl(120, 100.00%, 95.90%); opacity: 0.81" title="0.001">great</span><span style="opacity: 0.80"> </span><span style="background-color: hsl(120, 100.00%, 60.32%); opacity: 1.00" title="0.034">but</span><span style="opacity: 0.80"> </span><span style="background-co

Convert the string to an HTML object and display it in an IPython notebook

In [45]:
display(HTML(F))

## Notes on results

In general, this is experimental work. Unlike for images, there is not much talk about Grad-CAM applied to text.

`layer` is probably a very important argument as we currently use basic heuristics to pick a suitable layer. Thus explanations may not look as good for your own model.

### Multi-label classification

This did not really work for us: we got nonsensical explanations. Please send a comment if you manage to get it working.