## Explaining Natural Language Processing (NLP) Models with CXPlain

First, we load a subsample of reviews from the Internet Movie Database (IMDB) dataset, which we will use as training data for recognising the sentiment expressed in a given movie review.

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # suppress TensorFlow's C++ log output
from cxplain.util.test_util import TestUtil

num_words = 1024   # size of the word dictionary
num_samples = 500  # number of IMDB reviews to subsample
(x_train, y_train), (x_test, y_test) = TestUtil.get_imdb(word_dictionary_size=num_words,
                                                         num_subsamples=num_samples)
```
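`TestUtil.get_imdb` returns each review as a sequence of integer word indices rather than raw text. The sketch below illustrates that kind of encoding on a toy corpus. It assumes the conventions of the Keras IMDB loader (index 0 for padding, 1 for the `<START>` marker, 2 for out-of-vocabulary words, which appear as `<UNK>` in the attribution output at the end of this section, and frequency-ranked words from 3 upwards); `build_vocabulary`, `encode` and `decode` are hypothetical helpers, not CXPlain functions.

```python
from collections import Counter

# Assumed Keras-style reserved indices (not taken from CXPlain itself).
PAD, START, UNK = 0, 1, 2
INDEX_FROM = 3  # first index assigned to a real word


def build_vocabulary(texts, num_words):
    """Assign indices to the most frequent words so that every index is < num_words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    max_words = max(num_words - INDEX_FROM, 0)
    ranked = [word for word, _ in counts.most_common()][:max_words]
    return {word: rank + INDEX_FROM for rank, word in enumerate(ranked)}


def encode(text, vocabulary):
    """Encode a review as [START, index, ...]; words outside the vocabulary become UNK."""
    return [START] + [vocabulary.get(word, UNK) for word in text.lower().split()]


def decode(indices, vocabulary):
    """Map indices back to words, rendering the reserved indices as marker tokens."""
    inverse = {index: word for word, index in vocabulary.items()}
    inverse.update({PAD: "<PAD>", START: "<START>", UNK: "<UNK>"})
    return [inverse.get(index, "<UNK>") for index in indices]


reviews = ["a very well done movie", "a dull movie", "very dull"]
vocabulary = build_vocabulary(reviews, num_words=5)  # room for two real words
encoded = encode("a very good movie", vocabulary)
print(encoded)                      # [1, 3, 4, 2, 2]
print(decode(encoded, vocabulary))  # ['<START>', 'a', 'very', '<UNK>', '<UNK>']
```

Under this assumption every index stays below `num_words`, so with a dictionary of 1024 entries many rarer words collapse into `<UNK>`.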


Next, we fit a review classification pipeline that first counts the words in each review, then transforms these counts into their term frequency–inverse document frequency (tf-idf) vector representation, and finally fits a random forest classifier to the tf-idf vectors of the training data.

```python
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfTransformer
from cxplain.util.count_vectoriser import CountVectoriser

# Random forest classifier whose decisions CXPlain will explain.
explained_model = RandomForestClassifier(n_estimators=64, max_depth=5, random_state=1)

counter = CountVectoriser(num_words)    # word-index sequences -> word counts
tfidf_transformer = TfidfTransformer()  # word counts -> tf-idf vectors

# Chain counting, tf-idf weighting and classification into a single estimator.
explained_model = Pipeline([('counts', counter),
                            ('tfidf', tfidf_transformer),
                            ('model', explained_model)])
explained_model.fit(x_train, y_train);
```
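For intuition about what the `tfidf` step computes, the following sketch reproduces scikit-learn's documented `TfidfTransformer` defaults (`norm='l2'`, `use_idf=True`, `smooth_idf=True`, `sublinear_tf=False`) in plain Python: raw counts are multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, and each row is then scaled to unit Euclidean length. The `count_matrix` helper is a hypothetical stand-in for CXPlain's `CountVectoriser`, assuming it turns a sequence of word indices into a vector of per-index counts.

```python
import math


def count_matrix(sequences, num_words):
    """Bag-of-words counts: entry [i][j] = occurrences of index j in sequence i."""
    rows = []
    for sequence in sequences:
        row = [0] * num_words
        for index in sequence:
            if 0 <= index < num_words:
                row[index] += 1
        rows.append(row)
    return rows


def tfidf(counts):
    """Smoothed idf weighting followed by L2 normalisation of each row."""
    if not counts:
        return []
    n_docs, n_terms = len(counts), len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    result = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        result.append([v / norm if norm > 0 else 0.0 for v in weighted])
    return result


sequences = [[1, 3, 3, 4], [1, 4]]
weights = tfidf(count_matrix(sequences, num_words=5))
print([round(v, 3) for v in weights[1]])  # [0.0, 0.707, 0.0, 0.0, 0.707]
```

Index 3 occurs only in the first review, so it receives a larger idf and dominates that review's vector, while indices 1 and 4 occur in both reviews and are down-weighted.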

After fitting the review classification pipeline, we wish to explain its decisions, i.e. which input features were most relevant for a given prediction of the pipeline. To do so, we train a causal explanation (CXPlain) model on the same training data; CXPlain can learn to explain the predictions of any machine-learning model. In practice, we have to define:
- `model_builder`: The type of model we want to use as our CXPlain model. Here we use a neural explanation model with a recurrent neural network (RNN) architecture.
- `masking_operation`: The masking operation used to remove a given input feature from the set of available input features. Here we use word drop masking, i.e. we remove a word from the input sequence entirely.
- `loss`: The loss function used to measure the impact of removing a given input feature from the set of available input features. In most cases, this will be the mean squared error (MSE) for regression problems and the cross-entropy for classification problems; since the sentiment label here is binary, we use the binary cross-entropy.
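To make word drop masking concrete: for a review of n words it yields n perturbed inputs, each missing exactly one word, so that the explained model's loss can be compared with and without that word. The sketch below assumes reviews are plain Python lists of word indices; `word_drop_variants` is a hypothetical helper rather than the code inside CXPlain's `WordDropMasking`, and `zero_mask_variants` is an equally hypothetical contrast that keeps the sequence length by overwriting the word with 0 instead of removing it.

```python
def word_drop_variants(sequence):
    """One copy of `sequence` per position, with the word at that position removed."""
    return [sequence[:i] + sequence[i + 1:] for i in range(len(sequence))]


def zero_mask_variants(sequence, mask_value=0):
    """For contrast: keep the length and overwrite one position with `mask_value`."""
    return [sequence[:i] + [mask_value] + sequence[i + 1:] for i in range(len(sequence))]


review = [1, 17, 42, 8]  # hypothetical word indices
for dropped, zeroed in zip(word_drop_variants(review), zero_mask_variants(review)):
    print(dropped, zeroed)
# [17, 42, 8] [0, 17, 42, 8]
# [1, 42, 8] [1, 0, 42, 8]
# [1, 17, 8] [1, 17, 0, 8]
# [1, 17, 42] [1, 17, 42, 0]
```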


```python
from tensorflow.python.keras.losses import binary_crossentropy
from cxplain import RNNModelBuilder, WordDropMasking, CXPlain

# RNN-based explanation model with an embedding layer for the word indices.
model_builder = RNNModelBuilder(embedding_size=num_words, with_embedding=True,
                                num_layers=2, num_units=32, activation="relu", p_dropout=0.2, verbose=0,
                                batch_size=32, learning_rate=0.001, num_epochs=2, early_stopping_patience=128)
masking_operation = WordDropMasking()  # remove one word at a time from a review
loss = binary_crossentropy             # binary sentiment labels
```
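Roughly, the three components fit together as follows in the CXPlain method: a word's importance is measured by how much the explained model's loss increases when that word is dropped, and these increases are normalised to sum to 1 before the RNN explanation model is trained to predict them from the full review. The sketch below computes such targets for one review. `toy_predict` and its weights are hypothetical stand-ins for the tf-idf/random-forest pipeline, the binary cross-entropy is written out by hand with probabilities clipped to avoid `log(0)`, and both clipping negative loss differences at zero and the uniform fallback are assumptions of this sketch rather than documented CXPlain details.

```python
import math

EPSILON = 1e-7  # clip probabilities away from 0 and 1 before taking logs


def scalar_binary_crossentropy(y_true, p):
    """Binary cross-entropy of a single predicted probability p for label y_true."""
    p = min(max(p, EPSILON), 1 - EPSILON)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def toy_predict(words, weights):
    """Hypothetical sentiment model: logistic function of the summed word weights."""
    score = sum(weights.get(word, 0.0) for word in words)
    return 1 / (1 + math.exp(-score))


def word_importance_targets(words, y_true, weights):
    """Loss increase when each word is dropped, clipped at 0 and normalised to sum to 1."""
    if not words:
        return []
    base_loss = scalar_binary_crossentropy(y_true, toy_predict(words, weights))
    deltas = []
    for i in range(len(words)):
        dropped = words[:i] + words[i + 1:]  # word drop masking of position i
        masked_loss = scalar_binary_crossentropy(y_true, toy_predict(dropped, weights))
        deltas.append(max(masked_loss - base_loss, 0.0))  # clipping is an assumption
    total = sum(deltas)
    if total == 0:
        # Assumed fallback: no removal increased the loss, so spread importance evenly.
        return [1 / len(words)] * len(words)
    return [delta / total for delta in deltas]


toy_weights = {"well": 1.5, "done": 1.0, "dull": -2.0}  # hypothetical word weights
toy_words = ["very", "well", "done", "movie"]
toy_importances = word_importance_targets(toy_words, 1, toy_weights)
print([round(s, 2) for s in toy_importances])  # [0.0, 0.66, 0.34, 0.0]
```

Words the toy model ignores ("very", "movie") get zero importance, and "well" outranks "done" because removing it costs more loss. The real pipeline sees tf-idf features rather than a word sum, so its targets will look different; the sketch only shows the shape of the computation.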

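Using this configuration, we now instantiate a CXPlain model and fit it to the same IMDB data that we used to fit the review classification pipeline we wish to explain.

Before fitting the CXPlain model, we also pad the movie reviews to a common length, since CXPlain does not currently support variable-length inputs. The training reviews are padded to the length of the longest training review, and the test reviews are padded or truncated to that same length. We record the original test review lengths first so that the padding can be stripped again when visualising the explanations.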

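```python
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

explainer = CXPlain(explained_model, model_builder, masking_operation, loss)

# Record the unpadded test review lengths as a list so they can be indexed later
# (in Python 3, map() returns a lazy iterator that does not support indexing).
prior_test_lengths = list(map(len, x_test))
# Pad (or truncate) at the end; test reviews are brought to the training length.
x_train = pad_sequences(x_train, padding="post", truncating="post", dtype=int)
x_test = pad_sequences(x_test, padding="post", truncating="post", dtype=int, maxlen=x_train.shape[1])
explainer.fit(x_train, y_train);
```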
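The padding step can be pictured with a plain-Python version of what `pad_sequences(..., padding="post", truncating="post")` does: append zeros to each sequence up to `maxlen` (by default the length of the longest sequence) and cut longer sequences at the end. `pad_post` is a hypothetical helper for illustration, not the Keras implementation. The example also shows why the original lengths are recorded before padding: they are needed to strip the padding again, and a test review longer than the longest training review permanently loses its tail.

```python
def pad_post(sequences, maxlen=None, value=0):
    """Truncate and pad every sequence at the end to a common length."""
    if maxlen is None:
        maxlen = max(len(sequence) for sequence in sequences)
    padded = []
    for sequence in sequences:
        kept = list(sequence[:maxlen])                        # "post" truncation
        padded.append(kept + [value] * (maxlen - len(kept)))  # "post" padding
    return padded


train = [[1, 5, 9], [1, 7]]
test = [[1, 4, 4, 4, 8], [1]]
test_lengths = [len(sequence) for sequence in test]  # recorded before padding

padded_train = pad_post(train)  # maxlen defaults to 3 here
padded_test = pad_post(test, maxlen=len(padded_train[0]))
restored = [row[:n] for row, n in zip(padded_test, test_lengths)]

print(padded_train)  # [[1, 5, 9], [1, 7, 0]]
print(padded_test)   # [[1, 4, 4], [1, 0, 0]]
print(restored)      # [[1, 4, 4], [1]]
```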


We can then use the fitted CXPlain model to explain the predictions of the explained model on the held-out test samples. The importance scores for each sample are normalised to sum to 1, so each score represents the relative importance of the corresponding input word.


(CXPlain can also provide confidence intervals for its attributions, but we do not request them in this example.)

```python
attributions = explainer.explain(x_test)  # one row of word importances per test review
```
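Because each row of `attributions` holds relative importances, two small checks can help when reading it: whether the row sums to 1, and how each score compares with a uniform 1/n share. The helpers below are hypothetical and the scores are made up for illustration. `renormalise` is included because truncating a padded row back to the review's original length, as the plotting step below does, drops whatever mass the padding positions received, if any; rescaling the kept scores is an assumed post-processing option, not part of CXPlain.

```python
import math


def is_normalised(scores, tol=1e-6):
    """True if the scores sum to 1 within `tol`."""
    return math.isclose(sum(scores), 1.0, abs_tol=tol)


def relative_to_uniform(scores):
    """Scale scores so that 1.0 means exactly an average 1/n share."""
    n = len(scores)
    return [score * n for score in scores]


def renormalise(scores):
    """Rescale a truncated row so that it sums to 1 again."""
    total = sum(scores)
    return [score / total for score in scores] if total > 0 else list(scores)


row = [0.1, 0.4, 0.2, 0.2, 0.05, 0.05]  # hypothetical row; last two are padding
print(is_normalised(row))                               # True
print([round(r, 2) for r in relative_to_uniform(row)])  # [0.6, 2.4, 1.2, 1.2, 0.3, 0.3]
kept = renormalise(row[:4])                             # strip the two padding positions
print(is_normalised(kept))                              # True
```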

We can now visualise the per-word attributions for a single review from the test set using the `Plot` class that ships with CXPlain.

Note that we first have to convert our input data from word indices back to the actual words using `TestUtil.imdb_dictionary_indidces_to_words()`.

```python
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
from cxplain.visualisation.plot import Plot

plt.rcdefaults()

# Pick a random test review (seeded for reproducibility).
np.random.seed(909)
selected_index = np.random.randint(len(x_test))
selected_sample = x_test[selected_index]
importances = attributions[selected_index]
prior_length = prior_test_lengths[selected_index]

# Strip the padding by truncating to the review's original length.
selected_sample = selected_sample[:prior_length]
importances = importances[:prior_length]
words = TestUtil.imdb_dictionary_indidces_to_words(selected_sample)

print(Plot.plot_attribution_nlp(words, importances))
```


```
<START> {0.000869052892085} <UNK> {0.00101536838338} watched {0.00123076280579} 8 {0.00117500138003} <UNK> {0.000994433648884} <UNK> {0.000954008253757} <UNK> {0.00104214868043} <UNK> {0.00127266219351} <UNK> {0.00109629391227} very {0.000929234724026} thought {0.000997570343316} <UNK> {0.00123121822253} and {0.00102737860288} very {0.0010709624039} well {0.000990054919384} done {0.00147655024193} movie {0.000871025433298} on {0.0014516452793} the {0.00144975376315} subject {0.00096174213104} of {0.00105090788566} the {0.00113586091902} death {0.00114432512783} <UNK> {0.0010641978588} <UNK> {0.000870429503266} more {0.00124163366854} <UNK> {0.000883863598574} and {0.00128600851167} <UNK> {0.00138137256727} than {0.000877433281858} it {0.00111598963849} <UNK> {0.00113100279123}
```
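The printed list is long and the scores lie close together, so sorting them can make the most influential words easier to spot. A minimal sketch with a hypothetical `top_k_words` helper, applied to five word/score pairs copied from the output above; in the notebook one would pass the full `words` and `importances` from the previous cell instead.

```python
def top_k_words(words, importances, k=3):
    """Return the k (word, score) pairs with the highest importance, highest first."""
    ranked = sorted(zip(words, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Word/score pairs copied from the attribution output above.
sample_words = ["well", "done", "movie", "on", "the"]
sample_scores = [0.000990054919384, 0.00147655024193, 0.000871025433298,
                 0.0014516452793, 0.00144975376315]

for word, score in top_k_words(sample_words, sample_scores):
    print(word, score)  # prints done, on and the, in that order
```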
