# Explainable Artificial Intelligence Sample Project
####  8th February 2020


### Background

Explainable Artificial Intelligence (XAI) refers to methods that aim to make Artificial Intelligence (AI) systems more transparent, so that humans can understand their decisions. It is the opposite of the "black box" concept, where even the developers of an AI system cannot explain why it arrived at a specific decision. We can think of XAI as an implementation of the social right to explanation.

In this notebook, we are going to use deep learning to classify a chosen text as very negative, negative, neutral, positive or very positive, on the basis of its content. After the classification, we are going to use a method called LRP (Layer-wise Relevance Propagation) to explain why the deep learning system came to its decision, making it an Explainable Artificial Intelligence (XAI) system instead of a black-box system.

The LRP implementation is based on the following papers:
- [https://doi.org/10.1371/journal.pone.0130140](https://doi.org/10.1371/journal.pone.0130140)
- [https://doi.org/10.18653/v1/W17-5221](https://doi.org/10.18653/v1/W17-5221)

# Define the text 
First, we are going to enter the text we want to classify and make it compatible with the Neural Network. We must do a few imports. Make sure the LSTM_bidi.py and heatmap.py files are in the same folder as this notebook.

In [1]:
from LSTM_bidi import *                      # the LSTM_bidi file contains functions for processing a text, performing
                                             # sentiment analysis with deep learning and explaining the result with LRP

from heatmap import html_heatmap             # the heatmap file contains the functions for converting the relevances
                                             # obtained by LRP to readable heatmaps

import codecs                                # codecs is a package used to encode and decode text to and from bytes
import numpy as np                           # NumPy is the fundamental package for array computing with Python
from IPython.display import display, HTML    # IPython.display is a public API for display tools in IPython

In [33]:
text = """ We will consider both types of predictors in a generic sense, trying to avoid whenever possible a priori restrictions to specific algorithms or mappings. The next Section Pixel-wise Decomposition as a General Concept will explain the basic approaches underlying the pixel-wise decomposition of classifiers. In Section Bag of Words models revisited, we will give a short recapitulation about Bag of Words features and kernel-based classifiers and summarize related work. Overview of the decomposition steps will discuss the decomposition of a kernel-based classifier into sums of scores over small regions of the image, and the projection down to single pixels.  """ 

In [34]:
text = text.lower()
text = text.replace(",", " ")
text = text.replace(".", " ")
text = text.split()
print(text)

['we', 'will', 'consider', 'both', 'types', 'of', 'predictors', 'in', 'a', 'generic', 'sense', 'trying', 'to', 'avoid', 'whenever', 'possible', 'a', 'priori', 'restrictions', 'to', 'specific', 'algorithms', 'or', 'mappings', 'the', 'next', 'section', 'pixel-wise', 'decomposition', 'as', 'a', 'general', 'concept', 'will', 'explain', 'the', 'basic', 'approaches', 'underlying', 'the', 'pixel-wise', 'decomposition', 'of', 'classifiers', 'in', 'section', 'bag', 'of', 'words', 'models', 'revisited', 'we', 'will', 'give', 'a', 'short', 'recapitulation', 'about', 'bag', 'of', 'words', 'features', 'and', 'kernel-based', 'classifiers', 'and', 'summarize', 'related', 'work', 'overview', 'of', 'the', 'decomposition', 'steps', 'will', 'discuss', 'the', 'decomposition', 'of', 'a', 'kernel-based', 'classifier', 'into', 'sums', 'of', 'scores', 'over', 'small', 'regions', 'of', 'the', 'image', 'and', 'the', 'projection', 'down', 'to', 'single', 'pixels']
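
The `replace` calls above only strip commas and periods, so other punctuation (parentheses, semicolons, quotes) would stay attached to the words. A minimal sketch of a more general tokenizer, assuming hyphens and apostrophes should be kept inside words. The function name `simple_tokenize` and its `keep` parameter are illustrative and not part of LSTM_bidi.py:

```python
import re

def simple_tokenize(raw_text, keep="-'"):
    """Lowercase raw_text, replace punctuation with spaces, and split into words.

    Characters listed in `keep` (hypothetical parameter) are not treated as
    punctuation, so tokens such as "pixel-wise" stay intact.
    """
    # Match anything that is not a word character, whitespace, or a kept character
    pattern = r"[^\w\s" + re.escape(keep) + "]"
    return re.sub(pattern, " ", raw_text.lower()).split()

tokens = simple_tokenize("We will, of course; test pixel-wise (LRP).")
print(tokens)
```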


As the Neural Network was trained on the Stanford Sentiment Treebank (SST) dataset, it only recognizes words that occur in that dataset's vocabulary. Our text therefore cannot contain words outside this vocabulary. We define a function remove_invalid_words that removes all words from our text that are not found in the Stanford Sentiment Treebank.

In [36]:
def remove_invalid_words(text):
    """Removes all words from text that are not in the Stanford Sentiment Treebank dataset"""
    net  = LSTM_bidi()         # load in the trained neural network
    # keep only the words that are in the Stanford Sentiment Treebank vocabulary
    words = [w for w in text if w in net.voc]
    return words

In [37]:
words = remove_invalid_words(text)

In [38]:
print(words)

['we', 'will', 'consider', 'both', 'types', 'of', 'in', 'a', 'generic', 'sense', 'trying', 'to', 'avoid', 'whenever', 'possible', 'a', 'to', 'specific', 'or', 'the', 'next', 'section', 'decomposition', 'as', 'a', 'general', 'concept', 'will', 'explain', 'the', 'basic', 'approaches', 'underlying', 'the', 'decomposition', 'of', 'in', 'section', 'bag', 'of', 'words', 'models', 'revisited', 'we', 'will', 'give', 'a', 'short', 'recapitulation', 'about', 'bag', 'of', 'words', 'features', 'and', 'and', 'related', 'work', 'overview', 'of', 'the', 'decomposition', 'steps', 'will', 'discuss', 'the', 'decomposition', 'of', 'a', 'into', 'sums', 'of', 'scores', 'over', 'small', 'of', 'the', 'image', 'and', 'the', 'projection', 'down', 'to', 'single']
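
Membership tests on a Python list scan the list from the start, so checking every word against a large vocabulary can be slow. A minimal sketch of the same filtering step with a set lookup that also reports which words were dropped. It assumes the vocabulary is a list of strings; `filter_known_words` and the toy vocabulary are hypothetical and only stand in for `net.voc`:

```python
def filter_known_words(tokens, vocabulary):
    """Split tokens into those found in vocabulary and those that are not."""
    known = set(vocabulary)          # a set gives constant-time membership tests
    kept = [w for w in tokens if w in known]
    dropped = [w for w in tokens if w not in known]
    return kept, dropped

# Toy vocabulary standing in for net.voc (assumption for illustration only)
toy_vocabulary = ["we", "will", "consider", "both", "types", "of"]
kept, dropped = filter_known_words(
    ["we", "will", "consider", "both", "types", "of", "predictors"],
    toy_vocabulary,
)
print(kept)     # words that stay in the text
print(dropped)  # words removed because they are unknown
```

With the trained network loaded as `net`, the call would be `filter_known_words(text, net.voc)`.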


# Perform the text classification with a Neural Network
Now we are going to classify our text with a Neural Network, more specifically a bidirectional Long Short-Term Memory (LSTM) network. An LSTM is a recurrent neural network (RNN) architecture that can process not only single data points but also entire sequences of data (such as text or video). It is therefore often used for Natural Language Processing (NLP).
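
To make the idea of a bidirectional LSTM concrete, here is a minimal NumPy sketch of the forward pass: one LSTM reads the word vectors from left to right, a second one reads them from right to left, and the two final hidden states are combined into class scores. The gate ordering, the parameter names and the random weights are assumptions for illustration; the trained model in LSTM_bidi.py may be organized differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_final_state(inputs, W, U, b):
    """Run an LSTM over inputs of shape (T, e) and return the last hidden state (d,).

    W: (4d, e) input weights, U: (4d, d) recurrent weights, b: (4d,) biases.
    Gate order in the stacked matrices (assumed here): input, forget, candidate, output.
    """
    d = U.shape[1]
    h = np.zeros(d)  # hidden state
    c = np.zeros(d)  # cell state (the "memory")
    for x in inputs:
        z = W @ x + U @ h + b
        i = sigmoid(z[0:d])            # input gate: how much new information to write
        f = sigmoid(z[d:2 * d])        # forget gate: how much old memory to keep
        g = np.tanh(z[2 * d:3 * d])    # candidate values for the memory
        o = sigmoid(z[3 * d:4 * d])    # output gate: how much memory to expose
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def bidirectional_scores(inputs, left_params, right_params, W_out_left, W_out_right):
    """Combine a left-to-right and a right-to-left LSTM into class scores."""
    h_left = lstm_final_state(inputs, *left_params)          # reads word 1 .. T
    h_right = lstm_final_state(inputs[::-1], *right_params)  # reads word T .. 1
    return W_out_left @ h_left + W_out_right @ h_right

def random_lstm_params(rng, e, d):
    """Random stand-ins for trained weights (illustration only)."""
    return rng.normal(size=(4 * d, e)), rng.normal(size=(4 * d, d)), rng.normal(size=4 * d)

# Toy dimensions: 6 words, embedding size 4, hidden size 3, 5 sentiment classes
rng = np.random.default_rng(0)
T, e, d, C = 6, 4, 3, 5
word_vectors = rng.normal(size=(T, e))
left_params = random_lstm_params(rng, e, d)
right_params = random_lstm_params(rng, e, d)
W_out_left = rng.normal(size=(C, d))
W_out_right = rng.normal(size=(C, d))

scores = bidirectional_scores(word_vectors, left_params, right_params, W_out_left, W_out_right)
print(scores.shape)  # one score per sentiment class
```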

First we are going to create the classes into which the text can be classified. The sentiment classes are encoded in the following way:  
**0 = Very negative, 1 = Negative, 2 = Neutral, 3 = Positive, 4 = Very positive**

Create a list called sentiment_coding that consists of 5 elements with the texts "Very negative", "Negative" etc. in the right order from 0 to 4. You can read how to create a list consisting of text elements (called strings) here: https://www.w3schools.com/python/python_lists.asp

In [39]:
sentiment_coding = ["Very negative", "Negative", "Neutral", "Positive", "Very positive"]

We define a function *predict* which uses the LSTM Neural Network by calling LSTM_bidi(), which we imported at the beginning. We call the LSTM Neural Network *net*; it is already trained on the Stanford Sentiment Treebank (SST) dataset.

In [40]:
def predict(words):
    """Returns the classifier's predicted class"""
    net                 = LSTM_bidi()                                   # load trained LSTM model
    w_indices           = [net.voc.index(w) for w in words]             # convert input sentence to word IDs
    net.set_input(w_indices)                                            # set LSTM input sequence
    scores              = net.forward()                                 # classification prediction scores
    return np.argmax(scores)   
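
`net.voc.index(w)` searches the vocabulary list from the start for every word and raises a `ValueError` if the word is missing. A minimal sketch of a dictionary-based lookup that performs the same conversion with a clearer error message. The names `build_word_index` and `words_to_ids` and the toy vocabulary are hypothetical and used only for illustration:

```python
def build_word_index(vocabulary):
    """Map each word to the position of its first occurrence, like list.index does."""
    word_to_id = {}
    for position, word in enumerate(vocabulary):
        word_to_id.setdefault(word, position)  # keep the first position on duplicates
    return word_to_id

def words_to_ids(words, word_to_id):
    """Convert words to IDs, failing with a clear message on unknown words."""
    unknown = [w for w in words if w not in word_to_id]
    if unknown:
        raise ValueError("words not in vocabulary: {}".format(unknown))
    return [word_to_id[w] for w in words]

# Toy vocabulary standing in for net.voc (assumption for illustration only)
toy_vocabulary = ["the", "movie", "was", "great"]
word_to_id = build_word_index(toy_vocabulary)
print(words_to_ids(["great", "the", "movie"], word_to_id))
```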

Predict the sentiment of your text (now called *words*) by calling the function "predict" defined in the previous box, and name the prediction *predicted_class*. 

In [41]:
predicted_class =  predict(words)                                                   # get predicted class
print(predicted_class)

0


Print out the name of the predicted class.

In [42]:
print(sentiment_coding[predicted_class])

Very negative


# Explain the text classification by computing LRP relevances
Now we have used a neural network to get the sentiment classification of our text. Next, we need to find out why the neural network came to this decision. For this we will use a method called Layer-wise Relevance Propagation (LRP). We start by setting its hyperparameters.

**Tuning**: Here we can tune the hyperparameters *eps* and *bias_factor*. *eps* is a small positive stabilizer that is added to the denominator when relevance is redistributed from one layer to the previous one, which keeps the relevances bounded when a neuron's output is close to zero. *bias_factor* controls how much of the relevance belonging to the bias and the stabilizer is redistributed onto the inputs. The recommended value is 0, which ignores this share; a value of 1 is useful for checking that the total amount of relevance is conserved from layer to layer.

In [57]:
# LRP hyperparameters:
eps                 = 0.001                                                  # small positive number
bias_factor         = 0.0                                                    # recommended value
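
To illustrate what these two hyperparameters do, here is a minimal NumPy sketch of an epsilon-stabilized LRP rule for a single linear layer, in the spirit of the papers cited in the Background section. It is not the code from LSTM_bidi.py; the function name `lrp_linear_sketch` and the toy layer are assumptions. In this formulation, `eps` is added to the denominator (with the sign of the output) so that the division stays stable, and `bias_factor` decides whether the share of the bias and stabilizer is spread evenly over the inputs (1.0) or ignored (0.0).

```python
import numpy as np

def lrp_linear_sketch(h_in, w, b, h_out, R_out, eps, bias_factor=0.0):
    """Redistribute the relevance R_out (M,) of a linear layer onto its inputs (D,).

    h_in: (D,) layer input, w: (D, M) weights, b: (M,) biases,
    h_out: (M,) layer output with h_out = h_in @ w + b.
    """
    D = h_in.shape[0]
    sign_out = np.where(h_out >= 0, 1.0, -1.0)            # (M,)
    stabilizer = eps * sign_out                           # pushes the denominator away from zero
    # Contribution of input i to output j, plus an equal share of bias and stabilizer
    numer = w * h_in[:, None] + bias_factor * (b + stabilizer)[None, :] / D  # (D, M)
    denom = (h_out + stabilizer)[None, :]                 # (1, M)
    return (numer / denom * R_out[None, :]).sum(axis=1)   # (D,)

# Toy layer with 4 inputs and 3 outputs (random values for illustration)
rng = np.random.default_rng(0)
h_in = rng.normal(size=4)
w = rng.normal(size=(4, 3))
b = rng.normal(size=3)
h_out = h_in @ w + b
R_out = rng.normal(size=3)

R_conserving = lrp_linear_sketch(h_in, w, b, h_out, R_out, eps=0.001, bias_factor=1.0)
R_recommended = lrp_linear_sketch(h_in, w, b, h_out, R_out, eps=0.001, bias_factor=0.0)
print(R_out.sum(), R_conserving.sum(), R_recommended.sum())
```

With `bias_factor=1.0` the input relevances sum to the output relevance, which is a handy sanity check. With `bias_factor=0.0` the part attributed to the bias and the stabilizer is left out, so the sums generally differ.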

Load the trained Neural Network with LSTM_bidi() and call it *net*.

In [58]:
net  = LSTM_bidi()         

The following code performs LRP on the classification that the neural network made for your text.

In [59]:
w_indices           = [net.voc.index(w) for w in words]                      # convert input sentence to word IDs
Rx, Rx_rev, R_rest  = net.lrp(w_indices, predicted_class, eps, bias_factor)  # perform LRP
R_words             = np.sum(Rx + Rx_rev, axis=1)                            # compute word-level LRP relevances
scores              = net.s.copy()                                           # classification prediction scores
print(scores)

[ 2.76898674  2.33646075  0.36425571 -1.42664486 -3.3979334 ]
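
The printed scores are unbounded, which makes them hard to compare at a glance. Assuming they are unnormalized class scores (logits), a softmax turns them into values between 0 and 1 that sum to 1. A minimal sketch using the scores printed above; whether these values can be read as calibrated probabilities is not something this notebook establishes:

```python
import numpy as np

def softmax(scores):
    """Convert unnormalized scores into values that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the maximum for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

sentiment_coding = ["Very negative", "Negative", "Neutral", "Positive", "Very positive"]
scores = np.array([2.76898674, 2.33646075, 0.36425571, -1.42664486, -3.3979334])

normalized = softmax(scores)
for label, value in zip(sentiment_coding, normalized):
    print("{:<14} {:.3f}".format(label, value))
```

The two highest scores belong to classes 0 and 1 and differ by less than 0.5, so the preference for "Very negative" over "Negative" is not large.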


Print out the prediction scores for the five classes (0-4).

In [60]:
#scores=scores.round()
#scores=abs(scores)

print(scores)

[ 2.76898674  2.33646075  0.36425571 -1.42664486 -3.3979334 ]


The following code displays the relevance that each word in your text has for the classification. Words marked in red contributed to the predicted class, while words marked in blue contributed against it.

In [49]:
print ("\nLRP heatmap:")    
display(HTML(html_heatmap(words, R_words)))                                    # display the heat map of relevances


LRP heatmap:
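
Besides the heatmap, the relevances can be inspected as plain text by sorting the words. A minimal sketch, assuming a list of words and an array with one relevance value per word, like `words` and `R_words` in this notebook. The function `rank_words` and the toy values are hypothetical:

```python
import numpy as np

def rank_words(words, relevances, k=3):
    """Return the k most supporting and the k most opposing (word, relevance) pairs."""
    order = np.argsort(relevances)                        # positions in ascending relevance
    opposing = [(words[i], float(relevances[i])) for i in order[:k]]
    supporting = [(words[i], float(relevances[i])) for i in order[::-1][:k]]
    return supporting, opposing

# Made-up words and relevances for illustration only
toy_words = ["a", "short", "boring", "but", "charming", "film"]
toy_relevances = np.array([0.0, -0.1, 0.9, 0.05, -0.6, 0.1])

supporting, opposing = rank_words(toy_words, toy_relevances, k=2)
print("supports the class:", supporting)
print("speaks against it: ", opposing)
```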


In [50]:
target_class = 1

w_indices           = [net.voc.index(w) for w in words]                      # convert input sentence to word IDs
Rx, Rx_rev, R_rest  = net.lrp(w_indices, target_class, eps, bias_factor)     # perform LRP
R_words             = np.sum(Rx + Rx_rev, axis=1)                            # compute word-level LRP relevances

scores              = net.s.copy()                                           # classification prediction scores
print ("\nLRP heatmap:")    
display(HTML(html_heatmap(words, R_words)))                                  # display the heat map of relevances


LRP heatmap:


**Future work**

**Task 1:** LRP can be used on more AI systems than sentiment analysis. Check out this demo, where you can use the same implementation of LRP on Handwriting Classification, Image Classification and Text Classification: https://lrpserver.hhi.fraunhofer.de/

**Task 2:** Use the LRP to identify important words, and then remove them to see how the prediction changes.

**Task 3:** Try other texts with different sentiment and contents. Try to fool the Neural Network with difficult texts, and then use the LRP to see what made it make a wrong prediction. 
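
As a starting point for Task 2, the following sketch removes the words with the highest relevance so that the shortened text can be classified again. The helper `remove_most_relevant` is hypothetical and the toy values are made up for illustration:

```python
import numpy as np

def remove_most_relevant(words, relevances, k=1):
    """Return a copy of words without the k words that have the highest relevance."""
    top = set(np.argsort(relevances)[::-1][:k].tolist())  # positions of the k largest relevances
    return [w for i, w in enumerate(words) if i not in top]

# Made-up words and relevances for illustration only
toy_words = ["a", "short", "boring", "but", "charming", "film"]
toy_relevances = np.array([0.0, -0.1, 0.9, 0.05, -0.6, 0.1])

shortened = remove_most_relevant(toy_words, toy_relevances, k=2)
print(shortened)
```

In this notebook the call would be `predict(remove_most_relevant(words, R_words, k=5))`; comparing the result with `predicted_class` shows whether the removed words mattered for the decision.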