# LIME: Explaining Predictions

### What is it?

Local Interpretable Model-Agnostic Explanations (LIME) is a technique for explaining the decisions made by a machine learning model. The designers of LIME state the goal of identifying an interpretable model, over an interpretable representation, that is locally faithful to the classifier. They define interpretable representations as human-understandable analogues of the features used in real-world models. With the LIME algorithm the authors hope to “explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model”. In other words, LIME fits a simple, easy-to-understand model that mimics the predictions of a more complex model in a small region around the instance being explained.
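
To make the idea of a locally faithful approximation concrete, here is a minimal sketch in plain Python. It is not the LIME implementation: the black-box function, the Gaussian perturbation, the kernel width and every function name are assumptions chosen for illustration. It samples points around one instance, weights them by proximity, and fits a weighted straight line to the black box's answers.

```python
import math
import random


def black_box(x):
    """Stand-in for a complex model: any function we can only query."""
    return x ** 2


def local_linear_surrogate(predict, x0, num_samples=1000, scale=0.5, kernel_width=0.5, seed=0):
    """Fit a proximity-weighted straight line to `predict` around x0."""
    rng = random.Random(seed)
    # 1. Perturb the instance to sample its neighbourhood.
    xs = [x0 + rng.gauss(0.0, scale) for _ in range(num_samples)]
    # 2. Query the black box on every perturbed sample.
    ys = [predict(x) for x in xs]
    # 3. Weight each sample by its proximity to x0 (exponential kernel).
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    # 4. Weighted least squares for y = slope * x + intercept.
    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, y_mean - slope * x_mean


slope, intercept = local_linear_surrogate(black_box, x0=3.0)
print(f"local model near x=3: y = {slope:.2f} * x + {intercept:.2f}")
```

Near x = 3 the fitted slope comes out close to 6, the local gradient of the black box, even though no single straight line could describe the function globally. That is the sense in which the simple model is faithful only locally.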

## How to use LIME

### Installation

LIME is distributed as a Python package that can be installed with pip or downloaded directly from the project's [repository](https://github.com/marcotcr/lime). Current versions of the library only support Python 3.

~~~ bash
pip install lime 
~~~

### Basic Usage 

LIME provides explainers for many different types of model. In this basic tutorial we will use it to explain the predictions of a random forest classifier. First we import the LIME text module, which contains the explainer used below.

~~~ python
from lime import lime_text
~~~

The first step in using LIME is to initialise an explainer. Here we create a `LimeTextExplainer`, passing the names of the classes being predicted in the `class_names` argument. LIME has other explainer classes for other kinds of data and model.


~~~ python 
lime_explainer = lime_text.LimeTextExplainer(class_names=["positive","negative"])
~~~

Once the explainer is initialised it can be used to explain a prediction made by a model. Some models (including some sklearn models) are trained on data that has already been pre-processed into vectors; in these cases we may need to set up a pipeline so that the explainer can work with the unprocessed text. The maximum number of features to include in the explanation is passed as `num_features`.


~~~ python 
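x = 42  # index of the row of data to be classified and explained
# explain_instance expects a function that maps raw strings to class probabilities,
# for example the predict_proba method of a pipeline that vectorizes and classifies
explaination = lime_explainer.explain_instance(data[x], classifier.predict_proba, num_features=6)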
~~~

The object returned by `explain_instance` contains a linear approximation of the classifier in the neighbourhood of the given data row. It can be used in a number of ways to communicate the explanation to the user.


### Types of Explainer

As mentioned above, there are several LIME explainer classes for different types of data. We have already seen `lime_text`; the package also includes `lime_tabular` and `lime_image` for tabular and image data respectively.


### `lime_image`

We can import the different LIME components separately using the `from` import pattern. Here we import the image component. We will use an example from the LIME documentation.

~~~ python 
from lime import lime_image
~~~

Unlike the text explainer, the image explainer does not need to be initialised with labels.

~~~ python 
explainer = lime_image.LimeImageExplainer()
~~~

Building an explanation for an image instance requires different parameters from the text explainer. This example, taken from the LIME documentation, explains a model that identifies cats and dogs in an image. For this explainer we pass the image and the model's prediction function, together with three further parameters: `top_labels` controls the number of labels the explainer will return; `hide_color` is the colour used to fill a superpixel when it is disabled (this value can also be `None`); and `num_samples` is the size of the neighbourhood used to learn the linear model.

~~~ python 
explanation = explainer.explain_instance(image, predict_fn, top_labels=5, hide_color=0, num_samples=1000)
~~~

Like the result of `explain_instance` for `lime_text`, this explanation contains a linear model of the local region being explained. However, it has different methods for displaying the explanation, including methods for masking the classified image and superimposing colour over it.
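
The following NumPy sketch illustrates two ideas from this section: the shape of a prediction function for images, and what disabling a superpixel with `hide_color` means. It does not call LIME. The toy brightness "model", the hand-written 2x2 superpixels and the helper `disable_superpixels` are all assumptions made for illustration; in practice the segmentation and the perturbation are handled by the library.

```python
import numpy as np


def predict_fn(images):
    """Toy stand-in for a real model: maps a batch of images of shape
    (n, height, width, 3) to an (n, 2) array of class probabilities."""
    brightness = np.asarray(images, dtype=float).mean(axis=(1, 2, 3))
    p_bright = np.clip(brightness, 0.0, 1.0)
    return np.column_stack([1.0 - p_bright, p_bright])


def disable_superpixels(image, segments, active, hide_color=0.0):
    """Return a copy of `image` in which every superpixel whose entry in
    `active` is 0 has been painted with `hide_color`."""
    perturbed = image.copy()
    for segment_id in np.unique(segments):
        if not active[segment_id]:
            perturbed[segments == segment_id] = hide_color
    return perturbed


# A 4x4 white image split into four 2x2 "superpixels" labelled 0..3.
image = np.ones((4, 4, 3))
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])

# Switch off superpixels 1 and 3, i.e. the right half of the image.
perturbed = disable_superpixels(image, segments, active=[1, 0, 1, 0])

# The model is queried on the original and on the perturbed image.
probabilities = predict_fn(np.stack([image, perturbed]))
print(probabilities)
```

An explainer repeats this kind of perturbation `num_samples` times with different superpixels switched off, and learns from how the predicted probabilities change.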


### `lime_tabular`

The `lime_tabular` explainer is used to explain predictions on matrix data. It differs from the `lime_text` and `lime_image` explainers in that it requires a training set in order to instantiate the explainer. We will use an example from the LIME documentation to demonstrate this class. First we import the module in the usual way.

~~~ python 
from lime import lime_tabular
~~~

Then we instantiate the explainer. The first argument is the training data, which is used to calculate the statistics needed when generating perturbed samples. Next we pass the list of feature names corresponding to the columns in the training data, and a list of class names. The final argument, `discretize_continuous`, will discretize all non-categorical features if set to true.

~~~ python
explainer = lime_tabular.LimeTabularExplainer(train, feature_names=feature_names, class_names=target_names, discretize_continuous=True)
~~~


Finally we create the explanation by calling the `explain_instance` method with a row of the test data and the model's prediction function. Here we also specify the number of features and the number of top labels to be used in the explanation.

~~~ python 
explaination = explainer.explain_instance(test[x], model.predict_proba, num_features=2, top_labels=1)
~~~

### Explaining Instances 
As we have seen, LIME has explainers for many different models. The real power of the library comes from the output of the explainers, which can be presented to users to give them a better view of what decisions the model is making and why. All of the explainers that we have seen produce a LIME explanation object, which has a number of methods for returning different types of data to the user.

### `as_list`

This method returns the explanation as a Python list of tuples, each containing a feature and its weight; the actual values depend on the particular explainer. The following example from the LIME documentation shows the output of a text explainer on a random forest model.

~~~ python 

explaination.as_list()
#output 
[(u'Posting', -0.15748303818990594),
 (u'Host', -0.13220892468795911),
 (u'NNTP', -0.097422972255878093),
 (u'edu', -0.051080418945152584),
 (u'have', -0.010616558305370854),
 (u'There', -0.0099743822272458232)]

~~~

We see that the method produces a list of features and weights corresponding to the linear model of the localised approximation. This data can be further processed or simply displayed to the user.
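
As one example of further processing, the sketch below ranks the features by the size of their weight and separates those that push towards the explained class from those that push away from it. The weights are copied from the output above; the helper name `summarize` is an assumption made for illustration.

```python
# Output of as_list() copied from the example above.
weights = [
    ("Posting", -0.15748303818990594),
    ("Host", -0.13220892468795911),
    ("NNTP", -0.097422972255878093),
    ("edu", -0.051080418945152584),
    ("have", -0.010616558305370854),
    ("There", -0.0099743822272458232),
]


def summarize(weights, top=3):
    """Rank features by absolute weight and split them by sign."""
    ranked = sorted(weights, key=lambda pair: abs(pair[1]), reverse=True)
    supporting = [feature for feature, weight in ranked if weight > 0]
    opposing = [feature for feature, weight in ranked if weight < 0]
    return [feature for feature, _ in ranked[:top]], supporting, opposing


top_features, supporting, opposing = summarize(weights)
print("most influential:", top_features)
print("pushing towards the explained class:", supporting)
print("pushing away from the explained class:", opposing)
```

In this particular explanation every weight is negative, so all six words count against the class being explained.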

### `as_map`

This method is similar to `as_list`, but it returns a Python dictionary that maps each label to a list of tuples of feature id and weight.

~~~ python 
explaination.as_map()


~~~
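
Since the tuples hold feature ids instead of names, a small lookup is usually needed before showing the result to a user. The sketch below assumes a dictionary with the structure described above; the ids, weights and feature names are hypothetical values invented for illustration, not real LIME output.

```python
# Hypothetical as_map()-style result: label -> [(feature id, weight), ...].
explanation_map = {1: [(2, 0.31), (3, 0.22), (0, -0.05)]}

# Hypothetical column names, indexed by feature id.
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]


def with_feature_names(explanation_map, feature_names):
    """Replace each feature id with its name, keeping labels and weights."""
    return {
        label: [(feature_names[feature_id], weight) for feature_id, weight in pairs]
        for label, pairs in explanation_map.items()
    }


named = with_feature_names(explanation_map, feature_names)
print(named)
```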

### `as_pyplot_figure`

This method returns a pyplot bar chart figure and requires matplotlib to be installed. Its `label` argument is ignored for regression explanations. The following example comes from the LIME documentation and displays the same data as the `as_list` example above.

~~~ python 
figure = explaination.as_pyplot_figure()
~~~

![images/pyplotLime.png](./images/pyplotLime.png)


We can see that LIME's simple methods make it quick to produce a graphical interpretation of an explanation that can be presented to users or used in interactive development.




### `as_html`, `save_to_file` and `show_in_notebook`

These methods produce the explanation as HTML. `as_html` returns the explanation as an HTML page for easy embedding in web apps. The `save_to_file` method saves the HTML representation to a file for later use. The `show_in_notebook` method displays the HTML in an IPython notebook and produces bar charts for the different explanations. The following examples come from the LIME documentation.

~~~ python 
explaination.show_in_notebook(text=False)
~~~


![images/htmlLime.png](./images/htmllime.png)


The previous image was produced by a text explainer. With text explainers it is also possible to generate a document with the classified text highlighted.

~~~ python
explaination.show_in_notebook(text=True)
~~~

![images/htmltextLime.png](./images/htmltextlime.png)



### `get_image_and_mask`

LIME image explanations include methods that show which areas of an image were used for the classification being explained. These methods are very effective for visually demonstrating how the selected image is being classified, and can be used to quickly identify spurious identifications and other problems. The following examples from the LIME documentation show some of the methods that can be used.


The following example sets `positive_only` to true, which displays only the areas that contribute positively to the classification, and `hide_rest` to true, which applies a grey mask to the rest of the image so that only the interesting area is displayed.

~~~ python 
explanation.get_image_and_mask(240, positive_only=True, num_features=5, hide_rest=True)
~~~

![images/dogmasklime.png](./images/dogmasklime.png)


We can also display the same explanation without masking the rest of the image by setting `hide_rest` to false.

~~~ python 
explanation.get_image_and_mask(240, positive_only=True, num_features=5, hide_rest=False)
~~~

![images/dognomasklime.png](./images/dognomasklime.png)


Finally, we can also generate an image that highlights the pro and con areas (or combinations thereof) in green and red respectively by setting the `positive_only` parameter to false.

~~~ python 

explanation.get_image_and_mask(240, positive_only=False, num_features=10, hide_rest=False)
~~~

![images/dogcatlime.png](./images/dogcatlime.png)
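
To show how `positive_only`, `num_features` and `hide_rest` interact, here is a simplified NumPy sketch of the selection logic. It is an approximation written for this tutorial, not the library's code: the function `image_and_mask`, the hand-written superpixels, the example weights and the sign convention used in the mask are all assumptions, and the colouring of the highlighted areas is left out.

```python
import numpy as np


def image_and_mask(image, segments, weights, positive_only=True,
                   num_features=5, hide_rest=False):
    """Keep the `num_features` most important superpixels.

    `weights` is a list of (superpixel id, weight) pairs sorted by importance.
    Returns (image, mask); mask is 1 on supporting superpixels, -1 on opposing
    ones (only when positive_only is False) and 0 everywhere else.
    """
    # hide_rest starts from a blank image, otherwise from a copy of the original.
    result = np.zeros_like(image) if hide_rest else image.copy()
    mask = np.zeros(segments.shape, dtype=int)
    if positive_only:
        chosen = [(s, w) for s, w in weights if w > 0][:num_features]
    else:
        chosen = weights[:num_features]
    for segment_id, weight in chosen:
        region = segments == segment_id
        result[region] = image[region]
        mask[region] = 1 if weight > 0 else -1
    return result, mask


image = np.full((4, 4, 3), 0.8)
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])
weights = [(0, 0.6), (3, -0.4), (1, 0.1)]  # made-up superpixel weights

shown, mask = image_and_mask(image, segments, weights,
                             positive_only=True, num_features=1, hide_rest=True)
both, signed_mask = image_and_mask(image, segments, weights,
                                   positive_only=False, num_features=2)
print(mask)
print(signed_mask)
```

In the first call only the top-left superpixel survives and the rest of the image is blanked; in the second the image is left intact and the mask marks one supporting and one opposing region.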

### Use Cases 

We have seen the power that the LIME library has to explain how models come to their conclusions and to demonstrate this to users. There are many use cases for LIME. An explanation can be shown alongside a classification so that the end user of a product can have greater confidence in the model's predictions, and developers can use LIME to identify issues with their code or assumptions that are negatively affecting their work. LIME is a useful tool in any data scientist's toolkit.

~~~ python
# load data
import pandas as pd
import numpy as np
import requests
import tarfile
import os
import gensim
import gensim.downloader
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

url = "https://www.cs.jhu.edu/~mdredze/datasets/sentiment/processed_acl.tar.gz"
filename = "processed_acl"
extension = ".tar.gz"

# download the archive
response = requests.get(url, allow_redirects=True)

print("http response:", response.status_code, response.reason)

with open(filename + extension, "wb") as file:
    file.write(response.content)

# unpack the archive into the working directory
with tarfile.open(filename + extension, "r:gz") as tar:
    tar.extractall()

# merge the negative and positive book reviews using shell commands
print(os.popen("cat " + filename + "/books/negative.review > mixed.txt").read())
print(os.popen("cat " + filename + "/books/positive.review >> mixed.txt").read())
# print(os.popen("cat " + filename+"/dvd/all.review >> mixed.txt").read())
# print(os.popen("cat " + filename+"/dvd/positive.review >> mixed.txt").read())

# parse the data into lists of words and labels for word2vec
dataArray = []
labelArray = []

with open("mixed.txt", "r") as datafile:
    for line in datafile:
        lineList = []
        splitLine = line.split()
        # the last token on each line holds the label after a ':'
        label = splitLine.pop(-1)
        for token in splitLine:
            # every other token is "word:count"; keep only the word
            lineList.append(token.split(':')[0])
        # insert(-1, ...) places each item before the current last element;
        # the same call is used for both lists, so data and labels stay aligned
        dataArray.insert(-1, lineList)
        labelArray.insert(-1, label.split(':')[1])

# download the word embedding model
word2vec = gensim.downloader.load('fasttext-wiki-news-subwords-300')

# create the mean embedding of each review
embed = np.array([np.mean([word2vec[w] for w in words if w in word2vec], axis=0) for words in dataArray])

X_train, X_test, y_train, y_test = train_test_split(embed, labelArray, test_size=0.2, random_state=42)

text_classifier = RandomForestClassifier(n_estimators=700, random_state=42)
text_classifier.fit(X_train, y_train)

predictions = text_classifier.predict(X_test)

print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print(accuracy_score(y_test, predictions))
~~~

~~~
http response: 200 OK


[[164  35]
 [ 37 164]]
              precision    recall  f1-score   support

    negative       0.82      0.82      0.82       199
    positive       0.82      0.82      0.82       201

    accuracy                           0.82       400
   macro avg       0.82      0.82      0.82       400
weighted avg       0.82      0.82      0.82       400

0.82
~~~
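
The figures in the report can be checked by hand from the confusion matrix. The sketch below copies the matrix from the output above and recomputes precision, recall and accuracy in plain Python. It relies on the scikit-learn convention that rows are true labels and columns are predicted labels, in sorted label order; the variable names are chosen for illustration.

```python
# Confusion matrix copied from the output above.
# Rows are true labels, columns are predicted labels: (negative, positive).
matrix = [[164, 35],
          [37, 164]]
labels = ["negative", "positive"]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

report = {}
for i, label in enumerate(labels):
    predicted_as_label = sum(row[i] for row in matrix)  # column sum
    support = sum(matrix[i])                            # row sum
    report[label] = {
        "precision": matrix[i][i] / predicted_as_label,
        "recall": matrix[i][i] / support,
        "support": support,
    }
    print(label, {name: round(value, 2) for name, value in report[label].items()})

print("accuracy:", round(accuracy, 2))
```

For the negative class this gives a precision of 164 / 201 and a recall of 164 / 199, both of which round to the 0.82 shown in the report.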


~~~ python
# download the tubspam data set
# load it into a numpy array
# create embeddings on the full text
# use the classifier to classify the data set
~~~

~~~ python
# LIME explainer
~~~
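
One way the explainer step could be approached: the random forest above is trained on mean word embeddings, not on raw text, so its `predict_proba` cannot be handed to a text explainer directly. A wrapper has to turn each perturbed string back into an embedding first. The sketch below shows such a wrapper with NumPy only. The names `mean_embedding` and `make_classifier_fn`, the `ToyModel` class and the two-dimensional toy vectors are assumptions that stand in for the trained classifier and the downloaded embeddings.

```python
import numpy as np


def mean_embedding(words, vectors, dim):
    """Average the vectors of the known words; zeros if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)


def make_classifier_fn(vectors, dim, model):
    """Build a function from a list of raw strings to an
    (n, n_classes) array of probabilities."""
    def classifier_fn(texts):
        features = np.array(
            [mean_embedding(text.split(), vectors, dim) for text in texts]
        )
        return model.predict_proba(features)
    return classifier_fn


class ToyModel:
    """Hypothetical stand-in for the trained random forest."""

    def predict_proba(self, features):
        positive = 1.0 / (1.0 + np.exp(-features[:, 0]))
        return np.column_stack([1.0 - positive, positive])


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_vectors = {"good": np.array([2.0, 0.0]), "bad": np.array([-2.0, 0.0])}

classifier_fn = make_classifier_fn(toy_vectors, dim=2, model=ToyModel())
probabilities = classifier_fn(["good good", "bad", "unknown words only"])
print(probabilities.round(2))
```

A function with this shape could then be passed as the second argument of `explain_instance` on a `LimeTextExplainer`. The zero-vector fallback matters here because the explainer removes words at random, so some perturbed texts may contain no known words at all.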