# Explain Machine Learning Models


### Outline

* [Understanding model predictions with SHAP](#SHAP)
* [Understanding model predictions with LIME](#LIME)
* [Understanding models with TED](#TED)


<img src="http://aix360.mybluemix.net/static/images/methods-choice.gif" alt="Tree" style="width: 1000px;" align="left"/>

## Install and import packages

In [None]:
#!pip install numpy --upgrade
!pip install aif360
!pip install aix360

In [None]:
from __future__ import print_function
import warnings
warnings.filterwarnings('ignore')

#import os
#import requests
#import pandas as pd
import numpy as np
#import matplotlib.pyplot as plt
import sklearn
import sklearn.neighbors  # KNeighborsClassifier is used in the SHAP section
#from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#import sklearn.datasets
#import sklearn.ensemble
from sklearn import svm     
import time
#np.random.seed(1)

from aif360.metrics import BinaryLabelDatasetMetric
from aif360.datasets.lime_encoder import LimeEncoder
from aif360.datasets import GermanDataset
from aif360.algorithms.preprocessing import Reweighing

#from aix360.algorithms.protodash import ProtodashExplainer
from aix360.algorithms.ted.TED_Cartesian import TED_CartesianExplainer
from aix360.algorithms.shap import KernelExplainer
#from aix360.datasets.cdc_dataset import CDCDataset
from aix360.datasets.ted_dataset import TEDDataset

import lime
import lime.lime_tabular

import shap

from IPython.display import Markdown, display
%matplotlib inline

<a class="anchor" id="SHAP"></a>
# Understanding model predictions with SHAP

[SHAP](https://github.com/slundberg/shap)

<a class="anchor" id="explain-single-shap"></a>
## Explain a single prediction

This simple example explains a k-nearest neighbors ([knn](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)) classifier trained on the Iris dataset. It is based on the [original SHAP tutorial](https://slundberg.github.io/shap/notebooks/Iris%20classification%20with%20scikit-learn.html).

Learn more about SHAP in [this chapter](https://christophm.github.io/interpretable-ml-book/shap.html#shap-summary-plot) of *Interpretable Machine Learning* by Christoph Molnar.

The goal of SHAP is to explain the prediction of an instance x by computing the contribution of each feature to the prediction. Features with large absolute Shapley values are important. 
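
To make "the contribution of each feature" concrete, here is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. The toy model, the instance, and the all-zero baseline are assumptions made up for illustration. `KernelExplainer` approximates the same quantity by sampling coalitions and averaging over a background dataset rather than using a single baseline.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by all coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                with_i = value_fn(set(subset) | {i})
                without_i = value_fn(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi

# Hypothetical toy model: one additive term and one interaction term
def model(x):
    return 2 * x[0] + x[1] * x[2]

x = [1.0, 2.0, 3.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # assumed reference values for "absent" features

def value_fn(present):
    # Features outside the coalition are replaced by their baseline value
    z = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(z)

phi = shapley_values(value_fn, len(x))
print("Shapley values:", phi)
print("Sum of values:", sum(phi))
print("f(x) - f(baseline):", model(x) - model(baseline))
```

The Shapley values sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term `x[1] * x[2]` is split evenly between the two features involved.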

In [None]:
shap.datasets.iris()

In [None]:
X_train,X_test,Y_train,Y_test = train_test_split(*shap.datasets.iris(), test_size=0.2, random_state=0)

In [None]:
X_train.head()

In [None]:
Y_train

In [None]:
def print_accuracy(f):
    print("Accuracy = {0}%".format(100*np.sum(f(X_test) == Y_test)/len(Y_test)))
    time.sleep(0.5) # to let the print get out before any progress bars

shap.initjs()

n_neighbors = 5   # default=5
weights='uniform'  # 'uniform' or 'distance'
knn = sklearn.neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
knn.fit(X_train, Y_train)

print_accuracy(knn.predict)

In [None]:
# probability estimates
knn.predict_proba(X_train)

In [None]:
shapexplainer = KernelExplainer(knn.predict_proba, X_train)

In [None]:
# aix360 style for explaining input instances
shap_values = shapexplainer.explain_instance(X_test.iloc[0,:])

In [None]:
shap_values

In [None]:
X_test.iloc[0,:]

### The individual force plot

Red/blue: Features that push the prediction higher (to the right) are shown in red, and those pushing the prediction lower are in blue.

The plot is centered on the x-axis at the explainer's expected value (here `shapexplainer.explainer.expected_value`). All SHAP values are relative to the model's expected value, just as a linear model's effects are relative to the intercept.
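
The linear-model analogy can be worked through by hand. The sketch below assumes a hypothetical linear model and a small made-up background dataset. For such a model (with features treated independently), each feature's contribution is its weight times the distance of the feature from its background mean, and the contributions move the output from the expected value to the actual prediction, which is what a force plot draws.

```python
# Hypothetical linear model: f(x) = intercept + sum(w_i * x_i)
weights = [0.5, -1.0, 2.0]
intercept = 0.25

def predict(x):
    return intercept + sum(w * v for w, v in zip(weights, x))

# Hypothetical background data used to define the expected value
background = [
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
    [2.0, 1.0, 1.0],
]
n = len(background)
feature_means = [sum(row[j] for row in background) / n for j in range(len(weights))]
expected_value = sum(predict(row) for row in background) / n

# Instance to explain
x = [4.0, 0.0, 3.0]
contributions = [w * (v - m) for w, v, m in zip(weights, x, feature_means)]

print("expected value:", expected_value)
for j, c in enumerate(contributions):
    direction = "higher (red)" if c > 0 else "lower (blue)"
    print("feature", j, "pushes the prediction", direction, "by", abs(c))
print("expected value + contributions:", expected_value + sum(contributions))
print("model output:", predict(x))
```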

In [None]:
shapexplainer.explainer.expected_value[0]

In [None]:
shap_values[0]

In [None]:
shap.force_plot(shapexplainer.explainer.expected_value[0], shap_values[0], X_test.iloc[0,:])

In [None]:
X_test.iloc[23,:]

In [None]:
shap_values = shapexplainer.explain_instance(X_test.iloc[23,:])
shap.force_plot(shapexplainer.explainer.expected_value[0], shap_values[0], X_test.iloc[23,:])

<a class="anchor" id="explain-all-shap"></a>
## Explain all predictions

In [None]:
shap_values_all = shapexplainer.explain_instance(X_test)
shap_values_all

In [None]:
shap.summary_plot(shap_values_all, X_test, plot_type="bar")

In [None]:
# reuse the SHAP values computed above for all test instances (class 0)
shap.force_plot(shapexplainer.explainer.expected_value[0], shap_values_all[0], X_test)

<a class="anchor" id="LIME"></a>
# Understanding model predictions with LIME

Local Interpretable Model-Agnostic Explanations

[LIME](https://lime-ml.readthedocs.io/en/latest/)
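
The core idea behind a local surrogate explanation can be sketched in a few lines: perturb the instance, query the black-box model, weight each perturbed sample by its proximity to the instance, and fit a weighted linear model whose coefficients serve as the explanation. The code below is a simplified illustration under stated assumptions (Gaussian perturbations, an exponential kernel, a made-up black-box function, and hypothetical helper names). It is not the implementation in the `lime` package, which also handles categorical features, discretization, and feature selection.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def local_linear_explanation(predict_fn, instance, n_samples=500,
                             scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    rows, targets, sample_weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]   # perturbed sample
        offsets = [a - b for a, b in zip(z, instance)]
        sq_dist = sum(o * o for o in offsets)
        rows.append([1.0] + offsets)                        # intercept + centered features
        targets.append(predict_fn(z))
        sample_weights.append(math.exp(-sq_dist / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = len(instance) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, sample_weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, sample_weights, targets))
            for i in range(k)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]

# Hypothetical black-box model: nonlinear in the first feature
def black_box(z):
    return z[0] ** 2 + 3 * z[1]

instance = [1.0, 2.0]
intercept, coefficients = local_linear_explanation(black_box, instance)
print("local intercept:", intercept)
print("local coefficients:", coefficients)
```

Near the instance `[1.0, 2.0]` the black-box function has slopes of 2 and 3 in its two features, so the fitted coefficients should land close to those values, up to sampling noise.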

In [None]:
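import os
import aif360

# Locate the installed aif360 package and change into its raw German credit data folder.
# (distutils, used previously for this lookup, is no longer available in recent Python versions.)
install_loc = os.path.join(os.path.dirname(aif360.__file__), "data", "raw", "german")
%cd $install_loc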

In [None]:
!wget ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/german/german.data
!wget ftp://ftp.ics.uci.edu/pub/machine-learning-databases/statlog/german/german.doc
%cd -

In [None]:
dataset_german = GermanDataset(protected_attribute_names=['age'],
                    privileged_classes=[lambda x: x >= 25],      
                    features_to_drop=['personal_status', 'sex']) 

dataset_german_train, dataset_german_test = dataset_german.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

In [None]:
# scale data
scale_german = StandardScaler().fit(dataset_german_train.features)

X_train = scale_german.transform(dataset_german_train.features)
y_train = dataset_german_train.labels.ravel()
w_train = dataset_german_train.instance_weights.ravel()

X_test = scale_german.transform(dataset_german_test.features)
y_test = dataset_german_test.labels.ravel()
w_test = dataset_german_test.instance_weights.ravel()

In [None]:
# reweigh the data
RW = Reweighing(unprivileged_groups=unprivileged_groups,
               privileged_groups=privileged_groups)

# compute the weights for reweighing the dataset
RW.fit(dataset_german_train)

# transform the dataset to a new dataset based on the estimated transformation
dataset_transf_train = RW.transform(dataset_german_train)
dataset_transf_test = RW.transform(dataset_german_test)

scale_transf = StandardScaler().fit(dataset_transf_train.features)

X_train_transf = scale_transf.transform(dataset_transf_train.features)
y_train_transf = dataset_transf_train.labels.ravel()
w_train_transf = dataset_transf_train.instance_weights.ravel()

X_test_transf = scale_transf.transform(dataset_transf_test.features)
y_test_transf = dataset_transf_test.labels.ravel()
w_test_transf = dataset_transf_test.instance_weights.ravel()

In [None]:
metric_german_train = BinaryLabelDatasetMetric(dataset_german_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)


display(Markdown("#### BIAS METRICS"))
display(Markdown("#### Original training dataset"))
print("mean_difference = %f" % metric_german_train.mean_difference())
print("disparate_impact = %f" % metric_german_train.disparate_impact())

display(Markdown("#### Reweighted training dataset"))
print("mean_difference = %f" % metric_transf_train.mean_difference())
print("disparate_impact = %f" % metric_transf_train.disparate_impact())
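
The two metrics printed above, and the effect of reweighing on them, can be reproduced by hand on a toy dataset. The sketch below assumes that the mean difference is the weighted favorable-label rate of the unprivileged group minus that of the privileged group, that disparate impact is the ratio of the same two rates, and that reweighing assigns each (group, label) combination the weight P(group) * P(label) / P(group, label). The data and function names are made up for illustration and are not the aif360 implementation.

```python
from collections import Counter

def favorable_rate(groups, labels, weights, group, favorable=1):
    """Weighted share of favorable labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights)
              if g == group and y == favorable)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den

def mean_difference(groups, labels, weights):
    # Unprivileged rate minus privileged rate; 0 means parity
    return (favorable_rate(groups, labels, weights, 0)
            - favorable_rate(groups, labels, weights, 1))

def disparate_impact(groups, labels, weights):
    # Unprivileged rate divided by privileged rate; 1 means parity
    return (favorable_rate(groups, labels, weights, 0)
            / favorable_rate(groups, labels, weights, 1))

def reweighing_weights(groups, labels):
    """Weight per (group, label) pair: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }

# Hypothetical toy data: group 1 = privileged, label 1 = favorable
groups = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

uniform = [1.0] * len(labels)
weights = reweighing_weights(groups, labels)
instance_weights = [weights[(g, y)] for g, y in zip(groups, labels)]

print("original   mean_difference:", mean_difference(groups, labels, uniform))
print("original   disparate_impact:", disparate_impact(groups, labels, uniform))
print("reweighted mean_difference:", mean_difference(groups, labels, instance_weights))
print("reweighted disparate_impact:", disparate_impact(groups, labels, instance_weights))
```

With these weights, group membership and label become statistically independent in the weighted data, which is why the reweighted metrics reach parity in this toy case.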

In [None]:
lmod_transf = LogisticRegression()

# train the model
lmod_transf.fit(X_train_transf, y_train_transf, 
         sample_weight=dataset_transf_train.instance_weights)

# calculate predicted labels
y_train_pred_transf = lmod_transf.predict(X_train_transf)

# assign positive class index
pos_ind_transf = np.where(lmod_transf.classes_ == dataset_transf_train.favorable_label)[0][0]

# add predicted labels to predictions dataset
dataset_transf_train_pred = dataset_transf_train.copy()
dataset_transf_train_pred.labels = y_train_pred_transf

In [None]:
limeData = LimeEncoder().fit(dataset_german_train)
s_train = limeData.transform(dataset_german_train.features)
s_test = limeData.transform(dataset_german_test.features)

scale = scale_transf
model = lmod_transf   

explainer = lime.lime_tabular.LimeTabularExplainer(s_train ,class_names=limeData.s_class_names, 
                                                   feature_names = limeData.s_feature_names,
                                                   categorical_features=limeData.s_categorical_features, 
                                                   categorical_names=limeData.s_categorical_names, 
                                                   kernel_width=3, verbose=False,discretize_continuous=True)

s_predict_fn = lambda x: model.predict_proba(scale.transform(limeData.inverse_transform(x)))

display(Markdown("#### Reweighted training dataset"))

i1 = 0
exp = explainer.explain_instance(s_test[i1], s_predict_fn, num_features=8)
exp.show_in_notebook(show_all=False)
print("        Actual label: " + str(dataset_german_test.labels[i1]))

In [None]:
i1 = 1
exp = explainer.explain_instance(s_test[i1], s_predict_fn, num_features=8)
exp.show_in_notebook(show_all=False)
print("        Actual label: " + str(dataset_german_test.labels[i1]))

In [None]:
i2 = 23
exp = explainer.explain_instance(s_test[i2], s_predict_fn, num_features=8)
exp.show_in_notebook(show_all=False)
print("        Actual label: " + str(dataset_german_test.labels[i2]))

<a class="anchor" id="TED"></a>
# Understanding models with TED

TED is most suited for use cases where matching explanations to the mental model of the explanation consumer is the highest priority; i.e., where the explanations are similar to what a domain expert would produce.

The **TED_CartesianExplainer** implements the algorithm from the AIES'19 paper by [Hind et al.](). It takes the Cartesian product of the label (Y) and the explanation (E) to create a new combined label (YE), then uses that label to train a multiclass classifier.

This approach can use any classifier (passed as a parameter), as long as it complies with the fit/predict paradigm.
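
The Cartesian-product idea can be sketched without any library. The code below is an illustration only, not the aix360 implementation: the class names, the pair-to-code encoding, and the toy data are assumptions. It maps each distinct (Y, E) pair to one class index, trains an ordinary fit/predict estimator on those indices, and decodes each predicted index back into a label and an explanation.

```python
class NearestNeighbor:
    """Tiny stand-in for any estimator that follows the fit/predict paradigm."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        predictions = []
        for row in X:
            nearest = min(range(len(self.X_)), key=lambda i: sq_dist(self.X_[i], row))
            predictions.append(self.y_[nearest])
        return predictions


class CartesianExplainerSketch:
    """Train one multiclass model on combined (label, explanation) classes."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, Y, E):
        # Each distinct (Y, E) pair becomes one class of the combined label YE
        self.pairs_ = sorted(set(zip(Y, E)))
        code_of = {pair: code for code, pair in enumerate(self.pairs_)}
        YE = [code_of[(y, e)] for y, e in zip(Y, E)]
        self.estimator.fit(X, YE)
        return self

    def predict_explain(self, X):
        # Decode each predicted YE class back into its label and explanation
        decoded = [self.pairs_[code] for code in self.estimator.predict(X)]
        return [y for y, _ in decoded], [e for _, e in decoded]


# Hypothetical toy data: two features, labels Y, explanation IDs E
X_toy = [[0, 0], [0, 1], [5, 5], [6, 5]]
Y_toy = [-11, -11, -10, -10]
E_toy = [25, 25, 13, 2]

explainer = CartesianExplainerSketch(NearestNeighbor()).fit(X_toy, Y_toy, E_toy)
Y_pred, E_pred = explainer.predict_explain([[5.1, 5.0], [0.0, 0.2]])
print("Predicted labels:", Y_pred)
print("Predicted explanations:", E_pred)
```

Because the label and the explanation are predicted as one class, every predicted explanation is one that appeared with that label in the training data.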

## Synthetic dataset

This section uses a synthetic [dataset](https://github.com/IBM/AIX360/blob/master/aix360/data/ted_data/Retention.csv), generated with this [code](https://github.com/IBM/AIX360/blob/master/aix360/data/ted_data/GenerateData.py), that ships with aix360.

### Assigning labels

The dataset is built from 25 rules, each describing a reason why a retention action is needed to reduce the chances of an employee choosing to leave a fictitious company. These rules are motivated by common scenarios, such as not getting a promotion in a while, not being paid competitively, receiving a disappointing evaluation, being a new employee in certain organizations with inherently high attrition, not having a salary that is consistent with positive evaluations, mid-career crisis, etc.

Each of these 25 rules results in the label "Yes"; i.e., the employee is at risk of leaving the company. Because the rules capture the reason for the "Yes", we use the rule number as the explanation (E), which is required by the TED framework.

If none of the rules are satisfied, it means the employee is not a candidate for a retention action; i.e., a "No" label is assigned.  

### Dataset characteristics

10,000 fictitious employees (X) are generated, and the 26 rules (25 Yes + 1 No) are applied to produce Yes/No labels (Y), with the matching rule serving as the explanation (E). After applying these rules, the resulting dataset has the following characteristics:
- Yes (33.8%)
- No (66.2%)

In [None]:
# Decompose the dataset into X, Y, E     
X, Y, E = TEDDataset().load_file('Retention.csv')
print("X's shape:", X.shape)
print("Y's shape:", Y.shape)
print("E's shape:", E.shape)
print()

# set up train/test split
X_train, X_test, Y_train, Y_test, E_train, E_test = train_test_split(X, Y, E, test_size=0.20, random_state=0)
print("X_train shape:", X_train.shape, ", X_test shape:", X_test.shape)
print("Y_train shape:", Y_train.shape, ", Y_test shape:", Y_test.shape)
print("E_train shape:", E_train.shape, ", E_test shape:", E_test.shape)

In [None]:
# Create classifier and pass to TED_CartesianExplainer
estimator = svm.SVC(kernel='linear')
# estimator = DecisionTreeClassifier()
# estimator = RandomForestClassifier()
# estimator = AdaBoostClassifier()

ted = TED_CartesianExplainer(estimator)

In [None]:
print("Training the classifier")

ted.fit(X_train, Y_train, E_train)   # train classifier

<a class="anchor" id="ted1"></a>
## Explain a single prediction

The trained TED classifier is now ready for predictions with explanations.   We construct some raw feature vectors, created from the original dataset, and ask for a label (Y) prediction and its explanation (E).

In [None]:
# Create an instance level example 
X1 = [[1, 2, -11, -3, -2, -2,  22, 22]]

# correct answers:  Y:-10; E:13
Y1, E1 = ted.predict_explain(X1)
print("Predicting for feature vector:")
print(" ", X1[0])
print("\t\t      Predicted \tCorrect")
print("Label(Y)\t\t " + np.array2string(Y1[0]) + "\t\t   -10")
print("Explanation (E) \t " + np.array2string(E1[0]) + "\t\t   13")
print()

In [None]:
X2 = [[3, 1, -11, -2, -2, -2, 296, 0]]

## correct answers: Y:-11, E:25
Y2, E2 = ted.predict_explain(X2)
print("Predicting for feature vector:")
print(" ", X2[0])

print("\t\t      Predicted \tCorrect")
print("Label(Y)\t\t " + np.array2string(Y2[0]) + "\t\t   -11")
print("Explanation (E) \t " + np.array2string(E2[0]) + "\t\t   25")

### Create a more relevant human interface

TED_CartesianExplainer can produce the correct explanation for a feature vector, but simply producing "3" as an explanation is not sufficient in most uses. This section shows one way to map the explanation IDs that TED requires to real explanations. This is inspired by the [FICO reason codes](https://www.fico.com/en/latest-thinking/product-sheet/us-fico-score-reason-codes), which are explanations for a FICO credit score.

In this case the explanations are text, but the same idea can be used to map explanation IDs to other formats, such as a file name containing an audio or video explanation.

In [None]:
# Map the numeric label to display text: -10 is "Yes" (a retention risk), anything else is "No"
def labelToString(label):
    if label == -10:
        return "IS"
    else:
        return "IS NOT"

Explanation_Strings = [
    "Seeking Higher Salary in Org 1",
    "Promotion Lag, Org 1, Position 1",
    "Promotion Lag, Org 1, Position 2",
    "Promotion Lag, Org 1, Position 3",
    "Promotion Lag, Org 2, Position 1",
    "Promotion Lag, Org 2, Position 2",
    "Promotion Lag, Org 2, Position 3",
    "Promotion Lag, Org 3, Position 1",
    "Promotion Lag, Org 3, Position 2",
    "Promotion Lag, Org 3, Position 3",
    "New employee, Org 1, Position 1",
    "New employee, Org 1, Position 2",
    "New employee, Org 1, Position 3",
    "New employee, Org 2, Position 1",
    "New employee, Org 2, Position 2",
    "Disappointing evaluation, Org 1",
    "Disappointing evaluation, Org 2",
    "Compensation does not match evaluations, Med rating",
    "Compensation does not match evaluations, High rating",
    "Compensation does not match evaluations, Org 1, Med rating",
    "Compensation does not match evaluations, Org 2, Med rating",
    "Compensation does not match evaluations, Org 1, High rating",
    "Compensation does not match evaluations, Org 2, High rating",
    "Mid-career crisis, Org 1",
    "Mid-career crisis, Org 2",
    "Did not match any retention risk rules"]


print("Employee #1 " + labelToString(Y1[0]) + " a retention risk with explanation: " + Explanation_Strings[E1[0]])
print()
print("Employee #2 " + labelToString(Y2[0]) + " a retention risk with explanation: " + Explanation_Strings[E2[0]])

<a class="anchor" id="ted2"></a>
## Overall model accuracy

How well does TED_Cartesian do in predicting all test labels (Y) and explanations (E)?

The "score" method of TED_Cartesian calculates this. The accuracy of predicting the combined YE labels could be of interest to researchers who want to better understand the inner workings of TED_Cartesian.
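
To show how the three numbers relate to each other, here is a minimal sketch that computes them from lists of true and predicted values. It assumes each accuracy is the fraction of exact matches, and that a combined YE prediction counts as correct only when both the label and the explanation match. The function name and the toy predictions are made up for illustration.

```python
def ted_style_scores(Y_true, E_true, Y_pred, E_pred):
    """Return (YE accuracy, Y accuracy, E accuracy) as fractions of exact matches."""
    n = len(Y_true)
    y_hits = sum(yt == yp for yt, yp in zip(Y_true, Y_pred))
    e_hits = sum(et == ep for et, ep in zip(E_true, E_pred))
    ye_hits = sum(yt == yp and et == ep
                  for yt, yp, et, ep in zip(Y_true, Y_pred, E_true, E_pred))
    return ye_hits / n, y_hits / n, e_hits / n

# Hypothetical true values and predictions for five employees
Y_true = [-10, -10, -11, -11, -10]
E_true = [13, 2, 25, 25, 7]
Y_pred = [-10, -10, -11, -10, -10]
E_pred = [13, 3, 25, 24, 7]

ye_acc, y_acc, e_acc = ted_style_scores(Y_true, E_true, Y_pred, E_pred)
print("Accuracy of predicting Y labels: %.2f%%" % (100 * y_acc))
print("Accuracy of predicting explanations: %.2f%%" % (100 * e_acc))
print("Accuracy of predicting Y + explanations: %.2f%%" % (100 * ye_acc))
```

Under these assumptions the combined accuracy can never exceed either of the other two, since a YE match requires both parts to be right.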


In [None]:
YE_accuracy, Y_accuracy, E_accuracy = ted.score(X_test, Y_test, E_test)    # evaluate the classifier
print("Evaluating accuracy of TED-enhanced classifier on test data")
print(' Accuracy of predicting Y labels: %.2f%%' % (100*Y_accuracy))
print(' Accuracy of predicting explanations: %.2f%%' % (100*E_accuracy))
print(' Accuracy of predicting Y + explanations: %.2f%%' % (100*YE_accuracy))

* It is easy to use the TED_CartesianExplainer if you have a training dataset that contains explanations. The framework is general in that it can use any classification technique that follows the fit/predict paradigm, so that if you already have a favorite algorithm, you can use it with the TED framework.
* The main advantage of this algorithm is that the explanations it produces are of exactly the same quality as those it is trained on. Thus, if you teach (train) the system well with good training data and good explanations, you will get good explanations out, in a language you should understand.
* The downside of this approach is that someone needs to create explanations. This should be straightforward when a domain expert is creating the initial training data: if they decide a loan should be rejected, they should know why, and if they do not, it may not be a good decision.
* However, this may be more of a challenge when a training dataset already exists without explanations and someone now needs to create them. The person who originally labeled the decisions may no longer be available, so the reasons for the decisions may not be known. In this case, we argue, the system is in a dangerous state: training data exists, but no one understands why it is labeled the way it is. Asking the model to explain one of its predictions when no person can explain an instance in the training data does not seem consistent.
* Dealing with this situation is one of the open research problems that come from the TED approach.