# Assignment 3: Explainable AI tools

The goal of this assignment is to learn how to generate explanations for machine learning model predictions. We will work with the German Credit dataset that we used in the previous assignment and consider two popular explanation methods, LIME and SHAP, applied to a couple of machine learning models.

This notebook has four stages in which we will: 
1. Generate explanations for a learned logistic regression classifier using LIME.
2. Generate explanations for the same classifier using SHAP.
3. Compare the explanations generated by LIME and SHAP.
4. Explore LIME and SHAP for another classifier. 

In [None]:
!pip install shap==0.39.0
!pip install lime

In [None]:
%matplotlib inline

import sklearn
import sklearn.model_selection
import sklearn.metrics
import sklearn.datasets
import sklearn.ensemble
import sklearn.preprocessing
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, KFold, cross_val_score

from IPython.display import Markdown, display, Image
import matplotlib.pyplot as plt
import warnings

import lime
import lime.lime_tabular
from lime import submodular_pick
import shap
shap.initjs()

from IPython.core.display import HTML 
import operator
from collections import defaultdict

import json
from collections import OrderedDict

import numpy as np
import pandas as pd

np.random.seed(1)

### Dataset

As in the previous assignments, we will work with the German Credit dataset, which is one of the most popular datasets in the XAI literature. (More information about the dataset can be found here: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data).

The task for this dataset is to predict whether an individual is a good credit risk or a bad credit risk.

In [None]:
cols = ['status', 'duration', 'credit_hist', 'purpose', 'credit_amt', 'savings', 'employment',\
            'install_rate', 'personal_status', 'debtors', 'residence', 'property', 'age', 'install_plans',\
            'housing', 'num_credits', 'job', 'num_liable', 'telephone', 'foreign_worker', 'credit']
data_df = pd.read_table('german.data', names=cols, sep=" ", index_col=False)
data_df['credit'] = data_df['credit'].replace(2, 0)  # Original coding: 1 = Good, 2 = Bad credit risk; recode Bad as 0
y = data_df['credit']

print("Shape: ", data_df.shape)
data_df.head()

In [None]:
# Get a list of feature names (excluding the outcome variable)
feature_names = data_df.columns[:-1]

### Data preprocessing

In this assignment, we will utilize sklearn's built-in encoders for data preprocessing:

In [None]:
# Label encoding
labels = data_df.iloc[:,-1]
le= sklearn.preprocessing.LabelEncoder()
le.fit(labels)
labels = le.transform(labels)
class_names = le.classes_
data = data_df.iloc[:, :-1].copy()  # copy so that encoding the categories below does not modify data_df
le_label_mapping = dict(zip(le.classes_, le.transform(le.classes_)))
print("Class names: ", class_names)
print("Label mapping: ", le_label_mapping)

In [None]:
# Check if there are categorical variables that we need to make dummies for
print(data.dtypes)

# Get a list of which variables are categorical (stored as strings, i.e. dtype 'object')
categorical_features = [i for i, dtype in enumerate(data.dtypes) if dtype == 'object']
print("Indices of categorical features: ", categorical_features)

Our explainers, in particular LIME's tabular explainer, require each categorical feature to be provided as a single column of integer codes, not as dummies, so we cannot simply expand these columns with one-hot encoding as we normally would during preprocessing.

Instead, we will use some sklearn tools to take the following steps:
<ul>
<li>Replace each categorical feature value with an integer code.
<li>Build a dictionary (categorical_names) that maps each integer code back to the original string value.
<li>Fit a one-hot encoder that we can use later to transform the categorical features into dummies.
</ul>

In [None]:
categorical_names = {}
for feature in categorical_features:
    print("Feature: ", feature)
    # Use label encoder to map categories to numbers
    le = sklearn.preprocessing.LabelEncoder()
    le.fit(data.iloc[:, feature])
    # Replace the feature values with corresponding numbers in the original data
    data[data.columns[feature]] = le.transform(data.iloc[:, feature])
    # Store and print the mappings for reference later
    categorical_names[feature] = le.classes_
    print(categorical_names[feature])
    print("==================================================")
    
# This variable stores the original names of each feature value for each feature
categorical_names

In [None]:
# This encoder can turn the categorical columns into dummies when a model needs them,
# but we must not apply it to the data we give LIME, which expects one column per feature
encoder = ColumnTransformer(transformers=[('get_dummies', OneHotEncoder(), categorical_features)], remainder='passthrough')
encoder = encoder.fit(data)

### Split the data into training and test subsets

In [None]:
train, test, labels_train, labels_test = sklearn.model_selection.train_test_split(data, labels, train_size=0.80, random_state=10)
print("Train shape: ", train.shape)
print("Test shape: ", test.shape)

### Fit the model and make predictions

In [None]:
model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)  # higher max_iter helps lbfgs converge on unscaled features
model.fit(train, labels_train)

# model predictions
pred_labels_test = model.predict(test)
print("Test set accuracy: ", accuracy_score(labels_test, pred_labels_test))

The prediction function `predict_fn` below takes a 2D array of instances and returns the class probabilities predicted by the learned model.

In [None]:
predict_fn = lambda x: model.predict_proba(x).astype(float)

# Section 1: LIME explanations
We went over LIME (Local Interpretable Model-agnostic Explanations) in class. In this section, we will utilize the lime package to generate local and global explanations for a learned model.

### Generate the LIME explainer

LIME can be used for structured (tabular), unstructured (text), and image data. In this assignment, we will use LIME's tabular explainer, which requires a training set (the first argument in the code below). LIME uses statistics of the training data, such as feature distributions and category frequencies, to generate perturbed instances around the instance being explained; the learned model is then queried on these perturbed instances.
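To make the idea concrete, here is a simplified, hypothetical sketch of a LIME-style local surrogate for continuous features only. It is not the lime package's implementation: by default lime discretizes continuous features into quartiles, handles categorical features separately, and fits a ridge regression. This sketch instead perturbs the instance with Gaussian noise scaled by each feature's training standard deviation, weights the samples with an exponential proximity kernel (width 0.75 * sqrt(p), which we believe matches lime's default), and fits ordinary weighted least squares. The function name `lime_style_surrogate` and the toy model are illustrative assumptions.

```python
import numpy as np


def lime_style_surrogate(predict_proba, x, X_train, num_samples=2000, seed=0):
    """Simplified LIME-style local explanation for continuous features (illustrative only).

    Returns one weight per feature: the slope of a locally weighted linear model
    fitted to the predicted probability of class 1 around the instance x.
    """
    rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    scale[scale == 0] = 1.0                          # avoid zero-width perturbations
    p = x.size

    # Perturb the instance: Gaussian noise in standardized units, rescaled per feature
    Z = rng.normal(size=(num_samples, p))
    samples = x + Z * scale

    # Proximity kernel: perturbations closer to x get more weight
    kernel_width = 0.75 * np.sqrt(p)
    distances = np.sqrt((Z ** 2).sum(axis=1))
    weights = np.sqrt(np.exp(-(distances ** 2) / kernel_width ** 2))

    # Weighted least squares of P(class 1) on the standardized perturbations (+ intercept)
    target = predict_proba(samples)[:, 1]
    A = np.hstack([np.ones((num_samples, 1)), Z])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], target * sqrt_w, rcond=None)
    return coef[1:]                                  # drop the intercept


# Hypothetical toy model: depends on features 0 and 2, ignores feature 1
def toy_predict_proba(X):
    p1 = 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 2.0 * X[:, 2])))
    return np.column_stack([1.0 - p1, p1])


X_train_toy = np.random.default_rng(1).normal(size=(500, 3))
print(lime_style_surrogate(toy_predict_proba, np.zeros(3), X_train_toy))
```

Feature 0 should receive a positive weight, feature 2 a negative one, and feature 1 a weight near zero, which is the same kind of signed, local contribution that `exp.as_list()` reports below.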

In [None]:
explainer = lime.lime_tabular.LimeTabularExplainer(train.values, 
                                                   feature_names=feature_names,
                                                   class_names=class_names,
                                                   categorical_features=categorical_features, 
                                                   categorical_names=categorical_names)

### LIME's local explanations

We will now explain a single test instance using the LIME explainer created above. Select the test instance to explain by setting `idx_to_explain` below. You can also set the maximum number of features shown in the explanation with the `num_features` parameter.

In [None]:
idx_to_explain = 0  # TODO: replace with the index of the test instance you want to explain
print('Actual class: ', labels_test[idx_to_explain])

# Get explanation
exp = explainer.explain_instance(test.values[idx_to_explain], predict_fn, num_features=5)

### Visualize the explanation 
Now that the explanation is ready, let's visualize it. 

In [None]:
%matplotlib inline
fig = exp.as_pyplot_figure()

In [None]:
exp.show_in_notebook(show_all=True)

#### Q1. Explain the LIME output above. In this example, which features have the biggest impact?

**Your analysis here:**

You can also see the explanations as a list:

In [None]:
exp.as_list()

#### Q2. Select another test instance and explain its LIME prediction. Identify the features that have a positive and those that have a negative contribution toward the outcome.

**Your analysis here:**

### LIME's global explanations

As we just saw, explanations can vary a lot depending on what instance we pick. While this is great for explaining a single prediction, it makes it hard to give someone an intuition for "how the model makes decisions" in general. 

That is where the **submodular pick** algorithm (SP-LIME) comes in. It picks a small set of useful, representative examples that together give a **global explanation** of the model's behavior. (We briefly discussed submodular pick in class.)

Submodular pick builds a global explanation in the following steps:
<ul>
    <li>Compute explanations for each of the examples in the dataset.</li>
    <li>The <i>coverage</i> step that we discussed in class determines which features are important in explaining many of the predictions, i.e., features that seem <i>globally</i> important. The idea is to identify instances whose explanations cover many of these globally important features.</li>
    <li>This is done greedily: at each step, add the example whose local explanation covers the most not-yet-covered global importance. At the first step, this favors examples whose explanations contain the top globally important features.</li>
    <li>Continue selecting examples until the number of explanations the user wants returned (num_exps_desired) is reached, so that together they cover as many of the globally important features as possible.</li>
    </ul>
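The greedy selection described above can be sketched as follows, assuming an explanation matrix `W` where `W[i, j]` is the weight of feature `j` in the local explanation of instance `i` (zero if the feature is absent). Following the LIME paper, the global importance of feature `j` is taken as the square root of the sum of `|W[i, j]|` over instances. This is an illustrative sketch with a hypothetical function name, not the lime package's `SubmodularPick` code.

```python
import numpy as np


def greedy_submodular_pick(W, budget):
    """Greedily choose up to `budget` instances whose explanations maximize coverage.

    W[i, j] is the weight of feature j in the local explanation of instance i.
    """
    W = np.abs(np.asarray(W, dtype=float))
    importance = np.sqrt(W.sum(axis=0))              # global importance of each feature
    covered = np.zeros(W.shape[1], dtype=bool)
    chosen = []
    for _ in range(min(budget, W.shape[0])):
        # Marginal gain = total importance of the features an instance would newly cover
        gains = [-1.0 if i in chosen else importance[(W[i] > 0) & ~covered].sum()
                 for i in range(W.shape[0])]
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= W[best] > 0
    return chosen


W_toy = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [1.0, 0.0, 3.0],
                  [0.0, 0.0, 0.5]])
print(greedy_submodular_pick(W_toy, budget=2))  # [2, 1]
```

Instance 2 is picked first because its explanation covers features 0 and 2, the largest combined importance; instance 1 is picked next because it covers feature 1, the only feature still uncovered.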

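You can read the details of how this is done in the original LIME paper (Ribeiro, Singh and Guestrin, 2016, "Why Should I Trust You?": Explaining the Predictions of Any Classifier). More details on the `SubmodularPick` class can be found here (https://lime-ml.readthedocs.io/en/latest/lime.html#lime-submodular-pick-module).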

In [None]:
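sample_idx = np.random.choice(len(train), size=10, replace=False)  # training rows to explain
# method='full' explains every row passed in, so the indices in sp_obj.V refer to rows of train.values[sample_idx]
sp_obj = submodular_pick.SubmodularPick(explainer, train.values[sample_idx], predict_fn, method='full',
                                        num_features=5, num_exps_desired=5)

`sp_obj.V` holds the indices (into the explained rows) of the `num_exps_desired` instances chosen by the submodular pick algorithm, and `sp_obj.sp_explanations` holds the corresponding explanation objects. Let's take a look at the indices:

In [None]:
sp_obj.V

Now, for each of the chosen instances, let's check whether the model predicted the correct label and look at the explanation generated for it.

In [None]:
for ind, exp in zip(sp_obj.V, sp_obj.sp_explanations):
    row = sample_idx[ind]  # position of the chosen instance in the training set
    print("Actual class: ", labels_train[row])
    print("Predicted class: ", model.predict(train.iloc[[row]])[0])
    exp.show_in_notebook(show_all=False)
    print("==========================")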

#### Q3. Based on these chosen examples, what can you say about how the trained model makes decisions?

**Your analysis here:**

# Section 2: SHAP explanations

For the same learned model, let's explain the model predictions using the SHAP implementation (https://shap.readthedocs.io/en/latest/).

### Initialize the explainer

In [None]:
explainer = shap.LinearExplainer(model, train, feature_perturbation="interventional")
shap_values = explainer.shap_values(test)
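For a linear model like ours, SHAP values have a closed form. Assuming independent features (the "interventional" setting used above), the SHAP value of feature i for an instance x is w_i * (x_i - E[x_i]), where w are the model coefficients and the expectation is taken over the background data; the base value is the model output at the background mean. For logistic regression, these attributions live in log-odds (margin) space, not probability space. To our understanding this is what `shap.LinearExplainer` computes here, but treat the sketch below, with its hypothetical function name and toy weights, as an illustration rather than the library's code.

```python
import numpy as np


def linear_shap_values(w, b, X, X_background):
    """Interventional SHAP values of the linear model f(x) = w.x + b."""
    mean = X_background.mean(axis=0)
    phi = (X - mean) * w          # phi[n, i] = w_i * (x_ni - E[x_i])
    base_value = mean @ w + b     # average model output over the background data
    return phi, base_value


rng = np.random.default_rng(0)
X_bg = rng.normal(size=(200, 4))
X = rng.normal(size=(5, 4))
w = np.array([0.5, -1.0, 2.0, 0.0])
b = 0.3
phi, base = linear_shap_values(w, b, X, X_bg)
# Local accuracy: base value + sum of attributions equals the model output (log-odds)
print(np.allclose(base + phi.sum(axis=1), X @ w + b))  # True
```

In the notebook, you could compare this with the explainer's output using something like `np.allclose(shap_values, (test.values - train.values.mean(axis=0)) * model.coef_[0])`; this is an assumption worth checking rather than a guaranteed identity.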

Next, visualize the SHAP values of all test instances with a summary plot.

In [None]:
X_test_array = test.to_numpy() # We need to provide the data in dense format, not sparse
shap.summary_plot(shap_values, X_test_array, feature_names=feature_names)

### SHAP's local explanations

For the same test instance that you chose in Section 1 above (`idx_to_explain`), generate SHAP's explanations.

In [None]:
shap.initjs()
print("Credit risk: Good" if labels_test[idx_to_explain] else "Credit risk: Bad")
print(test.iloc[idx_to_explain])
shap.force_plot(
    explainer.expected_value, shap_values[idx_to_explain,:], X_test_array[idx_to_explain,:],
    feature_names=feature_names
)

#### Q4. Explain the SHAP output above. Which features have the highest contribution toward the outcome?

**Your analysis here:**

We can also check the features that contribute to the positive and negative classes in the following manner. Check if what you obtain below is consistent with your observations in Q4.

In [None]:
# Checking features that contribute to the positive and negative classification of the instance
vals = shap_values[idx_to_explain,:]
positive_class_weight = defaultdict(float)
negative_class_weight = defaultdict(float)
feats = feature_names

for feat_i, val_i in zip(feats, vals):
    if val_i > 0:
        positive_class_weight[feat_i] += val_i
    elif val_i < 0:
        negative_class_weight[feat_i] += val_i

In [None]:
list(sorted(positive_class_weight.items(), key=operator.itemgetter(1), reverse=True))[:10]

In [None]:
list(sorted(negative_class_weight.items(), key=operator.itemgetter(1)))[:10]

#### Q5. Select another instance and explain its prediction using SHAP. Also, identify the features that have a positive and those that have a negative contribution toward the outcome.

**Your analysis here:**

### SHAP's global explanations

We can use SHAP to identify the features that the model deems important across the entire test set with the `shap.plots.bar()` function, which ranks features by their mean absolute SHAP value.



In [None]:
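shap_explanation = shap.Explanation(values=shap_values, data=X_test_array, feature_names=list(feature_names))
shap.plots.bar(shap_explanation)  # the bar plot expects an Explanation object, not a raw array of SHAP values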
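The global importance shown in the bar plot can be reproduced by hand as the mean absolute SHAP value of each feature across instances. The sketch below uses a hypothetical helper name and toy values; in the notebook you could pass `shap_values` and `list(feature_names)` instead.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value over all instances (largest first)."""
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]


toy_shap = np.array([[0.2, -0.5, 0.0],
                     [-0.4, 0.3, 0.05]])
for name, score in global_importance(toy_shap, ["duration", "credit_amt", "age"]):
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down (like `duration` in the toy data) would otherwise average out to almost nothing.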

#### Q6. Based on the above plot, what can you say about how the trained model makes decisions?

**Your analysis here:**

# Section 3: Comparing SHAP and LIME's explanations
In this section, you will compare the explanations generated by SHAP and LIME and comment on their agreement, ease of understanding and stability over different instances.
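One simple way to make the agreement question concrete is to compare the top-k features of the two explanations for the same instance. The sketch below computes the Jaccard overlap of the k features with the largest absolute attribution; the function name and toy numbers are hypothetical. In the notebook, LIME weights keyed by feature index could be obtained with `dict(lime_exp.as_map()[1])`, where `lime_exp` is a LIME explanation of the instance (the loop in Section 1 reuses the name `exp`, so recompute it), and SHAP weights with `dict(enumerate(shap_values[idx_to_explain]))`. Keep in mind that LIME's weights refer to discretized feature conditions, so the rankings are more comparable than the magnitudes.

```python
def top_k_agreement(weights_a, weights_b, k=5):
    """Jaccard overlap between the k most important features of two explanations.

    Each argument maps feature index -> signed attribution for the same instance.
    Returns 1.0 when both methods pick the same top-k features, 0.0 when disjoint.
    """
    def top_k(weights):
        ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
        return {feature for feature, _ in ranked[:k]}

    a, b = top_k(weights_a), top_k(weights_b)
    return len(a & b) / len(a | b)


lime_toy = {0: 0.30, 1: -0.20, 2: 0.05, 3: 0.00}
shap_toy = {0: 0.50, 1: 0.01, 2: -0.40, 3: 0.10}
print(top_k_agreement(lime_toy, shap_toy, k=2))  # top-2 sets are {0, 1} and {0, 2}
```

Repeating this over many instances gives a rough picture of how often the two methods agree, which is useful for Q8.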

#### Q7. For the same instance (say, `idx_to_explain` above), how do the local explanations generated by SHAP compare to those generated by LIME?

**Your analysis here:**

#### Q8. Select another instance and compare the local explanations of LIME and SHAP. Do the explanations agree? Did you come across examples where they did not agree with each other? In general, what can you say about the stability of the explanations generated by the two methods?

**Your analysis here:**

#### Q9. How do the global explanations of SHAP compare with LIME's global explanations?

**Your analysis here:**

# Section 4: Generating explanations for a non-linear classifier

So far in this assignment, you have worked with logistic regression, which is a linear classifier. In this section, you will explore SHAP and LIME for a non-linear model such as XGBoost, a random forest classifier, or a neural network.
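For a non-linear model there is no closed form like the linear case, so shap provides other explainers, such as `shap.TreeExplainer` for tree ensembles and the model-agnostic `shap.KernelExplainer`. To see what they estimate, the sketch below computes exact interventional Shapley values by brute-force enumeration of feature subsets, which is only feasible for a handful of features. The toy model and function name are hypothetical; for the German Credit data, with 20 features, you would use one of shap's explainers instead. LIME, by contrast, only needs a `predict_proba`-style function, so the Section 1 code should carry over to a new model with few changes.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, background):
    """Exact Shapley values of f at x with an interventional value function.

    v(S) = average of f over background rows, with the features in S set to x's values.
    Enumerates all 2^p subsets, so it is only practical for a few features.
    """
    p = x.size

    def value(subset):
        X = background.copy()
        idx = list(subset)
        if idx:
            X[:, idx] = x[idx]                       # fix the features in S to the instance
        return f(X).mean()

    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi, value(())                            # attributions and base value E[f]


def toy_model(X):
    # Hypothetical non-linear model: an interaction term plus a threshold effect
    return X[:, 0] * X[:, 1] + (X[:, 2] > 0.5).astype(float)


background = np.random.default_rng(0).uniform(size=(50, 3))
x_toy = np.array([0.9, 0.8, 0.7])
phi, base = exact_shapley(toy_model, x_toy, background)
print(phi, base)
# Efficiency: base value + attributions reproduces the model output at x_toy
print(np.isclose(base + phi.sum(), toy_model(x_toy[None, :])[0]))  # True
```

Because the toy model multiplies features 0 and 1, neither feature's effect can be read off a single coefficient; the Shapley weighting splits the interaction between them.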

#### Q10. Select a test instance to explain and compare the local explanations over its model predictions using SHAP and LIME. 

**Your analysis here:**

#### Q11. Compare the global explanations for this model as generated by SHAP and LIME. 

**Your analysis here:**

#### Q12. Provide a discussion of how well the two methods (SHAP and LIME) generalize to different models.

**Your analysis here:**

# Submitting this Assignment Notebook

Once complete, please submit your assignment notebook as an attachment under "Assignments > Assignment 3" on Brightspace. You can download a copy of your notebook using `File > Download .ipynb`. Please ensure you submit the `.ipynb` file (and not a `.py` file).