[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alexwolson/postdocbootcamp2023/blob/main/lab_3_2_interpretability.ipynb)

# UofT DSI-CARTE ML Bootcamp
#### July 30th, 2023
#### Interpretability - Lab 2, Day 3
#### Teaching team: Alex Olson, Nakul Upadhya, Shehnaz Islam
##### Lab author: Nakul Upadhya

Machine learning models have revolutionized numerous fields by delivering remarkable predictive capabilities. However, as these models become increasingly ubiquitous, the need to ensure fairness and interpretability has emerged as a critical concern. In this lab, we will show how you can analyze and interpret your models.


The main packages we will be using in this lab are `shap` and `captum`, along with the other packages we have used previously.



In [None]:
!pip install shap
!pip install captum
!pip install torch torchvision torchaudio
!pip install tqdm

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


## Data
For this analysis, we will be working with the COMPAS dataset, a dataset used by the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system. This is a popular tool used in the United States to assess the risk of recidivism (likelihood of re-offending) for individuals involved in the criminal justice system. ***The system is currently actively employed by judges and parole boards to make decisions about bail, sentencing, and parole.***

The COMPAS system has been criticized for exhibiting racial and gender biases. Many studies have suggested that the tool is more likely to label Black defendants as high-risk and White defendants as low-risk, even when controlling for other factors. This has raised concerns about fairness and potential discrimination in decision-making processes. For more information about COMPAS, I highly encourage reading the [ProPublica article that broke the story](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing).

This highly controversial system and the data it uses demonstrate the need for interpretability in machine learning. Let's start by reading and preprocessing the data. Some of the preprocessing code is borrowed from [this notebook](https://github.com/tsotne95/FairnessCompas/blob/master/Fairness_in_Classification_on_the_COMPAS_dataset.ipynb).

In [None]:
from sklearn.model_selection import train_test_split

df = pd.read_csv("https://raw.githubusercontent.com/tsotne95/FairnessCompas/master/compas-scores-two-years.csv")
print("Original Entries in dataset")
print(df.shape)

df = df.dropna(subset=["days_b_screening_arrest"]) # dropping missing vals
df = df[(df.days_b_screening_arrest <= 30) & (df.days_b_screening_arrest >= -30) & (df.is_recid != -1) & (df.c_charge_degree != 'O') & (df.score_text != 'N/A') ]
df.reset_index(inplace=True, drop=True) # renumber the rows from 0 again

## keeping relevant columns

df = df[['sex','age','race','juv_fel_count','juv_misd_count','juv_other_count',
         'priors_count','two_year_recid', 'c_charge_degree']]
df['sex'] = df['sex'].replace({
    'Male': 1,
    'Female': 0
})
df.rename({'c_charge_degree':'felony'}, axis=1, inplace=True)
df['felony'] = df['felony'].replace({
    "F": 1,
    "M": 0
})
race_df = pd.get_dummies(df[['race']], dtype=int)  # explicit 0/1 ints; newer pandas versions default to bool
race_df.columns = [x[5:] for x in race_df.columns]
no_race_df = df.drop(columns=['race'])
df = no_race_df.join(race_df)
df = df.dropna()
print("Entries in dataset after preprocessing")
print(df.shape)

X = df.drop(columns = ['two_year_recid'])
y = df['two_year_recid']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Print information about the final dataset
print(f"There are {X_train.shape[0]} training data points and {X_test.shape[0]} testing points")
print(f"There are {X_train.shape[1]} features in the dataset")

The data is now preprocessed and split into training and test sets, so we can move on to modeling.

## Inherent Interpretability
When dealing with sensitive information, it is almost always recommended to use *inherently interpretable* (also known as glassbox) models. These are models whose decision-making mechanisms are exposed to the user, so any systemic issues present in the model can be found much more easily. The most common glassbox models are Decision Trees and Linear/Logistic Regression.

With Logistic Regression, the weights assigned to each feature can be easily visualized, allowing users to understand which features the model found important and which are less significant.

Decision trees have a similar benefit: one can visualize the tree and simply read the decision splits in each node, which gives a similar sense of feature importance. One downside of decision trees, however, is that this glassbox nature diminishes quickly as the tree depth increases.

These models offer both local interpretability (it is easy to explain why a single prediction was made) and global interpretability (one can explain patterns in the decision mechanisms across all points).
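To make the local/global distinction concrete, here is a minimal sketch for a logistic-regression-style model. The weights, intercept, and datapoint are made up for illustration (they are not fitted to COMPAS); the point is only that each feature's contribution to the log-odds can be read off directly.

```python
import math

# Hypothetical weights for illustration only; NOT fitted to the COMPAS data.
weights = {"priors_count": 0.8, "age": -0.5, "felony": 0.2}
intercept = -0.1

# One made-up (standardized) datapoint.
x = {"priors_count": 1.5, "age": -1.0, "felony": 1.0}

# Local explanation: each feature's additive contribution to the log-odds.
contributions = {name: weights[name] * x[name] for name in weights}
log_odds = intercept + sum(contributions.values())
probability = 1.0 / (1.0 + math.exp(-log_odds))

# Global explanation: rank features by the magnitude of their weight.
global_ranking = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)

for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>12}: {value:+.2f}")
print(f"log-odds = {log_odds:+.2f}, P(positive class) = {probability:.2f}")
print("Global ranking:", global_ranking)
```

Comparing weights by magnitude is only meaningful when the features are on comparable scales, which is why the sketch assumes standardized inputs.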

**Your Turn**
1. In the cell below, change the max_depth of the tree to a depth you think would be interpretable to a majority of stakeholders. Use the tree visualization code provided to gauge your own ability to interpret the model.
2. Looking at the generated tree, are there any significant issues or problems you notice with the prediction mechanisms?
3. If we were to deploy the trained model below, what potential ethical and legal concerns do you think we need to consider?

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

clf = DecisionTreeClassifier(max_depth = 2) # CHANGE TO YOUR OWN VALUE
clf.fit(X_train, y_train)

# Calculate and print testing accuracy
test_predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, test_predictions)
print(f"Testing Accuracy: {accuracy * 100:.2f}%")

In [None]:
from sklearn.tree import plot_tree
fig, ax = plt.subplots(figsize = (20,10))
plot_tree(clf,
          feature_names = X_test.columns,
          filled = True,
          ax = ax,
          impurity = False,
          class_names = ["False","True"],
          fontsize = 6
          );
## Each node is shaded by its majority class, and the shade deepens as the node
## becomes purer. Ex. A dark blue node means that most of the training datapoints
## that fall into that node were recorded as re-offending within two years
## (class "True"). Dark orange is the opposite.
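Reading a tree amounts to following one root-to-leaf path per datapoint. The sketch below does this for a hypothetical depth-2 tree written as nested dictionaries; the splits are invented for illustration and are not the ones learned by `clf`. It follows scikit-learn's convention that a sample goes left when its value is less than or equal to the threshold.

```python
# A hypothetical depth-2 tree; the splits are made up and are NOT those of `clf`.
tree = {
    "feature": "priors_count", "threshold": 2.5,
    "left": {
        "feature": "age", "threshold": 22.5,
        "left": {"prediction": "True"},
        "right": {"prediction": "False"},
    },
    "right": {
        "feature": "age", "threshold": 36.5,
        "left": {"prediction": "True"},
        "right": {"prediction": "False"},
    },
}

def explain(node, x):
    """Return the split decisions taken for x and the leaf's prediction."""
    path = []
    while "prediction" not in node:
        name, threshold = node["feature"], node["threshold"]
        if x[name] <= threshold:
            path.append(f"{name} = {x[name]} <= {threshold}")
            node = node["left"]
        else:
            path.append(f"{name} = {x[name]} > {threshold}")
            node = node["right"]
    return path, node["prediction"]

def count_leaves(node):
    """Number of distinct rules (root-to-leaf paths) in the tree."""
    if "prediction" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

path, prediction = explain(tree, {"priors_count": 4, "age": 41})
print(" AND ".join(path), "=>", prediction)
print("Rules in this tree:", count_leaves(tree))
```

A full binary tree of depth d can hold up to 2^d such rules, which is one way to see why deeper trees become harder to read.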

## Post-Hoc Interpretability
In an ideal world, all problems could be solved via interpretable models. Unfortunately, these models are often sparse and may lack predictive power. In these cases, we may want to use *black-box* models, whose inner workings are hidden from the user, and instead try to *estimate* which features the model is relying on. These explanations are called *post-hoc* explanations.

Before that, let's look at one more preprocessing step: feature scaling. Many models perform better when their inputs are scaled. The cell below is left commented out because the random forest used next is insensitive to feature scale and the neural network later in the lab normalizes its inputs with a batch-norm layer; uncomment it if you swap in a scale-sensitive model.


In [None]:
# from sklearn.preprocessing import StandardScaler
# scaler = StandardScaler()
# X_train[X_train.columns] = scaler.fit_transform(X_train[X_train.columns])
# X_test[X_test.columns] = scaler.transform(X_test[X_test.columns])
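For reference, here is a minimal sketch of what standardization does, using made-up ages rather than the lab data. The key detail is that the mean and standard deviation are estimated on the training set only and then reused on the test set, mirroring the `fit_transform` / `transform` split in the commented-out cell.

```python
from statistics import mean, pstdev

# Made-up values for illustration; not taken from the COMPAS data.
train_ages = [20.0, 30.0, 40.0, 50.0]
test_ages = [35.0, 60.0]

# Fit: estimate the statistics on the training data only.
mu = mean(train_ages)
sigma = pstdev(train_ages)  # population std, the convention StandardScaler uses

def standardize(values):
    """Apply the training-set mean and std to any set of values."""
    return [(v - mu) / sigma for v in values]

train_scaled = standardize(train_ages)
test_scaled = standardize(test_ages)  # reuses the training statistics

print("train:", [round(v, 2) for v in train_scaled])
print("test: ", [round(v, 2) for v in test_scaled])
```

Computing the statistics on the test set as well would leak information about the test data into preprocessing.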



### SHAP
One post-hoc interpretability tool is SHAP (SHapley Additive exPlanations). SHAP provides feature importance by using methods from game theory to estimate the contribution of each feature towards the final prediction.

SHAP can provide a sense of both local (explaining a single prediction) and global (explaining general prediction trends) interpretability.
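To see what the game-theoretic quantity looks like, the sketch below computes exact Shapley values for a toy three-feature model by enumerating every coalition of features. The model, datapoint, and baseline are hypothetical, and "removing" a feature is approximated here by replacing it with a single baseline value. This brute-force version is only feasible for a handful of features; the `shap` library approximates the same quantity rather than enumerating everything.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    # Hypothetical risk score, for illustration only.
    return 2.0 * x["priors_count"] - 1.0 * x["age"] + 0.5 * x["felony"]

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    names = list(x)
    n = len(names)
    values = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight of a coalition of this size
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                without = {m: (x[m] if m in subset else baseline[m]) for m in names}
                with_feature = dict(without, **{name: x[name]})
                total += weight * (model(with_feature) - model(without))
        values[name] = total
    return values

x = {"priors_count": 3.0, "age": 1.0, "felony": 1.0}
baseline = {"priors_count": 1.0, "age": 2.0, "felony": 0.0}
phi = shapley_values(toy_model, x, baseline)

for name, value in phi.items():
    print(f"{name:>12}: {value:+.2f}")
# The contributions add up to the gap between this prediction and the baseline's.
print(sum(phi.values()), toy_model(x) - toy_model(baseline))
```

The final line illustrates the additivity property that force plots rely on: a base value plus the per-feature contributions recovers the model output for that datapoint.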

To start, we first need to train a model to explain. For this exercise, we will use a random forest.

In [None]:
from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier(max_depth = 2)
rf_clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, rf_clf.predict(X_test)) * 100
print(f"Test Accuracy: {accuracy :.2f}%")

Now to start our explanation process. We start off by first creating a summary of our dataset (this is to make SHAP run faster) and creating our explainer object.

In [None]:
import shap
# rather than use the whole training set to estimate expected values, we summarize with
# a set of weighted kmeans, each weighted by the number of points they represent.
# this helps everything run faster
X_train_summary = shap.kmeans(X_train, 7)

# Create the shap explainer by passing in our model's predict function and
# the summarized training set
ex = shap.KernelExplainer(rf_clf.predict, X_train_summary, feature_names = X_train.columns)

# We are also only going to look at 100 points (to make it easier to visualize)
X_test_subset = X_test.sample(100, random_state = 42).reset_index(drop=True)

#### Local Explainability
Let's first look into the local explainability provided by SHAP by examining what contributes to the predictions of the first datapoint in our testing subset.

In [None]:
shap.initjs()
first_datapoint = X_test_subset.iloc[0]
single_point_shap_values = ex.shap_values(first_datapoint)
shap.force_plot(ex.expected_value, single_point_shap_values, X_test_subset.iloc[0])

In the plot above, feature values that increased the chance of the model predicting a repeat crime are in red and have arrows that point to the right (they provide a positive force) and feature values that detract from the probability of recidivism are in blue and point to the left. The larger the arrow, the larger the contribution.

**Your Turn**

* Which features seem to have the most negative impact on the final prediction for the data point you chose? Which has the most positive impact? *YOUR ANSWER HERE*
* Choose a different data point and check whether you see any similarities in the features used and their impact on the final prediction. *YOUR ANSWER HERE*


#### Global Explainability
We can get a sense of global interpretability from SHAP by examining trends in the SHAP values across the variable values. To do this, we can generate a summary plot of the calculated values. Note: this may take a while.

In [None]:
shap.initjs()
shap_values = ex.shap_values(X_test_subset)
shap.summary_plot(shap_values, X_test_subset)

The color of each point reflects the value of that feature for a given datapoint: red indicates a high value and blue a low value. For example, a red point in `priors_count` means the person has many prior offenses and a blue point means few; for a binary feature such as `felony`, red means 1 (true) and blue means 0 (false). The x-axis represents the SHAP contribution (the estimated impact on the model's prediction). By examining how the feature values are distributed along the x-axis, we can find which features may heavily impact the final prediction.

In [None]:
import plotly.express as px

abs_shap_values = np.abs(shap_values)
shap_sums = np.sum(abs_shap_values, axis=0)
importances = pd.DataFrame()
importances['feature'] = X_train.columns
importances['importance'] = shap_sums
importances.sort_values(by = 'importance', inplace =True, ascending = False)
importances = importances[importances['importance'] > 0]
px.bar(importances, x = 'feature', y = 'importance')

**Your Turn**

* Which features seem important to the random forest model according to SHAP? *YOUR ANSWER HERE*
* Do these SHAP importances align with the decision tree's splits? *YOUR ANSWER HERE*
* Do you see any concerning trends? *YOUR ANSWER HERE*

#### Notes about SHAP
SHAP is an incredibly powerful tool for understanding what your model may be doing; ***however, it is only an estimate***. The SHAP value calculations only probe your model's behavior and do not look at its internals, so these values should not be taken at face value. Additionally, as you may have noticed in the plots above, SHAP values do not separate out interaction effects between features, something that most models do in fact use. The same caveats extend to other post-hoc interpretability methods.
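The point about interactions can be illustrated with a deliberately tiny example. The sketch below uses the closed-form Shapley formula for two features on a hypothetical model whose output is purely an interaction (the product of its inputs), with a single all-zeros baseline standing in for "feature absent".

```python
def interaction_model(x1, x2):
    # Hypothetical model: the output depends only on the two features jointly.
    return x1 * x2

def shapley_two_features(f, x, baseline):
    """Closed-form Shapley values for a two-feature model."""
    (x1, x2), (b1, b2) = x, baseline
    # Average each feature's marginal effect over both orders of arrival.
    phi1 = 0.5 * ((f(x1, b2) - f(b1, b2)) + (f(x1, x2) - f(b1, x2)))
    phi2 = 0.5 * ((f(b1, x2) - f(b1, b2)) + (f(x1, x2) - f(x1, b2)))
    return phi1, phi2

phi1, phi2 = shapley_two_features(interaction_model, (1.0, 1.0), (0.0, 0.0))
print(phi1, phi2)
```

Each feature is credited with half of the output even though neither changes the prediction on its own. The attribution is fair in the game-theoretic sense, but the per-feature numbers give no hint that the effect exists only when both features are present.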

As such, it is highly encouraged to use innately interpretable models whenever possible. For a more rigorous justification, please read Cynthia Rudin's paper on the subject after the lab.

Additionally, quoting the SHAP documentation:


> Predictive machine learning models like XGBoost become even more powerful when paired with interpretability tools like SHAP. These tools identify the most informative relationships between the input features and the predicted outcome, which is useful for explaining what the model is doing, getting stakeholder buy-in, and diagnosing potential problems. It is tempting to take this analysis one step further and assume that interpretation tools can also identify what features decision makers should manipulate if they want to change outcomes in the future. However, in [this article](https://shap.readthedocs.io/en/latest/example_notebooks/overviews/Be%20careful%20when%20interpreting%20predictive%20models%20in%20search%20of%20causal%C2%A0insights.html), we discuss how using predictive models to guide this kind of policy choice can often be misleading.

> *Eleanor Dillon, Jacob LaRiviere, Scott Lundberg, Jonathan Roth, and Vasilis Syrgkanis from Microsoft.*

### Captum for Neural Network Interpretability

Post-hoc explanations can also be generated for neural networks. One common method leverages the gradients of the model and is aptly named Input x Gradient. By multiplying each input feature by the gradient of the network's output with respect to that feature, we can get a sense of the importance the network attributes to the input features. The `captum` package provides us with this functionality.
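As a rough sketch of the idea, the block below applies Input x Gradient to a tiny hand-written network with fixed, made-up weights. It estimates the gradient with central finite differences purely to stay dependency-free; `captum` obtains the gradient through PyTorch's automatic differentiation instead.

```python
import math

def tiny_network(x):
    # Hypothetical fixed weights: one tanh hidden unit plus a direct linear term.
    hidden = math.tanh(0.5 * x[0] - 1.0 * x[1])
    return 2.0 * hidden + 0.3 * x[2]

def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx_i for every input."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def input_x_gradient(f, x):
    """Attribution for each feature: its value times the output's gradient."""
    return [xi * gi for xi, gi in zip(x, numerical_gradient(f, x))]

x = [1.0, 0.0, 2.0]
attributions = input_x_gradient(tiny_network, x)
for i, value in enumerate(attributions):
    print(f"x[{i}] = {x[i]} ===> {value:+.3f}")
```

Note that the second feature receives zero attribution simply because its value is zero, even though the network is sensitive to it. This is worth keeping in mind when reading attributions for 0/1 features such as the one-hot race columns.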


#### MLP Network
First, let's create and train a basic classifier.

In [None]:
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader
from tqdm.notebook import trange
class CompasMLP(nn.Module):
    """
    three-layer network for the COMPAS dataset.
    """
    def __init__(self, input_dim, hidden_dim, batch_size, learning_rate, epochs):
        super(CompasMLP, self).__init__()

        # Parameters
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.epochs = epochs

        self.bn = nn.BatchNorm1d(num_features = input_dim) # scales input data

        # Define the forward pass layers
        self.forward_pass = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1)
        )

    def forward(self, x):
        """
        Perform the forward pass.
        """
        return self.forward_pass(self.bn(x))

    def fit(self, X, y):
        """
        Train the model.
        """
        self.train()

        # Create tensors
        X_tensor = torch.tensor(X.values.astype(np.float32))  # explicit cast handles mixed int/bool columns
        Y_tensor = torch.tensor(y.values.astype(np.float32))

        # Create DataLoader
        train_dataset = TensorDataset(X_tensor, Y_tensor)
        train_loader = DataLoader(dataset=train_dataset, batch_size=self.batch_size, shuffle=True)

        # Define loss function and optimizer
        criterion = nn.BCEWithLogitsLoss()
        optimizer = torch.optim.SGD(self.parameters(), lr=self.learning_rate)
        pbar = trange(self.epochs)
        # Training loop
        for epoch in pbar:
            losses = []
            for features, target in train_loader:
                optimizer.zero_grad()  # reset gradients
                outputs = self.forward(features)  # forward pass
                loss = criterion(torch.squeeze(outputs), torch.squeeze(target))  # calculate loss
                loss.backward()  # backpropagation
                optimizer.step()  # update weights
                losses.append(loss.item())
            # Show the mean training loss of this epoch on the progress bar
            pbar.set_description(f"loss: {np.mean(losses):.4f}")

    def predict(self, X):
        """
        Predict the class of the input data X.
        """
        self.eval()  # switch to evaluation mode
        with torch.no_grad():
            X_tensor = torch.tensor(X.values.astype(np.float32))
            y_pred = torch.sigmoid(self.forward(X_tensor))  # apply sigmoid for binary output
            y_pred = torch.round(y_pred).squeeze().numpy()  # round to nearest integer (0 or 1) and convert to numpy array
        return y_pred

In [None]:
mlp_clf = CompasMLP(X_train.shape[1],
                    64,
                    512,
                    1e-3,
                    1000)
mlp_clf.fit(X_train, y_train)

y_pred = mlp_clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"Testing Accuracy: {test_accuracy * 100:.2f}%")

#### Using Captum
Now let's calculate the model explanation for the first datapoint in the set.
We can simply call the InputXGradient explainer on any datapoint.

In [None]:
from captum.attr import InputXGradient

mlp_clf.eval()

# Pass our model into the explainer
input_x_grad = InputXGradient(mlp_clf)

# Get the first datapoint
datapoint = X_test_subset.iloc[0]
# Cast explicitly to float32 and add a batch dimension
input_tensor = torch.tensor(datapoint.values.astype(np.float32)).unsqueeze(0)
input_tensor.requires_grad = True

# Get our prediction
prediction = mlp_clf(input_tensor)

# Get the attribution
attribution = input_x_grad.attribute(input_tensor).detach().numpy()[0]
print(f"Prediction: {np.round(torch.sigmoid(prediction).item())}")
for i in range(len(datapoint)):
    print(f"{datapoint.index[i]} = {datapoint.iloc[i]} ===> {attribution[i]:.2f}")

In this explanation, a negative value means that the feature's value decreased the model's prediction and made it less likely to predict recidivism. A positive value means the opposite.

**Your Turn**
1. For this datapoint, which features did the model rely on, and do the attributions align with the SHAP values for the same point above?


Input x Gradient provides local interpretability. If we want a sense of global interpretability and global feature importance, we can pass in multiple datapoints at once and sum the absolute attributions in each column.




In [None]:
datapoints = X_test_subset
input_tensor = torch.tensor(datapoints.values.astype(np.float32))
input_tensor.requires_grad = True

attribution = input_x_grad.attribute(input_tensor).detach().numpy()
# Sum the absolute attributions of each feature across all datapoints
total_attr = np.sum(np.abs(attribution), axis=0)

importances = pd.DataFrame()
importances['feature'] = X_test_subset.columns
importances['importance'] = total_attr
importances.sort_values(by = 'importance', inplace =True, ascending = False)
importances = importances[importances['importance'] > 0]
px.bar(importances, x = 'feature', y = 'importance')

**Your Turn**
1. Now that we have explained all the models, which model and explanation do you trust the most? How did you decide this?