# Deep Learning
## Exercise 12 - Interpretability

In this notebook we will look at different interpretability techniques and principles. The code examples are based on Captum tutorials you can find at https://captum.ai/tutorials/.

### 1. Tabular Data

The first task uses feature vectors. We use data about the survivors and victims of the sinking of the Titanic to find out which properties predict a high likelihood of survival.

First we prepare the data set:

In [None]:
import pandas as pd
import torch
import numpy as np
from sklearn.model_selection import train_test_split

# For reproducibility
np.random.seed(12345)
torch.manual_seed(1)
df_train = pd.read_csv("https://huggingface.co/datasets/Javitron4257/Titanic-Dataset/resolve/main/titanic_train.csv")


# Convert categorical data to one-hot encoding
df_train = pd.concat([df_train,
                              pd.get_dummies(df_train['Sex']),
                              pd.get_dummies(df_train['Embarked'],prefix="Embark"),
                              pd.get_dummies(df_train['Pclass'],prefix="Class")], axis=1)



# Replace unknown age values with mean of the dataset
df_train["Age"] = df_train["Age"].fillna(df_train["Age"].mean())

rel_columns = ['Age', 'SibSp', 'Parch',
               'male', 'female',
               'Embark_S', 'Embark_Q', 'Embark_C',
               'Class_1', 'Class_2', 'Class_3']
print(df_train[rel_columns])

train_labels = df_train['Survived'].to_numpy()
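# Cast explicitly: recent pandas versions return boolean dummy columns,
# which would otherwise turn the mixed array into dtype=object
train_features = df_train[rel_columns].to_numpy(dtype=np.float64)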

train_features, test_features, train_labels, test_labels = train_test_split(train_features, train_labels,
                                                                            train_size=0.7, stratify=train_labels)


After this, each vector in the dataset contains:

* age of the passenger
* number of siblings and spouses on board
* number of parents and children on board
* one-hot-encoded sex of the passenger (male or female)
* one-hot-encoded port of embarkation (Cherbourg, Queenstown or Southampton)
* one-hot-encoded class in which the passenger traveled (first to third class)
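
To make the one-hot layout concrete, here is a minimal sketch on a made-up three-passenger table (the `toy` frame and its values are illustrative assumptions, not rows from the real dataset). It uses the same `pd.get_dummies` call as above, with `dtype=float` added so the result is numeric regardless of the pandas version.

```python
import pandas as pd

# Hypothetical mini-table, only for illustration
toy = pd.DataFrame({"Sex": ["male", "female", "female"],
                    "Pclass": [3, 1, 2]})

# One column per category; exactly one entry per group is 1.0 in each row
encoded = pd.concat([pd.get_dummies(toy["Sex"], dtype=float),
                     pd.get_dummies(toy["Pclass"], prefix="Class", dtype=float)],
                    axis=1)
print(encoded)
```

Each row contains exactly one active entry per encoded attribute, so the two groups together always sum to 2 here.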

#### 1. Build and Train a NN
Build a simple neural network consisting of three fully connected layers that map from 11 to 16, 16 to 16, and 16 to 1 dimensions. Use a sigmoid activation after each fully connected layer. Train the model for 200 epochs using the binary cross-entropy loss and the Adam optimizer with a learning rate of 0.1.
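
As a reminder of what the loss measures, the following NumPy sketch computes the binary cross-entropy by hand. The function name, the example probabilities and the `eps` clipping are assumptions made for illustration; `nn.BCELoss` handles numerical stability in its own way.

```python
import numpy as np

def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross-entropy; eps clipping (an assumption) avoids log(0)."""
    p = np.clip(probabilities, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

labels = np.array([1.0, 0.0, 1.0])
confident = np.array([0.9, 0.1, 0.8])   # mostly correct, high confidence
unsure = np.array([0.5, 0.5, 0.5])      # always predicts 50 %

print("confident:", binary_cross_entropy(confident, labels))
print("unsure:   ", binary_cross_entropy(unsure, labels))
```

A model that always outputs 0.5 has a loss of ln 2, which is a useful reference point when reading the training log.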

In [None]:
import torch.nn as nn

#ToDo: fill the __init__ and forward functions. Add arguments if needed.
class TitanicModel(nn.Module):
    def __init__(self):
        super().__init__()
        
    def forward(self, x):

        return x
    

In [None]:
import torch.nn as nn

class TitanicModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(11, 16)
        self.fc2 = nn.Linear(16, 16)
        self.fc3 = nn.Linear(16, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.sigmoid(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        x = self.fc3(x)
        x = self.sigmoid(x)
        
        return x.flatten()     

In [None]:
#ToDo: Train the model and test it on the test set

In [None]:
tm = TitanicModel()
loss_function = nn.BCELoss()
optimizer = torch.optim.Adam(tm.parameters(), lr=0.1)

train_f = torch.tensor(train_features, dtype=torch.float)
train_l = torch.tensor(train_labels, dtype=torch.float)

test_f = torch.tensor(test_features, dtype=torch.float)
test_l = torch.tensor(test_labels, dtype=torch.float)

def train(num_epochs, model, train_features, train_labels, optimizer, loss_function):
    for epoch in range(num_epochs):
        model.train()
        output = model(train_features)
        loss = loss_function(output, train_labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        if epoch % 10 == 0:
            print(f"Epoch {epoch} \t -- \t Loss {loss:.4f}")

def evaluate(model, features, labels, loss_function):
    model.eval()
    with torch.no_grad():
        output = model(features)
        loss = loss_function(output, labels)
        
        predictions = (output > 0.5)
        accuracy = (predictions==labels).sum()/labels.shape[0]
    return loss.item(), accuracy

train(200, tm, train_f, train_l, optimizer, loss_function)
test_loss, test_accuracy = evaluate(tm, test_f, test_l, loss_function)
print(f'Test Accuracy: {test_accuracy:.4f}')
        

#### 2. Integrated Gradients

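Integrated Gradients is a feature attribution method based on the gradients of the model output with respect to the input. Instead of using only the gradient at the input itself, it integrates the gradients along a straight path from a baseline (reference) input to the actual input, which makes the attributions less sensitive to local fluctuations of the gradient.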
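
The idea can be sketched without Captum on a toy logistic model. The weights, the input and the midpoint Riemann sum with 200 steps are assumptions chosen for illustration; Captum's implementation uses its own approximation scheme.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy model: F(x) = sigmoid(w . x + b)
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def model(x):
    return sigmoid(x @ w + b)

def model_grad(x):
    # Analytic gradient of the logistic model with respect to the input
    p = model(x)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, steps=200):
    # Midpoint Riemann sum over the straight path baseline -> x
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([2.0, 1.0, 0.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline)

print("attributions:       ", attributions)
print("sum of attributions:", attributions.sum())
print("F(x) - F(baseline): ", model(x) - model(baseline))
```

The attributions add up (approximately) to the difference between the model output at the input and at the baseline, and the third feature, which equals its baseline value, receives exactly zero.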

Use `captum.attr.IntegratedGradients` to extract feature attributions on the test set. Visualize the average attributions and the attributions for the first instance in the test set.

*Hint*: As the method is based on gradients with respect to the input, your input now requires gradients.

In [None]:
from captum.attr import IntegratedGradients
import matplotlib.pyplot as plt

#ToDo: Apply IntegratedGradients to the model and visualize the results.


In [None]:
from captum.attr import IntegratedGradients
import matplotlib.pyplot as plt

ig = IntegratedGradients(tm)

# integrated gradients requires the input gradient, so enable that in the input tensor
test_f.requires_grad_()

# retrieve attributions
attr = ig.attribute(test_f).detach().numpy()

# Helper method to print importances and visualize distribution
def visualize_importances(net, feature_names, importances, title="Average Feature Importances", axis_title="Features"):
    x_pos = (np.arange(len(feature_names)))
    plt.figure(figsize=(12,6))
    plt.bar(x_pos, importances, align='center')
    plt.xticks(x_pos, feature_names, wrap=True)
    plt.xlabel(axis_title)
    plt.title(title)

visualize_importances(tm, rel_columns, np.mean(attr, axis=0))
passnum = 0
for name, value in zip(rel_columns, test_f[passnum]):
    # .item() prints the plain number instead of a tensor with grad_fn
    print(f"Attribute: {name} - Value: {value.item()}")
print(f'Predicted survival probability: {tm(test_f[passnum].unsqueeze(dim=0)).item():.4f} - Actual Survival: {test_labels[passnum] == 1}')
visualize_importances(tm, rel_columns, attr[passnum], title='Passenger {}'.format(passnum))

How do the individual features impact the model on average? Which are the most relevant ones? What do you get from one specific passenger?

You will note that many features have an attribution score of zero. Integrated Gradients attributes the change in model output between the baseline (in this case the zero vector) and the data point to the individual features. If a feature value is identical to its baseline value, it receives an attribution of zero.
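
The standard definition makes this explicit. With input $x$, baseline $x'$ and model $F$ (notation introduced here, not used elsewhere in the notebook), the attribution of feature $i$ is

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha$$

The leading factor $(x_i - x'_i)$ vanishes whenever a feature equals its baseline value. With a zero baseline, every one-hot entry that is 0 therefore gets no attribution, regardless of how strongly the model depends on that feature.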


#### 3. KernelSHAP

KernelSHAP is a different feature attribution technique. It is based on [Shapley values](https://en.wikipedia.org/wiki/Shapley_value), which come from cooperative game theory. Shapley values determine how much each player in a cooperative game contributed to the outcome (e.g. player 1 may have been very helpful, while it would have made no difference if player 2 had not participated). In the explainability setting, the features take the role of the players and the model output is the outcome. More on this adaptation can be found in [this paper](https://papers.nips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf).
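
The sketch below illustrates both ideas on a made-up three-player game: exact Shapley values as the average marginal contribution over all join orders, and the weighted linear regression that KernelSHAP is built on. The payoff function, the helper names and the large weight used to pin the empty and full coalitions are assumptions for illustration. Captum samples coalitions instead of enumerating them, so its results are approximate.

```python
from itertools import combinations, permutations
from math import comb

import numpy as np

PLAYERS = (0, 1, 2)

def value(coalition):
    """Hypothetical payoff: player 0 earns 10, player 1 adds 4 only when
    teamed with player 0, and player 2 never changes the payoff."""
    members = set(coalition)
    payoff = 10.0 if 0 in members else 0.0
    if {0, 1} <= members:
        payoff += 4.0
    return payoff

def exact_shapley(players, value_fn):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        joined = []
        for p in order:
            before = value_fn(joined)
            joined.append(p)
            totals[p] += value_fn(joined) - before
    return np.array([totals[p] / len(orders) for p in players])

def kernel_shap_full(players, value_fn, big_weight=1e6):
    """Weighted least squares over ALL coalitions with the Shapley kernel."""
    m = len(players)
    rows, targets, weights = [], [], []
    for size in range(m + 1):
        for coalition in combinations(players, size):
            mask = [1.0 if p in coalition else 0.0 for p in players]
            rows.append([1.0] + mask)            # intercept + membership mask
            targets.append(value_fn(coalition))
            if size in (0, m):
                weights.append(big_weight)       # stands in for infinite weight
            else:
                weights.append((m - 1) / (comb(m, size) * size * (m - size)))
    sqrt_w = np.sqrt(np.array(weights))[:, None]
    design = np.array(rows) * sqrt_w
    target = np.array(targets)[:, None] * sqrt_w
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution[1:, 0]                       # drop the intercept

shapley = exact_shapley(PLAYERS, value)
kernel = kernel_shap_full(PLAYERS, value)
print("exact Shapley values:", shapley)
print("kernel regression:   ", kernel)
```

Player 2 never changes the payoff and receives a value of zero, while the bonus of 4 that only arises from the cooperation of players 0 and 1 is split evenly between them.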

Use `captum.attr.KernelShap` to extract feature attributions on the test set. Visualize the average attributions and the attributions for the first instance in the test set.

In [None]:
from captum.attr import KernelShap

In [None]:
#ToDo: Apply KernelSHAP to the model and visualize the results.

In [None]:
from captum.attr import KernelShap

shap = KernelShap(tm)

shapattr = shap.attribute(test_f)
shapattr = shapattr.detach().numpy()

visualize_importances(tm, rel_columns, np.mean(shapattr, axis=0))
visualize_importances(tm, rel_columns, shapattr[passnum], title='Passenger {}'.format(passnum))


How do the attribution scores compare to the ones derived by Integrated Gradients? Are they similar or different? Can you think of reasons why?

### 2. Image Data

Next we look at image data. We want to explain a large, pre-trained model: a ResNet that was trained on the ImageNet dataset.

This part of the exercise is based on a tutorial from [Captum](https://captum.ai/tutorials/TorchVision_Interpret).

First we will load the pre-trained model and the descriptions of the labels contained in ImageNet.

In [None]:
import torchvision
from torchvision import models
import json
import requests

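# `pretrained=True` is deprecated in recent torchvision releases;
# the weights enum below selects the same ImageNet weights (torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)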
model = model.eval()

imagenet_labels_raw = requests.get('https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json').text
imagenet_labels = json.loads(imagenet_labels_raw)

Then download an image of a Ford Model T from Wikipedia and transform it into a tensor that the ResNet can handle:

In [None]:
from torchvision import transforms
from PIL import Image

raw_image = Image.open(requests.get('https://upload.wikimedia.org/wikipedia/commons/1/15/Late_model_Ford_Model_T.jpg', stream=True).raw)

display(raw_image)

# Resize image to fit the resnet's expected input resolution
# Transform in two steps because we need the unnormalized image for visualization below
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor()
])

# Normalize the color channels to fit the color channel distributions of the imagenet training set.
transform_normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

transformed_img = transform(raw_image)
input_img = transform_normalize(transformed_img)
input_batch = input_img.unsqueeze(0)

print(input_batch)

Next we'll extract the model prediction. The model predicts that the image shows a Model T with 97.8% confidence.
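
The confidence value is simply the largest softmax probability. A minimal NumPy sketch with made-up logits for three classes (the numbers are assumptions, not ResNet outputs) shows the computation:

```python
import numpy as np

def softmax(logits):
    # Subtracting the maximum does not change the result but avoids overflow
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 0.5, -1.0])   # hypothetical raw model outputs
probs = softmax(logits)
top = int(np.argmax(probs))

print("probabilities:", probs)
print("predicted class:", top, "confidence:", probs[top])
```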

In [None]:
import torch.nn.functional as F
import torch

model_out = model(input_batch)
output = F.softmax(model_out, dim = 1)
prediction_score, pred_label_idx = torch.topk(output, 1)

pred_label_idx.squeeze_()
prediction_score.squeeze_()

predicted_label = imagenet_labels[str(pred_label_idx.item())][1]

print("Prediction:", predicted_label)
print("Confidence: ", prediction_score.item())

#### 1. Integrated Gradients

Compute the feature attributions using Integrated Gradients. Visualize the output using `captum.attr.visualization.visualize_image_attr()`.

In [None]:
from captum.attr import IntegratedGradients
from captum.attr import visualization as viz
import numpy as np

#ToDo: calculate attributions and visualize them

In [None]:
from captum.attr import IntegratedGradients
from captum.attr import visualization as viz
import numpy as np

ig = IntegratedGradients(model)

# integrated gradients requires the input gradient, so enable that in the input tensor
input_batch.requires_grad_()

# retrieve attributions
attr = ig.attribute(input_batch, target=pred_label_idx).detach().numpy()

_ = viz.visualize_image_attr(np.transpose(attr.squeeze(), (1,2,0)),
                             # pass the unnormalized image as the original image
                             np.transpose(transformed_img.cpu().detach().numpy(), (1,2,0)),
                             method='heat_map',
                             show_colorbar=True,
                             cmap='Reds',
                             sign='positive',
                             outlier_perc=1)

#### 2. Improving Integrated Gradients

As you can see, the attribution result is hard to interpret. We want to improve it by using `captum.attr.NoiseTunnel`, which computes the attribution not just for the one image but for 10 noisy variants of it and aggregates the results.
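
Conceptually, the `smoothgrad_sq` variant adds Gaussian noise to the input several times, computes an attribution for every noisy copy and averages the squared attributions. The following NumPy sketch shows this with a toy gradient standing in for the attribution method. The function names, the toy function and the noise level `stdev=0.1` are assumptions for illustration and not Captum's implementation.

```python
import numpy as np

def toy_attribution(x):
    # Hypothetical stand-in for an attribution method:
    # the gradient of f(x) = sum(sin(3 * x))
    return 3.0 * np.cos(3.0 * x)

def smoothgrad_sq(x, attribution_fn, rng, n_samples=10, stdev=0.1):
    # Draw n_samples noisy copies of the input
    noisy = x + rng.normal(0.0, stdev, size=(n_samples,) + x.shape)
    # Attribute each copy, then average the squared attributions
    attrs = np.array([attribution_fn(sample) for sample in noisy])
    return np.mean(attrs ** 2, axis=0)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5)
smoothed = smoothgrad_sq(x, toy_attribution, rng)
print(smoothed)
```

Because the attributions are squared before averaging, the result is non-negative everywhere, which matches the `sign='positive'` setting used for the visualization.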

In [None]:
from captum.attr import IntegratedGradients
from captum.attr import NoiseTunnel
from captum.attr import visualization as viz
import numpy as np

#ToDo: Calculate the attributions and visualize them

In [None]:
from captum.attr import IntegratedGradients
from captum.attr import NoiseTunnel
from captum.attr import visualization as viz
import numpy as np


ig = NoiseTunnel(IntegratedGradients(model))

# integrated gradients requires the input gradient, so enable that in the input tensor
input_batch.requires_grad_()

# retrieve attributions
attr = ig.attribute(input_batch, target=pred_label_idx, nt_type='smoothgrad_sq', nt_samples=10).detach().numpy()

_ = viz.visualize_image_attr(np.transpose(attr.squeeze(), (1,2,0)),
                             # pass the unnormalized image as the original image
                             np.transpose(transformed_img.cpu().detach().numpy(), (1,2,0)),
                             method='heat_map',
                             show_colorbar=True,
                             cmap='Reds',
                             sign='positive',
                             outlier_perc=1)

Can you now see which parts of the image have the most influence on the model's decision? Try a higher number of noisy images (keep an eye on the runtime) to get even clearer attribution maps.