CS4001/4042 Assignment 1, Part B, Q3
---

Besides ensuring that your neural network performs well, it is important to be able to explain the model’s decision. **Captum** is a very handy library that helps you to do so for PyTorch models.

Many model explainability algorithms for deep learning models are available in Captum. These algorithms are often used to generate an attribution score for each feature. Features with larger scores are more ‘important’ and some algorithms also provide information about directionality (i.e. a feature with very negative attribution scores means the larger the value of that feature, the lower the value of the output).

In general, these algorithms can be grouped into two paradigms:
- **perturbation based approaches** (e.g. Feature Ablation)
- **gradient / backpropagation based approaches** (e.g. Saliency)

The former adopts a brute-force approach of removing / permuting features one by one and does not scale up well. The latter depends on gradients and they can be computed relatively quickly. But unlike how backpropagation computes gradients with respect to weights, gradients here are computed **with respect to the input**. This gives us a sense of how much a change in the input affects the model’s outputs.





---



---



In [1]:
!pip install captum



In [2]:
SEED = 42

import os

import random
random.seed(SEED)

import numpy as np
np.random.seed(SEED)

import pandas as pd

import torch
import torch.nn as nn

from captum.attr import Saliency, InputXGradient, IntegratedGradients, GradientShap, FeatureAblation

> First, load the dataset following the splits in Question B1. To keep things simple, we will **limit our analysis to numeric / continuous features only**. Drop all categorical features from the dataframes. Do not standardise the numerical features for now.



In [3]:
# TODO: Enter your code here
df = pd.read_csv('hdb_price_prediction.csv')

train_data = df[df['year'] <= 2019]
val_data = df[df['year'] == 2020]
test_data = df[df['year'] == 2021]

target = ['resale_price']


numeric_features = ['dist_to_nearest_stn', 'dist_to_dhoby', 'degree_centrality', 
                    'eigenvector_centrality', 'remaining_lease_years', 'floor_area_sqm']

train_data_numeric = train_data[numeric_features]
val_data_numeric = val_data[numeric_features]
test_data_numeric = test_data[numeric_features]


> Follow this tutorial to generate the plot from various model explainability algorithms (https://captum.ai/tutorials/House_Prices_Regression_Interpret).
Specifically, make the following changes:
- Use a feedforward neural network with 3 hidden layers, each having 5 neurons. Train using Adam optimiser with learning rate of 0.001.
- Use Saliency, Input x Gradients, Integrated Gradients, GradientSHAP, Feature Ablation


In [4]:
# TODO: Enter your code here
X_train = torch.Tensor(train_data_numeric.values)
y_train = torch.Tensor(train_data[target].values)

batch_size = 1024
num_epochs = 100
learning_rate = 0.001
size_hidden1 = 100
size_hidden2 = 50
size_hidden3 = 10
size_hidden4 = 1

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.hidden1 = nn.Linear(6, 5)
        self.hidden2 = nn.Linear(5, 5)
        self.hidden3 = nn.Linear(5, 5)
        self.output = nn.Linear(5, 1)

    def forward(self, x):
        x = torch.relu(self.hidden1(x))
        x = torch.relu(self.hidden2(x))
        x = torch.relu(self.hidden3(x))
        x = self.output(x)
        return x

model = NeuralNetwork()
model.train()

criterion = nn.MSELoss(reduction='sum')

# Train the model
def train(model_inp, num_epochs = num_epochs):
    optimizer = torch.optim.Adam(model_standardized.parameters(), lr=0.001)
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for inputs, labels in train_iter:
            outputs = model_inp(inputs)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            running_loss += loss.item()
            optimizer.step()
        if epoch % 20 == 0:    
            print('Epoch [%d]/[%d] running accumulative loss across all batches: %.3f' %
                  (epoch + 1, num_epochs, running_loss))
        running_loss = 0.0
        

In [5]:
X_test = torch.Tensor(test_data_numeric.values)

# Saliency
saliency = Saliency(model)
attributions_saliency = saliency.attribute(X_test[:1000], target=0)
print('Saliency=', attributions_saliency)


# Input x Gradients
input_x_grad = InputXGradient(model)
attributions_input_x_grad = input_x_grad.attribute(X_test[:1000], target=0)
print('Input x Gradients=', attributions_input_x_grad)


# Integrated Gradients
integrated_grad = IntegratedGradients(model)
attributions_integrated_grad = integrated_grad.attribute(X_test[:1000], target=0)
print('Integrated Gradients=', attributions_integrated_grad)


# GradientSHAP
gradient_shap = GradientShap(model)
baselines = torch.zeros(X_test[:1000].shape)  
attributions_gradient_shap = gradient_shap.attribute(X_test[:1000], baselines=baselines, target=0)
print('GradientSHAP=', attributions_gradient_shap)

# Feature Ablation
feature_ablation = FeatureAblation(model)
attributions_feature_ablation = feature_ablation.attribute(X_test[:1000], target=0)
print('Feature Ablation=', attributions_feature_ablation)



Saliency= tensor([[0.0013, 0.0003, 0.0010, 0.0009, 0.0008, 0.0027],
        [0.0013, 0.0003, 0.0010, 0.0009, 0.0008, 0.0027],
        [0.0013, 0.0003, 0.0010, 0.0009, 0.0008, 0.0027],
        ...,
        [0.0013, 0.0003, 0.0010, 0.0009, 0.0008, 0.0027],
        [0.0013, 0.0003, 0.0010, 0.0009, 0.0008, 0.0027],
        [0.0013, 0.0003, 0.0010, 0.0009, 0.0008, 0.0027]])
Input x Gradients= tensor([[ 1.6157e-03,  2.8778e-03,  1.6520e-05,  2.2612e-06, -5.1981e-02,
          1.2159e-01],
        [ 1.6157e-03,  2.8778e-03,  1.6520e-05,  2.2612e-06, -5.1981e-02,
          1.2159e-01],
        [ 1.1197e-03,  2.4091e-03,  1.6520e-05,  5.7417e-06, -4.7858e-02,
          1.8373e-01],
        ...,
        [ 6.9358e-04,  3.1515e-03,  1.6520e-05,  2.2612e-06, -4.6911e-02,
          1.8373e-01],
        [ 9.7030e-04,  2.6001e-03,  1.6520e-05,  5.7417e-06, -4.6100e-02,
          1.8103e-01],
        [ 8.8964e-04,  2.8909e-03,  1.6520e-05,  5.7417e-06, -4.6438e-02,
          1.8373e-01]], grad_fn=<MulB

> Train a separate model with the same configuration but now standardise the features via **StandardScaler** (fit to training set, then transform all). State your observations with respect to GradientShap and explain why it has occurred.
(Hint: Many gradient-based approaches depend on a baseline, which is an important choice to be made. Check the default baseline settings carefully.)


In [6]:
# TODO: Enter your code here
from sklearn.preprocessing import StandardScaler

# Standardize the features
scaler = StandardScaler()
X_train_standardized = scaler.fit_transform(train_data_numeric)
X_test_standardized = scaler.transform(test_data_numeric)

X_train_standardized = torch.Tensor(X_train_standardized)
X_test_standardized = torch.Tensor(X_test_standardized)
y_train = torch.Tensor(train_data[target].values)

model_standardized = NeuralNetwork()

model_standardized.train()

criterion = nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model_standardized.parameters(), lr=0.001)

# Train the model
for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = model_standardized(X_train_standardized)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

    if epoch % 20 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')


# Saliency
saliency_standardized = Saliency(model_standardized)
attributions_saliency_standardized = saliency_standardized.attribute(X_test_standardized[:1000], target=0)
print('Saliency=', attributions_saliency_standardized)
                                                                     

# Input x Gradients
input_x_grad_standardized = InputXGradient(model_standardized)
attributions_input_x_grad_standardized = input_x_grad_standardized.attribute(X_test_standardized[:1000], target=0)
print('Input x Gradients=', attributions_input_x_grad_standardized)

# Integrated Gradients
integrated_grad_standardized = IntegratedGradients(model_standardized)
attributions_integrated_grad_standardized = integrated_grad_standardized.attribute(X_test_standardized[:1000], target=0)
print('Integrated Gradients=', attributions_integrated_grad_standardized)

# GradientSHAP
gradient_shap_standardized = GradientShap(model_standardized)
baselines_standardized = torch.zeros(X_test_standardized[:1000].shape)  
attributions_gradient_shap_standardized = gradient_shap_standardized.attribute(X_test_standardized[:1000], baselines=baselines_standardized, target=0)
print('GradientSHAP=', attributions_gradient_shap_standardized)

# Feature Ablation
feature_ablation_standardized = FeatureAblation(model_standardized)
attributions_feature_ablation_standardized = feature_ablation_standardized.attribute(X_test_standardized[:1000], target=0)
print('Feature Ablation=', attributions_feature_ablation_standardized)

Epoch [1/100], Loss: 13859052010340352.0000
Epoch [21/100], Loss: 13859048789114880.0000
Epoch [41/100], Loss: 13859046641631232.0000
Epoch [61/100], Loss: 13859044494147584.0000
Epoch [81/100], Loss: 13859039125438464.0000




Saliency= tensor([[0.0063, 0.0442, 0.0438, 0.0468, 0.0240, 0.0284],
        [0.0063, 0.0442, 0.0438, 0.0468, 0.0240, 0.0284],
        [0.0055, 0.0248, 0.0256, 0.0335, 0.0204, 0.0219],
        ...,
        [0.0015, 0.0544, 0.0440, 0.0394, 0.0245, 0.0282],
        [0.0015, 0.0544, 0.0440, 0.0394, 0.0245, 0.0282],
        [0.0015, 0.0544, 0.0440, 0.0394, 0.0245, 0.0282]])
Input x Gradients= tensor([[-0.0065,  0.0301, -0.0063, -0.0096,  0.0201,  0.0618],
        [-0.0065,  0.0301, -0.0063, -0.0096,  0.0201,  0.0618],
        [-0.0010,  0.0249, -0.0037, -0.0009,  0.0254,  0.0268],
        ...,
        [-0.0008,  0.0269, -0.0063, -0.0081,  0.0328,  0.0346],
        [-0.0001,  0.0474, -0.0063, -0.0010,  0.0348,  0.0358],
        [-0.0003,  0.0366, -0.0063, -0.0010,  0.0340,  0.0346]],
       grad_fn=<MulBackward0>)
Integrated Gradients= tensor([[-0.0077,  0.0295, -0.0066, -0.0055,  0.0108,  0.0180],
        [-0.0077,  0.0295, -0.0066, -0.0055,  0.0108,  0.0180],
        [-0.0016,  0.0505, -0.

When standardized, GradientShap provides relatively coherent and interpretable attribution scores, indicating the importance of features for predicting the target variable. However, in the non-scaled version, the attribution scores provided by GradientShap are less interpretable and seem inconsistent. This discrepancy occurs because GradientShap relies on a baseline value (in this case, a tensor of zeros) to compute the attributions. When features are not scaled, the variation in the feature values can be substantial, making it difficult to attribute changes in the output solely to changes in a specific feature. This can result in less meaningful and less reliable attribution scores from GradientShap.



Read https://distill.pub/2020/attribution-baselines/ to build up your understanding of Integrated Gradients (IG). Reading the sections before the section on ‘Game Theory and Missingness’ will be sufficient. Keep in mind that this article mainly focuses on classification problems. You might find the following [descriptions](https://captum.ai/docs/attribution_algorithms) and [comparisons](https://captum.ai/docs/algorithms_comparison_matrix) in Captum useful as well.


Then, answer the following questions in the context of our dataset:

> Why did Saliency produce scores similar to IG?


\# TODO: \<Enter your answer here\> 

Saliency computes gradients with respect to the input features (input attribution), while IG is a method that approximates integrated gradients using a series of linear interpolations between a baseline and the input (input attribution as well).

When standardized features are used, the feature values have been scaled to a similar range. This scaling makes the feature contributions to the model's output more consistent and less sensitive to the magnitude of the feature values. As a result, Saliency and IG produce similar attribution scores because they are both measuring how a change in each feature impacts the model's output, and the scaling aligns their attributions.

In the non-standardized case, the attribution scores may differ because feature values have different scales, and the models' sensitivities to the individual features can vary significantly. This leads to different attribution patterns between Saliency and IG, where Saliency might be more affected by the magnitude of feature values while IG is less sensitive due to the integration step.


> Why did Input x Gradients give the same attribution scores as IG?


\# TODO: \<Enter your answer here\>

Input x Gradients is another gradient-based algorithm that computes the gradient of the model's output with respect to the input features (input attribution).

When standardized features are used, both Input x Gradients and IG provide similar attribution scores because they both rely on gradient computations with respect to the input. Standardization helps ensure that the gradients are consistent in terms of feature magnitudes, leading to similar attribution patterns between the two methods.

In the non-standardized case, the attribution scores may also differ because Input x Gradients is directly affected by the magnitude of the feature values, whereas IG mitigates this sensitivity by integrating the gradient across the entire feature space. Consequently, if the features have different scales, Input x Gradients can give higher importance to features with larger values, while IG normalizes these variations.