<a href="https://colab.research.google.com/github/Adyboy1/Quantus/blob/main/tutorials/Tutorial_Getting_Started_with_Tabular_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started with tabular data!
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/understandable-machine-intelligence-lab/Quantus/main?labpath=tutorials%2FTutorial_Getting_Started_with_Tabular_Data.ipynb)


This notebook shows how to get started with Quantus using tabular data. For this purpose, we use the classic Titanic tabular dataset (Frank E. Harrell Jr., Thomas Cason):

https://www.openml.org/d/40945

The model in this notebook is taken from "Getting started with Captum - Titanic Data Analysis" provided by Captum:

https://captum.ai/tutorials/Titanic_Basic_Interpret

In [4]:
# Each `!` command runs in its own subshell, so a virtual environment activated with
# `!source` would not persist; install the dependencies into the current kernel instead.
!pip install numpy pandas scikit-learn quantus zennit torch torchvision captum matplotlib seaborn


In [3]:
from IPython.display import clear_output
clear_output()

In [4]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

import quantus
from captum.attr import IntegratedGradients

import torch
import torch.nn as nn

# Fix the random seeds for reproducibility
torch.manual_seed(27)
np.random.seed(27)

clear_output()

## 1) Preliminaries

### 1.1 Load datasets

We load the dataset from a local CSV file (`assets/titanic3.csv`) with pandas. The same data can be downloaded from the OpenML website: https://www.openml.org/d/40945

In [7]:
# Load the Titanic data and keep the target plus seven input features
df = pd.read_csv("assets/titanic3.csv")
df = df[["age", "embarked", "fare", "parch", "pclass", "sex", "sibsp", "survived"]].copy()
# Impute missing numeric values with the column mean
df["age"] = df["age"].fillna(df["age"].mean())
df["fare"] = df["fare"].fillna(df["fare"].mean())

In [8]:
# Data statistics
df.describe()

| | age | fare | parch | pclass | sibsp | survived |
|---|---|---|---|---|---|---|
| count | 1309.0 | 1309.0 | 1309.0 | 1309.0 | 1309.0 | 1309.0 |
| mean | 29.881138 | 33.295479 | 0.385027 | 2.294882 | 0.498854 | 0.381971 |
| std | 12.883193 | 51.738879 | 0.86556 | 0.837836 | 1.041658 | 0.486055 |
| min | 0.17 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 25% | 22.0 | 7.8958 | 0.0 | 2.0 | 0.0 | 0.0 |
| 50% | 29.881138 | 14.4542 | 0.0 | 3.0 | 0.0 | 0.0 |
| 75% | 35.0 | 31.275 | 0.0 | 3.0 | 1.0 | 1.0 |
| max | 80.0 | 512.3292 | 9.0 | 3.0 | 8.0 | 1.0 |


In [9]:
# One-hot encode the categorical variables, then shuffle the rows
df_enc = pd.get_dummies(df, columns=["embarked", "pclass", "sex"]).sample(frac=1)
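To see what the encoding step produces, here is a minimal sketch on a hypothetical three-row frame (the values are made up for illustration). Passing `dtype=float` is an optional choice made here, not something the notebook does; recent pandas versions otherwise return boolean dummy columns. A missing `embarked` value becomes an all-zero row in the dummy columns, because `dummy_na` defaults to `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mirroring a few Titanic columns (values are illustrative only)
toy = pd.DataFrame(
    {
        "age": [22.0, 38.0, 26.0],
        "embarked": ["S", "C", np.nan],
        "pclass": [3, 1, 3],
        "sex": ["male", "female", "female"],
    }
)

# One-hot encode the categorical columns; non-encoded columns come first
toy_enc = pd.get_dummies(toy, columns=["embarked", "pclass", "sex"], dtype=float)
print(list(toy_enc.columns))
# ['age', 'embarked_C', 'embarked_S', 'pclass_1', 'pclass_3', 'sex_female', 'sex_male']

# The row with a missing port gets zeros in every embarked_* column
print(toy_enc.filter(like="embarked_").sum(axis=1).tolist())
# [1.0, 1.0, 0.0]
```

On the full dataset the same call yields 4 numeric columns (age, fare, parch, sibsp) plus 3 `embarked_*`, 3 `pclass_*` and 2 `sex_*` columns once `survived` is dropped, i.e. 12 features, which matches the input size of the model below.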

In [10]:
# Pandas dataframes to numpy arrays
X = df_enc.drop(["survived"], axis=1).values
Y = df_enc["survived"].values

In [21]:
train_features, test_features, train_labels, test_labels = train_test_split(
    X.astype(np.float32), Y, test_size=0.3  # Convert X to float32
)
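With the 1,309 passengers reported by `df.describe()`, `test_size=0.3` rounds the test split up to 393 rows. A quick check on placeholder arrays of the same length (a sketch; the array contents are dummies, only the shapes matter):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows (1309) and features (12) as the notebook
X_dummy = np.zeros((1309, 12), dtype=np.float32)
y_dummy = np.zeros(1309, dtype=np.int64)

X_tr, X_te, y_tr, y_te = train_test_split(X_dummy, y_dummy, test_size=0.3, random_state=0)
print(X_tr.shape, X_te.shape)  # (916, 12) (393, 12)
```

With these sizes, the train accuracy of 0.834 reported below corresponds to 764 of 916 passengers and the test accuracy of 0.791 to 311 of 393.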

### 1.2 Train a model

The model is based on "Getting started with Captum - Titanic Data Analysis" provided by Captum:

https://captum.ai/tutorials/Titanic_Basic_Interpret

In [22]:
class TitanicSimpleNNModel(nn.Module):
    """Two sigmoid hidden layers and a softmax over the classes (0 = died, 1 = survived)."""

    def __init__(self):
        super().__init__()
        # 12 inputs: 4 numeric features + 8 one-hot-encoded columns
        self.linear1 = nn.Linear(12, 12)
        self.sigmoid1 = nn.Sigmoid()
        self.linear2 = nn.Linear(12, 8)
        self.sigmoid2 = nn.Sigmoid()
        self.linear3 = nn.Linear(8, 2)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        lin1_out = self.linear1(x)
        sigmoid_out1 = self.sigmoid1(lin1_out)
        sigmoid_out2 = self.sigmoid2(self.linear2(sigmoid_out1))
        # Returns class probabilities (softmax), as in the Captum tutorial
        return self.softmax(self.linear3(sigmoid_out2))
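`nn.CrossEntropyLoss` applies log-softmax internally and expects raw logits, so training on the softmax output above applies softmax twice and flattens the gradients. The notebook keeps the Captum tutorial's architecture unchanged; the following is a hedged sketch of the common alternative, in which a hypothetical `TitanicLogitsModel` returns logits for training and softmax is applied only when probabilities are needed. Random tensors stand in for the real features and labels.

```python
import torch
import torch.nn as nn


class TitanicLogitsModel(nn.Module):
    """Hypothetical variant of TitanicSimpleNNModel that returns raw logits."""

    def __init__(self, n_features: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 12),
            nn.Sigmoid(),
            nn.Linear(12, 8),
            nn.Sigmoid(),
            nn.Linear(8, 2),  # logits, no softmax here
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


torch.manual_seed(0)
model = TitanicLogitsModel()
x = torch.randn(16, 12)         # stand-in features
y = torch.randint(0, 2, (16,))  # stand-in labels

criterion = nn.CrossEntropyLoss()  # applies log-softmax to the logits itself
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# One full-batch training step
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Convert logits to probabilities only at inference time
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
print(probs.shape)  # torch.Size([16, 2])
```

If you adopt this variant, note that attributions computed directly on the model would then explain logits rather than probabilities, which can change the resulting scores.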

In [23]:
net = TitanicSimpleNNModel()

criterion = nn.CrossEntropyLoss()
num_epochs = 200
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
# Convert the training split to tensors (the whole split is used as one batch)
input_tensor = torch.from_numpy(train_features).type(torch.FloatTensor)
label_tensor = torch.from_numpy(train_labels)
for epoch in range(num_epochs):
    output = net(input_tensor)
    loss = criterion(output, label_tensor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 20 == 0:
        print("Epoch {}/{} => Loss: {:.2f}".format(epoch + 1, num_epochs, loss.item()))

Epoch 1/200 => Loss: 0.69
Epoch 21/200 => Loss: 0.57
Epoch 41/200 => Loss: 0.53
Epoch 61/200 => Loss: 0.50
Epoch 81/200 => Loss: 0.49
Epoch 101/200 => Loss: 0.49
Epoch 121/200 => Loss: 0.49
Epoch 141/200 => Loss: 0.48
Epoch 161/200 => Loss: 0.48
Epoch 181/200 => Loss: 0.47


In [24]:
out_probs = net(input_tensor).detach().numpy()
out_classes = np.argmax(out_probs, axis=1)
print("Train Accuracy:", sum(out_classes == train_labels) / len(train_labels))

Train Accuracy: 0.834061135371179


In [25]:
test_input_tensor = torch.from_numpy(test_features).type(torch.FloatTensor)
out_probs = net(test_input_tensor).detach().numpy()
out_classes = np.argmax(out_probs, axis=1)
print("Test Accuracy:", sum(out_classes == test_labels) / len(test_labels))

Test Accuracy: 0.7913486005089059


### 1.3 Generate explanations

Here we generate explanations with the Integrated Gradients method as implemented in the `captum` library, attributing the model's output for class 1 (survived).
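Integrated Gradients attributes $F(x) - F(x')$ to the input features by integrating gradients along the straight line from a baseline $x'$ to the input $x$:

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F(x' + \alpha (x - x'))}{\partial x_i}\, d\alpha.$$

The sketch below is a minimal re-implementation for intuition only, not Captum's code. The helper `integrated_gradients` is hypothetical; it approximates the integral with a midpoint Riemann sum (Captum's default is Gauss-Legendre quadrature) and uses an all-zeros baseline, which is also Captum's default. The final check tests completeness (attributions sum to $F(x) - F(x')$); the `delta` returned by Captum with `return_convergence_delta=True` measures the deviation from this property.

```python
import torch
import torch.nn as nn


def integrated_gradients(model, x, target, baseline=None, steps=50):
    """Hypothetical helper: midpoint Riemann-sum approximation of IG.

    x: (batch, features) inputs; target: index of the output class to attribute.
    """
    if baseline is None:
        baseline = torch.zeros_like(x)  # all-zeros baseline
    # Midpoints of `steps` equal sub-intervals of [0, 1]
    alphas = (torch.arange(steps, dtype=x.dtype) + 0.5) / steps
    total_grad = torch.zeros_like(x)
    for alpha in alphas:
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        # Summing over the batch still yields per-sample gradients (rows are independent)
        out = model(point)[:, target].sum()
        (grad,) = torch.autograd.grad(out, point)
        total_grad += grad
    return (x - baseline) * total_grad / steps


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(12, 8), nn.Sigmoid(), nn.Linear(8, 2), nn.Softmax(dim=1))
x = torch.randn(4, 12)
attr = integrated_gradients(model, x, target=1, steps=200)

# Completeness check: attributions should sum to F(x) - F(baseline)
with torch.no_grad():
    expected = model(x)[:, 1] - model(torch.zeros_like(x))[:, 1]
print((attr.sum(dim=1) - expected).abs().max())  # small approximation error
```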

In [26]:
ig = IntegratedGradients(net)

In [27]:
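# Attribute the predicted probability of class 1 (survived) for every test sample.
# The Quantus metrics below recompute attributions via `explain_func` (a_batch=None),
# so `attr` is not reused there.
test_input_tensor.requires_grad_()
attr, delta = ig.attribute(test_input_tensor, target=1, return_convergence_delta=True)
attr = attr.detach().numpy()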

## 2) Quantitative evaluation using Quantus

We can evaluate our explanations on a variety of quantitative criteria; as a motivating example, we compute the Model Parameter Randomisation Test (MPRT) score (Adebayo et al., 2018) and the Complexity score (Bhatt et al., 2020).

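The Model Parameter Randomisation Test measures how much the attribution changes as the model's parameters are randomised layer by layer, either cumulatively from the top layer down (cascading) or one layer at a time while the rest of the model keeps its trained weights (independent). Here we use the independent variant (`layer_order="independent"`) and compare the original and the new attribution with the Spearman rank correlation.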
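The idea behind the test can be sketched in a few lines. This is a simplified illustration, not Quantus's implementation: it uses absolute gradient × input as a stand-in explanation method (Quantus uses whatever `explain_func` you pass), randomises each layer independently by re-initialising it with PyTorch's `reset_parameters()`, and averages per-sample Spearman correlations from SciPy. The helpers `grad_x_input` and `mprt_independent` are hypothetical. A high correlation means the explanation barely changed when a layer was randomised, which Adebayo et al. treat as a warning sign.

```python
import copy

import numpy as np
import torch
import torch.nn as nn
from scipy.stats import spearmanr


def grad_x_input(model, x, target):
    """Stand-in explanation: |gradient of the target class x input|."""
    x = x.clone().requires_grad_(True)
    out = model(x)[:, target].sum()
    (grad,) = torch.autograd.grad(out, x)
    return (grad * x).detach().abs().numpy()  # absolute values, similar in spirit to abs=True


def mprt_independent(model, x, target, seed=0):
    """Hypothetical helper: mean per-sample Spearman correlation for each randomised layer."""
    original = grad_x_input(model, x, target)
    scores = {}
    for name, module in model.named_modules():
        if not hasattr(module, "reset_parameters"):
            continue  # skip containers and parameter-free layers
        torch.manual_seed(seed)
        # Independent order: start from the trained model every time
        randomised = copy.deepcopy(model)
        dict(randomised.named_modules())[name].reset_parameters()
        perturbed = grad_x_input(randomised, x, target)
        corrs = [spearmanr(a, b)[0] for a, b in zip(original, perturbed)]
        scores[name] = float(np.mean(corrs))
    return scores


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(12, 8), nn.Sigmoid(), nn.Linear(8, 2), nn.Softmax(dim=1))
x = torch.randn(5, 12)
scores = mprt_independent(model, x, target=1)
print(scores)  # one mean correlation per Linear layer ('0' and '2')
```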

The Complexity of an attribution is defined as the entropy of the fractional contribution of each feature $x_i$ to the total magnitude of the attribution; lower values indicate attributions concentrated on fewer features.
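Written out for an attribution vector $a$, with $a_i$ the attribution of feature $x_i$:

$$P_i = \frac{|a_i|}{\sum_j |a_j|}, \qquad \mathrm{Complexity}(a) = -\sum_i P_i \ln P_i.$$

A minimal NumPy sketch of this definition (the helper `complexity` is hypothetical and skips the normalisation options Quantus exposes):

```python
import numpy as np


def complexity(a):
    """Hypothetical helper: entropy of the fractional absolute contributions."""
    total = np.sum(np.abs(a))
    if total == 0:
        return 0.0  # no attribution mass at all
    p = np.abs(a) / total
    p = p[p > 0]  # terms with P_i = 0 contribute 0 (0 * ln 0 := 0)
    return float(np.sum(p * np.log(1.0 / p)))


uniform = np.ones(12)    # every feature contributes equally
one_hot = np.eye(12)[0]  # a single feature carries all attribution
print(complexity(uniform))  # ≈ 2.4849 (ln 12)
print(complexity(one_hot))  # 0.0
```

For the 12 features used in this notebook, the score therefore lies between 0 and ln 12 ≈ 2.48.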

In [28]:
# Compute MPRT scores for Integrated Gradients.
# MPRT is the current name of ModelParameterRandomisation (Quantus >= 0.5.0), and
# `return_average_correlation` replaces the deprecated `return_sample_correlation`.
scores_intgrad = quantus.MPRT(
    similarity_func=quantus.similarity_func.correlation_spearman,
    return_average_correlation=True,
    return_aggregate=True,
    aggregate_func=np.mean,
    layer_order="independent",
    disable_warnings=True,
    normalise=True,
    abs=True,
    display_progressbar=True,
)(
    model=net,
    x_batch=test_features,
    y_batch=test_labels,
    a_batch=None,
    explain_func=quantus.explain,
    explain_func_kwargs={"method": "IntegratedGradients", "reduce_axes": ()},
)
print(
    "MPRT scores (Adebayo et al., 2018)\n"
    "\n • Integrated Gradients = ",
    scores_intgrad,
)

MPRT scores (Adebayo et al., 2018)

 • Integrated Gradients =  [0.96246443421195]


In [29]:
complexity_intgrad = quantus.Complexity(
    normalise=True,
    abs=True,
    disable_warnings=True,
    display_progressbar=True,
    return_aggregate=True
)(
    model=net,
    x_batch=test_features,
    y_batch=test_labels,
    a_batch=None,
    explain_func=quantus.explain,
    explain_func_kwargs={"method": "IntegratedGradients", "reduce_axes": ()},
)

print(
    "Complexity (Bhatt et al., 2020)\n"
    "\n • Integrated Gradients = ",
    complexity_intgrad,
)

Complexity (Bhatt et al., 2020)

 • Integrated Gradients =  [1.3425012006920165]
