# Integrated Gradients to explain DNN Models on Tabular Data


This example illustrates the use of Integrated Gradients on Tabular Data. 
**Integrated Gradients** is an attribution method designed to explain the predictions of deep neural networks by assigning an importance score to each input feature. In other words, it helps answer the question: “How much did each feature contribute to this prediction?”

### Basic Idea: 
We measure the cumulative effect of each input feature by "integrating" the gradients of the model's output with respect to that feature along a straight-line path from a baseline input (usually one that represents the absence of features) to the actual input.

 Definition of IG:

Let:
-	 $F: \mathbb{R}^n \rightarrow \mathbb{R}$ be the model (typically a deep neural network) that produces a prediction.
-	 $x \in \mathbb{R}^n$  be the actual input.
-	 $x^{\prime} \in \mathbb{R}^n$  be a baseline input (e.g., a vector of zeros or some neutral reference point).

The integrated gradient for the i-th input feature is defined as:

 $$
\text{IG}_i(x) = (x_i - x'_i) \times \int_{0}^{1} \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha
$$
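In practice the integral is approximated numerically by evaluating the gradient at a finite number of points along the path (Captum's `n_steps` parameter). The sketch below uses a simple midpoint Riemann sum on a hypothetical toy logistic model whose gradient is known in closed form; the model, weights and inputs are made up for illustration, and Captum itself defaults to Gauss-Legendre quadrature rather than a midpoint rule. It also checks the *completeness* property of IG: the attributions sum to $F(x) - F(x')$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def toy_model(x, w, b):
    """Hypothetical toy model F(x) = sigmoid(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

def toy_model_grad(x, w, b):
    """Closed-form gradient of the toy model: dF/dx = sigmoid'(z) * w."""
    s = toy_model(x, w, b)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, w, b, n_steps=50):
    """Midpoint Riemann-sum approximation of IG along the line baseline -> x."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps   # midpoints of n_steps equal sub-intervals of [0, 1]
    grad_sum = np.zeros_like(x, dtype=float)
    for alpha in alphas:
        point = baseline + alpha * (x - baseline)    # interpolated input on the path
        grad_sum += toy_model_grad(point, w, b)
    return (x - baseline) * grad_sum / n_steps        # (x_i - x'_i) * average gradient

w = np.array([2.0, -1.0, 0.5])
b = -0.3
x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros(3)

ig = integrated_gradients(x, baseline, w, b)
gap = toy_model(x, w, b) - toy_model(baseline, w, b)
print("IG:", ig)
print("sum of IG:", ig.sum(), " F(x) - F(x'):", gap)
```

Captum performs an analogous sanity check when `attribute` is called with `return_convergence_delta=True`, which also returns the difference between the summed attributions and $F(x) - F(x')$.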

In this example, we'll use integrated gradients as provided by the Captum library for PyTorch. 

In [1]:

import matplotlib.pyplot as plt 
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np 
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler,OneHotEncoder
from os import path 

from captum.attr import IntegratedGradients


torch.manual_seed(34)
np.random.seed(34)

# uncomment this code to check if GPU is available
'''
if torch.backends.mps.is_available():
    device = torch.device("mps")
    x = torch.ones(1, device=device)
    print (x)
elif torch.cuda.is_available():
    device = torch.device("cuda")
    x = torch.ones(1, device=device)
else:
    print ("GPU device not found.")
    device = torch.device("cpu")
'''    

device = torch.device("cpu")

## **1. Load the Portuguese bank dataset**

- We will use the Portuguese bank dataset that we used for the Accumulated Local Effects (ALE) example. 
- To recap, the dataset contains 16 input features (client attributes and data from a direct marketing campaign based on phone calls) for 45,211 clients, and the task is to classify whether a client will subscribe to a term deposit.
- Dataset URL:  https://archive.ics.uci.edu/static/public/222/bank+marketing.zip


In [None]:
df = pd.read_csv('../acc_local_effects/datasets/bank-full.csv', delimiter=';')   
display(df.head())
print(df.shape)
print(df['y'].value_counts()) # let's see if there is data imbalance in the target variable


## **2. Preprocess data and create PyTorch tensors**

We have both categorical and numerical features. We one-hot encode the categorical features and standardize the numerical features (zero mean, unit variance).
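To make the two transformations concrete, here is a NumPy sketch of what `StandardScaler` and `OneHotEncoder` compute for a single column each. The helper functions and toy values are hypothetical; the sketch assumes scikit-learn's defaults (population standard deviation, categories sorted), and unlike `StandardScaler` it does not guard against constant columns.

```python
import numpy as np

def standardize(col):
    """Zero-mean, unit-variance scaling using the population std (ddof=0), as StandardScaler does."""
    col = np.asarray(col, dtype=float)
    return (col - col.mean()) / col.std()   # assumes a non-constant column

def one_hot(values):
    """One column per distinct category, categories sorted (OneHotEncoder's default ordering)."""
    categories = sorted(set(values))
    encoded = np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])
    return categories, encoded

ages = [30, 40, 50]            # hypothetical toy values
loans = ["no", "yes", "no"]

scaled_age = standardize(ages)
loan_categories, loan_encoded = one_hot(loans)
print(scaled_age)              # mean 0, std 1
print(loan_categories)         # ['no', 'yes']
print(loan_encoded)
```

With this scaling, an all-zeros input corresponds to the average value of every numerical feature, which becomes relevant later because Captum's default IG baseline is all zeros.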



In [None]:


# Automatically determine categorical and numerical features
categorical_features = df.select_dtypes(include=["object", "category"]).columns.tolist()
numerical_features = df.select_dtypes(include=["number"]).columns.tolist()

print("Categorical Features:", categorical_features)
print("Numerical Features:", numerical_features)


# Remove the target variable (if included in the dataset)
target_variable = "y"
if target_variable in categorical_features:
    categorical_features.remove(target_variable)
if target_variable in numerical_features:
    numerical_features.remove(target_variable)
    
# Define the ColumnTransformer
one_hot_encoder = OneHotEncoder(sparse_output=False)
scaler = StandardScaler()

preprocessor = ColumnTransformer(
    transformers=[
        ("num", scaler, numerical_features),   # Scale numerical features
        ("cat", one_hot_encoder, categorical_features)  # One-hot encode categorical features
    ]
)

# Separate features and target
X = df.drop(target_variable, axis=1)
y = df[target_variable]

# Convert true labels to binary format
y= (y == 'yes').astype(int)

# Apply the transformation to features
processed_X = preprocessor.fit_transform(X)

# Extract the fitted OneHotEncoder
one_hot_encoder = preprocessor.named_transformers_["cat"]

# Get one-hot encoded column names
one_hot_encoded_columns = one_hot_encoder.get_feature_names_out(categorical_features)

# Combine with scaled numerical column names
final_column_names = numerical_features + list(one_hot_encoded_columns)
print(final_column_names)

# Create a DataFrame with the transformed data
processed_X_df = pd.DataFrame(processed_X, columns=final_column_names)

# Display the DataFrame
display(processed_X_df.head())


X_train, X_test, y_train, y_test = train_test_split(processed_X_df, y, test_size=0.2, stratify=y, random_state=42)

X_train = torch.tensor(X_train.values).float()
y_train = torch.tensor(y_train.values).view(-1, 1).float()

X_test = torch.tensor(X_test.values).float()
y_test = torch.tensor(y_test.values).view(-1, 1).float()

datasets = torch.utils.data.TensorDataset(X_train, y_train)
train_iter = torch.utils.data.DataLoader(datasets, batch_size=10, shuffle=True)

## **3. Define the Model**

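For this example, we use a four-layer MLP with ReLU activations and dropout after each hidden layer. The network outputs a single logit; the sigmoid is applied inside the loss function (`BCEWithLogitsLoss`) rather than in `forward`.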

In [4]:
# create a four layer neural network with ReLU activation functions

batch_size = 64  # note: not used below; the DataLoaders in this notebook are built with batch_size=10
num_epochs = 120
learning_rate = 0.001
size_hidden1 = 80
size_hidden2 = 30
size_hidden3 = 10
size_output = 1

class DNNModel(nn.Module):
    def __init__(self, dropout_rate=0.3):
        super(DNNModel, self).__init__()
        self.fc1 = nn.Linear(51, size_hidden1)
        self.relu1 = nn.ReLU()
        self.drop1 = nn.Dropout(p=dropout_rate)  

        self.fc2 = nn.Linear(size_hidden1, size_hidden2)
        self.relu2 = nn.ReLU()
        self.drop2 = nn.Dropout(p=dropout_rate)  

        self.fc3 = nn.Linear(size_hidden2, size_hidden3)
        self.relu3 = nn.ReLU()
        self.drop3 = nn.Dropout(p=dropout_rate)  

        self.fc4 = nn.Linear(size_hidden3, size_output)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.drop1(x) 

        x = self.fc2(x)
        x = self.relu2(x)
        x = self.drop2(x)  

        x = self.fc3(x)
        x = self.relu3(x)
        x = self.drop3(x)  
        x = self.fc4(x)
        #x = self.sigmoid(x) # no need to apply sigmoid here, as we will use BCEWithLogitsLoss
        return x
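As a quick sanity check on the architecture, the number of trainable parameters follows from the layer sizes above: each `nn.Linear(n_in, n_out)` holds an `n_in * n_out` weight matrix plus `n_out` biases, while ReLU and dropout add no parameters. The input width of 51 must match the number of preprocessed columns (`len(final_column_names)`). The helper below is a hypothetical illustration, not part of the notebook.

```python
def linear_param_count(layer_sizes):
    """Count weights + biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total

# Layer widths of DNNModel: input, three hidden layers, output
sizes = [51, 80, 30, 10, 1]
print(linear_param_count(sizes))  # 6911
```

In the notebook itself the same number can be obtained with `sum(p.numel() for p in model.parameters())`.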


## **4. Train the model**

Let's train the model for 120 epochs. For simplicity, we skip hyperparameter optimization and k-fold cross-validation. To save time, we have pre-trained the model and saved its weights: `train_load_save_model` trains the model only if no saved weights are found at `SAVED_MODEL_PATH`; otherwise, it simply loads the pre-trained weights.

The dataset is highly imbalanced: 39,922 clients did not subscribe and only 5,289 did (subscribed to a term deposit), a ratio of about 7.55. To compensate, we pass `pos_weight=7.55` to `BCEWithLogitsLoss`, which multiplies the loss on positive examples by 7.55 so that both classes contribute comparably to training. To stabilize training, we also use Xavier (Glorot) initialization and a small learning rate (0.001).
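To make the effect of `pos_weight` concrete, here is a NumPy sketch of the weighted binary cross-entropy on logits, following the formula given in the PyTorch documentation for `BCEWithLogitsLoss`: $\ell = -[p\, y \log\sigma(z) + (1-y)\log(1-\sigma(z))]$, where $p$ is `pos_weight` and $z$ is the logit. The function name is ours, not PyTorch's, and the log-sigmoid terms are computed with `np.logaddexp` for numerical stability.

```python
import numpy as np

def weighted_bce_with_logits(z, y, pos_weight):
    """Mean of -[pos_weight * y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    log_sig = -np.logaddexp(0.0, -z)            # log(sigmoid(z)), stable for large |z|
    log_one_minus_sig = -np.logaddexp(0.0, z)   # log(1 - sigmoid(z))
    losses = -(pos_weight * y * log_sig + (1.0 - y) * log_one_minus_sig)
    return losses.mean()

n_neg, n_pos = 39922, 5289
pos_weight = n_neg / n_pos
print(round(pos_weight, 2))   # 7.55

# At logit 0 (probability 0.5), a positive example costs pos_weight times more than a negative one
ratio = weighted_bce_with_logits([0.0], [1.0], pos_weight) / weighted_bce_with_logits([0.0], [0.0], pos_weight)
print(ratio)
```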


In [None]:
model = DNNModel()
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(7.55, device=device))  # weight positive examples by the class ratio
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

import torch.nn.init as init

def initialize_weights(m):
    if isinstance(m, nn.Linear):  # Apply to Linear layers only
        init.xavier_uniform_(m.weight)  # Xavier (Glorot) initialization
        if m.bias is not None:
            init.zeros_(m.bias)  # Initialize biases to zero

# Apply initialization to the model
model.apply(initialize_weights)

#run model on device
model.to(device)

def train(model, num_epochs, criterion, optimizer, train_iter, device):
    model.to(device)  # Move model to device
    for epoch in range(num_epochs):
        running_loss = 0.0
        for i, data in enumerate(train_iter, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)  # Move to GPU/CPU

            optimizer.zero_grad()
            outputs = model(inputs)  # Forward pass
            loss = criterion(outputs, labels)  # Compute loss

            loss.backward()  # Backpropagation
            optimizer.step()  # Update weights

            running_loss += loss.item()
        
        # Print average loss every 10 epochs
        if (epoch + 1) % 10 == 0:
            avg_loss = running_loss / len(train_iter)
            print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}')

def train_load_save_model(model_obj, model_path,num_epochs,criterion, optimizer, train_iter, device):
    if path.isfile(model_path):
        # load model
        print('Loading pre-trained model from: {}'.format(model_path))
        model_obj.load_state_dict(torch.load(model_path, map_location=device))
    else:    
        # train model
        # Move model to device
        model_obj = model_obj.to(device)
        train(model_obj, num_epochs, criterion, optimizer, train_iter, device)
        print('Finished training the model. Saving the model to the path: {}'.format(model_path))
        torch.save(model_obj.state_dict(), model_path)
        
        
SAVED_MODEL_PATH = 'models/bank_model.pt'
# Move input tensors to device
X_train = X_train.to(device)
y_train = y_train.to(device)
X_test = X_test.to(device)
y_test = y_test.to(device)

# Move datasets to device
datasets = torch.utils.data.TensorDataset(X_train, y_train)

# Ensure data loader is using tensors on the correct device
train_iter = torch.utils.data.DataLoader(datasets, batch_size=10, shuffle=True)

train_load_save_model(model, SAVED_MODEL_PATH, num_epochs, criterion, optimizer, train_iter,device)

## **5. Training Results**

Hyperparameter optimization could likely improve these results, but the objective here is to illustrate IG as a post-hoc explainability method, not to optimize the model. Because the classes are imbalanced, ROC AUC and the confusion matrix are more informative than accuracy alone. Let's proceed with this model.


In [None]:

from sklearn.metrics import roc_auc_score, confusion_matrix, ConfusionMatrixDisplay

model.eval()  # disable dropout for evaluation
with torch.no_grad():
    logits = model(X_test)          # the model returns raw logits (no sigmoid in forward)
probs = torch.sigmoid(logits)       # convert logits to probabilities
predicted = probs > 0.5             # equivalent to logits > 0
total = y_test.size(0)
correct = (predicted.float() == y_test).sum().item()
accuracy = correct / total
print('Accuracy: ', accuracy)
roc_auc = roc_auc_score(y_test.cpu().numpy(), probs.cpu().numpy())
print('ROC AUC:', roc_auc)
conf_mat = confusion_matrix(y_test.cpu().numpy(), predicted.cpu().numpy())
# Plot confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=conf_mat)
disp.plot(cmap='Blues', values_format='d')
plt.title("Confusion Matrix")
plt.show()
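Because `forward` returns raw logits (the sigmoid is only applied inside `BCEWithLogitsLoss`), a probability threshold of 0.5 corresponds to a logit threshold of 0: $\sigma(0) = 0.5$ and $\sigma$ is monotonic. Thresholding the logits themselves at 0.5 would instead correspond to a probability of $\sigma(0.5) \approx 0.62$. A quick NumPy check on a few hypothetical logits:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-2.0, -0.5, -0.1, 0.1, 0.3, 2.0])  # hypothetical logits
by_prob = sigmoid(logits) > 0.5                       # threshold on probability
by_logit = logits > 0.0                               # threshold on logit
print(np.array_equal(by_prob, by_logit))              # True
print(round(float(sigmoid(0.5)), 3))                  # 0.622
```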



## **6. Compute global explanations using Integrated Gradients**

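- Here, we compute feature attributions for every data point in the test set and aggregate them to see which features the model relies on overall.
- This is computationally expensive: IG approximates the integral numerically, so for each test row the model's gradient is evaluated at `n_steps` points along the path from the baseline to the input (`n_steps=50` here).
- To save time, we precompute the attributions once and save them to disk.
- No `baselines` argument is passed to `ig.attribute`, so Captum uses an all-zeros baseline. After standardization, zero corresponds to the mean of each numerical feature; for the one-hot columns it means "no category active".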
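A back-of-the-envelope estimate of that cost, assuming the 80/20 split above (scikit-learn's `train_test_split` rounds the test size up, giving 9,043 of the 45,211 rows) and `n_steps=50`. The `internal_batch_size` value is a hypothetical choice; Captum's `attribute` accepts this parameter to process the interpolated points in chunks, which bounds peak memory without changing the result beyond numerical precision.

```python
import math

n_rows = 45211
test_size = 0.2
n_steps = 50

n_test = math.ceil(test_size * n_rows)     # train_test_split rounds the test fraction up
n_gradient_evals = n_test * n_steps        # one forward + backward evaluation per interpolated point
print(n_test, n_gradient_evals)            # 9043 452150

# With internal_batch_size, the interpolated points are processed in chunks of at most that size
internal_batch_size = 4096                 # hypothetical value, tune to available memory
print(math.ceil(n_gradient_evals / internal_batch_size))
```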

In [None]:
import os

def compute_and_save_ig_attributions(model, X_test, final_column_names, 
                                     ig_attr_path="models/ig_attr_test.npy", 
                                     ig_attr_norm_sum_path="models/ig_attr_test_norm_sum.npy", 
                                     feature_names_path="models/feature_names.txt"):
    if not os.path.exists(ig_attr_path) or not os.path.exists(ig_attr_norm_sum_path) or not os.path.exists(feature_names_path):
        # Compute IG attributions
        ig = IntegratedGradients(model)
        ig_attr_test = ig.attribute(X_test, n_steps=50)

        # Convert to numpy for saving
        ig_attr_test_np = ig_attr_test.detach().cpu().numpy()

        # Save as a .npy file for fast loading
        np.save(ig_attr_path, ig_attr_test_np)

        # Also save the summed and normalized version
        ig_attr_test_sum = ig_attr_test_np.sum(0)
        ig_attr_test_norm_sum = ig_attr_test_sum / np.linalg.norm(ig_attr_test_sum, ord=1)
        np.save(ig_attr_norm_sum_path, ig_attr_test_norm_sum)

        # Save feature names for reference
        with open(feature_names_path, "w") as f:
            for name in final_column_names:
                f.write(f"{name}\n")
        print("IG attributions computed and saved.")
    else:
        print("IG attributions already exist. Skipping computation.")

# Call the function and only run IG computations if the precomputed attributions don't exist
compute_and_save_ig_attributions(model, X_test, final_column_names)
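The global score saved by `compute_and_save_ig_attributions` is the column-wise sum of the per-example attributions divided by its L1 norm, so the absolute values of the plotted bars sum to 1. The toy illustration below uses a made-up attribution matrix and also shows a caveat of summing signed attributions: a feature that pushes some predictions up and others down can net out near zero. Summing absolute values instead is a common alternative when overall magnitude matters more than direction.

```python
import numpy as np

# Hypothetical per-example attributions: 3 examples x 3 features
attr = np.array([
    [0.4, -0.2,  0.3],
    [0.2, -0.1, -0.3],
    [0.3, -0.3,  0.0],
])

summed = attr.sum(axis=0)                           # signed sum per feature
norm_sum = summed / np.linalg.norm(summed, ord=1)   # same normalization as in the notebook
print(norm_sum, np.abs(norm_sum).sum())

abs_sum = np.abs(attr).sum(axis=0)                  # magnitude-only alternative
print(abs_sum / abs_sum.sum())
```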


In [None]:

# Load precomputed IG attributions
ig_attr_test_np = np.load("models/ig_attr_test.npy")

# Load summed and normalized IG attributions
ig_attr_test_norm_sum = np.load("models/ig_attr_test_norm_sum.npy")

# Load feature names
with open("models/feature_names.txt", "r") as f:
    feature_names = [line.strip() for line in f]

# Verify data shape
print("IG Attribution Shape:", ig_attr_test_np.shape)
print("Feature Names:", feature_names)
# Create a bar plot for feature attributions
x_axis_data = np.arange(len(feature_names))
plt.figure(figsize=(12,6))
plt.bar(x_axis_data, ig_attr_test_norm_sum, color='salmon', label="Int Grads")
plt.xticks(x_axis_data, feature_names, rotation=90, fontsize=10)
plt.ylabel("Attributions")
plt.title("Precomputed Integrated Gradients Feature Importances")
plt.legend()

# Draw dotted lines from xticks labels to the y-axis
for i in range(len(feature_names)):
    plt.axvline(x=i, color='gray', linestyle='dotted', linewidth=0.5)

# Draw y-axis
plt.axhline(y=0, color='black', linewidth=0.5)

plt.show()

### Global explanations

From the bar plot, the features `default_no` and `loan_no` receive the largest positive attributions in the trained model. This suggests that the model associates clients with no credit in default and no personal loan with a higher likelihood of subscribing to a term deposit. Keep in mind that these are attributions relative to the all-zeros baseline, summed over the test set, and not causal effects.
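One reason one-hot features such as `default_no` can dominate the summed attributions: with an all-zeros baseline, the factor $(x_i - x'_i)$ in the IG definition is zero for every inactive one-hot column, so only the category a client actually has can receive attribution. Since very few clients in this dataset have credit in default, `default_no` is active, and can collect attribution, for almost every test row. For a linear model $F(x) = w^\top x + b$ the gradient is constant and IG reduces exactly to $w_i (x_i - x'_i)$, which makes this easy to see in a toy sketch (weights and inputs hypothetical):

```python
import numpy as np

def linear_ig(x, baseline, w):
    """Exact IG for a linear model F(x) = w . x + b: the gradient is w everywhere on the path."""
    return w * (x - baseline)

# Hypothetical one-hot block for a two-category feature: columns [default_no, default_yes]
w = np.array([0.8, -0.5])
baseline = np.zeros(2)

client_a = np.array([1.0, 0.0])   # default = "no"
client_b = np.array([0.0, 1.0])   # default = "yes"

print(linear_ig(client_a, baseline, w))   # only default_no gets attribution
print(linear_ig(client_b, baseline, w))   # only default_yes gets attribution
```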

We also see that the month of May receives a negative attribution. Similarly, the job categories blue-collar, management, and technician are attributed to negative outcomes.

To dig deeper into why May is associated with negative outcomes, let's check how the calls are distributed across the months of the year.

In [None]:
df['month'].value_counts().plot(kind='bar') # let's see the distribution of the month column

Okay, here we see that more calls were made in May than in any other month.
-	Since a large number of calls happen in May, many clients might have already been contacted multiple times, leading to customer fatigue.
-	Customers may perceive these calls as too aggressive or repetitive, making them less likely to subscribe.
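Call volume alone does not show whether May calls convert poorly, so a natural follow-up is to compare the subscription rate per month. The helper below is a hypothetical sketch, not part of the original notebook; applied to the notebook's `df` it would be called as `subscription_rate_by(df, "month")`. Here it is exercised on a tiny made-up frame so that it runs on its own.

```python
import pandas as pd

def subscription_rate_by(frame, column, target="y", positive="yes"):
    """Number of calls and share of positive outcomes per value of `column`, lowest rate first."""
    is_positive = frame[target].eq(positive)
    stats = is_positive.groupby(frame[column]).agg(n_calls="size", subscription_rate="mean")
    return stats.sort_values("subscription_rate")

# Made-up rows, only to exercise the function
toy = pd.DataFrame({
    "month": ["may", "may", "may", "may", "oct", "oct"],
    "y":     ["no",  "no",  "no",  "yes", "yes", "no"],
})
print(subscription_rate_by(toy, "month"))
```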

### Local explanations

Let's plot the IG attributions for a specific client:

In [None]:
# Compute and plot IG attributions for one client (row 100) of the test set
client_idx = 100
client_data = X_test[client_idx].unsqueeze(0)   # shape (1, n_features)
model.eval()  # make sure dropout is disabled
with torch.no_grad():
    client_output = model(client_data)           # raw logit
client_output_prob = torch.sigmoid(client_output).item()
client_output_class = 1 if client_output_prob > 0.5 else 0
print("Predicted Probability:", client_output_prob)
print("Predicted Class:", client_output_class)
print("True Class:", y_test[client_idx].item())

# Compute IG attributions for this client and normalize them by their L1 norm
ig = IntegratedGradients(model)
ig_attr_client = ig.attribute(client_data, n_steps=50).squeeze(0).detach().cpu().numpy()
ig_attr_client_norm = ig_attr_client / np.linalg.norm(ig_attr_client, ord=1)
plt.figure(figsize=(12,6))
plt.bar(x_axis_data, ig_attr_client_norm, color='salmon', label="Int Grads")
plt.xticks(x_axis_data, feature_names, rotation=90, fontsize=10)
plt.ylabel("Attributions")
plt.title(f"Integrated Gradients Feature Importances for Client {client_idx}")
plt.legend()

# Draw dotted lines from xticks labels to the y-axis
for i in range(len(feature_names)):
    plt.axvline(x=i, color='gray', linestyle='dotted', linewidth=0.5)

# Draw y-axis
plt.axhline(y=0, color='black', linewidth=0.5)

plt.show()
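Rather than reading the most influential features off the bar chart, one can rank them. A minimal sketch with a hypothetical helper `top_attributions`, shown on made-up values and feature names; in the notebook it would be called as `top_attributions(ig_attr_client_norm, feature_names)`.

```python
import numpy as np

def top_attributions(attributions, names, k=3):
    """Return the k most negative and the k largest (name, attribution) pairs."""
    attributions = np.asarray(attributions)
    order = np.argsort(attributions)                  # ascending: most negative first
    most_negative = [(names[i], float(attributions[i])) for i in order[:k]]
    largest = [(names[i], float(attributions[i])) for i in order[::-1][:k]]
    return most_negative, largest

# Made-up values, only to exercise the function
names = ["a", "b", "c", "d", "e"]
values = [0.05, 0.10, -0.40, -0.30, -0.15]
neg, pos = top_attributions(values, names, k=2)
print("Most negative:", neg)
print("Largest:", pos)
```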

**job_blue-collar (Most Negative Attribution)**
-	Being in a blue-collar job pushed the model's prediction for this client most strongly toward not subscribing.
-	Possible reasons:
	-	Blue-collar workers may have less financial flexibility to invest in term deposits.
	-	Their income levels and job stability might not align with long-term savings products.
	-	In the UCI bank marketing data, blue-collar clients may subscribe at a lower rate than average, a pattern the model would learn.

**month_apr (Very Negative Attribution)**
- The call being made in April pushed the model's prediction for this client toward not subscribing.
- Possible reasons:
	-	April could be a time when people prioritize tax payments and avoid new financial commitments.
	-	Seasonality effects: April might not be an ideal month for term deposits compared to months like September or December.

**education_secondary (Negative Attribution)**
-	Having secondary education instead of higher education might have slightly decreased the likelihood of subscribing.
-	Possible reasons:
	-	Clients with higher education levels (college/university) may be more financially literate and more likely to invest in term deposits.
	-	Secondary-educated individuals may prefer short-term savings over fixed-term deposits.


**Conclusions**

- Consider targeting white-collar jobs more aggressively: 
	- Blue-collar workers show less interest in term deposits, so marketing strategies might need more tailored financial education.
-	April campaigns might not be effective: If April shows consistently negative attributions across multiple clients, it might not be an ideal marketing period.
-	Education matters in financial product marketing: Clients with only secondary education may need simpler messaging or better financial awareness campaigns to improve conversion rates.