Integrated Gradients (IG) tells us how much each input contributed to the final output. Layer Integrated Gradients (LIG) applies the same idea to a chosen layer and tells us how much each neuron of that layer contributed, for the same input. The two are linked through the chain rule used in backpropagation: summed over the layer, the LIG attributions match the IG attributions summed over the input dimensions. In this notebook, I will attempt to show this relation. As with the other notebooks, we will start with a simple NN, run both LIG and IG, and show how they are related. <br>
TODO: add more information on how the relation works using backpropagation
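To make the definition concrete, here is a minimal, framework-free sketch of what IG computes: for each input dimension $i$, the distance from the baseline $(x_i - x'_i)$ times the average gradient along the straight line from the baseline $x'$ to the input $x$. It assumes a midpoint Riemann sum and central finite differences in place of autograd, so it is not Captum's implementation and will not reproduce Captum's numbers; `toy_model` and all helper names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def numerical_grad(f: Callable[[Vector], float], x: Vector, eps: float = 1e-5) -> Vector:
    """Central finite-difference gradient of the scalar function f at x."""
    grads = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


def integrated_gradients(f: Callable[[Vector], float], x: Vector,
                         baseline: Vector, n_steps: int = 50) -> Vector:
    """IG via a midpoint Riemann sum along the straight line baseline -> x."""
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            avg_grad[i] += g / n_steps
    # scale the averaged gradient by the distance from the baseline
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(v: Vector) -> float:
    """Hypothetical smooth scalar model used only for illustration."""
    return math.tanh(0.8 * v[0] - 0.5 * v[1]) + 0.3 * v[0] * v[1]


x = [0.9, 0.2]
baseline = [0.1, 0.6]
attributions = integrated_gradients(toy_model, x, baseline, n_steps=100)
print("per-dimension attributions:", [round(a, 5) for a in attributions])
print("sum of attributions:       ", round(sum(attributions), 5))
print("F(x) - F(baseline):        ", round(toy_model(x) - toy_model(baseline), 5))
```

The last two printed values should be nearly identical. That is IG's completeness property: the attributions add up to the change in the output between the baseline and the input, which is what the relation explored in this notebook rests on.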

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import grad
import numpy as np
from captum.attr import LayerIntegratedGradients,IntegratedGradients
from IPython.display import Image

In [2]:
input_dim = 2
inData = torch.rand((1,input_dim),requires_grad=True) # shape is [1,2]. Imagine this to be a word with a 2-dimensional vector [batch_size,num_dim]
baseline = torch.rand((1,input_dim),requires_grad=True)
target_index = 1 # index of second class
num_steps = 10

In [3]:
class simpleNN(nn.Module):
    def __init__(self):
        super(simpleNN,self).__init__()
        self.input_linear = nn.Linear(in_features=input_dim,out_features=3,bias=True) # takes 2 features and outputs 3 
        self.hidden= nn.Linear(in_features=3,out_features=2,bias=True) #one output for each class

        
    def forward(self,x):
        x = F.relu(self.input_linear(x))
        x= torch.sigmoid(self.hidden(x))
        x = F.softmax(x,dim=1)
        return x

In [4]:
def predict_i(inputs):
    probs = net(inputs) # softmax probabilities; a thin wrapper (Captum would also accept the model `net` itself as forward_func)
    return probs

In [5]:
net = simpleNN()
net

simpleNN(
  (input_linear): Linear(in_features=2, out_features=3, bias=True)
  (hidden): Linear(in_features=3, out_features=2, bias=True)
)

## Layer Integrated Gradients

In [6]:
lig = LayerIntegratedGradients(predict_i, net.input_linear)
attributions_dim_linear_layer = lig.attribute(inputs=inData,
                                              baselines=baseline,
                                              n_steps=num_steps,
                                              target=target_index)
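As far as I understand Captum's implementation, LIG does not move along a path in input space. It runs IG in the layer's own activation space: it takes the layer's output for the input and for the baseline, interpolates between those two activations, and differentiates the target output with respect to that activation. The sketch below mimics that idea without PyTorch or Captum, on a network shaped like `simpleNN`, and compares it with input-space IG. The weights, the input and baseline values, and all helper names are hypothetical, and gradients come from central finite differences with a midpoint Riemann sum, so the numbers will not match the notebook's.

```python
import math
from typing import Callable, List

Vector = List[float]

# Hypothetical fixed parameters for a 2 -> 3 -> 2 network shaped like simpleNN
W1 = [[0.5, -0.4], [-0.3, 0.2], [0.7, 0.6]]
B1 = [0.1, -0.2, 0.05]
W2 = [[0.3, -0.6, 0.8], [-0.5, 0.4, 0.2]]
B2 = [0.0, 0.1]
TARGET = 1  # second class, as in the notebook


def input_linear(x: Vector) -> Vector:
    """Affine layer h = W1 x + B1 (the layer we attribute to)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, B1)]


def rest_of_net(h: Vector) -> float:
    """ReLU -> hidden linear -> sigmoid -> softmax, returning the target probability."""
    a = [max(0.0, v) for v in h]
    z = [sum(w * ai for w, ai in zip(row, a)) + b for row, b in zip(W2, B2)]
    s = [1.0 / (1.0 + math.exp(-v)) for v in z]
    exps = [math.exp(v) for v in s]
    return exps[TARGET] / sum(exps)


def full_net(x: Vector) -> float:
    return rest_of_net(input_linear(x))


def numerical_grad(f: Callable[[Vector], float], x: Vector, eps: float = 1e-5) -> Vector:
    """Central finite-difference gradient of the scalar function f at x."""
    grads = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


def integrated_gradients(f: Callable[[Vector], float], x: Vector,
                         baseline: Vector, n_steps: int = 50) -> Vector:
    """IG via a midpoint Riemann sum along the straight line baseline -> x."""
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            avg_grad[i] += g / n_steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x = [0.8, 0.3]
baseline = [0.2, 0.9]

# IG: interpolate in input space and differentiate the full network
ig = integrated_gradients(full_net, x, baseline, n_steps=100)
# LIG: interpolate between the layer's activations and differentiate the rest of the network
lig = integrated_gradients(rest_of_net, input_linear(x), input_linear(baseline), n_steps=100)

print("IG per input dimension:", [round(a, 5) for a in ig])
print("LIG per neuron:        ", [round(a, 5) for a in lig])
print("sums:", round(sum(ig), 6), round(sum(lig), 6))
```

With these made-up weights the second neuron stays negative along the whole path, so the ReLU zeroes its gradient and its attribution comes out as 0; a similar effect may explain the `-0.0` printed by the LIG cell. The two sums agree to many decimal places because the layer is affine, so every interpolation point in input space maps exactly onto the corresponding point in activation space.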

In [8]:
print("Attribution of each neuron in the linear layer")

for att in attributions_dim_linear_layer[0]:
    print("\t", round(att.item(),5))

Attribution of each neuron in the linear layer
	 -0.00461
	 -0.0
	 0.00245


## Integrated Gradients

In [10]:
ig = IntegratedGradients(forward_func=predict_i)
attributions_summary = ig.attribute(inputs=inData,
                                    baselines=baseline,
                                    n_steps=num_steps,
                                    target=target_index)
print("Attributions associated with each dimension of the word are")
for att in attributions_summary[0]:
    print("\t", round(att.item(),5))

Attributions associated with each dimension of the word are
	 -0.00015
	 -0.00201


## Relating IG and LIG

In [11]:
print("Sum of the IG attributions over the input dimensions")
print(round(torch.sum(attributions_summary[0]).item(),5))

Sum of the IG attributions over the input dimensions
-0.00216


In [12]:
print("Sum of the LIG attributions over the neurons of the linear layer")
print(round(torch.sum(attributions_dim_linear_layer[0]).item(),5))

Sum of the LIG attributions over the neurons of the linear layer
-0.00216


In [13]:
# compare with a tolerance rather than exact equality of rounded floats
assert torch.allclose(torch.sum(attributions_summary[0]), torch.sum(attributions_dim_linear_layer), atol=1e-5)

**Thus verified numerically. Voilà!**

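This follows from the chain rule used in backpropagation. Because the linear layer is affine, the gradient of the output with respect to each input dimension is a weighted sum of the gradients at the layer's neurons (weighted by the layer's weights), and the gap between the input and the baseline maps through the same weights. The attribution assigned to the linear layer is therefore redistributed across the input dimensions according to how much each dimension contributed, without changing the total. Both totals also approximate the difference in the target output between the input and the baseline, which is IG's completeness property.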
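The argument can be written out as a short derivation for the setting of this notebook, where the attributed layer is the first, affine one. The notation is mine, not Captum's: $h(x) = Wx + b$ stands for `net.input_linear`, $G$ for the rest of the network (ReLU, `hidden`, sigmoid, softmax, read at the target class), $F = G \circ h$, and $x'$ for the baseline.

$$
\begin{aligned}
\mathrm{IG}_i(x) &= (x_i - x'_i)\int_0^1 \frac{\partial F}{\partial x_i}\big(x' + \alpha(x - x')\big)\,d\alpha,\\
\mathrm{LIG}_j(x) &= (h_j - h'_j)\int_0^1 \frac{\partial G}{\partial h_j}\big(h' + \alpha(h - h')\big)\,d\alpha,\qquad h = h(x),\; h' = h(x').
\end{aligned}
$$

Since $h$ is affine, $h\big(x' + \alpha(x - x')\big) = h' + \alpha(h - h')$ and the bias cancels in $h - h' = W(x - x')$. The chain rule gives $\frac{\partial F}{\partial x_i} = \sum_j W_{ji}\,\frac{\partial G}{\partial h_j}$, so

$$
\sum_i \mathrm{IG}_i(x) = \int_0^1 \sum_j \frac{\partial G}{\partial h_j}\big(h' + \alpha(h - h')\big)\sum_i W_{ji}(x_i - x'_i)\,d\alpha = \sum_j (h_j - h'_j)\int_0^1 \frac{\partial G}{\partial h_j}\big(h' + \alpha(h - h')\big)\,d\alpha = \sum_j \mathrm{LIG}_j(x).
$$

Because the identity holds at every point of the path, it should also survive discretisation when both methods use the same `n_steps` and integration method, as they do here, which would explain why the two sums match to five decimals with only `num_steps = 10`. For a layer that is not affine, a straight line in input space generally maps to a curve in activation space, so the two sums would only agree approximately, through completeness.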