In this notebook, I am going to show:
1. Results from the library (Captum) implementation of Layer Integrated Gradients
2. A barebones implementation of the logic behind Layer Integrated Gradients
3. A simplified version of the library code

I will start with a simple neural net and work my way up. The final cell shows the results from all three methods. We will be using hooks; links to study materials on hooks are listed at the end.

In [1]:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import grad
import numpy as np
from captum.attr import LayerIntegratedGradients,IntegratedGradients
from IPython.display import Image
from torch.nn import Module
from typing import Any, Callable, Dict, List, Tuple, Union, cast, overload
from torch import Tensor, device
import threading
import torch

## Simple Neural Net

Using Layer Integrated Gradients, we will try to find out how much each neuron of the linear layer (input_linear) contributes to the final output. For the sake of simplicity, we assume a single CPU device.


In [2]:
input_dim = 2

In [3]:
class simpleNN(nn.Module):
    def __init__(self):
        super(simpleNN,self).__init__()
        self.input_linear = nn.Linear(in_features=input_dim,out_features=3,bias=True) # takes 2 features and outputs 3 
        self.hidden= nn.Linear(in_features=3,out_features=2,bias=True) #one output for each class

        
    def forward(self,x):
        x = F.relu(self.input_linear(x))
        x= torch.sigmoid(self.hidden(x))
        x = F.softmax(x,dim=1)
        return x

In [4]:
inData = torch.rand((1,input_dim),requires_grad=True) # shape is [1,2]. Imagine this to be a word with a 2-dimensional vector [batch_size,num_dim]
baseline = torch.rand((1,input_dim),requires_grad=True)
target_index = 1 # index of second class
num_steps = 10

In [5]:
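def predict_i(inputs):
    # thin wrapper around the global net, used as the forward function for Captum
    # note: the network ends in a softmax, so these are class probabilities rather than raw logits
    probs = net(inputs)
    return probs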

In [6]:
net = simpleNN()
net

simpleNN(
  (input_linear): Linear(in_features=2, out_features=3, bias=True)
  (hidden): Linear(in_features=3, out_features=2, bias=True)
)

## Captum Library Code

In [7]:
lig = LayerIntegratedGradients(predict_i,net.input_linear)

In [8]:
attributions_dim_linear_layer = lig.attribute(inputs=(inData),
                                  baselines=(baseline),
                                   n_steps=num_steps,
                                   target =target_index)

In [9]:
print("Attribution of each neuron in the linear layer")
print(attributions_dim_linear_layer)

Attribution of each neuron in the linear layer
tensor([[-0.0020, -0.0035, -0.0004]], dtype=torch.float64,
       grad_fn=<MulBackward0>)


## BareBones LayerIntegratedGradients Method

1. Take the input, run a forward pass through the network, and record the value of each neuron of the linear layer for that input. Do the same for the baseline.
2. Mimic Integrated Gradients, but instead of integrating from the network's input, short-circuit the network with a hook at the linear layer. This means that instead of letting the linear layer compute the values of its neurons, we supply those values ourselves and take gradients with respect to them.
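
As a minimal sketch of the first step, a forward hook can record what a layer produced during an ordinary forward pass. The tiny `toy` model and the `captured` dict below are hypothetical names used only for illustration; they are not the notebook's network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model, used only to illustrate the hook mechanics
toy = nn.Sequential(nn.Linear(2, 3), nn.ReLU(), nn.Linear(3, 2))
captured = {}

def save_output_hook(module, hook_inputs, hook_output):
    # Called right after module.forward(); returning None leaves the output unchanged
    captured["first_linear"] = hook_output.detach().clone()

handle = toy[0].register_forward_hook(save_output_hook)
try:
    x = torch.rand(1, 2)
    toy(x)
finally:
    handle.remove()  # always remove the hook so later forward passes are unaffected

print(captured["first_linear"].shape)
```

The `try`/`finally` mirrors the pattern used in the function below: the hook is removed even if the forward pass raises.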

In [44]:
# this function returns the output of the given layer, keyed by device
def run_forward_get_layer_output(forward_fn,inputs,layer_which_needs_hook):
    saved_layer = {}
    lock = threading.Lock()

    # a forward hook is registered on a module and is automatically fed the input and output of that module
    def forward_hook(module, inp, out=None):
        eval_tsrs = (out,)
        # the lock guards against concurrent writes when the model runs on several devices
        # (one thread per device); with a single device, as here, it is not essential
        with lock:
            # saved_layer is only mutated, never rebound, so no nonlocal declaration is needed
            saved_layer[eval_tsrs[0].device] = tuple(
                        eval_tsr.clone() for eval_tsr in eval_tsrs
                    ) # save the module's output in the dict of the enclosing function
    hook = None
    try:
        hook = layer_which_needs_hook.register_forward_hook(forward_hook)
        output = forward_fn(inputs[0])
    finally:
        # always remove the hook, even if the forward pass fails
        if hook is not None:
            hook.remove()
    if len(saved_layer) == 0:
        raise AssertionError("Forward hook did not obtain any outputs for given layer")
    return saved_layer
    

In [11]:
inputs = (inData,) # converting this to a tuple
inputs_layer = run_forward_get_layer_output(
                        predict_i,
                        inputs,
                        net.input_linear)
print(inputs_layer)

{device(type='cpu'): (tensor([[ 0.2989,  0.1808, -0.1334]], grad_fn=<CloneBackward>),)}


#### Understanding the first hook

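Each value is simply the weighted sum of the inputs plus the bias of that neuron, i.e. the output of the linear layer before the ReLU is applied. For the first neuron: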

In [12]:
neuron_index = 0
(net.input_linear.weight[neuron_index][0] *inputs[0][0][0] ) + (net.input_linear.weight[neuron_index][1] *inputs[0][0][1] ) \
+ net.input_linear.bias[neuron_index]

tensor(0.2989, grad_fn=<AddBackward0>)

*Or, equivalently, as a dot product:*

In [13]:
neuron_index = 0
torch.matmul(inputs[0][0],net.input_linear.weight[neuron_index]) + net.input_linear.bias[neuron_index]

tensor(0.2989, grad_fn=<AddBackward0>)

In [14]:
neuron_index = 1
torch.matmul(inputs[0][0],net.input_linear.weight[neuron_index]) + net.input_linear.bias[neuron_index]

tensor(0.1808, grad_fn=<AddBackward0>)

In [15]:
neuron_index = 2
torch.matmul(inputs[0][0],net.input_linear.weight[neuron_index]) + net.input_linear.bias[neuron_index]

tensor(-0.1334, grad_fn=<AddBackward0>)
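
The three per-neuron computations above can also be done in one call. A minimal sketch, assuming a freshly initialised `nn.Linear(2, 3)` named `layer` as a stand-in for `net.input_linear` (so the numbers will differ from the outputs above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(in_features=2, out_features=3)  # stand-in for net.input_linear
x = torch.rand(1, 2)

# One neuron at a time: dot product of the input with that neuron's weight row, plus its bias
per_neuron = torch.stack(
    [torch.matmul(x[0], layer.weight[i]) + layer.bias[i] for i in range(3)]
)

# All neurons at once: x @ W^T + b, which is what nn.Linear computes
all_at_once = F.linear(x, layer.weight, layer.bias)

print(per_neuron)
print(all_at_once[0])
```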

#### Obtaining output of linear layer for baseline using first hook

In [16]:
base_inputs = (baseline,) # converting this to a tuple
baselines_layer = run_forward_get_layer_output(
                        predict_i,
                        base_inputs,
                        net.input_linear)
print(baselines_layer)

{device(type='cpu'): (tensor([[-0.0988,  0.0514,  0.1107]], grad_fn=<CloneBackward>),)}


#### Prepping to run Integrated Gradients

In [17]:
dim1 = torch.linspace(0, 1, num_steps).unsqueeze(-1) #[10, 1] i.e. 10 rows and 1 column: equally spaced points between 0 & 1

In [18]:
linear_output = list(inputs_layer.values())[0][0]
print(linear_output)
linear_output_base = list(baselines_layer.values())[0][0]
print(linear_output_base)

tensor([[ 0.2989,  0.1808, -0.1334]], grad_fn=<CloneBackward>)
tensor([[-0.0988,  0.0514,  0.1107]], grad_fn=<CloneBackward>)


In [19]:
delta_points_np = (linear_output_base + dim1 * (linear_output - linear_output_base)).detach().numpy() #(10, num_neuron). Here dim1 is broadcast
print("In the data points shown below, the first row is the linear-layer output for the baseline and the last row is the linear-layer output for the input data")
print(delta_points_np)

In the data points shown below, the first row is the linear-layer output for the baseline and the last row is the linear-layer output for the input data
[[-0.09882952  0.05136591  0.11071467]
 [-0.0546359   0.06574471  0.08358987]
 [-0.01044229  0.08012351  0.05646508]
 [ 0.03375132  0.09450231  0.02934029]
 [ 0.07794493  0.10888111  0.0022155 ]
 [ 0.12213854  0.12325991 -0.0249093 ]
 [ 0.16633216  0.13763872 -0.05203409]
 [ 0.21052575  0.1520175  -0.07915888]
 [ 0.25471938  0.1663963  -0.10628367]
 [ 0.29891297  0.1807751  -0.13340846]]


#### Second Hook for computing gradients for these 10 points 

The model's forward function still takes the original input. By adding a hook, we short-circuit the forward pass: instead of using the output computed by the linear layer, the hook returns the value we ask it to return. This value is one row of delta_points_np, which we computed from the linear-layer outputs of the input and the baseline.
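
A minimal sketch of this short-circuit, assuming a hypothetical `toy` model rather than the notebook's `net`: when a forward hook returns a value, PyTorch uses it in place of the module's output, so gradients can be taken with respect to the substituted tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model, used only to illustrate overriding a layer's output
toy = nn.Sequential(nn.Linear(2, 3), nn.ReLU(), nn.Linear(3, 2))

# The value we want the first layer to "output", regardless of the real input
substitute = torch.tensor([[0.5, -0.2, 0.1]], requires_grad=True)

def override_hook(module, hook_inputs, hook_output):
    # A non-None return value replaces the module's output
    return substitute

handle = toy[0].register_forward_hook(override_hook)
try:
    out_a = toy(torch.rand(1, 2))
    out_b = toy(torch.rand(1, 2))  # different input, same substituted activation
finally:
    handle.remove()

# Gradient of one output unit with respect to the substituted activation
(grad_wrt_layer,) = torch.autograd.grad(out_a[0, 1], substitute)
print(torch.equal(out_a, out_b))
print(grad_wrt_layer)
```

Both forward passes give the same output because everything after the first layer only sees `substitute`. The gradient entry for the negative activation is zero because the ReLU blocks it, which is the same pattern as the zeros in the gradients listed further down.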

In [20]:
gradient_l = []
with torch.autograd.set_grad_enabled(True):
    # instead of starting from input, this hook here just returns the output of the first linear layer
    def layer_forward_hook(module, hook_inputs, hook_outputs=None):
        device = hook_outputs[0].device #cpu
        return scattered_inputs_dict[device]
    hook = net.input_linear.register_forward_hook(layer_forward_hook)
    for row in delta_points_np:
        delta_input = torch.tensor(row).unsqueeze(0).requires_grad_() # [1, num_neurons]; avoids building a tensor from a list of numpy arrays
        scattered_inputs = (delta_input,)
        scattered_inputs_dict = {
                        scattered_input[0].device: scattered_input
                        for scattered_input in scattered_inputs
                    }
        output = net(inData)
        target_Dp = output[0][target_index].unsqueeze(-1) # this extracts the probability of the class we are interested in 
        grads = torch.autograd.grad(target_Dp.unsqueeze(-1), delta_input)
        gradient_l.append(grads[0])
    hook.remove()

In [21]:
gradient_l

[tensor([[ 0.0000, -0.0266,  0.0033]]),
 tensor([[ 0.0000, -0.0266,  0.0033]]),
 tensor([[ 0.0000, -0.0267,  0.0033]]),
 tensor([[-0.0065, -0.0267,  0.0033]]),
 tensor([[-0.0065, -0.0267,  0.0033]]),
 tensor([[-0.0065, -0.0268,  0.0000]]),
 tensor([[-0.0065, -0.0268,  0.0000]]),
 tensor([[-0.0065, -0.0268,  0.0000]]),
 tensor([[-0.0065, -0.0268,  0.0000]]),
 tensor([[-0.0066, -0.0268,  0.0000]])]

Next we define a function that takes a list of gradient tensors and returns their element-wise mean

In [22]:
grads[0]

tensor([[-0.0066, -0.0268,  0.0000]])

In [23]:
def getTensorMean(TensorList):
    stacked = torch.stack(TensorList) # [num_steps, 1, num_neurons]
    return torch.mean(stacked,dim=0)

In [24]:
gradient_l_mean = getTensorMean(gradient_l)
print("The means of gradients of 10 vectors between input and baseline")
print(gradient_l_mean)

The means of gradients of 10 vectors between input and baseline
tensor([[-0.0046, -0.0267,  0.0017]])


Next we take the difference between the linear-layer output for the input and for the baseline. This is the direction of the straight-line path from the baseline to the input

In [25]:
diff = (linear_output-linear_output_base)
diff

tensor([[ 0.3977,  0.1294, -0.2441]], grad_fn=<SubBackward0>)

Multiplying the mean of the gradients by this difference gives the attributions

In [26]:
attributions = gradient_l_mean * diff
print("Attributions from the barebones implementation")
for att in attributions[0]:
    print("\t", round(att.item(),5))

Attributions from the barebones implementation
	 -0.00182
	 -0.00346
	 -0.0004
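
The recipe above (average the gradients along the straight path, then multiply by the difference between input and baseline) can be checked on a function whose answer is known. A minimal sketch, assuming a hypothetical function `toy_f(z) = sum(z**2)` in place of the layers above the linear layer and made-up activations; for this function the gradient is linear along the path, so the equally spaced average is exact and each attribution equals `z_i**2 - b_i**2`.

```python
import torch

def toy_f(z):
    # Hypothetical stand-in for "the rest of the network" above the layer
    return (z ** 2).sum()

toy_out = torch.tensor([0.3, 0.2, -0.1])    # made-up layer activation for the input
toy_base = torch.tensor([-0.1, 0.05, 0.1])  # made-up layer activation for the baseline
n_points = 10

toy_grads = []
for alpha in torch.linspace(0, 1, n_points):
    # Point on the straight line from baseline to input
    point = (toy_base + alpha * (toy_out - toy_base)).requires_grad_()
    (g,) = torch.autograd.grad(toy_f(point), point)
    toy_grads.append(g)

toy_mean_grad = torch.stack(toy_grads).mean(dim=0)
toy_attr = toy_mean_grad * (toy_out - toy_base)

print(toy_attr)
print(toy_attr.sum(), toy_f(toy_out) - toy_f(toy_base))
```

The two numbers on the last line agree: the attributions add up to the change in the function's value between baseline and input. For a real network the equally spaced average is only an approximation of the path integral.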


## Simplified Library Code

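We will be reusing the following from the previous sections:
1. linear_output of input and base
2. layer_forward_hook

The key difference is:
1. Instead of taking equally spaced points, we generate the points and their weights (step sizes) with Gauss-Legendre quadrature, which is Captum's default integration method. This is the likely reason the barebones result differs slightly from the library result.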

#### Preparing Datapoints and target vectors 

In [27]:
step_sizes = list(0.5 * np.polynomial.legendre.leggauss(num_steps)[1])
alphas = list(0.5 * (1 + np.polynomial.legendre.leggauss(num_steps)[0]))
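
To see why the choice of points matters, here is a minimal sketch comparing the two rules on an integral with a known value: the integral of exp(x) over [0, 1], which equals e - 1. The integrand is a hypothetical stand-in for the gradient along the path; the rescaling of nodes and weights from [-1, 1] to [0, 1] is the same as in the cell above.

```python
import numpy as np

n_points = 10
exact = np.e - 1.0  # integral of exp(x) over [0, 1]

# Equally spaced points with equal weights, as in the barebones method
uniform_alphas = np.linspace(0.0, 1.0, n_points)
uniform_estimate = np.mean(np.exp(uniform_alphas))

# Gauss-Legendre nodes and weights, rescaled from [-1, 1] to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(n_points)
gl_alphas = 0.5 * (1.0 + nodes)
gl_step_sizes = 0.5 * weights  # these sum to 1, like the 1/n weights of a mean
gl_estimate = np.sum(gl_step_sizes * np.exp(gl_alphas))

print("uniform error:       ", abs(uniform_estimate - exact))
print("Gauss-Legendre error:", abs(gl_estimate - exact))
```

Because the step sizes sum to 1, the weighted sum of gradients plays the same role as the plain mean in the barebones method, only with a more accurate quadrature rule for smooth integrands.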

In [28]:
target = torch.cat(num_steps* [torch.tensor([target_index])]) #[num_steps] [10]
target = target.reshape(num_steps, 1) #[num_steps,1] [10, 1]

#### Running through IG

In [29]:
delta_points =torch.cat([linear_output_base + alpha * (linear_output - linear_output_base) for alpha in alphas],dim=0).requires_grad_() #[10,3]

scattered_inputs is the value of the linear layer for each delta_point

In [30]:
scattered_inputs = (delta_points,)
scattered_inputs
scattered_inputs_dict = {
                scattered_input[0].device: scattered_input
                for scattered_input in scattered_inputs
            }
scattered_inputs_dict

{device(type='cpu'): tensor([[-0.0936,  0.0531,  0.1075],
         [-0.0720,  0.0601,  0.0942],
         [-0.0351,  0.0721,  0.0716],
         [ 0.0139,  0.0880,  0.0416],
         [ 0.0704,  0.1064,  0.0068],
         [ 0.1296,  0.1257, -0.0295],
         [ 0.1862,  0.1441, -0.0642],
         [ 0.2352,  0.1600, -0.0943],
         [ 0.2721,  0.1720, -0.1169],
         [ 0.2937,  0.1791, -0.1302]], grad_fn=<CatBackward>)}

As mentioned earlier, the model still takes the actual input with input_dim features; the hook then short-circuits it with scattered_inputs. So we need to replicate our input n_steps times to match the number of rows in delta_points

In [31]:
inData_rep = torch.cat(num_steps* [inData]) #[num_steps, input_dim]

In [32]:
with torch.autograd.set_grad_enabled(True):
        # instead of starting from input, this hook here just returns the output of the first linear layer
        def layer_forward_hook(module, hook_inputs, hook_outputs=None):
            device = None
            if hook_outputs is not None and len(hook_outputs) > 0:
                device = hook_outputs[0].device #cpu
            else:
                params = list(module.parameters())
                device = params[0].device #cpu
            return scattered_inputs_dict[device]
        hook = net.input_linear.register_forward_hook(layer_forward_hook)
        output = net(inData_rep)
        hook.remove()
        target_Dp = torch.gather(output, 1, target)

In [33]:
grads = torch.autograd.grad(torch.unbind(target_Dp), delta_points) # torch.unbind splits target_Dp along dim 0 into a tuple of per-step tensors
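
A small sketch of what `torch.gather` and `torch.unbind` do in the two cells above, using made-up numbers. The names `scores_in`, `picked_class` and `toy_grad` are hypothetical and chosen so they do not clash with the notebook's variables.

```python
import torch

# Hypothetical "model output": 3 rows (steps) with 2 class scores each
scores_in = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], requires_grad=True)
scores = scores_in ** 2

# One target class index per row, shape [3, 1]
picked_class = torch.tensor([[1], [0], [1]])
picked = torch.gather(scores, 1, picked_class)  # shape [3, 1]: scores[i, picked_class[i]]

# unbind splits along dim 0 into a tuple of 3 single-element tensors;
# autograd.grad accepts that tuple and accumulates the gradients of all of them
(toy_grad,) = torch.autograd.grad(torch.unbind(picked), scores_in)

print(picked)
print(toy_grad)
```

Each row's gradient is non-zero only at the class that was picked for that row, so a single call yields the per-step gradients for all rows at once.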

In [34]:
# flattening grads so that we can multiply them by the step sizes
# calling contiguous() so that view() operates on a contiguous memory layout
n_steps = num_steps
scaled_grads = [
    grad.contiguous().view(n_steps, -1)
    * torch.tensor(step_sizes).view(n_steps, 1).to(grad.device)
    for grad in grads
]
scaled_grads

[tensor([[ 0.0000, -0.0009,  0.0001],
         [ 0.0000, -0.0020,  0.0002],
         [ 0.0000, -0.0029,  0.0004],
         [-0.0009, -0.0036,  0.0004],
         [-0.0010, -0.0039,  0.0005],
         [-0.0010, -0.0040,  0.0000],
         [-0.0009, -0.0036,  0.0000],
         [-0.0007, -0.0029,  0.0000],
         [-0.0005, -0.0020,  0.0000],
         [-0.0002, -0.0009,  0.0000]], dtype=torch.float64)]

In [35]:
from torch import Tensor
from typing import Tuple
def _reshape_and_sum(
    tensor_input: Tensor, num_steps: int, num_examples: int, layer_size: Tuple[int, ...]
) -> Tensor:
    # Used for attribution methods which perform integration
    # Sums across integration steps by reshaping tensor to
    # (num_steps, num_examples, (layer_size)) and summing over
    # dimension 0. Returns a tensor of size (num_examples, (layer_size))
    return torch.sum(
        tensor_input.reshape((num_steps, num_examples) + layer_size), dim=0
    )
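
A small usage sketch of the same reshape-and-sum on made-up numbers, assuming 3 steps and 2 examples. It shows the row layout this relies on: all examples for step 0 come first, then all examples for step 1, and so on. The computation is inlined so that the snippet runs on its own; the `toy_` names are hypothetical.

```python
import torch

toy_steps, toy_examples, toy_layer_size = 3, 2, (2,)

# Rows 0-1 belong to step 0, rows 2-3 to step 1, rows 4-5 to step 2
toy_scaled = torch.arange(12, dtype=torch.float32).reshape(6, 2)

# Same operation as _reshape_and_sum: reshape to (steps, examples, layer) and sum over steps
toy_total = torch.sum(
    toy_scaled.reshape((toy_steps, toy_examples) + toy_layer_size), dim=0
)
print(toy_total)
```

In this notebook there is a single example, so the reshape simply turns the [10, 3] tensor into [10, 1, 3] before summing over the 10 steps.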

In [36]:
# aggregates across all steps for each tensor in the input tuple
# total_grads has the same dimensionality as inputs
total_grads = [
    _reshape_and_sum(
        tensor_input=scaled_grad, num_steps=n_steps, num_examples=grad.shape[0] // n_steps, layer_size=grad.shape[1:] # // is integer division
    )
    for (scaled_grad, grad) in zip(scaled_grads, grads)
]
print("Total gradients: weighted sum of the gradients over all steps")
for tg in total_grads[0][0]:
    print("\t",tg.item())

Total gradients: weighted sum of the gradients over all steps
	 -0.005100629017714776
	 -0.02673791069551692
	 0.0016568139610821965


In [39]:
print ("Comparing it with the mean we generated with barebones method in cell 24")
for tg in gradient_l_mean[0]:
    print("\t",tg.item())

Comparing it with the mean we generated with barebones method in cell 24
	 -0.0045677898451685905
	 -0.026735519990324974
	 0.0016566950362175703


In [40]:
# computes the attribution for each tensor in the tuple of layer outputs
# attributions have the same shape as the layer output
attributions_expanded_code = tuple(
    total_grad * (layer_out - layer_base)
    for total_grad, layer_out, layer_base in zip(total_grads, (linear_output,), (linear_output_base,))
)


In [43]:
print("Attributions from the Captum library")
for att in attributions_dim_linear_layer[0]:
    print("\t", round(att.item(),5))

print("Attributions from the barebones implementation")
for att in attributions[0]:
    print("\t", round(att.item(),5))
    
print("Attributions from the simplified library code")
for att in attributions_expanded_code[0][0]:
    print("\t", round(att.item(),5))

Attributions from the Captum library
	 -0.00203
	 -0.00346
	 -0.0004
Attributions from the barebones implementation
	 -0.00182
	 -0.00346
	 -0.0004
Attributions from the simplified library code
	 -0.00203
	 -0.00346
	 -0.0004


## Study Materials
1. https://github.com/Paperspace/PyTorch-101-Tutorial-Series
2. https://blog.paperspace.com/pytorch-101-advanced/ 
3. https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/
4. https://blog.paperspace.com/pytorch-memory-multi-gpu-debugging/
5. https://blog.paperspace.com/pytorch-hooks-gradient-clipping-debugging/