<a href="https://colab.research.google.com/github/bhadreshpsavani/100daysofNLP/blob/main/notebooks/Distilbert_Model_Interpretability_With_Captum.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Interpretability with Captum

In NLP we often build models with strong performance, but it is just as important to understand how a model arrives at its predictions: which parts of the text a neuron, a layer, or the whole model focuses on.

For PyTorch models, Captum is a good library for this kind of analysis.

In [1]:
# install dependencies
!pip install -q transformers captum


In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import torch
import torch.nn as nn

from transformers import AutoTokenizer, AutoModelForSequenceClassification

from captum.attr import visualization as viz
from captum.attr import LayerConductance, LayerIntegratedGradients

In [3]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

device(type='cuda', index=0)

In [4]:
# load model
model_path = "bhadresh-savani/distilbert-base-uncased-emotion"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)


In [5]:
text = "It is important to us to include, empower and support humans of all kinds."
inputs = tokenizer(text, return_tensors='pt')
all_tokens = tokenizer.tokenize(text)
inputs, all_tokens

({'input_ids': tensor([[  101,  2009,  2003,  2590,  2000,  2149,  2000,  2421,  1010,  7861,
          11452,  1998,  2490,  4286,  1997,  2035,  7957,  1012,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])},
 ['it',
  'is',
  'important',
  'to',
  'us',
  'to',
  'include',
  ',',
  'em',
  '##power',
  'and',
  'support',
  'humans',
  'of',
  'all',
  'kinds',
  '.'])
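
Note that `tokenizer.tokenize` returns 17 word pieces, while `input_ids` holds 19 ids: the encoded input also carries the special tokens `[CLS]` (id 101) and `[SEP]` (id 102). Any per-position score computed from `input_ids` therefore has two more entries than `all_tokens`. A minimal sketch of one way to line the two up, using the values copied from the output above (the name `aligned_tokens` is introduced here only for illustration):

```python
# Values copied from the cell output above.
input_ids = [101, 2009, 2003, 2590, 2000, 2149, 2000, 2421, 1010, 7861,
             11452, 1998, 2490, 4286, 1997, 2035, 7957, 1012, 102]
all_tokens = ['it', 'is', 'important', 'to', 'us', 'to', 'include', ',',
              'em', '##power', 'and', 'support', 'humans', 'of', 'all',
              'kinds', '.']

# Add the special tokens by hand so there is exactly one token per input id.
aligned_tokens = ['[CLS]'] + all_tokens + ['[SEP]']

print(len(all_tokens), len(input_ids), len(aligned_tokens))
```

With the tokenizer loaded, `tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())` should give the same 19-token list directly.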

In [6]:
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']

In [7]:
def predict(inputs):
  inputs = inputs.to(device)
  output = model(inputs)
  return output['logits']

In [8]:
scores = predict(inputs=input_ids)
scores

tensor([[-1.5874,  6.0608, -0.4590, -1.1665, -2.2359, -2.7324]],
       device='cuda:0', grad_fn=<AddmmBackward>)
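
The scores above are raw logits, one per class. As a rough sketch of how they translate into a prediction, the snippet below applies a softmax in plain Python to the logits copied from the output above and looks up the winning index in the label map that `model.config.id2label` reports for this model. The helper `softmax` is written here for illustration; in the notebook itself `torch.softmax` does the same job.

```python
import math

# Logits copied from the output above; label map as reported by model.config.id2label.
logits = [-1.5874, 6.0608, -0.4590, -1.1665, -2.2359, -2.7324]
id2label = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```

The largest logit sits at index 1, so the predicted emotion is `joy`, with a probability above 0.99.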

In [9]:
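lig = LayerIntegratedGradients(predict, model.distilbert.embeddings)

# target=1 attributes with respect to the logit of class index 1.
# No baseline is passed, so Captum falls back to an all-zero input_ids tensor (token id 0, [PAD] in this vocabulary).
attributions, delta = lig.attribute(inputs=input_ids,
                                    target=1,
                                    return_convergence_delta=True)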
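
To make the idea behind this call more concrete: integrated gradients attributes a prediction by accumulating gradients along a straight path from a baseline to the actual input. The convergence delta measures how far the attributions are from summing to the difference between the output at the input and the output at the baseline. The following is a minimal sketch on a toy function, not on the DistilBERT model. The function `f`, its hand-written gradient `grad_f`, and the helper `integrated_gradients` are all illustrative assumptions, and Captum's own implementation differs in details such as how the integral is approximated.

```python
def f(x):
    """Toy scalar model: f(x) = x0^2 + 3*x1."""
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * x[0], 3.0]

def integrated_gradients(x, baseline, n_steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    totals = [0.0] * len(x)
    for step in range(n_steps):
        # Point on the straight line from the baseline to the input.
        alpha = (step + 0.5) / n_steps
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_f(point)):
            totals[i] += g / n_steps
    # Scale the averaged gradients by (input - baseline).
    return [d * t for d, t in zip(diffs, totals)]

x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
delta = sum(attributions) - (f(x) - f(baseline))
print(attributions, delta)
```

For this toy function the attributions come out close to `[4.0, 3.0]`, which sum to `f(x) - f(baseline) = 7`, so the delta is essentially zero. On a real model, a large delta is usually a hint that more integration steps are needed.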

In [10]:
# A helper function to summarize attributions for each word token in the sequence.
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    return attributions
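
The helper above collapses the embedding dimension, so each token ends up with a single score, and then rescales the scores to unit L2 norm. A small sketch of the same two steps in plain Python on made-up numbers (the 3-token, 2-dimensional `toy_attributions` is purely illustrative, not model output):

```python
import math

# Made-up attributions for 3 tokens with a 2-dimensional embedding.
toy_attributions = [[0.5, 0.5], [-1.0, 0.0], [1.5, 0.5]]

# Step 1: sum over the embedding dimension, giving one score per token.
per_token = [sum(row) for row in toy_attributions]

# Step 2: divide by the L2 norm so the score vector has unit length.
norm = math.sqrt(sum(score ** 2 for score in per_token))
normalized = [score / norm for score in per_token]

print(per_token)
print(normalized)
```

The normalization only rescales the scores, so their signs and relative order within the sentence are preserved.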

In [11]:
attributions_sum = summarize_attributions(attributions)

In [12]:
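position_vis = viz.VisualizationDataRecord(
    attributions_sum[1:-1],                      # drop the [CLS]/[SEP] scores so they align with all_tokens
    torch.max(torch.softmax(scores[0], dim=0)),  # predicted probability
    torch.argmax(scores),                        # predicted class
    torch.argmax(scores),                        # "true" class (no gold label here, so the prediction is reused)
    str(1),                                      # class the attributions were computed for
    attributions_sum.sum(),                      # total attribution score
    all_tokens,                                  # tokens to display
    delta)                                       # convergence delta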

print('\033[1m', 'Visualizations', '\033[0m')
viz.visualize_text([position_vis])

[1m Visualizations [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (1.00),1.0,2.46,"it is important to us to include , em ##power and support humans of all kinds ."
,,,,


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (1.00),1.0,2.46,"it is important to us to include , em ##power and support humans of all kinds ."
,,,,


This visualization shows how much importance each token receives with respect to the target class (index 1, which maps to `joy` in the label map below).

In [13]:
model.config.id2label

{0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

## Multi-Embedding attribution:

