<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/Attribution_Longformer_aggregate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook runs [Integrated Gradients](https://arxiv.org/abs/1703.01365) over the entire dataset and aggregates the attributions with respect to the positive class. We aggregate using either the complete Longformer embeddings or the word and position embeddings. The notebook outputs a CSV file containing each token and its aggregated attribution over the entire dataset. That file is used in the [longformer embedding](https://colab.research.google.com/drive/15Zquqi72N2NNusEUXRN53bCKE7qj8KAh?usp=sharing) and [word+position+token_type embeddings](https://colab.research.google.com/drive/1pptTYAJGp7tl0BhVQoTD5RGyQMEVF766) notebooks.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import dependencies

In [None]:
import sys
sys.path.append('/content/drive/My Drive/{}'.format("cogs402longformer/"))

In [None]:
pip install transformers --quiet


In [None]:
pip install captum --quiet


In [None]:
pip install datasets --quiet


In [None]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

## Import model

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# Hugging Face Hub ID of the fine-tuned model (uncomment the second line to use the news model instead)
model_path = 'danielhou13/longformer-finetuned_papers_v2'
# model_path = 'danielhou13/longformer-finetuned-news-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")


Some weights of the model checkpoint at danielhou13/longformer-finetuned_papers_v2 were not used when initializing LongformerForSequenceClassification: ['longformer.embeddings.position_ids']
- This IS expected if you are initializing LongformerForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



## Import Dataset

Here we import the validation split of the papers dataset.

In [None]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["validation"]


To use the news dataset instead, uncomment the line below.

In [None]:
# cogs402_ds = load_dataset("danielhou13/cogs402dataset2")["validation"]

## Getting the Attributions

Integrated Gradients needs a forward function that returns one score per class. We therefore define a custom forward pass that returns the softmaxed logits, which represent the model's predicted probability for each class.

In [None]:
def predict(inputs, position_ids=None, token_type_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   token_type_ids=token_type_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# Returns the class probabilities; the class to attribute (1 = positive, 0 = negative) is selected with `target` in `attribute()`
def custom_forward(inputs, position_ids=None, token_type_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   token_type_ids=token_type_ids,
                   attention_mask=attention_mask)
    return torch.softmax(preds, dim = 1)
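
For intuition, here is a minimal sketch of what the softmax in the custom forward does and how a target index picks one class probability out of it. It is written in plain Python so it runs without the model; the logits are made-up values, not model output.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one example: [negative class, positive class].
example_logits = [0.5, 2.0]
probs = softmax(example_logits)

# Choosing target=1 means the positive-class probability is the scalar
# whose change is attributed to the inputs.
positive_prob = probs[1]
print(probs, positive_prob)
```

With two classes, the positive-class probability equals the sigmoid of the difference between the two logits, so the attributions explain what pushes the positive logit above the negative one.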

The functions below build the input ids, position ids and token_type_ids for the text we want to examine, along with the matching baselines (reference inputs) used by Integrated Gradients.

**Note: The token type ids are built exactly as the Longformer implementation builds them when no token_type_ids are passed in. You only need to create token_type_ids when running Integrated Gradients over multiple embeddings, because each embedding layer needs its own baseline.**

In [None]:
ref_token_id = tokenizer.pad_token_id # Padding token, used to fill the baseline (reference) input
sep_token_id = tokenizer.sep_token_id # Separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id # Classification token, prepended to the text

In [None]:
max_length = 2046  # 2046 text tokens + CLS + SEP = 2048 tokens per input
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids

def construct_input_ref_token_type_pair(input_ids):
    seq_len = input_ids.size(1)

    # same as the tensor the model creates when you do not pass in token_type_ids as input.
    token_type_ids = torch.zeros(seq_len, dtype=torch.long, device=device).unsqueeze(0).expand_as(input_ids)
    
    ref_token_type_ids = torch.zeros_like(token_type_ids, device=device)

    return token_type_ids, ref_token_type_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
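
The position-id rule used above (cumulative count of non-padding tokens, offset by the padding index) can be checked by hand on a short sequence. The sketch below mirrors that rule in plain Python. The ids 0, 1 and 2 for CLS, PAD and SEP and the text token ids are assumptions for illustration; in the notebook they come from the tokenizer.

```python
from itertools import accumulate

def position_ids_from_input_ids(input_ids, padding_idx):
    """Non-padding tokens get padding_idx + 1, + 2, ...; padding keeps padding_idx."""
    mask = [int(tok != padding_idx) for tok in input_ids]
    running = list(accumulate(mask))  # cumulative count of non-padding tokens
    return [count * m + padding_idx for count, m in zip(running, mask)]

# Assumed special-token ids, for illustration only.
CLS, PAD, SEP = 0, 1, 2

input_ids = [CLS, 713, 16, SEP]          # hypothetical two-token text
ref_input_ids = [CLS, PAD, PAD, SEP]     # baseline: text replaced by padding

print(position_ids_from_input_ids(input_ids, PAD))             # [2, 3, 4, 5]
print(position_ids_from_input_ids([CLS, 713, PAD, PAD], PAD))  # [2, 3, 1, 1]
```

The baseline keeps the CLS and SEP tokens and replaces only the text with padding. The reference position ids in the notebook are simply all zeros.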

We set up Layer Integrated Gradients over the Longformer's complete embedding layer (`lig`). We also set up a second variant (`lig2`) over the word, position and token_type embedding layers separately, which we will call multi-embedding.

In [None]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
lig2 = LayerIntegratedGradients(custom_forward,
                                [model.longformer.embeddings.word_embeddings,
                                 model.longformer.embeddings.position_embeddings,
                                 model.longformer.embeddings.token_type_embeddings])

Helper function that sums the attributions over the embedding dimension and divides by their L2 norm, giving one normalized score per token (a vector of length seq_len).

In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions
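
As a worked example of the summarization step, the plain-Python sketch below applies the same two operations (sum over the embedding dimension, then divide by the L2 norm) to a tiny made-up attribution matrix of three tokens by two embedding dimensions. The function name `summarize` and the values are illustrative only.

```python
import math

def summarize(attributions):
    """Sum each token's attributions over the embedding dimension, then L2-normalize."""
    per_token = [sum(row) for row in attributions]
    norm = math.sqrt(sum(v * v for v in per_token))
    return [v / norm for v in per_token]

# Hypothetical attributions: 3 tokens x 2 embedding dimensions.
toy = [[1.0, 2.0], [-2.0, -2.0], [0.0, 0.0]]
scores = summarize(toy)
print(scores)  # [0.6, -0.8, 0.0]
```

Because of the normalization, scores are comparable in scale across examples, but an all-zero attribution tensor has zero norm. In torch that division yields nan values, whereas this plain-Python version would raise `ZeroDivisionError`.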

We iterate over the entire dataset. For each example we build the input_ids, the baseline input_ids, the position_ids and the attention mask, run Integrated Gradients, and summarize the attributions into one score per token. We then store the tokens and their attributions in a dataframe, sum the attributions of duplicate tokens within the example, and append the result to a list of dataframes. The whitespace marker (`Ġ`) is stripped from the tokens later, when the per-example dataframes are combined.

In [None]:
from tqdm import tqdm
aggregate_attrib = []
aggregation_function = {'attribution': 'sum'}

for i in tqdm(range(len(cogs402_ds))):

  #get input ids, position ids and attention mask for integrated gradients
  text = cogs402_ds[i]['text']
  input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
  position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
  attention_mask = construct_attention_mask(input_ids)

  indices = input_ids[0].detach().tolist()
  all_tokens = tokenizer.convert_ids_to_tokens(indices)
  
  # perform integrated gradients
  # second additional argument is None because we don't need token_type_ids here
  # allows us to use the same custom forward for complete embedding and multi-embedding
  attributions = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    additional_forward_args=(position_ids, None, attention_mask),
                                    target=1,
                                    n_steps=50,
                                    internal_batch_size = 2)
  
  #get the attributions
  attributions_sum = summarize_attributions(attributions)
  
  #convert into dataframe
  d = {"tokens":all_tokens, "attribution":attributions_sum[:len(all_tokens)].detach().cpu().numpy()}
  df_attrib = pd.DataFrame(d)

  #aggregate the duplicate tokens
  df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_function)

  #add to list of dataframes
  aggregate_attrib.append(df_attrib)

100%|██████████| 1070/1070 [12:56:06<00:00, 43.52s/it]
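
The `n_steps` argument in the loop controls how many points along the straight path from baseline to input are used to approximate the Integrated Gradients integral. The sketch below shows that approximation on a toy function with a hand-written gradient, using a midpoint Riemann sum. It is an illustration of the idea only: the function `f`, its gradient and the inputs are made up, and Captum's own integration rule may differ.

```python
def f(x):
    """Toy scalar model: f(x) = x0^2 + 3 * x1."""
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * x[0], 3.0]

def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / n_steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [d * g for d, g in zip(diffs, avg_grad)]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attributions = integrated_gradients(x, baseline)
print(attributions, f(x) - f(baseline))
```

The attributions sum to `f(x) - f(baseline)`, the completeness property of Integrated Gradients. More steps give a better approximation at a proportionally higher cost, which is why a full pass over the dataset takes hours.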


Here we have the implementation for the multi-embedding version. The only difference is that we now aggregate two sets of attributions: those for the word embeddings and those for the position embeddings.

**Note: despite passing in the token_type_ids and their baseline as inputs, we will not be able to get attributions for them, because the input and the baseline are the same. It returns a tensor of nan values.**

We create dataframes for both the word and position attributions to store the attributions and their respective tokens. We then aggregate the attributions by token in both dataframes. Finally, we append the word and position dataframes to their own separate lists of dataframes.

In [None]:
# from tqdm import tqdm
# aggregate_attrib = []
# aggregate_pos = []

# aggregation_function = {'attribution': 'sum'}

# for i in tqdm(range(len(cogs402_ds)), position = 0, leave = True):
  
#   #get input_ids, position_ids, and the attention masks for the integrated gradients
#   text = cogs402_ds[i]['text']

#   input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
#   token_type_ids, ref_token_type_ids = construct_input_ref_token_type_pair(input_ids)
#   position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
#   attention_mask = construct_attention_mask(input_ids)

#   indices = input_ids[0].detach().tolist()
#   all_tokens = tokenizer.convert_ids_to_tokens(indices)

#   # compute integrated gradients
#   attributions2 = lig2.attribute(inputs=(input_ids, position_ids, token_type_ids),
#                                baselines=(ref_input_ids, ref_position_ids, ref_token_type_ids),
#                                target=1,
#                                additional_forward_args=attention_mask,
#                                n_steps=20,
#                                internal_batch_size = 2)
  
#   # get the attributions for the words and position ids
#   attributions_word = summarize_attributions(attributions2[0])
#   attributions_position = summarize_attributions(attributions2[1])

#   # convert them both into dataframes 
#   d = {"tokens":all_tokens, "attribution":attributions_word[:len(all_tokens)].cpu()}  
#   d2 = {"tokens":all_tokens, "attribution":attributions_position[:len(all_tokens)].cpu()}  
  
#   df_attrib = pd.DataFrame(d)
#   df_attrib2 = pd.DataFrame(d2)

#   #aggregate the attributions for duplicate tokens
#   df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_function)
#   df_attrib2 = df_attrib2.groupby(df_attrib2['tokens']).aggregate(aggregation_function)

#   aggregate_attrib.append(df_attrib)
#   aggregate_pos.append(df_attrib2)

To get the aggregate attributions for every token over the entire dataset, we concatenate the list of dataframes we stored, sum up the attributions of duplicate tokens, and divide by the number of dataframes in the list (one per example).

In [None]:
def combinedataframe(listframes, aggregation_func):
  df_attrib = pd.concat(listframes)
  df_attrib = df_attrib.reset_index(level=0)
  df_attrib['tokens'] = df_attrib['tokens'].str.replace('Ġ', '')
  df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_func)
  df_attrib['attribution'] = df_attrib['attribution'].div(len(listframes))
  highest_attrib_tokens_all = df_attrib.sort_values(by=['attribution'], ascending=False).reset_index()
  return highest_attrib_tokens_all
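
The same combine step can be traced on toy data without pandas. The sketch below takes per-example token sums, strips the `Ġ` whitespace marker, sums duplicates across examples, divides by the number of examples, and sorts in descending order. The function name `combine` and the values are hypothetical.

```python
from collections import defaultdict

def combine(per_document_sums):
    """Average per-token attribution sums over all documents, highest first."""
    totals = defaultdict(float)
    for doc in per_document_sums:
        for token, value in doc.items():
            # Merge "Ġlearning" and "learning" into a single entry.
            totals[token.replace("Ġ", "")] += value
    n_docs = len(per_document_sums)
    averaged = {tok: total / n_docs for tok, total in totals.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-example sums for two documents.
docs = [
    {"Ġlearning": 0.4, "Ġcode": -0.2},
    {"learning": 0.2, "Ġcode": -0.4, "Ġdata": 0.1},
]
ranking = combine(docs)
print(ranking)
```

Note that the divisor is the total number of documents, so a token that appears in only a few documents is pulled toward zero compared with one that appears everywhere.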

In [None]:
df_attrib = combinedataframe(aggregate_attrib, aggregation_function)
# df_attrib_pos = combinedataframe(aggregate_pos, aggregation_function)

Here we show the 15 highest attributions, in other words the tokens that most influence the model toward predicting the positive class. If you ran Integrated Gradients on the complete Longformer embeddings, these are the attributions for those embeddings. If you ran it on the word, position and token_type embeddings, these are the attributions for the word embeddings.

In [None]:
df_attrib[:15]

| | tokens | attribution |
|---|---|---|
| 0 | learning | 0.270925 |
| 1 | neural | 0.159887 |
| 2 | data | 0.12888 |
| 3 | . | 0.108019 |
| 4 | , | 0.092537 |
| 5 | AI | 0.081176 |
| 6 | training | 0.080792 |
| 7 | dataset | 0.071329 |
| 8 | the | 0.067022 |
| 9 | algorithms | 0.058013 |


Here we show the 15 highest attributions for the position embeddings. Note that this output is only available when running Integrated Gradients on the word, position and token_type embeddings, not on the complete Longformer embeddings.

In [None]:
# df_attrib_pos[:15]

Here we show the 15 lowest attributions, in other words the tokens that most influence the model toward predicting the negative class. If you ran Integrated Gradients on the complete Longformer embeddings, these are the attributions for those embeddings. If you ran it on the word, position and token_type embeddings, these are the attributions for the word embeddings.

In [None]:
# last 15 rows in reverse order, so the lowest attribution comes first
df_attrib[:-16:-1].reset_index(drop=True)

| | tokens | attribution |
|---|---|---|
| 0 | programming | -0.120735 |
| 1 | program | -0.088943 |
| 2 | programs | -0.082175 |
| 3 | languages | -0.072262 |
| 4 | language | -0.053761 |
| 5 | code | -0.052443 |
| 6 | software | -0.038568 |
| 7 | compiler | -0.032076 |
| 8 | Programming | -0.031495 |
| 9 | syntax | -0.02902 |


Here we show the 15 lowest attributions for the position embeddings. Note that this output is only available when running Integrated Gradients on the word, position and token_type embeddings, not on the complete Longformer embeddings.

In [None]:
# df_attrib_pos[:-16:-1]

Save the pandas dataframe into a csv to access it in the future without having to run through the entire dataset. Change the path and file name to one fitting your project.

In [None]:
# # longformer embeddings
df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/longformer_emb_papers.csv', index=False)
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/news/news_attributions/longformer_emb_news.csv')

# # Word + position embeddings for the papers dataset
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/word_emb_papers.csv')
# df_attrib_pos.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/pos_emb_papers.csv')

# # # Word + position embeddings for the news dataset
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/news/news_attributions/word_emb_news.csv')
# df_attrib_pos.to_csv('/content/drive/MyDrive/cogs402longformer/results/news/news_attributions/pos_emb_news.csv')