<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/CaptumLongformerSequenceClassificationaggregate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook runs [Integrated Gradients](https://arxiv.org/abs/1703.01365) over an entire dataset and aggregates the attributions with respect to the positive class. It outputs CSV files containing each token and its attribution aggregated over the dataset (summed per token, then divided by the number of documents).
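For reference, the quantity being approximated can be written as follows. Here $x$ is the output of the attributed layer (the embeddings) for a document, $x'$ is the same layer's output for the baseline ids, and $F$ is the softmax probability of the target class. The integral is approximated with $m$ points (Captum's `n_steps`), using nodes $\alpha_k$ and weights $w_k$; Captum's `method` argument defaults to Gauss-Legendre quadrature.

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\left(x' + \alpha\,(x - x')\right)}{\partial x_i}\, d\alpha
\approx (x_i - x'_i) \sum_{k=1}^{m} w_k \,\frac{\partial F\left(x' + \alpha_k\,(x - x')\right)}{\partial x_i}
$$

$$
\sum_i \mathrm{IG}_i(x) \approx F(x) - F(x')
$$

The second line is the completeness property: the attributions for one document add up (approximately) to the change in positive-class probability between the baseline and the real input.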

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import dependencies

In [None]:
import sys
sys.path.append('/content/drive/My Drive/{}'.format("cogs402longformer/"))

In [None]:
!pip install transformers --quiet

In [None]:
!pip install captum --quiet

In [None]:
!pip install datasets --quiet

In [None]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

## Import model

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# replace <PATH-TO-SAVED-MODEL> with the real path of the saved model
# model_path = 'danielhou13/longformer-finetuned_papers_v2'
model_path = 'danielhou13/longformer-finetuned-news-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")


## Import Dataset

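Here we import the dataset to explain. The active cell loads the validation split of `danielhou13/cogs402dataset2`; the commented-out line loads the test split of `danielhou13/cogs402dataset` instead. Use the dataset that matches the fine-tuned model loaded above.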

In [None]:
from datasets import load_dataset
import numpy as np
# cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]

In [None]:
cogs402_ds = load_dataset("danielhou13/cogs402dataset2")["validation"]

Using custom data configuration danielhou13--cogs402dataset2-52067477e0d49a06
Reusing dataset parquet (/root/.cache/huggingface/datasets/danielhou13___parquet/danielhou13--cogs402dataset2-52067477e0d49a06/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8)


## Getting the Attributions

Integrated Gradients needs a forward function whose output it can attribute. We wrap the model so that the forward pass returns softmax probabilities rather than raw logits; with `target=1`, the attributions then explain the predicted probability of the positive class.
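The softmax wrapper and Captum's `target` argument can be mimicked in plain Python. This is a minimal sketch with hypothetical logits, not the model itself: `custom_forward_toy` stands in for `custom_forward`, and `select_target` imitates how `target` picks one output column per example. Because the two class probabilities sum to 1, the gradient of the negative-class probability is exactly the negation of the positive-class gradient, so attributions for `target=0` are the negatives of those for `target=1`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def custom_forward_toy(batch_logits):
    """Stand-in for `custom_forward`: map each row of logits to class probabilities."""
    return [softmax(row) for row in batch_logits]

def select_target(batch_probs, target):
    """Imitate Captum's `target`: keep one output column per example."""
    return [row[target] for row in batch_probs]

batch_logits = [[0.0, 0.0], [-1.0, 2.0]]  # hypothetical logits for two examples
probs = custom_forward_toy(batch_logits)
positive = select_target(probs, target=1)  # P(positive) per example
negative = select_target(probs, target=0)  # equals 1 - P(positive)
```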

In [None]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# Return class probabilities; Captum's `target` argument selects the column
# (target=1 for the positive class, target=0 for the negative class).
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                    position_ids=position_ids,
                    attention_mask=attention_mask)
    return torch.softmax(preds, dim=1)

Next we define helpers that build the input ids, the position ids, and the attention mask for a text, together with the baseline (reference) ids that Integrated Gradients interpolates from.
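The id layout these helpers produce can be sketched without a tokenizer. This sketch assumes the RoBERTa-style special-token ids used by the Longformer tokenizer (`<s>` = 0, `<pad>` = 1, `</s>` = 2) and uses hypothetical text ids. The input keeps the real tokens, the baseline replaces every text token with `<pad>` while keeping `<s>` and `</s>`, and non-pad positions count up from `pad_id + 1` while the position baseline is all zeros.

```python
from itertools import accumulate

# Assumption: RoBERTa-style special-token ids, as used by the Longformer tokenizer
CLS_ID, PAD_ID, SEP_ID = 0, 1, 2

def build_input_ref_pair(text_ids, cls_id=CLS_ID, sep_id=SEP_ID, pad_id=PAD_ID):
    """Input is <s> text </s>; the baseline keeps the special tokens and pads the text."""
    input_ids = [cls_id] + list(text_ids) + [sep_id]
    ref_ids = [cls_id] + [pad_id] * len(text_ids) + [sep_id]
    return input_ids, ref_ids

def build_position_ids(input_ids, pad_id=PAD_ID):
    """Non-pad tokens count up from pad_id + 1, pad tokens get pad_id; baseline is all zeros."""
    mask = [int(t != pad_id) for t in input_ids]
    running = list(accumulate(mask))
    position_ids = [c * m + pad_id for c, m in zip(running, mask)]
    ref_position_ids = [0] * len(input_ids)
    return position_ids, ref_position_ids

text_ids = [713, 16, 10, 1296]  # hypothetical token ids for a short text
input_ids, ref_ids = build_input_ref_pair(text_ids)
position_ids, ref_position_ids = build_position_ids(input_ids)
```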

In [None]:
ref_token_id = tokenizer.pad_token_id  # padding token, used to build the baseline (reference) input
sep_token_id = tokenizer.sep_token_id  # separator token </s>, appended to the end of the text
cls_token_id = tokenizer.cls_token_id  # classification token <s>, prepended to the text

In [None]:
max_length = 2046  # text tokens kept per document; <s> and </s> bring the total to 2048
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids: <s> text </s>
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference (baseline) token ids: <s> <pad> ... <pad> </s>
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    # the third value is the number of text tokens (excluding <s> and </s>)
    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

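    # mirrors `create_position_ids_from_input_ids` in the Hugging Face Longformer code:
    # padding tokens get position `padding_idx`, real tokens count up from `padding_idx + 1`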
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id
    
    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)

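We set up Layer Integrated Gradients in two ways: `lig` attributes to the output of the whole Longformer embedding layer, while `lig2` attributes separately to the word embeddings and the position embeddings. Switching between them only changes which object and which inputs are passed to `attribute`.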
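Token ids are discrete, so Layer Integrated Gradients interpolates the output of the embedding layer rather than the ids themselves. The sketch below illustrates that idea on a hypothetical toy model (mean-pooled 2-dimensional embeddings followed by a linear layer and softmax), not on Longformer. It swaps in central finite differences for autograd and a midpoint rule for Captum's default Gauss-Legendre quadrature. Tokens whose input and baseline embeddings coincide (`<s>` and `</s>` here) receive exactly zero attribution, and the attributions sum to the change in positive-class probability.

```python
import math

# Hypothetical 2-dim embedding table; ids 0, 1, 2 play the roles of <s>, <pad>, </s>
EMB = {0: [0.1, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.1], 10: [1.0, -0.5], 11: [-0.3, 0.8]}
# Hypothetical classifier weights: one row per class
W = [[0.5, -1.0], [1.5, 0.7]]

def prob_from_embeddings(embs, target=1):
    """Toy model: mean-pool the embeddings, apply W, return the softmax probability of `target`."""
    dim = len(embs[0])
    pooled = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
    logits = [sum(w[d] * pooled[d] for d in range(dim)) for w in W]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return exps[target] / sum(exps)

def layer_integrated_gradients(ids, ref_ids, n_steps=50, eps=1e-5, target=1):
    """IG on the embedding-layer output: interpolate embeddings, not token ids."""
    x = [EMB[t] for t in ids]
    x_ref = [EMB[t] for t in ref_ids]
    attrs = [[0.0] * len(row) for row in x]
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint rule
        point = [[r + alpha * (v - r) for v, r in zip(row, ref_row)]
                 for row, ref_row in zip(x, x_ref)]
        for i, row in enumerate(point):
            for d in range(len(row)):
                up = [p[:] for p in point]
                down = [p[:] for p in point]
                up[i][d] += eps
                down[i][d] -= eps
                # central finite-difference estimate of dF/dx[i][d] at this point
                grad = (prob_from_embeddings(up, target)
                        - prob_from_embeddings(down, target)) / (2 * eps)
                attrs[i][d] += (x[i][d] - x_ref[i][d]) * grad / n_steps
    return attrs

ids = [0, 10, 11, 2]    # <s> tok tok </s>
ref_ids = [0, 1, 1, 2]  # <s> <pad> <pad> </s>
attrs = layer_integrated_gradients(ids, ref_ids)
gap = (prob_from_embeddings([EMB[t] for t in ids])
       - prob_from_embeddings([EMB[t] for t in ref_ids]))
```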

In [None]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
lig2 = LayerIntegratedGradients(custom_forward,
                                [model.longformer.embeddings.word_embeddings,
                                 model.longformer.embeddings.position_embeddings])

  "Multiple layers provided. Please ensure that each layer is"


Helper function that sums the attributions over the embedding dimension and L2-normalizes the result into a vector of length `seq_len`.
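The same reduction on nested lists looks like this. It is a minimal sketch with hypothetical attribution values shaped `(seq_len, hidden)`; the zero-norm guard is an addition of this sketch (the torch version would produce NaNs for an all-zero input). After normalization each document's vector has unit L2 norm, which keeps documents of different lengths on a comparable scale before they are aggregated.

```python
import math

def summarize_attributions(attributions):
    """Sum each token's vector over the embedding dimension, then L2-normalize across tokens."""
    per_token = [sum(vec) for vec in attributions]
    norm = math.sqrt(sum(a * a for a in per_token))
    if norm == 0.0:  # guard added in this sketch only
        return [0.0] * len(per_token)
    return [a / norm for a in per_token]

# Hypothetical attributions for 3 tokens with a hidden size of 2
summary = summarize_attributions([[0.3, 0.1], [-0.2, -0.1], [0.0, 0.0]])
```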

In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions

We iterate over the entire dataset. For each example we build the input ids, position ids, and their baselines, run Integrated Gradients, sum the attributions per token, and store the attributions alongside their tokens in a dataframe. We then sum the attributions of duplicate tokens within that document and append the result to a list of dataframes. The cell below is the commented-out version that attributes to the full Longformer embedding layer (`lig`).

In [None]:
# from tqdm import tqdm
# aggregate_attrib = []
# aggregation_function = {'attribution': 'sum'}

# for i in tqdm(range(len(cogs402_ds))):
#   #get input ids, position ids and attention mask for integrated gradients
#   text = cogs402_ds[i]['text']
#   input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
#   position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
#   attention_mask = construct_attention_mask(input_ids)

#   indices = input_ids[0].detach().tolist()
#   all_tokens = tokenizer.convert_ids_to_tokens(indices)

#   # perform integrated gradients
#   attributions = lig.attribute(inputs=input_ids,
#                                     baselines=ref_input_ids,
#                                     additional_forward_args=(position_ids, attention_mask),
#                                     target=1,
#                                     n_steps=25,
#                                     internal_batch_size = 2)
  
#   #get the attributions
#   attributions_sum = summarize_attributions(attributions)
  
#   #convert into dataframe
#   d = {"tokens":all_tokens, "attribution":attributions_sum[:len(all_tokens)].cpu()}  
#   df_attrib = pd.DataFrame(d)

#   #aggregate the duplicate tokens
#   df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_function)

#   #add to list of dataframes
#   aggregate_attrib.append(df_attrib)

Here is the implementation for the word + position embedding version (`lig2`), which is the one run in this notebook. The only difference is that we get two sets of attributions, one for the word embeddings and one for the position embeddings. We build a dataframe for each, aggregate duplicate tokens as before, and store the word and position dataframes in separate lists.
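The per-document aggregation (`groupby('tokens')` followed by a sum) can be sketched without pandas. The tokens and attribution values below are hypothetical; the same function is applied once to the word attributions and once to the position attributions, mirroring the two lists built in the loop. Unlike pandas `groupby`, the resulting dict keeps first-appearance order instead of sorting by token, which does not matter for the later aggregation.

```python
from collections import defaultdict

def aggregate_duplicates(tokens, attributions):
    """Sum the attributions of repeated tokens within one document."""
    totals = defaultdict(float)
    for token, value in zip(tokens, attributions):
        totals[token] += value
    return dict(totals)

tokens = ["<s>", "Ġthe", "Ġcat", "Ġthe", "</s>"]
word_attr = [0.0, 0.2, 0.5, 0.1, 0.0]    # hypothetical word-embedding attributions
pos_attr = [-0.7, 0.1, 0.0, 0.05, 0.0]   # hypothetical position-embedding attributions
word_doc = aggregate_duplicates(tokens, word_attr)
pos_doc = aggregate_duplicates(tokens, pos_attr)
```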

In [None]:
from tqdm import tqdm
aggregate_attrib = []
aggregate_pos = []

aggregation_function = {'attribution': 'sum'}

for i in tqdm(range(len(cogs402_ds)), position = 0, leave = True):
  
  #get input_ids, position_ids, and the attention masks for the integrated gradients
  text = cogs402_ds[i]['text']

  input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
  position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
  attention_mask = construct_attention_mask(input_ids)

  indices = input_ids[0].detach().tolist()
  all_tokens = tokenizer.convert_ids_to_tokens(indices)

  # compute integrated gradients
  attributions2 = lig2.attribute(inputs=(input_ids, position_ids),
                                 baselines=(ref_input_ids, ref_position_ids),
                                 target=1,
                                 additional_forward_args=(attention_mask,),
                                 n_steps=20,
                                 internal_batch_size=2)
  
  # get the attributions for the words and position ids
  attributions_word = summarize_attributions(attributions2[0])
  attributions_position = summarize_attributions(attributions2[1])

  # convert them both into dataframes 
  d = {"tokens":all_tokens, "attribution":attributions_word[:len(all_tokens)].cpu()}  
  d2 = {"tokens":all_tokens, "attribution":attributions_position[:len(all_tokens)].cpu()}  
  
  df_attrib = pd.DataFrame(d)
  df_attrib2 = pd.DataFrame(d2)

  #aggregate the attributions for duplicate tokens
  df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_function)
  df_attrib2 = df_attrib2.groupby(df_attrib2['tokens']).aggregate(aggregation_function)

  aggregate_attrib.append(df_attrib)
  aggregate_pos.append(df_attrib2)

100%|██████████| 2500/2500 [4:37:40<00:00,  6.66s/it]


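To get the aggregate attribution of every token over the entire dataset, we concatenate the stored per-document dataframes, sum each token's attributions across documents, and divide by the number of documents. A token absent from a document contributes zero for it, so the result is the average attribution per document, not the average per occurrence.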
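The cross-document step can be sketched in plain Python as well. The per-document totals below are hypothetical. Note that `Ġcat` appears in only one of the two documents but is still divided by 2, which is why the result is an average per document rather than per occurrence.

```python
from collections import defaultdict

def combine_documents(per_doc):
    """Sum per-token totals across documents, divide by the number of documents, sort descending."""
    totals = defaultdict(float)
    for doc in per_doc:
        for token, value in doc.items():
            totals[token] += value
    n_docs = len(per_doc)
    averaged = {token: value / n_docs for token, value in totals.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-document totals; "Ġcat" appears in only one of the two documents
docs = [{"Ġthe": 0.3, "Ġcat": 0.5}, {"Ġthe": -0.1}]
ranking = combine_documents(docs)
```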

In [None]:
def combinedataframe(listframes, aggregation_func):
  df_attrib = pd.concat(listframes)
  df_attrib = df_attrib.reset_index(level=0)
  df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_func)
  df_attrib['attribution'] = df_attrib['attribution'].div(len(listframes))
  highest_attrib_tokens_all = df_attrib.sort_values(by=['attribution'], ascending=False)
  return highest_attrib_tokens_all

In [None]:
df_attrib = combinedataframe(aggregate_attrib, aggregation_function)
df_attrib_pos = combinedataframe(aggregate_pos, aggregation_function)

Here we show the 15 tokens with the highest aggregate attribution, i.e. the tokens that push the model most strongly toward the positive class. With the `lig` run (full Longformer embedding layer), these are attributions to the embedding output; with the `lig2` run (word + position embeddings), as in this notebook, they are the word-embedding attributions.

In [None]:
df_attrib[:15]

| tokens | attribution |
|---|---:|
| `p` | 0.540027 |
| `>` | 0.337608 |
| `Ċ` | 0.283739 |
| `s` | 0.224551 |
| `Ġa` | 0.198184 |
| `a` | 0.19074 |
| `82` | 0.183168 |
| `,` | 0.105804 |
| `Ġof` | 0.097827 |
| `="` | 0.08302 |


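Here we show the 15 highest attributions for the position embeddings. This output exists only for the word + position run (`lig2`); the Longformer-embedding run (`lig`) produces no separate position attributions. Because position attributions are grouped by the token occupying each position, each row sums the attributions of every position at which that token appeared.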

In [None]:
df_attrib_pos[:15]

| tokens | attribution |
|---|---:|
| `,` | 0.24801 |
| `Ġ` | 0.243847 |
| `Ġthe` | 0.236153 |
| `Ċ` | 0.199232 |
| `p` | 0.159189 |
| `<` | 0.145755 |
| `-` | 0.105586 |
| `Ġto` | 0.09988 |
| `82` | 0.098015 |
| `Ġa` | 0.095155 |


Here we show the 15 tokens with the lowest aggregate attribution, i.e. the tokens that push the model most strongly toward the negative class. With the `lig` run these are attributions to the full embedding output; with the `lig2` run, as in this notebook, they are the word-embedding attributions.
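The lowest rows are read from the end of the frame, which is sorted in descending order. A reversed slice of the form `[:-14:-1]` returns 13 rows, not 15: with a step of -1 it starts at the last row and stops just before position -14, so it covers positions -1 through -13. The sketch below shows this on a plain list (DataFrame integer slicing is positional and behaves the same way); reversing first and then taking 15 gives the intended count.

```python
# Stand-in for 20 rows already sorted by attribution, highest first
rows = list(range(20))

reversed_slice = rows[:-14:-1]  # last row back to position -13: only 13 rows
last_15 = rows[::-1][:15]       # reverse, then take the first 15 rows
```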

In [None]:
df_attrib[::-1][:15]

| tokens | attribution |
|---|---:|
| `Ġ` | -0.646627 |
| `Ġto` | -0.555069 |
| `;` | -0.321682 |
| `-` | -0.308708 |
| `<` | -0.160664 |
| `</` | -0.153974 |
| `Ġbe` | -0.10043 |
| `&` | -0.06816 |
| `external` | -0.065767 |
| `Ġ&` | -0.06432 |


Here we show the 15 lowest attributions for the position embeddings; this output exists only for the word + position run (`lig2`). Since `<s>` always occupies the first position, its row effectively measures the attribution of that position rather than of the token.

In [None]:
df_attrib_pos[::-1][:15]

| tokens | attribution |
|---|---:|
| `<s>` | -0.733773 |
| `.</` | -0.05895 |
| `>` | -0.050809 |
| `&` | -0.044064 |
| `Ġ..........` | -0.041246 |
| `="` | -0.022656 |
| `#` | -0.016449 |
| `">` | -0.013081 |
| `external` | -0.012884 |
| `</s>` | -0.008897 |


Save the dataframes to CSV so the results can be reloaded later without rerunning the attribution loop over the whole dataset (about 4.5 hours for the 2,500 examples above). Change the paths and file names to suit your project.
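With pandas, a saved file can be read back with `pd.read_csv(path, index_col="tokens")`. As a dependency-free alternative, the sketch below reads the same `tokens,attribution` layout into a dict with the standard `csv` module, which also undoes the CSV quoting of tokens such as `,` and `="`. The sample file contents are hypothetical.

```python
import csv
import os
import tempfile

def load_attributions(path):
    """Read a tokens,attribution CSV (as written by DataFrame.to_csv) into a token -> float dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["tokens"]: float(row["attribution"]) for row in csv.DictReader(f)}

# Usage with a small hypothetical file in the same layout as the saved CSVs
sample = 'tokens,attribution\np,0.54\n",",0.1\n"=""",0.08\n'
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    sample_path = tmp.name
loaded = load_attributions(sample_path)
os.remove(sample_path)
```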

In [None]:
# # longformer embeddings
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/longformer_emb_papers.csv')
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/news/news_attributions/longformer_emb_news.csv')

# # Word + position embeddings for the papers dataset
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/word_emb_papers.csv')
# df_attrib_pos.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/pos_emb_papers.csv')

# # Word + position embeddings for the news dataset
df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/news/news_attributions/word_emb_news.csv')
df_attrib_pos.to_csv('/content/drive/MyDrive/cogs402longformer/results/news/news_attributions/pos_emb_news.csv')