<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/CaptumLongformerSequenceClassificationaggregate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook finds the aggregate attributions for both the positive and negative class over the entire dataset.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import dependencies

In [2]:
import sys
sys.path.append('/content/drive/My Drive/{}'.format("cogs402longformer/"))

In [3]:
%pip install transformers --quiet

In [4]:
%pip install captum --quiet

In [5]:
%pip install datasets --quiet

In [6]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [7]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [8]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

In [9]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# Hugging Face Hub ID of the fine-tuned model; replace it with a local path to load your own saved model
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-news-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

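Create a forward function that returns the model's logits, plus helper functions that build the input ids, position ids, and attention mask (together with their baselines) for the text we want to examine.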

In [10]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [11]:
ref_token_id = tokenizer.pad_token_id # Token used to fill the baseline (reference) sequence
sep_token_id = tokenizer.sep_token_id # Separator token appended to the end of the text
cls_token_id = tokenizer.cls_token_id # Classification token prepended to the text

In [12]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)
    position_ids = torch.arange(seq_length, dtype=torch.long, device=device)
    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
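
As a rough illustration of what these helpers produce, the sketch below rebuilds the same input/baseline layout with plain Python lists instead of tensors. The special-token ids and text ids are made up for the example (the notebook reads the real values from the tokenizer), and `build_pair` is a hypothetical stand-in for `construct_input_ref_pair` and `construct_input_ref_pos_id_pair`.

```python
# Toy special-token ids, assumed for illustration only; the notebook reads
# the real values from the tokenizer.
CLS_ID, SEP_ID, PAD_ID = 0, 2, 1

def build_pair(text_ids, max_length=2046):
    """Return (input_ids, ref_input_ids, position_ids, ref_position_ids)."""
    text_ids = text_ids[:max_length]  # truncate the text, leaving room for CLS and SEP
    input_ids = [CLS_ID] + text_ids + [SEP_ID]
    # The baseline keeps the special tokens and replaces every text token with PAD.
    ref_input_ids = [CLS_ID] + [PAD_ID] * len(text_ids) + [SEP_ID]
    # Real positions count up from 0; the baseline uses position 0 everywhere.
    position_ids = list(range(len(input_ids)))
    ref_position_ids = [0] * len(input_ids)
    return input_ids, ref_input_ids, position_ids, ref_position_ids

input_ids, ref_input_ids, position_ids, ref_position_ids = build_pair([713, 16, 10])
print(input_ids)         # [0, 713, 16, 10, 2]
print(ref_input_ids)     # [0, 1, 1, 1, 2]
print(position_ids)      # [0, 1, 2, 3, 4]
print(ref_position_ids)  # [0, 0, 0, 0, 0]
```

With `max_length = 2046`, the two special tokens bring the longest possible sequence to 2048 tokens.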

Import the dataset.

Here we load the test split of the papers dataset. The commented-out cell after it shows how to sample the hyperpartisan news validation set instead.

In [13]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]

Using custom data configuration danielhou13--cogs402dataset-144b958ac1a53abb
Reusing dataset parquet (/root/.cache/huggingface/datasets/danielhou13___parquet/danielhou13--cogs402dataset-144b958ac1a53abb/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8)


  0%|          | 0/2 [00:00<?, ?it/s]

In [14]:
# cogs402_ds2 = load_dataset('hyperpartisan_news_detection', 'bypublisher')['validation']
# val_size = 5000
# val_indices = np.random.randint(0, len(cogs402_ds2), val_size)
# val_ds = cogs402_ds2.select(val_indices)
# labels2 = map(int, val_ds['hyperpartisan'])
# labels2 = list(labels2)
# val_ds = val_ds.add_column("labels", labels2)

In [15]:
# Forward function for Captum: returns class probabilities. The class to explain
# (1 = positive, 0 = negative) is selected with `target` when calling `attribute`.
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)
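
`custom_forward` returns class probabilities rather than raw logits, so the `target` index given to the attribution call selects which class probability is explained. A dependency-free sketch of that step, with made-up logits and a hand-written `softmax` (an illustration, not the code the notebook runs):

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-0.5, 1.5]   # made-up logits for (negative, positive)
probs = softmax(logits)
target = 1             # index of the positive class
print(probs[target])   # the probability the attributions would explain
```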

Perform Layer Integrated Gradients on the Longformer's embeddings. The commented-out line attributes to the combined embedding layer, while the active code attributes to the word embeddings and the position embeddings separately. Note that the Longformer does not use token type embeddings.
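
For intuition about what Integrated Gradients computes, here is a minimal sketch on a toy two-input function, using a midpoint Riemann-sum approximation of the path integral from a baseline to the input. It is not Captum's implementation: `f`, `grad_f`, and `integrated_gradients` are hypothetical names, and the gradient is written by hand instead of being obtained through autograd.

```python
def f(x):
    """Toy model: f(x) = x0^2 + 3*x1."""
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    """Analytic gradient of f."""
    return [2.0 * x[0], 3.0]

def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(len(x)):
            avg_grad[i] += g[i] / n_steps
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attr = integrated_gradients(x, baseline)
# Completeness: the attributions sum (approximately) to f(x) - f(baseline).
print(attr, sum(attr), f(x) - f(baseline))
```

Here `n_steps` plays the same role as the `n_steps` argument of the attribution call: more steps give a closer approximation at a higher cost.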

In [45]:
# lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
lig2 = LayerIntegratedGradients(custom_forward,
                                [model.longformer.embeddings.word_embeddings,
                                 model.longformer.embeddings.position_embeddings])

  "Multiple layers provided. Please ensure that each layer is"


In [17]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions
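
`summarize_attributions` collapses the embedding dimension into one score per token and then rescales the scores to unit L2 norm, so every example contributes a score vector of length 1. A plain-Python sketch with made-up numbers, where `summarize` is a hypothetical stand-in that works on nested lists instead of tensors:

```python
import math

def summarize(attributions):
    """attributions: one list of embedding-dimension attributions per token."""
    per_token = [sum(dims) for dims in attributions]   # sum over the embedding dimension
    norm = math.sqrt(sum(v * v for v in per_token))    # L2 norm over the tokens
    return [v / norm for v in per_token]

# Three tokens with a 2-dimensional "embedding" (made-up values).
scores = summarize([[1.0, 2.0], [0.0, 0.0], [-2.0, -2.0]])
print(scores)  # [0.6, 0.0, -0.8]
```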

We can aggregate the attribution of each token over the entire dataset to find which tokens have the highest and lowest attributions. The commented-out cell that follows is the version that uses the single combined embedding layer.

In [18]:
# from tqdm import tqdm
# aggregate_attrib = []
# aggregation_function = {'attribution': 'sum'}

# for i in tqdm(range(len(cogs402_ds))):

#   #get input ids, position ids and attention mask for integrated gradients
#   text = cogs402_ds[i]['text']
#   input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
#   position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
#   attention_mask = construct_attention_mask(input_ids)

#   indices = input_ids[0].detach().tolist()
#   all_tokens = tokenizer.convert_ids_to_tokens(indices)

#   # perform integrated gradients
#   attributions = lig.attribute(inputs=input_ids,
#                                     baselines=ref_input_ids,
#                                     additional_forward_args=(position_ids, attention_mask),
#                                     target=1,
#                                     n_steps=20,
#                                     internal_batch_size = 2)
  
#   #get the attributions
#   attributions_sum = summarize_attributions(attributions)
  
#   #convert into dataframe
#   d = {"tokens":all_tokens, "attribution":attributions_sum[:len(all_tokens)].cpu()}  
#   df_attrib = pd.DataFrame(d)

#   #aggregate the duplicate tokens
#   df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_function)

#   #add to list of dataframes
#   aggregate_attrib.append(df_attrib)

100%|██████████| 1070/1070 [3:09:49<00:00, 10.64s/it]


Here we have the implementation for the multi-embedding version, which attributes to the word embeddings and the position embeddings separately.

In [46]:
from tqdm import tqdm
aggregate_attrib = []
aggregate_pos = []

aggregation_function = {'attribution': 'sum'}

for i in tqdm(range(len(cogs402_ds)), position = 0, leave = True):
  
  #get input_ids, position_ids, and the attention masks for the integrated gradients
  text = cogs402_ds[i]['text']

  input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
  position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
  attention_mask = construct_attention_mask(input_ids)

  indices = input_ids[0].detach().tolist()
  all_tokens = tokenizer.convert_ids_to_tokens(indices)

  # compute integrated gradients
  attributions2 = lig2.attribute(inputs=(input_ids, position_ids),
                                 baselines=(ref_input_ids, ref_position_ids),
                                 target=1,
                                 additional_forward_args=(attention_mask,),  # one-element tuple
                                 n_steps=25,
                                 internal_batch_size=2)
  
  # get the attributions for the words and position ids
  attributions_word = summarize_attributions(attributions2[0])
  attributions_position = summarize_attributions(attributions2[1])

  # convert them both into dataframes 
  d = {"tokens":all_tokens, "attribution":attributions_word[:len(all_tokens)].cpu()}  
  d2 = {"tokens":all_tokens, "attribution":attributions_position[:len(all_tokens)].cpu()}  
  
  df_attrib = pd.DataFrame(d)
  df_attrib2 = pd.DataFrame(d2)

  #aggregate the attributions for duplicate tokens
  df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_function)
  df_attrib2 = df_attrib2.groupby(df_attrib2['tokens']).aggregate(aggregation_function)

  aggregate_attrib.append(df_attrib)
  aggregate_pos.append(df_attrib2)

100%|██████████| 1070/1070 [3:57:06<00:00, 13.30s/it]


To get the aggregate attributions for every token over the entire dataset, we take each list of per-example dataframes (one list for the word-embedding attributions, one for the position-embedding attributions), sum the attributions of duplicate tokens across all examples, and divide by the number of examples in the list.

In [20]:
def combinedataframe(listframes, aggregation_func):
  df_attrib = pd.concat(listframes)
  df_attrib = df_attrib.reset_index(level=0)
  df_attrib = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_func)
  df_attrib['attribution'] = df_attrib['attribution'].div(len(listframes))
  highest_attrib_tokens_all = df_attrib.sort_values(by=['attribution'], ascending=False)
  return highest_attrib_tokens_all
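
The two-stage aggregation can be sketched without pandas as follows. Each example first sums the attributions of its repeated tokens; the per-example totals are then summed per token and divided by the number of examples, so a token that is missing from an example effectively contributes zero for it. The names `per_example_totals` and `combine` are hypothetical, and the tokens and scores are invented.

```python
from collections import defaultdict

def per_example_totals(tokens, scores):
    """Sum the scores of repeated tokens within one example."""
    totals = defaultdict(float)
    for tok, score in zip(tokens, scores):
        totals[tok] += score
    return dict(totals)

def combine(examples):
    """Average the per-example totals over all examples, highest first."""
    summed = defaultdict(float)
    for totals in examples:
        for tok, score in totals.items():
            summed[tok] += score
    averaged = {tok: s / len(examples) for tok, s in summed.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)

examples = [
    per_example_totals(["data", "code", "data"], [0.25, -0.5, 0.25]),
    per_example_totals(["data", "the"], [0.5, 0.0]),
]
result = combine(examples)
print(result)
# [('data', 0.5), ('the', 0.0), ('code', -0.25)]
```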

In [21]:
# #longformer embeddings/word embeddings if multi-embedding
# df_attrib_zero = combinedataframe(aggregate_attrib_zero, aggregation_function)
# df_attrib_ones = combinedataframe(aggregate_attrib_ones, aggregation_function)

# # position embeddings for multi-embedding
# df_pos_zero = combinedataframe(aggregate_pos_zero, aggregation_function)
# df_pos_ones = combinedataframe(aggregate_pos_ones, aggregation_function)

In [47]:
df_attrib = combinedataframe(aggregate_attrib, aggregation_function)
df_attrib_pos = combinedataframe(aggregate_pos, aggregation_function)

Here we get the attributions for the positive class. We show only the top 10 highest attributions, in other words, the tokens that most strongly push the model toward predicting positive.

In [28]:
df_attrib[:15]

tokens,attribution
.,0.28601
Ġlearning,0.161365
Ġneural,0.12117
Ġthe,0.107979
",",0.107825
Ġdata,0.087721
Ġtraining,0.052719
Ġto,0.051323
ĠAI,0.048482
Ġdataset,0.047799


In [None]:
df_attrib_pos[:15]

Here we get the attributions for the negative class. We again show only the 10 most negative attributions, in other words, the tokens that most strongly push the model toward predicting negative.

In [43]:
df_attrib[:-14:-1]

tokens,attribution
Ġprogramming,-0.115309
Ġprogram,-0.073491
Ġprograms,-0.069745
Ġlanguages,-0.069667
Ġlanguage,-0.052829
Ġcode,-0.041096
Ġsoftware,-0.035409
Ġ.,-0.034809
ĠProgramming,-0.029539
Ġcompiler,-0.026975
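
When reading these tables, note that the `Ġ` prefix comes from the byte-level BPE vocabulary used by the tokenizer: it marks a token that was preceded by a space, which is why `.` and `Ġ.` appear as separate entries. A small display helper can make the tokens easier to read; `readable_token` is a hypothetical name and is not part of the notebook.

```python
SPACE_MARKER = "\u0120"  # the "Ġ" character used by byte-level BPE vocabularies

def readable_token(token):
    """Replace a leading space marker with an actual space for display."""
    if token.startswith(SPACE_MARKER):
        return " " + token[len(SPACE_MARKER):]
    return token

print(repr(readable_token("\u0120learning")))  # ' learning'
print(repr(readable_token(".")))               # '.'
```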


In [None]:
df_attrib_pos[:-14:-1]

Note: if you wish to find the aggregate attributions irrespective of the example's class, you can combine the dataframes and use the aggregation function.

Save the pandas dataframes to CSV files so they can be reloaded later without running through the entire dataset again. Change `papers` in the file names to the name of the dataset used.

In [25]:
# df_attrib_zero.to_csv('/content/drive/MyDrive/cogs402longformer/results/longformer_emb_zero_papers.csv')  
# df_attrib_ones.to_csv('/content/drive/MyDrive/cogs402longformer/results/longformer_emb_ones_papers.csv')  

# # df_attrib_zero.to_csv('/content/drive/MyDrive/cogs402longformer/results/word_emb_attrib_zero_papers.csv')  
# # df_attrib_ones.to_csv('/content/drive/MyDrive/cogs402longformer/results/word_emb_attrib_ones_papers.csv')  
# # df_pos_zero.to_csv('/content/drive/MyDrive/cogs402longformer/results/pos_emb_attrib_zero_papers.csv')  
# # df_pos_ones.to_csv('/content/drive/MyDrive/cogs402longformer/results/pos_emb_attrib_ones_papers.csv')  

In [48]:
# df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/longformer_emb_papers.csv')

df_attrib.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/word_emb_papers.csv')
df_attrib_pos.to_csv('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/pos_emb_papers.csv')