<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/CaptumLongformerSequenceClassificationMultiembedding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook adapts the [Captum tutorial for question answering](https://captum.ai/tutorials/Bert_SQUAD_Interpret) to Longformer sequence classification. Specifically, it computes attributions with respect to three embedding layers at once (the word, position and token type embeddings, referred to here as multi-embedding) for examples of your choice or, if needed, for the entire dataset.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import sys
sys.path.append('/content/drive/MyDrive/cogs402longformer/')

### Import and install dependencies

In [None]:
!pip install transformers --quiet

In [None]:
!pip install captum --quiet

In [None]:
!pip install datasets --quiet

In [None]:
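import os
# make CUDA calls synchronous so errors point at the failing operation (slower; useful for debugging)
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"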

In [None]:
from captum.attr import visualization as viz
from captum.attr import LayerIntegratedGradients

import torch

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import Model

Here we load the fine-tuned model and the tokenizer, and move the model to the GPU if one is available. Change `model_path` and the tokenizer to whichever ones you wish to use.

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer
# Hugging Face Hub id (or local path) of the fine-tuned model to interpret
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

### Import Dataset

Here we load the test split of the papers dataset.

In [None]:
from datasets import load_dataset
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]

Using custom data configuration danielhou13--cogs402dataset-144b958ac1a53abb
Reusing dataset parquet (/root/.cache/huggingface/datasets/danielhou13___parquet/danielhou13--cogs402dataset-144b958ac1a53abb/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8)



In [None]:
testval = 976
text = cogs402_ds['text'][testval]
label = cogs402_ds['labels'][testval]
print(label)

1


## Getting the Attributions

The functions below build, for the text we want to examine, the input ids, position ids and token type ids, together with the baselines (references) that integrated gradients compares them against. Longformer has only a single token type, so token_type_ids carry no information, but we still construct them and a matching baseline so that the token type embedding layer can be attributed alongside the other two.

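A minimal sketch of the layout `construct_input_ref_pair` produces, written in plain Python so it runs without the tokenizer. The special-token ids 0, 1 and 2 are assumed to be `<s>`, `<pad>` and `</s>` as in a RoBERTa-style vocabulary, and the text ids are made up for illustration.

```python
# Assumed special-token ids for a RoBERTa-style vocabulary (<s>, <pad>, </s>)
CLS_ID, PAD_ID, SEP_ID = 0, 1, 2


def build_input_and_reference(text_ids):
    """Return (input_ids, ref_input_ids) for a single sequence.

    The reference keeps the special tokens but replaces every text token
    with the pad id, so only the text tokens differ from the baseline.
    """
    input_ids = [CLS_ID] + list(text_ids) + [SEP_ID]
    ref_input_ids = [CLS_ID] + [PAD_ID] * len(text_ids) + [SEP_ID]
    return input_ids, ref_input_ids


# Made-up ids standing in for tokenizer.encode(...) output
inp, ref = build_input_and_reference([713, 2225, 16])
print(inp)  # [0, 713, 2225, 16, 2]
print(ref)  # [0, 1, 1, 1, 2]
```

Because `<s>` and `</s>` are identical in input and reference, their word-embedding attributions come only from context, not from the tokens themselves.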
In [None]:
ref_token_id = tokenizer.pad_token_id # token used to build the reference (baseline) sequence
sep_token_id = tokenizer.sep_token_id # token appended to the end of the text
cls_token_id = tokenizer.cls_token_id # token prepended to the start of the text

In [None]:
max_length = 2046  # text tokens only; <s> and </s> bring the total to 2048
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):
    # returns the input ids, the reference ids and the number of text tokens
    text_ids = tokenizer.encode(text, truncation=True, add_special_tokens=False, max_length=max_length)
    # construct input token ids: <s> text </s>
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids: <s> <pad> ... <pad> </s>
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

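    # mirrors create_position_ids_from_input_ids in the Hugging Face Longformer code:
    # non-pad tokens are numbered from padding_idx + 1, with the pad id as padding_idx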
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids

def construct_input_ref_token_type_pair(input_ids):
    seq_len = input_ids.size(1)

    # same as the tensor the model creates when you do not pass in token_type_ids as input.
    token_type_ids = torch.zeros(seq_len, dtype=torch.long, device=device).unsqueeze(0).expand_as(input_ids)

    ref_token_type_ids = torch.zeros_like(token_type_ids, device=device)

    return token_type_ids, ref_token_type_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)

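To make the cumulative-sum trick in `construct_input_ref_pos_id_pair` concrete, here is a plain-Python sketch on a toy sequence that ends in two pad tokens. The pad id is assumed to be 1, and the function name is hypothetical.

```python
from itertools import accumulate

PAD_ID = 1  # assumed pad id, used as padding_idx


def position_ids_from_input_ids(input_ids, padding_idx=PAD_ID):
    """Plain-Python version of the cumsum-based position ids."""
    # 1 for real tokens, 0 for padding
    mask = [0 if tok == padding_idx else 1 for tok in input_ids]
    # running count of real tokens, zeroed out at pad positions
    incremental = [count * m for count, m in zip(accumulate(mask), mask)]
    # shift by padding_idx: real tokens start at padding_idx + 1, pads stay at padding_idx
    return [i + padding_idx for i in incremental]


print(position_ids_from_input_ids([0, 713, 2225, 2, 1, 1]))  # [2, 3, 4, 5, 1, 1]
```

In the notebook the input contains no padding, so its positions are simply 2, 3, 4, ..., while the reference uses position 0 at every index.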
Next we define two forward functions: `predict` returns the model's raw logits, and `custom_forward` applies a softmax to turn them into class probabilities. Captum attributes the output of `custom_forward`.

In [None]:
def predict(inputs, token_type_ids=None, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   token_type_ids=token_type_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# Returns class probabilities; the class to explain is chosen later via `target`
# in lig2.attribute (1 for the positive class, 0 for the negative class)
def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                    position_ids=position_ids,
                    token_type_ids=token_type_ids,
                    attention_mask=attention_mask)
    return torch.softmax(preds, dim=1)

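For intuition about what `custom_forward` hands to Captum, here is a plain-Python softmax over a pair of hypothetical logits (negative class first, positive class second). With `target=1`, the attributions explain the second of these probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical logits for (negative, positive)
probs = softmax([0.5, 2.5])
print([round(p, 3) for p in probs])  # [0.119, 0.881]
```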
In [None]:
input_ids, ref_input_ids, num_text_tokens = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
token_type_ids, ref_token_type_ids = construct_input_ref_token_type_pair(input_ids)
position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
attention_mask = construct_attention_mask(input_ids)

indices = input_ids[0].detach().tolist()
all_tokens = tokenizer.convert_ids_to_tokens(indices)

A helper function to summarize attributions for each word token in the sequence.

In [None]:
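def summarize_attributions(attributions):
    # sum over the embedding dimension and drop the batch dimension -> one score per token
    attributions = attributions.sum(dim=-1).squeeze(0)
    # L2-normalise; an all-zero tensor has norm 0 and becomes NaN here
    attributions = attributions / torch.norm(attributions)
    return attributions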

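A plain-Python sketch of the same summarisation on a toy 3-token, 2-dimension attribution matrix (values are made up). It also shows the degenerate case: when every score is zero, the norm is zero, and torch reports 0 / 0 as NaN, which the sketch mimics explicitly instead of raising `ZeroDivisionError`.

```python
import math


def summarize(per_token_vectors):
    """Sum each token's per-dimension attributions, then L2-normalise."""
    scores = [sum(vec) for vec in per_token_vectors]
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0:
        # torch would compute 0 / 0 = nan element-wise
        return [math.nan] * len(scores)
    return [s / norm for s in scores]


print(summarize([[1.0, 2.0], [-1.5, -2.5], [0.0, 0.0]]))  # [0.6, -0.8, 0.0]
print(summarize([[0.0, 0.0], [0.0, 0.0]]))                # [nan, nan]
```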
Perform Layer Integrated Gradients on the Longformer's word, token type and position embeddings.

In [None]:
lig2 = LayerIntegratedGradients(custom_forward,
                                [model.longformer.embeddings.word_embeddings,
                                 model.longformer.embeddings.token_type_embeddings,
                                 model.longformer.embeddings.position_embeddings])

  "Multiple layers provided. Please ensure that each layer is"


In [None]:
attributions = lig2.attribute(inputs=(input_ids, token_type_ids, position_ids),
                               baselines=(ref_input_ids, ref_token_type_ids, ref_position_ids),
                               target=1,  # attribute the positive-class probability
                               additional_forward_args=(attention_mask,),
                               n_steps=200,
                               internal_batch_size=2)

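Layer integrated gradients attributes the change in the model output, from baseline to input, to each embedding layer's output. The sketch below shows the underlying integrated-gradients computation on a toy two-feature function with a hand-written gradient (both hypothetical). Captum computes gradients with autograd and by default uses a Gauss-Legendre rule; this sketch uses the simpler midpoint rule.

```python
def f(x):
    """Toy differentiable 'model': f(x) = x0**2 + 3*x0*x1 (hypothetical)."""
    return x[0] ** 2 + 3 * x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f: (2*x0 + 3*x1, 3*x0)."""
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


def integrated_gradients(x, baseline, n_steps=200):
    """Integrated gradients of f, approximated with the midpoint Riemann rule.

    attr_i = (x_i - b_i) * mean over alpha in (0, 1) of df/dx_i at b + alpha * (x - b)
    """
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    totals = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_f(point)):
            totals[i] += g
    return [d * t / n_steps for d, t in zip(diffs, totals)]


x, baseline = [1.0, 2.0], [0.0, 0.0]
attr = integrated_gradients(x, baseline)
# completeness: the attributions add up to f(x) - f(baseline) = 7
print(attr, sum(attr), f(x) - f(baseline))

# a feature whose input equals its baseline gets exactly zero attribution
print(integrated_gradients([1.0, 2.0], [1.0, 0.0]))
```

The second call is the situation of the token type embedding here: input and baseline coincide, so its attribution is zero everywhere.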
Next we collapse each layer's attributions into one normalised score per token.

In [None]:
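# LayerIntegratedGradients returns attributions in the order the layers were passed:
# [word_embeddings, token_type_embeddings, position_embeddings]
attributions_word = summarize_attributions(attributions[0])
attributions_token_type = summarize_attributions(attributions[1])
attributions_position = summarize_attributions(attributions[2])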

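The Longformer model has only one token type, so `token_type_ids` (all zeros) are identical to their baseline `ref_token_type_ids`. Integrated gradients scales the gradients by the difference between input and baseline, so when the two are identical every raw token type attribution is exactly zero; `summarize_attributions` then divides a zero vector by its zero norm, producing a tensor of NaNs with shape (seq_len). In other words, the token type embedding receives no attribution for this model. The direct test for an all-NaN tensor is `torch.isnan(x).all()`; replacing the NaNs with 1 and calling `torch.all` only shows that no entry is zero. **As such, moving forward, we will not be displaying anything about the token_type embeddings.**

Note that `LayerIntegratedGradients` returns one attribution tensor per layer, in the order the layers were passed: `attributions[0]` for the word embeddings, `attributions[1]` for the token type embeddings and `attributions[2]` for the position embeddings. The outputs recorded in this notebook were produced with indices 1 and 2 swapped, so the tensor printed below holds the position attributions, while the `nan` entries in the Position columns and the all-zero aggregated position tables further down are the token type attributions (pandas skips NaN when summing, which turns them into 0.0).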

In [None]:
print(attributions_token_type, attributions_token_type.shape)
print(torch.isnan(attributions_token_type).all())  # True if every entry is NaN

tensor([ 8.9935e-01,  2.5866e-02,  5.8078e-03,  ..., -3.6926e-03,
        -7.5591e-04, -7.0155e-03], device='cuda:0', dtype=torch.float64) torch.Size([2048])
tensor(True, device='cuda:0')

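A plain-Python sketch contrasting the two ways of checking for an all-NaN result. The helper names and toy values are hypothetical; `all_nan` plays the role of `torch.isnan(x).all()`, and `nan_to_one_then_all` plays the role of `torch.all(x.nan_to_num(1))`.

```python
import math


def all_nan(values):
    """True only if every entry is NaN."""
    return all(math.isnan(v) for v in values)


def nan_to_one_then_all(values):
    """Replace NaN with 1, then check that no entry is zero."""
    return all((1.0 if math.isnan(v) else v) != 0 for v in values)


token_type_like = [math.nan, math.nan, math.nan]
position_like = [0.899, 0.026, -0.007]  # toy values, no NaNs at all

print(all_nan(token_type_like), nan_to_one_then_all(token_type_like))  # True True
print(all_nan(position_like), nan_to_one_then_all(position_like))      # False True
```

The second check returns True for both lists, so it cannot distinguish an all-NaN tensor from an ordinary one.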

## Examining the Attributions

These functions return the tokens with the strongest (most positive and most negative) attributions. Adjust `k` to change how many tokens are shown. Each takes a tensor of attributions and returns three things: the corresponding tokens, the top-k (or bottom-k) attribution values, and the positions of those tokens in the sequence.

Note: the attributions are computed with respect to the positive class (`target=1`), so the tokens that most helped the model predict the negative class appear among the bottom-k attributed tokens.

In [None]:
def get_topk_attributed_tokens(attrs, k=15):
    values, indices = torch.topk(attrs, k)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices

In [None]:
def get_botk_attributed_tokens(attrs, k=15):
    values, indices = torch.topk(attrs, k, largest=False)
    bot_tokens = [all_tokens[idx] for idx in indices]
    return bot_tokens, values, indices

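A plain-Python sketch of what the two helpers return, using `heapq` in place of `torch.topk`; the function name, tokens and values are all hypothetical.

```python
import heapq


def topk_tokens(tokens, attrs, k=3, largest=True):
    """Return (tokens, values, indices) for the k largest or smallest attributions."""
    pick = heapq.nlargest if largest else heapq.nsmallest
    idx = pick(k, range(len(attrs)), key=lambda i: attrs[i])
    return [tokens[i] for i in idx], [attrs[i] for i in idx], idx


tokens = ["<s>", "Ġthe", "Ġmodel", "Ġto", ".", "</s>"]
attrs = [0.05, 0.2, 0.4, -0.3, -0.1, 0.0]  # toy values

print(topk_tokens(tokens, attrs, k=2))                 # (['Ġmodel', 'Ġthe'], [0.4, 0.2], [2, 1])
print(topk_tokens(tokens, attrs, k=2, largest=False))  # (['Ġto', '.'], [-0.3, -0.1], [3, 4])
```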
Next we put the tokens, their positions and their attribution values into pandas DataFrames for display. `df_high` is sorted from highest to lowest attribution, and `df_low` from lowest to highest.



In [None]:
import pandas as pd

# top-k / bottom-k tokens for the word and position attributions
top_words, top_word_vals, top_word_idx = get_topk_attributed_tokens(attributions_word)
bot_words, bot_word_vals, bot_word_idx = get_botk_attributed_tokens(attributions_word)

top_pos, top_pos_vals, top_pos_idx = get_topk_attributed_tokens(attributions_position)
bot_pos, bot_pos_vals, bot_pos_idx = get_botk_attributed_tokens(attributions_position)

def format_rows(tokens, indices, values):
    # "token (index), attribution" with the attribution rounded to 2 decimals
    return ["{} ({}), {}".format(tok, idx.item(), round(val.item(), 2))
            for tok, idx, val in zip(tokens, indices, values)]

df_high = pd.DataFrame({'Word(Index), Attribution': format_rows(top_words, top_word_idx, top_word_vals),
                        'Position(Index), Attribution': format_rows(top_pos, top_pos_idx, top_pos_vals)})

df_low = pd.DataFrame({'Word(Index), Attribution': format_rows(bot_words, bot_word_idx, bot_word_vals),
                       'Position(Index), Attribution': format_rows(bot_pos, bot_pos_idx, bot_pos_vals)})

In [None]:
df_high

| | Word(Index), Attribution | Position(Index), Attribution |
|---|---|---|
| 0 | Ġimage (1024), 0.31 | ĠInternational (7), nan |
| 1 | - (512), 0.13 | Ġin (6), nan |
| 2 | Ġtraining (266), 0.13 | Ġconference (4), nan |
| 3 | Ġbias (170), 0.11 | Ġpaper (5), nan |
| 4 | Ġtraining (223), 0.1 | Published (1), nan |
| 5 | Ġtask (202), 0.09 | `<s>` (0), nan |
| 6 | Ġobjective (224), 0.09 | Ġas (2), nan |
| 7 | Ġdata (1100), 0.09 | Ġa (3), nan |
| 8 | ing (256), 0.08 | ĠVision (11), nan |
| 9 | Ġtraining (1538), 0.08 | ĠComputer (10), nan |


In [None]:
df_low

| | Word(Index), Attribution | Position(Index), Attribution |
|---|---|---|
| 0 | . (251), -0.16 | ĠInternational (7), nan |
| 1 | ĠInternational (7), -0.12 | Ġin (6), nan |
| 2 | Ġto (283), -0.09 | Ġconference (4), nan |
| 3 | ĠThis (152), -0.09 | Ġpaper (5), nan |
| 4 | Ġto (277), -0.08 | Published (1), nan |
| 5 | Ġto (294), -0.07 | `<s>` (0), nan |
| 6 | , (122), -0.07 | Ġas (2), nan |
| 7 | Ġto (236), -0.07 | Ġa (3), nan |
| 8 | chie (83), -0.07 | ĠVision (11), nan |
| 9 | Ġto (156), -0.06 | ĠComputer (10), nan |


Many tokens (for example `Ġtraining` and `Ġto`) occur several times at different positions. Position may matter for the attributions, but if we only care about which tokens matter, we can sum the attributions of all occurrences of each token to get one aggregate attribution per token.

In [None]:
d = {"tokens": all_tokens,
     "attribution word": attributions_word[:len(all_tokens)].cpu(),
     "attribution position": attributions_position[:len(all_tokens)].cpu()}
df_attrib = pd.DataFrame(d)
# sum the attributions of every occurrence of the same token
df_new = df_attrib.groupby('tokens').agg({'attribution word': 'sum'})
df_new2 = df_attrib.groupby('tokens').agg({'attribution position': 'sum'})

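A plain-Python sketch of the same per-token aggregation, with a hypothetical helper name and toy values. Like pandas' `groupby(...).sum()`, it skips NaN, so a token whose scores are all NaN ends up with 0.0.

```python
import math
from collections import defaultdict


def aggregate_by_token(tokens, attrs):
    """Sum the attributions of repeated tokens, skipping NaN values."""
    totals = defaultdict(float)
    for tok, a in zip(tokens, attrs):
        totals[tok] += 0.0 if math.isnan(a) else a
    return dict(totals)


print(aggregate_by_token(["Ġto", "Ġmodel", "Ġto"], [-0.25, 0.5, -0.5]))
# {'Ġto': -0.75, 'Ġmodel': 0.5}
print(aggregate_by_token(["Ġto", "Ġmodel", "Ġto"], [math.nan] * 3))
# {'Ġto': 0.0, 'Ġmodel': 0.0}
```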
Here we display the 15 tokens with the highest and the lowest aggregated word attributions.

In [None]:
highest_attrib_tokens = df_new.sort_values(by=['attribution word'], ascending=False)
highest_attrib_tokens[:15]

| tokens | attribution word |
|---|---|
| Ġtraining | 1.337399 |
| Ġcapt | 0.83836 |
| - | 0.805164 |
| Ġof | 0.769997 |
| . | 0.734035 |
| Ġimage | 0.665893 |
| Ġin | 0.516717 |
| Ġ[ | 0.485355 |
| ing | 0.419065 |
| Ġon | 0.396159 |


In [None]:
lowest_attrib_tokens = df_new.sort_values(by=['attribution word'])
lowest_attrib_tokens[:15]

| tokens | attribution word |
|---|---|
| Ġto | -0.704773 |
| gram | -0.119656 |
| ĠInternational | -0.119045 |
| Ġthe | -0.102358 |
| Ġcaption | -0.098356 |
| arial | -0.093104 |
| Ġwhich | -0.092439 |
| ĠThis | -0.084309 |
| Ġmachine | -0.07268 |
| Ġgeneration | -0.068081 |


Here we display the 15 tokens with the highest and the lowest aggregated position attributions.

In [None]:
highest_attrib_pos = df_new2.sort_values(by=['attribution position'], ascending=False)
highest_attrib_pos[:15]

| tokens | attribution position |
|---|---|
| ) | 0.0 |
| Ġdiscrete | 0.0 |
| Ġmakes | 0.0 |
| Ġmaking | 0.0 |
| Ġman | 0.0 |
| Ġmany | 0.0 |
| Ġmarked | 0.0 |
| Ġmatch | 0.0 |
| Ġmatching | 0.0 |
| Ġmaximum | 0.0 |


In [None]:
lowest_attrib_pos = df_new2.sort_values(by=['attribution position'])
lowest_attrib_pos[:15]

| tokens | attribution position |
|---|---|
| ) | 0.0 |
| Ġmake | 0.0 |
| Ġmakes | 0.0 |
| Ġmaking | 0.0 |
| Ġman | 0.0 |
| Ġmany | 0.0 |
| Ġmarked | 0.0 |
| Ġmatch | 0.0 |
| Ġmatching | 0.0 |
| Ġmaintaining | 0.0 |


Using the notebook https://colab.research.google.com/drive/1lktilbL1IY4nBanlzCdP8TLsBNfUsl_U?usp=sharing, we can generate files containing the aggregated attributions over the entire dataset, covering both the positive and negative classes. For both the input_id (word) embeddings and the position embeddings, the attributions of every occurrence of a token across the dataset were summed and averaged, whether those attributions were positive or negative.

In [None]:
df_word = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/word_emb_papers.csv")
df_posi = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/pos_emb_papers.csv")

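The linked notebook is not reproduced here, so its exact aggregation is not shown. As one plausible reading of "summed and averaged", the sketch below averages each token's attribution over all of its occurrences across examples and sorts the result from most positive to most negative, which appears to be the ordering of the CSV files. The function name and input format are hypothetical.

```python
from collections import defaultdict


def dataset_mean_attribution(examples):
    """Average each token's attribution over all of its occurrences.

    `examples` is a hypothetical list of per-example (tokens, attributions) pairs.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for tokens, attrs in examples:
        for tok, a in zip(tokens, attrs):
            sums[tok] += a
            counts[tok] += 1
    means = {tok: sums[tok] / counts[tok] for tok in sums}
    # most positive first
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


examples = [
    (["Ġto", "Ġmodel"], [-0.5, 0.25]),
    (["Ġto", "."], [-0.25, 0.75]),
]
print(dataset_mean_attribution(examples))
# [('.', 0.75), ('Ġmodel', 0.25), ('Ġto', -0.375)]
```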
Here we display the most positive word attributions over the entire dataset.

In [None]:
df_word[:15]

| | tokens | attribution |
|---|---|---|
| 0 | . | 0.706923 |
| 1 | Ġof | 0.631198 |
| 2 | - | 0.269057 |
| 3 | Ġ( | 0.258001 |
| 4 | Ġin | 0.239166 |
| 5 | , | 0.216959 |
| 6 | Ġand | 0.198603 |
| 7 | Ġa | 0.157613 |
| 8 | Ġlearning | 0.123875 |
| 9 | Ġ | 0.12346 |


Here we display the most positive position attributions over the entire dataset.

In [None]:
df_posi[:15]

| | tokens | attribution |
|---|---|---|
| 0 | `<s>` | 0.581123 |
| 1 | . | 0.349917 |
| 2 | - | 0.102612 |
| 3 | , | 0.077168 |
| 4 | Ġ( | 0.05049 |
| 5 | Ġof | 0.049221 |
| 6 | Ġthe | 0.035754 |
| 7 | ) | 0.034229 |
| 8 | : | 0.029639 |
| 9 | ]. | 0.025223 |


Here we display the most negative word attributions over the entire dataset.

In [None]:
df_word[::-1][:15]  # last 15 rows, most negative first

| | tokens | attribution |
|---|---|---|
| 30061 | Ġto | -0.507627 |
| 30060 | Ġprogramming | -0.174771 |
| 30059 | Ġcode | -0.079943 |
| 30058 | Ġthe | -0.074223 |
| 30057 | Ġ. | -0.064281 |
| 30056 | ĠThe | -0.057424 |
| 30055 | Ġlanguages | -0.056944 |
| 30054 | Ġlanguage | -0.052695 |
| 30053 | ĠJava | -0.051785 |
| 30052 | Ġcompiler | -0.047355 |

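Taking the most negative rows in reverse order is easy to get off by one: a slice like `[:-15:-1]` walks backwards from the last row but stops before the 15th-from-last row, so it returns 14 rows, whereas reversing first and then taking the first 15 returns all 15. A plain-Python check on a stand-in list with the same length as `df_word` (30,062 rows, per the indices shown above):

```python
rows = list(range(30062))  # stand-in for the row positions of df_word

tail_slice = rows[:-15:-1]  # stops before position -15
last_15 = rows[::-1][:15]   # last 15 rows, most negative first

print(len(tail_slice), tail_slice[0], tail_slice[-1])  # 14 30061 30048
print(len(last_15), last_15[0], last_15[-1])           # 15 30061 30047
```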

Here we display the most negative position attributions over the entire dataset.

In [None]:
df_posi[::-1][:15]  # last 15 rows, most negative first

| | tokens | attribution |
|---|---|---|
| 30061 | Ġto | -0.066836 |
| 30060 | Ġa | -0.062148 |
| 30059 | Ġ | -0.036262 |
| 30058 | Ġfor | -0.016519 |
| 30057 | Ġwe | -0.015904 |
| 30056 | Ġit | -0.015617 |
| 30055 | ĠThe | -0.012718 |
| 30054 | ĠIn | -0.011133 |
| 30053 | Ġan | -0.011044 |
| 30052 | ĠWe | -0.011023 |
