<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/Attn_attr_cosine_sim_all.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook explores the relationship between the model's attributions and its attentions for each example. We previously found that attentions on their own are not a feasible method of explanation, whereas attributions are; however, attributions are not part of a model's standard outputs. It is therefore interesting to see whether the attentions carry any explanatory signal by comparing them against a feasible and plausible method of explanation.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import dependencies

In [2]:
pip install transformers --quiet

In [3]:
pip install captum --quiet

In [4]:
pip install datasets --quiet

In [5]:
pip install rbo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rbo
  Downloading rbo-0.1.2-py3-none-any.whl (7.5 kB)
Installing collected packages: rbo
Successfully installed rbo-0.1.2


In [6]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [7]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [8]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

In [9]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# Hugging Face Hub ID of the fine-tuned model (a local path to a saved model also works)
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

Create functions that give us the input ids and the position ids for the text we want to examine

In [10]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [11]:
ref_token_id = tokenizer.pad_token_id # padding token, used to build the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id # classification token, prepended to the text

In [12]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
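
As a minimal sketch of the position-id logic in `construct_input_ref_pos_id_pair`, the same cumulative-sum trick can be run on a toy sequence. NumPy stands in for torch here, the token ids and the helper name `position_ids_from_input_ids` are made up for illustration, and the padding id of 1 is an assumption rather than something read from the tokenizer.

```python
import numpy as np

def position_ids_from_input_ids(input_ids, padding_idx):
    # 1 for real tokens, 0 for padding
    mask = (input_ids != padding_idx).astype(np.int64)
    # running count of real tokens, zeroed again at padded positions
    incremental = np.cumsum(mask, axis=1) * mask
    # real tokens start at padding_idx + 1; padded positions stay at padding_idx
    return incremental + padding_idx

# hypothetical ids: four real tokens followed by two padding tokens (id 1)
toy_ids = np.array([[0, 713, 16, 2, 1, 1]])
toy_position_ids = position_ids_from_input_ids(toy_ids, padding_idx=1)
print(toy_position_ids)
```

Non-padding tokens receive consecutive positions, while padded positions all share the padding index.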

Import dataset and take a few examples from it for testing purposes

Here we import the papers dataset

In [13]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]

Here we import the news dataset

In [None]:
# cogs402_ds2 = load_dataset('hyperpartisan_news_detection', 'bypublisher')['validation']
# val_size = 5000
# val_indices = np.random.randint(0, len(cogs402_ds2), val_size)
# val_ds = cogs402_ds2.select(val_indices)
# labels2 = map(int, val_ds['hyperpartisan'])
# labels2 = list(labels2)
# val_ds = val_ds.add_column("labels", labels2)

In [14]:
# returns class probabilities; when attributing, use target=1 for the positive class and target=0 for the negative class
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)
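
To make the role of the softmax concrete, here is a small NumPy sketch (not the notebook's torch code) that turns a pair of made-up logits into class probabilities and picks out the positive-class column, which is what an attribution target of 1 would refer to. The logit values are hypothetical.

```python
import numpy as np

def softmax(logits, axis=1):
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# hypothetical logits for one example: [negative class, positive class]
toy_logits = np.array([[-1.2, 2.3]])
probs = softmax(toy_logits)

# column 1 is the probability of the positive class
positive_class_prob = probs[:, 1]
print(probs, positive_class_prob)
```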

Perform Layer Integrated Gradients using the longformer's embeddings

In [15]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    return attributions

In [16]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
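
For intuition about what integrated gradients computes, the sketch below applies the method to a toy function whose gradient is known in closed form. It is a hand-rolled NumPy illustration under those assumptions, not Captum's implementation; the function `f`, its weights, and the helper name `integrated_gradients` are hypothetical.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    # midpoints of n_steps equal sub-intervals of [0, 1]
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    # points on the straight line from the baseline to the input
    path = baseline + alphas[:, None] * (x - baseline)
    # average gradient along the path approximates the path integral
    avg_grad = np.mean([grad_fn(point) for point in path], axis=0)
    return (x - baseline) * avg_grad

w = np.array([1.0, -2.0, 0.5])

def f(x):
    return float(np.sum(w * x ** 2))

def grad_f(x):
    return 2 * w * x

x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)

# completeness: the attributions sum to f(x) - f(baseline)
print(attributions, attributions.sum(), f(x) - f(baseline))
```

The attributions here carry a sign, just like the per-token scores stored in `all_attributions`.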

This function will let us get the example and the baseline inputs in order to perform integrated gradients, and add the attributions to our visualization tool. Additionally, we will add the attributions and tokens for each example into a dictionary so we can use them when we want to further examine the attribution scores for each example.

In [None]:
all_attributions = {}
# all_tokens = {}

In [17]:
all_attributions = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict_all.pt')

In [None]:
# from tqdm import tqdm

# for i in tqdm(range(len(cogs402_ds))):
#   if str(i) not in all_attributions:
#     #get input ids, position ids and attention mask for integrated gradients
#     text = cogs402_ds['text'][i]
#     label = cogs402_ds['labels'][i]

#     input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
#     position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
#     attention_mask = construct_attention_mask(input_ids)

#     attributions = lig.attribute(inputs=input_ids,
#                                       baselines=ref_input_ids,
#                                       additional_forward_args=(position_ids, attention_mask),
#                                       target=1,
#                                       n_steps=50,
#                                       internal_batch_size = 2)

#     attributions_sum = summarize_attributions(attributions)

#     all_attributions[str(i)] = attributions_sum.detach().cpu().numpy()

#     torch.save(all_attributions, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict_all.pt')

We then get the attentions and global attentions so we can compare with the attributions

Since the longformer has a unique attention matrix shape, we convert it into the required sequence length x sequence length matrix

In [18]:
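def create_head_matrix(output_attentions, global_attentions):
    # output_attentions holds one head's sliding-window attention, one row per token;
    # column 0 is assumed to be the attention paid to the single global token
    seq_len = output_attentions.shape[0]
    new_attention_matrix = torch.zeros((seq_len, seq_len))
    for i in range(seq_len):
        test_non_zeroes = torch.nonzero(output_attentions[i]).squeeze()
        test2 = output_attentions[i][test_non_zeroes[1:]]
        # local column j maps to absolute position i + j - 257
        # (assumes a one-sided window of 256 plus the one global column)
        new_attention_matrix_indices = test_non_zeroes[1:]-257 + i
        new_attention_matrix[i][new_attention_matrix_indices] = test2
        new_attention_matrix[i][0] = output_attentions[i][0]
    # row 0 belongs to the global token, so it is taken from the global attentions
    new_attention_matrix[0] = global_attentions.squeeze()[:seq_len]
    return new_attention_matrix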


def attentions_all_heads(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = create_head_matrix(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_layers(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = attentions_all_heads(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)
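
The conversion above can be hard to picture, so here is a simplified NumPy sketch of the same idea on a toy head with a one-sided window of 1 instead of 256. The layout (one global column followed by the local window columns) is an assumption about the attention output, and the helper name `band_to_full` and the numbers are made up. The global-token row that the notebook fills from the global attentions is left out.

```python
import numpy as np

def band_to_full(banded, one_sided_window):
    """Expand banded attention (global column + local window) to seq_len x seq_len."""
    seq_len = banded.shape[0]
    full = np.zeros((seq_len, seq_len))
    for i in range(seq_len):
        for j in range(1, banded.shape[1]):
            # local column j corresponds to absolute position i + j - (w + 1)
            col = i + j - (one_sided_window + 1)
            if 0 <= col < seq_len:
                full[i, col] = banded[i, j]
        # column 0 of the banded matrix is attention to the global token at position 0
        full[i, 0] = banded[i, 0]
    return full

# columns: [global, offset -1, offset 0, offset +1]; each row sums to 1
toy_banded = np.array([
    [0.5, 0.0, 0.0, 0.5],
    [0.2, 0.0, 0.3, 0.5],
    [0.1, 0.4, 0.3, 0.2],
    [0.3, 0.3, 0.4, 0.0],
])
toy_full = band_to_full(toy_banded, one_sided_window=1)
print(toy_full)
```

With a one-sided window of 256 the offset `one_sided_window + 1` becomes the 257 that appears in `create_head_matrix`.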

We scale the attention matrix by head importance

In [19]:
def scale_by_importance(attention_matrix, head_importance):
  new_matrix = np.zeros_like(attention_matrix)
  for i in range(attention_matrix.shape[0]):
    head_importance_layer = head_importance[i]
    new_matrix[i] = attention_matrix[i] * np.expand_dims(head_importance_layer, axis=(1))
  return new_matrix
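
A vectorized way to express the same scaling is shown below as a sketch. It assumes the attention array has shape (layers, heads, seq_len) and the head importance has shape (layers, heads); the random values and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_attention = rng.random((2, 3, 5))   # (layers, heads, seq_len)
toy_importance = rng.random((2, 3))     # (layers, heads)

# add a trailing axis so each head's importance multiplies its whole row of token weights
scaled = toy_attention * toy_importance[:, :, None]

# the same result, computed layer by layer as in scale_by_importance
looped = np.zeros_like(toy_attention)
for layer in range(toy_attention.shape[0]):
    looped[layer] = toy_attention[layer] * np.expand_dims(toy_importance[layer], axis=1)

print(np.allclose(scaled, looped))
```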

In [20]:
head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/papers/pretrained/head_importance.pt")
# head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/news/head_importance.pt")

In [None]:
# print(all_attentions_final)

The following block of code creates a new dictionary of attention matrices. Each key corresponds to an example in the dataset (indices 0 to dataset length - 1) and maps to a layer x head x seq_len matrix of attentions.

In [21]:
all_attentions = {}

The following loads a saved dictionary of attention matrices without having applied head importance to the matrices.

In [68]:
all_attentions = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/papers_atten_summed_dict.pt')

In [23]:
from tqdm import tqdm

# for i in tqdm(range(len(cogs402_ds))):
#   if str(i) not in all_attentions:
#     #get input ids, position ids and attention mask for integrated gradients
#     text = cogs402_ds['text'][i]
#     label = cogs402_ds['labels'][i]

#     input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
#     position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
#     attention_mask = construct_attention_mask(input_ids)

#     output = model(input_ids.cuda(), attention_mask=attention_mask.cuda(), labels=torch.tensor(label).cuda(), output_attentions = True)

#     batch_attn = output[-2]
#     output_attentions = torch.stack(batch_attn).cpu().squeeze()
#     global_attention = output[-1]
#     output_global_attentions = torch.stack(global_attention).cpu().squeeze()

#     converted_mat = all_layers(output_attentions, output_global_attentions).detach().cpu().numpy()
    
#     attention_matrix_summed = converted_mat.sum(axis=2)
#     all_attentions[str(i)] = attention_matrix_summed

#     if i%10 == 9:
#       torch.save(all_attentions, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/papers_atten_summed_dict.pt')

The following are two dictionaries of per-token attention weights (how much each token is attended to), weighted by head importance: one for layer 12 and one summed over all layers. Each dictionary has 1070 keys (the number of examples in the dataset split loaded above), and each key maps to an array of shape (seq_len,).

In [None]:
# all_attentions_final = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_12.pt')
# all_attentions_all = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_all.pt')

Another example of importing dictionaries of attentions. These two dictionaries store the summed attentions for the layers 1-6 and 7-12 respectively. 

In [64]:
# all_attentions_lower = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_lower.pt')
# all_attentions_upper = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_upper.pt')

This block of code iterates through the entire dataset, scales the attention matrix for each example by head importance, and sums up the attention. A template for converting a single layer is left in the block of code; simply replace layer with the layer you wish to assess.

The two dictionaries of attentions are currently labeled _lower and _upper for the use case, but should be changed to fit the task at hand

In [74]:
all_attentions_lower = {}
all_attentions_upper = {}
for i in tqdm(range(len(cogs402_ds))):
  if str(i) not in all_attentions_lower and str(i) not in all_attentions_upper:

    att_mat = all_attentions[str(i)]

    # comment out the next line if you do not wish to scale by head importance
    att_mat = scale_by_importance(att_mat, head_importance)

    att_mat_low = att_mat[0:6]
    att_mat_up = att_mat[6:]


    # Sum the attentions over heads, then over layers, for the lower (1-6) and upper (7-12) layers
    attention_lower_layer = att_mat_low.sum(axis=1)
    attention_lower_layer = attention_lower_layer.sum(axis=0)
    all_attentions_lower[str(i)] = attention_lower_layer

    attention_upper_layer = att_mat_up.sum(axis=1)
    attention_upper_layer = attention_upper_layer.sum(axis=0)
    all_attentions_upper[str(i)] = attention_upper_layer

    #template for single layer
    # attention_single_layer = att_mat[layer].sum(axis=0)
    # all_attentions_layer[str(i)] = attention_single_layer
    
    torch.save(all_attentions_lower, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_lower.pt')
    torch.save(all_attentions_upper, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_upper.pt')

100%|██████████| 1070/1070 [03:00<00:00,  5.93it/s]
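
The layer split and the two-step summation can be checked on a toy array. This sketch assumes the stored matrices have shape (layers, heads, seq_len) with 12 layers; the random data and the sequence length of 8 are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_att = rng.random((12, 12, 8))  # (layers, heads, seq_len)

# sum over layers 1-6 and all heads in one call
lower = toy_att[:6].sum(axis=(0, 1))
# sum over layers 7-12 and all heads
upper = toy_att[6:].sum(axis=(0, 1))

# equivalent to the two-step version used above: heads first, then layers
lower_two_step = toy_att[0:6].sum(axis=1).sum(axis=0)

print(lower.shape, np.allclose(lower, lower_two_step))
```

Each result has one value per token, which is the shape the similarity functions expect.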


In [28]:
def normalize(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
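
Min-max scaling divides by the range of the data, so a constant input would divide by zero. A guarded variant might look like the sketch below; the name `normalize_safe` and the choice to return zeros for constant input are assumptions, not part of the notebook.

```python
import numpy as np

def normalize_safe(data):
    data = np.asarray(data, dtype=float)
    value_range = data.max() - data.min()
    # a constant array has no spread to rescale; return zeros instead of NaN
    if value_range == 0:
        return np.zeros_like(data)
    return (data - data.min()) / value_range

print(normalize_safe([2.0, 4.0, 6.0]))
print(normalize_safe([3.0, 3.0, 3.0]))
```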

The following block of code iterates through the entire dataset, grabs the appropriate attributions and attentions from their respective dictionaries, and computes the cosine similarities, the Kendall tau coefficients, and the rank-biased overlap (RBO).

In [29]:
from numpy.linalg import norm
import scipy.stats as stats
import rbo
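
To show what the three kinds of measure capture, the sketch below computes them on a four-token toy example using plain NumPy. The Kendall tau here is the uncorrected tau-a, and `average_overlap` is a simplified stand-in for rank-biased overlap that weights every depth equally; neither is claimed to reproduce the exact behaviour of `scipy.stats.kendalltau` or the `rbo` package. All values and helper names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def kendall_tau_a(a, b):
    # (concordant pairs - discordant pairs) / total pairs, no tie correction
    n = len(a)
    score = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            score += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return float(score / (n * (n - 1) / 2))

def average_overlap(order_a, order_b):
    # mean over depths d of |top-d(a) & top-d(b)| / d
    depth = min(len(order_a), len(order_b))
    overlaps = [len(set(order_a[:d]) & set(order_b[:d])) / d
                for d in range(1, depth + 1)]
    return float(np.mean(overlaps))

toy_attrib = np.array([0.9, 0.1, 0.4, 0.7])
toy_attention = np.array([0.8, 0.2, 0.6, 0.3])

# token indices ordered from most to least important
top_first_attrib = np.argsort(-toy_attrib).tolist()
top_first_attention = np.argsort(-toy_attention).tolist()

toy_cosine = cosine_similarity(toy_attrib, toy_attention)
toy_tau = kendall_tau_a(toy_attrib, toy_attention)
toy_overlap = average_overlap(top_first_attrib, top_first_attention)
print(toy_cosine, toy_tau, toy_overlap)
```

Note that `argsort()` without negation orders indices from smallest to largest value, so an overlap measure that emphasises the head of each list will emphasise the lowest-scoring tokens unless the order is reversed first.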

In [30]:
def get_sim_dataframe(cogs402_ds, all_attentions, all_attributions):

  dataframe = []

  for i in tqdm(range(len(cogs402_ds))):
    exam_attrib = all_attributions[str(i)]
    attention_final_layer = all_attentions[str(i)]
    
    exam_attrib = exam_attrib[:len(all_attentions[str(i)])]

    #raw attributions
    cosine_raw = np.dot(exam_attrib, attention_final_layer) / (norm(exam_attrib)*norm(attention_final_layer))

    attention_final_layer2 = normalize(attention_final_layer)

    exam_attrib2 = np.abs(exam_attrib)
    exam_attrib2 = normalize(exam_attrib2)
    
    #normalized attributions
    cosine = np.dot(exam_attrib2, attention_final_layer2) / (norm(exam_attrib2)*norm(attention_final_layer2))

    exam_attrib3 = np.abs(exam_attrib)
    exam_attrib3 = normalize(exam_attrib3)
    median_exam = np.percentile(exam_attrib3, 50)
    exam_attrib3[exam_attrib3 < median_exam] = 0

    attention_final_layer3 = np.copy(attention_final_layer)
    attention_final_layer3 = normalize(attention_final_layer3)
    median_12 = np.percentile(attention_final_layer3, 50)
    attention_final_layer3[attention_final_layer3 < median_12] = 0


    #sim for attributions and attentions above the median
    cosine_med = np.dot(exam_attrib3, attention_final_layer3) / (norm(exam_attrib3)*norm(attention_final_layer3))

    exam_attrib4 = np.abs(exam_attrib)
    exam_attrib4 = normalize(exam_attrib4)
    mean_exam = np.mean(exam_attrib4)
    exam_attrib4[exam_attrib4 < mean_exam] = 0

    attention_final_layer4 = np.copy(attention_final_layer)
    attention_final_layer4 = normalize(attention_final_layer4)
    mean_12 = np.mean(attention_final_layer4)
    attention_final_layer4[attention_final_layer4 < mean_12] = 0

    #sim for attributions and attentions above the mean
    cosine_mean = np.dot(exam_attrib4, attention_final_layer4) / (norm(exam_attrib4)*norm(attention_final_layer4))

    exam_attrib_rank = np.abs(exam_attrib)
    order_attrib = exam_attrib_rank.argsort()
    ranks_attrib = order_attrib.argsort()

    attention_final_layer_rank = np.copy(attention_final_layer)
    order = attention_final_layer_rank.argsort()
    ranks = order.argsort()

    #sim using the ranks of the tokens
    cosine_rank = np.dot(ranks_attrib, ranks) / (norm(ranks_attrib)*norm(ranks))

    tau, p_value = stats.kendalltau(ranks_attrib, ranks)

    rbo_sim = rbo.RankingSimilarity(order_attrib, order).rbo()

    d = {'example': i, 'similarity normalized': cosine, 'similarity raw': cosine_raw, 'sim_norm w/ median threshold':cosine_med, 'sim_norm w/ mean threshold':cosine_mean, "sim w/ ranks":cosine_rank,
        "kendalltau": tau, "rbo":rbo_sim}
    dataframe.append(d)

  return pd.DataFrame(dataframe)
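
One point that may help when reading the "sim w/ ranks" column: rank vectors contain only non-negative values, so their cosine similarity is high even for unrelated rankings. For two independent random permutations of 0..n-1 the expected value is 3(n-1) / (2(2n-1)), which approaches 0.75 for long sequences. The sketch below checks this numerically; the sequence length and seed are arbitrary choices.

```python
import numpy as np

def rank_cosine(ranks_a, ranks_b):
    return float(np.dot(ranks_a, ranks_b)
                 / (np.linalg.norm(ranks_a) * np.linalg.norm(ranks_b)))

rng = np.random.default_rng(0)
n = 2000

# two unrelated rankings of the same n tokens
random_ranks_a = rng.permutation(n)
random_ranks_b = rng.permutation(n)

random_baseline = rank_cosine(random_ranks_a, random_ranks_b)
expected_baseline = 3 * (n - 1) / (2 * (2 * n - 1))
identical = rank_cosine(random_ranks_a, random_ranks_a)

print(random_baseline, expected_baseline, identical)
```

Rank-cosine values are therefore best compared against a baseline of roughly 0.75 rather than 0.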

The attributions and the attentions have different ranges. The attributions can be negative or positive (roughly -1 to 1), whereas the attentions are non-negative and lie between 0 and 1 after normalization. A negative attribution does not necessarily mean low attention, however: such a token pushes the model towards the negative class, so the attentions may still weight it heavily as a useful feature. For this reason, every similarity measure other than the raw cosine uses the absolute value of the attributions.
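
A toy example makes the effect of the sign visible. In this sketch the first token has a strongly negative attribution but high attention; the numbers are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical per-token scores: token 0 argues strongly for the negative class
toy_attrib = np.array([-0.8, 0.1, 0.6])
toy_attention = np.array([0.7, 0.1, 0.5])

# signed attributions: the large negative value cancels the agreement elsewhere
raw_cosine = cosine_similarity(toy_attrib, toy_attention)
# magnitudes only: the same token now counts as important in both vectors
abs_cosine = cosine_similarity(np.abs(toy_attrib), toy_attention)

print(raw_cosine, abs_cosine)
```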

In [75]:
df = get_sim_dataframe(cogs402_ds, all_attentions_lower, all_attributions)

100%|██████████| 1070/1070 [00:06<00:00, 152.93it/s]


In [76]:
df2 = get_sim_dataframe(cogs402_ds, all_attentions_upper, all_attributions)

100%|██████████| 1070/1070 [00:07<00:00, 152.36it/s]


In [77]:
df

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks,kendalltau,rbo
0,0,0.055101,0.059613,0.052692,0.049383,0.771270,0.056225,0.502536
1,1,0.165902,-0.136923,0.150249,0.132758,0.766861,0.046023,0.509705
2,2,0.109427,-0.096491,0.103792,0.096039,0.771890,0.059434,0.509233
3,3,0.076973,0.061766,0.074296,0.065375,0.812320,0.168871,0.550735
4,4,0.151239,-0.114314,0.136400,0.121803,0.775222,0.069042,0.516904
...,...,...,...,...,...,...,...,...
1065,1065,0.090096,-0.052967,0.086513,0.076633,0.785083,0.094114,0.513477
1066,1066,0.057741,-0.033787,0.055671,0.047590,0.786249,0.094733,0.513198
1067,1067,0.074530,-0.061406,0.072185,0.064172,0.842639,0.248469,0.579818
1068,1068,0.146087,-0.094186,0.131836,0.116295,0.763927,0.039321,0.499422


In [72]:
df.max()

example                         1069.000000
similarity normalized              0.285755
similarity raw                     0.167277
sim_norm w/ median threshold       0.276364
sim_norm w/ mean threshold         0.267324
sim w/ ranks                       0.868214
kendalltau                         0.319805
rbo                                0.599629
dtype: float64

In [50]:
df.min()

example                         0.000000
similarity normalized           0.012736
similarity raw                 -0.266211
sim_norm w/ median threshold    0.010828
sim_norm w/ mean threshold      0.003529
sim w/ ranks                    0.727784
kendalltau                     -0.059373
rbo                             0.453769
dtype: float64

In [37]:
df2.max()

example                         1069.000000
similarity normalized              0.547884
similarity raw                     0.429935
sim_norm w/ median threshold       0.541587
sim_norm w/ mean threshold         0.528510
sim w/ ranks                       0.850276
kendalltau                         0.273140
rbo                                0.595922
dtype: float64

In [78]:
df.mean()

example                         534.500000
similarity normalized             0.102567
similarity raw                    0.000039
sim_norm w/ median threshold      0.095146
sim_norm w/ mean threshold        0.083200
sim w/ ranks                      0.782399
kendalltau                        0.088036
rbo                               0.521297
dtype: float64

In [59]:
df2.mean()

example                         534.500000
similarity normalized             0.180892
similarity raw                    0.019859
sim_norm w/ median threshold      0.174104
sim_norm w/ mean threshold        0.157438
sim w/ ranks                      0.782365
kendalltau                        0.089739
rbo                               0.531374
dtype: float64

The following block of code iterates through the entire dataset, grabs the appropriate attributions and attentions from their respective dictionaries, and computes the cosine similarities, the Kendall tau coefficients, and the RBO. The difference is that this block of code zeroes out the values at non-alphanumeric tokens (e.g. ".,][?") before calculating similarities.

In [31]:
def get_sim_dataframe_alpha(cogs402_ds, all_attentions, all_attributions):

  dataframe = []

  for i in tqdm(range(len(cogs402_ds))):
    exam_attrib = all_attributions[str(i)]
    attention_final_layer = all_attentions[str(i)]
    
    exam_attrib = exam_attrib[:len(all_attentions[str(i)])]

    #input_ids
    text = cogs402_ds['text'][i]
    input_ids, _, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
    indices = input_ids[0].detach().tolist()
    all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)

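    exam_tokens = all_tokens_curr
    alphanumeric_idx = [idx for idx, element in enumerate(exam_tokens) if element.isalnum()]
    mask = np.ones(attention_final_layer.shape,dtype=bool) 
    mask[alphanumeric_idx] = False

    # zero out non-alphanumeric tokens on new arrays, so the arrays stored in
    # the dictionaries are not modified in place
    attention_final_layer = np.where(mask, 0, attention_final_layer)
    exam_attrib = np.where(mask, 0, exam_attrib)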

    #raw attributions
    cosine_raw = np.dot(exam_attrib, attention_final_layer) / (norm(exam_attrib)*norm(attention_final_layer))

    #normalized attributions
    attention_final_layer2 = normalize(attention_final_layer)

    exam_attrib2 = np.abs(exam_attrib)
    exam_attrib2 = normalize(exam_attrib2)
    
    cosine = np.dot(exam_attrib2, attention_final_layer2) / (norm(exam_attrib2)*norm(attention_final_layer2))

    #sim for attributions and attentions above the median
    exam_attrib3 = np.abs(exam_attrib)
    exam_attrib3 = normalize(exam_attrib3)
    median_exam = np.percentile(exam_attrib3, 50)
    exam_attrib3[exam_attrib3 < median_exam] = 0

    attention_final_layer3 = np.copy(attention_final_layer)
    attention_final_layer3 = normalize(attention_final_layer3)
    median_12 = np.percentile(attention_final_layer3, 50)
    attention_final_layer3[attention_final_layer3 < median_12] = 0

    cosine_med = np.dot(exam_attrib3, attention_final_layer3) / (norm(exam_attrib3)*norm(attention_final_layer3))

    #sim for attributions and attentions above the mean
    exam_attrib4 = np.abs(exam_attrib)
    exam_attrib4 = normalize(exam_attrib4)
    mean_exam = np.mean(exam_attrib4)
    exam_attrib4[exam_attrib4 < mean_exam] = 0

    attention_final_layer4 = np.copy(attention_final_layer)
    attention_final_layer4 = normalize(attention_final_layer4)
    mean_12 = np.mean(attention_final_layer4)
    attention_final_layer4[attention_final_layer4 < mean_12] = 0
    
    cosine_mean = np.dot(exam_attrib4, attention_final_layer4) / (norm(exam_attrib4)*norm(attention_final_layer4))

    #sim using the ranks of the tokens
    exam_attrib_rank = np.abs(exam_attrib)
    order_attrib = exam_attrib_rank.argsort()
    ranks_attrib = order_attrib.argsort()

    attention_final_layer_rank = np.copy(attention_final_layer)
    order = attention_final_layer_rank.argsort()
    ranks = order.argsort()

    
    cosine_rank = np.dot(ranks_attrib, ranks) / (norm(ranks_attrib)*norm(ranks))

    tau, p_value = stats.kendalltau(ranks_attrib, ranks)

    rbo_sim = rbo.RankingSimilarity(order_attrib, order).rbo()

    d = {'example': i, 'similarity normalized': cosine, 'similarity raw': cosine_raw, 'sim_norm w/ median threshold':cosine_med, 'sim_norm w/ mean threshold':cosine_mean, "sim w/ ranks":cosine_rank,
        "kendalltau": tau, "rbo":rbo_sim}
    dataframe.append(d)

  return pd.DataFrame(dataframe)
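
The masking step can be illustrated on a handful of tokens. The token strings below are hypothetical examples in the byte-level BPE style, where a leading "Ġ" marks a preceding space; since "Ġ" is itself a letter, `str.isalnum()` still returns True for ordinary words. The sketch builds the masked scores as a new array so the original scores stay untouched.

```python
import numpy as np

# hypothetical tokens and per-token scores
toy_tokens = ["<s>", "ĠThe", "Ġmodel", ".", "Ġ(", "Ġworks", "</s>"]
toy_scores = np.array([0.9, 0.2, 0.5, 0.7, 0.1, 0.4, 0.8])

# True where the token consists only of letters and digits
keep = np.array([tok.isalnum() for tok in toy_tokens])

# np.where returns a new array; toy_scores itself is not modified
masked_scores = np.where(keep, toy_scores, 0.0)

print(keep.tolist())
print(masked_scores.tolist())
```

Special tokens such as `<s>` and `</s>` are removed along with punctuation, because they contain non-alphanumeric characters.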

In [60]:
df_alpha = get_sim_dataframe_alpha(cogs402_ds, all_attentions_lower, all_attributions)

100%|██████████| 1070/1070 [03:21<00:00,  5.31it/s]


In [62]:
df_alpha2 = get_sim_dataframe_alpha(cogs402_ds, all_attentions_upper, all_attributions)

100%|██████████| 1070/1070 [03:21<00:00,  5.31it/s]


In [61]:
df_alpha.mean()

example                         534.500000
similarity normalized             0.287514
similarity raw                   -0.005378
sim_norm w/ median threshold      0.265075
sim_norm w/ mean threshold        0.242415
sim w/ ranks                      0.875731
kendalltau                        0.371863
rbo                               0.783688
dtype: float64

In [63]:
df_alpha2.mean()

example                         534.500000
similarity normalized             0.417837
similarity raw                    0.005613
sim_norm w/ median threshold      0.403510
sim_norm w/ mean threshold        0.380196
sim w/ ranks                      0.877220
kendalltau                        0.376542
rbo                               0.783973
dtype: float64

These two dictionaries store the summed attentions for the layers 1-6 and 7-12 respectively. 

In [None]:
# all_attentions_unscale_lower = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_lower_unscale.pt')
# all_attentions_unscale_upper = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_upper_unscale.pt')

This block of code iterates through the entire dataset and sums up the attention for each example. Unlike the earlier block, it does not scale the attentions by head importance. The single-layer template from the earlier block applies here as well; simply replace layer with the layer you wish to assess.

The two dictionaries of attentions are currently labeled _lower and _upper for the use case, but should be changed to fit the task at hand

In [79]:
all_attentions_unscale_lower = {}
all_attentions_unscale_upper = {}
for i in tqdm(range(len(cogs402_ds))):
  if str(i) not in all_attentions_unscale_lower and str(i) not in all_attentions_unscale_upper:

    att_mat = all_attentions[str(i)]

    att_mat_low = att_mat[0:6]
    att_mat_up = att_mat[6:]

    # Sum the attentions over heads, then over layers, for the lower (1-6) and upper (7-12) layers
    attention_lower_layer = att_mat_low.sum(axis=1)
    attention_lower_layer = attention_lower_layer.sum(axis=0)
    all_attentions_unscale_lower[str(i)] = attention_lower_layer

    attention_upper_layer = att_mat_up.sum(axis=1)
    attention_upper_layer = attention_upper_layer.sum(axis=0)
    all_attentions_unscale_upper[str(i)] = attention_upper_layer
    
    torch.save(all_attentions_unscale_lower, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_lower_unscale.pt')
    torch.save(all_attentions_unscale_upper, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/full_attention_matrices/example_atten_dict_upper_unscale.pt')

100%|██████████| 1070/1070 [02:53<00:00,  6.17it/s]


In [80]:
df_unscale = get_sim_dataframe(cogs402_ds, all_attentions_unscale_lower, all_attributions)
df_unscale2 = get_sim_dataframe(cogs402_ds, all_attentions_unscale_upper, all_attributions)

100%|██████████| 1070/1070 [00:07<00:00, 149.99it/s]
100%|██████████| 1070/1070 [00:07<00:00, 148.52it/s]


In [81]:
df_unscale.mean()

example                         534.500000
similarity normalized             0.044425
similarity raw                   -0.004103
sim_norm w/ median threshold      0.038792
sim_norm w/ mean threshold        0.026743
sim w/ ranks                      0.759812
kendalltau                        0.028209
rbo                               0.510167
dtype: float64

In [82]:
df_unscale2.mean()

example                         534.500000
similarity normalized             0.057689
similarity raw                    0.006596
sim_norm w/ median threshold      0.052625
sim_norm w/ mean threshold        0.037709
sim w/ ranks                      0.783053
kendalltau                        0.092020
rbo                               0.534586
dtype: float64

In [87]:
df_unscale_alpha = get_sim_dataframe_alpha(cogs402_ds, all_attentions_unscale_lower, all_attributions)

100%|██████████| 1070/1070 [03:22<00:00,  5.29it/s]


In [88]:
df_unscale_alpha2 = get_sim_dataframe_alpha(cogs402_ds, all_attentions_unscale_upper, all_attributions)

100%|██████████| 1070/1070 [03:22<00:00,  5.28it/s]


In [89]:
df_unscale_alpha.mean()

example                         534.500000
similarity normalized             0.158132
similarity raw                   -0.011340
sim_norm w/ median threshold      0.136436
sim_norm w/ mean threshold        0.123483
sim w/ ranks                      0.867937
kendalltau                        0.346927
rbo                               0.776589
dtype: float64

In [90]:
df_unscale_alpha2.mean()

example                         534.500000
similarity normalized             0.418432
similarity raw                    0.018761
sim_norm w/ median threshold      0.394978
sim_norm w/ mean threshold        0.369731
sim w/ ranks                      0.877759
kendalltau                        0.378284
rbo                               0.784442
dtype: float64