<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/Attention_attribution_cosine_sim.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook explores the relationship between the model's attributions and its attentions for a given example. Earlier in this project, we found that attentions alone are not a feasible method of explanation whereas attributions are, but attributions are not part of a model's standard outputs. It may therefore be interesting to see whether attentions reveal anything when we compare them to a feasible and plausible method of explanation.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import dependencies

In [None]:
pip install transformers --quiet


In [None]:
pip install captum --quiet

[?25l[K     |▎                               | 10 kB 35.9 MB/s eta 0:00:01[K     |▌                               | 20 kB 28.7 MB/s eta 0:00:01[K     |▊                               | 30 kB 18.7 MB/s eta 0:00:01[K     |█                               | 40 kB 8.6 MB/s eta 0:00:01[K     |█▏                              | 51 kB 8.4 MB/s eta 0:00:01[K     |█▍                              | 61 kB 9.8 MB/s eta 0:00:01[K     |█▋                              | 71 kB 10.1 MB/s eta 0:00:01[K     |█▉                              | 81 kB 10.0 MB/s eta 0:00:01[K     |██                              | 92 kB 11.1 MB/s eta 0:00:01[K     |██▎                             | 102 kB 9.7 MB/s eta 0:00:01[K     |██▌                             | 112 kB 9.7 MB/s eta 0:00:01[K     |██▊                             | 122 kB 9.7 MB/s eta 0:00:01[K     |███                             | 133 kB 9.7 MB/s eta 0:00:01[K     |███▏                            | 143 kB 9.7 MB/s eta 0:00:01[

In [None]:
pip install datasets --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

In [None]:
pip install rbo --quiet

In [None]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

Replace `model_path` and the tokenizer with the ones for your own project.

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig

model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")


In [None]:
ref_token_id = tokenizer.pad_token_id # Padding token, used to build the baseline (reference) input
sep_token_id = tokenizer.sep_token_id # Separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id # Classification token, prepended to the start of the text

## Import Dataset

Here we import the papers dataset

In [None]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]


Here we import the news dataset. The cell below is left commented out.

In [None]:
# cogs402_ds2 = load_dataset('hyperpartisan_news_detection', 'bypublisher')['validation']
# val_size = 5000
# val_indices = np.random.randint(0, len(cogs402_ds2), val_size)
# val_ds = cogs402_ds2.select(val_indices)
# labels2 = map(int, val_ds['hyperpartisan'])
# labels2 = list(labels2)
# val_ds = val_ds.add_column("labels", labels2)

## Get Attributions

We need to create a custom forward function for use with [Integrated Gradients](https://arxiv.org/abs/1703.01365). Specifically, the output we want from the forward pass of the model is the softmaxed logits, which give the predicted probability of each class for the given example.

In [None]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# Returns the class probabilities; the class to attribute is chosen later via `target` in `lig.attribute`
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)

To get the attributions, we perform Integrated Gradients using the model's embeddings and pass in our custom forward function.

In [None]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
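
To make the idea concrete, here is a minimal sketch of Integrated Gradients on a toy function. It is not the Captum implementation: the function `f`, its weights, and the helper names are assumptions made purely for illustration. The sketch approximates the path integral from a baseline to the input with a midpoint Riemann sum and checks the completeness property (the attributions sum to `f(x) - f(baseline)`), which is what the convergence delta returned by Captum measures.

```python
# Toy Integrated Gradients on f(x) = sum_i w_i * x_i**2 (hypothetical function).
WEIGHTS = [1.0, -2.0, 0.5]

def f(x):
    return sum(w * xi ** 2 for w, xi in zip(WEIGHTS, x))

def grad_f(x):
    # Analytic gradient of f: df/dx_i = 2 * w_i * x_i
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, x)]

def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    dim = len(x)
    avg_grad = [0.0] * dim
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        # Point on the straight line between the baseline and the input
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(dim):
            avg_grad[i] += g[i] / n_steps
    # Attribution = (input - baseline) * average gradient along the path
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
delta = sum(attributions) - (f(x) - f(baseline))
print(attributions, delta)
```

For a real model the gradient is not linear along the path, so many more steps are usually needed for the delta to become small.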

Here we pick out the example we want to compare the attributions and the attentions for. You should either pick this example at random, or if another part of your project has given some interesting results, you can use that example.

In [None]:
example = 976
text = cogs402_ds['text'][example]
label = cogs402_ds['labels'][example]

Create functions that return the input ids and the position ids for the text we want to examine, along with the baselines we need for Integrated Gradients. In this case, every text token in the baseline is replaced by a padding token, while the CLS and SEP tokens are kept.

In [None]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 

    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
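
The position-id computation above can be hard to picture, so here is a small NumPy sketch of the same cumulative-sum trick. The token ids are made up for illustration, and a padding id of 1 is assumed (as in RoBERTa-style vocabularies); the function name is hypothetical.

```python
import numpy as np

def position_ids_from_input_ids(input_ids, padding_idx):
    """Non-padding tokens get positions padding_idx + 1, padding_idx + 2, ...;
    padding tokens keep position padding_idx."""
    mask = (input_ids != padding_idx).astype(np.int64)
    incremental = np.cumsum(mask, axis=1) * mask
    return incremental + padding_idx

# Hypothetical ids: 0 = <s>, 2 = </s>, 1 = <pad>, the rest are arbitrary
toy_ids = np.array([[0, 100, 200, 2, 1, 1]])
toy_positions = position_ids_from_input_ids(toy_ids, padding_idx=1)
print(toy_positions)  # [[2 3 4 5 1 1]]
```

The baseline position ids in the notebook are simply all zeros, so only the real input goes through this computation.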

We get the inputs, position_ids and the mask along with the baselines. We also store the tokens in the `all_tokens` dictionary for access in future functions.

In [None]:
input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
attention_mask = construct_attention_mask(input_ids)

indices = input_ids[0].detach().tolist()
all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)

# `all_tokens` is initialized in a later cell; create it here if it does not exist yet
if "all_tokens" not in globals():
    all_tokens = {}
all_tokens[str(example)] = all_tokens_curr

The attributions returned have very high dimensionality (one value per embedding dimension for every token), and we just want a single number for every token in our example, so we sum over the last dimension and squeeze the result to get an array of shape (seq_len). You may notice that we are not normalizing the attributions here. That is fine because we will normalize them later.

In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    return attributions

In [None]:
print(attention_mask.shape)

torch.Size([1, 2048])


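For use in later functions, we store the attributions we find along with their respective tokens.

In [None]:
all_attributions = {}
# Resetting `all_tokens` would drop the tokens stored earlier, so re-add the current example
all_tokens = {str(example): all_tokens_curr}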

On the other hand, if you have a dictionary of attributions already saved, you can import it as follows. Replace the path with a path to your own dictionary.

In [None]:
all_attributions = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

This cell is where we perform Integrated Gradients, sum the attributions, and store the result in the dictionary. We can also save the dictionary if required. If you have already loaded your attributions, you can skip this step.

Note: the attributions are computed with respect to the positive class (`target=1`), meaning positive attributions push the model toward predicting positive and negative attributions push it toward predicting negative.

In [None]:
# attributions, delta = lig.attribute(inputs=input_ids,
#                                   baselines=ref_input_ids,
#                                   return_convergence_delta=True,
#                                   additional_forward_args=(position_ids, attention_mask),
#                                   target=1,
#                                   n_steps=1500,
#                                   internal_batch_size = 2)

# attributions_sum = summarize_attributions(attributions)

# all_attributions[str(example)] = attributions_sum.detach().cpu().numpy()

In [None]:
# torch.save(all_attributions, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

## Grabbing the attentions

We then get the attentions and global attentions so we can compare with the attributions. We stack the attention to get a tensor of shape: (layer, batch, head, seq_len, x + attention_window + 1) and a tensor of shape (layer, batch, head, seq_len, x) where x is the number of global attention tokens.

In [None]:
output = model(input_ids.to(device), attention_mask=attention_mask.to(device), labels=torch.tensor(label).to(device), output_attentions = True)
batch_attn = output.attentions  # local (sliding-window) attentions, one tensor per layer
output_attentions = torch.stack(batch_attn).cpu()
global_attention = output.global_attentions  # attentions of the tokens with global attention
output_global_attentions = torch.stack(global_attention).cpu()
print("output_attention.shape", output_attentions.shape)
print("gl_output_attention.shape", output_global_attentions.shape)

output_attention.shape torch.Size([12, 1, 12, 2048, 514])
gl_output_attention.shape torch.Size([12, 1, 12, 2048, 1])


A unique property of the Longformer model is that the attention output is not a seq_len x seq_len matrix. Each token can only attend to the preceding w/2 tokens and the succeeding w/2 tokens, where w is the model's attention window. This is called sliding window attention. Therefore, we need to convert the sliding attention matrix to a full seq_len x seq_len matrix to remain consistent with other Transformer models.

To do so, we run the following 4 functions. Our attentions will change from a tensor of shape (layer, batch, head, seq_len, x + attention_window + 1) to a tensor of shape (layer, batch, head, seq_len, seq_len). More information about the functions can be found [here](https://colab.research.google.com/drive/1Kxx26NtIlUzioRCHpsR8IbSz_DpRFxEZ).

In [None]:
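def create_head_matrix(output_attentions, global_attentions):
    # output_attentions: (seq_len, x + attention_window + 1) for a single head
    seq_len = output_attentions.shape[0]
    new_attention_matrix = torch.zeros((seq_len, seq_len))
    for i in range(seq_len):
        # column indices of the non-zero attention values for token i
        test_non_zeroes = torch.nonzero(output_attentions[i]).squeeze()
        test2 = output_attentions[i][test_non_zeroes[1:]]
        # 257 = 1 global-attention column + 256 (half of the 512-token window),
        # so this maps each window column to its absolute token position
        new_attention_matrix_indices = test_non_zeroes[1:]-257 + i
        new_attention_matrix[i][new_attention_matrix_indices] = test2
        # column 0 holds the attention from token i to the global (first) token
        new_attention_matrix[i][0] = output_attentions[i][0]
    # the global token's own row comes from the global attentions
    new_attention_matrix[0] = global_attentions.squeeze()[:seq_len]
    return new_attention_matrix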


def attentions_all_heads(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = create_head_matrix(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_batches(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = attentions_all_heads(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_layers(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = all_batches(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)
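
The conversion is easier to follow on a toy case. The sketch below expands a banded sliding-window matrix into a full square matrix for a sequence of 5 tokens with a one-sided window of 1. It is a simplified illustration under stated assumptions: there is no global-attention column, and `band_to_full` and the values in `band` are hypothetical. In the notebook's functions the same column-to-position shift appears as the hardcoded offset.

```python
import numpy as np

def band_to_full(band, one_sided_window):
    """Expand (seq_len, 2*w + 1) sliding-window scores to (seq_len, seq_len)."""
    seq_len, width = band.shape
    assert width == 2 * one_sided_window + 1
    full = np.zeros((seq_len, seq_len), dtype=band.dtype)
    for i in range(seq_len):
        for c in range(width):
            j = i + c - one_sided_window  # absolute position of the attended token
            if 0 <= j < seq_len:
                full[i, j] = band[i, c]
    return full

# Row i holds the attention of token i to tokens i-1, i, i+1 (made-up values)
band = np.array([
    [0.0, 0.6, 0.4],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
    [0.3, 0.3, 0.4],
    [0.5, 0.5, 0.0],
])
full = band_to_full(band, one_sided_window=1)
print(full)
```

Each row of `full` still sums to 1, and everything outside the band stays at 0.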

We only run one example at a time, so the batch axis has size 1. Applying the above 4 functions gives a tensor of shape (layer, batch, head, seq_len, seq_len); the batch axis is squeezed out in a later step.

In [None]:
converted_mat = all_layers(output_attentions, output_global_attentions).detach().cpu().numpy()
print(converted_mat.shape)

(12, 1, 12, 2048, 2048)


We get the attentions for each token by summing the converted attention matrix over the first seq_len axis. As such the resulting matrix is of shape (layer, batch, head, seq_len).

In [None]:
attention_matrix_summed = converted_mat.sum(axis=3)
print(attention_matrix_summed.shape)

(12, 1, 12, 2048)
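
As a small illustration of why the sum runs over the first seq_len axis, consider a toy 3-token attention matrix, assuming rows are the attending (query) tokens and columns are the attended (key) tokens. The values are made up.

```python
import numpy as np

# Rows are query tokens, columns are key tokens; each row sums to 1.
attn = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
])
received = attn.sum(axis=0)  # total attention each token receives
given = attn.sum(axis=1)     # total attention each token gives out
print(received, given)
```

Summing over the query axis yields a per-token score that varies between tokens, whereas summing over the key axis gives 1 for every token and carries no information.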


Some heads may be more important than others, so we scale each attention matrix by the importance of its head and layer. The notebook used to get head importance is [here](https://colab.research.google.com/drive/1O4QCi8ewBp7asegKqySRflTQZ9HeH8mQ?usp=sharing). However, it's possible that you might not want to scale the attentions, in which case you can ignore this section.

In [None]:
head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/papers/pretrained/head_importance.pt")
# head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/news/head_importance.pt")

In [None]:
def scale_by_importance(attention_matrix, head_importance):
  new_matrix = np.zeros_like(attention_matrix)
  for i in range(attention_matrix.shape[0]):
    head_importance_layer = head_importance[i]
    for j in range(attention_matrix.shape[1]):
      new_matrix[i,j] = attention_matrix[i,j] * np.expand_dims(head_importance_layer, axis=(1))
  return new_matrix

In [None]:
attention_matrix_summed = scale_by_importance(attention_matrix_summed, head_importance)

Here we are using the squeeze function to remove the batch axis, as we are likely only working with one example at a time. After that, we can either select a specific layer we want, or a range of layers we wish to compare. 

In this case, when taking a specific layer, you pick the layer you want (replace 11 with whatever layer you wish) and then we sum over all of the heads.

When taking a range of layers, you either want to specify a range (e.g. attention_matrix_summed[0:6]) or leave as it is to sum over all layers. Then we sum up the layers and the heads.

The result of both versions will be an array of shape (seq_len), the same as our attributions as desired.

In [None]:
attention_final_layer = attention_matrix_summed[11].squeeze().sum(axis=0)
attention_all_layer = attention_matrix_summed.squeeze().sum(axis=1)
attention_all_layer = attention_all_layer.sum(axis=0)
print(attention_all_layer.shape)

(2048,)


## Starting the Comparison

Grab the attributions we stored earlier. As a safeguard, make sure the attributions are not longer than the attentions.



In [None]:
exam_attrib = all_attributions[str(example)]
exam_attrib = exam_attrib[:len(attention_final_layer)]

Now that we have the attributions and the attentions, we want to see how the attributions (in terms of magnitude) compare to the attentions.

First, however, it is a good idea to check the cosine similarities without any processing, using the raw attributions and attentions.

In [None]:
from numpy.linalg import norm
cosine_raw = np.dot(exam_attrib, attention_final_layer) / (norm(exam_attrib)*norm(attention_final_layer))
print("Layer 12 Cosine Similarity raw attrib:\n", cosine_raw)
cosine_all_raw = np.dot(exam_attrib, attention_all_layer) / (norm(exam_attrib)*norm(attention_all_layer))
print("All layer Cosine Similarity raw attrib:\n", cosine_all_raw)

Layer 12 Cosine Similarity raw attrib:
 0.08240282540266103
All layer Cosine Similarity raw attrib:
 0.0967547108684436
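
The cosine computation can be wrapped in a small helper, sketched below with made-up vectors. The helper name and the choice to return NaN for an all-zero vector are assumptions for illustration. The example shows why signed attributions can score low against non-negative attentions even when the magnitudes line up.

```python
import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return float("nan")  # undefined when either vector is all zeros
    return float(np.dot(a, b) / denom)

signed = np.array([0.5, -0.5, 0.1])   # toy attributions with mixed signs
attn = np.array([0.4, 0.4, 0.1])      # toy attentions, all non-negative

cos_signed = cosine_similarity(signed, attn)          # opposite signs cancel out
cos_abs = cosine_similarity(np.abs(signed), attn)     # magnitudes agree closely
cos_scaled = cosine_similarity(signed, 10 * signed)   # rescaling does not matter
print(cos_signed, cos_abs, cos_scaled)
```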


The attributions and the attentions have different ranges. Attributions can be negative or positive, whereas attentions are never negative. However, a negative attribution does not necessarily mean that the token has low attention; it may have very high attention because it helps the model predict the negative class, and might be something the attentions picked up as a feature. Therefore, we take the absolute value of the attributions and then normalize both the attentions and the attributions to the range 0 to 1.

In [None]:
def normalize(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
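
As a quick worked example with made-up attribution values, the sketch below applies the absolute value and then min-max normalization. The guard for a constant array (mapping it to zeros) is an assumption added for the sketch; the notebook's `normalize` does not include it.

```python
import numpy as np

def min_max_normalize(data):
    data = np.asarray(data, dtype=float)
    span = data.max() - data.min()
    if span == 0.0:
        return np.zeros_like(data)  # assumption: map a constant array to zeros
    return (data - data.min()) / span

toy_attrib = np.array([-0.5, 0.1, 0.25])
toy_normalized = min_max_normalize(np.abs(toy_attrib))
print(toy_normalized)
```

The strongly negative attribution ends up with the largest normalized value, which is the intended effect of taking the absolute value first.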

In [None]:
attention_final_layer2 = normalize(attention_final_layer)
attention_all_layer2 = normalize(attention_all_layer)

In [None]:
exam_attrib2 = np.abs(exam_attrib)
exam_attrib2 = normalize(exam_attrib2)

In [None]:
print(exam_attrib2)

[0.00000000e+00 3.09815393e-03 2.46389383e-04 ... 4.17945981e-05
 1.87499117e-03 0.00000000e+00]


Now we calculate cosine similarity using the normalized attentions and attributions.

In [None]:
cosine = np.dot(exam_attrib2, attention_final_layer2) / (norm(exam_attrib2)*norm(attention_final_layer2))
print("Layer 12 Cosine Similarity:\n", cosine)
cosine2 = np.dot(exam_attrib2, attention_all_layer2) / (norm(exam_attrib2)*norm(attention_all_layer2))
print("All layer Cosine Similarity:\n", cosine2)

Layer 12 Cosine Similarity:
 0.14192338496973678
All layer Cosine Similarity:
 0.11526970783935846


It might be interesting to know whether only the top 50% of tokens share similar attentions and attributions. So, after taking the absolute value (for the attributions) and normalizing, we apply a mask that sets values to 0 if they are below the median attention or attribution, respectively.

In [None]:
exam_attrib3 = np.abs(exam_attrib)
exam_attrib3 = normalize(exam_attrib3)
median_exam = np.percentile(exam_attrib3, 50)
exam_attrib3[exam_attrib3 < median_exam] = 0

In [None]:
attention_final_layer3 = np.copy(attention_final_layer)
attention_final_layer3 = normalize(attention_final_layer3)
median_12 = np.percentile(attention_final_layer3, 50)
attention_final_layer3[attention_final_layer3 < median_12] = 0

attention_all_layer3 = np.copy(attention_all_layer) 
attention_all_layer3 = normalize(attention_all_layer3)
median_all = np.percentile(attention_all_layer3, 50)
attention_all_layer3[attention_all_layer3 < median_all] = 0
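
The masking step can be factored into one helper, sketched here on made-up values. `mask_below` is a hypothetical name; it mirrors the `array[array < threshold] = 0` pattern used above but works on a copy.

```python
import numpy as np

def mask_below(values, threshold):
    """Return a copy with entries strictly below `threshold` set to 0."""
    masked = np.array(values, dtype=float, copy=True)
    masked[masked < threshold] = 0.0
    return masked

toy = np.array([0.0, 0.1, 0.2, 1.0, 0.9])
median_masked = mask_below(toy, np.percentile(toy, 50))
mean_masked = mask_below(toy, toy.mean())
print(median_masked)
print(mean_masked)
```

Because the toy values are skewed, the mean lies above the median, so the mean threshold removes one more entry than the median threshold does.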

Now we calculate cosine similarity for the median masked attributions and attentions.

In [None]:
cosine_med = np.dot(exam_attrib3, attention_final_layer3) / (norm(exam_attrib3)*norm(attention_final_layer3))
print("Layer 12 Cosine Similarity med:\n", cosine_med)
cosine_med2 = np.dot(exam_attrib3, attention_all_layer3) / (norm(exam_attrib3)*norm(attention_all_layer3))
print("All layer Cosine Similarity med:\n", cosine_med2)

Layer 12 Cosine Similarity med:
 0.13743335870599985
All layer Cosine Similarity med:
 0.10930940264629081


Now we do the same as above, but this time we mask all the values that are lower than the mean.

In [None]:
exam_attrib4 = np.abs(exam_attrib)
exam_attrib4 = normalize(exam_attrib4)
mean_exam = np.mean(exam_attrib4)
exam_attrib4[exam_attrib4 < mean_exam] = 0

In [None]:
attention_final_layer4 = np.copy(attention_final_layer)
attention_final_layer4 = normalize(attention_final_layer4)
mean_12 = np.mean(attention_final_layer4)
attention_final_layer4[attention_final_layer4 < mean_12] = 0

attention_all_layer4 = np.copy(attention_all_layer) 
attention_all_layer4 = normalize(attention_all_layer4)
mean_all = np.mean(attention_all_layer4)
attention_all_layer4[attention_all_layer4 < mean_all] = 0

Calculate cosine similarity for our mean-masked attentions and attributions.

In [None]:
cosine_mean = np.dot(exam_attrib4, attention_final_layer4) / (norm(exam_attrib4)*norm(attention_final_layer4))
print("Layer 12 Cosine Similarity mean:\n", cosine_mean)
cosine_mean2 = np.dot(exam_attrib4, attention_all_layer4) / (norm(exam_attrib4)*norm(attention_all_layer4))
print("All layer Cosine Similarity mean:\n", cosine_mean2)

Layer 12 Cosine Similarity mean:
 0.11986848068103072
All layer Cosine Similarity mean:
 0.09766475625066964


With our normalized attributions and attentions, a token with the same rank in both arrays can still have drastically different values in each. Therefore, even if the two arrays have the same ordering when ranked, the cosine similarity may be low.

If we convert each value of both arrays into its rank within its own array, this problem is alleviated: not only do both arrays have the same range, they also contain the exact same set of values (0 to 2047, or however many tokens your input has). With the same set of values, if two tokens have the same rank in both arrays (indicating that the attentions and the attributions have some degree of similarity), our cosine similarity picks up on that.

In [None]:
exam_attrib_rank = np.abs(exam_attrib)
order_attrib = exam_attrib_rank.argsort()
print(order_attrib)
ranks_attrib = order_attrib.argsort()
print(ranks_attrib)

[   0 2047 2008 ... 1700 1593 1544]
[   0 1055  120 ...   18  769    1]


In [None]:
attention_final_layer_rank = np.copy(attention_final_layer)
order = attention_final_layer_rank.argsort()
ranks = order.argsort()

attention_all_layer_rank = np.copy(attention_all_layer)
order2 = attention_all_layer_rank.argsort()
ranks2 = order2.argsort()
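
The double `argsort` used above is compact but not obvious, so here is a four-element example with made-up values showing what each call returns.

```python
import numpy as np

values = np.array([0.3, 0.9, 0.1, 0.5])
toy_order = values.argsort()     # positions listed from smallest to largest value
toy_ranks = toy_order.argsort()  # rank of each position (0 = smallest value)
print(toy_order)  # [2 0 3 1]
print(toy_ranks)  # [1 3 0 2]
```

The first array answers "which position comes first, second, ...", while the second answers "what rank does each position have". Tied values receive distinct ranks in an arbitrary order with this approach.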

In [None]:
cosine_rank = np.dot(ranks_attrib, ranks) / (norm(ranks_attrib)*norm(ranks))
print("Layer 12 Cosine Similarity rank:\n", cosine_rank)
cosine_rank2 = np.dot(ranks_attrib, ranks2) / (norm(ranks_attrib)*norm(ranks2))
print("All layer Cosine Similarity rank:\n", cosine_rank2)

Layer 12 Cosine Similarity rank:
 0.8231117432528797
All layer Cosine Similarity rank:
 0.8196131959645447


Cosine similarity is not the only similarity metric we can use. Let's evaluate similarity on our example with two other metrics: [Kendall's tau](https://www.jstor.org/stable/2332226), and [Rank-biased Overlap (RBO)](https://dl.acm.org/doi/10.1145/1852102.1852106).

With Kendall's tau, you compare the similarities by passing in two arrays of rankings, meaning every item in your array is the rank of the corresponding token, from 0 to max_len - 1.

In [None]:
import scipy.stats as stats
tau, p_value = stats.kendalltau(ranks_attrib, ranks)
print("Tau statistic layer 12:", tau, "p value", p_value)
tau2, p_value2 = stats.kendalltau(ranks_attrib, ranks2)
print("Tau statistic: all layers", tau2, "p value", p_value2)

Tau statistic layer 12: 0.2001347245969712 p value 5.712101810122379e-42
Tau statistic: all layers 0.18862397716170007 p value 1.787488784499021e-37
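
To show what the statistic measures, here is a plain-Python sketch of Kendall's tau computed by comparing every pair of items. It implements the tau-a variant and assumes there are no ties, in which case it agrees with the tau-b variant that `scipy.stats.kendalltau` computes by default. It is quadratic in the number of items, so it is meant for illustration only.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a via pairwise comparison; O(n^2), assumes no ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1   # the pair is ordered the same way in both
            elif s < 0:
                discordant += 1   # the pair is ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)

# One swapped neighbouring pair out of six pairs in total
tau_toy = kendall_tau_a([0, 1, 2, 3], [0, 2, 1, 3])
print(tau_toy)
```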


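With RBO, instead of passing in an array of ranks, you pass in the items themselves (here, token positions) ordered by rank: the item at array index 0 is the highest-ranked item, the item at index 1 is the second highest, and the one at index max_len - 1 is the lowest. Note that `argsort` sorts in ascending order, so `order_attrib`, `order`, and `order2` as computed above start with the lowest-valued token; reverse them (e.g. `order_attrib[::-1]`) if the highest-valued tokens should carry the most weight.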

In [None]:
import rbo
rbo_1 = rbo.RankingSimilarity(order_attrib, order).rbo()
rbo_2 = rbo.RankingSimilarity(order_attrib, order2).rbo()
print("rbo layer 12", rbo_1)
print("rbo all", rbo_2)

rbo layer 12 0.5839508564662282
rbo all 0.5605875968068141
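
For intuition about how RBO weights the top of a list, the sketch below implements the truncated RBO sum from the cited paper, (1 - p) * sum over depths d of p^(d-1) * (overlap at depth d) / d. It is a simplified illustration and not the implementation in the `rbo` package, whose defaults and extrapolation options may differ. The function name and the choice of `p` are assumptions, and the lists are assumed to contain no duplicates.

```python
def rbo_truncated(s, t, p=0.9):
    """Truncated rank-biased overlap of two ranked lists (best item first)."""
    k = min(len(s), len(t))
    seen_s, seen_t = set(), set()
    overlap = 0
    total = 0.0
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # count items that now appear in both prefixes
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total

base = [1, 2, 3, 4]
swap_top = [2, 1, 3, 4]      # disagreement at the top of the list
swap_bottom = [1, 2, 4, 3]   # disagreement at the bottom of the list
rbo_top = rbo_truncated(base, swap_top, p=0.5)
rbo_bottom = rbo_truncated(base, swap_bottom, p=0.5)
print(rbo_top, rbo_bottom)
```

A swap at the top of the list lowers the score more than the same swap at the bottom, which is why the direction of the ordering passed to RBO matters.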


Here we compile all of the similarities we calculated into one dataframe for easier viewing.

In [None]:
d = {'example': [example], 'similarity normalized': [cosine], 'similarity raw': [cosine_raw], 'sim_norm w/ median threshold': [cosine_med], 'sim_norm w/ mean threshold': [cosine_mean], "sim w/ ranks":[cosine_rank], "kendall_tau":[tau], "RBO":[rbo_1]}
df = pd.DataFrame(data=d)
df

| | example | similarity normalized | similarity raw | sim_norm w/ median threshold | sim_norm w/ mean threshold | sim w/ ranks |
|---|---|---|---|---|---|---|
| 0 | 976 | 0.141923 | 0.082403 | 0.137433 | 0.119868 | 0.823112 |


We do the same here as we have another set of similarities we want to examine.

In [None]:
d2 = {'example': [example], 'similarity normalized': [cosine2], 'similarity raw': [cosine_all_raw], 'sim_norm w/ median threshold':[cosine_med2], 'sim_norm w/ mean threshold':[cosine_mean2], "sim w/ ranks":[cosine_rank2], "kendall_tau":[tau2], "RBO":[rbo_2]}
df2 = pd.DataFrame(data=d2)
df2

| | example | similarity normalized | similarity raw | sim_norm w/ median threshold | sim_norm w/ mean threshold | sim w/ ranks |
|---|---|---|---|---|---|---|
| 0 | 976 | 0.11527 | 0.096755 | 0.109309 | 0.097665 | 0.819613 |


While not completely necessary, you can save these dataframes into a csv and add onto it every time you look at a new example.

In [None]:
df_layer12 = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_layer12.csv")
df_all = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_all.csv")

In [None]:
df_layer12

| | example | similarity normalized | similarity raw | sim_norm w/ median threshold | sim_norm w/ mean threshold | sim w/ ranks |
|---|---|---|---|---|---|---|
| 0 | 148 | 0.147101 | -0.02026 | 0.14267 | 0.123295 | 0.826202 |
| 1 | 589 | 0.396823 | -0.189594 | 0.392803 | 0.383813 | 0.791068 |
| 2 | 605 | 0.299375 | -0.165507 | 0.288221 | 0.271128 | 0.802063 |
| 3 | 891 | 0.175846 | 0.022566 | 0.173801 | 0.157258 | 0.842796 |
| 4 | 976 | 0.141923 | 0.082403 | 0.137433 | 0.119868 | 0.823112 |


In [None]:
df_all

| | example | similarity normalized | similarity raw | sim_norm w/ median threshold | sim_norm w/ mean threshold | sim w/ ranks |
|---|---|---|---|---|---|---|
| 0 | 148 | 0.140745 | 0.016048 | 0.134218 | 0.123968 | 0.805108 |
| 1 | 589 | 0.180767 | -0.119204 | 0.170673 | 0.159155 | 0.807287 |
| 2 | 605 | 0.249735 | -0.219944 | 0.233159 | 0.219811 | 0.784505 |
| 3 | 891 | 0.104379 | -0.012725 | 0.100741 | 0.089762 | 0.799092 |
| 4 | 976 | 0.11527 | 0.096755 | 0.109309 | 0.097665 | 0.819613 |


Append the new row into the dataframe.

In [None]:
df_layer12 = pd.concat([df, df_layer12], axis=0)
df_all = pd.concat([df2, df_all], axis=0)

In [None]:
df_layer12

| | example | similarity normalized | similarity raw | sim_norm w/ median threshold | sim_norm w/ mean threshold | sim w/ ranks |
|---|---|---|---|---|---|---|
| 0 | 976 | 0.141923 | 0.082403 | 0.137433 | 0.119868 | 0.823112 |
| 0 | 148 | 0.147101 | -0.02026 | 0.14267 | 0.123295 | 0.826202 |
| 1 | 589 | 0.396823 | -0.189594 | 0.392803 | 0.383813 | 0.791068 |
| 2 | 605 | 0.299375 | -0.165507 | 0.288221 | 0.271128 | 0.802063 |
| 3 | 891 | 0.175846 | 0.022566 | 0.173801 | 0.157258 | 0.842796 |
| 4 | 976 | 0.141923 | 0.082403 | 0.137433 | 0.119868 | 0.823112 |


In [None]:
df_all

| | example | similarity normalized | similarity raw | sim_norm w/ median threshold | sim_norm w/ mean threshold | sim w/ ranks |
|---|---|---|---|---|---|---|
| 0 | 976 | 0.11527 | 0.096755 | 0.109309 | 0.097665 | 0.819613 |
| 0 | 148 | 0.140745 | 0.016048 | 0.134218 | 0.123968 | 0.805108 |
| 1 | 589 | 0.180767 | -0.119204 | 0.170673 | 0.159155 | 0.807287 |
| 2 | 605 | 0.249735 | -0.219944 | 0.233159 | 0.219811 | 0.784505 |
| 3 | 891 | 0.104379 | -0.012725 | 0.100741 | 0.089762 | 0.799092 |
| 4 | 976 | 0.11527 | 0.096755 | 0.109309 | 0.097665 | 0.819613 |


Sort the rows by example number.

In [None]:
df_layer12 = df_layer12.sort_values(by=['example'])
df_all = df_all.sort_values(by=['example'])
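
If the saved CSV already contains a row for the current example, appending the new row produces a duplicate, as happens with example 976 in the tables above. One possible way to handle this, sketched on a shortened two-column version of the table, is to drop duplicates on the `example` column before saving. Keeping the first (newest) row is an assumption about the desired behaviour.

```python
import pandas as pd

# Shortened versions of the saved table and the new row, using values from above
saved = pd.DataFrame({"example": [148, 976], "sim w/ ranks": [0.826202, 0.823112]})
new_row = pd.DataFrame({"example": [976], "sim w/ ranks": [0.823112]})

combined = (
    pd.concat([new_row, saved], axis=0, ignore_index=True)
    .drop_duplicates(subset="example", keep="first")  # keep the newest row per example
    .sort_values(by="example")
    .reset_index(drop=True)
)
print(combined)
```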

Save the dataframe

In [None]:
# df_layer12.to_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_layer12.csv", index=False)
# df_all.to_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_all.csv", index=False)

## Comparing Only the Highest Attentions and Attributions

With long pieces of text, it is generally unlikely that a token will have the same rank in both arrays. However, the intuition is that tokens with very high attributions might also have very high attentions, as they may be what the model focused on when making its prediction.

As such, we apply the same steps as we did for masking all the values below the median, but this time we mask all the values below the 95th percentile.

In [None]:
attention_final_layer5 = np.copy(attention_final_layer)
attention_final_layer5 = normalize(attention_final_layer5)

attention_all_layer5 = np.copy(attention_all_layer) 
attention_all_layer5 = normalize(attention_all_layer5)

exam_attrib5 = np.abs(exam_attrib)
exam_attrib5 = normalize(exam_attrib5)
print(exam_attrib5)

[0.00000000e+00 3.09815393e-03 2.46389383e-04 ... 4.17945981e-05
 1.87499117e-03 0.00000000e+00]


In [None]:
top_final = np.percentile(attention_final_layer5, 95)
top_all = np.percentile(attention_all_layer5, 95)
top_attrib = np.percentile(exam_attrib5, 95)
print(top_attrib)

0.0348539137587435


In [None]:
attention_final_layer5[attention_final_layer5<top_final] = 0
attention_all_layer5[attention_all_layer5<top_all] = 0
exam_attrib5[exam_attrib5<top_attrib] = 0

In [None]:
print(exam_attrib5)

[0. 0. 0. ... 0. 0. 0.]


Calculating cosine similarities again with our new array.

In [None]:
cosine_thresh = np.dot(exam_attrib5, attention_final_layer5) / (norm(exam_attrib5)*norm(attention_final_layer5))
print("Layer 12 Cosine Similarity 95th:\n", cosine_thresh)
cosine_thresh2 = np.dot(exam_attrib5, attention_all_layer5) / (norm(exam_attrib5)*norm(attention_all_layer5))
print("All layer Cosine Similarity 95th:\n", cosine_thresh2)

Layer 12 Cosine Similarity 95th:
 0.07730618397762203
All layer Cosine Similarity 95th:
 0.06834394255989426


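We do the same for our rankings: we keep only the ranks up to `num` (5% of the 2048 tokens) and set all other ranks to 0. Then, we calculate cosine similarities. Note that the ranks from `argsort` are ascending (rank 0 is the smallest value), so the `> num` comparison below keeps the lowest-valued 5% of tokens; to keep the highest-valued 5% instead, zero out the ranks below `2048 - num`.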

In [None]:
num = 2048 - np.ceil(2048 * 0.95)  # number of ranks kept (top 5% of 2048 positions)
exam_attrib_rank2 = np.copy(ranks_attrib)
exam_attrib_rank2[exam_attrib_rank2 > num] = 0

attention_final_layer_rank2 = np.copy(ranks)
attention_final_layer_rank2[attention_final_layer_rank2 > num] = 0

attention_all_layer_rank2 = np.copy(ranks2)
attention_all_layer_rank2[attention_all_layer_rank2 > num] = 0
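
The rank masking can be illustrated on a small array. This sketch assumes that rank 1 denotes the largest value, which is what the `> num` mask suggests. The notebook's own `ranks` arrays were computed earlier and may break ties differently, and `descending_ranks` and `keep_top_ranks` are hypothetical helpers.

```python
import numpy as np

def descending_ranks(values):
    """Rank 1 = largest value; ties broken by position (assumed convention)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(-values, kind="stable")   # indices from largest to smallest
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def keep_top_ranks(ranks, k):
    """Zero out every rank that is numerically larger than k."""
    out = np.copy(ranks)
    out[out > k] = 0
    return out

n_tokens = 2048
k = int(n_tokens - np.ceil(n_tokens * 0.95))   # same arithmetic as `num`
toy = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
print(k, keep_top_ranks(descending_ranks(toy), 2))
```

With 2048 positions this arithmetic gives `k = 102`, one fewer than the 103 entries that the percentile mask retains on an array of the same length.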

In [None]:
cosine_rank_top = np.dot(exam_attrib_rank2, attention_final_layer_rank2) / (norm(exam_attrib_rank2)*norm(attention_final_layer_rank2))
print("Layer 12 Cosine Similarity 95th ranks:\n", cosine_rank_top)
cosine_rank_top2 = np.dot(exam_attrib_rank2, attention_all_layer_rank2) / (norm(exam_attrib_rank2)*norm(attention_all_layer_rank2))
print("All layer Cosine Similarity 95th ranks:\n", cosine_rank_top2)

Layer 12 Cosine Similarity 95th ranks:
 0.09870039419983007
All layer Cosine Similarity 95th ranks:
 0.05474780961401847


Of course, cosine similarity isn't the only metric that exists for similarities, so we try RBO again on our new arrays of ranks.

In [None]:
exam_attrib_order2 = np.copy(order_attrib)

attention_final_layer_order2 = np.copy(order)

attention_all_layer_order2 = np.copy(order2)

In [None]:
print("rbo layer 12 95th", rbo.RankingSimilarity(exam_attrib_order2[:int(num)], attention_final_layer_order2[:int(num)]).rbo())
print("rbo all 95th", rbo.RankingSimilarity(exam_attrib_order2[:int(num)], attention_all_layer_order2[:int(num)]).rbo())

rbo layer 12 95th 0.07311574269242707
rbo all 95th 0.04461643100927445
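
RBO compares two ordered lists by how much their prefixes overlap at each depth. To make that idea concrete, the sketch below computes the unweighted special case, often called average overlap. It is a simplified illustration and not the `rbo` package's implementation, so its values should not be expected to match the numbers above. The function name and the toy lists are hypothetical.

```python
def average_overlap(list_a, list_b, depth=None):
    """Mean, over prefixes of length 1..depth, of the fraction of shared items."""
    if depth is None:
        depth = min(len(list_a), len(list_b))
    if depth == 0:
        return 0.0
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        total += len(seen_a & seen_b) / d   # agreement at depth d
    return total / depth

# Hypothetical position ids ordered from most to least important.
order_x = [7, 3, 9, 1]
order_y = [3, 7, 1, 4]
print(average_overlap(order_x, order_y))
```

Agreement near the top of the lists contributes to every deeper prefix, so early matches influence the score more than late ones.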


### Examining the Specifics

While checking whether the model's attributions and attentions are exactly the same is one way of comparing the two arrays, another is to determine whether the model puts the most focus on the same group of tokens.

Here we take the set of position ids that make up the top 5 percent of tokens in each of the attention and attribution arrays. By doing so, we can find out which tokens both arrays have in common, and which tokens are unique to each array. We will be able to identify which tokens are buzzwords in both the attention and the attributions, and compute one last similarity metric to check how closely the attention and the attributions agree.

In [None]:
attention_final_layer_top = np.flatnonzero(attention_final_layer5)
attention_final_layer_top = set(attention_final_layer_top)

attention_all_layer_top = np.flatnonzero(attention_all_layer5)
attention_all_layer_top = set(attention_all_layer_top)

exam_attrib_top = np.flatnonzero(exam_attrib5)
exam_attrib_top = set(exam_attrib_top)
print(exam_attrib_top)

{1536, 1537, 1538, 1539, 1540, 1543, 1544, 1545, 1553, 27, 1571, 1572, 1576, 1579, 1072, 1591, 1593, 1595, 1601, 1606, 594, 1140, 1655, 1656, 1658, 1659, 634, 1666, 1670, 1677, 144, 1168, 1686, 1687, 1697, 1699, 1700, 1701, 1702, 1703, 1705, 181, 1723, 1734, 1735, 1740, 214, 1751, 1752, 1753, 1763, 1252, 1766, 1767, 749, 750, 751, 752, 1778, 757, 247, 1788, 1789, 1791, 1792, 1793, 1794, 1796, 1797, 1798, 266, 1802, 1803, 1804, 1805, 1806, 1808, 1811, 285, 1311, 1833, 1334, 1335, 1352, 1354, 1362, 343, 1944, 1440, 1459, 1460, 1469, 1480, 1484, 1488, 1493, 1505, 1506, 1508, 1527, 1531, 1534, 1023}
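
The three groups examined in the rest of this section (attention only, attribution only, and shared) follow directly from set operations on these position ids. A minimal sketch with hypothetical toy vectors:

```python
import numpy as np

# Hypothetical masked vectors: non-zero entries mark the retained top tokens.
attn_masked = np.array([0.0, 0.9, 0.0, 0.4, 0.7, 0.0])
attrib_masked = np.array([0.3, 0.0, 0.0, 0.8, 0.6, 0.0])

# Positions of the non-zero entries, converted to plain Python ints.
attn_top = set(np.flatnonzero(attn_masked).tolist())
attrib_top = set(np.flatnonzero(attrib_masked).tolist())

only_attn = sorted(attn_top - attrib_top)     # high attention, not high attribution
only_attrib = sorted(attrib_top - attn_top)   # high attribution, not high attention
shared = sorted(attn_top & attrib_top)        # high in both

print(only_attn, only_attrib, shared)
```

When both sets have the same size, the two difference sets are necessarily the same size as well.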


Grab the tokens stored in the `all_tokens` dictionary so we know which tokens we are working with, as we currently only have the indices.

In [None]:
exam_tokens = all_tokens[str(example)]

Find out which tokens have the highest attentions but not the highest attributions, and display them in a dataframe with the unmasked attentions and the attributions.

In [None]:
diff = sorted(list(attention_final_layer_top - exam_attrib_top))
print(len(diff))
diff_tokens = [exam_tokens[idx] for idx in diff]
d_diff = {"token": diff_tokens, "position":diff, "attention_norm":attention_final_layer2[diff], "attention_rank": ranks[diff], "attribution_norm":exam_attrib2[diff], "attribution_rank":ranks_attrib[diff]}
df_diff = pd.DataFrame(d_diff)
df_diff

78


Unnamed: 0,token,position,attention_norm,attribution_norm
0,<s>,0,0.079779,0.000000
1,ĠLanguage,21,0.048963,0.002372
2,ĠAI,126,0.097651,0.034024
3,Ġimage,137,0.059384,0.012456
4,Ġcaption,138,0.053254,0.001756
...,...,...,...,...
73,.,1724,0.662315,0.014613
74,.,1892,0.637964,0.025349
75,.,1909,0.612990,0.026341
76,Ġlearns,1993,0.058394,0.007427


Let's check what tokens are different and how many times they appear.

In [None]:
print(df_diff['token'].value_counts())

.               26
Ġcaption         9
Ġtraining        4
Ġadvers          3
ing              3
arial            3
Ġgeneration      2
Ġimage           2
Ġgenerated       2
Ġword            2
Ġthe             2
Ġhuman           1
ĠCaption         1
Ġchallenge       1
Ġdeep            1
truth            1
Ġrecognizing     1
Ġperformance     1
Ġvisual          1
Ġon              1
<s>              1
Ġlanguage        1
Ġmedia           1
Ġskiing          1
ĠLanguage        1
Ġlearning        1
Ġwritten         1
Ġtask            1
Ġvocabulary      1
ĠAI              1
Ġlearns          1
Name: token, dtype: int64


Find out which tokens have the highest attributions but not the highest attentions.

In [None]:
diff2 = sorted(list(exam_attrib_top - attention_final_layer_top))
print(len(diff2))
diff_tokens2 = [exam_tokens[idx] for idx in diff2]
d_diff2 = {"token": diff_tokens2, "position":diff2, "attention_norm": attention_final_layer2[diff2], "attention_rank": ranks[diff2], "attribution_norm":exam_attrib2[diff2], "attribution_rank":ranks_attrib[diff2]}
df_diff2 = pd.DataFrame(d_diff2)
df_diff2

78


Unnamed: 0,token,position,attention_norm,attribution_norm
0,ĠHuman,27,0.037913,0.039440
1,Ġhuman,144,0.042408,0.099735
2,Ġhuman,247,0.027774,0.042492
3,Ġhuman,285,0.033306,0.037411
4,Ġhuman,343,0.042498,0.125968
...,...,...,...,...
73,Ġapproaches,1805,0.016504,0.083835
74,Ġto,1806,0.015066,0.040588
75,Ġoptimize,1808,0.017454,0.048320
76,Ġmetrics,1811,0.010456,0.070791


Let's check what tokens are different and how many times they appear.

In [None]:
print(df_diff2['token'].value_counts())

Ġtraining        8
Ġhuman           7
Ġmodel           4
Ġto              4
Ġdiversity       2
Ġa               2
Ġsampling        2
Ġmethod          2
Ġmetrics         2
Ġbias            2
Ġmodels          2
Ġexposure        1
Ġscheme          1
Ġwhich           1
Ġoptimize        1
Ġbegins          1
Ġalgorithm       1
Ġoptimal         1
Ġdata            1
Ġdistribution    1
Ġhave            1
Ġfollowed        1
Ġstandard        1
Ġmaximum         1
Ġapproaches      1
Ġbased           1
ĠSeveral         1
Ġthis            1
Ġother           1
Ġscore           1
ĠHuman           1
Ġduring          1
Ġdifferent       1
Ġeasy            1
Ġfor             1
Ġstatistics      1
Ġstatistical     1
.                1
Ġ[               1
Ġattempts        1
Ġthe             1
Ġloss            1
Ġonly            1
Ġas              1
Ġafter           1
Ġhas             1
Ġsentences       1
-                1
Ġstudied         1
].               1
Ġword            1
Ġdescription     1
Name: token,

Find out which tokens are part of the highest attentions and highest attributions.

In [None]:
same = sorted(list(attention_final_layer_top & exam_attrib_top))
print(len(same))
same_tokens = [exam_tokens[idx] for idx in same]
d_same = {"token": same_tokens, "position":same, "attention_norm": attention_final_layer2[same], "attention_rank": ranks[same], "attribution_norm":exam_attrib2[same], "attribution_rank":ranks_attrib[same]}
df_same = pd.DataFrame(d_same)
df_same

25


Unnamed: 0,token,position,attention_norm,attribution_norm
0,Ġhumans,181,0.050926,0.035158
1,.,214,0.989555,0.049528
2,Ġtraining,266,0.077135,0.063398
3,Ġhumans,751,0.053678,0.158666
4,.,1072,0.85829,0.035006
5,.,1140,0.720484,0.036191
6,.,1168,0.667895,0.044381
7,Ġlanguage,1334,0.076873,0.306745
8,Ġtranslation,1354,0.045904,0.038866
9,Ġlearning,1362,0.085313,0.077581


Let's check what tokens are the same and how many times they appear.

In [None]:
print(df_same['token'].value_counts())

.                 11
Ġtraining          3
Ġlearning          3
Ġhumans            2
Ġlanguage          2
Ġtranslation       1
Ġlinguistic        1
Ġlearn             1
Ġreinforcement     1
Name: token, dtype: int64


Our final measure of similarity between the attention and the attributions uses the Jaccard index, which is the size of the intersection of two sets divided by the size of their union. This gives us an idea of how many tokens in our top 5% are the same and how many are different.

In [None]:
def jaccard_similarity(set1, set2):
    intersection = len(list(set1.intersection(set2)))
    print(intersection)
    union = (len(set1) + len(set2)) - intersection
    print(union)
    return float(intersection) / union

In [None]:
jaccard_similarity(attention_final_layer_top, exam_attrib_top)

25
181


0.13812154696132597
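
As a quick check of the arithmetic, the sketch below rebuilds the reported counts with synthetic index sets: two sets of 103 positions that share 25 of them. The `jaccard_index` helper is a hypothetical variant of the function above that returns 0.0 for two empty sets, a convention chosen here to avoid dividing by zero.

```python
def jaccard_index(set_a, set_b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = len(set_a | set_b)
    if union == 0:
        return 0.0
    return len(set_a & set_b) / union

# Synthetic stand-ins: 103 positions each, overlapping on 78..102.
a = set(range(0, 103))
b = set(range(78, 181))
print(len(a & b), len(a | b), jaccard_index(a, b))
```

The union is 103 + 103 - 25 = 181, so the index is 25 / 181, roughly 0.138, matching the value printed above.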

### Removing Non-Alphanumeric Tokens

Here we run through the 95th percentile again, but we first mask all the non-alphanumeric tokens before we obtain our top 103 (top 5%) tokens.

In [None]:
attention_final_layer6 = np.copy(attention_final_layer)
attention_final_layer6 = normalize(attention_final_layer6)

attention_all_layer6 = np.copy(attention_all_layer) 
attention_all_layer6 = normalize(attention_all_layer6)

exam_attrib6 = np.abs(exam_attrib)
exam_attrib6 = normalize(exam_attrib6)
print(exam_attrib6)

exam_tokens = all_tokens[str(example)]
alpha_neumeric_nums = [idx for idx, element in enumerate(exam_tokens) if element.isalnum()]

[0.00000000e+00 3.09815393e-03 2.46389383e-04 ... 4.17945981e-05
 1.87499117e-03 0.00000000e+00]


Once we have masked all the non-alphanumeric tokens, we apply the same masking as before to obtain our top 5% of tokens.

In [None]:
mask = np.ones(attention_final_layer6.shape,dtype=bool) 
mask[alpha_neumeric_nums] = False

attention_final_layer6[mask] = 0
attention_all_layer6[mask] = 0
exam_attrib6[mask] = 0
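
The filter can be tried on a handful of tokens to see what `str.isalnum()` actually keeps. The token list and scores below are hypothetical, written in the style of the tokenizer output shown in this notebook.

```python
import numpy as np

# Hypothetical tokens and scores (not taken from the example document).
tokens = ["<s>", "Ġtraining", ".", "Ġ[", "ing", "âĢ", "-", "Ġhuman"]
scores = np.array([0.08, 0.07, 0.99, 0.02, 0.05, 0.03, 0.01, 0.04])

keep = np.array([tok.isalnum() for tok in tokens])   # True where every character is alphanumeric
filtered = np.where(keep, scores, 0.0)               # zero out everything else

print([tok for tok, flag in zip(tokens, keep) if flag])
```

`str.isalnum()` is Unicode-aware, so the `Ġ` space marker counts as a letter and word tokens pass the filter. The same holds for byte-level fragments such as `âĢ`, which is consistent with such tokens still appearing in the tables that follow.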

In [None]:
top_final2 = np.percentile(attention_final_layer6, 95)
top_all2 = np.percentile(attention_all_layer6, 95)
top_attrib2 = np.percentile(exam_attrib6, 95)
print(top_attrib2)

0.030974660070814404


As before, we convert each array into a set.

In [None]:
attention_final_layer6[attention_final_layer6<top_final2] = 0
attention_all_layer6[attention_all_layer6<top_all2] = 0
exam_attrib6[exam_attrib6<top_attrib2] = 0

attention_final_layer_top2 = np.flatnonzero(attention_final_layer6)
attention_final_layer_top2 = set(attention_final_layer_top2)
print(len(attention_final_layer_top2))

attention_all_layer_top2 = np.flatnonzero(attention_all_layer6)
attention_all_layer_top2 = set(attention_all_layer_top2)
print(len(attention_all_layer_top2))

exam_attrib_top2 = np.flatnonzero(exam_attrib6)
exam_attrib_top2 = set(exam_attrib_top2)
print(len(exam_attrib_top2))

103
103
103


We once again find out which tokens have the highest attentions but not the highest attributions, and display them in a dataframe with the unmasked attentions and the attributions.

In [None]:
diff_alpha = sorted(list(attention_final_layer_top2 - exam_attrib_top2))
print(len(diff_alpha))
diff_alpha_tokens = [exam_tokens[idx] for idx in diff_alpha]
d_diff_alpha = {"token": diff_alpha_tokens, "position":diff_alpha, "attention_norm":attention_final_layer2[diff_alpha], "attribution_norm":exam_attrib2[diff_alpha]}
df_diff_alpha = pd.DataFrame(d_diff_alpha)
df_diff_alpha

81


Unnamed: 0,token,position,attention_norm,attribution_norm
0,ĠLanguage,21,0.048963,0.002372
1,arial,33,0.044238,0.000375
2,ĠTraining,34,0.042092,0.003152
3,Ġimage,137,0.059384,0.012456
4,Ġcaption,138,0.053254,0.001756
...,...,...,...,...
76,Ġgeneration,1561,0.052398,0.007117
77,Ġgeneration,1575,0.048096,0.011838
78,Ġexposure,1669,0.041588,0.025971
79,Ġtruth,1708,0.038915,0.003930


Let's check what tokens are different and how many times they appear.

In [None]:
print(df_diff_alpha['token'].value_counts())

Ġcaption            11
ing                  5
arial                5
Ġgeneration          5
Ġtraining            4
Ġadvers              4
Ġimage               4
Ġthe                 3
Ġgenerated           3
Ġvisual              3
Ġtask                2
Ġsearch              2
truth                2
Ġword                2
ĠNetworks            1
Ġtrained             1
Ġhuman               1
ĠLanguage            1
Ġdeep                1
ions                 1
Ġrecognizing         1
ĠCaption             1
Ġexposure            1
Ġtruth               1
Ġchallenge           1
Ġrepresentations     1
Ġon                  1
Ġperformance         1
Ġlanguage            1
Ġnetworks            1
Ġmedia               1
Ġsystems             1
Ġskiing              1
Ġlearning            1
Ġwritten             1
Ġgenerator           1
Ļ                    1
Ġvocabulary          1
ĠTraining            1
Ġlearns              1
Name: token, dtype: int64


We once again find out which tokens have the highest attributions but not the highest attentions, and display them in a dataframe with the unmasked attentions and the attributions.

In [None]:
diff_alpha2 = sorted(list(exam_attrib_top2- attention_final_layer_top2))
print(len(diff_alpha2))
diff_alpha_tokens2 = [exam_tokens[idx] for idx in diff_alpha2]
d_diff_alpha2 = {"token": diff_alpha_tokens2, "position":diff_alpha2, "attention_norm":attention_final_layer2[diff_alpha2], "attribution_norm":exam_attrib2[diff_alpha2]}
df_diff_alpha2 = pd.DataFrame(d_diff_alpha2)
df_diff_alpha2

81


Unnamed: 0,token,position,attention_norm,attribution_norm
0,ĠHuman,27,0.037913,0.039440
1,Ġhuman,247,0.027774,0.042492
2,Ġhuman,285,0.033306,0.037411
3,âĢ,521,0.020501,0.033046
4,Ġhuman,594,0.037367,0.066732
...,...,...,...,...
76,Ġapproaches,1805,0.016504,0.083835
77,Ġto,1806,0.015066,0.040588
78,Ġoptimize,1808,0.017454,0.048320
79,Ġmetrics,1811,0.010456,0.070791


Let's check what tokens are different and how many times they appear.

In [None]:
print(df_diff_alpha2['token'].value_counts())

Ġtraining        6
Ġhuman           5
Ġmodel           5
Ġto              4
Ġdiversity       3
Ġmodels          3
Ġmetrics         2
Ġsampling        2
Ġmethod          2
Ġthe             2
Ġa               2
Ġbias            2
Ġsentences       2
Ġgradually       1
Ġscheme          1
Ġwhich           1
Ġbegins          1
Ġoptimize        1
Ġalgorithm       1
Ġoptimal         1
Ġdata            1
Ġdistribution    1
Ġhave            1
Ġstandard        1
Ġmaximum         1
ĠSeveral         1
Ġother           1
Ġapproaches      1
Ġbased           1
Ġthis            1
Ġfollowed        1
Ġscore           1
ĠHuman           1
Ġduring          1
Ġas              1
âĢ               1
Ġeasy            1
Ġfor             1
Ġstatistics      1
Ġstatistical     1
Ġattempts        1
Ġdifferent       1
Ġloss            1
Ġafter           1
Ġonly            1
Ġhas             1
Ġchallenge       1
Ġin              1
Ġstudied         1
Ġdescription     1
Ġpredict         1
Ġtest            1
Ġpredicted  

Lastly, we find out which tokens are part of the highest attentions and highest attributions.

In [None]:
same_alpha = sorted(list(exam_attrib_top2 & attention_final_layer_top2))
print(len(same_alpha))
same_alpha_tokens = [exam_tokens[idx] for idx in same_alpha]
d_same_alpha = {"token": same_alpha_tokens, "position":same_alpha, "attention_norm":attention_final_layer2[same_alpha], "attribution_norm":exam_attrib2[same_alpha]}
df_same_alpha = pd.DataFrame(d_same_alpha)
df_same_alpha

22


Unnamed: 0,token,position,attention_norm,attribution_norm
0,ĠAI,126,0.097651,0.034024
1,Ġhuman,144,0.042408,0.099735
2,Ġhumans,181,0.050926,0.035158
3,Ġtraining,223,0.06872,0.033059
4,Ġtraining,266,0.077135,0.063398
5,Ġhuman,343,0.042498,0.125968
6,Ġhumans,751,0.053678,0.158666
7,Ġlanguage,1334,0.076873,0.306745
8,Ġtranslation,1354,0.045904,0.038866
9,Ġlearning,1362,0.085313,0.077581


Check what tokens are the same and how many times they appear.

In [None]:
print(df_same_alpha['token'].value_counts())

Ġtraining         6
Ġlearning         3
Ġhuman            2
Ġhumans           2
Ġlanguage         2
ĠAI               1
Ġtranslation      1
Ġdialogue         1
Ġlinguistic       1
Ġlearn            1
Ġexposure         1
Ġreinforcement    1
Name: token, dtype: int64


Finally, we compute the Jaccard index to identify how similar our groups of top attributions and attentions are.

In [None]:
jaccard_similarity(attention_final_layer_top2, exam_attrib_top2)

22
184


0.11956521739130435