This notebook explores the relation between the model's attributions and attentions for a given example. Historically, we found that attentions are not a feasible method of explanation whereas attributions are, but attributions are also not part of a model's traditional outputs. Therefore it may be interesting to see if we can find anything with attentions by comparing them to a feasible and plausible method of explanation.

In [1]:
# from google.colab import drive
# drive.mount('/content/drive')

## Import dependencies

In [2]:
# pip install transformers --quiet

In [3]:
# pip install captum --quiet

In [4]:
# pip install datasets --quiet

In [5]:
pip install rbo

Note: you may need to restart the kernel to use updated packages.


In [6]:
import os

In [7]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [8]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

In [9]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# replace <PATH-TO-SAVED-MODEL> with the real path of the saved model
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

Create functions that give us the input ids and the position ids for the text we want to examine

In [10]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [11]:
ref_token_id = tokenizer.pad_token_id # A token used for generating token reference
sep_token_id = tokenizer.sep_token_id # A token used as a separator between question and text and it is also added to the end of the text.
cls_token_id = tokenizer.cls_token_id # A token used for prepending to the concatenated question-text word sequence

In [12]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)

Import dataset and take a few examples from it for testing purposes

Here we import the papers dataset

In [13]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]

Using custom data configuration danielhou13--cogs402dataset-3b57e27666917d08
Reusing dataset parquet (C:\Users\danie\.cache\huggingface\datasets\parquet\danielhou13--cogs402dataset-3b57e27666917d08\0.0.0\0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/2 [00:00<?, ?it/s]

Here we import the news dataset

In [14]:
# cogs402_ds2 = load_dataset('hyperpartisan_news_detection', 'bypublisher')['validation']
# val_size = 5000
# val_indices = np.random.randint(0, len(cogs402_ds2), val_size)
# val_ds = cogs402_ds2.select(val_indices)
# labels2 = map(int, val_ds['hyperpartisan'])
# labels2 = list(labels2)
# val_ds = val_ds.add_column("labels", labels2)

In [15]:
#set 1 if we are dealing with a positive class, and 0 if dealing with negative class
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)

Perform Layer Integrated Gradients using the longformer's embeddings

In [16]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    return attributions

In [17]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)

This function will let us get the example and the baseline inputs in order to perform integrated gradients, and add the attributions to our visualization tool. Additionally, we will add the attributions and tokens for each example into an array so we can use them when we want to further example the attributions scores for each example

In [18]:
all_attributions = {}
all_tokens = {}

In [19]:
all_attributions = torch.load('example_attrib_dict.pt')

In [20]:
example = 891
text = cogs402_ds['text'][example]
label = cogs402_ds['labels'][example]

input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
attention_mask = construct_attention_mask(input_ids)

indices = input_ids[0].detach().tolist()
all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)

all_tokens[str(example)] = all_tokens_curr

In [21]:
# attributions, delta = lig.attribute(inputs=input_ids,
#                                   baselines=ref_input_ids,
#                                   return_convergence_delta=True,
#                                   additional_forward_args=(position_ids, attention_mask),
#                                   target=1,
#                                   n_steps=1500,
#                                   internal_batch_size = 2)

# attributions_sum = summarize_attributions(attributions)

# all_attributions[str(example)] = attributions_sum.detach().cpu().numpy()

In [22]:
# torch.save(all_attributions, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

In [23]:
# all_attributions = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

In [24]:
print(all_attributions)

{'605': array([ 0.00000000e+00, -3.40173878e-06, -1.12807776e-06, ...,
        4.23677990e-06,  7.29073352e-06,  0.00000000e+00]), '976': array([ 0.00000000e+00, -1.84040401e-04,  1.46363292e-05, ...,
       -2.48273481e-06,  1.11380563e-04,  0.00000000e+00]), '148': array([ 0.        ,  0.00225466,  0.00335924, ...,  0.00014387,
       -0.000275  ,  0.        ]), '891': array([ 0.00000000e+00, -1.05098504e-03, -2.83239306e-03, ...,
        1.52405500e-06,  9.14739375e-05,  0.00000000e+00])}


We then get the attentions and global attentions so we can compare with the attributions

In [25]:
output = model(input_ids.cuda(), attention_mask=attention_mask.cuda(), labels=torch.tensor(label).cuda(), output_attentions = True)
batch_attn = output[-2]
output_attentions = torch.stack(batch_attn).cpu()
global_attention = output[-1]
output_global_attentions = torch.stack(global_attention).cpu()
print("output_attention.shape", output_attentions.shape)
print("gl_output_attention.shape", output_global_attentions.shape)

output_attention.shape torch.Size([12, 1, 12, 2048, 514])
gl_output_attention.shape torch.Size([12, 1, 12, 2048, 1])


Since the longformer has a unique attention matrix shape, we convert it into the required sequence length x sequence length matrix

In [26]:
def create_head_matrix(output_attentions, global_attentions):
    new_attention_matrix = torch.zeros((output_attentions.shape[0], 
                                      output_attentions.shape[0]))
    for i in range(output_attentions.shape[0]):
        test_non_zeroes = torch.nonzero(output_attentions[i]).squeeze()
        test2 = output_attentions[i][test_non_zeroes[1:]]
        new_attention_matrix_indices = test_non_zeroes[1:]-257 + i
        new_attention_matrix[i][new_attention_matrix_indices] = test2
        new_attention_matrix[i][0] = output_attentions[i][0]
        new_attention_matrix[0] = global_attentions.squeeze()[:output_attentions.shape[0]]
    return new_attention_matrix


def attentions_all_heads(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = create_head_matrix(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_batches(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = attentions_all_heads(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_layers(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = all_batches(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

In [27]:
converted_mat = all_layers(output_attentions, output_global_attentions).detach().cpu().numpy()
print(converted_mat.shape)

(12, 1, 12, 2048, 2048)


We scale the attention matrix by head importance

In [28]:
head_importance = torch.load("T3-vis/papers/head_importance.pt")
# head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/news/head_importance.pt")

In [29]:
def scale_by_importance(attention_matrix, head_importance):
  new_matrix = np.zeros_like(attention_matrix)
  for i in range(attention_matrix.shape[0]):
    head_importance_layer = head_importance[i]
    for j in range(attention_matrix.shape[1]):
      new_matrix[i,j] = attention_matrix[i,j] * np.expand_dims(head_importance_layer, axis=(1,2))
  return new_matrix

In [30]:
converted_mat_importance = scale_by_importance(converted_mat, head_importance)

We get the attentions for each token. The shape of the attention matrix is layer x batch x head x seq_len x seq_len.

In [31]:
attention_matrix_importance = converted_mat_importance.sum(axis=3)
print(attention_matrix_importance.shape)

(12, 1, 12, 2048)


Sum the attentions for the last layer and over all layers

In [32]:
attention_final_layer = attention_matrix_importance[11].squeeze().sum(axis=0)
attention_all_layer = attention_matrix_importance.squeeze().sum(axis=1)
attention_all_layer = attention_all_layer.sum(axis=0)
print(attention_all_layer.shape)

(2048,)


Grab the attributions we stored

In [33]:
exam_attrib = all_attributions[str(example)]

Since we have the attributions and the attentions, we want to see how the largest attributions (in terms of magnitude) compares to the highest attentions.

Cosine similarity using the raw attributions and attentions

In [34]:
from numpy.linalg import norm
cosine_raw = np.dot(exam_attrib, attention_final_layer) / (norm(exam_attrib)*norm(attention_final_layer))
print("Layer 12 Cosine Similarity raw attrib:\n", cosine_raw)
cosine_all_raw = np.dot(exam_attrib, attention_all_layer) / (norm(exam_attrib)*norm(attention_all_layer))
print("Layer 12 Cosine Similarity raw attrib:\n", cosine_all_raw)

Layer 12 Cosine Similarity raw attrib:
 0.02248637021832213
Layer 12 Cosine Similarity raw attrib:
 -0.012734362617167585


The attributions and the attentions have different ranges. The attributions could range from -1 to 1 whereas the attentions range from 0 to 1. However, negative attributions would not necessarily mean that they have the lowest attention, rather they might have really high attention as they are more likely to help the model predict the negative class, and might be something the attentions picked up as a feature.

In [35]:
def normalize(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

In [36]:
attention_final_layer2 = normalize(attention_final_layer)
attention_all_layer2 = normalize(attention_all_layer)

In [37]:
exam_attrib2 = np.abs(exam_attrib)
exam_attrib2 = normalize(exam_attrib2)

In [38]:
print(exam_attrib2)

[0.00000000e+00 4.56952825e-03 1.23148281e-02 ... 6.62636680e-06
 3.97715217e-04 0.00000000e+00]


Calculate cosine simularity using normalized attentions and attributions

In [39]:
cosine = np.dot(exam_attrib2, attention_final_layer2) / (norm(exam_attrib2)*norm(attention_final_layer2))
print("Layer 12 Cosine Similarity:\n", cosine)
cosine2 = np.dot(exam_attrib2, attention_all_layer2) / (norm(exam_attrib2)*norm(attention_all_layer2))
print("All layer Cosine Similarity:\n", cosine2)

Layer 12 Cosine Similarity:
 0.17583135298631875
All layer Cosine Similarity:
 0.10448268465151779


Cosine similarity while setting all the attention and attribution values below the median to 0

In [40]:
exam_attrib3 = np.abs(exam_attrib)
exam_attrib3 = normalize(exam_attrib3)
median_exam = np.percentile(exam_attrib3, 50)
exam_attrib3[exam_attrib3 < median_exam] = 0

In [41]:
attention_final_layer3 = np.copy(attention_final_layer)
attention_final_layer3 = normalize(attention_final_layer3)
median_12 = np.percentile(attention_final_layer3, 50)
attention_final_layer3[attention_final_layer3 < median_12] = 0

attention_all_layer3 = np.copy(attention_all_layer) 
attention_all_layer3 = normalize(attention_all_layer3)
median_all = np.percentile(attention_all_layer3, 50)
attention_all_layer3[attention_all_layer3 < median_all] = 0

In [42]:
cosine_med = np.dot(exam_attrib3, attention_final_layer3) / (norm(exam_attrib3)*norm(attention_final_layer3))
print("Layer 12 Cosine Similarity med:\n", cosine_med)
cosine_med2 = np.dot(exam_attrib3, attention_all_layer3) / (norm(exam_attrib3)*norm(attention_all_layer3))
print("All layer Cosine Similarity med:\n", cosine_med2)

Layer 12 Cosine Similarity med:
 0.17376752101559828
All layer Cosine Similarity med:
 0.10084104034199037


Cosine similarity while setting all the attention and attribution values below the mean to 0

In [43]:
exam_attrib4 = np.abs(exam_attrib)
exam_attrib4 = normalize(exam_attrib4)
mean_exam = np.mean(exam_attrib4)
exam_attrib4[exam_attrib4 < mean_exam] = 0

In [44]:
attention_final_layer4 = np.copy(attention_final_layer)
attention_final_layer4 = normalize(attention_final_layer4)
mean_12 = np.mean(attention_final_layer4)
attention_final_layer4[attention_final_layer4 < mean_12] = 0

attention_all_layer4 = np.copy(attention_all_layer) 
attention_all_layer4 = normalize(attention_all_layer4)
mean_all = np.mean(attention_all_layer4)
attention_all_layer4[attention_all_layer4 < mean_all] = 0

In [45]:
cosine_mean = np.dot(exam_attrib4, attention_final_layer4) / (norm(exam_attrib4)*norm(attention_final_layer4))
print("Layer 12 Cosine Similarity mean:\n", cosine_mean)
cosine_mean2 = np.dot(exam_attrib4, attention_all_layer4) / (norm(exam_attrib4)*norm(attention_all_layer4))
print("All layer Cosine Similarity mean:\n", cosine_mean2)

Layer 12 Cosine Similarity mean:
 0.15724215783372744
All layer Cosine Similarity mean:
 0.08985276883412399


Cosine Similarity using the ranks of each token

In [46]:
exam_attrib_rank = np.abs(exam_attrib)
order_attrib = exam_attrib_rank.argsort()
print(order_attrib)
ranks_attrib = order_attrib.argsort()+1

[   0 2047 2037 ...  695   32  720]


In [47]:
attention_final_layer_rank = np.copy(attention_final_layer)
order = attention_final_layer_rank.argsort()
print(order)
ranks = order.argsort()+1
print(ranks)

attention_all_layer_rank = np.copy(attention_all_layer)
order2 = attention_all_layer_rank.argsort()
ranks2 = order2.argsort()+1

[1662 1922 2014 ... 1864 1823 1803]
[1528  239  308 ...  360  187 1114]


In [48]:
cosine_rank = np.dot(ranks_attrib, ranks) / (norm(ranks_attrib)*norm(ranks))
print("Layer 12 Cosine Similarity rank:\n", cosine_rank)
cosine_rank2 = np.dot(ranks_attrib, ranks2) / (norm(ranks_attrib)*norm(ranks2))
print("All layer Cosine Similarity rank:\n", cosine_rank2)

Layer 12 Cosine Similarity rank:
 0.8430761265725671
All layer Cosine Similarity rank:
 0.7993863440837687


Kendall Tau metric

In [49]:
import scipy.stats as stats
tau, p_value = stats.kendalltau(ranks_attrib, ranks)
print("Tau statistic layer 12:", tau, "p value", p_value)
tau, p_value = stats.kendalltau(ranks_attrib, ranks2)
print("Tau statistic: all layers", tau, "p value", p_value)

Tau statistic layer 12: 0.25441099016853935 p value 1.0218679901904437e-66
Tau statistic: all layers 0.13212456491206645 p value 3.2100350471997357e-19


In [50]:
import rbo
print("rbo layer 12", rbo.RankingSimilarity(order_attrib, order).rbo())
print("rbo all", rbo.RankingSimilarity(order_attrib, order2).rbo())

rbo layer 12 0.6031866800272659
rbo all 0.5311173049407113


The cosine similarity using only the last layer of attentions

In [51]:
d = {'example': [example], 'similarity normalized': [cosine], 'similarity raw': [cosine_raw], 'sim_norm w/ median threshold':[cosine_med], 'sim_norm w/ mean threshold':[cosine_mean], "sim w/ ranks":cosine_rank}
df = pd.DataFrame(data=d)
df

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,891,0.175831,0.022486,0.173768,0.157242,0.843076


The cosine similarity using all layers

In [52]:
d2 = {'example': [example], 'similarity normalized': [cosine2], 'similarity raw': [cosine_all_raw], 'sim_norm w/ median threshold':[cosine_med2], 'sim_norm w/ mean threshold':[cosine_mean2], "sim w/ ranks":cosine_rank2}
df2 = pd.DataFrame(data=d2)
df2

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,891,0.104483,-0.012734,0.100841,0.089853,0.799386


In [53]:
df_layer12 = pd.read_csv("cos_sim_layer12.csv")
df_all = pd.read_csv("cos_sim_all.csv")

In [54]:
df_layer12

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,148,0.147101,-0.02026,0.14267,0.123295,0.826202
1,605,0.299375,-0.165507,0.288221,0.271128,0.802063
2,891,0.175846,0.022566,0.173801,0.157258,0.842796
3,976,0.141923,0.082403,0.137433,0.119868,0.823112


In [55]:
df_all

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,148,0.140745,0.016048,0.134218,0.123968,0.805108
1,605,0.249735,-0.219944,0.233159,0.219811,0.784505
2,891,0.104379,-0.012725,0.100741,0.089762,0.799092
3,976,0.11527,0.096755,0.109309,0.097665,0.819613


Append the new row into the dataframe.

Comment out if revisiting a dataframe.

In [56]:
# df_layer12 = pd.concat([df, df_layer12], axis=0)
# df_all = pd.concat([df2, df_all], axis=0)

In [57]:
df_layer12

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,148,0.147101,-0.02026,0.14267,0.123295,0.826202
1,605,0.299375,-0.165507,0.288221,0.271128,0.802063
2,891,0.175846,0.022566,0.173801,0.157258,0.842796
3,976,0.141923,0.082403,0.137433,0.119868,0.823112


In [58]:
df_all

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,148,0.140745,0.016048,0.134218,0.123968,0.805108
1,605,0.249735,-0.219944,0.233159,0.219811,0.784505
2,891,0.104379,-0.012725,0.100741,0.089762,0.799092
3,976,0.11527,0.096755,0.109309,0.097665,0.819613


In [59]:
df_layer12 = df_layer12.sort_values(by=['example'])
df_all = df_all.sort_values(by=['example'])

Save the dataframe

In [60]:
# df_layer12.to_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_layer12.csv", index=False)
# df_all.to_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_all.csv", index=False)

We know from the cosine similarities that it does not seem like the attribtions and the attentions are very similar; however, we can find out if there are similarities in the tokens in the highest percentiles.


In [61]:
attention_final_layer5 = np.copy(attention_final_layer)
attention_final_layer5 = normalize(attention_final_layer5)

attention_all_layer5 = np.copy(attention_all_layer) 
attention_all_layer5 = normalize(attention_all_layer5)

exam_attrib5 = np.abs(exam_attrib)
exam_attrib5 = normalize(exam_attrib5)
print(exam_attrib5)

[0.00000000e+00 4.56952825e-03 1.23148281e-02 ... 6.62636680e-06
 3.97715217e-04 0.00000000e+00]


In [62]:
top_final = np.percentile(attention_final_layer5, 95)
top_all = np.percentile(attention_final_layer5, 95)
top_attrib = np.percentile(exam_attrib5, 95)
print(top_attrib)

0.048841727922329804


In [63]:
attention_final_layer5[attention_final_layer5<top_final] = 0
attention_all_layer5[attention_all_layer5<top_all] = 0
exam_attrib5[exam_attrib5<top_attrib] = 0

In [64]:
print(exam_attrib5)

[0. 0. 0. ... 0. 0. 0.]


In [65]:
cosine_thresh = np.dot(exam_attrib5, attention_final_layer5) / (norm(exam_attrib5)*norm(attention_final_layer5))
print("Layer 12 Cosine Similarity 95th:\n", cosine_thresh)
cosine_thresh2 = np.dot(exam_attrib5, attention_all_layer5) / (norm(exam_attrib5)*norm(attention_all_layer5))
print("All layer Cosine Similarity 95th:\n", cosine_thresh2)

Layer 12 Cosine Similarity 95th:
 0.13128457143072728
All layer Cosine Similarity 95th:
 0.02065351602442174


In [66]:
num = 2048 - np.ceil(2048 * 0.95)
print(num)

exam_attrib_rank2 = np.copy(ranks_attrib)
exam_attrib_rank2[exam_attrib_rank2 > num] = 0
print(exam_attrib_rank2)

attention_final_layer_rank2 = np.copy(ranks)
attention_final_layer_rank2[attention_final_layer_rank2 > num] = 0
print(attention_final_layer_rank2)

attention_all_layer_rank2 = np.copy(ranks2)
attention_all_layer_rank2[attention_all_layer_rank2 > num] = 0

102.0
[ 1  0  0 ... 18  0  2]
[0 0 0 ... 0 0 0]


In [67]:
cosine_rank_top = np.dot(exam_attrib_rank2, attention_final_layer_rank2) / (norm(exam_attrib_rank2)*norm(attention_final_layer_rank2))
print("Layer 12 Cosine Similarity 95th ranks:\n", cosine_rank_top)
cosine_rank_top2 = np.dot(exam_attrib_rank2, attention_all_layer_rank2) / (norm(exam_attrib_rank2)*norm(attention_all_layer_rank2))
print("All layer Cosine Similarity 95th ranks:\n", cosine_rank_top2)

Layer 12 Cosine Similarity 95th ranks:
 0.10306584390801075
All layer Cosine Similarity 95th ranks:
 0.021016561964591663


In [68]:
exam_attrib_order2 = np.copy(order_attrib)
print(exam_attrib_order2[:int(num)])

attention_final_layer_order2 = np.copy(order)
print(attention_final_layer_order2[:int(num)])

attention_all_layer_order2 = np.copy(order2)
print(attention_all_layer_order2[:int(num)])

[   0 2047 2037 1621 1683  364 1972 1881 1819  491 1254 1934 1604 1612
 1813 1780 1952 2045 1922 1866 1987 1687 1983 1960 1591 1849 1933  935
 2026 1893 1153 1706 1835 2022 1867 1262 1921 1349 1841 1899 1929 1947
 2042 1949 1195 1677 1885 1280 1887 1831 1644 1930 1828 1984 1938 1185
  913 1889 2009 1918 2023 1134 1969 1903 1985 1964 1965 1978 1940 1839
 1998 2003 1994 1948 2020 1966 2014 1645 1371 1606 1907 1757 1726 1331
 2033 1927 1848 1812 2007 1882 1924 1698 1454 2010 1412 1665 1720 1981
 1992 1467  930 1678]
[1662 1922 2014 1858 1927 1764  592 1615 1960 1796 1528  977 1868   12
  118 1586 1986 1999 1014 1751 1089 1601 1985   50 1598 1641 1862 1987
 2022 1509 1669   61 1446 2033 1596   57 1663 1774 1141  609 1982  121
  362   52 1088 2008 1494 2031 1066 1854  942    5 1594 2036 1436    8
 1715 2017 1605 1116 1856 1608 1950  165 2035 1595   11  883 2037 1938
 1936 1980 1770   22 1496 2006 1454 1119    6 1988 1016 1602 1819 1590
 2010 1024 1944   35 1506 1756 1903 1931  831  745 1616

In [69]:
print("rbo layer 12 95th", rbo.RankingSimilarity(exam_attrib_order2[:int(num)], attention_final_layer_order2[:int(num)]).rbo())
print("rbo all 95th", rbo.RankingSimilarity(exam_attrib_order2[:int(num)], attention_all_layer_order2[:int(num)]).rbo())

rbo layer 12 95th 0.07937527536789793
rbo all 95th 0.04187165721843579


In [70]:
attention_final_layer_top = np.flatnonzero(attention_final_layer5)
attention_final_layer_top = set(attention_final_layer_top)

attention_all_layer_top = np.flatnonzero(attention_all_layer5)
attention_all_layer_top = set(attention_all_layer_top)

exam_attrib_top = np.flatnonzero(exam_attrib5)
exam_attrib_top = set(exam_attrib_top)

Grab the tokens stored in the all tokens dictionary so we can know which tokens we are working with as we currently only have the indices.

In [71]:
exam_tokens = all_tokens[str(example)]

Find out which tokens have the highest attentions but not the highest attributions

In [72]:
diff = sorted(list(attention_final_layer_top - exam_attrib_top))
print(len(diff))
diff_tokens = [exam_tokens[idx] for idx in diff]
d_diff = {"token": diff_tokens, "position":diff, "attention_norm":attention_final_layer5[diff], "attribution_norm":exam_attrib5[diff]}
df_diff = pd.DataFrame(d_diff)
df_diff

73


Unnamed: 0,token,position,attention_norm,attribution_norm
0,Ġimperative,228,0.069075,0.0
1,Ġmulti,266,0.034261,0.0
2,aint,313,0.040501,0.0
3,.,413,0.536035,0.0
4,Ġresource,429,0.034572,0.0
...,...,...,...,...
68,.,1803,1.000000,0.0
69,.,1823,0.976264,0.0
70,.,1864,0.928553,0.0
71,.,1910,0.861682,0.0


In [73]:
print(df_diff['token'].value_counts())

.                 21
Ġimperative        2
Ġdistributed       2
Ġmulti             2
OP                 2
Ġcoordination      2
Ġresource          2
Ġpassing           1
Ġmessage           1
Ġutilities         1
ĠLogic             1
Ġimplicit          1
Ġfunctions         1
Ġoptimized         1
Ġsol               1
Ġpropag            1
Ġconsistency       1
Ġmodel             1
Ġsolve             1
Ġlanguage          1
Ġof                1
agent              1
Ġallocation        1
lf                 1
Ġprotocols         1
Ġinput             1
Ġinformation       1
Ġexpressive        1
pling              1
aint               1
Ġscheduling        1
Ġmeetings          1
Ġa                 1
Ġnetwork           1
Ġevacuation        1
Ġpower             1
Ġdistribution      1
Ġcoalition         1
Ġlogistics         1
Ġoperations        1
-                  1
Ļ                  1
T                  1
Ġresolution        1
Ġdecentralized     1
ference            1
ĠProgramming       1
Name: token, 

Find out which tokens have the highest attributions but not the highest attentions

In [74]:
diff2 = sorted(list(exam_attrib_top - attention_final_layer_top))
print(len(diff))
diff_tokens2 = [exam_tokens[idx] for idx in diff2]
d_diff2 = {"token": diff_tokens2, "position":diff2, "attention_norm": attention_final_layer5[diff2], "attribution_norm":exam_attrib5[diff2]}
df_diff2 = pd.DataFrame(d_diff2)
df_diff2

73


Unnamed: 0,token,position,attention_norm,attribution_norm
0,Ġ2017,20,0.0,0.064682
1,ĠUnder,22,0.0,0.063502
2,Ġpublication,25,0.0,0.121091
3,ĠTheory,27,0.0,0.095636
4,ĠPractice,29,0.0,0.198533
...,...,...,...,...
68,Ġdecl,1025,0.0,0.066455
69,ar,1026,0.0,0.062032
70,Ġalgorithm,1099,0.0,0.134293
71,ization,1537,0.0,0.050738


In [75]:
print(df_diff2['token'].value_counts())

Ġalgorithms       6
ĠLogic            4
Ġof               3
ĠProgramming      3
ĠProblems         3
Ġalgorithm        3
Ġin               2
Ġ                 2
ĠOptim            2
Ġ2017             1
Ġproofs           1
Ġresults          1
Ġexperimental     1
Ġnetworks         1
Ġresearchers      1
Ġscenarios        1
ĠResearchers      1
Ġagents           1
Ġfield            1
ĠDC               1
Ġcontinue         1
Ġdevelop          1
Ġmajority         1
-                 1
based             1
Ġthe              1
Ġcommands         1
Ġdecl             1
ar                1
T                 1
Ġlimitations      1
Ġconsideration    1
ĠAnswer           1
Ġpublication      1
ĠTheory           1
ĠPractice         1
Ġ1                1
ĠTie              1
ĠComputer         1
ĠDepartment       1
edu               1
Ġsubmitted        1
ĠSet              1
1                 1
ributed           1
Ġnovel            1
Ġcontributions    1
Ġprograms         1
;                 1
Ġcounterpart      1


Find out which tokens are part of the highest attentions and highest attributions.

In [76]:
same = sorted(list(attention_final_layer_top & exam_attrib_top))
print(len(same))
same_tokens = [exam_tokens[idx] for idx in same]
d_same = {"token": same_tokens, "position":same, "attention_norm": attention_final_layer5[same], "attribution_norm":exam_attrib5[same]}
df_same = pd.DataFrame(d_same)
df_same

30


Unnamed: 0,token,position,attention_norm,attribution_norm
0,.,110,0.551079,0.185865
1,).,155,0.627575,0.100035
2,Ġlogic,177,0.042015,0.064856
3,Ġlogic,199,0.039247,0.086852
4,Ġprogramming,200,0.057668,0.331142
5,Ġprogramming,229,0.060675,0.388581
6,Ġmemory,247,0.061827,0.076359
7,agent,268,0.053241,0.05931
8,.,276,0.663305,0.231863
9,ĠProgramming,286,0.05231,0.401087


In [77]:
print(df_same['token'].value_counts())

.                 5
Ġprogramming      5
ĠProgramming      3
Ġlogic            2
Ġalgorithm        1
Ġsearch           1
Ġvalue            1
ĠSearch           1
Ġalgorithms       1
Ġsolving          1
Ġsophisticated    1
Ġdisaster         1
ĠASP              1
).                1
Ġsensors          1
Ġtargets          1
agent             1
Ġmemory           1
Ġimperative       1
Name: token, dtype: int64


In [78]:
def jaccard_similarity(set1, set2):
    intersection = len(list(set1.intersection(set2)))
    print(intersection)
    union = (len(set1) + len(set2)) - intersection
    print(union)
    return float(intersection) / union

In [79]:
jaccard_similarity(attention_final_layer_top, exam_attrib_top)

30
176


0.17045454545454544