<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/Attention_attribution_cosine_sim.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook explores the relation between the model's attributions and attentions for a given example. Historically, we found that attentions are not a feasible method of explanation whereas attributions are, but attributions are also not part of a model's traditional outputs. Therefore it may be interesting to see if we can find anything with attentions by comparing them to a feasible and plausible method of explanation.

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import dependencies

In [6]:
pip install transformers --quiet

In [7]:
pip install captum --quiet

[?25l[K     |▎                               | 10 kB 29.2 MB/s eta 0:00:01[K     |▌                               | 20 kB 35.2 MB/s eta 0:00:01[K     |▊                               | 30 kB 26.8 MB/s eta 0:00:01[K     |█                               | 40 kB 9.3 MB/s eta 0:00:01[K     |█▏                              | 51 kB 7.3 MB/s eta 0:00:01[K     |█▍                              | 61 kB 8.7 MB/s eta 0:00:01[K     |█▋                              | 71 kB 9.4 MB/s eta 0:00:01[K     |█▉                              | 81 kB 10.2 MB/s eta 0:00:01[K     |██                              | 92 kB 11.3 MB/s eta 0:00:01[K     |██▎                             | 102 kB 9.9 MB/s eta 0:00:01[K     |██▌                             | 112 kB 9.9 MB/s eta 0:00:01[K     |██▊                             | 122 kB 9.9 MB/s eta 0:00:01[K     |███                             | 133 kB 9.9 MB/s eta 0:00:01[K     |███▏                            | 143 kB 9.9 MB/s eta 0:00:01[K

In [8]:
pip install datasets --quiet

[K     |████████████████████████████████| 362 kB 8.1 MB/s 
[K     |████████████████████████████████| 212 kB 77.9 MB/s 
[K     |████████████████████████████████| 140 kB 77.7 MB/s 
[K     |████████████████████████████████| 1.1 MB 55.1 MB/s 
[K     |████████████████████████████████| 127 kB 62.8 MB/s 
[K     |████████████████████████████████| 94 kB 3.8 MB/s 
[K     |████████████████████████████████| 144 kB 71.3 MB/s 
[K     |████████████████████████████████| 271 kB 54.3 MB/s 
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.[0m
[?25h

In [9]:
pip install rbo --quiet

In [10]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [11]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [12]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

In [13]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# replace <PATH-TO-SAVED-MODEL> with the real path of the saved model
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

Downloading:   0%|          | 0.00/0.99k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/567M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/878k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/694 [00:00<?, ?B/s]

Create functions that give us the input ids and the position ids for the text we want to examine

In [14]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [15]:
ref_token_id = tokenizer.pad_token_id # A token used for generating token reference
sep_token_id = tokenizer.sep_token_id # A token used as a separator between question and text and it is also added to the end of the text.
cls_token_id = tokenizer.cls_token_id # A token used for prepending to the concatenated question-text word sequence

In [164]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 

    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)

Import dataset and take a few examples from it for testing purposes

Here we import the papers dataset

In [165]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]

Using custom data configuration danielhou13--cogs402dataset-144b958ac1a53abb
Reusing dataset parquet (/root/.cache/huggingface/datasets/danielhou13___parquet/danielhou13--cogs402dataset-144b958ac1a53abb/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8)


  0%|          | 0/2 [00:00<?, ?it/s]

Here we import the news dataset

In [129]:
# cogs402_ds2 = load_dataset('hyperpartisan_news_detection', 'bypublisher')['validation']
# val_size = 5000
# val_indices = np.random.randint(0, len(cogs402_ds2), val_size)
# val_ds = cogs402_ds2.select(val_indices)
# labels2 = map(int, val_ds['hyperpartisan'])
# labels2 = list(labels2)
# val_ds = val_ds.add_column("labels", labels2)

In [130]:
#set 1 if we are dealing with a positive class, and 0 if dealing with negative class
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)

Perform Layer Integrated Gradients using the longformer's embeddings

In [131]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    return attributions

In [132]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)

This function will let us get the example and the baseline inputs in order to perform integrated gradients, and add the attributions to our visualization tool. Additionally, we will add the attributions and tokens for each example into an array so we can use them when we want to further example the attributions scores for each example

In [133]:
all_attributions = {}
all_tokens = {}

In [23]:
all_attributions = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

In [146]:
example = 976
text = cogs402_ds['text'][example]
label = cogs402_ds['labels'][example]

In [159]:
indices = input_ids[0].detach().tolist()
text = tokenizer.convert_ids_to_tokens(indices)
print(text)

['<s>', 'Published', 'Ġas', 'Ġa', 'Ġconference', 'Ġpaper', 'Ġin', 'ĠInternational', 'ĠConference', 'Ġof', 'ĠComputer', 'ĠVision', 'Ġ(', 'IC', 'CV', ')', 'Ġ2017', 'Ġ', 'ĠSpeaking', 'Ġthe', 'ĠSame', 'ĠLanguage', ':', 'ĠMatch', 'ing', 'ĠMachine', 'Ġto', 'ĠHuman', 'ĠCapt', 'ions', 'Ġby', 'ĠAd', 'vers', 'arial', 'ĠTraining', 'ĠRak', 'sh', 'ith', 'ĠShe', 'tty', '1', 'Ġ', 'ĠMarcus', 'ĠRoh', 'r', 'bach', '2', ',', '3', 'Ġ', 'Ġar', 'X', 'iv', ':', '17', '03', '10', '476', 'v', '2', 'Ġ[', 'cs', 'CV', ']', 'Ġ6', 'ĠNov', 'Ġ2017', 'Ġ', 'ĠMario', 'ĠFritz', '1', 'Ġ1', 'Ġ', 'ĠLisa', 'ĠAnne', 'ĠHendricks', '2', 'Ġ', 'ĠBer', 'nt', 'ĠS', 'chie', 'le', '1', 'Ġ', 'ĠMax', 'ĠPlan', 'ck', 'ĠInstitute', 'Ġfor', 'ĠIn', 'format', 'ics', ',', 'ĠSa', 'ar', 'land', 'ĠIn', 'format', 'ics', 'ĠCampus', ',', 'ĠSa', 'arb', 'ru', 'Ì', 'Ī', 'ck', 'en', ',', 'ĠGermany', 'Ġ2', 'Ġ3', 'ĠUC', 'ĠBerkeley', 'ĠE', 'EC', 'S', ',', 'ĠCA', ',', 'ĠUnited', 'ĠStates', 'ĠFacebook', 'ĠAI', 'ĠResearch', 'Ġ', 'ĠAbstract', 'ĠWhile', 'Ġstro

In [160]:
input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
attention_mask = construct_attention_mask(input_ids)

indices = input_ids[0].detach().tolist()
all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)

all_tokens[str(example)] = all_tokens_curr

1992
1992


In [161]:
print(attention_mask.shape)

torch.Size([1, 1992])


Remove all periods and see the relation

In [162]:
attributions, delta = lig.attribute(inputs=input_ids,
                                  baselines=ref_input_ids,
                                  return_convergence_delta=True,
                                  additional_forward_args=(position_ids, attention_mask),
                                  target=1,
                                  n_steps=1500,
                                  internal_batch_size = 2)

attributions_sum = summarize_attributions(attributions)

all_attributions[str(example)] = attributions_sum.detach().cpu().numpy()

In [30]:
torch.save(all_attributions, '/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

In [24]:
# all_attributions = torch.load('/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/example_attrib_dict.pt')

In [92]:
print(all_attributions)

{'605': array([ 0.00000000e+00, -3.40173878e-06, -1.12807776e-06, ...,
        4.23677990e-06,  7.29073352e-06,  0.00000000e+00]), '976': array([ 0.00000000e+00, -1.84040401e-04,  1.46363292e-05, ...,
       -2.48273481e-06,  1.11380563e-04,  0.00000000e+00]), '148': array([ 0.        ,  0.00225466,  0.00335924, ...,  0.00014387,
       -0.000275  ,  0.        ]), '891': array([ 0.00000000e+00, -1.05098504e-03, -2.83239306e-03, ...,
        1.52405500e-06,  9.14739375e-05,  0.00000000e+00]), '589': array([ 0.00000000e+00, -4.46043067e-07, -7.54823196e-07, ...,
        5.76093299e-07,  1.27659739e-05,  0.00000000e+00])}


We then get the attentions and global attentions so we can compare with the attributions

In [163]:
output = model(input_ids.cuda(), attention_mask=attention_mask.cuda(), labels=torch.tensor(label).cuda(), output_attentions = True)
batch_attn = output[-2]
output_attentions = torch.stack(batch_attn).cpu()
global_attention = output[-1]
output_global_attentions = torch.stack(global_attention).cpu()
print("output_attention.shape", output_attentions.shape)
print("gl_output_attention.shape", output_global_attentions.shape)

output_attention.shape torch.Size([12, 1, 12, 1992, 514])
gl_output_attention.shape torch.Size([12, 1, 12, 2048, 1])


Since the longformer has a unique attention matrix shape, we convert it into the required sequence length x sequence length matrix

In [166]:
def create_head_matrix(output_attentions, global_attentions):
    new_attention_matrix = torch.zeros((output_attentions.shape[0], 
                                      output_attentions.shape[0]))
    for i in range(output_attentions.shape[0]):
        test_non_zeroes = torch.nonzero(output_attentions[i]).squeeze()
        test2 = output_attentions[i][test_non_zeroes[1:]]
        new_attention_matrix_indices = test_non_zeroes[1:]-257 + i
        new_attention_matrix[i][new_attention_matrix_indices] = test2
        new_attention_matrix[i][0] = output_attentions[i][0]
        new_attention_matrix[0] = global_attentions.squeeze()[:output_attentions.shape[0]]
    return new_attention_matrix


def attentions_all_heads(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = create_head_matrix(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_batches(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = attentions_all_heads(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

def all_layers(output_attentions, global_attentions):
    new_matrix = []
    for i in range(output_attentions.shape[0]):
        matrix = all_batches(output_attentions[i], global_attentions[i])
        new_matrix.append(matrix)
    return torch.stack(new_matrix)

In [167]:
converted_mat = all_layers(output_attentions, output_global_attentions).detach().cpu().numpy()
print(converted_mat.shape)

(12, 1, 12, 1992, 1992)


We scale the attention matrix by head importance

In [168]:
head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/papers/pretrained/head_importance.pt")
# head_importance = torch.load("/content/drive/MyDrive/cogs402longformer/t3-visapplication/resources/news/head_importance.pt")

In [170]:
def scale_by_importance(attention_matrix, head_importance):
  new_matrix = np.zeros_like(attention_matrix)
  for i in range(attention_matrix.shape[0]):
    head_importance_layer = head_importance[i]
    for j in range(attention_matrix.shape[1]):
      new_matrix[i,j] = attention_matrix[i,j] * np.expand_dims(head_importance_layer, axis=(1,2))
  return new_matrix

In [171]:
converted_mat_importance = scale_by_importance(converted_mat, head_importance)

We get the attentions for each token. The shape of the attention matrix is layer x batch x head x seq_len x seq_len.

In [172]:
attention_matrix_importance = converted_mat_importance.sum(axis=3)
print(attention_matrix_importance.shape)

(12, 1, 12, 1992)


Sum the attentions for the last layer and over all layers

In [173]:
attention_final_layer = attention_matrix_importance[11].squeeze().sum(axis=0)
attention_all_layer = attention_matrix_importance.squeeze().sum(axis=1)
attention_all_layer = attention_all_layer.sum(axis=0)
print(attention_all_layer.shape)

(1992,)


Grab the attributions we stored

In [177]:
exam_attrib = all_attributions[str(example)]
exam_attrib = exam_attrib[:len(attention_final_layer)]

In [178]:
len(exam_attrib)

1992

Since we have the attributions and the attentions, we want to see how the largest attributions (in terms of magnitude) compares to the highest attentions.

Cosine similarity using the raw attributions and attentions

In [179]:
from numpy.linalg import norm
cosine_raw = np.dot(exam_attrib, attention_final_layer) / (norm(exam_attrib)*norm(attention_final_layer))
print("Layer 12 Cosine Similarity raw attrib:\n", cosine_raw)
cosine_all_raw = np.dot(exam_attrib, attention_all_layer) / (norm(exam_attrib)*norm(attention_all_layer))
print("Layer 12 Cosine Similarity raw attrib:\n", cosine_all_raw)

Layer 12 Cosine Similarity raw attrib:
 0.033739950406549125
Layer 12 Cosine Similarity raw attrib:
 0.07393596816185204


The attributions and the attentions have different ranges. The attributions could range from -1 to 1 whereas the attentions range from 0 to 1. However, negative attributions would not necessarily mean that they have the lowest attention, rather they might have really high attention as they are more likely to help the model predict the negative class, and might be something the attentions picked up as a feature.

In [180]:
def normalize(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

In [181]:
attention_final_layer2 = normalize(attention_final_layer)
attention_all_layer2 = normalize(attention_all_layer)

In [182]:
exam_attrib2 = np.abs(exam_attrib)
exam_attrib2 = normalize(exam_attrib2)

In [183]:
print(exam_attrib2)

[0.         0.00342258 0.00068077 ... 0.01669862 0.16723352 0.        ]


Calculate cosine simularity using normalized attentions and attributions

In [184]:
cosine = np.dot(exam_attrib2, attention_final_layer2) / (norm(exam_attrib2)*norm(attention_final_layer2))
print("Layer 12 Cosine Similarity:\n", cosine)
cosine2 = np.dot(exam_attrib2, attention_all_layer2) / (norm(exam_attrib2)*norm(attention_all_layer2))
print("All layer Cosine Similarity:\n", cosine2)

Layer 12 Cosine Similarity:
 0.07014069762403952
All layer Cosine Similarity:
 0.08507612833033591


Cosine similarity while setting all the attention and attribution values below the median to 0

In [185]:
exam_attrib3 = np.abs(exam_attrib)
exam_attrib3 = normalize(exam_attrib3)
median_exam = np.percentile(exam_attrib3, 50)
exam_attrib3[exam_attrib3 < median_exam] = 0

In [186]:
attention_final_layer3 = np.copy(attention_final_layer)
attention_final_layer3 = normalize(attention_final_layer3)
median_12 = np.percentile(attention_final_layer3, 50)
attention_final_layer3[attention_final_layer3 < median_12] = 0

attention_all_layer3 = np.copy(attention_all_layer) 
attention_all_layer3 = normalize(attention_all_layer3)
median_all = np.percentile(attention_all_layer3, 50)
attention_all_layer3[attention_all_layer3 < median_all] = 0

In [187]:
cosine_med = np.dot(exam_attrib3, attention_final_layer3) / (norm(exam_attrib3)*norm(attention_final_layer3))
print("Layer 12 Cosine Similarity med:\n", cosine_med)
cosine_med2 = np.dot(exam_attrib3, attention_all_layer3) / (norm(exam_attrib3)*norm(attention_all_layer3))
print("All layer Cosine Similarity med:\n", cosine_med2)

Layer 12 Cosine Similarity med:
 0.06648542682627386
All layer Cosine Similarity med:
 0.07920069895416489


Cosine similarity while setting all the attention and attribution values below the mean to 0

In [188]:
exam_attrib4 = np.abs(exam_attrib)
exam_attrib4 = normalize(exam_attrib4)
mean_exam = np.mean(exam_attrib4)
exam_attrib4[exam_attrib4 < mean_exam] = 0

In [189]:
attention_final_layer4 = np.copy(attention_final_layer)
attention_final_layer4 = normalize(attention_final_layer4)
mean_12 = np.mean(attention_final_layer4)
attention_final_layer4[attention_final_layer4 < mean_12] = 0

attention_all_layer4 = np.copy(attention_all_layer) 
attention_all_layer4 = normalize(attention_all_layer4)
mean_all = np.mean(attention_all_layer4)
attention_all_layer4[attention_all_layer4 < mean_all] = 0

In [190]:
cosine_mean = np.dot(exam_attrib4, attention_final_layer4) / (norm(exam_attrib4)*norm(attention_final_layer4))
print("Layer 12 Cosine Similarity mean:\n", cosine_mean)
cosine_mean2 = np.dot(exam_attrib4, attention_all_layer4) / (norm(exam_attrib4)*norm(attention_all_layer4))
print("All layer Cosine Similarity mean:\n", cosine_mean2)

Layer 12 Cosine Similarity mean:
 0.05360013751747135
All layer Cosine Similarity mean:
 0.06928756608043356


Cosine Similarity using the ranks of each token

In [191]:
exam_attrib_rank = np.abs(exam_attrib)
order_attrib = exam_attrib_rank.argsort()
print(order_attrib)
ranks_attrib = order_attrib.argsort()
print(ranks_attrib)

[   0 1991 1002 ... 1655 1552 1786]
[   0 1392  556 ... 1811 1978    1]


In [192]:
attention_final_layer_rank = np.copy(attention_final_layer)
order = attention_final_layer_rank.argsort()
ranks = order.argsort()

attention_all_layer_rank = np.copy(attention_all_layer)
order2 = attention_all_layer_rank.argsort()
ranks2 = order2.argsort()

In [193]:
cosine_rank = np.dot(ranks_attrib, ranks) / (norm(ranks_attrib)*norm(ranks))
print("Layer 12 Cosine Similarity rank:\n", cosine_rank)
cosine_rank2 = np.dot(ranks_attrib, ranks2) / (norm(ranks_attrib)*norm(ranks2))
print("All layer Cosine Similarity rank:\n", cosine_rank2)

Layer 12 Cosine Similarity rank:
 0.7912035844599294
All layer Cosine Similarity rank:
 0.8065736198082651


Try the kenall tau metric and the RBO metric

In [194]:
import scipy.stats as stats
tau, p_value = stats.kendalltau(ranks_attrib, ranks)
print("Tau statistic layer 12:", tau, "p value", p_value)
tau, p_value = stats.kendalltau(ranks_attrib, ranks2)
print("Tau statistic: all layers", tau, "p value", p_value)

Tau statistic layer 12: 0.11012306382738388 p value 1.7579907258592896e-13
Tau statistic: all layers 0.1512841925209628 p value 4.5390589063496066e-24


In [195]:
import rbo
print("rbo layer 12", rbo.RankingSimilarity(order_attrib, order).rbo())
print("rbo all", rbo.RankingSimilarity(order_attrib, order2).rbo())

rbo layer 12 0.5364390419580304
rbo all 0.5433868574744053


The cosine similarity using only the last layer of attentions

In [196]:
d = {'example': [example], 'similarity normalized': [cosine], 'similarity raw': [cosine_raw], 'sim_norm w/ median threshold': [cosine_med], 'sim_norm w/ mean threshold': [cosine_mean], "sim w/ ranks":[cosine_rank]}
df = pd.DataFrame(data=d)
df

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,976,0.070141,0.03374,0.066485,0.0536,0.791204


The cosine similarity using all layers

In [197]:
d2 = {'example': [example], 'similarity normalized': [cosine2], 'similarity raw': [cosine_all_raw], 'sim_norm w/ median threshold':[cosine_med2], 'sim_norm w/ mean threshold':[cosine_mean2], "sim w/ ranks":[cosine_rank2]}
df2 = pd.DataFrame(data=d2)
df2

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,976,0.085076,0.073936,0.079201,0.069288,0.806574


In [231]:
df_layer12 = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_layer12.csv")
df_all = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_all.csv")

In [232]:
df_layer12

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,148,0.147101,-0.02026,0.14267,0.123295,0.826202
1,589,0.396823,-0.189594,0.392803,0.383813,0.791068
2,605,0.299375,-0.165507,0.288221,0.271128,0.802063
3,891,0.175846,0.022566,0.173801,0.157258,0.842796
4,976,0.141923,0.082403,0.137433,0.119868,0.823112


In [233]:
df_all

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,148,0.140745,0.016048,0.134218,0.123968,0.805108
1,589,0.180767,-0.119204,0.170673,0.159155,0.807287
2,605,0.249735,-0.219944,0.233159,0.219811,0.784505
3,891,0.104379,-0.012725,0.100741,0.089762,0.799092
4,976,0.11527,0.096755,0.109309,0.097665,0.819613


Append the new row into the dataframe.

Comment out if revisiting a dataframe.

In [201]:
df_layer12 = pd.concat([df, df_layer12], axis=0)
df_all = pd.concat([df2, df_all], axis=0)

In [202]:
df_layer12

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,976,0.070141,0.03374,0.066485,0.0536,0.791204
0,148,0.147101,-0.02026,0.14267,0.123295,0.826202
1,589,0.396823,-0.189594,0.392803,0.383813,0.791068
2,605,0.299375,-0.165507,0.288221,0.271128,0.802063
3,891,0.175846,0.022566,0.173801,0.157258,0.842796
4,976,0.141923,0.082403,0.137433,0.119868,0.823112


In [203]:
df_all

Unnamed: 0,example,similarity normalized,similarity raw,sim_norm w/ median threshold,sim_norm w/ mean threshold,sim w/ ranks
0,976,0.085076,0.073936,0.079201,0.069288,0.806574
0,148,0.140745,0.016048,0.134218,0.123968,0.805108
1,589,0.180767,-0.119204,0.170673,0.159155,0.807287
2,605,0.249735,-0.219944,0.233159,0.219811,0.784505
3,891,0.104379,-0.012725,0.100741,0.089762,0.799092
4,976,0.11527,0.096755,0.109309,0.097665,0.819613


In [204]:
df_layer12 = df_layer12.sort_values(by=['example'])
df_all = df_all.sort_values(by=['example'])

Save the dataframe

In [230]:
# df_layer12.to_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_layer12.csv", index=False)
# df_all.to_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/cos_sim_all.csv", index=False)

We know from the cosine similarities that it does not seem like the attribtions and the attentions are very similar; however, we can find out if there are similarities in the tokens in the highest percentiles.


In [206]:
attention_final_layer5 = np.copy(attention_final_layer)
attention_final_layer5 = normalize(attention_final_layer5)

attention_all_layer5 = np.copy(attention_all_layer) 
attention_all_layer5 = normalize(attention_all_layer5)

exam_attrib5 = np.abs(exam_attrib)
exam_attrib5 = normalize(exam_attrib5)
print(exam_attrib5)

[0.         0.00342258 0.00068077 ... 0.01669862 0.16723352 0.        ]


In [207]:
top_final = np.percentile(attention_final_layer5, 95)
top_all = np.percentile(attention_final_layer5, 95)
top_attrib = np.percentile(exam_attrib5, 95)
print(top_attrib)

0.02558814957302181


In [208]:
attention_final_layer5[attention_final_layer5<top_final] = 0
attention_all_layer5[attention_all_layer5<top_all] = 0
exam_attrib5[exam_attrib5<top_attrib] = 0

In [209]:
print(exam_attrib5)

[0.         0.         0.         ... 0.         0.16723352 0.        ]


In [210]:
cosine_thresh = np.dot(exam_attrib5, attention_final_layer5) / (norm(exam_attrib5)*norm(attention_final_layer5))
print("Layer 12 Cosine Similarity 95th:\n", cosine_thresh)
cosine_thresh2 = np.dot(exam_attrib5, attention_all_layer5) / (norm(exam_attrib5)*norm(attention_all_layer5))
print("All layer Cosine Similarity 95th:\n", cosine_thresh2)

Layer 12 Cosine Similarity 95th:
 0.04029479960949007
All layer Cosine Similarity 95th:
 0.022680997886953675


In [211]:
num = 2048 -np.ceil(2048 * 0.95)
exam_attrib_rank2 = np.copy(ranks_attrib)
exam_attrib_rank2[exam_attrib_rank2 > num] = 0

attention_final_layer_rank2 = np.copy(ranks)
attention_final_layer_rank2[attention_final_layer_rank2 > num] = 0

attention_all_layer_rank2 = np.copy(ranks2)
attention_all_layer_rank2[attention_all_layer_rank2 > num] = 0

In [212]:
cosine_rank_top = np.dot(exam_attrib_rank2, attention_final_layer_rank2) / (norm(exam_attrib_rank2)*norm(attention_final_layer_rank2))
print("Layer 12 Cosine Similarity 95th ranks:\n", cosine_rank_top)
cosine_rank_top2 = np.dot(exam_attrib_rank2, attention_all_layer_rank2) / (norm(exam_attrib_rank2)*norm(attention_all_layer_rank2))
print("All layer Cosine Similarity 95th ranks:\n", cosine_rank_top2)

Layer 12 Cosine Similarity 95th ranks:
 0.045991837416946416
All layer Cosine Similarity 95th ranks:
 0.03154712986307476


Try RBO on the 95th percentile

In [213]:
exam_attrib_order2 = np.copy(order_attrib)

attention_final_layer_order2 = np.copy(order)

attention_all_layer_order2 = np.copy(order2)

In [214]:
print("rbo layer 12 95th", rbo.RankingSimilarity(exam_attrib_order2[:int(num)], attention_final_layer_order2[:int(num)]).rbo())
print("rbo all 95th", rbo.RankingSimilarity(exam_attrib_order2[:int(num)], attention_all_layer_order2[:int(num)]).rbo())

rbo layer 12 95th 0.019666899298905804
rbo all 95th 0.010179946468327447


In [215]:
attention_final_layer_top = np.flatnonzero(attention_final_layer5)
attention_final_layer_top = set(attention_final_layer_top)

attention_all_layer_top = np.flatnonzero(attention_all_layer5)
attention_all_layer_top = set(attention_all_layer_top)

exam_attrib_top = np.flatnonzero(exam_attrib5)
exam_attrib_top = set(exam_attrib_top)
print(exam_attrib_top)

{1537, 1550, 1552, 1554, 1555, 1556, 1558, 1559, 1560, 1565, 1609, 1612, 1614, 1615, 1622, 1623, 1625, 1626, 1633, 1641, 1642, 1643, 620, 1644, 1752, 1652, 1753, 1654, 1655, 1656, 1658, 1660, 1673, 1678, 1688, 1689, 1694, 1705, 1718, 1719, 1720, 1721, 1741, 1742, 1743, 1744, 1745, 1747, 1748, 211, 213, 214, 215, 216, 217, 1755, 1756, 1757, 1758, 218, 1760, 1754, 1759, 1762, 1763, 1772, 1780, 1782, 1783, 1784, 1785, 1786, 1787, 1788, 1789, 1791, 1803, 1804, 1830, 1329, 1847, 1858, 1862, 1876, 1915, 1457, 1470, 1984, 1990, 1489, 1490, 1493, 1496, 1498, 1500, 1501, 1502, 1505, 1506, 1507}


Grab the tokens stored in the all tokens dictionary so we can know which tokens we are working with as we currently only have the indices.

In [216]:
exam_tokens = all_tokens[str(example)]

In [217]:
attention_final_layer5 = np.copy(attention_final_layer)
attention_final_layer5 = normalize(attention_final_layer5)

attention_all_layer5 = np.copy(attention_all_layer) 
attention_all_layer5 = normalize(attention_all_layer5)

exam_attrib5 = np.abs(exam_attrib)
exam_attrib5 = normalize(exam_attrib5)
print(exam_attrib5)

[0.         0.00342258 0.00068077 ... 0.01669862 0.16723352 0.        ]


Find out which tokens have the highest attentions but not the highest attributions

In [218]:
diff = sorted(list(attention_final_layer_top - exam_attrib_top))
print(len(diff))
diff_tokens = [exam_tokens[idx] for idx in diff]
d_diff = {"token": diff_tokens, "position":diff, "attention_norm":attention_final_layer5[diff], "attribution_norm":exam_attrib5[diff]}
df_diff = pd.DataFrame(d_diff)
df_diff

85


Unnamed: 0,token,position,attention_norm,attribution_norm
0,<s>,0,0.042455,0.000000
1,ĠLanguage,22,0.027986,0.006094
2,ĠAI,125,0.047689,0.013430
3,ĠWhile,129,0.809926,0.007571
4,Ġimage,136,0.030243,0.003874
...,...,...,...,...
80,Ġlinguistic,1540,0.039078,0.022086
81,],1629,0.542790,0.011864
82,].,1779,0.667832,0.014789
83,ĠThe,1923,0.546320,0.000191


In [219]:
print(df_diff['token'].value_counts())

Ġcaption            13
].                   7
Ġtraining            6
Ġlanguage            3
Ġhuman               3
arial                3
ing                  3
Ġadvers              3
Ġvisual              3
Ġhumans              2
truth                2
Ġthe                 2
Ġword                2
Ġgenerated           2
Ġimage               2
Ġtranslation         1
Ġdeep                1
Ġon                  1
Ġrecognizing         1
Ġlinguistic          1
]                    1
ĠNetworks            1
Ġtrained             1
ĠIn                  1
Ġchallenge           1
ĠCaption             1
ĠThe                 1
Ġevaluation          1
<s>                  1
Ġperformance         1
Ġrepresentations     1
Ġmedia               1
,                    1
ĠLanguage            1
Ġskiing              1
Ġlearning            1
Ġwritten             1
Ġgenerator           1
Ġtask                1
Ġvocabulary          1
ĠThis                1
ĠWhile               1
ĠAI                  1
Ġlearns    

Find out which tokens have the highest attributions but not the highest attentions

In [220]:
diff2 = sorted(list(exam_attrib_top - attention_final_layer_top))
print(len(diff))
diff_tokens2 = [exam_tokens[idx] for idx in diff2]
d_diff2 = {"token": diff_tokens2, "position":diff2, "attention_norm": attention_final_layer5[diff2], "attribution_norm":exam_attrib5[diff2]}
df_diff2 = pd.DataFrame(d_diff2)
df_diff2

85


Unnamed: 0,token,position,attention_norm,attribution_norm
0,Ġthese,213,0.005645,0.075760
1,Ġchallenges,214,0.010964,0.045072
2,",",215,0.009625,0.080432
3,Ġwe,216,0.011107,0.139791
4,Ġchange,217,0.002361,0.056875
...,...,...,...,...
80,Ġdiscrim,1862,0.004760,0.031331
81,Ġhuman,1876,0.013822,0.036298
82,Ġhuman,1915,0.011404,0.032828
83,Ġtraining,1984,0.011231,0.028771


In [221]:
print(df_diff2['token'].value_counts())

Ġtraining       8
Ġmodel          4
Ġevaluation     3
Ġmetrics        3
Ġthe            3
Ġhuman          3
Ġword           2
Ġdirectly       2
Ġbias           2
Ġsampling       2
Ġa              2
Ġmethod         2
Ġwe             2
,               2
Ġoptimize       1
Ġwith           1
Ġusing          1
Ġbased          1
Ġapproaches     1
Ġhave           1
Ġto             1
ĠSeveral        1
Ġthese          1
Ġoptimizing     1
Ġand            1
ĠHowever        1
Ġstandard       1
Ġdoes           1
Ġnot            1
Ġaddress        1
Ġdiversity      1
Ġscores         1
Ġnetwork        1
Ġdiscrim        1
Ġmaximum        1
Ġbegins         1
Ġby             1
Ġmost           1
Ġchange         1
Ġachieves       1
Ġloss           1
Ġas             1
Ġafter          1
Ġhas            1
Ġstudied        1
Ġdescription    1
Ġmodels         1
Ġcommon         1
Ġdata           1
Ġpredicted      1
Ġonly           1
Ġsuffers        1
Ġduring         1
Ġscheme         1
Ġchallenges     1
Ġwords    

Find out which tokens are part of the highest attentions and highest attributions.

In [222]:
same = sorted(list(attention_final_layer_top & exam_attrib_top))
print(len(same))
same_tokens = [exam_tokens[idx] for idx in same]
d_same = {"token": same_tokens, "position":same, "attention_norm": attention_final_layer5[same], "attribution_norm":exam_attrib5[same]}
df_same = pd.DataFrame(d_same)
df_same

15


Unnamed: 0,token,position,attention_norm,attribution_norm
0,ĠTo,211,0.975158,0.058848
1,Ġlearning,1329,0.04707,0.046033
2,Ġlearning,1493,0.048585,0.072477
3,].,1550,0.536302,0.071381
4,Ġtraining,1552,0.029093,0.668442
5,Ġexposure,1625,0.025809,0.028566
6,Ġlearn,1633,0.033742,0.058384
7,ĠTo,1644,0.551842,0.043236
8,Ġtraining,1655,0.02939,0.488802
9,Ġexposure,1719,0.02996,0.04367


In [223]:
print(df_same['token'].value_counts())

Ġlearning         3
Ġtraining         3
ĠTo               2
Ġexposure         2
].                1
Ġlearn            1
Ġlikelihood       1
Ġworks            1
Ġreinforcement    1
Name: token, dtype: int64


In [224]:
def jaccard_similarity(set1, set2):
    intersection = len(list(set1.intersection(set2)))
    print(intersection)
    union = (len(set1) + len(set2)) - intersection
    print(union)
    return float(intersection) / union

In [225]:
jaccard_similarity(attention_final_layer_top, exam_attrib_top)

15
185


0.08108108108108109