This notebook adapts the [Captum tutorial for question answering](https://captum.ai/tutorials/Bert_SQUAD_Interpret) to sequence classification with Longformer. Specifically, it uses the model's embedding layer to compute token attributions for examples of your choice, or for the entire dataset if needed. This lets us visualize which tokens have the most influence on the model's prediction and find the k tokens that contribute most towards a correct or an incorrect prediction.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Import dependencies

In [None]:
pip install transformers --quiet

In [None]:
pip install captum --quiet

In [None]:
pip install datasets --quiet

In [None]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

Here we load the model and tokenizer and move the model to the GPU if one is available. Change the model path and the tokenizer to whichever ones you wish to use.

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# replace the checkpoint path below with the real path of your saved model

# load model (map_location lets a GPU-saved checkpoint load on a CPU-only machine)
test = torch.load("/content/drive/MyDrive/cogs402longformer/fakeclinicalnotes/models/full_augmented_lr2e-5_dropout3_10_trained_threshold.pt", map_location=device)
model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096', state_dict=test['state_dict'], num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

Some weights of the model checkpoint at allenai/longformer-base-4096 were not used when initializing LongformerForSequenceClassification: ['dense.weight', 'longformer_model.encoder.layer.10.attention.output.LayerNorm.weight', 'longformer_model.encoder.layer.7.attention.output.dense.bias', 'longformer_model.encoder.layer.4.attention.self.query.bias', 'longformer_model.encoder.layer.0.attention.self.key.weight', 'longformer_model.encoder.layer.9.attention.self.query.weight', 'longformer_model.encoder.layer.5.attention.self.key_global.bias', 'longformer_model.encoder.layer.6.attention.self.query.weight', 'longformer_model.encoder.layer.3.attention.self.key_global.weight', 'longformer_model.encoder.layer.10.attention.self.query.weight', 'longformer_model.encoder.layer.4.output.dense.weight', 'longformer_model.encoder.layer.10.output.dense.weight', 'longformer_model.encoder.layer.3.intermediate.dense.bias', 'longformer_model.encoder.layer.9.intermediate.dense.bias', 'longformer_model.encode

Create functions that give us the input ids and the position ids for the text we want to examine along with the baselines for integrated gradients.

In [None]:
ref_token_id = tokenizer.pad_token_id # padding token, used to build the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id # classification token, prepended to the text

In [None]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):
    text = text.lower()
    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    position_ids = position_ids[:, :seq_length]
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
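
To make the position-id computation above easier to follow, here is a minimal, dependency-free sketch of the same idea: positions are a running count over non-padding tokens, offset by the padding index, and padding tokens keep the padding index itself. The function name `roberta_style_position_ids` and the example ids are illustrative assumptions, not part of the notebook.

```python
from itertools import accumulate

def roberta_style_position_ids(input_ids, padding_idx=1):
    """Plain-Python analogue of `cumsum(mask) * mask + padding_idx`."""
    # 1 for real tokens, 0 for padding tokens
    mask = [int(tok != padding_idx) for tok in input_ids]
    # running count of real tokens seen so far
    running = list(accumulate(mask))
    # padding positions collapse to padding_idx; real tokens start at padding_idx + 1
    return [r * m + padding_idx for r, m in zip(running, mask)]

# Hypothetical ids: 0 = <s>, 2 = </s>, 1 = <pad>; 713 and 16 are arbitrary.
example_ids = [0, 713, 16, 2, 1, 1]
print(roberta_style_position_ids(example_ids))  # [2, 3, 4, 5, 1, 1]
```

Because the notebook's inputs contain no padding, the real position ids simply count up from `padding_idx + 1`, while the reference position ids are all zeros.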

### Import Dataset

Here we import the fake clinical notes dataset.

In [None]:
import datasets  # needed for datasets.Dataset.from_pandas below
from datasets import load_dataset
import numpy as np
# cogs402_ds = load_dataset("danielhou13/cogs402datafake")["train"]

ds = pd.read_csv("/content/drive/MyDrive/cogs402longformer/fakeclinicalnotes/data/fake_notes.csv")
dataset = datasets.Dataset.from_pandas(ds)
cogs402_ds = dataset.train_test_split(test_size=0.20)
cogs402_ds = cogs402_ds['train']




Alternatively, uncomment the line below to use the news dataset instead.

In [None]:
# cogs402_ds = load_dataset("danielhou13/cogs402dataset2")["validation"]

## Getting the Attributions

We define a `predict` helper that returns the raw logits, and a custom forward function that returns the softmaxed logits, i.e. the class probabilities that the model uses for prediction.

In [None]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# forward function passed to Captum; returns class probabilities.
# The class to attribute to (1 = positive, 0 = negative) is chosen via `target` in lig.attribute.
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)
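
As a quick illustration of what the softmax in `custom_forward` does, the following dependency-free sketch turns a pair of logits into class probabilities. The logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the (negative, positive) classes
probs = softmax([0.3, -0.1])
print(probs)                     # two probabilities that sum to 1
print(probs.index(max(probs)))   # index of the predicted class
```

Attributing against probabilities rather than raw logits means the attribution scores explain changes in the predicted probability of the target class.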

A helper function to summarize attributions for each word token in the sequence.

In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions
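
The summarization step can be sketched without PyTorch as follows: sum the attributions over the embedding dimension to get one score per token, then divide by the L2 norm of the resulting vector. The tiny 2-token, 2-dimension matrix is a made-up example.

```python
import math

def summarize(attr_matrix):
    """attr_matrix: one row per token, one column per embedding dimension."""
    # collapse the embedding dimension: one score per token
    per_token = [sum(row) for row in attr_matrix]
    # L2-normalize so scores are comparable across examples
    norm = math.sqrt(sum(v * v for v in per_token))
    return [v / norm for v in per_token]

# Hypothetical attributions for two tokens with two embedding dimensions each
scores = summarize([[3.0, 0.0], [2.0, 2.0]])
print(scores)  # per-token sums are 3 and 4, the norm is 5
```

After normalization the per-token scores form a unit vector, which is why individual attributions in the tables below are small in magnitude.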

Perform Layer Integrated Gradients using the longformer's embeddings.

In [None]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
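
For intuition about what integrated gradients computes, here is a minimal sketch on a toy two-input function with a hand-written gradient. It averages the gradient along the straight path from the baseline to the input and scales by the input difference. It uses a simple midpoint rule, which is an assumption for illustration and not necessarily the approximation Captum uses internally; `toy_model` and `toy_grad` are hypothetical stand-ins for the network.

```python
def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps  # midpoint of each sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i, g in enumerate(grad):
            avg_grad[i] += g / n_steps
    # attribution_i = (x_i - baseline_i) * average gradient along the path
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

def toy_model(v):
    return v[0] ** 2 + 3 * v[1]

def toy_grad(v):
    return [2 * v[0], 3.0]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attrs = integrated_gradients(toy_grad, x, baseline)
# completeness: attributions should sum to f(x) - f(baseline)
delta = sum(attrs) - (toy_model(x) - toy_model(baseline))
print(attrs, delta)
```

The `delta` here plays the same role as the convergence delta returned by `lig.attribute` with `return_convergence_delta=True`: a value near zero indicates the approximation is accurate, and a large value suggests increasing `n_steps`.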

This function builds the example and baseline inputs needed for integrated gradients, computes the attributions, and adds them to our visualization tool. It also stores the attributions and tokens for each example in dictionaries keyed by the example index, so we can examine the attribution scores further later on. More information about the integrated gradients function can be found [here](https://captum.ai/api/layer.html#layer-integrated-gradients).

In [None]:
vis_data_records = []
all_attributions = {}
all_tokens = {}
all_deltas = {}

In [None]:
# Takes in dataset and example number
def get_token_attributions(dataset, example):
  text = dataset['text'][example]
  label = dataset['labels'][example]
  # get the inputs, position ids, attention mask, and the baselines
  input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
  position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
  attention_mask = construct_attention_mask(input_ids)

  #get the tokens
  indices = input_ids[0].detach().tolist()
  all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)
  all_tokens[str(example)] = all_tokens_curr

  #perform integrated gradients
  attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    return_convergence_delta=True,
                                    additional_forward_args=(position_ids, attention_mask),
                                    target=1,
                                    n_steps=500,
                                    internal_batch_size = 2)

  # We want one value for every token.
  attributions_sum = summarize_attributions(attributions)

  # store the values in our dictionary
  all_attributions[str(example)] = attributions_sum
  all_deltas[str(example)] = delta

  # get the score for our visualization
  score = predict(input_ids, position_ids, attention_mask)

  all_tokens_curr = [x.replace('Ġ', '') for x in all_tokens_curr]
  # storing couple samples in an array for visualization purposes
  # requires array of attributions, prediction score, predicted class, true class 
  # the label you want your attributions to associate positive with, the attribution score
  # the tokens, and the delta if you have it.
  vis_data_records.append(viz.VisualizationDataRecord(
                        attributions_sum,
                        torch.softmax(score, dim = 1).max(),
                        torch.argmax(torch.softmax(score, dim = 1)),
                        label,
                        str(1),
                        attributions_sum.sum(),       
                        all_tokens_curr,
                        delta)
  )

Here we compute the attributions for one example from the dataset.

In [None]:
get_token_attributions(cogs402_ds, 7)

This function displays our attributions in a manner that is easy to read. Each token is overlaid with its attribution. Green represents a positive attribution (i.e. the token pushes the model towards predicting the positive class), while red represents a negative attribution.

In [None]:
# # storing couple samples in an array for visualization purposes
# score_vis = viz.VisualizationDataRecord(
#                         attributions_sum,
#                         torch.softmax(score, dim = 1).max(),
#                         torch.argmax(torch.softmax(score, dim = 1)),
#                         label,
#                         str(1),
#                         attributions_sum.sum(),       
#                         all_tokens,
#                         delta)

print('\033[1m', 'Visualization For Score', '\033[0m')
_ = viz.visualize_text(vis_data_records)

Visualization For Score


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.54),1.0,-3.46,"#s bc mental health centre and substance abuse ĉ patient loc - sv c : c pe - pm outpatient psychiatry clinic discharge summary admitted : ĉ [ ** 2020 - 12 - 29 ** ] ( dd / mm / yy yy ) discharged : ĉ ( dd / mm / yy yy ) date of admission : [ ** 2020 - 12 - 29 ** ]. date of discharge : [ ** 2020 - 12 - 30 ** ]. admission medications : 1 . ĉ ris per id one 1 mg daily . 2 . ĉ flu v ox amine 125 mg daily . 3 . ĉ d oxy cycl ine 100 mg daily . 4 . ĉ fer am ax 300 mg daily . 5 . ĉ mel atonin 10 mg nightly . 6 . ĉ as needed : ben ad ry l . discharge medications : no change to any medications except : 1 . ĉ ris per id one will be slightly increased by 0 . 125 mg every week for a target dose of 1 mg p . o . b . i . d . 2 . ĉ ben z trop ine has been added as a p . r . n . medication in case of any extrap y ram idal side effects . 3 . ĉ d oxy cycl ine , fer am ax , mel atonin , and flu v ox amine are unchanged . admission diagnosis autistic spectrum disorder , borderline intellectual disability , aggression in the context of previous . discharge diagnosis autistic spectrum disorder , borderline intellectual disability , aggression in the context of previous . course in hospital please see the excellent consultation note by dr . [ ** first name 4 ( name pattern 1 ) 1 ** ] [ ** last name ( name pattern 1 ) 2 ** ] regarding the admission of no or ayne lad ha . i have known no or ayne from previous consultations in the emergency department , and this admission summary was very helpful in catching me up to the things that have recently happened . no or ayne had a very short stay on the cape unit which was un event ful from any safety events . briefly , no or ayne was admitted to hospital for endorsing suicidal ide ation in the context of conflict with his family , who had removed equipment that he usually uses for entertainment ( n intendo d s , computer ) after he broke a laptop in frustration . 
during the next day , he became very upset with respect to his loss of privileges , had to be removed from school after getting very upset , and then when he was at home he indicated that he wanted to hurt himself . mother drove him to hospital . no or ayne spent the evening sleeping , and was very easy to be interviewed on the next day of admission . meeting with no or ayne , it was clear that he had returned to his baseline . he had perceived some sl ights with respect to his mom taking away his electronics , but he admitted that he took things too far and felt like he said things that he did not mean . he aut istically explained many problems that he has with the world , and on the face of it these are concerning things to be said . for example , he believes that the world would be better without any children . he believes that the world would be better if someone would kill [ ** first name 4 ( name pattern 1 ) 3 ** ] [ ** last name ( name pattern 1 ) 4 ** ]. he believes that if there were no banks or money , that people would be happier . he also makes misogyn istic and racial comments , sharing with me that he wished that there were "" no women , and no black people ."" he says these very matter of fact ly , and i do not believe his intention is to create any offense , but it is clear that by expressing these things he is going to create significant concern for people around him . i shared that with him , and he admitted that he should not say those things out loud . i believe that in his autistic world , he is very influenced by his online hang outs . he is particularly interested in two internet web sites , reddit , and 4 [ ** last name ( un ) 5 ** ], both places where if one wants to , they can descend into a world of significant misogyny , racism , and hatred . i believe that no or ayne is very influenced by things that he reads online , and is very powerfully captured by funny things such as the means or jokes , even at the expense of misogyny or racism . 
i shared with no or ayne that he needs to make sure that he is looking at things that are appropriate and remembering that people online can be manipulative . no or ayne superf ic ially accepts this but also believes that he is part of the group and believes that he could lead "" a revolution against the world ."" no or ayne admits that he should not have been aggressive towards his mother and should not have threatened suicide . he is no longer suicidal . he feels like going home is appropriate , and felt like he had no difficulties with being discharged today . he wanted to be a part of any family planning meetings , and was receptive to the idea of me meeting with his parents first . when parents came in for a meeting , they shared that his behavior had declined since march , with a reduction of ris per id one from a higher dose which caused significant o cul ogy ric events , down to 1 mg which has not led to any o cul ogy ric events but has been less helpful with respect to containing his aggression . we explored options with respect to treating this and parents selected the treatment plan below , which was our recommended treatment plan . parents had no concerns in taking no or ayne home and felt that he had returned to baseline . they were appreci ative of his short stay on the unit but they do miss him and the unit was active with respect to distress by other patients , so they were worried about the influence of fear on no or ayne . they elected to take him home on discharge . no or ayne was discharged easily with no complications on [ ** 2020 - 12 - 30 ** ], at approximately 2 p . m . impression if feels like no or ayne was doing better at 2 mg of ris per id one but he was having events that were quite convinc ingly o cul ogy ric crises . these events are dangerous with respect to their connection to other central dy ston ias , which can include l ary ng osp asm . 
for this reason , it is very important to weigh the o cul ogy ric crisis ' significant negative with respect to ris per id one . at the same time , the ris per id one was restart ed from a switch from a rip ip raz ole at a high dose and ramp ed up very quickly , and with his previous good response to ris per id one and his lack of any dy ston ia today , we felt that it was appropriate to try and increase his ris per id one gradually . another option that was considered was to keep his ris per id one exactly where it was , but to add cl on idine . in the pursuit of not adding too many medications together , he is already on quite a list , we elected to do a cautious re tit ration of ris per id one , watching out for any o cul ogy ric events . with respect to his behavior and language , it will always be shocking and in his autistic world , he is not causing any offence by his statements . he comes across as legitimately a charming person , but when you delve into his thinking it is clear that he is very black and white , and he has been heavily influenced by racial and misogyn istic posts online . that all being said , he treats people with respect and says the right thing when he knows he should . some of his more outlandish statements are very difficult to digest , but i believe that there is no current evidence of any significant violence towards others or his threats that he has mentioned to other people do not seem to have significant weight behind them . i know that he has lost school standing and had to switch schools because of a threat towards principal , and he says some things such as "" w anting to kill all children which are obviously unsett led . at the same time , his autistic spectrum disorder is not treat able , and he is well contained and responds very well to a behavioral approach . he holds himself to a very high standard and i believe his greatest risks are when he is frustrated doing self injury or attempting to el ope . 
i do not believe that he is at risk for hom icidal acting out , either injuring others or trying to kill others . when it comes to his suicide risk , his autism and borderline i q are protective factors . that being said , in frustration i could see him hurting himself . for this reason , frustration tolerance is one of our biggest goals , which i hope that the medication changes will accommodate . as well , i have encouraged parents not to try too many behavioral things right now while he is clearly unst ead y . for all these reasons , i felt that discharge from hospital was appropriate , there is a chronic risk of hurting himself , and he has previously made threats against others , however i do not believe that these threats or violent parameters will change with any in patient treatment , and a gradual tit ration of ris per id one is most appropriately done as an outpatient . watching no or ayne react so negatively to emotion of the unit ( another patient became quite distressed ) was also quite convincing that the hospital ization was relatively traumatic for no or ayne . with parents being on board with the safety plan , demonstrating excellent judgment with respect to managing no or ayne , and no or ayne 's willingness to try to take things a little bit easier and try a new medication , discharge was appropriate . treatment plan 1 . ĉ incre ase gradually ris per id one by 0 . 125 mg every week to a target dose of 2 mg ( 1 mg b . i . d .). 2 . ĉ mother will keep look out for any o cul ogy ric events , if they happen , she has benz trop ine on order to be able to give him . if the benz trop ine does not work i have instructed her #/s"


## Further Examination of the Attributions

Next, we might want to look in depth at the attribution scores for each token of an example. We saved the attributions for the examples we looked at above, so we can easily retrieve them. We also retrieve the tokens, because we want to know which token each attribution is associated with.

Both the attribution tensor and the token list have length seq_len.

In [None]:
example = 7
attributions_sum = all_attributions[f"{example}"]
all_tokens2 = all_tokens[f"{example}"]

These functions return the tokens with the strongest (most positive and most negative) attributions. Change `k` to the number of tokens you wish to inspect. Each function takes the attributions and the tokens we grabbed in the previous cell and returns three things: the top-k (or bottom-k) tokens, their attribution values, and their positions.

Note: Remember that the attributions are with respect to the positive class, so the tokens that contributed most towards predicting the negative class will be among the bottom-k attributed tokens.

In [None]:
def get_topk_attributed_tokens(attrs, all_tokens, k=20):
    values, indices = torch.topk(attrs, k)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices

In [None]:
def get_botk_attributed_tokens(attrs, all_tokens, k=20):
    values, indices = torch.topk(attrs, k, largest=False)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices
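
The selection logic of the two functions above can be sketched without PyTorch by sorting positions by attribution. This is only a plain-Python analogue of `torch.topk` with `largest=True` or `largest=False`; the function name `topk_tokens` and the toy scores are illustrative assumptions.

```python
def topk_tokens(attrs, tokens, k=2, largest=True):
    """Return the k tokens with the largest (or smallest) attributions."""
    # positions sorted by attribution, descending when largest=True
    order = sorted(range(len(attrs)), key=lambda i: attrs[i], reverse=largest)[:k]
    return [tokens[i] for i in order], [attrs[i] for i in order], order

# Hypothetical tokens and attribution scores
toy_tokens = ["family", "plan", "threat", "day"]
toy_attrs = [0.063, -0.078, 0.062, -0.069]

print(topk_tokens(toy_attrs, toy_tokens))                 # most positive first
print(topk_tokens(toy_attrs, toy_tokens, largest=False))  # most negative first
```

Returning the positions alongside the tokens matters because the same token can appear several times in a note with very different attributions.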

Convert the values, their indices, and the tokens into a pandas DataFrame for visualization. The top-k table is sorted from highest attribution to lowest; the bottom-k table, which holds the most negative attributions, is sorted from lowest to highest.

In [None]:
top_words_start, top_words_val_start, top_word_ind_start = get_topk_attributed_tokens(attributions_sum.cpu(), all_tokens2)
bot_words_start, bot_words_val_start, bot_word_ind_start = get_botk_attributed_tokens(attributions_sum.cpu(), all_tokens2)

df_high = pd.DataFrame({'Word': top_words_start, 'index':top_word_ind_start, 'attribution': top_words_val_start})

df_low = pd.DataFrame({'Word': bot_words_start, 'index':bot_word_ind_start, 'attribution': bot_words_val_start})
# df_start.style.apply(['cell_ids: False'])

# ['{}({})'.format(token, str(i)) for i, token in enumerate(all_tokens)]

Here we display our top k positively and negatively attributed tokens for our example.

In [None]:
df_high['Word'] = df_high['Word'].str.replace('Ġ', '')
df_high

Unnamed: 0,Word,index,attribution
0,family,430,0.063025
1,threat,1619,0.061966
2,ben,214,0.060865
3,name,331,0.057714
4,you,1502,0.057316
5,discharged,1205,0.056924
6,name,318,0.056194
7,happen,2018,0.05436
8,1,325,0.054011
9,from,1330,0.053674


In [None]:
df_low['Word'] = df_low['Word'].str.replace('Ġ', '')
df_low

Unnamed: 0,Word,index,attribution
0,plan,1966,-0.078297
1,plan,1122,-0.069454
2,computer,448,-0.069208
3,day,530,-0.068755
4,cycl,248,-0.06653
5,easier,1953,-0.066436
6,diagnosis,268,-0.064512
7,misogyn,1527,-0.061235
8,diagnosis,286,-0.060768
9,would,646,-0.059901


In [None]:
d = {"tokens":all_tokens2, "attribution":attributions_sum[:len(all_tokens2)].cpu()}

We notice that many tokens repeat at different positions within an example. While we might want to know how position plays into the attributions, if we only care about the tokens themselves, we can sum the attributions of all duplicate tokens to get an aggregate attribution per token. Therefore, we aggregate the attributions strictly by token type.

In [None]:
df_attrib = pd.DataFrame(d)
aggregation_functions = {'attribution': 'sum'}
df_new = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_functions)
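
The groupby-and-sum step is equivalent to accumulating scores in a dictionary keyed by token. A minimal sketch with made-up tokens and scores (the helper name `aggregate_by_token` is hypothetical):

```python
from collections import defaultdict

def aggregate_by_token(tokens, attrs):
    """Sum the attributions of every occurrence of each token."""
    totals = defaultdict(float)
    for token, attr in zip(tokens, attrs):
        totals[token] += attr
    return dict(totals)

# Hypothetical example: "plan" appears twice at different positions
toy_totals = aggregate_by_token(["plan", "he", "plan"], [-0.25, 0.5, -0.5])
print(toy_totals)
```

Note that summing favours frequent tokens: a common word with many small attributions can outrank a rare word with one large attribution, which is why punctuation and function words dominate the aggregated tables.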

In [None]:
highest_attrib_tokens = df_new.sort_values(by=['attribution'], ascending=False).reset_index()
highest_attrib_tokens['tokens'] = highest_attrib_tokens['tokens'].str.replace('Ġ', '')
highest_attrib_tokens[:10]

Unnamed: 0,tokens,attribution
0,.,0.678344
1,he,0.606769
2,was,0.605042
3,",",0.569732
4,mg,0.453486
5,in,0.37845
6,,0.346632
7,on,0.299068
8,one,0.281195
9,1,0.280139


In [None]:
lowest_attrib_tokens = df_new.sort_values(by=['attribution']).reset_index()
lowest_attrib_tokens['tokens'] = lowest_attrib_tokens['tokens'].str.replace('Ġ', '')
lowest_attrib_tokens[:10]

Unnamed: 0,tokens,attribution
0,that,-1.42433
1,with,-1.056753
2,and,-0.619182
3,a,-0.586632
4,the,-0.580131
5,his,-0.566687
6,of,-0.559768
7,very,-0.545634
8,is,-0.53829
9,ayne,-0.477179


## Masking the stopwords and non-alpha tokens

There may be some stopwords or punctuation among our top attributed tokens. Now that we have the lists of the highest and lowest attributions, we can filter those tokens out to identify more interesting keywords.

In [None]:
import nltk
from transformers import AutoTokenizer
nltk.download('stopwords')
tokenizer2 = AutoTokenizer.from_pretrained('allenai/longformer-base-4096', add_prefix_space=True)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.



In [None]:
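# alias the import so the `stopwords` set built below does not shadow it on re-runs
from nltk.corpus import stopwords as nltk_stopwords
all_stopwords = nltk_stopwords.words('english')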
all_stopwords.append(" ")
stopwords = set(tokenizer2.tokenize(all_stopwords, is_split_into_words =True))
stopwords.update(all_stopwords)
print(stopwords)

{'their', 'Ġas', 'each', 'Ġbeing', 'Ġon', 'Ġbe', 'with', 'Ġis', 'Ġeach', 'Ġthat', "she's", 'are', 'my', 'at', 'through', 'should', 'Ġmy', 'Ġwasn', 'Ġwhy', 'Ġfew', "shouldn't", 'so', 'will', 'me', 'shouldn', 'an', 'if', 'aren', 'in', 'Ġand', 'after', 'can', 'haven', 'shan', 'hers', 'Ġre', "wouldn't", 'over', "that'll", 'off', 'Ġve', 'Ġaren', 'not', 'Ġtheir', 'there', 'Ġonce', "hadn't", 'too', 'Ġmyself', 'has', "'s", 'Ġdid', 'Ġthen', 'him', 'Ġ', 'Ġuntil', 'from', 'it', 'Ġagain', "it's", 'those', 'Ġshe', 'own', 'Ġout', 'Ġmore', 'Ġwill', 'Ġno', 'because', 'that', 'Ġbecause', 'once', 'out', 'during', 'Ġwho', 'Ġo', 'Ġat', "didn't", 'we', 'Ġtheirs', 'Ġisn', "doesn't", 'Ġy', 'them', 'then', 'a', "wasn't", 're', 'up', 'theirs', 'be', 'Ġwhere', 'Ġall', 'Ġthis', 'into', 'Ġwhom', 'Ġor', 'Ġher', 'Ġt', 'both', 'do', 'Ġdoesn', 'Ġhim', 'have', 'this', 'wasn', 'Ġthese', 'yourself', 'how', 'Ġin', 'Ġwhich', "haven't", "shan't", 'Ġother', 'your', 'he', 'Ġwon', 'to', 'Ġhas', 'here', 'doing', 'below', 'Ġthr
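
The filtering applied in the following cells can be sketched in plain Python: strip the `Ġ` marker (which the byte-level BPE tokenizer uses for a token preceded by a space), then keep only alphabetic tokens that are not in the stopword set. The tiny stopword set and the rows below are made up for illustration, and `keep_content_tokens` is a hypothetical helper.

```python
def keep_content_tokens(rows, stop_set):
    """Drop stopwords and non-alphabetic tokens from (token, score) pairs."""
    kept = []
    for token, score in rows:
        word = token.replace("Ġ", "")  # remove the leading-space marker
        if word.isalpha() and word not in stop_set and token not in stop_set:
            kept.append((word, score))
    return kept

# Hypothetical stopword set containing both plain and Ġ-prefixed forms
toy_stop_set = {"you", "Ġyou", "from"}
toy_rows = [("Ġfamily", 0.063), ("Ġyou", 0.057), ("1", 0.054), ("Ġthreat", 0.062)]
print(keep_content_tokens(toy_rows, toy_stop_set))
```

Keeping both the plain and the `Ġ`-prefixed forms in the stopword set means the filter works whether or not the marker has already been stripped from the token column.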

Let's see the per-token attributions again, this time with stopwords and non-alphabetic tokens filtered out.

In [None]:
df_high[(df_high['Word'].str.isalpha()) & ~(df_high['Word'].isin(stopwords))][:10].reset_index(drop=True)

Unnamed: 0,Word,index,attribution
0,family,430,0.063025
1,threat,1619,0.061966
2,ben,214,0.060865
3,name,331,0.057714
4,discharged,1205,0.056924
5,name,318,0.056194
6,happen,2018,0.05436
7,aggression,1098,0.053639
8,mg,1993,0.05325
9,take,1948,0.051977


In [None]:
df_low[(df_low['Word'].str.isalpha()) & ~(df_low['Word'].isin(stopwords))][:10].reset_index(drop=True)

Unnamed: 0,Word,index,attribution
0,plan,1966,-0.078297
1,plan,1122,-0.069454
2,computer,448,-0.069208
3,day,530,-0.068755
4,cycl,248,-0.06653
5,easier,1953,-0.066436
6,diagnosis,268,-0.064512
7,misogyn,1527,-0.061235
8,diagnosis,286,-0.060768
9,would,646,-0.059901


Here we have the aggregate attributions for the example, with the same filtering applied.

In [None]:
highest_attrib_tokens[(highest_attrib_tokens['tokens'].str.isalpha()) & ~(highest_attrib_tokens['tokens'].isin(stopwords))][:10].reset_index(drop=True)

Unnamed: 0,tokens,attribution
0,mg,0.453486
1,one,0.281195
2,name,0.226609
3,discharge,0.196559
4,significant,0.173714
5,ric,0.172324
6,aggression,0.132329
7,frustration,0.131977
8,discharged,0.127808
9,risk,0.123484


In [None]:
lowest_attrib_tokens[(lowest_attrib_tokens['tokens'].str.isalpha()) & ~(lowest_attrib_tokens['tokens'].isin(stopwords))][:10].reset_index(drop=True)

Unnamed: 0,tokens,attribution
0,ayne,-0.477179
1,ĉ,-0.364867
2,plan,-0.246068
3,would,-0.205324
4,cul,-0.177146
5,context,-0.160468
6,medication,-0.138289
7,daily,-0.130862
8,try,-0.130545
9,things,-0.128669


Using this [notebook](https://colab.research.google.com/drive/1SFXcdDASW09re2L_uADwPKbiXJynqPND?usp=sharing), we can get the files to view the aggregated attributions for the entire dataset for both the positive and negative classes. This means we summed up and averaged the attributions for every instance of any given token throughout the entire dataset (whether or not they have positive or negative attributions).

In [None]:
# df_word = pd.read_csv("/content/drive/MyDrive/cogs402longformer/fakeclinicalnotes/results/notes_attributions/longformer_emb_notes.csv")
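
One plausible reading of the dataset-level aggregation described above is a per-token mean over every occurrence in every example. The sketch below shows that reading with made-up data; it is an assumption about the procedure, not a copy of the linked notebook, and `mean_attribution_per_token` is a hypothetical helper.

```python
from collections import defaultdict

def mean_attribution_per_token(examples):
    """examples: iterable of (tokens, attributions) pairs, one per document."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens, attrs in examples:
        for token, attr in zip(tokens, attrs):
            totals[token] += attr
            counts[token] += 1
    # average over all occurrences across the whole dataset
    return {token: totals[token] / counts[token] for token in totals}

# Hypothetical two-document dataset
toy_examples = [
    (["plan", "risk"], [-0.5, 0.25]),
    (["plan", "mg"], [-0.25, 0.5]),
]
means = mean_attribution_per_token(toy_examples)
print(means)
```

Averaging rather than summing removes the bias towards frequent tokens, at the cost of making rare tokens with a single extreme score look more important than they may be.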

Here we see the highest attributions for the positive class, meaning that these tokens have the most influence when the model predicts positive. All of these words do have relevance to A.I.-related topics.

In [None]:
# df_word[:15]

Here we see the largest attributions for the negative class, meaning that these tokens have the most influence when the model predicts negative.

In [None]:
# df_word[:-15:-1]