This notebook adapts the [Captum tutorial for question answering](https://captum.ai/tutorials/Bert_SQUAD_Interpret) to a Longformer sequence classification task. Specifically, it uses the model's embedding layer to compute token attributions for examples of your choice, or for the entire dataset if needed. This lets us visualize which tokens most influence the model's prediction, and find the k tokens that most influence its predictions, whether those predictions are correct or incorrect.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import dependencies

In [None]:
pip install transformers --quiet


In [None]:
pip install captum --quiet

[?25l[K     |▎                               | 10 kB 38.4 MB/s eta 0:00:01[K     |▌                               | 20 kB 17.8 MB/s eta 0:00:01[K     |▊                               | 30 kB 14.6 MB/s eta 0:00:01[K     |█                               | 40 kB 13.3 MB/s eta 0:00:01[K     |█▏                              | 51 kB 6.2 MB/s eta 0:00:01[K     |█▍                              | 61 kB 7.4 MB/s eta 0:00:01[K     |█▋                              | 71 kB 7.8 MB/s eta 0:00:01[K     |█▉                              | 81 kB 7.6 MB/s eta 0:00:01[K     |██                              | 92 kB 8.5 MB/s eta 0:00:01[K     |██▎                             | 102 kB 6.8 MB/s eta 0:00:01[K     |██▌                             | 112 kB 6.8 MB/s eta 0:00:01[K     |██▊                             | 122 kB 6.8 MB/s eta 0:00:01[K     |███                             | 133 kB 6.8 MB/s eta 0:00:01[K     |███▏                            | 143 kB 6.8 MB/s eta 0:00:01[K 

In [None]:
pip install datasets --quiet


In [None]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

Here we load the model and tokenizer and move the model to the GPU, if one is available. Change the model path and tokenizer to whichever ones you wish to use.

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# replace the checkpoint path below with the path of your saved model

# load the fine-tuned weights (map_location lets a GPU-saved checkpoint load on CPU too)
checkpoint = torch.load("/content/drive/MyDrive/fakeclinicalnotes/models/full_augmented_lr2e-5_dropout3_10_trained_threshold.pt", map_location=device)
model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096', state_dict=checkpoint['state_dict'], num_labels=2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")


Some weights of the model checkpoint at allenai/longformer-base-4096 were not used when initializing LongformerForSequenceClassification: ['longformer_model.encoder.layer.3.output.LayerNorm.bias', 'longformer_model.encoder.layer.8.output.LayerNorm.weight', 'fc.bias', 'longformer_model.encoder.layer.7.attention.output.dense.weight', 'longformer_model.encoder.layer.10.attention.self.value.bias', 'longformer_model.encoder.layer.1.attention.self.value.bias', 'longformer_model.encoder.layer.7.attention.self.key_global.weight', 'longformer_model.encoder.layer.5.attention.self.query.bias', 'longformer_model.encoder.layer.6.attention.self.query.bias', 'longformer_model.encoder.layer.1.attention.self.query_global.bias', 'longformer_model.encoder.layer.9.attention.self.key_global.bias', 'longformer_model.encoder.layer.10.attention.self.query.weight', 'longformer_model.encoder.layer.0.attention.self.query.weight', 'longformer_model.encoder.layer.10.output.dense.bias', 'longformer_model.encoder.la
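The warning above lists checkpoint keys prefixed with `longformer_model.` and `fc.`. These do not match the parameter names of `LongformerForSequenceClassification` (`longformer.*` for the encoder, `classifier.*` for the head), so it appears those fine-tuned weights were not loaded into the model. One way to check is to rename the prefixes and load the state dict with `strict=False`, which reports the keys that still do not match. The sketch below assumes the encoder weights differ only by the `longformer_model.` prefix; the helpers `remap_state_dict_keys` and `load_remapped` and the `ToyModel` stand-in are hypothetical, and the `fc` head likely needs separate handling since it may not match the model's classification head.

```python
import torch
from collections import OrderedDict


def remap_state_dict_keys(state_dict, prefix_map):
    """Return a copy of `state_dict` with key prefixes renamed according to `prefix_map`.

    Keys that match none of the old prefixes are copied unchanged.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


def load_remapped(model, raw_state_dict, prefix_map):
    """Load a remapped state dict non-strictly and report what still does not match."""
    result = model.load_state_dict(remap_state_dict_keys(raw_state_dict, prefix_map), strict=False)
    return result.missing_keys, result.unexpected_keys


# Hypothetical stand-in for the real model: like LongformerForSequenceClassification,
# it has a submodule named `longformer` (here just a tiny linear layer).
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.longformer = torch.nn.Linear(2, 2)


toy_checkpoint = {
    "longformer_model.weight": torch.ones(2, 2),
    "longformer_model.bias": torch.ones(2),
    "fc.bias": torch.zeros(2),  # a head with no counterpart in ToyModel
}
toy_model = ToyModel()
missing, unexpected = load_remapped(toy_model, toy_checkpoint, {"longformer_model.": "longformer."})
print(missing, unexpected)  # [] ['fc.bias']
```

In the notebook, the same call would take `model` and the `'state_dict'` entry of the dict loaded with `torch.load` above; any remaining `missing_keys` are parameters that still have no fine-tuned weights.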


Next, we create helper functions that build the input ids, position ids, and attention mask for the text we want to examine, along with the reference (baseline) inputs for integrated gradients.

In [None]:
ref_token_id = tokenizer.pad_token_id # padding token, used to build the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id # classification token, prepended to the start of the text

In [None]:
max_length = 2046  # max number of text tokens; adding <s> and </s> gives 2048
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    # adapted from the Longformer implementation: count non-padding tokens, offset by the padding index
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    position_ids = position_ids[:, :seq_length]
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
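To see what these helpers produce without loading the tokenizer, the sketch below repeats the same logic on made-up token ids. It assumes the RoBERTa-style special-token ids used by the Longformer tokenizer (`<s>` = 0, `<pad>` = 1, `</s>` = 2); the word ids 100, 200, and 300 are hypothetical.

```python
import torch

CLS_ID, PAD_ID, SEP_ID = 0, 1, 2  # RoBERTa/Longformer special-token ids
text_ids = [100, 200, 300]        # hypothetical word-piece ids

# Input: <s> text </s>; reference: <s> <pad> ... </s> with the same length and special tokens
input_ids = torch.tensor([[CLS_ID] + text_ids + [SEP_ID]])
ref_input_ids = torch.tensor([[CLS_ID] + [PAD_ID] * len(text_ids) + [SEP_ID]])

# Longformer-style position ids: count non-padding tokens, then offset by the padding id,
# so the first real token gets position PAD_ID + 1 and padding tokens get PAD_ID.
mask = input_ids.ne(PAD_ID).int()
position_ids = (torch.cumsum(mask, dim=1).type_as(mask) * mask).long() + PAD_ID

print(input_ids.tolist())      # [[0, 100, 200, 300, 2]]
print(ref_input_ids.tolist())  # [[0, 1, 1, 1, 2]]
print(position_ids.tolist())   # [[2, 3, 4, 5, 6]]
```

Keeping `<s>` and `</s>` in the reference means only the text tokens differ between input and baseline, so the attributions concentrate on the text.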

### Import Dataset

Here we import the dataset used in this notebook (the `train` split of `danielhou13/cogs402datafake`).

In [None]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402datafake")["train"]

Using custom data configuration danielhou13--cogs402datafake-f5349e6cf83e41d8


Downloading and preparing dataset None/None (download: 59.78 KiB, generated: 114.45 KiB, post-processed: Unknown size, total: 174.22 KiB) to /root/.cache/huggingface/datasets/danielhou13___parquet/danielhou13--cogs402datafake-f5349e6cf83e41d8/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/danielhou13___parquet/danielhou13--cogs402datafake-f5349e6cf83e41d8/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.

Alternatively, to use the news dataset (validation split), uncomment the line below.

In [None]:
# cogs402_ds = load_dataset("danielhou13/cogs402dataset2")["validation"]

## Getting the Attributions

Two forward functions follow. `predict` returns the raw logits, and `custom_forward` applies a softmax to them, so the attributions are computed on the class probabilities that the model uses for prediction.

In [None]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# returns class probabilities; the class being explained is chosen with `target` in lig.attribute (1 = positive, 0 = negative)
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)

A helper function that sums the attributions over the embedding dimension, giving one score per token in the sequence, and normalizes the scores to unit L2 norm.

In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions
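`LayerIntegratedGradients` on the embedding layer returns one attribution per embedding dimension, with shape `(1, seq_len, hidden_size)`. The helper collapses this to one score per token and rescales the vector to unit length, so the scores describe relative importance within one example. A quick check, with random numbers standing in for real attributions (the shape `(1, 5, 8)` is hypothetical):

```python
import torch


def summarize_attributions(attributions):
    # (1, seq_len, hidden_size) -> (seq_len,): sum over the embedding dimension
    attributions = attributions.sum(dim=-1).squeeze(0)
    # scale to unit L2 norm so the scores are relative within this example
    return attributions / torch.linalg.norm(attributions)


torch.manual_seed(0)
fake_attributions = torch.randn(1, 5, 8)  # hypothetical: 5 tokens, hidden size 8
token_scores = summarize_attributions(fake_attributions)
print(token_scores.shape)  # torch.Size([5])
```

Normalizing preserves the sign and the ranking of the per-token sums; only their overall scale changes.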

Set up Layer Integrated Gradients on the Longformer's embedding layer, using `custom_forward` as the function to explain.

In [None]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
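For intuition about what `lig.attribute` approximates: integrated gradients averages the gradient of the output along a straight path from the baseline to the input and multiplies it by `(input - baseline)`. Layer IG does the same at the output of a layer (here the embedding layer) instead of at the raw input ids. The sketch below is a plain PyTorch midpoint Riemann-sum version on the toy function `toy_forward(x) = sum(x**2)`, not Captum's implementation (Captum defaults to a Gauss-Legendre rule). It also computes the convergence delta, `sum(attributions) - (f(input) - f(baseline))`, which is the quantity `return_convergence_delta=True` reports; values near zero suggest `n_steps` is large enough.

```python
import torch


def integrated_gradients(f, x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients for a batched scalar function f."""
    # step positions alpha_k = (k + 0.5) / n_steps along the straight-line path
    alphas = (torch.arange(n_steps, dtype=x.dtype) + 0.5) / n_steps
    alphas = alphas.view(-1, *([1] * x.dim()))  # broadcast over x's dimensions
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)
    # gradient of f at every point on the path (f returns one value per path point)
    grads, = torch.autograd.grad(f(path).sum(), path)
    attributions = (x - baseline) * grads.mean(dim=0)
    # completeness check: attributions should sum to f(x) - f(baseline)
    delta = attributions.sum() - (f(x.unsqueeze(0)) - f(baseline.unsqueeze(0))).squeeze()
    return attributions, delta


def toy_forward(batch):
    return (batch ** 2).sum(dim=-1)  # one output per row of the batch


x = torch.tensor([1.0, -2.0, 3.0])
baseline = torch.zeros(3)
attributions, delta = integrated_gradients(toy_forward, x, baseline)
print(attributions)  # approximately [1., 4., 9.]
```

For this quadratic the gradient is linear along the path, so the midpoint rule is exact and the delta is essentially zero; for a real transformer the integrand is not linear, which is why the notebook uses many steps (`n_steps=500`).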

The function below builds the inputs and baselines for one example, runs integrated gradients, and adds the result to our visualization records. It also stores the attributions, tokens, and convergence deltas of each example in dictionaries keyed by the example index, so we can examine the attribution scores further later. More information about the integrated gradients function can be found [here](https://captum.ai/api/layer.html#layer-integrated-gradients).

In [None]:
vis_data_records = []
all_attributions = {}
all_tokens = {}
all_deltas = {}

In [None]:
# Takes in a dataset and the index of the example to explain
def get_token_attributions(dataset, example):
    row = dataset[example]
    text = row['text']
    label = row['labels']

    # get the inputs, position ids, attention mask, and the baselines
    input_ids, ref_input_ids, num_text_tokens = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
    position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
    attention_mask = construct_attention_mask(input_ids)

    # get the tokens
    indices = input_ids[0].detach().tolist()
    all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)
    all_tokens[str(example)] = all_tokens_curr

    # perform integrated gradients with respect to the positive class (target=1)
    attributions, delta = lig.attribute(inputs=input_ids,
                                        baselines=ref_input_ids,
                                        return_convergence_delta=True,
                                        additional_forward_args=(position_ids, attention_mask),
                                        target=1,
                                        n_steps=500,
                                        internal_batch_size=2)

    # We want one value for every token.
    attributions_sum = summarize_attributions(attributions)

    # store the values in our dictionaries
    all_attributions[str(example)] = attributions_sum
    all_deltas[str(example)] = delta

    # get the model's logits for the visualization (no gradients needed)
    with torch.no_grad():
        score = predict(input_ids, position_ids, attention_mask)
    probs = torch.softmax(score, dim=1)

    # VisualizationDataRecord expects: the token attributions, the prediction probability,
    # the predicted class, the true class, the class the attributions are computed for,
    # the total attribution score, the tokens, and the convergence delta.
    vis_data_records.append(viz.VisualizationDataRecord(
                            attributions_sum,
                            probs.max(),
                            torch.argmax(probs),
                            label,
                            str(1),
                            attributions_sum.sum(),
                            all_tokens_curr,
                            delta))

Here we compute the attributions for example 7 of the dataset. Call `get_token_attributions` with other indices to add more examples to the visualization.

In [None]:
get_token_attributions(cogs402_ds, 7)

This function displays our attributions in an easy-to-read format: each token is highlighted according to its attribution. Green represents positive attributions (i.e. the token pushes the model toward predicting the positive class), while red represents negative attributions.

In [None]:
print('\033[1m', 'Visualization For Score', '\033[0m')
_ = viz.visualize_text(vis_data_records)

[1m Visualization For Score [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.51),1.0,-1.12,"#s ĠSTART _ OF _ REC ORD = 0 |||| 0 |||| Ġ ĠBC ĠM ENT AL ĠHE ALTH ĠCENT RE ĠAND ĠSUB ST ANCE ĠAB USE ĉ Pat ient ĠLoc - S vc : Ġ ĠC PE - PM Ġ ĠOUT P AT IENT ĠPS Y CH I AT RY ĠCL IN IC Ġ ĠDIS CHAR GE ĠSUM M ARY Ġ ĠAd mitted : ĉ [ ** 2020 - 12 - 29 ** ] Ġ Ġ( DD / MM / YY YY ) ĠDis charged : ĉ Ġ Ġ( DD / MM / YY YY ) Ġ Ġ Ġ ĠDate Ġof Ġadmission : Ġ Ġ[ ** 2020 - 12 - 29 ** ]. Ġ ĠDate Ġof Ġdischarge : Ġ Ġ[ ** 2020 - 12 - 30 ** ]. Ġ Ġ ĠAdmission Ġmedications : Ġ Ġ1 . ĉ R is per id one Ġ1 Ġmg Ġdaily . Ġ2 . ĉ F lu v ox amine Ġ125 Ġmg Ġdaily . Ġ3 . ĉ D oxy cycl ine Ġ100 Ġmg Ġdaily . Ġ4 . ĉ Fer am ax Ġ300 Ġmg Ġdaily . Ġ5 . ĉ Mel atonin Ġ10 Ġmg Ġnightly . Ġ6 . ĉ As Ġneeded : Ġ ĠBen ad ry l . Ġ ĠDis charge Ġmedications : Ġ ĠNo Ġchange Ġto Ġany Ġmedications Ġexcept : Ġ Ġ1 . ĉ R is per id one Ġwill Ġbe Ġslightly Ġincreased Ġby Ġ0 . 125 Ġmg Ġevery Ġweek Ġfor Ġa Ġtarget Ġdose Ġof Ġ1 Ġmg Ġp . o . Ġb . i . d . Ġ2 . ĉ Benz trop ine Ġhas Ġbeen Ġadded Ġas Ġa Ġp . r . n . Ġmedication Ġin Ġcase Ġof Ġany Ġextrap y ram idal Ġside Ġeffects . Ġ Ġ Ġ3 . ĉ D oxy cycl ine , ĠFer am ax , ĠMel atonin , Ġand Ġflu v ox amine Ġare Ġunchanged . Ġ ĠAD M ISSION ĠD IA GN OS IS ĠAut istic Ġspectrum Ġdisorder , Ġborderline Ġintellectual Ġdisability , Ġaggression Ġin Ġthe Ġcontext Ġof Ġprevious . Ġ ĠDIS CHAR GE ĠD IA GN OS IS ĠAut istic Ġspectrum Ġdisorder , Ġborderline Ġintellectual Ġdisability , Ġaggression Ġin Ġthe Ġcontext Ġof Ġprevious . Ġ ĠCOUR SE ĠIN ĠH OSP ITAL ĠPlease Ġsee Ġthe Ġexcellent Ġconsultation Ġnote Ġby ĠDr . Ġ[ ** First ĠName 4 Ġ( Name Pattern 1 ) Ġ1 ** ] Ġ[ ** Last ĠName Ġ( Name Pattern 1 ) Ġ2 ** ] Ġregarding Ġthe Ġadmission Ġof ĠNo or ayne ĠLad ha . Ġ ĠI Ġhave Ġknown ĠNo or ayne Ġfrom Ġprevious Ġconsultations Ġin Ġthe Ġemergency Ġdepartment , Ġand Ġthis Ġadmission Ġsummary Ġwas Ġvery Ġhelpful Ġin Ġcatching Ġme Ġup Ġto Ġthe Ġthings Ġthat Ġhave Ġrecently Ġhappened . 
Ġ ĠNo or ayne Ġhad Ġa Ġvery Ġshort Ġstay Ġon Ġthe ĠCAP E Ġunit Ġwhich Ġwas Ġun event ful Ġfrom Ġany Ġsafety Ġevents . Ġ Ġ ĠBrief ly , ĠNo or ayne Ġwas Ġadmitted Ġto Ġhospital Ġfor Ġendorsing Ġsuicidal Ġide ation Ġin Ġthe Ġcontext Ġof Ġconflict Ġwith Ġhis Ġfamily , Ġwho Ġhad Ġremoved Ġequipment Ġthat Ġhe Ġusually Ġuses Ġfor Ġentertainment Ġ( Nintendo ĠDS , Ġcomputer ) Ġafter Ġhe Ġbroke Ġa Ġlaptop Ġin Ġfrustration . Ġ ĠDuring Ġthe Ġnext Ġday , Ġhe Ġbecame Ġvery Ġupset Ġwith Ġrespect Ġto Ġhis Ġloss Ġof Ġprivileges , Ġhad Ġto Ġbe Ġremoved Ġfrom Ġschool Ġafter Ġgetting Ġvery Ġupset , Ġand Ġthen Ġwhen Ġhe Ġwas Ġat Ġhome Ġhe Ġindicated Ġthat Ġhe Ġwanted Ġto Ġhurt Ġhimself . Ġ ĠMother Ġdrove Ġhim Ġto Ġhospital . Ġ Ġ ĠNo or ayne Ġspent Ġthe Ġevening Ġsleeping , Ġand Ġwas Ġvery Ġeasy Ġto Ġbe Ġinterviewed Ġon Ġthe Ġnext Ġday Ġof Ġadmission . Ġ Ġ ĠMeeting Ġwith ĠNo or ayne , Ġit Ġwas Ġclear Ġthat Ġhe Ġhad Ġreturned Ġto Ġhis Ġbaseline . Ġ ĠHe Ġhad Ġperceived Ġsome Ġsl ights Ġwith Ġrespect Ġto Ġhis Ġmom Ġtaking Ġaway Ġhis Ġelectronics , Ġbut Ġhe Ġadmitted Ġthat Ġhe Ġtook Ġthings Ġtoo Ġfar Ġand Ġfelt Ġlike Ġhe Ġsaid Ġthings Ġthat Ġhe Ġdid Ġnot Ġmean . Ġ ĠHe Ġaut istically Ġexplained Ġmany Ġproblems Ġthat Ġhe Ġhas Ġwith Ġthe Ġworld , Ġand Ġon Ġthe Ġface Ġof Ġit Ġthese Ġare Ġconcerning Ġthings Ġto Ġbe Ġsaid . Ġ ĠFor Ġexample , Ġhe Ġbelieves Ġthat Ġthe Ġworld Ġwould Ġbe Ġbetter Ġwithout Ġany Ġchildren . Ġ ĠHe Ġbelieves Ġthat Ġthe Ġworld Ġwould Ġbe Ġbetter Ġif Ġsomeone Ġwould Ġkill Ġ[ ** First ĠName 4 Ġ( Name Pattern 1 ) Ġ3 ** ] Ġ[ ** Last ĠName Ġ( Name Pattern 1 ) Ġ4 ** ]. Ġ ĠHe Ġbelieves Ġthat Ġif Ġthere Ġwere Ġno Ġbanks Ġor Ġmoney , Ġthat Ġpeople Ġwould Ġbe Ġhappier . 
Ġ ĠHe Ġalso Ġmakes Ġmisogyn istic Ġand Ġracial Ġcomments , Ġsharing Ġwith Ġme Ġthat Ġhe Ġwished Ġthat Ġthere Ġwere Ġ"" no Ġwomen , Ġand Ġno Ġblack Ġpeople ."" Ġ ĠHe Ġsays Ġthese Ġvery Ġmatter Ġof Ġfact ly , Ġand ĠI Ġdo Ġnot Ġbelieve Ġhis Ġintention Ġis Ġto Ġcreate Ġany Ġoffense , Ġbut Ġit Ġis Ġclear Ġthat Ġby Ġexpressing Ġthese Ġthings Ġhe Ġis Ġgoing Ġto Ġcreate Ġsignificant Ġconcern Ġfor Ġpeople Ġaround Ġhim . Ġ ĠI Ġshared Ġthat Ġwith Ġhim , Ġand Ġhe Ġadmitted Ġthat Ġhe Ġshould Ġnot Ġsay Ġthose Ġthings Ġout Ġloud . Ġ ĠI Ġbelieve Ġthat Ġin Ġhis Ġautistic Ġworld , Ġhe Ġis Ġvery Ġinfluenced Ġby Ġhis Ġonline Ġhang Ġouts . Ġ ĠHe Ġis Ġparticularly Ġinterested Ġin Ġtwo ĠInternet Ġweb Ġsites , ĠReddit , Ġand Ġ4 Ġ[ ** Last ĠName Ġ( un ) Ġ5 ** ], Ġboth Ġplaces Ġwhere Ġif Ġone Ġwants Ġto , Ġthey Ġcan Ġdescend Ġinto Ġa Ġworld Ġof Ġsignificant Ġmisogyny , Ġracism , Ġand Ġhatred . Ġ ĠI Ġbelieve Ġthat ĠNo or ayne Ġis Ġvery Ġinfluenced Ġby Ġthings Ġthat Ġhe Ġreads Ġonline , Ġand Ġis Ġvery Ġpowerfully Ġcaptured Ġby Ġfunny Ġthings Ġsuch Ġas Ġthe Ġmeans Ġor Ġjokes , Ġeven Ġat Ġthe Ġexpense Ġof Ġmisogyny Ġor Ġracism . Ġ ĠI Ġshared Ġwith ĠNo or ayne Ġthat Ġhe Ġneeds Ġto Ġmake Ġsure Ġthat Ġhe Ġis Ġlooking Ġat Ġthings Ġthat Ġare Ġappropriate Ġand Ġremembering Ġthat Ġpeople Ġonline Ġcan Ġbe Ġmanipulative . Ġ ĠNo or ayne Ġsuperf ic ially Ġaccepts Ġthis Ġbut Ġalso Ġbelieves Ġthat Ġhe Ġis Ġpart Ġof Ġthe Ġgroup Ġand Ġbelieves Ġthat Ġhe Ġcould Ġlead Ġ"" a Ġrevolution Ġagainst Ġthe Ġworld ."" Ġ Ġ ĠNo or ayne Ġadmits Ġthat Ġhe Ġshould Ġnot Ġhave Ġbeen Ġaggressive Ġtowards Ġhis Ġmother Ġand Ġshould Ġnot Ġhave Ġthreatened Ġsuicide . Ġ ĠHe Ġis Ġno Ġlonger Ġsuicidal . Ġ ĠHe Ġfeels Ġlike Ġgoing Ġhome Ġis Ġappropriate , Ġand Ġfelt Ġlike Ġhe Ġhad Ġno Ġdifficulties Ġwith Ġbeing Ġdischarged Ġtoday . Ġ ĠHe Ġwanted Ġto Ġbe Ġa Ġpart Ġof Ġany Ġfamily Ġplanning Ġmeetings , Ġand Ġwas Ġreceptive Ġto Ġthe Ġidea Ġof Ġme Ġmeeting Ġwith Ġhis Ġparents Ġfirst . 
Ġ Ġ ĠWhen Ġparents Ġcame Ġin Ġfor Ġa Ġmeeting , Ġthey Ġshared Ġthat Ġhis Ġbehavior Ġhad Ġdeclined Ġsince ĠMarch , Ġwith Ġa Ġreduction Ġof Ġris per id one Ġfrom Ġa Ġhigher Ġdose Ġwhich Ġcaused Ġsignificant Ġo cul ogy ric Ġevents , Ġdown Ġto Ġ1 Ġmg Ġwhich Ġhas Ġnot Ġled Ġto Ġany Ġo cul ogy ric Ġevents Ġbut Ġhas Ġbeen Ġless Ġhelpful Ġwith Ġrespect Ġto Ġcontaining Ġhis Ġaggression . Ġ ĠWe Ġexplored Ġoptions Ġwith Ġrespect Ġto Ġtreating Ġthis Ġand Ġparents Ġselected Ġthe Ġtreatment Ġplan Ġbelow , Ġwhich Ġwas Ġour Ġrecommended Ġtreatment Ġplan . Ġ Ġ ĠParents Ġhad Ġno Ġconcerns Ġin Ġtaking ĠNo or ayne Ġhome Ġand Ġfelt Ġthat Ġhe Ġhad Ġreturned Ġto Ġbaseline . Ġ ĠThey Ġwere Ġappreci ative Ġof Ġhis Ġshort Ġstay Ġon Ġthe Ġunit Ġbut Ġthey Ġdo Ġmiss Ġhim Ġand Ġthe Ġunit Ġwas Ġactive Ġwith Ġrespect Ġto Ġdistress Ġby Ġother Ġpatients , Ġso Ġthey Ġwere Ġworried Ġabout Ġthe Ġinfluence Ġof Ġfear Ġon ĠNo or ayne . Ġ ĠThey Ġelected Ġto Ġtake Ġhim Ġhome Ġon Ġdischarge . Ġ Ġ ĠNo or ayne Ġwas Ġdischarged Ġeasily Ġwith Ġno Ġcomplications Ġon Ġ[ ** 2020 - 12 - 30 ** ], Ġat Ġapproximately Ġ2 Ġp . m . Ġ ĠIM PRESS ION ĠIf Ġfeels Ġlike ĠNo or ayne Ġwas Ġdoing Ġbetter Ġat Ġ2 Ġmg Ġof Ġris per id one Ġbut Ġhe Ġwas Ġhaving Ġevents Ġthat Ġwere Ġquite Ġconvinc ingly Ġo cul ogy ric Ġcrises . Ġ ĠThese Ġevents Ġare Ġdangerous Ġwith Ġrespect Ġto Ġtheir Ġconnection Ġto Ġother Ġcentral Ġdy ston ias , Ġwhich Ġcan Ġinclude Ġl ary ng osp asm . Ġ ĠFor Ġthis Ġreason , Ġit Ġis Ġvery Ġimportant Ġto Ġweigh Ġthe Ġo cul ogy ric Ġcrisis ' Ġsignificant Ġnegative Ġwith Ġrespect Ġto Ġris per id one . Ġ ĠAt Ġthe Ġsame Ġtime , Ġthe Ġris per id one Ġwas Ġrestart ed Ġfrom Ġa Ġswitch Ġfrom Ġa rip ip raz ole Ġat Ġa Ġhigh Ġdose Ġand Ġramp ed Ġup Ġvery Ġquickly , Ġand Ġwith Ġhis Ġprevious Ġgood Ġresponse Ġto Ġris per id one Ġand Ġhis Ġlack Ġof Ġany Ġdy ston ia Ġtoday , Ġwe Ġfelt Ġthat Ġit Ġwas Ġappropriate Ġto Ġtry Ġand Ġincrease Ġhis Ġris per id one Ġgradually . 
Ġ ĠAnother Ġoption Ġthat Ġwas Ġconsidered Ġwas Ġto Ġkeep Ġhis Ġris per id one Ġexactly Ġwhere Ġit Ġwas , Ġbut Ġto Ġadd Ġcl on idine . Ġ ĠIn Ġthe Ġpursuit Ġof Ġnot Ġadding Ġtoo Ġmany Ġmedications Ġtogether , Ġhe Ġis Ġalready Ġon Ġquite Ġa Ġlist , Ġwe Ġelected Ġto Ġdo Ġa Ġcautious Ġre Ġtit ration Ġof Ġris per id one , Ġwatching Ġout Ġfor Ġany Ġo cul ogy ric Ġevents . Ġ Ġ ĠWith Ġrespect Ġto Ġhis Ġbehavior Ġand Ġlanguage , Ġit Ġwill Ġalways Ġbe Ġshocking Ġand Ġin Ġhis Ġautistic Ġworld , Ġhe Ġis Ġnot Ġcausing Ġany Ġoffence Ġby Ġhis Ġstatements . Ġ ĠHe Ġcomes Ġacross Ġas Ġlegitimately Ġa Ġcharming Ġperson , Ġbut Ġwhen Ġyou Ġdelve Ġinto Ġhis Ġthinking Ġit Ġis Ġclear Ġthat Ġhe Ġis Ġvery Ġblack Ġand Ġwhite , Ġand Ġhe Ġhas Ġbeen Ġheavily Ġinfluenced Ġby Ġracial Ġand Ġmisogyn istic Ġposts Ġonline . Ġ ĠThat Ġall Ġbeing Ġsaid , Ġhe Ġtreats Ġpeople Ġwith Ġrespect Ġand Ġsays Ġthe Ġright Ġthing Ġwhen Ġhe Ġknows Ġhe Ġshould . Ġ ĠSome Ġof Ġhis Ġmore Ġoutlandish Ġstatements Ġare Ġvery Ġdifficult Ġto Ġdigest , Ġbut ĠI Ġbelieve Ġthat Ġthere Ġis Ġno Ġcurrent Ġevidence Ġof Ġany Ġsignificant Ġviolence Ġtowards Ġothers Ġor Ġhis Ġthreats Ġthat Ġhe Ġhas Ġmentioned Ġto Ġother Ġpeople Ġdo Ġnot Ġseem Ġto Ġhave Ġsignificant Ġweight Ġbehind Ġthem . Ġ ĠI Ġknow Ġthat Ġhe Ġhas Ġlost Ġschool Ġstanding Ġand Ġhad Ġto Ġswitch Ġschools Ġbecause Ġof Ġa Ġthreat Ġtowards Ġprincipal , Ġand Ġhe Ġsays Ġsome Ġthings Ġsuch Ġas Ġ"" w anting Ġto Ġkill Ġall Ġ Ġchildren Ġwhich Ġare Ġobviously Ġunsett led . Ġ ĠAt Ġthe Ġsame Ġtime , Ġhis Ġautistic Ġspectrum Ġdisorder Ġis Ġnot Ġtreat able , Ġand Ġhe Ġis Ġwell Ġcontained Ġand Ġresponds Ġvery Ġwell Ġto Ġa Ġbehavioral Ġapproach . Ġ ĠHe Ġholds Ġhimself Ġto Ġa Ġvery Ġhigh Ġstandard Ġand ĠI Ġbelieve Ġhis Ġgreatest Ġrisks Ġare Ġwhen Ġhe Ġis Ġfrustrated Ġdoing Ġself Ġinjury Ġor Ġattempting Ġto Ġel ope . Ġ ĠI Ġdo Ġnot Ġbelieve Ġthat Ġhe Ġis Ġat Ġrisk Ġfor Ġhom icidal Ġacting Ġout , Ġeither Ġinjuring Ġothers Ġor Ġtrying Ġto Ġkill Ġothers . 
Ġ Ġ ĠWhen Ġit Ġcomes Ġto Ġhis Ġsuicide Ġrisk , Ġhis Ġautism Ġand Ġborderline ĠIQ Ġare Ġprotective Ġfactors . Ġ ĠThat Ġbeing Ġsaid , Ġin Ġfrustration ĠI Ġcould Ġsee Ġhim Ġhurting Ġhimself . Ġ ĠFor Ġthis Ġreason , Ġfrustration Ġtolerance Ġis Ġone Ġof Ġour Ġbiggest Ġgoals , Ġwhich ĠI Ġhope Ġthat Ġthe Ġmedication Ġchanges Ġwill Ġaccommodate . Ġ ĠAs Ġwell , ĠI Ġhave Ġencouraged Ġparents Ġnot Ġto Ġtry Ġtoo Ġmany Ġbehavioral Ġthings Ġright Ġnow Ġwhile Ġhe Ġis Ġclearly Ġunst ead y . Ġ Ġ ĠFor Ġall Ġthese Ġreasons , ĠI Ġfelt Ġthat Ġdischarge Ġfrom Ġhospital Ġwas Ġappropriate , Ġthere Ġis Ġa Ġchronic Ġrisk Ġof Ġhurting Ġhimself , Ġand Ġhe Ġhas Ġpreviously Ġmade Ġthreats Ġagainst Ġothers , Ġhowever ĠI Ġdo Ġnot Ġbelieve Ġthat Ġthese Ġthreats Ġor Ġviolent Ġparameters Ġwill Ġchange Ġwith Ġany Ġin patient Ġtreatment , Ġand Ġa Ġgradual Ġtit ration Ġof Ġris per id one Ġis Ġmost Ġappropriately Ġdone Ġas Ġan Ġoutpatient . Ġ ĠWatching ĠNo or ayne Ġreact Ġso Ġnegatively Ġto Ġemotion Ġof Ġthe Ġunit Ġ( another Ġpatient Ġbecame Ġquite Ġdistressed ) Ġwas Ġalso Ġquite Ġconvincing Ġthat Ġthe Ġhospital ization Ġwas Ġrelatively Ġtraumatic Ġfor ĠNo or ayne . Ġ ĠWith Ġparents Ġbeing Ġon Ġboard Ġwith Ġthe Ġsafety Ġplan , Ġdemonstrating Ġexcellent Ġjudgment Ġwith Ġrespect Ġto Ġmanaging ĠNo or ayne , Ġand ĠNo or ayne 's Ġwillingness Ġto Ġtry Ġto Ġtake Ġthings Ġa Ġlittle Ġbit Ġeasier Ġand Ġtry Ġa Ġnew Ġmedication , Ġdischarge Ġwas Ġappropriate . Ġ ĠTRE AT MENT ĠPLAN Ġ1 . ĉ Increase Ġgradually Ġris per id one Ġby Ġ0 . 125 Ġmg Ġevery Ġweek Ġto Ġa Ġtarget Ġdose Ġof #/s"


## Further Examination of the Attributions

Next, we may want to look in depth at the attribution score of each token in an example. Since we saved the attributions of the examples above, we can retrieve them directly. We also retrieve the tokens, because we need to know which token each attribution belongs to.

Both have length `seq_len`.

In [None]:
example = 7
attributions_sum = all_attributions[f"{example}"]
all_tokens2 = all_tokens[f"{example}"]

These functions return the tokens with the strongest attributions: the most positive (top-k) and the most negative (bottom-k). Set `k` to the number of tokens you wish to see. Each function takes the attributions and tokens retrieved in the previous cell and returns three values: the top-k (or bottom-k) tokens, their attribution values, and their positions in the sequence.

Note: the attributions are computed with respect to the positive class, so the tokens that most helped the model predict the negative class are among the bottom-k attributed tokens.

In [None]:
def get_topk_attributed_tokens(attrs, all_tokens, k=20):
    values, indices = torch.topk(attrs, k)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices

In [None]:
def get_botk_attributed_tokens(attrs, all_tokens, k=20):
    values, indices = torch.topk(attrs, k, largest=False)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices

Next, we put the tokens, their positions, and their attribution values into pandas DataFrames for display. The top-k table is sorted from the highest attribution to the lowest, and the bottom-k table from the lowest to the highest.

In [None]:
top_words_start, top_words_val_start, top_word_ind_start = get_topk_attributed_tokens(attributions_sum.cpu(), all_tokens2)
bot_words_start, bot_words_val_start, bot_word_ind_start = get_botk_attributed_tokens(attributions_sum.cpu(), all_tokens2)

df_high = pd.DataFrame({'Word': top_words_start, 'index':top_word_ind_start, 'attribution': top_words_val_start})

df_low = pd.DataFrame({'Word': bot_words_start, 'index':bot_word_ind_start, 'attribution': bot_words_val_start})

Here we display our top k positively and negatively attributed tokens for our example.

In [None]:
df_high

Unnamed: 0,Word,index,attribution
0,Ġaut,649,0.073457
1,F,150,0.069968
2,Ġprivileges,530,0.06658
3,Ġautistic,1709,0.066424
4,Ġevidence,1633,0.066358
5,Ġfactors,1802,0.065642
6,ĠAt,1703,0.063147
7,Ġfrom,536,0.060302
8,:,205,0.060179
9,Ġpursuit,1475,0.060011


In [None]:
df_low

Unnamed: 0,Word,index,attribution
0,Ġthat,1643,-0.07848
1,Ġthat,601,-0.071664
2,Ġthat,1765,-0.070016
3,Ġthat,1194,-0.069526
4,Ġthat,1016,-0.069123
5,Ġthat,960,-0.067535
6,Ġthat,1876,-0.066175
7,Ġthat,966,-0.065946
8,Ġand,297,-0.064247
9,Ġgetting,539,-0.063121


In [None]:
d = {"tokens":all_tokens2, "attribution":attributions_sum[:len(all_tokens2)].cpu()}

Many tokens repeat within an example at different positions. Position may matter for the attributions, but if we care only about the tokens themselves, we can add up the attributions of all occurrences of each token to get one aggregate attribution per token type.

In [None]:
df_attrib = pd.DataFrame(d)
aggregation_functions = {'attribution': 'sum'}
df_new = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_functions)
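A toy version of this aggregation (tokens and scores below are made up) also shows a side effect worth keeping in mind: summing rewards frequent tokens, so a common word such as `Ġthat` can top the list mainly because it occurs many times. Reporting the count and the mean next to the sum helps separate frequency from per-occurrence influence. The names `toy_attrib` and `toy_summary` are hypothetical and chosen so they do not overwrite `df_attrib` or `df_new`.

```python
import pandas as pd

# Hypothetical per-position tokens and attributions for one example
toy_attrib = pd.DataFrame({
    "tokens": ["Ġthat", "Ġworld", "Ġthat", "Ġthat", "Ġworld", "Ġris"],
    "attribution": [-0.05, 0.10, -0.04, -0.06, 0.02, 0.08],
})

# One row per token type: total, mean, and number of occurrences, most negative sum first
toy_summary = (toy_attrib.groupby("tokens")["attribution"]
               .agg(["sum", "mean", "count"])
               .sort_values("sum"))
print(toy_summary)
```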

In [None]:
highest_attrib_tokens = df_new.sort_values(by=['attribution'], ascending=False).reset_index()
highest_attrib_tokens[:10]

Unnamed: 0,tokens,attribution
0,.,1.070852
1,Ġwas,0.628402
2,Ġthe,0.523456
3,Ġto,0.500308
4,Ġof,0.389774
5,Ġbe,0.348745
6,Ġa,0.338468
7,Ġfrom,0.337042
8,per,0.315743
9,:,0.305437


In [None]:
lowest_attrib_tokens = df_new.sort_values(by=['attribution']).reset_index()
lowest_attrib_tokens[:10]

Unnamed: 0,tokens,attribution
0,Ġthat,-2.287376
1,Ġand,-1.698054
2,Ġ,-1.03444
3,ĉ,-0.385827
4,Ġwith,-0.353948
5,Ġbut,-0.312914
6,ĠI,-0.297695
7,Ġby,-0.296279
8,Ġit,-0.281805
9,Ġany,-0.259322


## Masking the stopwords and non-alpha tokens

Many of our top attributed tokens are stopwords or punctuation. To surface more interesting keywords, we filter out stopwords (from NLTK, tokenized the same way as the model's input so they carry the same `Ġ` prefix) and non-alphabetic tokens.

In [None]:
import nltk
from transformers import AutoTokenizer
nltk.download('stopwords')
tokenizer2 = AutoTokenizer.from_pretrained('allenai/longformer-base-4096', add_prefix_space=True)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.



In [None]:
from nltk.corpus import stopwords
all_stopwords = stopwords.words('english')
all_stopwords.append(" ")
stopwords = set(tokenizer2.tokenize(all_stopwords, is_split_into_words =True))
stopwords.update(all_stopwords)
print(stopwords)

{'ve', 'Ġany', 'should', 'Ġwhich', 'Ġd', 'on', "haven't", 'isn', 'Ġonce', 'Ġonly', 'Ġmyself', 'o', 'Ġsome', 'themselves', 'nor', 'Ġthere', 'Ġbut', 'yourself', 'y', 'most', 'such', 'those', "mightn't", 'Ġours', 'aren', 'hasn', 'Ġbetween', 'Ġown', 'Ġhim', 'Ġnow', 'why', 'Ġthrough', 'both', 'ourselves', 'had', 'mustn', 'against', 'Ġhers', 'is', "that'll", 'once', 'Ġout', 'Ġmost', 'Ġthe', 'by', 'having', 'Ġhe', 'Ġaren', 'Ġme', 'Ġwho', 'Ġboth', 'Ġneed', "you've", "aren't", 't', 'shouldn', 'Ġbefore', 'been', 'Ġagainst', 'these', 'they', 'Ġto', 'Ġall', 'Ġunder', 'Ġbecause', 'ain', 'Ġfor', 'from', 'Ġhow', 'mightn', 'shan', "don't", 'Ġthese', 'Ġwere', 'Ġdoing', 'Ġup', 'Ġnor', 'this', 'after', 'below', 'Ġa', 'have', 'Ġbe', 'above', 'n', 'Ġhaven', 'Ġt', 'Ġdoes', 'just', 'he', 'too', "didn't", 'your', 'll', 'Ġwon', 'theirs', 'between', 'you', 'Ġif', 'through', 'couldn', 'will', 'any', "'s", 'Ġwe', "wouldn't", 'Ġi', 'Ġafter', 'Ġwhom', 'Ġdown', 'Ġover', "doesn't", 'Ġthey', 'over', 'under', 'Ġtheir',

In [None]:
highest_attrib_tokens[highest_attrib_tokens['tokens'].str.isalpha()
                      & ~highest_attrib_tokens['tokens'].isin(stopwords)][:10].reset_index(drop=True)

Unnamed: 0,tokens,attribution
0,per,0.315743
1,Ġris,0.265303
2,ĠHe,0.255713
3,Ġthings,0.233263
4,Ġworld,0.176242
5,Ġautistic,0.173848
6,ogy,0.170765
7,cul,0.143677
8,Ġprevious,0.137055
9,Ġbelieves,0.116889


In [None]:
lowest_attrib_tokens[lowest_attrib_tokens['tokens'].str.isalpha()
                     & ~lowest_attrib_tokens['tokens'].isin(stopwords)][:10].reset_index(drop=True)

Unnamed: 0,tokens,attribution
0,ĉ,-0.385827
1,ĠI,-0.297695
2,ĠNo,-0.228441
3,Ġmg,-0.153738
4,Ġsaid,-0.152039
5,Ġevents,-0.145942
6,Ġappropriate,-0.105687
7,Ġmany,-0.105013
8,YY,-0.10226
9,Ġrespect,-0.101182
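The `ĉ` row survives the `str.isalpha()` filter because of how the byte-level BPE vocabulary is written: non-printable bytes, including tab and space, are remapped to other Unicode characters, so a tab becomes `ĉ` and a leading space becomes `Ġ`, and Python treats both as letters. A stricter filter, sketched below on made-up tokens, keeps only an optional `Ġ` followed by ASCII letters; the regular expression is an assumption about which tokens should count as words.

```python
import pandas as pd

toy_tokens = pd.DataFrame({
    "tokens": ["ĉ", "Ġ", "ĠI", "Ġmg", "YY", ".", "Ġ2020", "per"],
    "attribution": [-0.39, -1.03, -0.30, -0.15, -0.10, 0.50, 0.01, 0.32],
})

# str.isalpha() accepts the byte-level markers themselves
print(toy_tokens["tokens"].str.isalpha().tolist())

# Optional leading "Ġ" (space marker), then one or more ASCII letters
is_word = toy_tokens["tokens"].str.fullmatch(r"Ġ?[A-Za-z]+")
print(toy_tokens[is_word]["tokens"].tolist())  # ['ĠI', 'Ġmg', 'YY', 'per']
```

Note that sub-word pieces such as `per` or `YY` still pass; filtering on whole words would require merging sub-word tokens first.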


Using this [notebook](https://colab.research.google.com/drive/1lktilbL1IY4nBanlzCdP8TLsBNfUsl_U?usp=sharing), we can generate files containing the aggregated attributions over the entire dataset for both the positive and negative classes. For every token, we summed and then averaged the attributions of all its occurrences throughout the dataset, regardless of whether individual attributions were positive or negative.

In [None]:
df_word = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/longformer_emb_papers.csv")
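The linked notebook is not reproduced here, but the aggregation it describes can be sketched with pandas: stack the per-token attributions of every example into one frame, then group by token and average over all of its occurrences (the mean is the sum divided by the number of occurrences). The per-example data below is hypothetical, standing in for `(tokens, attributions)` pairs like those stored in `all_tokens` and `all_attributions`; the actual CSV files may have been produced differently.

```python
import pandas as pd

# Hypothetical per-example results: token strings and one attribution per token
per_example = {
    "0": (["Ġneural", "Ġnetworks", "."], [0.20, 0.05, 0.10]),
    "1": (["Ġneural", "Ġcode", "."], [0.10, -0.15, 0.02]),
}

# One row per token occurrence across the whole dataset
frames = [pd.DataFrame({"tokens": toks, "attribution": attrs})
          for toks, attrs in per_example.values()]
all_occurrences = pd.concat(frames, ignore_index=True)

# Average attribution of each token type over all its occurrences,
# sorted from most positive to most negative
dataset_attrib = (all_occurrences.groupby("tokens")["attribution"]
                  .mean()
                  .sort_values(ascending=False)
                  .reset_index())
print(dataset_attrib)
```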

Here we see the highest attributions, meaning that these tokens most push the model toward predicting the positive class. Apart from punctuation and common words such as "the", these tokens are relevant to A.I. topics.

In [None]:
df_word[:15]

Unnamed: 0,tokens,attribution
0,Ġlearning,0.163092
1,.,0.145281
2,Ġneural,0.110611
3,Ġdata,0.097347
4,",",0.077573
5,Ġthe,0.072926
6,Ġtraining,0.052609
7,Ġdataset,0.050907
8,Ġalgorithms,0.048352
9,ĠAI,0.045684


Here we see the most negative attributions, meaning that these tokens most push the model toward predicting the negative class.

In [None]:
df_word[::-1][:15]  # last 15 rows, most negative first

Unnamed: 0,tokens,attribution
30061,Ġprogramming,-0.121651
30060,Ġprogram,-0.085085
30059,Ġprograms,-0.078384
30058,Ġlanguages,-0.070023
30057,Ġlanguage,-0.054024
30056,Ġ.,-0.053213
30055,Ġcode,-0.049736
30054,Ġsoftware,-0.037241
30053,Ġcompiler,-0.030792
30052,ĠProgramming,-0.029799
