This notebook adapts the [Captum tutorial for question answering](https://captum.ai/tutorials/Bert_SQUAD_Interpret) to sequence classification with a Longformer model. Specifically, it uses Layer Integrated Gradients over the model's embedding layer to compute token attributions for the examples of your choice, or for the entire dataset if needed. This lets us visualize which tokens have the most influence on the model's prediction, and find the k tokens that contribute most to correct as well as incorrect predictions.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import dependencies

In [None]:
pip install transformers --quiet

In [None]:
pip install captum --quiet

In [None]:
pip install datasets --quiet

In [None]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [None]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

Here we import the model and tokenizer and move the model to the GPU if one is available. Change the checkpoint path and the tokenizer to whichever ones you wish to use.

In [None]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# Hugging Face Hub model names; note that model_path is not used in the cells below
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load the fine-tuned weights from a saved checkpoint (replace the path with your own)
checkpoint = torch.load("/content/drive/MyDrive/fakeclinicalnotes/models/full_augmented_lr2e-5_dropout3_10_trained_threshold.pt", map_location=device)
model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096', state_dict=checkpoint['state_dict'], num_labels=2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

Some weights of the model checkpoint at allenai/longformer-base-4096 were not used when initializing LongformerForSequenceClassification: ['longformer_model.encoder.layer.3.attention.self.value_global.bias', 'longformer_model.encoder.layer.9.attention.output.LayerNorm.bias', 'longformer_model.encoder.layer.2.attention.output.dense.weight', 'longformer_model.encoder.layer.8.attention.output.LayerNorm.weight', 'longformer_model.encoder.layer.4.attention.output.dense.bias', 'longformer_model.encoder.layer.3.attention.self.query_global.weight', 'longformer_model.encoder.layer.9.output.dense.bias', 'longformer_model.encoder.layer.0.intermediate.dense.weight', 'longformer_model.encoder.layer.4.attention.output.dense.weight', 'longformer_model.encoder.layer.9.output.LayerNorm.weight', 'longformer_model.encoder.layer.1.attention.self.key_global.weight', 'longformer_model.encoder.layer.7.attention.self.key.weight', 'longformer_model.encoder.layer.0.intermediate.dense.bias', 'longformer_model.em

Create functions that give us the input ids and the position ids for the text we want to examine along with the baselines for integrated gradients.

In [None]:
ref_token_id = tokenizer.pad_token_id # token used to build the baseline (reference) input
sep_token_id = tokenizer.sep_token_id # separator token appended to the end of the text
cls_token_id = tokenizer.cls_token_id # token prepended to the start of the text

In [None]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)

    #taken from the longformer implementation
    mask = input_ids.ne(ref_token_id).int()
    incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
    position_ids = incremental_indices.long().squeeze() + ref_token_id

    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    position_ids = position_ids[:, :seq_length]
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
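
To make the baseline and position-id logic concrete, here is a minimal plain-Python sketch of the same rules on a toy sequence. The token ids and the helper name `position_ids_from_input_ids` are made up for illustration; only the rule itself (non-padding tokens are numbered consecutively starting at `padding_idx + 1`, padding tokens keep `padding_idx`) mirrors the cell above.

```python
def position_ids_from_input_ids(input_ids, padding_idx):
    """Number non-padding tokens consecutively, starting at padding_idx + 1."""
    position_ids = []
    running = 0
    for token_id in input_ids:
        is_real = int(token_id != padding_idx)
        running += is_real  # cumulative sum of the non-padding mask
        position_ids.append(running * is_real + padding_idx)
    return position_ids


# Hypothetical special-token ids, assumed only for this example.
CLS, PAD, SEP = 0, 1, 2
text_ids = [713, 16, 10]  # made-up ids standing in for the tokenized text

# The input keeps the text; the baseline replaces every text token with PAD.
input_ids = [CLS] + text_ids + [SEP]
ref_input_ids = [CLS] + [PAD] * len(text_ids) + [SEP]

print(input_ids)      # [0, 713, 16, 10, 2]
print(ref_input_ids)  # [0, 1, 1, 1, 2]
print(position_ids_from_input_ids(input_ids, PAD))                # [2, 3, 4, 5, 6]
print(position_ids_from_input_ids([CLS, 713, SEP, PAD, PAD], PAD))  # [2, 3, 4, 1, 1]
```

Because the baseline shares the `<s>` and `</s>` tokens with the input, only the text tokens differ between the two sequences.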

### Import Dataset

Here we import the dataset from the Hugging Face Hub and select its `train` split.

In [None]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402datafake")["train"]

Here we import the news dataset

In [None]:
# cogs402_ds = load_dataset("danielhou13/cogs402dataset2")["validation"]

## Getting the Attributions

`predict` runs the model and returns the raw logits. `custom_forward` wraps it and applies a softmax, so that the attributions are computed with respect to the class probabilities the model uses for prediction.

In [None]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [None]:
# forward function passed to Captum; the class to explain is chosen later through
# `target` (1 for the positive class, 0 for the negative class)
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)
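
As a small illustration of what the softmax step does, the plain-Python sketch below turns a pair of made-up logits into probabilities that sum to 1. The logit values are assumptions chosen for the example, not outputs of the model.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]  # shift for numerical stability
    total = sum(exps)
    return [value / total for value in exps]


logits = [0.3, -0.1]  # made-up logits for [negative class, positive class]
probs = softmax(logits)
print(probs)
print(sum(probs))
```

With two classes, index 1 of the result is the positive-class probability, which is the quantity selected by `target=1` in the attribution call.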

A helper function to summarize attributions for each word token in the sequence.

In [None]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions
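
The sketch below repeats the same two steps in plain Python on a tiny made-up attribution matrix (3 tokens, 3 embedding dimensions): sum over the embedding dimension, then divide by the L2 norm. The function name `summarize` and the numbers are illustrative only.

```python
import math


def summarize(attributions):
    """attributions: list of per-token rows with shape (seq_len, emb_dim)."""
    per_token = [sum(row) for row in attributions]        # one value per token
    norm = math.sqrt(sum(value * value for value in per_token))
    return [value / norm for value in per_token]          # unit L2 norm


toy = [
    [0.5, 0.5, 1.0],   # token 0 sums to  2.0
    [-1.0, 0.0, 0.0],  # token 1 sums to -1.0
    [0.0, 0.0, 0.0],   # token 2 sums to  0.0
]
scores = summarize(toy)
print(scores)
```

Since the normalization fixes the L2 norm rather than the sum, each score lies in [-1, 1], but the total over a long sequence can be well above 1.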

Perform Layer Integrated Gradients using the longformer's embeddings.

In [None]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
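
For intuition about what integrated gradients computes, here is a minimal sketch on a toy two-variable function with a hand-written gradient. It uses a midpoint Riemann sum, which is an assumption made for simplicity and may differ from the approximation scheme Captum uses internally; the function `f`, its gradient, and the inputs are all invented for the example.

```python
def integrated_gradients(f, grad_f, x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients for a vector input."""
    grad_sum = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        # point on the straight line from the baseline to the input
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad_sum = [total + g for total, g in zip(grad_sum, grad_f(point))]
    attributions = [
        (xi - b) * total / n_steps for xi, b, total in zip(x, baseline, grad_sum)
    ]
    # gap between the summed attributions and the change in the output
    delta = sum(attributions) - (f(x) - f(baseline))
    return attributions, delta


def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


def grad_f(x):
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


attrs, delta = integrated_gradients(f, grad_f, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(attrs, delta)
```

The attributions add up to the difference between the output at the input and at the baseline, and the returned gap is the kind of quantity a convergence delta reports: the closer it is to zero, the better the approximation.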

This function builds the input and the baseline for an example, runs Layer Integrated Gradients, and adds the result to our visualization records. It also stores the tokens, attributions, and convergence delta of each example in dictionaries keyed by example index, so we can use them when we want to examine the attribution scores further. More information about the integrated gradients function can be found [here](https://captum.ai/api/layer.html#layer-integrated-gradients).

In [None]:
vis_data_records = []
all_attributions = {}
all_tokens = {}
all_deltas = {}

In [None]:
# Takes in dataset and example number
def get_token_attributions(dataset, example):
  text = dataset['text'][example]
  label = dataset['labels'][example]

  # get the inputs, position ids, attention mask, and the baselines
  input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
  position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
  attention_mask = construct_attention_mask(input_ids)

  #get the tokens
  indices = input_ids[0].detach().tolist()
  all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)
  all_tokens[str(example)] = all_tokens_curr

  #perform integrated gradients
  attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    return_convergence_delta=True,
                                    additional_forward_args=(position_ids, attention_mask),
                                    target=1,
                                    n_steps=500,
                                    internal_batch_size = 2)

  # We want one value for every token.
  attributions_sum = summarize_attributions(attributions)

  # store the values in our dictionaries
  all_attributions[str(example)] = attributions_sum
  all_deltas[str(example)] = delta

  # get the score for our visualization
  score = predict(input_ids, position_ids, attention_mask)

  # storing couple samples in an array for visualization purposes
  # requires array of attributions, prediction score, predicted class, true class 
  # the label you want your attributions to associate positive with, the attribution score
  # the tokens, and the delta if you have it.
  vis_data_records.append(viz.VisualizationDataRecord(
                        attributions_sum,
                        torch.softmax(score, dim = 1).max(),
                        torch.argmax(torch.softmax(score, dim = 1)),
                        label,
                        str(1),
                        attributions_sum.sum(),       
                        all_tokens_curr,
                        delta)
  )

Here we compute the attributions for an example from the dataset loaded above.

In [None]:
get_token_attributions(cogs402_ds, 3)
# get_token_attributions(cogs402_ds, 891)
# get_token_attributions(cogs402_ds, 589)
# get_token_attributions(cogs402_ds, 605)
# get_token_attributions(cogs402_ds, 148)

Further examples can be run the same way by uncommenting the lines below.

In [None]:
# get_token_attributions(cogs402_ds, 102)
# get_token_attributions(cogs402_ds, 1168)
# # get_token_attributions(cogs402_ds, 2307)
# # get_token_attributions(cogs402_ds, 2359)

This function displays our attributions in a manner that is easy to read. Each token is highlighted according to its attribution. Green represents positive attributions (i.e. the token pushes the model towards predicting the positive class), while red represents negative attributions.

In [None]:
# # storing couple samples in an array for visualization purposes
# score_vis = viz.VisualizationDataRecord(
#                         attributions_sum,
#                         torch.softmax(score, dim = 1).max(),
#                         torch.argmax(torch.softmax(score, dim = 1)),
#                         label,
#                         str(1),
#                         attributions_sum.sum(),       
#                         all_tokens,
#                         delta)

print('\033[1m', 'Visualization For Score', '\033[0m')
_ = viz.visualize_text(vis_data_records)

Visualization For Score


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,0 (0.52),1.0,15.34,"#s ĠSTART _ OF _ REC ORD = 0 |||| 0 |||| Ġ Ġï » ¿ BC ĠM ENT AL ĠHE ALTH ĠCENT RE ĠAND ĠSUB ST ANCE ĠAB USE ĉ Pat ient ĠLoc - S vc : Ġ ĠC PE - MH Ġ ĠDIS CHAR GE ĠSUM M ARY Ġ ĠAd mitted : ĉ 15 / 11 / 2017 Ġ Ġ( DD / MM / YY YY ) ĠDis charged : ĉ 20 / 11 / 2017 Ġ Ġ( DD / MM / YY YY ) Ġ Ġ Ġ ĠAD M ISSION ĠD IA GN OS IS ĠNot Ġgiven . Ġ ĠDIS CHAR GE ĠD IA GN OS ES Ġ1 . ĉ Border line Ġpersonality Ġorganization . Ġ2 . ĉ Pers istent Ġdepressive Ġdisorder / dy sth ym ia . Ġ3 . ĉ Social Ġanxiety . Ġ ĠMED IC ATIONS ĠON ĠDIS CHAR GE Ġ1 . ĉ S ert ral ine Ġ150 Ġmg Ġp . o . Ġnightly . Ġ2 . ĉ L or az ep am Ġsub ling ual Ġ0 . 5 Ġmg . Ġ ĠMED IC ATIONS ĠAT ĠDIS CHAR GE Ġ1 . ĉ S ert ral ine Ġ175 Ġmg Ġp . o . Ġnightly Ġfor Ġanother Ġ3 Ġdays Ġthen Ġincrease Ġto Ġ200 Ġmg Ġp . o . Ġnightly . Ġ2 . ĉ L or az ep am Ġ0 . 5 Ġmg Ġas Ġa Ġp . r . n . Ġfor Ġanxiety . Ġ3 . ĉ G rav ol Ġ25 Ġmg Ġq . 6 Ġh . Ġp . r . n . Ġinternal Ġagitation . Ġ ĠTRE AT MENT ĠPLAN ĠAFTER ĠDIS CHAR GE Ġ1 . ĉ Return Ġto Ġthe ĠTR ACC Ġprogram Ġto Ġbe Ġfollowed Ġby Ġ[ ** First ĠName 5 Ġ( Name Pattern 1 ) Ġ1 ** ] Ġ[ ** Last ĠName Ġ( Name Pattern 1 ) Ġ2 ** ]- [ ** Last ĠName Ġ( un ) Ġ3 ** ] Ġfor Ġdialect ical Ġbehavior Ġtherapy . Ġ2 . ĉ Continue Ġwith ĠDr . Ġ[ ** Last ĠName Ġ( ST itle ) Ġ4 ** ] Ġ[ ** Name Ġ( ST itle ) Ġ5 ** ] Ġfor Ġmedication Ġmanagement Ġuntil Ġthe Ġ[ ** Location Ġ( un ) Ġ6 ** ] Ġpsychiatrist Ġis Ġavailable Ġto Ġtake Ġover Ġcare . Ġ ĠID ENT IFIC ATION ĠAND ĠCOUR SE ĠON ĠTHE ĠUN IT ĠPlease Ġsee Ġother Ġrecords Ġfor Ġmore Ġdetails . Ġ ĠIn Ġbrief , Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġis Ġa Ġ14 - year - old Ġyoung Ġwoman Ġwho Ġhas Ġhad Ġmany Ġyears Ġof Ġdepressive Ġsymptoms . 
Ġ ĠMany Ġyears Ġof Ġsuicidal Ġthoughts Ġand Ġdescribed Ġmaking Ġa Ġcommitment Ġto Ġherself Ġto Ġkill Ġherself Ġby Ġthe Ġage Ġshe Ġwas Ġ11 Ġand Ġthe Ġcommitment Ġto Ġkill Ġherself Ġby Ġthat Ġtime Ġshe Ġwas Ġage Ġ13 , Ġand Ġthat Ġshe Ġreports Ġan Ġincrease Ġin Ġsu ic id ality Ġin Ġthe Ġ2 Ġweeks Ġprior Ġto Ġadmission Ġwithout Ġany Ġclear Ġstress ors Ġor Ġother Ġfactors Ġadding Ġinto Ġthis . Ġ Ġ ĠShe Ġwas Ġseen Ġas Ġhaving Ġa Ġchange Ġin Ġclinical Ġstatus Ġleading Ġto Ġan Ġadmission Ġto Ġclarify Ġher Ġcurrent Ġissues , Ġto Ġensure Ġthat Ġshe Ġ Ġhas Ġappropriate Ġfollow up , Ġpossibly Ġadjust Ġmedications . Ġ Ġ ĠWhile Ġin Ġhospital , Ġshe Ġpresented Ġas Ġbright , Ġche ery Ġwith Ġher Ġco - pat ients Ġand Ġin Ġfact Ġit Ġappears Ġthey Ġhave Ġformed Ġa Ġcad re Ġof Ġlike - minded Ġindividuals Ġwho Ġhad Ġexchanged Ġphone Ġnumbers Ġand Ġcontact Ġinformation , Ġand Ġover Ġthe Ġweekend Ġhad Ġall Ġsecretly Ġplanned Ġto Ġsabotage Ġeither Ġpasses Ġin Ġorder Ġto Ġget Ġback Ġon Ġto Ġthe Ġunit Ġto Ġcontinue Ġsocial izing . Ġ ĠWe Ġcertainly Ġworry Ġabout Ġregression Ġor Ġdeveloping Ġnew Ġsymptoms Ġor Ġcontag ion Ġand Ġparents Ġwere Ġspecific Ġthat Ġshe Ġwas Ġeating Ġmuch Ġless Ġwhen Ġshe Ġwould Ġbe Ġout Ġon Ġpass Ġthat Ġthey Ġwonder Ġif Ġthis Ġmight Ġbe Ġsecondary Ġto Ġinfluence Ġfrom Ġa Ġco - patient Ġwith Ġsome Ġeating Ġdisorder Ġsymptoms . Ġ Ġ ĠDespite Ġher Ġlooking Ġbright Ġand Ġche ery Ġwith Ġco - pat ients , Ġclearly Ġenjoying Ġherself Ġon Ġmeeting Ġwith Ġpsychiatrists , Ġshe Ġwas Ġfairly Ġconsistent Ġin Ġsaying Ġthat Ġshe Ġwas Ġdesp ond ent , Ġthat Ġshe Ġwould Ġkill Ġherself Ġif Ġshe Ġwent Ġon Ġpass ; Ġbut Ġthen Ġwould Ġgo Ġon Ġpass Ġwithout Ġany Ġsuicide Ġattempt , Ġthough Ġshe Ġalso Ġcuts Ġfor Ġself Ġsoothing , Ġand Ġcut Ġon Ġher ĠSaturday Ġpass , Ġand Ġcut Ġagain Ġon Ġher ĠSunday Ġpass . 
Ġ Ġ ĠGiven Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ8 ** ] Ġresistance Ġto Ġlooking Ġat Ġways Ġto Ġtry Ġto Ġimprove Ġand Ġher Ġefforts Ġto Ġtry Ġto Ġremain Ġon Ġthe Ġunit Ġto Ġsocial ize Ġwith Ġother Ġpatients ; Ġalthough Ġshe Ġwas Ġclearly Ġdistressed Ġit Ġwas Ġfelt Ġthat Ġhospital ization Ġfor Ġa Ġlonger Ġperiod Ġof Ġtime Ġwould Ġnot Ġbe Ġhelpful Ġand Ġwas Ġalready Ġbecoming Ġharmful . Ġ ĠINV EST IG ATIONS ĠIN ĠH OSP ITAL ĠCBC Ġwas Ġunrem arkable . Ġ ĠElectro ly tes Ġand Ġliver Ġfunction Ġwas Ġunrem arkable . Ġ ĠTotal Ġcholesterol Ġ4 . 17 , Ġtriglycer ide Ġ0 . 89 , ĠHDL Ġ1 . 26 , ĠLDL Ġ2 . 51 , ĠT SH Ġwas Ġin Ġthe Ġnormal Ġrange Ġat Ġ1 . 55 , Ġand Ġserum Ġinsulin Ġwas Ġnormal Ġat Ġ38 . Ġ ĠM ENT AL ĠSTAT US ĠON ĠDIS CHAR GE Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġpresented Ġas Ġa Ġwell Ġgroom ed Ġ14 - year - old Ġyoung Ġwoman Ġwith Ġbraces Ġwho Ġpresented Ġas Ġsomewhat Ġguarded , Ġand Ġunlike Ġother Ġinterviews Ġwas Ġnot Ġtear ful Ġthrough Ġthe Ġtime . Ġ ĠShe Ġwas Ġable Ġto Ġdescribe Ġyears Ġof Ġsuicidal Ġthoughts Ġand Ġreported Ġthat Ġher Ġrisk Ġof Ġsuicide Ġwas Ġhigher , Ġbut Ġit Ġdoes Ġnot Ġappear Ġthat Ġshe Ġhas Ġcome Ġclose Ġto Ġacting Ġon Ġher Ġthoughts , Ġand Ġthis Ġis Ġmore Ġof Ġa Ġchronic Ġpattern . Ġ Ġ ĠInterestingly , Ġshe Ġendorsed Ġthat , Ġunlike Ġthe Ġflu ox et ine , Ġher Ġdose Ġof Ġs ert ral ine Ġwas Ġquite Ġhelpful Ġto Ġmake Ġher Ġthoughts Ġclearer Ġand Ġto Ġcompletely Ġget Ġrid Ġof Ġsuicidal Ġthoughts , Ġand Ġthe Ġurge Ġto Ġcut . Ġ Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġreports Ġhaving Ġso Ġmany Ġyears Ġof Ġsuicidal Ġthoughts Ġand Ġcutting Ġthat Ġthis Ġchange Ġwas Ġunn erving Ġas Ġif Ġshe Ġno Ġlonger Ġknew Ġwho Ġshe Ġwas Ġif Ġshe Ġwas Ġnot Ġsuicidal , Ġand Ġshe Ġdescribed Ġtrying Ġto Ġforce Ġherself Ġto Ġthink Ġof Ġsuicide Ġand Ġforcing Ġherself Ġto Ġcut , Ġeven Ġthough Ġthere Ġis Ġno Ġlonger Ġrelease Ġfrom Ġdistress . 
Ġ Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġdescribes Ġongoing Ġdy sth ym ic Ġsymptoms , Ġthough Ġshe Ġis Ġclearly Ġenjoying Ġherself Ġon Ġthe Ġunit . ĠThis Ġinitial Ġclearing Ġof Ġthoughts Ġand Ġlight ening Ġof Ġmood Ġhas Ġnot Ġbeen Ġsustained . Ġ ĠDespite Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ8 ** ] Ġclear Ġongoing Ġreports Ġof Ġsu ic id ality Ġgoing Ġback Ġmany Ġyears . Ġ ĠIt Ġappears Ġthat Ġher Ġrisk Ġis Ġmoderate Ġfor Ġa Ġsuicide Ġattempt , Ġthis Ġis Ġchronic Ġand Ġshe Ġis Ġat Ġher Ġbaseline Ġlevel Ġof Ġrisk . ĠKeeping Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġin Ġhospital Ġcould Ġincrease Ġher Ġrisk Ġbut Ġcould Ġnot Ġdecrease Ġthis . Ġ ĠShe Ġalready Ġseemed Ġto Ġbe Ġreg ressing , Ġbecoming Ġinstitutional ized Ġand Ġwas Ġstarting Ġto Ġpick Ġup Ġsymptoms Ġfrom Ġother Ġpatients . Ġ ĠFORM UL ATION Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġis Ġa Ġ14 - year - old Ġyoung Ġwoman Ġwith Ġbiological Ġloading Ġfor Ġdepression Ġwho Ġhas Ġhad Ġmany , Ġmany Ġyears Ġof Ġlow - grade Ġlow Ġmood Ġfitting Ġwith Ġdy sth ym ia Ġwith Ġepis odic Ġdips Ġinto Ġdepression Ġbut Ġalso Ġborderline Ġpersonality Ġstructure Ġand Ġuse Ġof Ġcutting Ġas Ġa Ġway Ġof Ġself Ġsoothing . Ġ Ġ ĠUnfortunately , Ġit Ġis Ġonly Ġin Ġthe Ġlast Ġ2 Ġweeks Ġthat Ġshe Ġhas Ġconnected Ġwith Ġa Ġlong - term Ġtherapist Ġin Ġ[ ** Location Ġ( un ) Ġ6 ** ] Ġwho Ġwill Ġbe Ġproviding ĠD BT , Ġso Ġshe Ġhas Ġbeen Ġmanaged Ġmostly Ġwith Ġmedication Ġand Ġprivate Ġtherapy Ġthat Ġhas Ġnot Ġlead Ġto Ġsignificant Ġchange , Ġthough Ġshe Ġreports Ġs ert ral ine Ġhad Ġbeen Ġeffective Ġat Ġchanging Ġher Ġsymptoms Ġto Ġthe Ġpoint Ġthat Ġshe Ġwas Ġunn erved Ġand Ġseemed Ġto Ġbe Ġmaking Ġdeliberate Ġefforts Ġto Ġhave Ġher Ġsu ic id ality Ġcome Ġback . Ġ Ġ ĠWithin Ġthis Ġcontext , Ġit Ġis Ġunclear Ġwhy Ġover Ġthe Ġlast Ġweek Ġor Ġ2 Ġthere Ġis Ġpersistence Ġof Ġdepressed Ġmood Ġif Ġshe Ġhad Ġhad Ġan Ġearlier Ġresponse Ġto Ġs ert ral ine . 
Ġ ĠAlthough Ġshe Ġhas Ġhad Ġchronic Ġsuicidal Ġide ation , Ġshe Ġis Ġat Ġlow - to - moderate Ġrisk Ġchronically , Ġand Ġit Ġseems Ġto Ġkeep Ġher Ġin Ġhospital Ġwas Ġleading Ġto Ġregression Ġand Ġabsorbing Ġsymptoms Ġfrom Ġothers Ġso Ġa Ġlonger Ġhospital ization Ġwould Ġincrease Ġher Ġrisk Ġand Ġis Ġrelatively Ġcont rain d icated . Ġ ĠREC OM M END ATIONS ĠAs Ġpart Ġof Ġher Ġtreatment , Ġsince Ġshe Ġhad Ġan Ġearly Ġresponse Ġto Ġs ert ral ine , ĠDr . Ġ[ ** Last ĠName Ġ( ST itle ) Ġ5 ** ] Ġwas Ġincreasing Ġthe Ġdose Ġup Ġto Ġ200 Ġmg Ġin Ġa Ġstep wise Ġmanner . Ġ ĠAdditionally , Ġwe Ġhad Ġheard Ġthat Ġshe Ġis Ġjust Ġstarting Ġdialect ical Ġbehavior Ġtherapy Ġwithin Ġ[ ** Last ĠName Ġ( un ) Ġ1 ** ] Ġwhich Ġis Ġthe Ġtreatment Ġof Ġchoice Ġfor Ġthe Ġborderline Ġtraits . Ġ Ġ ĠWe Ġworked Ġwith Ġparents Ġto Ġstart Ġto Ġlook Ġat Ġhow Ġto Ġcome Ġup Ġwith Ġfamily Ġrules , Ġexpectations , Ġto Ġencourage Ġher Ġto Ġreturn Ġto Ġschool , Ġetc ., Ġand Ġthe Ġcomplex Ġinter play Ġbetween Ġtrying Ġto Ġkeep Ġher Ġsafe Ġwithout Ġbeing Ġintrusive . Ġ ĠI Ġsuspect Ġthere Ġwill Ġbe Ġa Ġlot Ġmore Ġwork Ġthat Ġwill Ġneed Ġto Ġbe Ġdone Ġin Ġfamily Ġtherapy Ġor Ġwith Ġthe Ġsupport Ġof Ġ[ ** Last ĠName Ġ( un ) Ġ1 ** ]. Ġ ĠLuckily Ġparents Ġor Ġalready Ġconnected Ġto Ġ[ ** Last ĠName Ġ( un ) Ġ9 ** ] ĠFamily ĠServices Ġfor Ġparenting Ġsupport . Ġ Ġ ĠWe Ġhave Ġworked Ġon Ġsafety Ġproof ing Ġthe Ġhome Ġand Ġa Ġreturn Ġto Ġher Ġtherapist Ġwith Ġwhom Ġshe Ġhas Ġan Ġappointment Ġtomorrow , Ġand Ġreturn Ġto Ġschool Ġas Ġsoon Ġas Ġpossible . Ġ Ġ ĠIn Ġaddition Ġto Ġthe Ġabove , Ġthere Ġwas Ġsuggestion Ġon Ġthe Ġadmission Ġnote Ġof Ġadding Ġin Ġa Ġsecond - generation Ġantip sych otics Ġto Ġaugment Ġher Ġs ert ral ine . Ġ ĠAt Ġthe Ġtime Ġof Ġdischarge , Ġit Ġdoes Ġnot Ġseem Ġthe Ġbest Ġtime Ġto Ġstart Ġthat Ġup ; Ġbut ĠI Ġ Ġspoke Ġwith Ġparents Ġabout Ġhow Ġif Ġl or az ep am Ġis Ġnot Ġeffective Ġas Ġa Ġp . r . 
n ., Ġwe Ġoften Ġgo Ġto Ġthe Ġolder Ġanti hist amines Ġsuch Ġas ĠGrav ol , Ġif Ġ Ġthat Ġis Ġineffective Ġor Ġnot Ġtolerated , Ġthen Ġthere Ġare Ġthe ĠSG As Ġas Ġwere Ġdiscussed Ġon Ġadmission . Ġ ĠAfter Ġreviewing Ġthe Ġpossible Ġbenefits Ġand Ġside Ġeffects Ġof ĠGrav ol , Ġthey Ġare Ġall Ġcomfortable Ġwith Ġgiving Ġthis Ġa Ġtry ; Ġand Ġparents Ġknow Ġthat Ġthey Ġcan Ġspeak Ġwith ĠDr . Ġ[ ** Last ĠName Ġ( ST itle ) Ġ5 ** ] Ġabout Ġother Ġtreatment Ġoptions Ġif Ġthis Ġis Ġineffective . Ġ Ġ ĠAs Ġa Ġteam Ġwe Ġrecognize Ġthat Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġis Ġdistressed , Ġand Ġunfortunately , Ġthere Ġis Ġno Ġeffective Ġtreatment Ġin Ġhospital , Ġand Ġshe Ġmay Ġbe Ġmade Ġworse Ġin Ġhospital Ġdespite Ġher Ġdegree Ġof Ġdistress ; Ġand Ġwe Ġexpect Ġin Ġthe Ġnext Ġshort Ġterm , Ġsince Ġshe Ġenjoyed Ġbeing Ġin Ġhospital Ġso Ġmuch , Ġshe Ġmay Ġbe Ġmaking Ġefforts Ġto Ġbe Ġread mitted . Ġ ĠRead mission Ġis Ġnot Ġabsolutely Ġcont rain d icated , Ġbut Ġif Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġpresents Ġto Ġthe Ġemergency Ġand Ġis Ġconsidered Ġfor Ġadmission , Ġthere Ġwould Ġhave Ġto Ġbe Ġa Ġclear Ġcondition Ġfor Ġwhich Ġthere Ġis Ġeffective Ġtreatment Ġin Ġhospital Ġas Ġwell Ġas Ġa Ġdiscussion Ġabout Ġthe Ġpotential Ġdangers Ġof Ġhospital ization Ġincluding Ġwhat Ġwe Ġhave Ġalready Ġobserved : Ġ[ ** Female ĠFirst ĠName Ġ( un ) Ġ7 ** ] Ġseeing Ġherself Ġas Ġan Ġinstitutional ized Ġpatient Ġand Ġstrongly Ġaffili ating Ġwith Ġother Ġborderline Ġyouth . Ġ Ġ ĠOverall Ġwe Ġbelieve Ġthat Ġthe Ġdialect ical Ġbehavior Ġtherapy Ġis Ġthe Ġtreatment Ġof Ġchoice ; Ġand Ġassuming Ġshe Ġcan Ġstick Ġwith Ġthis , Ġshe Ġshould Ġbe Ġfeeling Ġmuch Ġmore Ġstable Ġand Ġsettled Ġin Ġthe Ġnext Ġ6 Ġmonths Ġor Ġso . 
Ġ Ġ Ġ[ ** Sign ature Ġ10 ** ] ĠD ict ated ĠBy : Ġ Ġ[ ** First ĠName 11 Ġ( Name ĠPattern 1 ) Ġ11 ** ] Ġ[ ** Initial Ġ( Name Pattern 1 ) Ġ12 ** ] Ġ[ ** Last ĠName Ġ( Name Pattern 1 ) Ġ13 ** ], ĠMD ĠPsychiatry Ġ ĠAW F / MOD L ĠJob Ġ# : Ġ Ġ14 95 50 ĠDoc Ġ# : Ġ Ġ33 28 10 86 ĠD : Ġ Ġ20 / 11 / 2017 Ġ14 : 13 : 47 ĠT : Ġ Ġ20 / 11 / 2017 Ġ15 : 18 : 22 Ġ|| || END _ OF _ REC ORD Ġ #/s"


## Further Examination of the Attributions

Next, we might want to look in depth at the attribution scores for each token of an example. We saved the attributions for the examples we looked at above, so we can easily retrieve them. We also retrieve the tokens, because we want to know which token each attribution belongs to.

Both sequences have length `seq_len`: the attributions are a 1-D tensor and the tokens are a list.

In [None]:
example = 3
attributions_sum = all_attributions[f"{example}"]
all_tokens2 = all_tokens[f"{example}"]

These functions return the tokens with the strongest (most positive and most negative) attributions. Change `k` to the number of tokens you wish to see. Each function takes the attributions and the tokens we retrieved in the previous cell and returns three sequences: the top-k (or bottom-k) tokens, their attribution values, and their positions.

Note: Remember that the attributions are with respect to the positive class, so the tokens that contributed most to predicting the negative class will be among the bottom-k attributed tokens.

In [None]:
def get_topk_attributed_tokens(attrs, all_tokens, k=20):
    values, indices = torch.topk(attrs, k)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices

In [None]:
def get_botk_attributed_tokens(attrs, all_tokens, k=20):
    values, indices = torch.topk(attrs, k, largest=False)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices
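
The following plain-Python sketch shows the same selection on a handful of made-up tokens and scores, so the return order (tokens, values, positions) is easy to see. It uses `sorted` instead of `torch.topk`, and the helper name `topk_tokens` is hypothetical.

```python
def topk_tokens(scores, tokens, k, largest=True):
    """Return the k tokens with the largest (or smallest) scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=largest)[:k]
    return [tokens[i] for i in order], [scores[i] for i in order], order


tokens = ["<s>", "good", "bad", "the", "</s>"]  # made-up tokens
scores = [0.0, 0.7, -0.6, 0.1, -0.1]            # made-up attributions

top_tokens, top_values, top_positions = topk_tokens(scores, tokens, k=2)
bot_tokens, bot_values, bot_positions = topk_tokens(scores, tokens, k=2, largest=False)

print(top_tokens, top_values, top_positions)  # ['good', 'the'] [0.7, 0.1] [1, 3]
print(bot_tokens, bot_values, bot_positions)  # ['bad', '</s>'] [-0.6, -0.1] [2, 4]
```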

Convert the tokens, their positions, and their attribution values into pandas DataFrames for display. `df_high` is sorted from the highest attribution to the lowest. `df_low`, which holds the most negative attributions, is sorted from the lowest to the highest.

In [None]:
top_words_start, top_words_val_start, top_word_ind_start = get_topk_attributed_tokens(attributions_sum, all_tokens2)
bot_words_start, bot_words_val_start, bot_word_ind_start = get_botk_attributed_tokens(attributions_sum, all_tokens2)

df_high = pd.DataFrame({'Word(Index), Attribution': ["{} ({}), {}".format(word, pos, round(val.item(),2)) for word, pos, val in zip(top_words_start, top_word_ind_start, top_words_val_start)]})

df_low = pd.DataFrame({'Word(Index), Attribution': ["{} ({}), {}".format(word, pos, round(val.item(),2)) for word, pos, val in zip(bot_words_start, bot_word_ind_start, bot_words_val_start)]})
# df_start.style.apply(['cell_ids: False'])

# ['{}({})'.format(token, str(i)) for i, token in enumerate(all_tokens)]

Here we display our top k positively and negatively attributed tokens for our example.

In [None]:
df_high

Unnamed: 0,"Word(Index), Attribution"
0,"Ġ11 (2106), 0.07"
1,"Continue (324), 0.06"
2,"Ġchronic (1163), 0.06"
3,"Ġmore (956), 0.06"
4,"Ġable (920), 0.06"
5,"Ġongoing (1136), 0.06"
6,": (2011), 0.06"
7,"Ġmore (2069), 0.06"
8,", (976), 0.06"
9,"Ġdegree (1911), 0.06"


In [None]:
df_low

Unnamed: 0,"Word(Index), Attribution"
0,"51 (841), -0.07"
1,"Ġmake (988), -0.07"
2,"Ġdescribe (922), -0.07"
3,"Ġrange (849), -0.06"
4,"Ġeffective (1896), -0.06"
5,"Ġeffective (1989), -0.06"
6,"Ġeffective (1358), -0.06"
7,"Ġthe (1919), -0.05"
8,"Ġby (440), -0.05"
9,"Ġthe (1715), -0.05"


In [None]:
d = {"tokens":all_tokens2, "attribution":attributions_sum[:len(all_tokens2)].cpu()}

We notice that many tokens repeat within an example at different positions. While we might want to know how position plays into the attributions, if we only care about the tokens themselves we can sum the attributions of all duplicates to get an aggregate attribution for each token. Therefore, we aggregate the attributions strictly by token.

In [None]:
df_attrib = pd.DataFrame(d)
aggregation_functions = {'attribution': 'sum'}
df_new = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_functions)
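
The group-and-sum step can be pictured with the small standard-library sketch below, which adds up the scores of repeated tokens in a made-up example. The helper name `aggregate_by_token` and the data are illustrative and are not part of the notebook.

```python
from collections import defaultdict


def aggregate_by_token(tokens, scores):
    """Sum the scores of every occurrence of each token."""
    totals = defaultdict(float)
    for token, score in zip(tokens, scores):
        totals[token] += score
    return dict(totals)


tokens = ["the", "dose", "the", "dose", "helped"]  # made-up tokens
scores = [-0.25, 0.5, -0.25, 0.25, 0.125]          # made-up attributions

totals = aggregate_by_token(tokens, scores)
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('dose', 0.75), ('helped', 0.125), ('the', -0.5)]
```

Note that summing favours frequent tokens: a token with many small attributions of the same sign can outrank a rare token with one large attribution.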

In [None]:
highest_attrib_tokens = df_new.sort_values(by=['attribution'], ascending=False)
highest_attrib_tokens[:15]

tokens,attribution
",",2.265135
Ġ,1.065322
),0.989281
Ġto,0.972879
Ġ(,0.873284
Ġwith,0.688382
Ġand,0.682288
Ġof,0.659233
**,0.65016
:,0.594274


In [None]:
lowest_attrib_tokens = df_new.sort_values(by=['attribution'])
lowest_attrib_tokens[:15]

tokens,attribution
Ġthe,-1.141583
],-0.7925
Ġeffective,-0.216621
Ġif,-0.202163
Ġtherapy,-0.185965
Female,-0.177749
Ġit,-0.166456
ST,-0.155044
Ġwould,-0.135821
CHAR,-0.13159


Using this [notebook](https://colab.research.google.com/drive/1lktilbL1IY4nBanlzCdP8TLsBNfUsl_U?usp=sharing), we can get the files to view the aggregated attributions for the entire dataset for both the positive and negative classes. This means we summed up and averaged the attributions for every instance of any given token throughout the entire dataset (whether or not they have positive or negative attributions).

In [None]:
df_word = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/longformer_emb_papers.csv")
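
The linked notebook is not reproduced here, so the following is only a sketch of one plausible way to average attributions per token across several examples. The function name `mean_attribution_per_token` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def mean_attribution_per_token(examples):
    """examples: iterable of (tokens, scores) pairs, one pair per document."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tokens, scores in examples:
        for token, score in zip(tokens, scores):
            sums[token] += score
            counts[token] += 1
    # average over every occurrence of the token in the whole collection
    return {token: sums[token] / counts[token] for token in sums}


examples = [
    (["neural", "code"], [0.5, -0.25]),  # made-up document 1
    (["neural", "the"], [0.25, 0.0]),    # made-up document 2
]
means = mean_attribution_per_token(examples)
print(means)  # {'neural': 0.375, 'code': -0.25, 'the': 0.0}
```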

Here we see the highest attributions for the positive class, meaning that these tokens have the most influence when the model predicts positive. Apart from punctuation and function words, these tokens (e.g. learning, neural, data, training) are relevant to AI-related topics.

In [None]:
df_word[:15]

Unnamed: 0,tokens,attribution
0,Ġlearning,0.163092
1,.,0.145281
2,Ġneural,0.110611
3,Ġdata,0.097347
4,",",0.077573
5,Ġthe,0.072926
6,Ġtraining,0.052609
7,Ġdataset,0.050907
8,Ġalgorithms,0.048352
9,ĠAI,0.045684


Here we see the largest attributions for the negative class, meaning that these tokens have the most influence when the model predicts negative.

In [None]:
df_word[:-16:-1]  # last 15 rows, in reverse order

Unnamed: 0,tokens,attribution
30061,Ġprogramming,-0.121651
30060,Ġprogram,-0.085085
30059,Ġprograms,-0.078384
30058,Ġlanguages,-0.070023
30057,Ġlanguage,-0.054024
30056,Ġ.,-0.053213
30055,Ġcode,-0.049736
30054,Ġsoftware,-0.037241
30053,Ġcompiler,-0.030792
30052,ĠProgramming,-0.029799
