<h1 style="padding-top: 25px;padding-bottom: 25px;text-align: left; padding-left: 10px; background-color: #DDDDDD; 
    color: black;"> <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> AC295: Advanced Practical Data Science </h1>

## Project: News Analytics for Stock Return Prediction

**Harvard University, Fall 2020**  
**Instructors**: Pavlos Protopapas  

### **Team: $\alpha\beta normal$ $Distri\beta ution$**
#### **Rohit Beri, Eduardo Peynetti, Jessica Wijaya, Stuart Neilson**



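This notebook builds visualizations for the BERT and FinBERT models to compare how much weight each model gives to individual tokens of the same set of news article summaries. Token importance is measured with Captum's Layer Integrated Gradients attributions on the embedding layer, not with raw attention weights.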

## Imports

In [None]:
!pip install transformers
!pip install captum



In [None]:
import torch

from transformers import BertTokenizer, BertForSequenceClassification

from captum.attr import visualization as viz
from captum.attr import LayerIntegratedGradients


In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Utils

In [None]:
# utils
def predict(inputs, token_type_ids=None, position_ids=None, attention_mask=None):
    """Run the globally loaded `model`; the first element of the output is the logits."""
    return model(inputs, token_type_ids=token_type_ids,
                 position_ids=position_ids, attention_mask=attention_mask)

def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None):
    """Probability of the last class in `classes` (positive sentiment), used as the attribution target."""
    pred = predict(inputs,
                   token_type_ids=token_type_ids,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return torch.softmax(pred[0], dim=1)[:, -1]  # last column = positive sentiment

def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):
    """Build `[CLS] text [SEP]` and a same-length baseline whose text tokens are all `ref_token_id`."""
    text_ids = tokenizer.encode(text, add_special_tokens=False)

    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids: keep [CLS]/[SEP], replace every text token
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_token_type_pair(input_ids, sep_ind=0):
    """Segment ids: 0 up to and including index `sep_ind`, 1 afterwards; the reference is all zeros."""
    seq_len = input_ids.size(1)
    token_type_ids = torch.tensor([[0 if i <= sep_ind else 1 for i in range(seq_len)]], device=device)
    ref_token_type_ids = torch.zeros_like(token_type_ids, device=device)
    return token_type_ids, ref_token_type_ids

def construct_input_ref_pos_id_pair(input_ids):
    """Position ids 0..n-1 for the input and all-zero position ids for the reference."""
    seq_length = input_ids.size(1)
    position_ids = torch.arange(seq_length, dtype=torch.long, device=device)
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids

def construct_attention_mask(input_ids):
    """Attend to every token (a single, unpadded sequence)."""
    return torch.ones_like(input_ids)

def summarize_attributions(attributions):
    """Sum over the embedding dimension and scale to unit L2 norm: one score per token."""
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    return attributions
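For a single news summary, the helpers above produce an input sequence `[CLS] tokens [SEP]` and a reference (baseline) of the same length in which every text token is replaced by the pad token while the special tokens are kept. A minimal pure-Python sketch of that layout, assuming the usual BERT vocabulary ids (`[PAD]`=0, `[CLS]`=101, `[SEP]`=102) and using made-up word-piece ids; `build_input_and_reference` and `build_position_ids` are hypothetical names for illustration:

```python
# Assumed special-token ids of the standard BERT vocabularies; in the notebook
# they come from tokenizer.pad_token_id, tokenizer.cls_token_id, tokenizer.sep_token_id.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102

def build_input_and_reference(text_ids):
    """Mirror construct_input_ref_pair: wrap the text ids in [CLS]/[SEP] and
    build a baseline in which every text token is replaced by [PAD]."""
    input_ids = [CLS_ID] + list(text_ids) + [SEP_ID]
    ref_input_ids = [CLS_ID] + [PAD_ID] * len(text_ids) + [SEP_ID]
    return input_ids, ref_input_ids

def build_position_ids(seq_len):
    """Mirror construct_input_ref_pos_id_pair: positions 0..n-1 vs. an all-zero reference."""
    return list(range(seq_len)), [0] * seq_len

# Hypothetical word-piece ids standing in for a tokenized headline.
text_ids = [3514, 7597, 3062]
input_ids, ref_input_ids = build_input_and_reference(text_ids)
position_ids, ref_position_ids = build_position_ids(len(input_ids))
print(input_ids)       # [101, 3514, 7597, 3062, 102]
print(ref_input_ids)   # [101, 0, 0, 0, 102]
print(position_ids)    # [0, 1, 2, 3, 4]
```

Keeping `[CLS]` and `[SEP]` in the baseline means the attributions measure the contribution of the words themselves rather than of the sequence markers.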

In [None]:
def visualize(news, true_label):
    """Attribute the positive-sentiment probability to each token of `news` and render it with Captum."""
    input_ids, ref_input_ids, num_text_tokens = construct_input_ref_pair(news, ref_token_id, sep_token_id, cls_token_id)
    # Single-segment input: every token, including the final [SEP] at index num_text_tokens + 1, is segment 0.
    token_type_ids, ref_token_type_ids = construct_input_ref_token_type_pair(input_ids, sep_ind=num_text_tokens + 1)
    position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
    attention_mask = construct_attention_mask(input_ids)

    indices = input_ids[0].detach().tolist()
    all_tokens = tokenizer.convert_ids_to_tokens(indices)

    output = predict(input_ids, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
    probs = torch.softmax(output[0], dim=1)
    sentiment_idx = torch.argmax(probs, dim=1).item()

    # Attribute to the embedding layer; token types, positions and mask are shared by input and baseline.
    lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings)

    attributions, delta = lig.attribute(inputs=input_ids,
                                        baselines=ref_input_ids,
                                        additional_forward_args=(token_type_ids, position_ids, attention_mask),
                                        return_convergence_delta=True)

    attributions_sum = summarize_attributions(attributions)

    sentiment_vis = viz.VisualizationDataRecord(
                            attributions_sum,
                            torch.max(probs),
                            classes[sentiment_idx],
                            true_label,
                            classes[-1],  # custom_forward always explains the last (positive) class
                            attributions_sum.sum(),
                            all_tokens,
                            delta)

    print('\033[1m', 'Visualization of attention for the positive sentiment', '\033[0m')
    viz.visualize_text([sentiment_vis])
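`LayerIntegratedGradients` attributes the positive-class probability to the embedding layer by averaging gradients along a straight path from the reference embeddings to the input embeddings, and `return_convergence_delta=True` reports how far the summed attributions are from `f(input) - f(reference)`. The sketch below reproduces that idea on a toy, assumed model (a logistic function of a weighted sum over a hypothetical 3-token by 2-dimension embedding matrix) with a plain midpoint Riemann sum in NumPy, then applies the same sum-and-normalize step as `summarize_attributions`. It illustrates the technique only; Captum uses its own integration rule and step count.

```python
import numpy as np

# Toy stand-in for the model (an assumption): a logistic "positive probability"
# over a (tokens x embedding_dim) embedding matrix.
W = np.array([[0.8, -0.2],
              [0.1,  0.5],
              [-0.6, 0.3]])

def f(emb):
    """Positive-class probability for one embedded sequence."""
    return 1.0 / (1.0 + np.exp(-np.sum(W * emb)))

def grad_f(emb):
    """Analytic gradient of f with respect to the embeddings."""
    p = f(emb)
    return p * (1.0 - p) * W

def integrated_gradients(x, baseline, n_steps=200):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x = np.array([[1.0, 0.5],
              [0.2, 1.5],
              [-1.0, 0.4]])      # hypothetical input embeddings
baseline = np.zeros_like(x)      # hypothetical reference embeddings

attributions = integrated_gradients(x, baseline)
# Convergence delta: the attributions should sum to f(x) - f(baseline) (completeness).
delta = attributions.sum() - (f(x) - f(baseline))

# Same reduction as summarize_attributions: sum over the embedding dimension,
# then scale to unit L2 norm so per-token scores are comparable across sentences.
token_scores = attributions.sum(axis=-1)
token_scores = token_scores / np.linalg.norm(token_scores)
print(token_scores.round(3), f"delta={delta:.2e}")
```

A small `delta` indicates the path integral was approximated well enough for the per-token scores to be trusted; a large one suggests increasing the number of integration steps.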


## Visualize Texts

### Using Pretrained **Finbert** Model

In [None]:
# load model
tokenizer = BertTokenizer.from_pretrained('ipuneetrathore/bert-base-cased-finetuned-finBERT')
model = BertForSequenceClassification.from_pretrained('ipuneetrathore/bert-base-cased-finetuned-finBERT').to(device)
model.eval()
model.zero_grad()


ref_token_id = tokenizer.pad_token_id # [PAD]: fills the text positions of the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # [SEP]: appended to the end of the text
cls_token_id = tokenizer.cls_token_id # [CLS]: prepended to the text; its final hidden state feeds the classifier


In [None]:
classes = ['negative', 'neutral', 'positive']

visualize("Oil prices fell on Tuesday amid concerns that a possible rise in Covid-19 cases following the U.S. Labor Day long weekend, which also marks the end of the peak U.S. driving season, could squeeze demand for fuel.",
          true_label= 'negative')
print("\n")
visualize("Shares of the CPU and GPU developer have more than doubled in 2019, as Wall Street gives a thumbs-up to its product launches and share gains.",
          true_label='positive')
print("\n")
visualize("Basware Corporation stock exchange release August 31 , 2010 at 16:25 Basware signed a large deal with an international industrial group Basware will deliver Invoice Automation solution and Connectivity Services to an international industrial group", 
          true_label='positive')
print("\n")
visualize("Terra Lycos the global Internet Group, and Google Inc. developer of the largest performance-based advertising program, announced a multi-year agreement making contextually-targeted advertisements through the Google AdSense(TM) program available on selected sites throughout the Terra Lycos Network.", 
          true_label='positive')
print("\n")
visualize('A tinyurl link takes users to a scamming site promising that users can earn thousands of dollars by becoming a Google ( NASDAQ : GOOG ) Cash advertiser .', 
          true_label='negative')


[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
negative,negative (1.00),negative,-1.48,"[CLS] Oil prices fell on Tuesday amid concerns that a possible rise in Co ##vid - 19 cases following the U . S . Labor Day long weekend , which also marks the end of the peak U . S . driving season , could squeeze demand for fuel . [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
positive,positive (1.00),positive,1.09,"[CLS] S ##hare ##s of the CPU and GP ##U developer have more than doubled in 2019 , as Wall Street gives a thumbs - up to its product launches and share gains . [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
positive,positive (1.00),positive,1.81,"[CLS] Ba ##s ##ware Corporation stock exchange release August 31 , 2010 at 16 : 25 Ba ##s ##ware signed a large deal with an international industrial group Ba ##s ##ware will deliver In ##vo ##ice Auto ##mation solution and Con ##nect ##ivity Services to an international industrial group [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
positive,positive (1.00),positive,2.23,"[CLS] Terra L ##y ##cos the global Internet Group , and Google Inc . developer of the largest performance - based advertising program , announced a multi - year agreement making context ##ually - targeted advertisements through the Google Ad ##S ##ense ( T ##M ) program available on selected sites throughout the Terra L ##y ##cos Network . [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
negative,negative (1.00),negative,-1.11,[CLS] A tiny ##ur ##l link takes users to a s ##cam ##ming site promising that users can earn thousands of dollars by becoming a Google ( NAS ##DA ##Q : GO ##O ##G ) Cash ad ##vert ##iser . [SEP]
,,,,
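In these tables, Predicted Label pairs the arg-max class with its softmax probability, while `custom_forward` always attributes the probability in the last column, which is 'positive' for both `classes` lists used in this notebook. A NumPy sketch of that bookkeeping, with hypothetical logits since the notebook does not print the raw model outputs; `describe_prediction` is an illustrative name, not part of the notebook:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def describe_prediction(logits, classes):
    """Return (predicted label, its probability, probability of the last class)."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(probs))
    # custom_forward attributes the LAST column, so it must be the positive class.
    return classes[idx], float(probs[idx]), float(probs[-1])

finbert_classes = ['negative', 'neutral', 'positive']
label, p_pred, p_pos = describe_prediction([2.0, 0.5, -1.0], finbert_classes)  # hypothetical logits
print(f"{label} ({p_pred:.2f}), P(positive)={p_pos:.2f}")  # prints: negative (0.79), P(positive)=0.04
```

If a checkpoint ordered its labels differently, the last column would no longer be 'positive' and the attributions would explain a different class.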


### Using Pretrained **Bert Base** Model

In [None]:
# load model
# NOTE: bert-base-uncased ships without a sentiment head, so the 2-class classification layer
# is randomly initialized (see the warning below); its predictions are not trained sentiment scores.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased').to(device)
model.eval()
model.zero_grad()


ref_token_id = tokenizer.pad_token_id # [PAD]: fills the text positions of the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # [SEP]: appended to the end of the text
cls_token_id = tokenizer.cls_token_id # [CLS]: prepended to the text; its final hidden state feeds the classifier


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [None]:
classes = ['negative', 'positive']

visualize("Oil prices fell on Tuesday amid concerns that a possible rise in Covid-19 cases following the U.S. Labor Day long weekend, which also marks the end of the peak U.S. driving season, could squeeze demand for fuel.",
          true_label= 'negative')
print("\n")
visualize("Shares of the CPU and GPU developer have more than doubled in 2019, as Wall Street gives a thumbs-up to its product launches and share gains.",
          true_label='positive')
print("\n")
visualize("Basware Corporation stock exchange release August 31 , 2010 at 16:25 Basware signed a large deal with an international industrial group Basware will deliver Invoice Automation solution and Connectivity Services to an international industrial group", 
          true_label='positive')
print("\n")
visualize("Terra Lycos the global Internet Group, and Google Inc. developer of the largest performance-based advertising program, announced a multi-year agreement making contextually-targeted advertisements through the Google AdSense(TM) program available on selected sites throughout the Terra Lycos Network.", 
          true_label='positive')
print("\n")
visualize('A tinyurl link takes users to a scamming site promising that users can earn thousands of dollars by becoming a Google ( NASDAQ : GOOG ) Cash advertiser .', 
          true_label='negative')


[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
negative,positive (0.69),positive,1.73,"[CLS] oil prices fell on tuesday amid concerns that a possible rise in co ##vid - 19 cases following the u . s . labor day long weekend , which also marks the end of the peak u . s . driving season , could squeeze demand for fuel . [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
positive,positive (0.66),positive,1.22,"[CLS] shares of the cpu and gp ##u developer have more than doubled in 2019 , as wall street gives a thumbs - up to its product launches and share gains . [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
positive,positive (0.64),positive,0.45,"[CLS] bas ##ware corporation stock exchange release august 31 , 2010 at 16 : 25 bas ##ware signed a large deal with an international industrial group bas ##ware will deliver in ##vo ##ice automation solution and connectivity services to an international industrial group [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
positive,positive (0.63),positive,-0.33,"[CLS] terra l ##y ##cos the global internet group , and google inc . developer of the largest performance - based advertising program , announced a multi - year agreement making context ##ually - targeted advertisements through the google ads ##ense ( t ##m ) program available on selected sites throughout the terra l ##y ##cos network . [SEP]"
,,,,




[1m Visualization of attention for the positive sentiment [0m


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
negative,positive (0.65),positive,0.09,[CLS] a tiny ##ur ##l link takes users to a sc ##am ##ming site promising that users can earn thousands of dollars by becoming a google ( nas ##da ##q : goo ##g ) cash ad ##vert ##iser . [SEP]
,,,,
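The two output tables can be compared directly. A small tally, using only the (true, predicted) label pairs copied from the tables, shows FinBERT matching all five labels and BERT base three of five. BERT base predicts 'positive' for every summary with 0.63 to 0.69 confidence, which is consistent with its randomly initialized classification head. Five summaries are far too few to measure accuracy, so treat this as a sanity check rather than an evaluation.

```python
# (true label, predicted label) pairs copied from the two output tables above.
finbert = [('negative', 'negative'), ('positive', 'positive'), ('positive', 'positive'),
           ('positive', 'positive'), ('negative', 'negative')]
bert_base = [('negative', 'positive'), ('positive', 'positive'), ('positive', 'positive'),
             ('positive', 'positive'), ('negative', 'positive')]

def accuracy(pairs):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(true == pred for true, pred in pairs) / len(pairs)

for name, pairs in [('FinBERT', finbert), ('BERT base', bert_base)]:
    print(f"{name}: {accuracy(pairs):.0%} correct on {len(pairs)} summaries")
# FinBERT: 100% correct on 5 summaries
# BERT base: 60% correct on 5 summaries
```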
