# FinBERT Example Notebook

This notebook shows how to train and use the FinBERT pre-trained language model for financial sentiment analysis.

## Modules 

In [1]:
from pathlib import Path
import shutil
import os
import logging
import sys
sys.path.append('..')

from textblob import TextBlob
from pprint import pprint
from sklearn.metrics import classification_report

from transformers import AutoModelForSequenceClassification

from finbert.finbert import *
import finbert.utils as tools

%load_ext autoreload
%autoreload 2

project_dir = Path.cwd().parent
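# Show full sentences in dataframe output; -1 is deprecated in pandas >= 1.0, None means "no limit".
pd.set_option('display.max_colwidth', None)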

In [2]:
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.ERROR)

## Prepare the model

### Setting path variables
1. `lm_path`: path to the further pre-trained language model (not needed if vanilla BERT is used).
2. `cl_path`: path where the classification model is saved.
3. `cl_data_path`: path to the directory that contains the data files `train.csv`, `validation.csv`, and `test.csv`.
---

When initializing `bertmodel`, we can use either the original pre-trained weights from Google, by setting `bm = 'bert-base-uncased'`, or our further pre-trained language model, by setting `bm = lm_path`.
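
The choice between the two starting points can be sketched as a small helper. `resolve_base_model` is a hypothetical name used only for illustration and is not part of the `finbert` package. It returns the value you would pass as `bm`, falling back to the vanilla checkpoint when no further pre-trained model directory is available.

```python
from pathlib import Path
from typing import Optional, Union

VANILLA_BERT = "bert-base-uncased"  # original pre-trained weights from Google


def resolve_base_model(lm_path: Optional[Union[str, Path]]) -> str:
    """Return the value to use for `bm` (hypothetical helper, for illustration)."""
    # Use the further pre-trained language model only if its directory exists.
    if lm_path is not None and Path(lm_path).is_dir():
        return str(lm_path)
    # Otherwise fall back to the vanilla BERT checkpoint name.
    return VANILLA_BERT


print(resolve_base_model(None))  # prints: bert-base-uncased
```

With a helper like this, the same notebook can run whether or not the further pre-trained model has been downloaded.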


---
All model configuration is controlled through the `config` variable.

In [19]:
lm_path = project_dir/'models'/'language_model'/'HF'
cl_path = project_dir/'models'/'classifier_model'/'finbert-sentiment-ari'
cl_data_path = project_dir/'data'/'sentiment_data'

### Get predictions

The `predict` function splits a piece of text into a list of sentences and predicts the sentiment of each sentence. The output is written to a dataframe, with the predictions represented in three columns:

1) `logit`: probabilities for each class (despite the column name, these are values that sum to 1; in the outputs below the order is positive, negative, neutral)

2) `prediction`: predicted label

3) `sentiment_score`: sentiment score, calculated as the probability of positive minus the probability of negative
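
The relationship between the three columns can be sketched in plain Python. This is not the code inside `predict`; the label order is an assumption inferred from the outputs in this notebook, and `softmax` and `summarize` are illustrative names. The probabilities are taken from the first row of the Apple example output.

```python
import math

# Label order inferred from the outputs in this notebook (an assumption).
LABELS = ["positive", "negative", "neutral"]


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(probs):
    """Return (predicted label, sentiment score) for one sentence."""
    prediction = LABELS[max(range(len(probs)), key=probs.__getitem__)]
    sentiment_score = probs[0] - probs[1]  # P(positive) - P(negative)
    return prediction, sentiment_score


# Values stored in the `logit` column for the first Apple sentence.
probs = [0.0033345295, 0.98776615, 0.008899213]
prediction, score = summarize(probs)
print(prediction, round(score, 6))  # prints: negative -0.984432
```

The score lies between -1 and 1, and a sentence that is confidently neutral scores close to 0 because both the positive and negative probabilities are small.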

Below we analyze a paragraph taken from [this](https://www.economist.com/finance-and-economics/2019/01/03/a-profit-warning-from-apple-jolts-markets) article in The Economist. For comparison, we also include the sentiment predicted by TextBlob.
> Later that day Apple said it was revising down its earnings expectations in the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China. The news rapidly infected financial markets. Apple’s share price fell by around 7% in after-hours trading and the decline was extended to more than 10% when the market opened. The dollar fell by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering some ground. Asian stockmarkets closed down on January 3rd and European ones opened lower. Yields on government bonds fell as investors fled to the traditional haven in a market storm.

In [20]:
text = "Later that day Apple said it was revising down its earnings expectations in \
the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China. \
The news rapidly infected financial markets. Apple’s share price fell by around 7% in after-hours \
trading and the decline was extended to more than 10% when the market opened. The dollar fell \
by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering \
some ground. Asian stockmarkets closed down on January 3rd and European ones opened lower. \
Yields on government bonds fell as investors fled to the traditional haven in a market storm."

In [79]:
cl_path = project_dir/'models'/'sentiment'/'finbert-sentiment_20211104'
model = AutoModelForSequenceClassification.from_pretrained(cl_path, cache_dir=None, num_labels=3)

In [72]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to C:\Users\Ari
[nltk_data]     Shater\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [80]:
result = predict(text,model)

11/04/2021 11:16:35 - INFO - finbert.utils -   *** Example ***
11/04/2021 11:16:35 - INFO - finbert.utils -   guid: 0
11/04/2021 11:16:35 - INFO - finbert.utils -   tokens: [CLS] later that day apple said it was rev ##ising down its earnings expectations in the fourth quarter of 2018 , largely because of lower sales and signs of economic weakness in china . [SEP]
11/04/2021 11:16:35 - INFO - finbert.utils -   input_ids: 101 2101 2008 2154 6207 2056 2009 2001 7065 9355 2091 2049 16565 10908 1999 1996 2959 4284 1997 2760 1010 4321 2138 1997 2896 4341 1998 5751 1997 3171 11251 1999 2859 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:16:35 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:16:35 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

In [81]:
blob = TextBlob(text)
result['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]
result

Unnamed: 0,sentence,logit,prediction,sentiment_score,textblob_prediction
0,"Later that day Apple said it was revising down its earnings expectations in the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China.","[0.0033345295, 0.98776615, 0.008899213]",negative,-0.984432,0.051746
1,The news rapidly infected financial markets.,"[0.008665337, 0.96750736, 0.023827318]",negative,-0.958842,0.0
2,Apple’s share price fell by around 7% in after-hours trading and the decline was extended to more than 10% when the market opened.,"[0.0026462073, 0.99120516, 0.0061486457]",negative,-0.988559,0.5
3,"The dollar fell by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering some ground.","[0.007249232, 0.9876209, 0.0051299236]",negative,-0.980372,0.0
4,Asian stockmarkets closed down on January 3rd and European ones opened lower.,"[0.0038619624, 0.9161373, 0.08000076]",negative,-0.912275,-0.051111
5,Yields on government bonds fell as investors fled to the traditional haven in a market storm.,"[0.0039962516, 0.9946044, 0.0013992615]",negative,-0.990608,0.0


In [82]:
print(f'Average sentiment is {result.sentiment_score.mean():.2f}.')

Average sentiment is -0.97.


Here is another example:

In [35]:
text2 = "Shares in the spin-off of South African e-commerce group Naspers surged more than 25% \
in the first minutes of their market debut in Amsterdam on Wednesday. Bob van Dijk, CEO of \
Naspers and Prosus Group poses at Amsterdam's stock exchange, as Prosus begins trading on the \
Euronext stock exchange in Amsterdam, Netherlands, September 11, 2019. REUTERS/Piroschka van de Wouw \
Prosus comprises Naspers’ global empire of consumer internet assets, with the jewel in the crown a \
31% stake in Chinese tech titan Tencent. There is 'way more demand than is even available, so that’s \
good,' said the CEO of Euronext Amsterdam, Maurice van Tilburg. 'It’s going to be an interesting \
hour of trade after opening this morning.' Euronext had given an indicative price of 58.70 euros \
per share for Prosus, implying a market value of 95.3 billion euros ($105 billion). The shares \
jumped to 76 euros on opening and were trading at 75 euros at 0719 GMT."

In [83]:
result2 = predict(text2,model)
blob = TextBlob(text2)
result2['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]

11/04/2021 11:18:14 - INFO - finbert.utils -   *** Example ***
11/04/2021 11:18:14 - INFO - finbert.utils -   guid: 0
11/04/2021 11:18:14 - INFO - finbert.utils -   tokens: [CLS] shares in the spin - off of south african e - commerce group nas ##pers surged more than 25 % in the first minutes of their market debut in amsterdam on wednesday . [SEP]
11/04/2021 11:18:14 - INFO - finbert.utils -   input_ids: 101 6661 1999 1996 6714 1011 2125 1997 2148 3060 1041 1011 6236 2177 17235 7347 18852 2062 2084 2423 1003 1999 1996 2034 2781 1997 2037 3006 2834 1999 7598 2006 9317 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:18:14 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:18:14 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

In [84]:
result2

Unnamed: 0,sentence,logit,prediction,sentiment_score,textblob_prediction
0,Shares in the spin-off of South African e-commerce group Naspers surged more than 25% in the first minutes of their market debut in Amsterdam on Wednesday.,"[0.9920455, 0.004372825, 0.0035817088]",positive,0.987673,0.25
1,"Bob van Dijk, CEO of Naspers and Prosus Group poses at Amsterdam's stock exchange, as Prosus begins trading on the Euronext stock exchange in Amsterdam, Netherlands, September 11, 2019.","[0.0037315853, 0.0064404076, 0.989828]",neutral,-0.002709,0.0
2,"REUTERS/Piroschka van de Wouw Prosus comprises Naspers’ global empire of consumer internet assets, with the jewel in the crown a 31% stake in Chinese tech titan Tencent.","[0.0032723187, 0.0016629169, 0.9950648]",neutral,0.001609,0.0
3,"There is 'way more demand than is even available, so that’s good,' said the CEO of Euronext Amsterdam, Maurice van Tilburg.","[0.98820764, 0.0028737478, 0.008918695]",positive,0.985334,0.533333
4,'It’s going to be an interesting hour of trade after opening this morning.',"[0.014273786, 0.004459984, 0.98126626]",neutral,0.009814,0.5
5,"Euronext had given an indicative price of 58.70 euros per share for Prosus, implying a market value of 95.3 billion euros ($105 billion).","[0.0059209173, 0.00090549554, 0.99317354]",neutral,0.005015,0.0
6,The shares jumped to 76 euros on opening and were trading at 75 euros at 0719 GMT.,"[0.21073681, 0.002572727, 0.7866905]",neutral,0.208164,0.0


In [85]:
print(f'Average sentiment is {result2.sentiment_score.mean():.2f}.')

Average sentiment is 0.31.
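
Because the average runs over every sentence, neutral sentences pull it toward zero. The sketch below reproduces the 0.31 figure from the `prediction` and `sentiment_score` columns of `result2` using plain Python, and then computes the mean over non-neutral sentences for comparison. The second number is only an illustration, not something the `finbert` package reports.

```python
from statistics import mean

# (prediction, sentiment_score) pairs copied from the result2 output above.
rows = [
    ("positive", 0.987673),
    ("neutral", -0.002709),
    ("neutral", 0.001609),
    ("positive", 0.985334),
    ("neutral", 0.009814),
    ("neutral", 0.005015),
    ("neutral", 0.208164),
]

# Mean over all sentences, as in the notebook cell above.
overall = mean(score for _, score in rows)

# Mean over sentences whose predicted label is not neutral (illustrative only).
non_neutral = mean(score for label, score in rows if label != "neutral")

print(f"Average sentiment is {overall:.2f}.")  # prints: Average sentiment is 0.31.
print(f"Average over non-neutral sentences is {non_neutral:.2f}.")  # prints: ... is 0.99.
```

Which aggregate is more useful depends on the application: the plain mean reflects how much of the text carries sentiment at all, while the filtered mean reflects only the direction of the opinionated sentences.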


In [86]:

text3 = "The hedge fund traders watched as a \
nightmare scenario played out in the world’s bonds markets.  \
From Australia to the U.K. to the U.S., government bond \
yields abruptly moved against them last week amid growing \
speculation that central banks will accelerate plans for raising \
interest rates in the face of persistent inflation. The losses \
piled up -- and for a few became so big that the firms halted \
some trading to contain the damage.  \
Balyasny Asset Management, BlueCrest Capital Management and \
ExodusPoint Capital Management each curtailed the betting of two \
to four traders after they hit maximum loss levels, according to \
people with knowledge of the matter, who asked not to be \
identified because the information is private. That step stopped \
traders from changing their positions, an extraordinary risk- \
management move used so firms can reassess trades or unwind \
them. \
ExodusPoint lost about $400 million last month, leaving it \
down 2% in October, people said. The fund is still up 2.8% year- \
to-date. \
Millennium Management also suffered amid the tumult and is \
continuing to monitor its macro portfolio managers’ trades, \
people said. Meanwhile, Point72 Asset Management’s macro \
business was said to see some losses from the bond-market moves. \
The hits show how even some of the most sophisticated \
traders have been caught flat-footed by the rapid shift in \
sentiment that has raced through markets. It’s unclear how much \
the losses will drag down the hedge funds’ returns, and they \
could be offset by the stock rally that’s driven the S&P 500 to \
new record highs. \
Representatives for the firms declined to comment.  \
Many hedge funds had been betting that central banks would \
be slow to raise interest rates, seeing the surge in consumer \
prices as a temporary side-effect of the pandemic.  \
But that view has been challenged as hawkish comments from \
the Bank of England cemented expectations for a rate hike, the \
Bank of Canada shut down its bond buying program and Australian \
policymakers abandoned a key short-term yield target. In the \
U.S., where the Federal Reserve is widely expected Wednesday to \
announce plans for winding down its bond purchases, markets are \
now pricing in two rate hikes by December 2022. \
That has upended trades betting that the gap between short- \
and long-term bond yields would widen, which would happen if \
markets expected growth and inflation to accelerate in the face \
of loose monetary policy. Instead, that gap narrowed. \
Read more: Bond Market Dares Fed to Defy It After Bloody \
Week for Investors \
In the U.S., the difference between two- and 10-year \
Treasury yields flattened by 13 basis points Wednesday, marking \
one of the biggest one-day moves in the yield curve of the past \
two decades, according to Cornerstone Macro’s estimates. That \
came as the two-year Treasury yield almost doubled last month to \
about 0.5%. An upward move hadn’t approached that degree since \
December 2009, when the two-year yield jumped to about 1.14% \
from 0.66%. \
The volatility this year has led to heavy losses for some \
of the best known macro traders in the world. Billionaire Chris \
Rokos’s hedge fund losses worsened to 20% through Oct. 22 this \
year, in part because of wagers that the yield curve would \
steepen in the U.K. and U.S. Alphadyne Asset Management, which \
has never had a down year since it started up in 2006, had lost \
13% during the period.  \
Another hedge fund that suffered heavy losses was interest- \
rate-focused Frost Asset Management, which slumped almost 18% \
last month largely due to the sharp rise in Swedish short-term \
interest rates and flatter yield curves, according to the fund’s \
backer Brummer & Partners AB."


In [87]:
#text3 = text3.replace('\n',' ')
result3 = predict(text3,model)
blob = TextBlob(text3)
result3['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]

11/04/2021 11:18:24 - INFO - finbert.utils -   *** Example ***
11/04/2021 11:18:24 - INFO - finbert.utils -   guid: 0
11/04/2021 11:18:24 - INFO - finbert.utils -   tokens: [CLS] the hedge fund traders watched as a nightmare scenario played out in the world ’ s bonds markets . [SEP]
11/04/2021 11:18:24 - INFO - finbert.utils -   input_ids: 101 1996 17834 4636 13066 3427 2004 1037 10103 11967 2209 2041 1999 1996 2088 1521 1055 9547 6089 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:18:24 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:18:24 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/04/2021 11:18:24 - INFO - finbert.utils -   label: None (id = 9090)

In [88]:
result3

Unnamed: 0,sentence,logit,prediction,sentiment_score,textblob_prediction
0,The hedge fund traders watched as a nightmare scenario played out in the world’s bonds markets.,"[0.0108164, 0.21103124, 0.7781524]",neutral,-0.200215,0.0
1,"From Australia to the U.K. to the U.S., government bond yields abruptly moved against them last week amid growing speculation that central banks will accelerate plans for raising interest rates in the face of persistent inflation.","[0.0036933662, 0.9883773, 0.007929306]",negative,-0.984684,-0.041667
2,The losses piled up -- and for a few became so big that the firms halted some trading to contain the damage.,"[0.0053127944, 0.99111396, 0.003573179]",negative,-0.985801,-0.1
3,"Balyasny Asset Management, BlueCrest Capital Management and ExodusPoint Capital Management each curtailed the betting of two to four traders after they hit maximum loss levels, according to people with knowledge of the matter, who asked not to be identified because the information is private.","[0.0038873581, 0.9824223, 0.013690379]",negative,-0.978535,0.0
4,"That step stopped traders from changing their positions, an extraordinary risk- management move used so firms can reassess trades or unwind them.","[0.0088688545, 0.9849134, 0.006217668]",negative,-0.976045,0.333333
5,"ExodusPoint lost about $400 million last month, leaving it down 2% in October, people said.","[0.0024979883, 0.98882663, 0.008675445]",negative,-0.986329,-0.077778
6,The fund is still up 2.8% year- to-date.,"[0.9834394, 0.0023719133, 0.0141887255]",positive,0.981067,0.0
7,"Millennium Management also suffered amid the tumult and is continuing to monitor its macro portfolio managers’ trades, people said.","[0.004146723, 0.9259216, 0.069931716]",negative,-0.921775,0.0
8,"Meanwhile, Point72 Asset Management’s macro business was said to see some losses from the bond-market moves.","[0.0025855629, 0.99376655, 0.0036478993]",negative,-0.991181,0.0
9,The hits show how even some of the most sophisticated traders have been caught flat-footed by the rapid shift in sentiment that has raced through markets.,"[0.0047986354, 0.11334432, 0.88185704]",neutral,-0.108546,0.5


In [89]:
print(f'Average sentiment is {result3.sentiment_score.mean():.2f}.')

Average sentiment is -0.53.
