# 7CCMFM18 Machine Learning
King's College London <br>
Academic year 2022-2023 <br>
Lecturer: Mario Martone

## Sentiment analysis in finance
First version: <i>29th March 2023</i>

FinBERT is a pre-trained model that has been fine-tuned on several financial NLP tasks, in each case outperforming traditional machine learning models, deep learning models, and fine-tuned BERT models.

All the fine-tuned FinBERT models are publicly hosted on Hugging Face 🤗. Here we will look at two specific instances:
- **FinBERT-Sentiment**: for the sentiment classification task
- **FinBERT-FLS**: for the forward-looking statement (FLS) classification task

## Import the transformers:

First import the classes we need from the `transformers` library:

In [1]:
from transformers import BertTokenizer, BertForSequenceClassification, pipeline



In [2]:
# tested in transformers==4.18.0 
import transformers
transformers.__version__

'4.26.1'

## Sentiment Analysis
Analyzing the sentiment of financial text is valuable because it can capture the views and opinions of managers, information intermediaries and investors. FinBERT-Sentiment is a FinBERT model fine-tuned on 10,000 manually annotated sentences from analyst reports of S&P 500 firms.

**Input**: A financial text.

**Output**: Positive, Neutral or Negative.

Import FinBERT:

In [3]:
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

And let's see a small demo:

In [4]:
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = nlp(['growth is strong and we have plenty of liquidity.', 
               'there is a shortage of capital, and we need extra financing.',
              'formulation patents might protect Vasotec to a limited extent.'])

In [5]:
results

[{'label': 'Positive', 'score': 1.0},
 {'label': 'Negative', 'score': 0.9952379465103149},
 {'label': 'Neutral', 'score': 0.9979718327522278}]
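
To get a feeling for where the `score` comes from: to our understanding, a three-class text-classification pipeline applies a softmax to the model's raw outputs (logits) by default and reports the largest probability together with its label. The sketch below reproduces that last step in plain Python. The logits and the label order are made up for illustration and are not taken from FinBERT.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum first keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical label order and logits, for illustration only.
toy_labels = ["Neutral", "Positive", "Negative"]
toy_logits = [0.2, 3.1, -1.4]

probs = softmax(toy_logits)
best = max(range(len(probs)), key=lambda i: probs[i])

# Same shape as one entry of the pipeline output above.
prediction = {"label": toy_labels[best], "score": probs[best]}
print(prediction)
```

A score very close to 1, as in the demo above, simply means that one logit is much larger than the other two.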

For the rest of the homework:

1. Download and import the sentiment_analysis dataset.
2. Using FinBERT, compute the label for each entry of the dataset.
3. Compute the F1 score, using as y_true the labels which come with the dataset and as y_pred the predictions from FinBERT.

## Import dataset:

Let's import pandas to manage our dataset, as well as the F1 score from scikit-learn.

In [6]:
import pandas as pd
from sklearn.metrics import f1_score

To run the lines below, you should replace "dataset_dir" with the name of the folder you have downloaded the dataset in and "dataset_file" with the name of the file (default is sentiment_analysis.txt).

In [14]:
dataset_dir = 'NLP_finance/'
dataset_file = 'sentiment_analysis.txt'
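# Each line of the file has the form "sentence@label".
# The raw string avoids the invalid escape sequence '\@'; a separator longer
# than one character is treated as a regular expression, which requires the
# Python parsing engine.
finance_df = pd.read_csv(dataset_dir + dataset_file,
                         sep=r'\@',
                         engine='python',
                         header=None,
                         names=['sentence', 'label'])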



The dataset is now imported as a pandas DataFrame, which is an extremely handy format!

In [16]:
finance_df

| | sentence | label |
|---|---|---|
| 0 | According to Gran , the company has no plans t... | neutral |
| 1 | With the new production plant the company woul... | positive |
| 2 | For the last quarter of 2010 , Componenta 's n... | positive |
| 3 | In the third quarter of 2010 , net sales incre... | positive |
| 4 | Operating profit rose to EUR 13.1 mn from EUR ... | positive |
| ... | ... | ... |
| 3448 | Operating result for the 12-month period decre... | negative |
| 3449 | HELSINKI Thomson Financial - Shares in Cargote... | negative |
| 3450 | LONDON MarketWatch -- Share prices ended lower... | negative |
| 3451 | Operating profit fell to EUR 35.4 mn from EUR ... | negative |
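
Each row above comes from one line of the text file, with the sentence and its label separated by an `@`. As a rough illustration of that parsing step, here is a minimal pure-Python sketch. The two sample lines and the helper `parse_line` are hypothetical, not taken from the dataset or from pandas.

```python
# Hypothetical sample lines in the "sentence@label" layout.
sample_lines = [
    "Operating profit rose compared with the previous year .@positive\n",
    "Contact us at info@example.com for details .@neutral\n",
]

def parse_line(line):
    """Split one line into (sentence, label) at the LAST '@'."""
    sentence, label = line.rstrip("\n").rsplit("@", 1)
    return sentence, label

rows = [parse_line(line) for line in sample_lines]
for sentence, label in rows:
    print(f"{label:>8} | {sentence}")
```

Splitting at the last `@` is a design choice of this sketch: it keeps a sentence intact even if the sentence itself contains an `@`.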


Now compute the predictions:

In [15]:
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = nlp(finance_df['sentence'].tolist())
y_pred = [item['label'].lower() for item in results]
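
The `.lower()` call matters: the pipeline returns capitalised labels such as `'Positive'`, while the dataset uses lowercase ones such as `'positive'`, and the F1 score compares the strings exactly. A small sanity check along these lines can catch a mismatch early. The toy lists below are made up for illustration.

```python
# Toy pipeline-style output and toy ground truth (both hypothetical).
toy_results = [
    {"label": "Positive", "score": 0.99},
    {"label": "Neutral", "score": 0.87},
    {"label": "Negative", "score": 0.95},
]
toy_true = ["positive", "neutral", "negative"]

# Same normalisation as in the cell above.
toy_pred = [item["label"].lower() for item in toy_results]

# Any predicted label that never occurs in the ground truth is suspicious.
unexpected = set(toy_pred) - set(toy_true)
print("unexpected labels:", unexpected)

# Without .lower() every single label would look unexpected.
raw_unexpected = {item["label"] for item in toy_results} - set(toy_true)
print("without lower():", sorted(raw_unexpected))
```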

Now let's also extract the list of labels as well as the true values from the dataset:

In [17]:
labels = list(set(finance_df['label'].tolist()))
y_true = finance_df['label'].tolist()

And finally compute the f1 score:

In [18]:
f1_score(y_true, y_pred, labels=labels, average='macro')

0.8473605746028382

A macro-averaged F1 score of about 85% is pretty remarkable!
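
To see what `average='macro'` does, here is a minimal pure-Python sketch: compute the F1 score of each class separately, then take the unweighted mean, so every class counts equally regardless of how many sentences it has. The function `macro_f1` and the toy labels are hypothetical and only meant to illustrate the formula, not to replace scikit-learn.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); use 0 if the class never appears.
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)

# Toy data, made up for illustration.
toy_true = ["positive", "positive", "neutral", "negative", "negative", "neutral"]
toy_pred = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]

toy_score = macro_f1(toy_true, toy_pred, ["positive", "neutral", "negative"])
print(round(toy_score, 4))
```

In this toy example the per-class scores are 0.5 (positive), 0.8 (neutral) and 2/3 (negative), and the macro average is their plain mean.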

## FLS-Classification
Forward-looking statements (FLS) inform investors of managers' beliefs and opinions about a firm's future events or results. Identifying forward-looking statements in corporate reports can assist investors in financial analysis. FinBERT-FLS is a FinBERT model fine-tuned on 3,500 manually annotated sentences from the Management Discussion and Analysis section of annual reports of Russell 3000 firms.

**Input**: A financial text.

**Output**: Specific FLS, Non-specific FLS, or Not FLS.

In [9]:
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-fls',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-fls')

In [10]:
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = nlp(['we expect the age of our fleet to enhance availability and reliability due to reduced downtime for repairs.',
               'on an equivalent unit of production basis, general and administrative expenses declined 24 percent from 1994 to $.67 per boe.',
               'we will continue to assess the need for a valuation allowance against deferred tax assets considering all available evidence obtained in future reporting periods.'])

In [11]:
results

[{'label': 'Specific FLS', 'score': 0.77278733253479},
 {'label': 'Not FLS', 'score': 0.9905241131782532},
 {'label': 'Non-specific FLS', 'score': 0.975904107093811}]
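
Note that the first prediction is noticeably less confident than the other two. One simple way to use the scores is to flag predictions below a confidence threshold for manual review. The sketch below does this on the three results printed above. The threshold of 0.9 is an arbitrary choice for illustration.

```python
# The three results shown above, copied verbatim.
fls_results = [
    {"label": "Specific FLS", "score": 0.77278733253479},
    {"label": "Not FLS", "score": 0.9905241131782532},
    {"label": "Non-specific FLS", "score": 0.975904107093811},
]

THRESHOLD = 0.9  # hypothetical cut-off, to be tuned for the task at hand

# Keep the position of each low-confidence prediction so we can find the sentence.
low_confidence = [
    (index, item["label"], item["score"])
    for index, item in enumerate(fls_results)
    if item["score"] < THRESHOLD
]

for index, label, score in low_confidence:
    print(f"sentence {index}: {label} (score {score:.3f}) needs a second look")
```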