# Fine-tuning FinBERT for sentiment analysis of financial news

FinBERT is a BERT model pre-trained on financial communication text. Its authors report that it outperforms traditional machine learning models on several financial NLP tasks [1]. The model is trained on a corpus of 4.9B tokens and is available in the following flavours (all hosted on Hugging Face 🤗):

* FinBERT-Pretrained: the base model, pretrained on large-scale financial text.
* FinBERT-Sentiment: for the sentiment classification task.
* FinBERT-ESG: for the ESG classification task.
* FinBERT-FLS: for the forward-looking statement (FLS) classification task.

This notebook uses code from [FinBERT.AI](https://finbert.ai/) to showcase the use of pre-trained models in Domino and to demonstrate GPU-accelerated fine-tuning on NVIDIA GPUs. We also use the [Sentiment Analysis for Financial News](https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news) dataset [2], which provides 4,846 financial news headlines labelled with their sentiment from the perspective of a retail investor.

*[1] Yang, Y., Uy, M. C. S., & Huang, A. (2020). FinBERT: A Pretrained Language Model for Financial Communications. arXiv:[2006.08097](https://arxiv.org/abs/2006.08097).*

*[2] Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.*



## Simple demonstration of FinBERT

Let's start by loading the libraries needed for accessing and fine-tuning FinBERT.

In [1]:
import torch
import transformers

import numpy as np
import pandas as pd 

from transformers import BertTokenizer, Trainer, BertForSequenceClassification, TrainingArguments, pipeline
from transformers import enable_full_determinism

from datasets import Dataset

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Let's make sure GPU acceleration is available.

In [2]:
if torch.cuda.is_available():
    print("GPU acceleration is available!")
else:
    print("GPU acceleration is NOT available! Training, fine-tuning, and inference speed will be adversely impacted.")
    
# Seed all random number generators and request deterministic algorithms
enable_full_determinism(seed=1)

GPU acceleration is available!


Let's now load FinBERT and classify a handful of test statements. For each statement, the NLP pipeline returns the predicted label and its score.

In [3]:
model = BertForSequenceClassification.from_pretrained("yiyanghkust/finbert-tone",num_labels=3)
tokenizer = BertTokenizer.from_pretrained("yiyanghkust/finbert-tone")

nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",  
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]
results = nlp(sentences)

for sample in zip(sentences, results):
    print(sample)

('there is a shortage of capital, and we need extra financing', {'label': 'Negative', 'score': 0.9966173768043518})
('growth is strong and we have plenty of liquidity', {'label': 'Positive', 'score': 1.0})
('there are doubts about our finances', {'label': 'Negative', 'score': 0.9999710321426392})
('profits are flat', {'label': 'Neutral', 'score': 0.9889442920684814})
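
The score shown next to each label is a class probability. As a rough illustration of how such a score arises, the sketch below applies a softmax to three raw model outputs (logits) and reports the most probable class. The logits are made up for illustration, and the label order is an assumption chosen to match the encoding used later in this notebook.

```python
import math

# Assumed label order (matches the neutral/positive/negative encoding used in this notebook)
ID2LABEL = {0: "Neutral", 1: "Positive", 2: "Negative"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for a single sentence (not produced by the model)
example_logits = [-1.2, -2.5, 4.3]
prediction = to_prediction(example_logits)
print(prediction)
```

A score close to 1.0 therefore means that one logit dominates the other two, not that the prediction is guaranteed to be correct.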


## Financial news headlines dataset

Let's now load the financial news dataset. It has two attributes:

* **sentence** - the news headline
* **label** - sentiment, which we will encode as follows:
    * neutral  : 0
    * positive : 1
    * negative : 2
    
Let's process the dataset and show the first 5 samples.

In [4]:
# Load from CSV
df = pd.read_csv("all-data.csv", delimiter=",", encoding="latin-1", header=None, names=["label", "sentence"]).fillna("")

# Encode labels
df["label"] = df["label"].replace(["neutral","positive","negative"],[0,1,2]) 

# Print first 5
df.head()

|   | label | sentence |
|---|-------|----------|
| 0 | 0 | According to Gran , the company has no plans t... |
| 1 | 0 | Technopolis plans to develop in stages an area... |
| 2 | 2 | The international electronic industry company ... |
| 3 | 1 | With the new production plant the company woul... |
| 4 | 1 | According to the company 's updated strategy f... |
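
The label encoding described above can also be kept as an explicit mapping, which makes it easy to decode predictions back into names later on. This is a minimal sketch: `LABEL2ID`, `ID2LABEL`, and `encode_labels` are illustrative names and are not part of the notebook.

```python
# Encoding used in this notebook: neutral=0, positive=1, negative=2
LABEL2ID = {"neutral": 0, "positive": 1, "negative": 2}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def encode_labels(labels):
    """Map label strings to integer ids; raises KeyError on an unexpected label."""
    return [LABEL2ID[label.strip().lower()] for label in labels]


raw = ["neutral", "Positive", "negative "]
encoded = encode_labels(raw)
decoded = [ID2LABEL[i] for i in encoded]
print(encoded)  # [0, 1, 2]
print(decoded)  # ['neutral', 'positive', 'negative']
```

On a DataFrame column, `df["label"].map(LABEL2ID)` applies the same mapping, with the difference that unexpected labels become `NaN` instead of raising an error.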


Next, we check for missing values.

In [5]:
df.isnull().values.any()

False

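There are no missing values in the data (note that `fillna("")` at load time would already have replaced any with empty strings). Before splitting the data, we check the length in characters of the longest headline, which we later use as the tokenizer's `max_length`.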

In [6]:
df["sentence"].map(len).max()

315

### Preparing training, validation, and test subsets

Next, we split the dataset into training, validation, and test subsets.

In [7]:
df_train, df_test = train_test_split(df, stratify=df["label"], test_size=0.1, random_state=42)
df_train, df_val = train_test_split(df_train, stratify=df_train["label"], test_size=0.1, random_state=42)
print("Samples in train      : {:d}".format(df_train.shape[0]))
print("Samples in validation : {:d}".format(df_val.shape[0]))
print("Samples in test       : {:d}".format(df_test.shape[0]))

Samples in train      : 3924
Samples in validation : 437
Samples in test       : 485
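
The subset sizes follow from applying the 10% split twice. The arithmetic below reproduces the printed counts, assuming that scikit-learn rounds a fractional test size up to the next whole sample; `split_sizes` is a helper written for this illustration only.

```python
import math


def split_sizes(n_samples, test_fraction):
    """Return (n_remaining, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_total = 3924 + 437 + 485  # total rows, from the counts printed above

n_rest, n_test = split_sizes(n_total, 0.1)   # first split: hold out the test set
n_train, n_val = split_sizes(n_rest, 0.1)    # second split: hold out the validation set

for name, n in [("train", n_train), ("validation", n_val), ("test", n_test)]:
    print(f"{name:<10} : {n:4d} ({n / n_total:.1%})")
```

Because the second split takes 10% of the remaining 90%, the validation set is about 9% of the full dataset and the training set about 81%.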


Now let's score the test set using only the pretrained model.

In [8]:
sentences = df_test["sentence"].to_list()
results = nlp(sentences)

We can build a DataFrame with the ground truth and the predictions to see how well the pretrained model performs.

In [9]:
results_df = pd.DataFrame.from_dict(results)
results_df["label"] = results_df["label"].replace(["Neutral", "Positive", "Negative"],[0,1,2]) 
results_df.columns = ["pred", "score"]
results_df.reset_index(drop=True, inplace=True)

results_df = pd.concat([df_test[["sentence", "label"]].reset_index(drop=True), results_df], axis=1)

results_df["Correct"] = results_df["label"].eq(results_df["pred"])

results_df.head()

|   | sentence | label | pred | score | Correct |
|---|----------|-------|------|-------|---------|
| 0 | Ruukki has signed a contract to deliver and in... | 1 | 0 | 0.999923 | False |
| 1 | - The Group - s cumulative sales during the r... | 0 | 0 | 0.99998 | True |
| 2 | The groups 's turnover for the full fiscal yea... | 1 | 1 | 0.61673 | True |
| 3 | According to CEO Hannu Syrj+Ænen , a new commo... | 0 | 0 | 0.952909 | True |
| 4 | Finnish Suominen Corporation that makes wet wi... | 2 | 0 | 0.999699 | False |
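
Counting (true label, predicted label) pairs shows which classes get confused with each other. The sketch below does this for the five rows displayed above only, so it illustrates the idea and says nothing about the full test set.

```python
from collections import Counter

ID2LABEL = {0: "neutral", 1: "positive", 2: "negative"}

# (true label, predicted label) pairs copied from the five rows displayed above
pairs = [(1, 0), (0, 0), (1, 1), (0, 0), (2, 0)]

confusion = Counter(pairs)
for (true_id, pred_id), count in sorted(confusion.items()):
    print(f"true={ID2LABEL[true_id]:<8} pred={ID2LABEL[pred_id]:<8} count={count}")
```

On the full DataFrame, `pd.crosstab(results_df["label"], results_df["pred"])` produces the same kind of table for every row.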


We can calculate the accuracy of the predictions:

In [10]:
accuracy = results_df["Correct"].mean()

print("Accuracy : {:.2f}".format(accuracy))

Accuracy : 0.81


This is an imbalanced dataset, so it is worth looking at the class counts and the per-class accuracy (the share of each true class that is predicted correctly) as well:

In [11]:
accuracy_df = pd.concat([results_df["label"].value_counts(), results_df.groupby("label")["Correct"].mean().mul(100).round(2)], axis=1)
accuracy_df = accuracy_df.reset_index()
accuracy_df.columns = ["Label", "Count", "Accuracy"]
accuracy_df.head()

|   | Label | Count | Accuracy (%) |
|---|-------|-------|--------------|
| 0 | 0 | 288 | 94.79 |
| 1 | 1 | 136 | 56.62 |
| 2 | 2 | 61 | 72.13 |
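
The per-class figures explain why overall accuracy can look better than the model really is on the minority classes. The arithmetic below, using only the counts and accuracies from the table above, recovers the overall accuracy as a count-weighted mean and compares it with the unweighted (macro) mean, in which every class counts equally.

```python
# label -> (count, accuracy in %), copied from the table above
per_class = {0: (288, 94.79), 1: (136, 56.62), 2: (61, 72.13)}

total = sum(count for count, _ in per_class.values())

# Weighted by class size: this recovers the overall accuracy
overall = sum(count * acc for count, acc in per_class.values()) / total

# Unweighted mean over classes: every class counts equally
macro = sum(acc for _, acc in per_class.values()) / len(per_class)

print(f"Overall accuracy : {overall:.2f}%")  # 81.24%
print(f"Macro average    : {macro:.2f}%")    # 74.51%
```

The gap between the two numbers comes from the neutral class, which is both the largest and the easiest for the pretrained model.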


## Model fine-tuning

The fine-tuning process takes the pretrained model (FinBERT) and performs additional training, tweaking it towards a more specialized use case. Here, we'll use the training subset of the Sentiment Analysis for Financial News dataset. This transfer learning approach enables us to produce a more accurate model with a shorter training time than training from scratch.

### Dataset preparation

First, we need to prepare the three datasets (training, validation, and test) by tokenizing them and by setting the dataset format to be compatible with PyTorch.

In [12]:
dataset_train = Dataset.from_pandas(df_train)
dataset_val = Dataset.from_pandas(df_val)
dataset_test = Dataset.from_pandas(df_test)

def tokenize(batch):
    # Pad or truncate every headline to a fixed length of 315 tokens
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=315)

dataset_train = dataset_train.map(tokenize, batched=True)
dataset_val = dataset_val.map(tokenize, batched=True)
dataset_test = dataset_test.map(tokenize, batched=True)

dataset_train.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
dataset_val.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
dataset_test.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
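
To make the effect of `truncation=True` and `padding="max_length"` concrete, here is a simplified sketch of what happens to a single sequence of token ids. The ids and the padding id are made up, `pad_or_truncate` is a hypothetical helper, and the real tokenizer additionally keeps its special tokens in place when truncating.

```python
PAD_ID = 0  # assumed padding id for illustration; the real value is tokenizer.pad_token_id


def pad_or_truncate(token_ids, max_length):
    """Cut the sequence to max_length, then right-pad it and build the attention mask."""
    ids = token_ids[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * n_pad,
        # 1 marks a real token, 0 marks padding that the model should ignore
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


short = pad_or_truncate([5, 17, 42], max_length=6)          # shorter than max_length: padded
long = pad_or_truncate(list(range(1, 11)), max_length=6)    # longer than max_length: truncated

print(short)
print(long)
```

Note that 315 is the length of the longest headline in characters, so it is a generous upper bound on the number of tokens. Every headline is padded to that length, which costs compute; padding each batch to its own longest sequence (for example with `DataCollatorWithPadding`) is a common alternative.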



### Setting up and training

Next, we define the evaluation metric and the training hyperparameters, such as the number of epochs and the batch size.

In [13]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": accuracy_score(labels, predictions)}

args = TrainingArguments(
    output_dir="temp/",
    evaluation_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    weight_decay=0.01,
    metric_for_best_model="accuracy",
    save_total_limit=2,
    save_strategy="no",
    load_best_model_at_end=False,
    report_to="none",
    optim="adamw_torch")

trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset_train,
        eval_dataset=dataset_val,
        compute_metrics=compute_metrics)
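
The metric function receives one row of logits per sample and reduces each row to a class id with argmax before comparing against the labels. The plain-Python sketch below walks through that computation on made-up logits; `argmax` and `accuracy_from_logits` are illustrative helpers, not part of the notebook.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Share of samples whose highest-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical logits for four samples and three classes
logits = [
    [2.1, 0.3, -1.0],   # predicted class 0
    [0.2, 1.5, 0.1],    # predicted class 1
    [-0.5, 0.4, 2.2],   # predicted class 2
    [1.0, 1.2, -0.3],   # predicted class 1 (true label is 0)
]
labels = [0, 1, 2, 0]

accuracy = accuracy_from_logits(logits, labels)
print(f"Accuracy : {accuracy:.2f}")  # 0.75
```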

We can now perform the training.

**Note that you will need a hardware tier with sufficient memory and compute, ideally one that provides GPU acceleration. Otherwise, the training process can take a substantial amount of time or crash because it runs out of system memory.**

In [14]:
trainer.train()  

The following columns in the training set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence, __index_level_0__. If sentence, __index_level_0__ are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 3924
  Num Epochs = 1
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 123
  Number of trainable parameters = 109754115


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | No log | 0.450781 | 0.812357 |


The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence, __index_level_0__. If sentence, __index_level_0__ are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 437
  Batch size = 32


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=123, training_loss=0.5987466951695884, metrics={'train_runtime': 229.4393, 'train_samples_per_second': 17.103, 'train_steps_per_second': 0.536, 'total_flos': 635203068587640.0, 'train_loss': 0.5987466951695884, 'epoch': 1.0})
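
The step count and throughput figures in the log can be checked with a few lines of arithmetic. All inputs below are copied from the training output above; the last, partial batch counts as a full optimization step.

```python
import math

# Values copied from the training log above
num_examples = 3924
batch_size = 32
num_epochs = 1
train_runtime = 229.4393  # seconds

steps_per_epoch = math.ceil(num_examples / batch_size)  # the last batch is only partly filled
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(f"Total optimization steps : {total_steps}")               # 123
print(f"Samples per second       : {samples_per_second:.3f}")    # 17.103
print(f"Steps per second         : {steps_per_second:.3f}")      # 0.536
```

With only 123 steps in total, the "No log" entry for the training loss is most likely because the run ends before the trainer's logging interval is reached; setting a smaller `logging_steps` in `TrainingArguments` should make the training loss appear.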

### Model evaluation

We can now test the accuracy of the model using the test set.

In [15]:
accuracy_test = trainer.predict(dataset_test).metrics["test_accuracy"]
print("Accuracy on test: {:.2f}".format(accuracy_test))

The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence, __index_level_0__. If sentence, __index_level_0__ are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 485
  Batch size = 32


Accuracy on test: 0.86


### Saving the fine-tuned model

Finally, we can save the fine-tuned model and use it for online predictions via a [Model API](https://docs.dominodatalab.com/en/latest/user_guide/8dbc91/host-models-as-rest-apis/).

In [16]:
trainer.save_model("finbert-sentiment/")

Saving model checkpoint to finbert-sentiment/
Configuration saved in finbert-sentiment/config.json
Model weights saved in finbert-sentiment/pytorch_model.bin
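
As a rough sketch of how the saved model might be used for scoring, the function below wraps any text-classification callable into a single-sentence `predict` function that returns plain Python types. The names `make_predict` and `fake_classifier` are hypothetical, and how the function is registered with a Model API is not covered here. The save log above lists only the configuration and the weights, so the commented-out loading code takes the tokenizer from the original checkpoint.

```python
from typing import Callable, Dict, List

Classifier = Callable[[List[str]], List[Dict]]


def make_predict(classifier: Classifier) -> Callable[[str], Dict]:
    """Wrap a text-classification callable into a single-sentence scoring function."""
    def predict(sentence: str) -> Dict:
        result = classifier([sentence])[0]
        # Cast to plain Python types so the response can be serialized as JSON
        return {"label": str(result["label"]), "score": float(result["score"])}
    return predict


# In the notebook environment the classifier could be built as follows (not run here):
#   from transformers import pipeline
#   classifier = pipeline("sentiment-analysis",
#                         model="finbert-sentiment/",
#                         tokenizer="yiyanghkust/finbert-tone")


# Stand-in classifier so that the sketch runs without the model files
def fake_classifier(sentences: List[str]) -> List[Dict]:
    return [{"label": "Neutral", "score": 0.5} for _ in sentences]


predict = make_predict(fake_classifier)
response = predict("profits are flat")
print(response)
```

Building the classifier once at import time, rather than inside `predict`, avoids reloading the model weights on every request.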
