# **Title : Sentiment Analysis on Financial Text Data Using BERT**
#### **Group Member Names : Group 9**
  * Bhagyesh Parmar (200568992)
  * Rahul Pandey   (200576239)


### **INTRODUCTION:**
* Sentiment analysis is a critical task in understanding the emotional tone behind a body of text. In the financial domain, this becomes particularly important as sentiments can influence market trends and investor decisions. This project uses a pre-trained BERT model to perform sentiment analysis on financial text data, categorizing the sentiments as either positive or negative.


*********************************************************************************************************************
#### **AIM :**
* To develop a sentiment analysis model using BERT that accurately classifies financial text data into positive or negative sentiments.

*********************************************************************************************************************
#### **GitHub Repo:** https://github.com/huggingface/transformers
[Group 9 GitHub Repo](https://github.com/Bhagyesh200568992/AIDI_Group9_FinalProject)

*********************************************************************************************************************
#### **DESCRIPTION OF PAPER:**
* This project is based on the principles of transfer learning using the BERT model, which has been shown to perform exceptionally well in natural language understanding tasks. The approach involves fine-tuning a pre-trained BERT model on a specific financial dataset to achieve high accuracy in sentiment classification.


*********************************************************************************************************************
#### **PROBLEM STATEMENT :**
* The objective is to create a model capable of determining the sentiment (positive or negative) of a given piece of financial text data. Accurate sentiment classification can provide insights into market sentiments and help in making informed financial decisions.

*********************************************************************************************************************
#### **CONTEXT OF THE PROBLEM:**
* Sentiment analysis in finance is crucial as it helps in understanding market sentiment, which can significantly impact investment decisions, stock prices, and overall market stability. By accurately identifying sentiment, stakeholders can better predict market movements and adjust strategies accordingly.

*********************************************************************************************************************
#### **SOLUTION:**
* The solution involves fine-tuning a BERT model on the financial dataset, where the text is tokenized, and the model is trained to classify the sentiment of each text entry.
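
To make the tokenization step concrete, here is a minimal sketch of the greedy longest-match-first WordPiece scheme that BERT tokenizers use. The vocabulary is a tiny hypothetical one chosen for illustration, and the function names are ours. The project itself relies on the tokenizer shipped with the `transformers` library, not on hand-written code like this.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one lowercase word into WordPiece tokens by greedy longest match."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matches: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"profit", "##s", "share", "fall", "##ing", "the"}


def encode(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, and wrap with BERT's special tokens."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    tokens.append("[SEP]")
    return tokens


print(encode("Profits falling"))
```

Splitting rare words into known sub-pieces is what lets a fixed vocabulary cover domain terms that never appeared as whole words during pre-training.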



# **Background**
*********************************************************************************************************************

#### **Reference Explanation:**
* The project leverages the BERT model, which is based on transformers and is pre-trained on large text corpora. This makes it highly effective for various NLP tasks, including sentiment analysis.

#### **Dataset/Input:**
* The dataset used contains financial text data labeled as either positive or negative.
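
As a rough illustration of how such labeled data might be prepared, the sketch below maps string labels to integer class ids and holds out a validation portion. The example sentences, the 80/20 split, the seed, and the name `encode_and_split` are all assumptions made for this sketch; they are not taken from the dataset or from the project notebook.

```python
import random

# Assumed label names; the actual dataset may spell them differently.
LABEL2ID = {"negative": 0, "positive": 1}


def encode_and_split(rows, val_fraction=0.2, seed=42):
    """Map string labels to integer ids, shuffle, and split into train/validation."""
    encoded = [(text, LABEL2ID[label.strip().lower()]) for text, label in rows]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(encoded)
    n_val = max(1, int(len(encoded) * val_fraction))
    return encoded[n_val:], encoded[:n_val]


# Invented example rows, for illustration only.
rows = [
    ("Quarterly revenue beat expectations", "positive"),
    ("The company issued a profit warning", "negative"),
    ("Shares rallied after the earnings call", "positive"),
    ("Debt levels rose sharply this year", "negative"),
    ("The board approved a higher dividend", "positive"),
]

train, val = encode_and_split(rows)
print(len(train), len(val))
```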

#### **Weakness:**
* While BERT provides excellent results, it requires substantial computational resources and may overfit if not tuned properly.



*********************************************************************************************************************






# **Implement paper code :**
*********************************************************************************************************************
* The project uses a Python-based implementation, utilizing libraries such as `transformers`, `datasets`, and `sklearn` to process the data and train the BERT model.



*********************************************************************************************************************
### **Contribution Code :**
* Our implementation involved customizing the training loop, adding custom metrics for evaluation, and tuning the model to work effectively on financial text data.


### **Results :**


| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|-------|---------------|-----------------|----------|----------|
|   1   | Not logged    | 0.680939        | 0.700000 | 0.709890 |
|   2   | 0.683600      | 0.674550        | 0.600000 | 0.600000 |
|   3   | 0.683600      | 0.665544        | 0.800000 | 0.762500 |

*******************************************************************************************************************************


### **Detailed Explanation**

#### **1. Epoch**
- **Definition:** An epoch represents one complete cycle through the entire training dataset. During each epoch, the model sees every training example once.
- **In this table:** The model was trained over 3 epochs. Each row in the table corresponds to the metrics recorded after an epoch of training.

#### **2. Training Loss**
- **Definition:** Training loss measures how well the model is learning during training. It is the error calculated on the training dataset, indicating how well the model's predictions match the actual labels.
- **In this table:**
  - **Epoch 1:** The training loss wasn't logged for this epoch.
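  - **Epoch 2 & 3:** The training loss is reported as 0.683600 for both epochs. A value that is identical to six decimal places is more likely a logging artifact than a true plateau. If the table came from the `transformers` `Trainer`, training loss is recorded only every `logging_steps` optimizer steps and the most recently logged value is shown at each evaluation, so a single logged value can appear in consecutive rows. A flat training loss does not by itself say anything about overfitting, which would typically show up as training loss falling while validation loss rises.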

#### **3. Validation Loss**
- **Definition:** Validation loss is similar to training loss but is calculated on the validation dataset, which the model hasn't seen during training. It gives an indication of how well the model is likely to perform on unseen data.
- **In this table:**
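  - **Epoch 1:** The validation loss is 0.680939. This is only slightly below ln 2 ≈ 0.693, the cross-entropy of a binary classifier that assigns 50% probability to each class, so the model starts close to chance in terms of confidence.
  - **Epoch 2:** The validation loss decreases slightly to 0.674550, suggesting the model is improving.
  - **Epoch 3:** The validation loss decreases further to 0.665544. The trend is steady but small (about 0.015 in total), so the predicted probabilities remain fairly uncertain even as the model learns better representations from the data.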
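
The loss values above are easier to read against a baseline. The following sketch computes cross-entropy from raw class scores (logits) and shows that a two-class model with no preference between classes scores ln 2. It assumes the loss being reported is standard cross-entropy, which is the usual choice for BERT sequence classification but is not stated explicitly in this report. The function names are ours.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy loss for one example, given raw class scores (logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_loss(batch_logits, labels):
    """Average cross-entropy over a batch of examples."""
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Equal logits mean a 50/50 prediction: the chance-level baseline.
chance = cross_entropy([0.0, 0.0], 1)
print(round(chance, 4))  # 0.6931

# A confident correct prediction gives a much lower loss.
confident = cross_entropy([0.0, 3.0], 1)
print(confident < chance)  # True
```

Read this way, the reported validation losses (0.681, 0.675, 0.666) sit just under the chance baseline of about 0.693.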

#### **4. Accuracy**
- **Definition:** Accuracy is the proportion of correct predictions out of the total number of predictions made by the model. It is a straightforward metric to understand the model's performance.
- **In this table:**
  - **Epoch 1:** The model achieved 70% accuracy on the validation set, meaning it correctly predicted the sentiment for 70% of the validation examples.
  - **Epoch 2:** Accuracy dropped to 60% even though the validation loss kept falling. The validation-set size is not stated here, but accuracies in exact multiples of 10% would be consistent with a small set, where one or two changed predictions move accuracy by this much. The drop is therefore weak evidence of overfitting on its own.
  - **Epoch 3:** Accuracy rose to 80%, the highest of the three epochs, suggesting that by the third epoch the model had learned more robust features and generalized better to the validation data.
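
A short sketch of the accuracy calculation, and of how coarse it becomes on a small validation set. The 10-example set below is hypothetical: the report does not state the validation-set size, and the labels and predictions are invented to illustrate the arithmetic.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical validation labels (1 = positive, 0 = negative).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Two invented sets of predictions that differ in a single example.
preds_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
preds_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print(accuracy(preds_a, labels))  # 0.7
print(accuracy(preds_b, labels))  # 0.6
```

With 10 examples, each prediction is worth 10 percentage points, so a single changed prediction is enough to move accuracy from 70% to 60%.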

#### **5. F1 Score**
- **Definition:** The F1 score is the harmonic mean of precision (the fraction of positive predictions that are correct) and recall (the fraction of actual positive instances that are found). It is particularly useful when dealing with imbalanced datasets, because it is high only when both precision and recall are high.
- **In this table:**
  - **Epoch 1:** The F1 score was 0.709890, indicating the model's ability to balance precision and recall was reasonably good at the start.
  - **Epoch 2:** The F1 score dropped to 0.600000, reflecting the drop in accuracy and possibly indicating that the model struggled with precision or recall during this epoch.
  - **Epoch 3:** The F1 score increased to 0.762500, which aligns with the improved accuracy and suggests that the model was better at balancing precision and recall, leading to more reliable sentiment predictions.
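
The relationship between precision, recall, and F1 can be sketched in a few lines. This version computes the binary F1 for the positive class. The report does not say which averaging mode (binary, macro, or weighted) produced the scores in the table, so treat this as an illustration of the metric and not as a reproduction of the notebook's evaluation code. The labels and predictions are invented.

```python
def precision_recall_f1(predictions, labels, positive=1):
    """Precision, recall, and F1 for one class treated as 'positive'."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical imbalanced example: 4 positives, 6 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(preds, labels)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

In this example 7 of 10 predictions are correct, yet the F1 score is 4/7 (about 0.57), which shows how F1 can diverge from accuracy when the positive class is under-detected.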

### **Overall Analysis**
- **Epoch 1:** The model started with a reasonably good performance on the validation data with an accuracy of 70% and an F1 score of 0.709890, even though the training loss wasn't logged.
- **Epoch 2:** Accuracy and F1 score both dropped, while the validation loss continued to fall and the reported training loss was unchanged. Because the validation loss did not rise, overfitting is an unlikely explanation. The dip more plausibly reflects a few borderline predictions changing class while the model's outputs were still close to the decision boundary.
- **Epoch 3:** The model's performance improved significantly, achieving 80% accuracy and a strong F1 score of 0.762500. This suggests that the model eventually learned to generalize better to the validation data, resulting in more accurate and balanced predictions.

The pattern of the losses and metrics suggests that the model benefited from multiple epochs of training, with the third epoch being the most effective in terms of performance improvement.
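
The trends described in this analysis can be checked directly against the results table. The sketch below copies the table values into a list and tests which metrics move monotonically. The helper names are ours, and `None` stands in for the training loss that was not logged in epoch 1.

```python
# Values copied from the results table above.
history = [
    {"epoch": 1, "train_loss": None, "val_loss": 0.680939, "accuracy": 0.70, "f1": 0.709890},
    {"epoch": 2, "train_loss": 0.683600, "val_loss": 0.674550, "accuracy": 0.60, "f1": 0.600000},
    {"epoch": 3, "train_loss": 0.683600, "val_loss": 0.665544, "accuracy": 0.80, "f1": 0.762500},
]


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


val_loss_falls = strictly_decreasing([row["val_loss"] for row in history])
accuracy_rises = strictly_increasing([row["accuracy"] for row in history])
best = max(history, key=lambda row: row["f1"])

print("validation loss falls every epoch:", val_loss_falls)
print("accuracy rises every epoch:", accuracy_rises)
print("best epoch by F1:", best["epoch"])
```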


### **Observations :**
*******************************************************************************************************************************
* Validation loss decreases at every epoch, but accuracy and F1 score dip in epoch 2 before reaching their best values in epoch 3 (80% accuracy, F1 score of 0.7625). These figures indicate that the model has learned a useful signal for classifying sentiment in the financial text data, though three evaluations are too few to call the trend stable.


### **Conclusion and Future Direction :**
* The model performs sentiment analysis on financial text data, reaching 80% validation accuracy and an F1 score of 0.7625 after three epochs. Future work may involve experimenting with larger datasets, further tuning the model, or exploring other transformer models to improve performance.
*******************************************************************************************************************************
#### **Learnings :**
- The importance of fine-tuning pre-trained models for specific tasks.
- Handling and processing text data for sentiment analysis.
- Understanding the trade-offs between model complexity and computational resources.

*******************************************************************************************************************************
#### **Results Discussion :**
* Validation loss fell with each epoch, and accuracy ended higher than it started (70% to 80%) despite a drop to 60% in epoch 2, which points to fine-tuning having a positive effect. However, there is still room for improvement in handling edge cases and ensuring the model generalizes well to unseen data.


*******************************************************************************************************************************
#### **Limitations :**
- Limited by the size and diversity of the dataset.
- High computational requirements for training and fine-tuning BERT.


*******************************************************************************************************************************
#### **Future Extension :**
* Potential extensions include experimenting with different models, using larger and more diverse datasets, and applying the model to real-time financial sentiment analysis.


# References:

[1]: [BERT Research Paper](https://arxiv.org/pdf/1810.04805)

[2]: [Transformers Library GitHub Documentation](https://github.com/huggingface/transformers)

[3]: [Finance Dataset](https://www.kaggle.com/datasets/prishasawhney/sentiment-analysis-evaluation-dataset)