# Use Case: Financial Sentiment Analysis

Financial sentiment analysis is at the intersection of finance and NLP, aiming to quantify and interpret the sentiment expressed in financial texts such as news articles, social media posts, earnings reports, and analyst commentaries. 

This domain seeks to extract meaningful insights from vast amounts of unstructured data, providing a nuanced understanding of market sentiment and its potential impact on financial markets. It is one of the oldest use cases in financial NLP, with solutions ranging from simple approaches such as bag-of-words (BoW) to deep learning (DL) models. The task is generally to assess the polarity (positive, negative, or neutral) and intensity of sentiment, with the aim of helping investors, analysts, and policymakers make informed decisions based on qualitative data.
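To make the simple end of that spectrum concrete, here is a minimal bag-of-words sketch that returns a polarity and a crude intensity. The word lists and the `bow_sentiment` function are hypothetical and only for illustration; they are not part of this notebook's pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; real financial lexicons are far larger.
POSITIVE_WORDS = {"growth", "strong", "profit", "doubled", "gain"}
NEGATIVE_WORDS = {"shortage", "loss", "decline", "weak", "debt"}


def bow_sentiment(text: str) -> tuple[str, int]:
    """Return (polarity, intensity) from bag-of-words counts."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    pos = sum(tokens[w] for w in POSITIVE_WORDS)
    neg = sum(tokens[w] for w in NEGATIVE_WORDS)
    score = pos - neg
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", -score
    return "neutral", 0


print(bow_sentiment("growth is strong and we have plenty of liquidity."))
print(bow_sentiment("there is a shortage of capital, and we need extra financing."))
```

Such a scorer ignores word order and context, which is one reason transformer models are preferred for this task.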

We selected financial sentiment analysis as our use case due to its well-established nature and the availability of publicly accessible datasets and open-source models. These resources provide a robust foundation for our research, allowing us to focus on the integration of fairness solutions without the need to build the fundamental components from scratch. By leveraging these existing datasets and models, we can efficiently explore and address potential biases and fairness issues within the sentiment analysis pipeline. Our goal is to develop methodologies that ensure equitable outcomes across diverse demographic and socio-economic groups, enhancing the reliability and ethical standards of financial sentiment analysis.

In this notebook, we take a developer's point of view: we build a financial sentiment analysis pipeline and explore ways to identify biases in it.

1. Use a pre-trained model: FinBERT.
2. Evaluate the prediction fairness using fairlearn and giskard.
3. Bias mitigation with data-augmentation (counterfactuals and adding mistakes).
4. Improving the ML-pipeline monitoring management.

# 1. Use pre-trained FinBERT

FinBERT is one of the early applications of general-capability transformer-based language models (BERT, GPT, etc.) in the financial domain. It is still relevant and used by practitioners and researchers. We will download the model from <https://huggingface.co/yiyanghkust/finbert-tone>

In [1]:
# tested in transformers==4.18.0 
from transformers import AutoTokenizer, BertTokenizer, AutoModel, BertForSequenceClassification, BertConfig, pipeline, utils
from tqdm import tqdm
import torch

The model is fine-tuned on 10,000 manually annotated sentences from analyst reports of S&P 500 firms.

**Input**: A financial text.
**Output**: Positive, Neutral or Negative.

In [2]:
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone',num_labels=3, output_attentions=True)
#model = AutoModel.from_pretrained('yiyanghkust/finbert-tone',num_labels=3, output_attentions=True)
model = AutoModel.from_pretrained('yiyanghkust/finbert-tone',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
atokenizer = AutoTokenizer.from_pretrained('yiyanghkust/finbert-tone')

config = BertConfig.from_pretrained('yiyanghkust/finbert-tone')



In [3]:
# Let's test the model
sentence = "For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m ."

pipe = pipeline("text-classification", model=finbert, tokenizer=tokenizer)
results = pipe([sentence, 'growth is strong and we have plenty of liquidity.', 
               'there is a shortage of capital, and we need extra financing.',
              'formulation patents might protect Vasotec to a limited extent.'])
results



[{'label': 'Positive', 'score': 0.9999998807907104},
 {'label': 'Positive', 'score': 1.0},
 {'label': 'Negative', 'score': 0.9952379465103149},
 {'label': 'Neutral', 'score': 0.9979718327522278}]

In [4]:
print("The config has the following labels:" + str(config.id2label))
encoded_input = tokenizer(sentence, padding=True, return_tensors='pt')
output = finbert(**encoded_input)
probs = torch.softmax(output['logits'], dim=1)
label = config.id2label[torch.argmax(probs).item()]
label

The config has the following labels:{0: 'Neutral', 1: 'Positive', 2: 'Negative'}


'Positive'
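The decoding step in the cell above (softmax over the logits, then argmax, then a lookup in `id2label`) can be sketched without any deep learning library. The logits below are hypothetical and were not produced by FinBERT; only the label order is taken from the printed config.

```python
import math

# Label order as printed by config.id2label in the notebook
ID2LABEL = {0: "Neutral", 1: "Positive", 2: "Negative"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map raw logits to (label, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, not taken from the model
label, prob = decode([-2.0, 5.0, -1.0])
print(label, round(prob, 4))
```

Note that index 0 is `Neutral` in this model, so any code that assumes a different label order will silently mislabel predictions.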

The model was trained on three financial datasets: **(1)** Corporate Reports 10-K & 10-Q: 2.5B tokens, **(2)** Earnings Call Transcripts: 1.3B tokens, and **(3)** Analyst Reports: 1.1B tokens. Because the model has already seen them, we cannot use them for evaluation. We will use the Financial PhraseBank dataset instead.

We will download the data from <https://huggingface.co/datasets/Jean-Baptiste/financial_news_sentiment>.  A more detailed explanation of downloading different finance datasets can be found in our [project home repo: fairness-monitoring](https://github.com/alan-turing-institute/fairness-monitoring/blob/main/notebooks/eda-fin-data.ipynb).

In [2]:
import numpy as np
import pandas as pd
from datasets import Dataset
from sklearn.metrics import (accuracy_score, 
                             classification_report, 
                             confusion_matrix)

In [3]:
filename = "../data/external/financialphrasebank.csv"
#DATASET_CONFIG = { "path": filename, "name": "sentiment"}
# LABEL_MAPPING = { 0: "negative", 1: "neutral", 2: "positive"}
TEXT_COLUMN = "text"
TARGET_COLUMN = "sentiment"
raw_data = pd.read_csv(filename, names=[TARGET_COLUMN, TEXT_COLUMN], encoding="utf-8", encoding_errors="replace")
raw_data.head()

| | sentiment | text |
|---|---|---|
| 0 | neutral | According to Gran , the company has no plans t... |
| 1 | neutral | Technopolis plans to develop in stages an area... |
| 2 | negative | The international electronic industry company ... |
| 3 | positive | With the new production plant the company woul... |
| 4 | positive | According to the company 's updated strategy f... |


In [4]:
# Profile the data
from ydata_profiling import ProfileReport

profile = ProfileReport(raw_data, title="Profiling Report")
profile.to_notebook_iframe()


In [5]:
json_data = profile.to_json()


In [9]:
# string to json
import json
data = json.loads(json_data)

In [18]:
data.keys()
data["variables"].keys()

dict_keys(['sentiment', 'text'])

In [8]:
# In the profiling we found that there are duplicates in the data, remove them and run the profiling again
raw_data.drop_duplicates(subset=["text"], inplace=True)

In [9]:
# The smallest class has 604 samples, so we balance the data by sampling 604 rows per class (replace=True samples with replacement, so some rows may repeat)
X_eval_balanced = (raw_data
          .groupby('sentiment', group_keys=False)
          .apply(lambda x: x.sample(n=604, random_state=10, replace=True)))

eval_data = Dataset.from_pandas(X_eval_balanced)
X_eval_balanced.sentiment.value_counts()



sentiment
negative    604
neutral     604
positive    604
Name: count, dtype: int64
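The cell above samples with `replace=True`, so a row can appear more than once in the balanced set. As an alternative, here is a minimal sketch of downsampling without replacement using only the standard library. The function `balance_by_downsampling` and the toy rows are hypothetical stand-ins for the de-duplicated dataframe.

```python
import random
from collections import defaultdict


def balance_by_downsampling(rows, label_key="sentiment", seed=10):
    """Downsample every class to the size of the smallest class, without replacement."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        # random.sample never returns the same element twice
        balanced.extend(rng.sample(by_label[label], n))
    return balanced


# Toy rows standing in for the de-duplicated dataframe
rows = (
    [{"sentiment": "neutral", "text": f"n{i}"} for i in range(6)]
    + [{"sentiment": "positive", "text": f"p{i}"} for i in range(4)]
    + [{"sentiment": "negative", "text": f"x{i}"} for i in range(2)]
)
balanced = balance_by_downsampling(rows)
print(len(balanced))  # 3 classes x 2 rows each
```

With this approach the smallest class is kept in full, and no evaluation example is counted twice.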

In [10]:
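# Label ids used for evaluation. The dataset labels are lowercase, while config.label2id
# uses capitalised keys, so we define the mapping explicitly (it matches the outputs below).
TARGET_STR_INT = {'positive': 2, 'neutral': 1, 'negative': 0}
# Model output order, i.e. {0: 'Neutral', 1: 'Positive', 2: 'Negative'}
TARGET_INT_STR = config.id2label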

def evaluate(y_true, y_pred):
    def map_func(x):
        return TARGET_STR_INT.get(x, 1)
    
    y_true = np.vectorize(map_func)(y_true)
    y_pred = np.vectorize(map_func)(y_pred)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
    print(f'Accuracy: {accuracy:.3f}')
    
    # Generate accuracy report
    unique_labels = set(y_true)  # Get unique labels
    
    for label in unique_labels:
        label_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == label]
        label_y_true = [y_true[i] for i in label_indices]
        label_y_pred = [y_pred[i] for i in label_indices]
        accuracy = accuracy_score(label_y_true, label_y_pred)
        print(f'Accuracy for label {label}: {accuracy:.3f}')
        
    # Generate classification report
    class_report = classification_report(y_true=y_true, y_pred=y_pred)
    print('\nClassification Report:')
    print(class_report)
    
    # Generate confusion matrix
    conf_matrix = confusion_matrix(y_true=y_true, y_pred=y_pred, labels=[0, 1, 2])
    print('\nConfusion Matrix:')
    print(conf_matrix)

In [11]:
def predict(X_test):
    y_pred = []
    for i in tqdm(range(len(X_test))):
        prompt = X_test.iloc[i].text
        result = pipe(prompt)
        answer = result[0]['label'].lower()
        if "positive" in answer:
            y_pred.append("positive")
        elif "negative" in answer:
            y_pred.append("negative")
        elif "neutral" in answer:
            y_pred.append("neutral")
        else:
            y_pred.append("none")
    return y_pred

In [11]:
# reminder: 'positive': 2, 'neutral': 1, 'negative': 0
y_pred = predict(X_eval_balanced)
y_true = X_eval_balanced.sentiment.values
evaluate(y_true, y_pred)

# Save the predictions with prompt to a CSV file
X_eval_balanced['predicted_sentiment'] = y_pred
X_eval_balanced.to_csv('../data/output/finbert/finbert_predictions_balanced.csv', index=False)

100%|██████████| 1812/1812 [01:37<00:00, 18.59it/s]

Accuracy: 0.730
Accuracy for label 0: 0.699
Accuracy for label 1: 0.916
Accuracy for label 2: 0.575

Classification Report:
              precision    recall  f1-score   support

           0       0.92      0.70      0.80       604
           1       0.58      0.92      0.71       604
           2       0.86      0.57      0.69       604

    accuracy                           0.73      1812
   macro avg       0.79      0.73      0.73      1812
weighted avg       0.79      0.73      0.73      1812


Confusion Matrix:
[[422 162  20]
 [ 15 553  36]
 [ 20 237 347]]
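The numbers in the report can be reproduced directly from the confusion matrix, which also shows that the "accuracy for label" lines are per-class recall. The sketch below uses the matrix printed above and assumes the label ids follow the reminder in the cell (0 = negative, 1 = neutral, 2 = positive).

```python
# Confusion matrix from the balanced run (rows = true label, columns = predicted label)
conf_matrix = [
    [422, 162, 20],   # true label 0 (negative)
    [15, 553, 36],    # true label 1 (neutral)
    [20, 237, 347],   # true label 2 (positive)
]

n_classes = len(conf_matrix)
total = sum(sum(row) for row in conf_matrix)
accuracy = sum(conf_matrix[i][i] for i in range(n_classes)) / total

# Recall: correct predictions divided by the row total
recall = [conf_matrix[i][i] / sum(conf_matrix[i]) for i in range(n_classes)]
# Precision: correct predictions divided by the column total
precision = [
    conf_matrix[i][i] / sum(conf_matrix[r][i] for r in range(n_classes))
    for i in range(n_classes)
]

print(f"accuracy={accuracy:.3f}")
for i in range(n_classes):
    print(f"label {i}: precision={precision[i]:.2f} recall={recall[i]:.3f}")
```

Reading the middle column, 162 negative and 237 positive sentences were predicted as neutral, which explains why neutral has high recall but low precision.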





In [12]:
# and the unbalanced data
y_pred = predict(raw_data)
y_true = raw_data.sentiment.values
evaluate(y_true, y_pred)

# Save the predictions with prompt to a CSV file
raw_data_with_predictions = raw_data.copy()
raw_data_with_predictions['predicted_sentiment'] = y_pred
raw_data_with_predictions.to_csv('../data/output/finbert/finbert_predictions_unbalanced.csv', index=False)

100%|██████████| 4838/4838 [04:44<00:00, 16.98it/s]

Accuracy: 0.792
Accuracy for label 0: 0.684
Accuracy for label 1: 0.919
Accuracy for label 2: 0.575

Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.68      0.73       604
           1       0.79      0.92      0.85      2872
           2       0.81      0.57      0.67      1362

    accuracy                           0.79      4838
   macro avg       0.80      0.73      0.75      4838
weighted avg       0.79      0.79      0.78      4838


Confusion Matrix:
[[ 413  177   14]
 [  60 2638  174]
 [  47  532  783]]





# 2. Further Evaluation of Bias

Evaluating bias in financial sentiment analysis is challenging due to the complex and nuanced nature of financial language. For example, a news statement often includes domain-specific jargon, idioms, and context-dependent expressions that can vary significantly across different sources and regions. Additionally, financial texts may inherently reflect the perspectives and biases of their authors. 

In our analysis, the dataset consists of fairly straightforward statements. However, we face another challenge: the ambiguity in defining protected attributes. Financial documents and news rarely contain explicit demographic information, which makes it difficult to identify and analyze biases against specific groups.

The lack of standardized benchmarks for measuring bias in this domain complicates the evaluation process, as traditional bias detection methods may not be directly applicable or sufficient. These challenges necessitate the development of specialized tools and methodologies to accurately identify and address bias in financial sentiment analysis, ensuring fair and reliable outcomes.

In this notebook, we will follow two approaches: 

1. We will use fairlearn and giskard to analyse the common characteristics of the misclassified examples for each label.
2. We will use a gradient-based attribution tool (Captum in this notebook; Ecco and bertviz are alternatives) to understand which words or word combinations affected the misclassified cases.
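Before reaching for a library, the core idea of a disaggregated metric can be sketched by hand: compute the same metric per group and compare. The example below is a hand-rolled illustration, not the fairlearn API, and the grouping attribute (whether a text mentions "EUR") and all data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each value in `groups`."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        count[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / count[g] for g in count}


# Toy data: labels and predictions are made up for illustration
texts = [
    "net sales rose to EUR 10m",
    "pre-tax loss widened to EUR 2m",
    "orders fell sharply in the US",
    "the company will publish results on Friday",
]
y_true = ["positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "neutral"]
groups = ["mentions_EUR" if "EUR" in t else "no_EUR" for t in texts]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A large gap between groups is a signal worth investigating, although with few examples per group it may simply reflect noise.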

## 2.1 Using giskard

In [12]:
from giskard import Dataset, Model, scan, testing

In [17]:
giskard_dataset = Dataset(
    df=raw_data.head(10),
    target=TARGET_COLUMN,
    name="Sentiment"
)

2024-06-19 15:08:33,800 pid:36870 MainThread giskard.datasets.base INFO     Your 'pandas.DataFrame' is successfully wrapped by Giskard's 'Dataset' wrapper class.


In [18]:
# lowercase TARGET_INT_STR values
TARGET_INT_STR = {k: v.lower() for k, v in TARGET_INT_STR.items()}

In [19]:
from scipy.special import softmax

def prediction_function(df: pd.DataFrame) -> np.ndarray:
    encoded_input = tokenizer(df[TEXT_COLUMN].to_list(), padding=True, return_tensors='pt')
    output = finbert(**encoded_input)
    return softmax(output['logits'].detach().numpy(), axis=1)


giskard_model = Model(
    model=prediction_function,  # A prediction function that encapsulates all the data pre-processing steps
    model_type="classification",  # Either regression, classification or text_generation.
    name="FinBERT for Financial News Sentiment Classification",  # Optional
    classification_labels= list(TARGET_INT_STR.values()),  # Their order MUST be identical to the prediction_function's
)

2024-06-19 15:08:35,071 pid:36870 MainThread giskard.models.automodel INFO     Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.


In [20]:
# This will display the results of the scan
results = scan(giskard_model, giskard_dataset)
display(results)

2024-06-19 15:08:35,668 pid:36870 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'text': 'object'} to {'text': 'object'}
2024-06-19 15:08:35,829 pid:36870 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (10, 2) executed in 0:00:00.165353
2024-06-19 15:08:35,833 pid:36870 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'text': 'object'} to {'text': 'object'}
2024-06-19 15:08:35,834 pid:36870 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (1, 2) executed in 0:00:00.004439
2024-06-19 15:08:35,838 pid:36870 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'text': 'object'} to {'text': 'object'}
2024-06-19 15:08:35,839 pid:36870 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (10, 2) executed in 0:00:00.004357
2024-06-19 15:08:35,843 pid:36870 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'text': 'ob

In [21]:
test_suite = results.generate_test_suite("Sentiment Analysis Test Suite")
test_suite.run()

2024-06-19 15:08:38,172 pid:36870 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'text': 'object'} to {'text': 'object'}
2024-06-19 15:08:38,173 pid:36870 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (10, 2) executed in 0:00:00.005180
2024-06-19 15:08:38,178 pid:36870 MainThread giskard.datasets.base INFO     Casting dataframe columns from {'text': 'object'} to {'text': 'object'}
2024-06-19 15:08:38,356 pid:36870 MainThread giskard.utils.logging_utils INFO     Predicted dataset with shape (10, 2) executed in 0:00:00.179895
2024-06-19 15:08:38,358 pid:36870 MainThread giskard.utils.logging_utils INFO     Perturb and predict data executed in 0:00:00.253405
2024-06-19 15:08:38,360 pid:36870 MainThread giskard.utils.logging_utils INFO     Compare and predict the data executed in 0:00:00.001349
Executed 'Invariance to “Add typos”' with arguments {'model': <giskard.models.function.PredictionFunctionModel object at 0x2fd88fca0>, 'data
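The log above mentions an invariance test under added typos. The general idea can be sketched as follows: perturb each text slightly and measure how often the predicted label stays the same. This is a hand-written illustration, not Giskard's implementation; `add_typo`, `invariance_rate`, and `toy_classifier` are hypothetical names.

```python
import random


def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position (a simple typo model)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def invariance_rate(classify, texts, seed=0):
    """Share of texts whose predicted label is unchanged after one typo."""
    rng = random.Random(seed)
    unchanged = sum(classify(t) == classify(add_typo(t, rng)) for t in texts)
    return unchanged / len(texts)


def toy_classifier(text: str) -> str:
    # Hypothetical keyword classifier standing in for the real model
    lowered = text.lower()
    if "profit" in lowered:
        return "positive"
    if "loss" in lowered:
        return "negative"
    return "neutral"


texts = ["profit doubled", "loss widened", "the meeting is on Friday"]
print(invariance_rate(toy_classifier, texts))
```

The same perturbation function could also be used to generate noisy training examples for data augmentation.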

## 2.2 Further Evaluation of Classification using Visualization of Gradients

Different libraries are available: Ecco, captum, bertviz.

In [25]:
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig

from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch

In [36]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def predict(inputs):
    return finbert(inputs)[0]

ref_token_id = tokenizer.pad_token_id # Padding token, used to build the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # Separator token, appended to the end of the text
cls_token_id = tokenizer.cls_token_id # Classification token, prepended to the text

def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, add_special_tokens=False)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_token_type_pair(input_ids, sep_ind=0):
    seq_len = input_ids.size(1)
    token_type_ids = torch.tensor([[0 if i <= sep_ind else 1 for i in range(seq_len)]], device=device)
    ref_token_type_ids = torch.zeros_like(token_type_ids, device=device)# * -1
    return token_type_ids, ref_token_type_ids

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)
    position_ids = torch.arange(seq_length, dtype=torch.long, device=device)
    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)

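def custom_forward(inputs):
    # Returns the probability of class 0 ('Neutral' in config.id2label);
    # the attributions below are computed with respect to this output
    preds = predict(inputs)
    return torch.softmax(preds, dim = 1)[0][0].unsqueeze(-1)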

In [34]:
lig = LayerIntegratedGradients(custom_forward, finbert.bert.embeddings)

input_ids, ref_input_ids, sep_id = construct_input_ref_pair(sentence, ref_token_id, sep_token_id, cls_token_id)
token_type_ids, ref_token_type_ids = construct_input_ref_token_type_pair(input_ids, sep_id)
position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
attention_mask = construct_attention_mask(input_ids)

indices = input_ids[0].detach().tolist()
all_tokens = tokenizer.convert_ids_to_tokens(indices)

In [39]:
score = predict(input_ids)

print('Sentence: ', sentence)
print('Predicted label: ' + str(torch.argmax(score[0]).numpy()) + ', prob neutral: ' + str(torch.softmax(score, dim = 1)[0][0].detach().numpy()))

Sentence:  For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .
Predicted label: 1, prob neutral: 8.832481e-08


In [40]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    return attributions

In [42]:
attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    return_convergence_delta=True)

attributions_sum = summarize_attributions(attributions)
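To clarify what integrated gradients computes and what the convergence delta measures, here is a small self-contained sketch on a toy model. The model `f`, its weights, and the inputs are hypothetical; the code approximates the path integral with a midpoint Riemann sum and checks the completeness property, namely that attributions sum to `f(x) - f(baseline)`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy model: f(x) = sigmoid(w . x + b)
W = [1.5, -2.0, 0.5]
B = 0.1


def f(x):
    return sigmoid(sum(w * xi for w, xi in zip(W, x)) + B)


def grad_f(x):
    """Analytic gradient of f with respect to x."""
    y = f(x)
    return [y * (1.0 - y) * w for w in W]


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x = [1.0, -0.5, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
# Completeness: attributions should sum to f(x) - f(baseline)
delta = sum(attributions) - (f(x) - f(baseline))
print(attributions, delta)
```

A delta close to zero suggests the number of integration steps is sufficient; a large delta means the attributions should not be trusted as they stand.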

In [44]:
# storing couple samples in an array for visualization purposes
score_vis = viz.VisualizationDataRecord(
                        attributions_sum,
                        torch.softmax(score, dim = 1)[0][0],
                        torch.argmax(torch.softmax(score, dim = 1)[0]),
                        1, # Positive Sentiment
                        sentence,
                        attributions_sum.sum(),       
                        all_tokens,
                        delta)

print('\033[1m', 'Visualization For Score', '\033[0m')
viz.visualize_text([score_vis])

Visualization For Score


| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| 1.0 | 1 (0.00) | For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m . | 3.01 | [CLS] for the last quarter of 2010 , component ##a ' s net sales doubled to eur ##13 ##1m from eur ##76 ##m for the same period a year earlier , while it moved to a zero pre - tax profit from a pre - tax loss of eur ##7m . [SEP] |




## 2.3 Using our prior knowledge to create protected attributes

We explored the data and model capabilities and drawbacks using a variety of libraries. Based on the results and existing literature knowledge, let's identify hidden protected attributes.

In [None]:
# Get the indices of the misclassified examples with "positive" labels
misclassified_pos_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == 'positive' and y_pred[i] != 'positive']

# Get the indices of the misclassified examples with "negative" labels
misclassified_neg_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == 'negative' and y_pred[i] != 'negative']

# Get the indices of the misclassified examples with "neutral" labels
misclassified_neu_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == 'neutral' and y_pred[i] != 'neutral']
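One way to turn lists of misclassified examples into candidate attributes is to look for tokens that appear disproportionately often in errors. The sketch below is a hypothetical helper, not part of the notebook, and it runs on made-up data. It computes a per-token misclassification rate for tokens seen in at least `min_count` texts.

```python
from collections import Counter
import re


def token_error_rates(texts, y_true, y_pred, min_count=2):
    """Per-token misclassification rate, for tokens seen in at least `min_count` texts."""
    seen = Counter()
    wrong = Counter()
    for text, t, p in zip(texts, y_true, y_pred):
        tokens = set(re.findall(r"[A-Za-z]+", text))
        seen.update(tokens)
        if t != p:
            wrong.update(tokens)
    rates = {tok: wrong[tok] / n for tok, n in seen.items() if n >= min_count}
    # Highest error rate first, ties broken alphabetically
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy examples; labels and predictions are made up for illustration
texts = [
    "Sales rose to EUR 10m",
    "Loss widened to EUR 2m",
    "Sales fell in Finland",
    "Orders rose in Finland",
]
y_true = ["positive", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "neutral", "positive"]
rates = token_error_rates(texts, y_true, y_pred)
print(rates[:3])
```

Tokens such as currency codes or country names that rank high here are candidates for closer inspection, subject to the usual caveat about small counts.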

# 3. Mitigating Bias with Data Augmentation

In the analysis, "" and "" emerged as potential protected attributes in the training process. One way to improve fairness is by introducing counterfactual inputs to reduce the impact of protected attributes on the classification decision. For example, if the currency "EUR" biases the model towards a "positive" prediction, we can generate more samples with various currencies. For instance:

Original sentence: "For the last quarter of 2010, Componenta's net sales doubled to EUR131m from EUR76m for the same period a year earlier, while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m."
Sentiment: Positive

If all sentences with the EUR currency are labeled as positive, the model might incorrectly associate the occurrence of EUR with positivity. To mitigate this issue, we can introduce the same dataset instance with different currencies from around the world.
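A minimal sketch of this idea, assuming currency codes appear as upper-case ISO-style prefixes such as `EUR131m`: replace every occurrence in a sentence with the same alternative code, producing one counterfactual per currency. The short code list and the function `currency_counterfactuals` are hypothetical and are not the project's `counterfactual_generator`.

```python
import re

# Hypothetical short list; the notebook's helper reads codes from a CSV instead
CURRENCY_CODES = ["EUR", "USD", "GBP", "JPY", "TRY"]
# Match a code that is not preceded by a letter and is followed by a digit or a word boundary
_CODE_PATTERN = re.compile(r"(?<![A-Za-z])(" + "|".join(CURRENCY_CODES) + r")(?=\d|\b)")


def currency_counterfactuals(sentence: str) -> list[str]:
    """Return one copy of `sentence` per alternative currency code."""
    found = set(_CODE_PATTERN.findall(sentence))
    if not found:
        return []
    variants = []
    for code in CURRENCY_CODES:
        if code in found:
            continue
        # Every matched code is replaced by the same alternative
        variants.append(_CODE_PATTERN.sub(code, sentence))
    return variants


variants = currency_counterfactuals("net sales doubled to EUR131m from EUR76m")
for v in variants:
    print(v)
```

Using a single replacement currency per variant keeps each counterfactual internally consistent, so the original sentiment label can reasonably be reused.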


In [None]:
import sys
# caution: path[0] is reserved for script path (or '' in REPL)
sys.path.insert(1, '../../utils/')
from counterfactual_generator import generate_random_counterfactual, generate_counterfactuals

In [None]:
sentence = "For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m ."
vocab_path = "../../utils/codes-all.csv"
target = "AlphabeticCode"
example_cf = generate_random_counterfactual(sentence, vocab_path, target)
example_cf

In [None]:
# Now the example counterfactual is generated, we can use the pipeline to predict the sentiment of the counterfactual
# It is also important to note that the counterfactual is almost meaningless... It uses three different currencies and I have no idea if it is a positive or negative increase, but the overall statement is still positive.
print(pipe(sentence))
print(pipe(example_cf))

In [None]:
sentence = "According to Gran , the company has no plans to move all production to Germany, although that is where the company is growing ."
vocab_path =  "../../utils/codes-all.csv"
target = "Entity"
example_cf = generate_random_counterfactual(sentence, vocab_path, target)
example_cf

In [None]:
print(pipe(sentence))
print(pipe(example_cf))
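Comparing one pair of predictions by eye does not scale, so a summary number helps. The sketch below computes a flip rate: the share of counterfactuals whose predicted label differs from the original's. The function `flip_rate` and the deliberately biased `toy_classifier` are hypothetical.

```python
def flip_rate(classify, original: str, counterfactuals: list[str]) -> float:
    """Share of counterfactuals whose predicted label differs from the original's."""
    if not counterfactuals:
        return 0.0
    base = classify(original)
    flips = sum(classify(cf) != base for cf in counterfactuals)
    return flips / len(counterfactuals)


def toy_classifier(text: str) -> str:
    # Hypothetical, deliberately biased stand-in: treats "EUR" as a positive cue
    return "positive" if "EUR" in text else "neutral"


original = "net sales doubled to EUR131m"
cfs = [
    "net sales doubled to USD131m",
    "net sales doubled to GBP131m",
    "net sales doubled to JPY131m",
]
print(flip_rate(toy_classifier, original, cfs))
```

With the notebook's pipeline, `classify` could be something like `lambda t: pipe(t)[0]['label']`; a flip rate near zero would suggest the swapped attribute has little influence on the prediction.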

In [None]:
vocab_path =  "../../utils/codes-all.csv"
target = "Entity"

# Save counterfactuals in a new dataframe with the sentiment

sents = []
cfarr = []

#for i in range(len(X_train)):
for i in range(1):
    sentiment = raw_data.iloc[i]['sentiment']
    cfs = generate_counterfactuals(raw_data.iloc[i]['text'], vocab_path, target)
    for cf in cfs:
        sents.append(sentiment)
        cfarr.append(cf)

cf_df = pd.DataFrame({'sentiment': sents, 'text': cfarr})

# Save it to file
cf_df.to_csv('../data/output/counterfactual/financialphrasebank_cfs.csv', index=False)

# Conclusion

Xing et al.'s recent review[^2] identifies six key challenges in the financial sentiment analysis task:

- Irrealis mood (conditional mood, subjunctive mood, imperative mood)
- Rhetoric (negative assertion, personification, sarcasm)
- Dependent opinions
- Unspecified aspects
- Unrecognized words (entity, microtext, jargon)
- External references

![Financial Sentiment Analysis Overview Diagram](../../media/finsentiment-flow.png)

*Financial sentiment and impacting factors. Diagram is from [^1].*

I believe developing effective approaches in the financial domain can support both improving the accuracy and mitigating biases.

[^1]: Du, Kelvin, et al. "Financial Sentiment Analysis: Techniques and Applications." ACM Computing Surveys (2024). <https://dl.acm.org/doi/10.1145/3649451>
[^2]: Xing, Frank, et al. "Financial sentiment analysis: an investigation into common mistakes and silver bullets." Proceedings of the 28th international conference on computational linguistics. 2020. <https://aclanthology.org/2020.coling-main.85.pdf>