## Aspect-Based Sentiment Analysis of Financial News using Pretrained Transformers
This project evaluates the performance of pretrained transformer models on the SEntFiN dataset, which contains over 10,000 financial news headlines, each annotated with aspects (e.g., company names or financial entities) and their corresponding sentiment labels (positive, negative, or neutral).

Rather than training new models, this work focuses on leveraging existing transformer-based models to perform sentiment analysis directly. We compare a general-purpose model (Twitter-RoBERTa) with a domain-specific model (FinBERT) to assess their ability to identify the sentiment of financial headlines. The goal is to determine how well pretrained models generalize to financial data and to highlight the importance of domain adaptation in sentiment classification tasks.

First, we import the tools needed for the analysis. We use pandas to load and explore the dataset, and scikit-learn's accuracy_score and classification_report to evaluate model performance. Because the dataset stores each headline's aspect-sentiment pairs as the string representation of a Python dictionary, we also import the ast module to convert those strings safely into real dictionaries. Finally, we load the Hugging Face pipeline utility, which lets us apply pretrained transformer models for sentiment analysis without any additional training.

In [None]:
import pandas as pd
from sklearn.metrics import classification_report, accuracy_score
import ast
from transformers import pipeline
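
The reason for using `ast.literal_eval` rather than `eval` can be shown in a few lines. `literal_eval` only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and raises `ValueError` on anything else, such as a function call, so a malformed or malicious cell cannot execute code. The sample string below is copied from the first SEntFiN row. Because these strings also happen to be valid JSON, `json.loads` is shown as an equally strict alternative; that assumes every row uses double-quoted JSON syntax, which has not been checked for the whole file.

```python
import ast
import json

# A Decisions value as it appears in the first row of SEntFiN-v1.1.csv
raw = '{"SpiceJet": "neutral"}'

parsed = ast.literal_eval(raw)   # evaluates literals only, never runs code
as_json = json.loads(raw)        # also works, because this string is valid JSON
print(parsed, parsed == as_json)

# A string containing a function call is rejected instead of executed
try:
    ast.literal_eval('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print("untrusted input rejected:", rejected)
```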

Next, we load the financial news dataset into a Pandas DataFrame for analysis. This dataset contains headlines along with labeled aspects and their corresponding sentiment classes. Displaying the first few rows helps us understand the structure of the data and verify that it has been loaded correctly.

In [None]:
# Load data into a DataFrame
df = pd.read_csv("SEntFiN-v1.1.csv")
df.head()


|   | S No. | Title | Decisions | Words |
|---|---|---|---|---|
| 0 | 1 | SpiceJet to issue 6.4 crore warrants to promoters | {"SpiceJet": "neutral"} | 8 |
| 1 | 2 | MMTC Q2 net loss at Rs 10.4 crore | {"MMTC": "neutral"} | 8 |
| 2 | 3 | Mid-cap funds can deliver more, stay put: Experts | {"Mid-cap funds": "positive"} | 8 |
| 3 | 4 | Mid caps now turn into market darlings | {"Mid caps": "positive"} | 7 |
| 4 | 5 | Market seeing patience, if not conviction: Pra... | {"Market": "neutral"} | 8 |


The dataset includes a column where the aspect-sentiment mappings are stored as strings representing Python dictionaries. To work with this data effectively, we first convert these strings into actual dictionaries using ast.literal_eval. Then, we restructure the dataset by flattening it—extracting each aspect-sentiment pair along with the corresponding news title into separate rows. This transformation results in a more analysis-friendly format, where each row clearly represents a single (title, aspect, sentiment) relationship.

In [None]:
# Convert string to dictionary
df['Decisions'] = df['Decisions'].apply(ast.literal_eval)

# Flatten the DataFrame
rows = []
for idx, row in df.iterrows():
    title = row['Title']
    for aspect, sentiment in row['Decisions'].items():
        rows.append({
            'Title': title,
            'Aspect': aspect,
            'Sentiment': sentiment
        })

# Create a new DataFrame
flat_df = pd.DataFrame(rows)
flat_df.head()


|   | Title | Aspect | Sentiment |
|---|---|---|---|
| 0 | SpiceJet to issue 6.4 crore warrants to promoters | SpiceJet | neutral |
| 1 | MMTC Q2 net loss at Rs 10.4 crore | MMTC | neutral |
| 2 | Mid-cap funds can deliver more, stay put: Experts | Mid-cap funds | positive |
| 3 | Mid caps now turn into market darlings | Mid caps | positive |
| 4 | Market seeing patience, if not conviction: Pra... | Market | neutral |
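
The row-by-row loop above is easy to follow; the same flattening can also be written with pandas' `DataFrame.explode`, which avoids `iterrows`. The sketch below is one possible vectorized version, not the notebook's own method. `flatten_aspects` and `_as_dict` are hypothetical helper names, and the second sample headline is invented to show a multi-aspect case; only the first row comes from SEntFiN. The helper accepts both the raw strings and already-parsed dictionaries, so it works before or after the `ast.literal_eval` step.

```python
import ast
import pandas as pd

def _as_dict(value):
    """Accept either the raw string from the CSV or an already-parsed dict."""
    return ast.literal_eval(value) if isinstance(value, str) else value

def flatten_aspects(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (Title, Aspect, Sentiment) triple."""
    pairs = frame["Decisions"].apply(lambda d: list(_as_dict(d).items()))
    rows = (
        frame[["Title"]]
        .assign(Pair=pairs)
        .explode("Pair")              # one row per (aspect, sentiment) tuple
        .dropna(subset=["Pair"])      # an empty dict would explode to NaN
        .reset_index(drop=True)
    )
    rows["Aspect"] = rows["Pair"].map(lambda p: p[0])
    rows["Sentiment"] = rows["Pair"].map(lambda p: p[1])
    return rows.drop(columns="Pair")

# First row copied from SEntFiN; the second is a hypothetical multi-aspect headline
sample = pd.DataFrame({
    "Title": ["SpiceJet to issue 6.4 crore warrants to promoters",
              "Hypothetical: Infosys gains while TCS slips"],
    "Decisions": ['{"SpiceJet": "neutral"}',
                  '{"Infosys": "positive", "TCS": "negative"}'],
})
flat = flatten_aspects(sample)
print(flat)
```

The multi-aspect headline yields two rows that share the same title, which is exactly the (title, aspect, sentiment) layout produced by the loop.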


We now apply a pretrained transformer model to predict sentiment from the financial news titles. Specifically, we use cardiffnlp/twitter-roberta-base-sentiment, a RoBERTa-base model trained on tweets and fine-tuned for three-way (negative, neutral, positive) sentiment classification, which makes it a natural general-purpose baseline for short texts such as headlines. Using Hugging Face’s sentiment-analysis pipeline, we run predictions in batches of 32 for efficiency. Note that the pipeline receives only the headline, not the aspect, so every aspect of a multi-aspect headline is assigned the same predicted label. The predicted labels are converted to lowercase and stored in a new column, which lets us compare them against the true aspect-level labels.

In [None]:
# Load the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment", tokenizer="cardiffnlp/twitter-roberta-base-sentiment", device=0)  # device=0 uses the first GPU; set device=-1 to run on CPU

# Run predictions in batches
titles = flat_df['Title'].tolist()
predictions = sentiment_pipeline(titles, batch_size=32)

# Extract predicted labels
pred_labels = [p['label'].lower() for p in predictions]

# Store in DataFrame
flat_df['Predicted_Sentiment'] = pred_labels

# Show a few rows
flat_df[['Title', 'Aspect', 'Sentiment', 'Predicted_Sentiment']].head()


Device set to use cuda:0


|   | Title | Aspect | Sentiment | Predicted_Sentiment |
|---|---|---|---|---|
| 0 | SpiceJet to issue 6.4 crore warrants to promoters | SpiceJet | neutral | label_1 |
| 1 | MMTC Q2 net loss at Rs 10.4 crore | MMTC | neutral | label_0 |
| 2 | Mid-cap funds can deliver more, stay put: Experts | Mid-cap funds | positive | label_2 |
| 3 | Mid caps now turn into market darlings | Mid caps | positive | label_1 |
| 4 | Market seeing patience, if not conviction: Pra... | Market | neutral | label_1 |


Once the model predictions are added, we observe that the predicted sentiment labels are returned in a generic format like label_0, label_1, and label_2. These do not directly correspond to the original sentiment classes (positive, neutral, negative) used in our dataset. To make meaningful comparisons and evaluate model performance, we will need to map these generic labels to the appropriate sentiment categories in the next step.


To align the model’s output with the dataset’s sentiment labels, we map the model’s generic labels to the actual sentiment classes, following the order listed on the model card: label_0 → negative, label_1 → neutral, label_2 → positive. With the labels aligned, we compute the overall accuracy and a classification report with per-class precision, recall, and F1-score, which gives a clearer picture of how well the model performs on this financial dataset.

In [None]:
# Mapping model's labels to dataset's sentiment classes
label_mapping = {
    'label_0': 'negative',
    'label_1': 'neutral',
    'label_2': 'positive'
}

# Apply the mapping to predicted labels
flat_df['Predicted_Sentiment'] = flat_df['Predicted_Sentiment'].map(label_mapping)

# Evaluate performance
accuracy = accuracy_score(flat_df['Sentiment'], flat_df['Predicted_Sentiment'])
print(f"Accuracy: {accuracy:.4f}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(flat_df['Sentiment'], flat_df['Predicted_Sentiment']))


Accuracy: 0.5798

Classification Report:
              precision    recall  f1-score   support

    negative       0.82      0.43      0.56      3819
     neutral       0.48      0.86      0.61      5515
    positive       0.79      0.39      0.52      5075

    accuracy                           0.58     14409
   macro avg       0.69      0.56      0.57     14409
weighted avg       0.68      0.58      0.57     14409
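
One pitfall in the mapping cell: `Series.map` with a dictionary silently returns NaN for any value that is not a key. In a notebook this shows up if the cell is executed twice, because the second run tries to map values such as 'neutral', which are not keys, and every prediction becomes NaN. A minimal defensive variant is sketched below; `map_labels` is a hypothetical helper, and the mapping is the one used above.

```python
import pandas as pd

# Same mapping as in the notebook (order taken from the cardiffnlp model card)
LABEL_MAPPING = {"label_0": "negative", "label_1": "neutral", "label_2": "positive"}

def map_labels(raw: pd.Series, mapping: dict) -> pd.Series:
    """Map generic model labels to class names, failing loudly on anything unmapped."""
    mapped = raw.str.lower().map(mapping)
    unknown = raw[mapped.isna()].unique().tolist()
    if unknown:
        raise ValueError(f"Unmapped labels: {unknown}")
    return mapped

preds = pd.Series(["LABEL_1", "LABEL_0", "LABEL_2"])
print(map_labels(preds, LABEL_MAPPING).tolist())  # ['neutral', 'negative', 'positive']

# Re-running the mapping on already-mapped labels is caught instead of yielding NaN
try:
    map_labels(pd.Series(["neutral"]), LABEL_MAPPING)
    rerun_caught = False
except ValueError:
    rerun_caught = True
print("re-run caught:", rerun_caught)
```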



The evaluation shows an overall accuracy of about 58% (0.5798), which indicates only moderate success on financial news headlines. The per-class figures reveal a clear bias toward the neutral class: negative and positive predictions are fairly precise (0.82 and 0.79), but they recover fewer than half of the truly negative and positive rows (recall 0.43 and 0.39). Neutral, by contrast, has high recall (0.86) but low precision (0.48), suggesting that many of the missed positive and negative headlines are being labeled neutral. These results highlight the limitations of a general-purpose sentiment model on domain-specific financial text.
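
The neutral bias can be quantified from the report alone. Since recall = TP / support and precision = TP / (number of predictions for that class), the number of predictions per class is approximately recall × support / precision. The sketch below applies this to the Twitter-RoBERTa figures, which are rounded to two decimals, so the results are estimates rather than exact counts. It also computes the accuracy of a trivial majority-class baseline for context.

```python
# Per-class precision, recall and support copied from the Twitter-RoBERTa report above
report = {
    "negative": {"precision": 0.82, "recall": 0.43, "support": 3819},
    "neutral":  {"precision": 0.48, "recall": 0.86, "support": 5515},
    "positive": {"precision": 0.79, "recall": 0.39, "support": 5075},
}

total = sum(r["support"] for r in report.values())
implied = {}
for label, r in report.items():
    true_positives = r["recall"] * r["support"]       # recall = TP / support
    implied[label] = true_positives / r["precision"]   # precision = TP / predicted
    print(f"{label:>8}: true {r['support']:5d}, predicted ~{implied[label]:6.0f}")

print(f"sum of implied predictions: ~{sum(implied.values()):.0f} of {total}")

# Accuracy of always answering the most frequent class
majority = max(report, key=lambda k: report[k]["support"])
print(f"majority baseline ({majority}): {report[majority]['support'] / total:.4f}")
```

Under these rounded figures the model emits roughly 9,900 neutral labels for 5,515 truly neutral rows, and the three estimates sum to within about 20 of the 14,409 evaluated rows, which is a useful consistency check on the arithmetic. Always predicting neutral would score about 0.38, so the model does clear the trivial baseline.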

Next, we evaluate a domain-specific model, FinBERT (ProsusAI/finbert), a BERT model further pretrained on financial text and fine-tuned for sentiment classification on the Financial PhraseBank dataset. Using the same pipeline approach, we generate sentiment predictions for all headlines. Because FinBERT already outputs positive, negative, and neutral labels, no mapping step is needed and we compare its predictions directly with the true labels. We then compute accuracy and a classification report to assess how FinBERT performs relative to the general-purpose Twitter-RoBERTa model.

In [None]:
# Load the FinBERT model
sentiment_pipeline = pipeline("sentiment-analysis", model="ProsusAI/finbert", tokenizer="ProsusAI/finbert", device=0)  # device=0 uses the first GPU; set device=-1 to run on CPU

# Run predictions
titles = flat_df['Title'].tolist()
predictions = sentiment_pipeline(titles, batch_size=32)

# Extract predicted labels (they will already be: 'positive', 'neutral', 'negative')
pred_labels = [p['label'].lower() for p in predictions]

# Store predictions
flat_df['Predicted_Sentiment'] = pred_labels

# Evaluate performance (accuracy_score and classification_report were imported at the top)
accuracy = accuracy_score(flat_df['Sentiment'], flat_df['Predicted_Sentiment'])
print(f"Accuracy: {accuracy:.4f}")

print("\nClassification Report:")
print(classification_report(flat_df['Sentiment'], flat_df['Predicted_Sentiment']))


Device set to use cuda:0


Accuracy: 0.6651

Classification Report:
              precision    recall  f1-score   support

    negative       0.69      0.71      0.70      3819
     neutral       0.60      0.69      0.64      5515
    positive       0.73      0.60      0.66      5075

    accuracy                           0.67     14409
   macro avg       0.68      0.67      0.67     14409
weighted avg       0.67      0.67      0.67     14409



Running FinBERT for the first time downloads its weights and tokenizer files from the Hugging Face Hub (about 438 MB), which adds a one-time startup cost. FinBERT is a BERT-base model, so it is similar in size to the RoBERTa-base model used above rather than larger. On GPU, FinBERT reaches an accuracy of 0.6651, about 8.5 percentage points higher than Twitter-RoBERTa (0.5798), and its macro-averaged F1-score rises from 0.57 to 0.67. Precision and recall are also more balanced across classes (all between 0.60 and 0.73), suggesting that FinBERT handles financial phrasing better than the general-purpose model. This highlights the advantage of using domain-specific pretrained models for specialized tasks such as financial news sentiment analysis.
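
Both pipelines classify the whole headline, so every aspect in a headline receives the same predicted label. When a headline's aspects carry different gold labels (for example, one company described positively and another negatively), at most one of them can be matched, which caps the accuracy any headline-level model can reach in this aspect-level evaluation. A minimal sketch of that ceiling follows, using a hypothetical helper `title_level_ceiling` on toy data. Applied to `flat_df`, it would give the upper bound for both Twitter-RoBERTa and FinBERT under this setup. Grouping by `Title` assumes identical headline strings belong to the same headline; if duplicates exist, carrying a per-headline identifier (such as the `S No.` column) through the flattening step and passing it as `group_col` would be more precise.

```python
import pandas as pd

def title_level_ceiling(flat: pd.DataFrame, group_col: str = "Title",
                        label_col: str = "Sentiment") -> float:
    """Best accuracy any model that emits one label per headline could reach."""
    # For each headline, the best single label matches its most frequent aspect label
    best_per_title = flat.groupby(group_col)[label_col].agg(lambda s: s.value_counts().max())
    return best_per_title.sum() / len(flat)

# Hypothetical toy rows: headline "B" has two aspects with opposite sentiment
toy = pd.DataFrame({
    "Title": ["A", "B", "B"],
    "Sentiment": ["neutral", "positive", "negative"],
})
print(title_level_ceiling(toy))
```

On the toy data only two of the three aspect rows can ever be predicted correctly, so the ceiling is 2/3 regardless of which model is used.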
