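### Clone Reference Repository
The StockEmotions GitHub repository provides the financial sentiment data and models used as a reference for this project. The clone command below is disabled because the required files are already included in the project under `dataset/StockEmotions`.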

In [None]:
#!git clone https://github.com/adlnlp/StockEmotions.git ## remove this line

## COMMENT: It is better for your project not to depend on an external repository that you do not own.
## COMMENT: Moreover, if that repository changes, you will have to update your project.
## COMMENT: I have included those files in your project: dataset/StockEmotions is a plain folder, without the .git folder of the external repository.

This cell loads the tweet-level sentiment data from the local copy of StockEmotions (train, val, and test splits).
It keeps the relevant columns, renames them for consistency, concatenates the splits, parses the dates, and maps the sentiment labels to the model's label names (bullish → positive, bearish → negative).

In [None]:
import pandas as pd
from pathlib import Path

base = Path("dataset/StockEmotions/tweet")
df_list = []
for split in ["train_stockemo", "val_stockemo", "test_stockemo"]:  # join the training, validation, and test splits
    path = base / f"{split}.csv"
    df = pd.read_csv(path)
    df = df[['date', 'ticker', 'original', 'senti_label']]
    # Rename without inplace=True to avoid modifying a column selection in place
    df = df.rename(columns={'original': 'text', 'senti_label': 'sentiment'})
    df_list.append(df)
df = pd.concat(df_list, ignore_index=True)
df['date'] = pd.to_datetime(df['date']).dt.date
print("Sentiment distribution:", df['sentiment'].value_counts())
# ALERT: check the key -> value direction of this dict. The keys must be the raw
# dataset labels ('bullish'/'bearish'); the values are the label names used in the next cells.
map_labels = {'bullish': 'positive', 'bearish': 'negative'}
df['sentiment'] = df['sentiment'].map(map_labels)
df.head()


Sentiment distribution: sentiment
bullish    5474
bearish    4526
Name: count, dtype: int64


Unnamed: 0,date,ticker,text,sentiment
0,2020-01-01,AMZN,$AMZN Dow futures up by 100 points already 🥳,1
1,2020-01-01,TSLA,$TSLA Daddy's drinkin' eArly tonight! Here's t...,1
2,2020-01-01,AAPL,$AAPL We’ll been riding since last December fr...,1
3,2020-01-01,TSLA,"$TSLA happy new year, 2020, everyone🍷🎉🙏",1
4,2020-01-01,TSLA,"$TSLA haha just a collection of greats...""Mars...",1
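
Because `Series.map` returns NaN for any value that is not a key of the dictionary, a quick check of the label vocabulary before mapping can catch a dictionary written in the wrong direction. The following is a minimal sketch in plain Python; the helper name `unmapped_labels` and the sample values are hypothetical and only illustrate the idea.

```python
def unmapped_labels(values, mapping):
    """Return the distinct values that have no key in `mapping`."""
    return sorted(set(values) - set(mapping), key=str)

map_labels = {'bullish': 'positive', 'bearish': 'negative'}

# Illustrative raw labels, assumed to be strings as in the value_counts output.
missing = unmapped_labels(['bullish', 'bearish', 'bullish'], map_labels)

# If the column held 0/1 instead, every value would be unmapped.
missing_numeric = unmapped_labels([0, 1, 1], map_labels)
```

An empty result means every label has a mapping; a non-empty one lists the values that `map` would turn into NaN.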


This cell reports the dataset size, the sentiment distribution, and the number and names of the unique tickers.

In [None]:
print("Samples in the dataset:", len(df))
print("Sentiment distribution:", df['sentiment'].value_counts())
print("Companies in the dataset:", df['ticker'].nunique())
print("\t+ in the dataset:", df['ticker'].unique())

Samples in the dataset: 10000
Companies in the dataset: 37
	+ in the dataset: ['AMZN' 'TSLA' 'AAPL' 'HD' 'NVDA' 'GOOGL' 'NFLX' 'FB' 'DIS' 'BA' 'WMT'
 'TSM' 'BABA' 'V' 'SBUX' 'BAC' 'UNH' 'XOM' 'MSFT' 'GOOG' 'PFE' 'CVX'
 'PYPL' 'MCD' 'JPM' 'NKE' 'BKNG' 'CCL' 'BRK.B' 'MA' 'JNJ' 'AMT' 'LOW' 'KO'
 'UPS' 'PG' 'ABNB']
Sentiment distribution: sentiment
1    5474
0    4526
Name: count, dtype: int64


Loads daily price data for each stock ticker and stores them in a dictionary.



In [None]:
price_dir = Path("dataset/StockEmotions/price")
price_dfs = {}
# Load price data only for the companies present in the tweets (instead of price_dir.glob("*.csv"))
for ticker in df['ticker'].unique():
    file = price_dir / f"{ticker.replace('.', '-')}.csv"  # special case: BRK.B is stored as BRK-B.csv
    dfp = pd.read_csv(file, parse_dates=['Date'])
    dfp['date'] = dfp['Date'].dt.date
    dfp.set_index('date', inplace=True)
    price_dfs[ticker] = dfp[['Open', 'Close', 'Adj Close', 'High', 'Low', 'Volume']]  # keep all available price columns

print("Number of companies loaded in the dataset:", len(price_dfs))
# Double check: an earlier version of this loop produced an error for one ticker
assert len(price_dfs) == df['ticker'].nunique()


Number of companies loaded in the dataset: 37
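
The ticker-to-filename rule used above (dots replaced by hyphens, so that `BRK.B` is read from `BRK-B.csv`) can be isolated in a small helper that also lists missing files before any CSV is read. This is only a sketch; `price_file_for` and `missing_price_files` are hypothetical names, not part of the project.

```python
from pathlib import Path

def price_file_for(ticker, price_dir):
    """Map a ticker to its CSV path; dots become hyphens (BRK.B -> BRK-B.csv)."""
    return Path(price_dir) / f"{ticker.replace('.', '-')}.csv"

def missing_price_files(tickers, price_dir):
    """Return the tickers whose price file does not exist on disk."""
    return [t for t in tickers if not price_file_for(t, price_dir).exists()]
```

Calling `missing_price_files(df['ticker'].unique(), price_dir)` before the loading loop would report naming mismatches up front instead of failing inside `pd.read_csv`.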


In [17]:
price_dfs["AMZN"]

date,Open,Close,Adj Close,High,Low,Volume
2019-12-31,92.099998,92.391998,92.391998,92.663002,91.611504,50130000
2020-01-02,93.750000,94.900497,94.900497,94.900497,93.207497,80580000
2020-01-03,93.224998,93.748497,93.748497,94.309998,93.224998,75288000
2020-01-06,93.000000,95.143997,95.143997,95.184502,93.000000,81236000
2020-01-07,95.224998,95.343002,95.343002,95.694504,94.601997,80898000
...,...,...,...,...,...,...
2020-12-24,159.695007,158.634506,158.634506,160.100006,158.449997,29038000
2020-12-28,159.699997,164.197998,164.197998,165.199997,158.634506,113736000
2020-12-29,165.496994,166.100006,166.100006,167.532501,164.061005,97458000
2020-12-30,167.050003,164.292496,164.292496,167.104996,164.123505,64186000


#### Filter by Available Price Data
Removes tweets for tickers with no matching price history.

In [None]:
tickers = set(df['ticker'])
available = set(price_dfs.keys())
print("In tweets but missing price data:", tickers - available)
df = df[df['ticker'].isin(available)].reset_index(drop=True)  # ALERT: BRK.B used to be reported as missing because its file is named BRK-B.csv; this is now solved

In tweets but missing price data: set()


### Load Pretrained Sentiment Model
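Loads the `cardiffnlp/twitter-roberta-base-sentiment` model and tokenizer to classify tweet sentiment. The function and column names in the following cells keep the `finbert` prefix, although the model loaded here is not FinBERT.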

In [21]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cardiffnlp/twitter-roberta-base-sentiment"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
labels = ['negative', 'neutral', 'positive']


### Define Sentiment Prediction Function
This function returns the predicted label and a sentiment score in [-1, 1], computed as P(positive) minus P(negative) from the model's softmax probabilities.

In [22]:
def infer_finbert_sentiment(text):
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)[0].numpy()
    label = labels[probs.argmax()]
    score = probs[2] - probs[0]  # positive - negative → score ∈ [-1, 1]
    return label, float(score)
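
To make the score concrete, here is a small worked example in plain Python. The logits are made up for illustration and the `softmax` helper is a hand-written stand-in for `torch.softmax`; only the formula `probs[2] - probs[0]` comes from the function above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the order [negative, neutral, positive].
probs = softmax([-1.0, 0.5, 2.0])
score = probs[2] - probs[0]  # approaches +1 when positive dominates, -1 when negative dominates

# When neutral dominates, both outer probabilities are small and the score stays near 0.
neutral_probs = softmax([0.2, 3.0, 0.0])
neutral_score = neutral_probs[2] - neutral_probs[0]
```

This is why a tweet can receive the label `neutral` together with a slightly negative score, as in the third row of the output shown later in the notebook.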


### Run Sentiment Inference on Tweets
Applies the sentiment model to each tweet and stores the predicted label and score.

In [None]:
from tqdm import tqdm
import pandas as pd

# df = df.dropna(subset=["text"]).copy().reset_index(drop=True)  # why copy and drop rows here?
textserie = df["text"]  # better: use the text column directly, without removing any data

labels_list = []
scores_list = []

for i,text in tqdm(enumerate(textserie)):
    label, score = infer_finbert_sentiment(text)
    labels_list.append(label)
    scores_list.append(score)

df['finbert_label'] = labels_list  # this is safe because the index and row order of df have not changed
df['finbert_score'] = scores_list


10000it [09:59, 16.67it/s]
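
The loop above calls the model once per tweet, which took about ten minutes for 10,000 tweets according to the progress bar. One common option is to group the texts into batches. The sketch below shows only the chunking logic; `predict_batch` is a hypothetical callable (list of texts in, list of `(label, score)` pairs out) standing in for a batched tokenizer and model call, and no speed-up has been measured here.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def run_in_batches(texts, predict_batch, batch_size=32):
    """Apply `predict_batch` batch by batch and return the results in input order."""
    results = []
    for batch in chunks(list(texts), batch_size):
        results.extend(predict_batch(batch))
    return results
```

In an actual batched call, the tokenizer would likely need `padding=True` so that tweets of different lengths fit into one tensor.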


### Save Results
Exports the enriched dataset, including the predicted label and score (`finbert_label`, `finbert_score`), to a CSV file.

In [27]:
df.to_csv("results/exp0/tweets_with_finbert_sentiment.csv", index=False) 
# It is good practice to save results in a dedicated folder: these temporary files belong to your experiments, not to the core project,
# and they are easier to remove or clean up.

## Experiment - option 1 ?


Load the ground truth (senti_label)

In [None]:
## BETTER: we do not lose any data, so the reload below can stay commented out
## Everything is already merged in the original df


# base = Path("StockEmotions/tweet") 
# df_list = []

# for split in ["train_stockemo", "val_stockemo", "test_stockemo"]:
#     path = base / f"{split}.csv"
#     df = pd.read_csv(path)
#     df = df[['date','ticker','original','senti_label']]
#     df.rename(columns={'original': 'text'}, inplace=True)
#     df_list.append(df) 

# df_gold = pd.concat(df_list, ignore_index=True)
# df_gold['date'] = pd.to_datetime(df_gold['date']).dt.date


# df_pred = pd.read_csv("results/exp0/tweets_with_finbert_sentiment.csv")
# df_pred['date'] = pd.to_datetime(df_pred['date']).dt.date

# # df = pd.merge(df_pred, df_gold, on=['date', 'ticker', 'text'], how='inner')

# # Check that everything is properly aligned
# df[['text', 'senti_label', 'finbert_label', 'finbert_score']].head()



# df = pd.read_csv("results/exp0/tweets_with_finbert_sentiment.csv")
# print(df.head())
# df["sentiment"] = df["sentiment"].map({1: 'positive', 0: 'negative'})
print(df.head())

         date ticker                                               text  \
0  2020-01-01   AMZN       $AMZN Dow futures up by 100 points already 🥳   
1  2020-01-01   TSLA  $TSLA Daddy's drinkin' eArly tonight! Here's t...   
2  2020-01-01   AAPL  $AAPL We’ll been riding since last December fr...   
3  2020-01-01   TSLA            $TSLA happy new year, 2020, everyone🍷🎉🙏   
4  2020-01-01   TSLA  $TSLA haha just a collection of greats..."Mars...   

   sentiment finbert_label  finbert_score  
0          1      positive       0.912322  
1          1      positive       0.869287  
2          1       neutral      -0.088177  
3          1      positive       0.978799  
4          1      positive       0.886038  

In [None]:

from sklearn.metrics import accuracy_score, classification_report
# Keep only the cases where finbert_label is positive or negative
# df_strict = df[df['finbert_label'].isin(['positive', 'negative'])].copy()
# map_labels = {'positive': 'bullish', 'negative': 'bearish'}
# df_strict['finbert_mapped'] = df_strict['finbert_label'].map(map_labels)

# Evaluate
### ALERT: Do not create more DataFrames holding the same data
acc1 = accuracy_score(df['sentiment'], df['finbert_label']) 
print(f"Accuracy: {acc1:.4f}") 
# print(classification_report(df['sentiment'], df['finbert_label']))
print(classification_report(df['sentiment'], df['finbert_label'], labels=['positive', 'negative'], zero_division=0,digits=4))




Accuracy: 0.4160
              precision    recall  f1-score   support

    positive     0.7011    0.4850    0.5734      5474
    negative     0.6356    0.3325    0.4366      4526

   micro avg     0.6759    0.4160    0.5150     10000
   macro avg     0.6683    0.4088    0.5050     10000
weighted avg     0.6714    0.4160    0.5115     10000



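### Note on the results:
- The scores are low with this approach: the F1-score is 0.5734 for positive and 0.4366 for negative, and the accuracy is 0.4160.
- This is normal at this stage, since every `neutral` prediction counts as an error against the binary ground truth; don't worry, we'll try to improve it.
- For now the result is no better than flipping a coin between positive and negative; the target is an accuracy above 50%.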


## Sentiment Model Evaluation – Option 1 (Non-Neutral Predictions Only)

We evaluated the FinBERT-based sentiment classification model on the subset of the dataset where predictions were either `positive` or `negative`, excluding `neutral` cases to focus on high-confidence sentiment detection.

###  Evaluation Metrics

| Sentiment | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| Bearish   | 0.63      | 0.57   | 0.60     | 2632    |
| Bullish   | 0.70      | 0.75   | 0.73     | 3507    |
| **Overall Accuracy** |       |        |          | **67.49%** |




In [None]:
df.head() # Lets see the data

Unnamed: 0,date,ticker,text,sentiment,finbert_label,finbert_score
0,2020-01-01,AMZN,$AMZN Dow futures up by 100 points already 🥳,positive,positive,0.912322
1,2020-01-01,TSLA,$TSLA Daddy's drinkin' eArly tonight! Here's t...,positive,positive,0.869287
2,2020-01-01,AAPL,$AAPL We’ll been riding since last December fr...,positive,neutral,-0.088177
3,2020-01-01,TSLA,"$TSLA happy new year, 2020, everyone🍷🎉🙏",positive,positive,0.978799
4,2020-01-01,TSLA,"$TSLA haha just a collection of greats...""Mars...",positive,positive,0.886038


In [46]:
# df['finbert_binary'] = df['finbert_score'].apply(lambda x: 'bullish' if x > 0 else 'bearish')
mask_binary =  df[df["finbert_label"]!="neutral"].index
print("Samples with positive or negative values:", len(mask_binary), "from a total of", len(df), ", rate:", len(mask_binary)/len(df))
# Evaluate
# df_all = df.dropna(subset=['senti_label', 'finbert_binary'])
# acc2 = accuracy_score(df_all['senti_label'], df_all['finbert_binary'])
# print(f"Option 2 – Accuracy (all): {acc2:.4f}")
# print(classification_report(df_all['senti_label'], df_all['finbert_binary']))

print("Option 2")
# Evaluate only the rows with a non-neutral prediction; select rows and column in a single .loc call
acc2 = accuracy_score(df.loc[mask_binary, 'sentiment'], df.loc[mask_binary, 'finbert_label'])
print(f"Accuracy (all): {acc2:.4f}")
print(classification_report(df.loc[mask_binary, 'sentiment'], df.loc[mask_binary, 'finbert_label'], labels=['positive', 'negative'], zero_division=0, digits=4))

Samples with positive or negative values: 6155 from a total of 10000 , rate: 0.6155
Option 2
Accuracy (all): 0.6759
              precision    recall  f1-score   support

    positive     0.7011    0.7547    0.7269      3518
    negative     0.6356    0.5707    0.6014      2637

    accuracy                         0.6759      6155
   macro avg     0.6683    0.6627    0.6641      6155
weighted avg     0.6730    0.6759    0.6731      6155



### Note on the results:
- The accuracy is better (0.6759), but only 6155 of the 10000 tweets are evaluated: the tweets predicted as neutral are lost.
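
The two accuracies reported so far are consistent with each other. Since the ground truth contains no neutral label, a neutral prediction can never be correct, so the full-dataset accuracy should equal the non-neutral accuracy multiplied by the share of non-neutral predictions. A quick check with the figures printed above:

```python
n_total = 10000           # tweets in the dataset
n_non_neutral = 6155      # tweets predicted as positive or negative
acc_non_neutral = 0.6759  # accuracy measured on that subset

coverage = n_non_neutral / n_total
acc_full = acc_non_neutral * coverage  # neutral predictions contribute no correct answers
print(f"coverage={coverage:.4f}, full accuracy={acc_full:.4f}")
```

The product reproduces the 0.4160 accuracy obtained when all 10000 tweets are scored.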

### Sorry, I don't understand the next code

##  Sentiment Model Evaluation – Option 2 (Full Dataset Using Score > 0)

In this evaluation, we used the entire dataset by converting the continuous `finbert_score` into binary sentiment:

- `bullish` if score > 0
- `bearish` if score <= 0

This includes all tweets, even those that may have had ambiguous or borderline sentiment.
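
The code for this option is not shown in the notebook, so the following is only a minimal sketch of the rule described above, written in plain Python with toy values. The helper names `score_to_label` and `accuracy` are hypothetical, and the scores and labels are illustrative, not taken from the dataset.

```python
def score_to_label(score):
    """Binary rule from the text: bullish if score > 0, otherwise bearish."""
    return 'bullish' if score > 0 else 'bearish'

def accuracy(gold, pred):
    """Fraction of positions where the gold and predicted labels agree."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Toy values for illustration only.
scores = [0.91, -0.09, 0.0, -0.75]
gold = ['bullish', 'bullish', 'bearish', 'bearish']

pred = [score_to_label(s) for s in scores]
acc = accuracy(gold, pred)
```

Note that a score of exactly 0 falls on the bearish side under this rule, and that the predicted and ground-truth labels must use the same vocabulary before they are compared.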

###  Evaluation Metrics

| Sentiment | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| Bearish   | 0.61      | 0.54   | 0.58     | 4522    |
| Bullish   | 0.65      | 0.71   | 0.68     | 5464    |
| **Overall Accuracy** |       |        |          | **63.66%** |



##  Conclusion

The sentiment model based on FinBERT performs reasonably well, especially on clearly positive (bullish) tweets. It shows limitations on bearish detection but remains a solid foundation for further development.

- **Final model score** (macro F1 average across both tests): **0.645**

##  Next Step

- Use sentiment scores to **generate trading signals**
- Plan a **comparison with other models** (FinGPT, LLMs with justification/explanation)
- Prepare for **adding explainability (XAI)** to better interpret each sentiment prediction