### Clone Reference Repository
This command clones the StockEmotions GitHub repository. The repository provides the labeled financial tweets (`tweet/`) and the daily stock price files (`price/`) that this project uses as a reference.

In [None]:
!git clone https://github.com/adlnlp/StockEmotions.git

This cell loads the tweet-level sentiment data from the StockEmotions repository (train, val, and test splits).
It keeps only the relevant columns, renames them for consistency, merges all splits, parses the dates, and maps the sentiment labels (0 → bearish, 1 → bullish).

In [13]:
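import pandas as pd
from pathlib import Path

base = Path("StockEmotions/tweet")
df_list = []
for split in ["train_stockemo", "val_stockemo", "test_stockemo"]:
    path = base / f"{split}.csv"
    # Keep only the needed columns and give them consistent names
    split_df = pd.read_csv(path)[['date', 'ticker', 'original', 'senti_label']]
    split_df = split_df.rename(columns={'original': 'text', 'senti_label': 'sentiment'})
    df_list.append(split_df)
df = pd.concat(df_list, ignore_index=True)
df['date'] = pd.to_datetime(df['date']).dt.date
# senti_label already holds the strings 'bearish'/'bullish' (see the merged output
# further down). Mapping strings with integer keys would turn every value into NaN,
# so only map when the labels are numeric codes.
if pd.api.types.is_numeric_dtype(df['sentiment']):
    df['sentiment'] = df['sentiment'].map({0: 'bearish', 1: 'bullish'})
df.head()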


|   | date | ticker | text | sentiment |
|---|------|--------|------|-----------|
| 0 | 2020-01-01 | AMZN | $AMZN Dow futures up by 100 points already 🥳 | |
| 1 | 2020-01-01 | TSLA | $TSLA Daddy's drinkin' eArly tonight! Here's t... | |
| 2 | 2020-01-01 | AAPL | $AAPL We’ll been riding since last December fr... | |
| 3 | 2020-01-01 | TSLA | $TSLA happy new year, 2020, everyone🍷🎉🙏 | |
| 4 | 2020-01-01 | TSLA | $TSLA haha just a collection of greats..."Mars... | |
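The label mapping in the loading cell is easy to get silently wrong. `Series.map` with a dict returns NaN for any value that is not a key. If `senti_label` already holds strings (the merged output further down shows values such as `bullish`), mapping it with integer keys blanks the whole column, which matches the empty `sentiment` column above. Below is a minimal sketch of a guarded mapping. `normalize_sentiment` is a hypothetical helper, and it assumes the only numeric encoding is 0 = bearish, 1 = bullish.

```python
import pandas as pd


def normalize_sentiment(series: pd.Series) -> pd.Series:
    """Map 0/1 codes to 'bearish'/'bullish'; leave string labels untouched.

    Hypothetical helper: assumes 0 = bearish and 1 = bullish when numeric.
    """
    if pd.api.types.is_numeric_dtype(series):
        return series.map({0: 'bearish', 1: 'bullish'})
    return series


string_labels = pd.Series(['bullish', 'bearish'])
int_labels = pd.Series([0, 1])

# Naive mapping of string labels with integer keys: every value becomes NaN
naive = string_labels.map({0: 'bearish', 1: 'bullish'})
print(naive.isna().all())

# Guarded mapping works for both encodings
print(normalize_sentiment(string_labels).tolist())
print(normalize_sentiment(int_labels).tolist())
```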


Shows the dataset size (rows, columns) and the number of unique tickers.

In [2]:
print(df.shape)
print(df['ticker'].nunique())

(10000, 4)
37


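Loads the daily price CSV for each ticker into a dictionary keyed by ticker symbol (taken from the file name). Each entry keeps the `Open`, `Close`, and `Adj Close` columns, indexed by date.
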
In [3]:
price_dir = Path("StockEmotions/price")
price_dfs = {}
for file in price_dir.glob("*.csv"):
    tk = file.stem
    dfp = pd.read_csv(file, parse_dates=['Date'])
    dfp['date'] = dfp['Date'].dt.date
    dfp.set_index('date', inplace=True)
    price_dfs[tk] = dfp[['Open','Close','Adj Close']]
len(price_dfs), list(price_dfs.keys())[:5]


(41, ['AAPL', 'ABNB', 'AMT', 'AMZN', 'BA'])

In [4]:
price_dfs["AMZN"]

| date | Open | Close | Adj Close |
|------|------|-------|-----------|
| 2019-12-31 | 92.099998 | 92.391998 | 92.391998 |
| 2020-01-02 | 93.750000 | 94.900497 | 94.900497 |
| 2020-01-03 | 93.224998 | 93.748497 | 93.748497 |
| 2020-01-06 | 93.000000 | 95.143997 | 95.143997 |
| 2020-01-07 | 95.224998 | 95.343002 | 95.343002 |
| ... | ... | ... | ... |
| 2020-12-24 | 159.695007 | 158.634506 | 158.634506 |
| 2020-12-28 | 159.699997 | 164.197998 | 164.197998 |
| 2020-12-29 | 165.496994 | 166.100006 | 166.100006 |
| 2020-12-30 | 167.050003 | 164.292496 | 164.292496 |
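Tweets and prices do not always share dates. The AMZN index above jumps from 2019-12-31 to 2020-01-02, while several tweets are dated 2020-01-01, a market holiday. A direct `price_dfs[tk].loc[day]` lookup would raise a `KeyError` for those tweets. Below is a minimal sketch of one possible convention: match each tweet to the first trading day on or after its date. This convention is an assumption, not something the notebook specifies, and `next_trading_close` is a hypothetical helper.

```python
from datetime import date
from typing import Optional, Tuple

import pandas as pd


def next_trading_close(prices: pd.DataFrame, day: date) -> Optional[Tuple[date, float]]:
    """Return (trading_day, close) for the first trading day on or after `day`.

    Hypothetical helper; assumes `prices` has a sorted date index and a
    'Close' column, like the frames stored in price_dfs.
    """
    trading_days = pd.to_datetime(prices.index)
    pos = trading_days.searchsorted(pd.Timestamp(day), side='left')
    if pos >= len(prices):
        return None  # no price data on or after this day
    return prices.index[pos], float(prices['Close'].iloc[pos])


# Toy frame built from the first AMZN rows shown above
prices = pd.DataFrame(
    {'Close': [92.391998, 94.900497, 93.748497]},
    index=[date(2019, 12, 31), date(2020, 1, 2), date(2020, 1, 3)],
)
print(next_trading_close(prices, date(2020, 1, 1)))  # holiday, falls forward to 2020-01-02
print(next_trading_close(prices, date(2020, 1, 4)))  # after the last row, so None
```

Rolling forward (rather than backward) means a tweet is only paired with prices from a session that had not closed yet when the tweet was posted. A backward match would instead pair it with the previous close.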


### Filter by Available Price Data
Removes tweets for tickers with no matching price history.

In [5]:
tickers = set(df['ticker'])
available = set(price_dfs.keys())
print("In tweets but missing price data:", tickers - available)
df = df[df['ticker'].isin(available)].reset_index(drop=True)

In tweets but missing price data: {'BRK.B'}


### Load Pretrained Sentiment Model
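Loads the `cardiffnlp/twitter-roberta-base-sentiment` model and its tokenizer to classify tweet sentiment into three classes (negative, neutral, positive). This is a RoBERTa model fine-tuned on general English tweets, not FinBERT. The `finbert_` prefix used in later function and column names does not refer to the FinBERT model.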

In [6]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cardiffnlp/twitter-roberta-base-sentiment"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
labels = ['negative', 'neutral', 'positive']




### Define Sentiment Prediction Function
This function returns two values. The first is the predicted label, which is the class with the highest probability. The second is a sentiment score in [-1, 1], computed as P(positive) - P(negative).

In [7]:
def infer_finbert_sentiment(text):
    # Despite the name, this uses the cardiffnlp Twitter-RoBERTa model loaded above
    inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=1)[0].numpy()
    label = labels[probs.argmax()]  # class with the highest probability
    score = probs[2] - probs[0]  # P(positive) - P(negative), so score ∈ [-1, 1]
    return label, float(score)
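To make the score concrete, here is a small NumPy sketch of the same computation applied to raw logits for several tweets at once. It follows the model's label order [negative, neutral, positive]. `scores_from_logits` is a hypothetical helper, and the logits are made up for illustration.

```python
import numpy as np


def scores_from_logits(logits: np.ndarray) -> np.ndarray:
    """Return P(positive) - P(negative) for each row of [neg, neu, pos] logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)  # row-wise softmax
    return probs[:, 2] - probs[:, 0]


logits = np.array([
    [0.0, 0.0, 0.0],   # no preference: score 0
    [-2.0, 0.0, 2.0],  # leans positive: score > 0
    [2.0, 0.0, -2.0],  # mirror image: same magnitude, negative sign
    [0.0, 5.0, 0.0],   # confidently neutral: score is also 0
])
scores = scores_from_logits(logits)
print(scores)
```

The last row shows that a score of 0 can mean either "torn between positive and negative" or "confidently neutral". This matters for Option 2 below, which maps every score <= 0 to bearish.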


### Run Sentiment Inference on Tweets
Applies the sentiment model to each tweet and stores the predicted label and score.

In [9]:
from tqdm import tqdm
import pandas as pd

df = df.dropna(subset=["text"]).copy().reset_index(drop=True)

labels_list = []
scores_list = []

for i in tqdm(range(len(df))):
    label, score = infer_finbert_sentiment(df.loc[i, 'text'])
    labels_list.append(label)
    scores_list.append(score)

df['finbert_label'] = labels_list
df['finbert_score'] = scores_list




100%|██████████| 9986/9986 [1:36:03<00:00,  1.73it/s]
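The loop above runs one forward pass per tweet, which took 1:36:03 for 9,986 tweets (about 1.73 it/s). Batching several tweets per forward pass is a common way to cut this time. Below is a minimal sketch of the batching logic only. It uses a stand-in predictor, so it runs without downloading a model. `chunks`, `run_in_batches`, `predict_batch`, and the batch size of 32 are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# A batch predictor maps a list of texts to (labels, scores) of the same length
BatchPredictor = Callable[[List[str]], Tuple[List[str], List[float]]]


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_in_batches(texts: Sequence[str], predict_batch: BatchPredictor,
                   batch_size: int = 32) -> Tuple[List[str], List[float]]:
    """Apply `predict_batch` chunk by chunk and concatenate results in order."""
    all_labels: List[str] = []
    all_scores: List[float] = []
    for batch in chunks(texts, batch_size):
        batch_labels, batch_scores = predict_batch(batch)
        all_labels.extend(batch_labels)
        all_scores.extend(batch_scores)
    return all_labels, all_scores


# With the notebook's tokenizer/model, a batch predictor could look like this
# (untested sketch; padding=True pads each batch to its longest tweet):
#
#   def predict_batch(batch):
#       inputs = tokenizer(batch, return_tensors='pt', padding=True,
#                          truncation=True, max_length=512)
#       with torch.no_grad():
#           probs = torch.softmax(model(**inputs).logits, dim=1).numpy()
#       return ([labels[i] for i in probs.argmax(axis=1)],
#               (probs[:, 2] - probs[:, 0]).tolist())

# Stand-in predictor that records the batch sizes it receives
seen_batch_sizes: List[int] = []


def fake_predict(batch: List[str]) -> Tuple[List[str], List[float]]:
    seen_batch_sizes.append(len(batch))
    return ['neutral'] * len(batch), [0.0] * len(batch)


out_labels, out_scores = run_in_batches(['a', 'b', 'c', 'd', 'e'], fake_predict, batch_size=2)
print(seen_batch_sizes)
```

In the notebook this would be called as `run_in_batches(df['text'].tolist(), predict_batch)`. Because the attention mask ignores padding tokens, batched predictions should match the per-tweet ones up to small floating-point differences.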


### Save Results
Exports the dataset, including the model's predicted labels and scores (columns `finbert_label` and `finbert_score`), to a CSV file.

In [10]:
df.to_csv("tweets_with_finbert_sentiment.csv", index=False)

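### Load Ground-Truth Labels (senti_label)
Reloads the three StockEmotions splits with their original `senti_label` column. It then joins them with the saved predictions on `date`, `ticker`, and `text`.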

In [17]:

base = Path("StockEmotions/tweet") 
df_list = []

for split in ["train_stockemo", "val_stockemo", "test_stockemo"]:
    path = base / f"{split}.csv"
    df = pd.read_csv(path)
    df = df[['date','ticker','original','senti_label']]
    df.rename(columns={'original': 'text'}, inplace=True)
    df_list.append(df)

df_gold = pd.concat(df_list, ignore_index=True)
df_gold['date'] = pd.to_datetime(df_gold['date']).dt.date


df_pred = pd.read_csv("tweets_with_finbert_sentiment.csv")
df_pred['date'] = pd.to_datetime(df_pred['date']).dt.date

df = pd.merge(df_pred, df_gold, on=['date', 'ticker', 'text'], how='inner')

# Check that everything is properly aligned
df[['text', 'senti_label', 'finbert_label', 'finbert_score']].head()




|   | text | senti_label | finbert_label | finbert_score |
|---|------|-------------|---------------|---------------|
| 0 | $AMZN Dow futures up by 100 points already 🥳 | bullish | positive | 0.912322 |
| 1 | $TSLA Daddy's drinkin' eArly tonight! Here's t... | bullish | positive | 0.869287 |
| 2 | $AAPL We’ll been riding since last December fr... | bullish | neutral | -0.088178 |
| 3 | $TSLA happy new year, 2020, everyone🍷🎉🙏 | bullish | positive | 0.978799 |
| 4 | $TSLA haha just a collection of greats..."Mars... | bullish | positive | 0.886038 |
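An inner join on `date`, `ticker`, and `text` has two silent failure modes. It drops rows that do not match, and it duplicates rows whose key appears more than once, for example the same tweet text posted twice on the same day for the same ticker. The counts happen to line up here: the Option 2 report below covers 9,986 tweets, the same number that were scored. pandas can still check this explicitly. Below is a minimal sketch on toy stand-ins for `df_pred` and `df_gold`, using the real `merge` options `validate` and `indicator`.

```python
import pandas as pd

keys = ['date', 'ticker', 'text']

# Toy stand-ins for df_pred and df_gold (the gold frame has one extra tweet)
pred = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01'],
    'ticker': ['AMZN', 'TSLA'],
    'text': ['$AMZN up', '$TSLA down'],
    'finbert_label': ['positive', 'negative'],
})
gold = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'ticker': ['AMZN', 'TSLA', 'AAPL'],
    'text': ['$AMZN up', '$TSLA down', '$AAPL flat'],
    'senti_label': ['bullish', 'bearish', 'bullish'],
})

# validate='one_to_one' raises MergeError if a key repeats on either side;
# indicator=True adds a '_merge' column telling where each row came from.
checked = pd.merge(pred, gold, on=keys, how='outer',
                   indicator=True, validate='one_to_one')
counts = checked['_merge'].value_counts()
print(counts)

# Keep only the matched rows, like the notebook's inner join
aligned = checked[checked['_merge'] == 'both'].drop(columns='_merge')

# A duplicated gold row makes the one-to-one check fail loudly
dup_gold = pd.concat([gold, gold.iloc[[0]]], ignore_index=True)
try:
    pd.merge(pred, dup_gold, on=keys, how='inner', validate='one_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```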


In [18]:
from sklearn.metrics import accuracy_score, classification_report

# Keep only rows where finbert_label is positive or negative
df_strict = df[df['finbert_label'].isin(['positive', 'negative'])].copy()
map_labels = {'positive': 'bullish', 'negative': 'bearish'}
df_strict['finbert_mapped'] = df_strict['finbert_label'].map(map_labels)

# Evaluate
acc1 = accuracy_score(df_strict['senti_label'], df_strict['finbert_mapped'])
print(f"Option 1 – Accuracy (non-neutral only): {acc1:.4f}")
print(classification_report(df_strict['senti_label'], df_strict['finbert_mapped']))




Option 1 – Accuracy (non-neutral only): 0.6749
              precision    recall  f1-score   support

     bearish       0.63      0.57      0.60      2632
     bullish       0.70      0.75      0.73      3507

    accuracy                           0.67      6139
   macro avg       0.67      0.66      0.66      6139
weighted avg       0.67      0.67      0.67      6139



## Sentiment Model Evaluation – Option 1 (Non-Neutral Predictions Only)

We evaluated the sentiment model (`cardiffnlp/twitter-roberta-base-sentiment`, whose outputs are stored in the `finbert_*` columns) on the tweets it labeled `positive` or `negative`. We mapped `positive` to bullish and `negative` to bearish. Tweets labeled `neutral` are excluded, so this evaluation covers only the 6,139 tweets (about 61.5% of 9,986) on which the model commits to a polarity.

### Evaluation Metrics

| Sentiment | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| Bearish   | 0.63      | 0.57   | 0.60     | 2632    |
| Bullish   | 0.70      | 0.75   | 0.73     | 3507    |
| Macro avg | 0.67      | 0.66   | 0.66     | 6139    |

**Overall accuracy:** 67.49% (6,139 tweets)
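Option 1 drops every tweet predicted as `neutral`, so its accuracy should be read together with its coverage. Here 6,139 of 9,986 tweets are kept, about 61.5%. Below is a minimal sketch of reporting both numbers together. `accuracy_with_coverage` is a hypothetical helper, shown on toy data.

```python
import pandas as pd


def accuracy_with_coverage(gold: pd.Series, pred: pd.Series, keep: pd.Series):
    """Return (accuracy on rows where keep is True, fraction of rows kept)."""
    accuracy = float((gold[keep] == pred[keep]).mean())
    coverage = float(keep.mean())
    return accuracy, coverage


# Toy example: four tweets, one of which the model labels 'neutral'
gold = pd.Series(['bullish', 'bearish', 'bullish', 'bearish'])
raw = pd.Series(['positive', 'negative', 'neutral', 'positive'])
keep = raw.isin(['positive', 'negative'])
# 'neutral' maps to NaN here, but that row is excluded by `keep`
pred = raw.map({'positive': 'bullish', 'negative': 'bearish'})

acc, cov = accuracy_with_coverage(gold, pred, keep)
print(f"accuracy={acc:.3f} on coverage={cov:.0%}")
```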




In [20]:
df['finbert_binary'] = df['finbert_score'].apply(lambda x: 'bullish' if x > 0 else 'bearish')

# Evaluate
df_all = df.dropna(subset=['senti_label', 'finbert_binary'])
acc2 = accuracy_score(df_all['senti_label'], df_all['finbert_binary'])
print(f"Option 2 – Accuracy (all): {acc2:.4f}")
print(classification_report(df_all['senti_label'], df_all['finbert_binary']))


Option 2 – Accuracy (all): 0.6366
              precision    recall  f1-score   support

     bearish       0.61      0.54      0.58      4522
     bullish       0.65      0.71      0.68      5464

    accuracy                           0.64      9986
   macro avg       0.63      0.63      0.63      9986
weighted avg       0.63      0.64      0.63      9986



## Sentiment Model Evaluation – Option 2 (Full Dataset Using Score > 0)

In this evaluation, we used all 9,986 tweets by converting the continuous `finbert_score` into a binary label:

- `bullish` if score > 0
- `bearish` if score <= 0

Unlike Option 1, this includes the tweets the model labeled `neutral`. For those tweets, only the sign of the score decides the label, even when the score is close to zero.

### Evaluation Metrics

| Sentiment | Precision | Recall | F1-Score | Support |
|-----------|-----------|--------|----------|---------|
| Bearish   | 0.61      | 0.54   | 0.58     | 4522    |
| Bullish   | 0.65      | 0.71   | 0.68     | 5464    |
| Macro avg | 0.63      | 0.63   | 0.63     | 9986    |

**Overall accuracy:** 63.66% (9,986 tweets)
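The predicted label is the argmax of the three probabilities, with ties going to the earlier class. A tweet labeled `positive` therefore always has P(positive) > P(negative), which gives a score > 0. A tweet labeled `negative` always has a score <= 0. So on the 6,139 non-neutral tweets, Option 2 assigns exactly the same labels as Option 1, and the whole gap between the two options comes from the 3,847 tweets labeled `neutral`. Below is a back-of-the-envelope sketch of the accuracy on that neutral subset. It assumes the printed accuracies are exact to four decimals, which pins down the integer counts.

```python
# Reported figures (from the two classification reports above)
n_all, acc_all = 9986, 0.6366        # Option 2: every tweet
n_strict, acc_strict = 6139, 0.6749  # Option 1: non-neutral predictions only

# Recover integer counts of correct predictions from the rounded accuracies
correct_all = round(acc_all * n_all)           # 6357
correct_strict = round(acc_strict * n_strict)  # 4143

# Option 2 agrees with Option 1 on non-neutral tweets, so the remaining
# correct predictions all come from the tweets the model labeled 'neutral'
n_neutral = n_all - n_strict                    # 3847
correct_neutral = correct_all - correct_strict  # 2214
acc_neutral = correct_neutral / n_neutral

print(f"Accuracy on 'neutral' tweets under score > 0: {acc_neutral:.3f}")
```

On the neutral-labeled tweets, the sign of the score is right only about 58% of the time, compared with 67.49% on the other tweets. That difference is what pulls overall accuracy down to 63.66%. For reference, always predicting `bullish` would score 5,464 / 9,986 ≈ 54.7% on the full set.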



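## Conclusion

The Twitter-RoBERTa sentiment model performs reasonably well for a general-purpose tweet classifier that was not specifically trained on financial text. It reaches 67.49% accuracy on the tweets it labels positive or negative (Option 1) and 63.66% when every tweet is forced into bullish or bearish (Option 2). It is stronger on bullish tweets (F1 0.73 and 0.68) than on bearish ones (F1 0.60 and 0.58), mainly because bearish recall is low (0.57 and 0.54). It is a usable baseline for further development.

- **Final model score** (mean of the macro F1 of Option 1, 0.66, and Option 2, 0.63): **0.645**. Note that the two options are evaluated on different subsets, 6,139 vs. 9,986 tweets.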

## Next Steps

- Use sentiment scores to **generate trading signals**
- Plan a **comparison with other models** (e.g., FinGPT, or LLMs that provide a justification or explanation for each prediction)
- Prepare for **adding explainability (XAI)** to better interpret each sentiment prediction
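Below is a minimal sketch for the first next step. It assumes a signal is derived from the mean `finbert_score` of each ticker's tweets on each day. The function name `daily_signals` and the ±0.1 dead band are hypothetical choices for illustration, not tuned values.

```python
import pandas as pd


def daily_signals(tweets: pd.DataFrame, threshold: float = 0.1) -> pd.DataFrame:
    """Average finbert_score per (date, ticker) and map it to +1 / 0 / -1.

    Hypothetical rule: +1 if the mean score is above `threshold`,
    -1 if it is below -threshold, 0 otherwise.
    """
    daily = (
        tweets.groupby(['date', 'ticker'], as_index=False)['finbert_score']
        .mean()
        .rename(columns={'finbert_score': 'mean_score'})
    )
    daily['signal'] = 0
    daily.loc[daily['mean_score'] > threshold, 'signal'] = 1
    daily.loc[daily['mean_score'] < -threshold, 'signal'] = -1
    return daily


# Toy tweets with made-up scores
tweets = pd.DataFrame({
    'date': ['2020-01-02', '2020-01-02', '2020-01-02', '2020-01-03'],
    'ticker': ['AMZN', 'AMZN', 'TSLA', 'AMZN'],
    'finbert_score': [0.9, 0.5, -0.8, 0.05],
})
print(daily_signals(tweets))
```

Tweets dated on non-trading days (such as 2020-01-01) would still need to be assigned to a trading session before these signals are compared with `price_dfs`.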