# Sentiment Analysis

This notebook applies NLP-based sentiment analysis techniques
(FinBERT and VADER) to preprocessed financial text data and
generates quantitative sentiment scores for downstream analysis.


In [1]:
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [2]:
df = pd.read_csv("../data/processed/text_preprocessed.csv", parse_dates=["timestamp"])
df.head()

| Unnamed: 0 | timestamp | text | source | asset | channel | clean_text |
|---|---|---|---|---|---|---|
| 0 | 2024-10-03 01:15:00+00:00 | Momentum Funds : Momentum funds with 4x rise i... | | NIFTY | news_gdelt | momentum funds momentum funds with x rise in a... |
| 1 | 2024-10-03 03:00:00+00:00 | Indian stock market : 10 key things that chang... | | NIFTY | news_gdelt | indian stock market key things that changed fo... |
| 2 | 2024-10-03 03:00:00+00:00 | Nifty 50 , Sensex today : What to expect from ... | | NIFTY | news_gdelt | nifty sensex today what to expect from indian ... |
| 3 | 2024-10-03 06:00:00+00:00 | Bitcoin Price Decline Forces $450M in Long Liq... | | BTC | news_gdelt | bitcoin price decline forces m in long liquida... |
| 4 | 2024-10-03 07:00:00+00:00 | Stock Market : शेयर बाजार में बड़ी गिरावट ... ... | | NIFTY | news_gdelt | stock market |


In [3]:
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")
model.eval()

vader = SentimentIntensityAnalyzer()

In [4]:
def finbert_score(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    )

    with torch.no_grad():
        outputs = model(**inputs)

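    probs = torch.softmax(outputs.logits, dim=1)[0]

    # Read the class indices from the model config instead of hard-coding them:
    # the label order is checkpoint-specific and is not guaranteed to be
    # negative, neutral, positive.
    label2id = {label.lower(): idx for label, idx in model.config.label2id.items()}

    # Signed score in [-1, 1]: P(positive) - P(negative)
    return (probs[label2id["positive"]] - probs[label2id["negative"]]).item()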
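
The score is the softmax probability of the positive class minus that of the negative class. To make the arithmetic concrete, here is a minimal sketch that reproduces it on hand-written logits without loading the model. The helper `signed_score`, the label mapping, and the logits are illustrative assumptions, not FinBERT outputs.

```python
import math


def signed_score(logits, label2id):
    """Softmax the logits, then return P(positive) - P(negative)."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[label2id["positive"]] - probs[label2id["negative"]]


# Hypothetical mapping and logits, for illustration only.
example_label2id = {"positive": 0, "negative": 1, "neutral": 2}
example_score = signed_score([2.0, 0.0, 1.0], example_label2id)
```

Because the three probabilities sum to 1, the result always lies in [-1, 1], and a confidently neutral text scores close to 0.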

In [5]:
def vader_score(text):
    return vader.polarity_scores(text)["compound"]
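
VADER's compound score is a normalized value in [-1, 1]. If a categorical label is needed downstream, the thresholds commonly suggested in the VADER documentation are +0.05 and -0.05. The helper below is a hypothetical sketch under that assumption; `vader_label` is not part of the library.

```python
def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

The threshold is a convention rather than a fitted value, so it may need tuning for financial headlines.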

In [6]:
df["finbert_score"] = df["clean_text"].apply(finbert_score)
df["vader_score"] = df["clean_text"].apply(vader_score)
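
Row-by-row `apply` runs one FinBERT forward pass per text, which can be slow on larger datasets. One possible alternative is to score texts in fixed-size batches. The sketch below shows only the chunking logic; `score_in_batches` and the `score_batch` callable are hypothetical names, and in this notebook `score_batch` would be a function that tokenizes a list of texts with `padding=True` and returns one score per text.

```python
from typing import Callable, List, Sequence


def score_in_batches(
    texts: Sequence[str],
    score_batch: Callable[[List[str]], List[float]],
    batch_size: int = 32,
) -> List[float]:
    """Apply a batch scoring function to texts in fixed-size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores: List[float] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_scores = score_batch(batch)
        # Guard against a scorer that drops or duplicates rows.
        if len(batch_scores) != len(batch):
            raise ValueError("score_batch must return one score per text")
        scores.extend(batch_scores)
    return scores


# Stand-in scorer so the sketch runs without the model: text length as a score.
demo_scores = score_in_batches(
    ["a", "bb", "ccc"],
    lambda batch: [float(len(t)) for t in batch],
    batch_size=2,
)
```

The output order matches the input order, so the result can be assigned directly to a DataFrame column.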

In [7]:
df.to_csv("../data/processed/text_with_sentiment.csv", index=False)
print("Sentiment scoring completed and saved.")

Sentiment scoring completed and saved.


## Observations

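- FinBERT, a BERT model fine-tuned on financial text, scores each text in context and produces smoother sentiment scores.
- VADER is lexicon- and rule-based, so its compound score is more sensitive to surface-level polarity words.
- The two scores differ most on finance-specific language, which a general-purpose lexicon may not capture.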
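
These observations can be checked numerically by comparing the two score columns. The sketch below computes a Pearson correlation and the share of rows where the scores point in opposite directions. The helper names, the 0.05 dead zone, and the demo values are assumptions for illustration; in the notebook the inputs would be `df["finbert_score"]` and `df["vader_score"]`.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def sign_disagreement_rate(xs, ys, threshold=0.05):
    """Share of rows where the scores have opposite signs beyond +/- threshold."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty, equal-length sequences")
    flips = sum(
        1
        for x, y in zip(xs, ys)
        if (x > threshold and y < -threshold) or (x < -threshold and y > threshold)
    )
    return flips / len(xs)


# Made-up scores, for illustration only.
finbert_demo = [0.8, -0.6, 0.1, -0.3]
vader_demo = [0.5, 0.4, 0.0, -0.2]
demo_flip_rate = sign_disagreement_rate(finbert_demo, vader_demo)
demo_corr = pearson(finbert_demo, vader_demo)
```

Grouping the same metrics by `asset` or `channel` would show where the two models diverge most.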
