# Task 3 — Sentiment Analysis & Correlation with Stock Returns

This notebook implements Task 3 of the Financial News & Stock Movement project:
- Sentiment analysis on financial news headlines  
- Aggregation of sentiment by stock and date  
- Calculation of daily stock returns  
- Correlation analysis between sentiment and price movement  


In [None]:
import pandas as pd
import numpy as np
from textblob import TextBlob
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")


## 1. Load Cleaned News Dataset

I use the cleaned news dataset produced in Task 1.  
It should contain the columns `headline`, `date`, `publisher`, `ticker`, and `headline_length`.


In [None]:
news_path = "../data/processed/news_cleaned.csv"
news = pd.read_csv(news_path)

# Convert date
news["date"] = pd.to_datetime(news["date"])
news.head()
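
Later cells depend on these columns, so a quick schema check can fail early with a clear message instead of a `KeyError` several cells later. This is a minimal sketch: `check_columns` and `REQUIRED_NEWS_COLUMNS` are hypothetical names, and the sample row is made up rather than read from the real CSV.

```python
import pandas as pd

# Columns the notebook expects in news_cleaned.csv (taken from the description above)
REQUIRED_NEWS_COLUMNS = ["headline", "date", "publisher", "ticker", "headline_length"]


def check_columns(df: pd.DataFrame, required: list) -> None:
    """Raise a ValueError listing every required column missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# Made-up one-row frame standing in for the real file
sample = pd.DataFrame({
    "headline": ["Stock rises on earnings beat"],
    "date": ["2020-06-01"],
    "publisher": ["Example Wire"],
    "ticker": ["AAPL"],
    "headline_length": [28],
})
check_columns(sample, REQUIRED_NEWS_COLUMNS)  # passes silently
```

In the notebook itself, the call would be `check_columns(news, REQUIRED_NEWS_COLUMNS)` right after `pd.read_csv`.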


## 2. Sentiment Analysis

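For each headline I compute:
- Polarity score: TextBlob polarity, from -1 (most negative) to +1 (most positive)
- Subjectivity score: from 0 (objective) to 1 (subjective); optional, and not computed in the cell below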


In [None]:
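def get_polarity(text):
    """Return TextBlob polarity in [-1, 1]; 0.0 for missing or non-string headlines."""
    if not isinstance(text, str):  # e.g. NaN from empty CSV cells
        return 0.0
    try:
        return TextBlob(text).sentiment.polarity
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        return 0.0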

news["sentiment"] = news["headline"].apply(get_polarity)
news.head()


## 3. Daily Sentiment Aggregation

Because a stock can have several headlines on the same day, I aggregate sentiment per ticker and calendar day, computing:

- mean sentiment  
- max sentiment  
- min sentiment  
- headline count


In [None]:
daily_sentiment = (
    news.groupby(["ticker", news["date"].dt.date])
    .agg(
        avg_sentiment=("sentiment", "mean"),
        min_sentiment=("sentiment", "min"),
        max_sentiment=("sentiment", "max"),
        news_count=("headline", "count")
    )
    .reset_index()
    .rename(columns={"date": "Date"})
)

daily_sentiment["Date"] = pd.to_datetime(daily_sentiment["Date"])
daily_sentiment.head()
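
To make the aggregation concrete, the sketch below wraps the same `groupby`/`agg` logic in a hypothetical helper, `aggregate_daily_sentiment`, and runs it on four made-up headlines. The two AAPL headlines published at different times on 2020-06-01 collapse into one row with `news_count` equal to 2.

```python
import pandas as pd


def aggregate_daily_sentiment(news_df):
    """Collapse headline-level sentiment to one row per (ticker, calendar day)."""
    daily = (
        news_df.groupby(["ticker", news_df["date"].dt.date])
        .agg(
            avg_sentiment=("sentiment", "mean"),
            min_sentiment=("sentiment", "min"),
            max_sentiment=("sentiment", "max"),
            news_count=("headline", "count"),
        )
        .reset_index()
        .rename(columns={"date": "Date"})
    )
    daily["Date"] = pd.to_datetime(daily["Date"])
    return daily


# Made-up headline-level data
news_demo = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "date": pd.to_datetime([
        "2020-06-01 09:30", "2020-06-01 15:00", "2020-06-02 10:00", "2020-06-01 12:00",
    ]),
    "headline": ["h1", "h2", "h3", "h4"],
    "sentiment": [0.5, -0.1, 0.0, 0.2],
})
daily_demo = aggregate_daily_sentiment(news_demo)
print(daily_demo)
```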


## 4. Load Technical Indicator Dataset

I merge the daily sentiment with the indicator-enhanced price data created in Task 2. This file must contain the `ticker`, `Date`, and `Close` columns used below.


In [None]:
price_path = "../data/processed/price_indicators.csv"
price = pd.read_csv(price_path)

price["Date"] = pd.to_datetime(price["Date"])
price.head()
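
Rows only match in the merge when `ticker` and `Date` are identical on both sides. The helper below is a hypothetical sketch, assuming tickers might differ in case or whitespace and dates might carry a time of day or a timezone; it upper-cases tickers and truncates dates to midnight, keeping the local calendar day for timezone-aware values.

```python
import pandas as pd


def normalize_merge_keys(df, ticker_col="ticker", date_col="Date"):
    """Return a copy with trimmed upper-case tickers and tz-naive midnight dates."""
    out = df.copy()
    out[ticker_col] = out[ticker_col].astype(str).str.strip().str.upper()
    dates = pd.to_datetime(out[date_col])
    if dates.dt.tz is not None:
        # Drop the timezone but keep local wall-clock time (and hence the local day)
        dates = dates.dt.tz_localize(None)
    # Midnight, e.g. 2020-06-01 15:30 -> 2020-06-01; ns resolution on both sides
    out[date_col] = dates.dt.normalize().astype("datetime64[ns]")
    return out


# Made-up rows whose keys would not match without normalization
price_demo = pd.DataFrame({"ticker": ["AAPL"], "Date": ["2020-06-01"], "Close": [100.0]})
sent_demo = pd.DataFrame({
    "ticker": [" aapl "],
    "Date": [pd.Timestamp("2020-06-01 15:30", tz="America/New_York")],
    "avg_sentiment": [0.3],
})
joined = pd.merge(
    normalize_merge_keys(price_demo), normalize_merge_keys(sent_demo),
    on=["ticker", "Date"], how="left",
)
print(joined)
```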


## 5. Merge Datasets on Ticker and Date

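A left join on `ticker` and `Date` keeps every trading day from the price data; days without headlines get NaN sentiment values. The result is one dataset containing: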
- sentiment features
- technical indicators
- prices
- volume


In [None]:
merged = pd.merge(price, daily_sentiment,
                  on=["ticker", "Date"],
                  how="left")

merged.head()
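
Because this is a left join on the price data, headline-days with no matching trading day (weekends, holidays, or ticker mismatches) are silently dropped. A minimal sketch of auditing this with `pd.merge(..., indicator=True)`, on made-up data where one headline-day falls on Saturday 2020-06-06:

```python
import pandas as pd

price_demo = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL"],
    "Date": pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-03"]),
    "Close": [100.0, 102.0, 101.0],
})
sent_demo = pd.DataFrame({
    "ticker": ["AAPL", "AAPL"],
    "Date": pd.to_datetime(["2020-06-01", "2020-06-06"]),  # 2020-06-06 is a Saturday
    "avg_sentiment": [0.3, -0.2],
})

# Same join as above, plus a _merge column saying where each row came from
merged_demo = pd.merge(price_demo, sent_demo, on=["ticker", "Date"], how="left", indicator=True)
print(merged_demo["_merge"].value_counts())

# Sentiment rows that found no trading day and therefore never reach merged_demo
audit = sent_demo.merge(price_demo[["ticker", "Date"]], on=["ticker", "Date"],
                        how="left", indicator=True)
dropped = audit[audit["_merge"] == "left_only"]
print(dropped)
```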


## 6. Daily Stock Returns

Daily return is the percentage change in closing price from the previous trading day, computed separately for each ticker:



In [None]:
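# Sort so each row is compared with the same ticker's previous trading day
merged = merged.sort_values(["ticker", "Date"]).reset_index(drop=True)
merged["daily_return"] = merged.groupby("ticker")["Close"].pct_change()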
merged.head()
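
Formally, for each ticker the return on trading day $t$ is $r_t = \text{Close}_t / \text{Close}_{t-1} - 1$, so the first row of every ticker is NaN. `pct_change` compares each row with the previous row in the current order, which is why sorting by ticker and date matters. A small sketch on made-up, deliberately shuffled prices, checked against the explicit formula:

```python
import pandas as pd

# Made-up closes, deliberately out of order
demo = pd.DataFrame({
    "ticker": ["MSFT", "AAPL", "AAPL", "MSFT", "AAPL"],
    "Date": pd.to_datetime(["2020-06-02", "2020-06-02", "2020-06-01", "2020-06-01", "2020-06-03"]),
    "Close": [189.0, 110.0, 100.0, 180.0, 99.0],
})

# Sort first: pct_change works on row order, not on the Date values
demo = demo.sort_values(["ticker", "Date"]).reset_index(drop=True)
demo["daily_return"] = demo.groupby("ticker")["Close"].pct_change()

# Same result written out explicitly: Close_t / Close_{t-1} - 1 within each ticker
manual = demo["Close"] / demo.groupby("ticker")["Close"].shift(1) - 1
print(demo.assign(manual=manual))
```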


## 7. Correlation Between Sentiment and Returns

I examine the relationship between `avg_sentiment` and `daily_return` using:
- Pearson correlation  
- Scatter plots  
- 7-day rolling correlation  


In [None]:
# Pearson correlation (pandas default); rows with NaN in either column are excluded
corr = merged[["avg_sentiment", "daily_return"]].corr()
corr
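
The cell above gives one Pearson coefficient pooled over all tickers and days; it does not cover the 7-day rolling correlation listed in this section. One option is a per-ticker rolling window. This is a sketch under two assumptions: the window counts rows (trading days) rather than calendar days, and at least `min_periods=5` complete pairs are required before a value is reported. `rolling_sentiment_corr` is a hypothetical helper name, and the demo data are made up.

```python
import numpy as np
import pandas as pd


def rolling_sentiment_corr(df, window=7, min_periods=5):
    """Per-ticker rolling Pearson correlation of avg_sentiment vs daily_return.

    Windows count rows (trading days), not calendar days. Rows where either
    value is NaN are skipped within each window.
    """
    ordered = df.sort_values(["ticker", "Date"])
    pieces = [
        g["avg_sentiment"].rolling(window, min_periods=min_periods).corr(g["daily_return"])
        for _, g in ordered.groupby("ticker")
    ]
    # Align back to the caller's index so the result can be assigned as a column
    return pd.concat(pieces).reindex(df.index)


# Made-up data: AAPL returns move with sentiment, MSFT returns move against it
sent = np.array([0.1, -0.2, 0.3, 0.0, 0.5, -0.4, 0.2, 0.1, -0.3, 0.4])
dates = pd.date_range("2020-06-01", periods=len(sent), freq="B")
demo = pd.concat([
    pd.DataFrame({"ticker": "AAPL", "Date": dates, "avg_sentiment": sent,
                  "daily_return": 0.02 * sent + 0.001}),
    pd.DataFrame({"ticker": "MSFT", "Date": dates, "avg_sentiment": sent,
                  "daily_return": -0.01 * sent}),
], ignore_index=True)
demo["rolling_corr"] = rolling_sentiment_corr(demo)
print(demo[["ticker", "Date", "rolling_corr"]])
```

On the real data this would be `merged["rolling_corr"] = rolling_sentiment_corr(merged)`. Since many days have no headlines, a short window may often have too few complete pairs and report NaN.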


## 8. Visualizations

I create:
- sentiment distribution plot  
- scatter plot sentiment vs return  
- rolling correlation (optional)


In [None]:
plt.figure(figsize=(7,5))
sns.scatterplot(data=merged, x="avg_sentiment", y="daily_return")
plt.title("Sentiment vs Daily Return")
plt.show()
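
The cell above draws only the scatter plot. The sentiment distribution listed in this section could be drawn with a plain matplotlib histogram. This is a sketch: `plot_sentiment_distribution` is a hypothetical helper that assumes headline-level scores in a `sentiment` column (as in the `news` frame), and it fixes the bin range to [-1, 1] because TextBlob polarity is bounded. `sns.histplot` would be an equivalent seaborn choice.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sentiment_distribution(news_df, col="sentiment", bins=30):
    """Histogram of headline-level polarity; returns the Axes for further styling."""
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.hist(news_df[col].dropna(), bins=bins, range=(-1, 1), edgecolor="black")
    ax.set_xlabel("TextBlob polarity")
    ax.set_ylabel("Number of headlines")
    ax.set_title("Distribution of headline sentiment")
    return ax


# Made-up scores; in the notebook: plot_sentiment_distribution(news); plt.show()
demo = pd.DataFrame({"sentiment": [-0.5, 0.0, 0.0, 0.2, 0.8]})
ax = plot_sentiment_distribution(demo, bins=20)
```

Headline polarity is often concentrated at exactly 0 (headlines with no opinion words), so a tall central bar is expected rather than a sign of a bug.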


## 9. Save Final Merged Dataset
It is used for modeling in the next stage.


In [None]:
output_path = "../data/processed/final_merged_dataset.csv"
merged.to_csv(output_path, index=False)

output_path
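
CSV does not store dtypes, so when the next stage reloads this file, `Date` comes back as plain text unless it is parsed again. A minimal round-trip sketch, using a temporary directory and a made-up one-row frame instead of the real output path; `save_and_reload` is a hypothetical helper.

```python
import os
import tempfile

import pandas as pd


def save_and_reload(df, path):
    """Write df to CSV and read it back with Date parsed as datetime."""
    df.to_csv(path, index=False)
    return pd.read_csv(path, parse_dates=["Date"])


# Made-up row standing in for the merged dataset
demo = pd.DataFrame({
    "ticker": ["AAPL"],
    "Date": pd.to_datetime(["2020-06-01"]),
    "daily_return": [0.01],
})
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(demo, os.path.join(tmp, "final_merged_dataset.csv"))
print(reloaded.dtypes)
```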


# ✅ Task 3 Completed

At this point the notebook has produced:
- Sentiment scores  
- Aggregated daily sentiment  
- Technical indicators  
- Daily returns  
- Correlation analysis  
- Fully merged dataset for modeling  

