# 📊 Multi-Source Data Aggregation for Market Context Modeling

This notebook builds a consolidated pipeline for collecting and merging:

- **Market data** from Yahoo Finance  
- **Macroeconomic indicators** from the FRED API  
- **Sentiment signals** from Google Trends  

These sources power downstream analysis and modeling for the Pendle project.

## 📦 Setup & Imports

In [None]:
import sys, os
import pandas as pd
sys.path.append(os.path.abspath("../scripts"))

## 📈 Market Data: Yahoo Finance

Daily OHLCV data for:
- `^GSPC`, `^IXIC`, `^VIX`, `BTC-USD`  
📁 Output: `../data/raw/yahoo_market_data.csv`

In [None]:
from fetch_yahoo import fetch_market_data

# ⚠️ UNCOMMENT TO RUN ONCE
# market_df = fetch_market_data(start="2000-01-01", save_path="../data/raw/yahoo_market_data.csv")

# ✅ Load from cache
market_df = pd.read_csv("../data/raw/yahoo_market_data.csv", index_col=0, parse_dates=True)
print("📅 Yahoo Finance Date Range:", market_df.index.min(), "→", market_df.index.max())
market_df.head()
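
The "uncomment to run once" pattern above can also be expressed as a small cache-or-fetch helper. The sketch below is illustrative only: `load_or_fetch` and `fake_fetch` are hypothetical names that do not exist in the project's scripts, and the demo uses a stand-in fetcher so it runs without network access.

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(path, fetch_fn):
    """Return the cached CSV at `path`; call `fetch_fn` only if it is missing.

    Hypothetical helper (not part of ../scripts). `fetch_fn` is any
    zero-argument callable returning a DataFrame with a date index.
    """
    if os.path.exists(path):
        return pd.read_csv(path, index_col=0, parse_dates=True)
    df = fetch_fn()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    df.to_csv(path)
    return df


# Demo with a stand-in fetcher and a temporary directory.
calls = []


def fake_fetch():
    calls.append(1)  # record each real "download"
    idx = pd.date_range("2024-01-01", periods=3, freq="D", name="Date")
    return pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "raw", "demo.csv")
    first = load_or_fetch(cache_path, fake_fetch)   # fetches and writes the cache
    second = load_or_fetch(cache_path, fake_fetch)  # reads the cache, no fetch

print(len(calls), second.shape)
```

With the notebook's fetcher this could be called as `load_or_fetch(path, lambda: fetch_market_data(start="2000-01-01", save_path=path))`, assuming `fetch_market_data` returns the DataFrame it saves.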

## 📊 Macroeconomic Indicators: FRED

Selected indicators:
- Bond Yields (`DGS10`)
- Inflation (`CPIAUCSL`)
- Unemployment (`UNRATE`)
- Interest Rates (`FEDFUNDS`)
- Consumer Sentiment (`UMCSENT`)
- GDP (`GDP`)  
📁 Output: `../data/raw/fred_macro_data.csv`

In [None]:
from fetch_fred import fetch_fred_series

fred_series = {
    "Bond Yields": "DGS10",
    "Inflation": "CPIAUCSL",
    "Unemployment": "UNRATE",
    "Interest Rate": "FEDFUNDS",
    "Consumer Sentiment": "UMCSENT",
    "GDP": "GDP"
}

# ⚠️ UNCOMMENT TO RUN ONCE
# macro_df = fetch_fred_series(fred_series, start="2000-01-01", save_path="../data/raw/fred_macro_data.csv")

# ✅ Load from cache
macro_df = pd.read_csv("../data/raw/fred_macro_data.csv", index_col=0, parse_dates=True)
print("📅 FRED Date Range:", macro_df.index.min(), "→", macro_df.index.max())
macro_df.head()
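
The selected FRED series are published at different frequencies (`DGS10` is daily, `GDP` is quarterly, the others are monthly), so a combined frame is expected to be sparse. A minimal sketch of a per-column coverage check follows; `coverage_summary` is a hypothetical helper, and the demo frame is synthetic with column names assumed to match the series IDs.

```python
import pandas as pd


def coverage_summary(df):
    """Per-column first/last valid date and non-null count (hypothetical helper)."""
    rows = {
        col: {
            "first_valid": df[col].first_valid_index(),
            "last_valid": df[col].last_valid_index(),
            "non_null": int(df[col].notna().sum()),
        }
        for col in df.columns
    }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic stand-in for the cached macro frame (values are made up).
daily = pd.Series(4.0, index=pd.bdate_range("2024-01-01", "2024-03-29"), name="DGS10")
monthly = pd.Series(
    [300.0, 301.0, 302.0],
    index=pd.date_range("2024-01-01", periods=3, freq="MS"),
    name="CPIAUCSL",
)
quarterly = pd.Series([28000.0], index=pd.to_datetime(["2024-01-01"]), name="GDP")

demo = pd.concat([daily, monthly, quarterly], axis=1).sort_index()
summary = coverage_summary(demo)
print(summary)
```

Running `coverage_summary(macro_df)` on the cached file would show how many observations each indicator actually contributes.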

## 🧠 Sentiment Signals: Google Trends

Terms:
- `"market crash"`, `"recession"`, `"buy gold"`, `"stock market crash"`  
📁 Output: 5 raw chunk files (`../data/raw/google_trends_{span}.csv`) → merged into `../data/processed/google_trends_full.csv`

In [None]:
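from fetch_sentiment import fetch_google_trends_by_year

# Search terms behind the cached chunk files.
# The one-time fetch call is not shown in this notebook.
terms = ["market crash", "recession", "buy gold", "stock market crash"]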

### ✅ Load and Merge Sentiment Chunks

In [None]:
sentiment_chunks = ["2004_2007", "2008_2011", "2012_2015", "2016_2019", "2020_2025"]

dfs = []
for span in sentiment_chunks:
    df = pd.read_csv(f"../data/raw/google_trends_{span}.csv", index_col=0, parse_dates=True)
    print(f"📅 Trends Chunk {span}:", df.index.min(), "→", df.index.max())
    dfs.append(df)

sentiment_df = pd.concat(dfs).sort_index()
sentiment_df.to_csv("../data/processed/google_trends_full.csv")
print("✅ Final merged sentiment dataset saved.")
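
If two chunks ever share a boundary date, plain concatenation leaves a duplicated index, which would multiply rows in a later join. The chunk names suggest non-overlapping spans, so this may never occur, but a guard is cheap. The sketch below uses a hypothetical `combine_chunks` helper and synthetic values. Note also that Google Trends scales each request to its own 0 to 100 range, so values from separately fetched chunks are not on a common scale unless the fetch script rescales them, which this notebook does not show.

```python
import pandas as pd


def combine_chunks(chunks):
    """Concatenate date-indexed chunks, keeping the first row for any repeated date.

    Hypothetical helper: duplicates are dropped in concatenation order
    (earlier chunk wins), then the result is sorted by date.
    """
    combined = pd.concat(chunks)
    combined = combined[~combined.index.duplicated(keep="first")]
    return combined.sort_index()


# Synthetic chunks that overlap on 2008-01-01 (values are made up).
chunk_a = pd.DataFrame(
    {"recession": [10, 20, 30]},
    index=pd.to_datetime(["2007-11-01", "2007-12-01", "2008-01-01"]),
)
chunk_b = pd.DataFrame(
    {"recession": [99, 40]},
    index=pd.to_datetime(["2008-01-01", "2008-02-01"]),
)

combined = combine_chunks([chunk_a, chunk_b])
print(combined)
```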

## 📦 Final Merge: Market + Macro + Sentiment

In [None]:
# 📥 Load all cached datasets
market_df = pd.read_csv("../data/raw/yahoo_market_data.csv", index_col=0, parse_dates=True)
macro_df = pd.read_csv("../data/raw/fred_macro_data.csv", index_col=0, parse_dates=True)
sentiment_df = pd.read_csv("../data/processed/google_trends_full.csv", index_col=0, parse_dates=True)

# 🔗 Merge all sources (outer join keeps every date from every source; gaps become NaN)
merged = market_df.join(macro_df, how="outer").join(sentiment_df, how="outer")
merged = merged.sort_index()

# 💾 Save full merged dataset
merged.to_csv("../data/processed/merged_all_sources.csv")

# 👀 Preview merged output
print("📊 Final Merged Dataset:")
print("Date Range:", merged.index.min(), "→", merged.index.max())
merged.head()
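
Because the three sources have different frequencies, the outer join leaves most macro and sentiment cells empty on any given trading day. One common way to line them up, shown here as a sketch and not as the project's method, is to forward-fill the lower-frequency columns and keep only the market's dates. `align_to_market` is a hypothetical helper and the data is synthetic.

```python
import pandas as pd


def align_to_market(market, *others):
    """Outer-join, forward-fill the non-market columns, keep market dates only."""
    merged = market.copy()
    for other in others:
        merged = merged.join(other, how="outer")
    merged = merged.sort_index()
    low_freq_cols = [c for c in merged.columns if c not in market.columns]
    # Carry the latest earlier observation forward; market columns stay untouched.
    merged[low_freq_cols] = merged[low_freq_cols].ffill()
    return merged.loc[market.index]


# Synthetic inputs (values are made up).
market = pd.DataFrame(
    {"Close": range(9)},
    index=pd.bdate_range("2024-01-02", "2024-01-12"),
)
macro = pd.DataFrame({"CPIAUCSL": [300.0]}, index=pd.to_datetime(["2024-01-01"]))
trends = pd.DataFrame(
    {"recession": [50, 60]},
    index=pd.to_datetime(["2023-12-31", "2024-01-07"]),
)

aligned = align_to_market(market, macro, trends)
print(aligned)
```

Forward-filling only uses earlier rows, but FRED dates mark the observation period, not the release date, so this step alone does not rule out look-ahead in later modeling.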

## ✅ Summary
This notebook assembles a modular ingestion pipeline for:

- ✅ Daily market data (`yfinance`)
- ✅ Macroeconomic indicators (`FRED`)
- ✅ Sentiment trends (`Google Trends` via `pytrends`)

📁 All data is cached, then outer-joined on date into a single file:
```
../data/processed/merged_all_sources.csv
```

Next steps: **EDA, lag feature engineering, and predictive modeling.**