# 2. Backfill FRED Macroeconomic Data

Fetch historical DGS10 (10-year Treasury yield) and CPIAUCSL (CPI) from FRED and upload both series to Hopsworks.

**Important**: This notebook fetches RAW data only. Point-in-time correctness for CPI release dates is handled in notebook 5_macro_sentiment_features.

**Pipeline**: FRED API → Hopsworks Feature Groups (raw)

In [None]:
import sys
sys.path.append('..')

import pandas as pd
from utils.data_fetchers import fetch_dgs10, fetch_cpi
from utils.hopsworks_helpers import get_feature_store, create_feature_group
from dotenv import load_dotenv
import yaml

load_dotenv()

# Load config
with open('../config/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

## Fetch 10-Year Treasury Yield (DGS10)

DGS10 is a daily series. Missing values (weekends/holidays) will be forward-filled when creating daily features. This introduces no look-ahead bias: yields are known in real time, and a forward-fill only carries the last observed value into later dates.
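
The forward-fill described above can be sketched as follows. This is a minimal illustration on made-up values, not the feature code used later in the pipeline; the sample frame and the `daily` variable are assumptions introduced only for this example.

```python
import pandas as pd

# Hypothetical sample: yields observed on business days only (values are made up).
raw = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-04", "2024-01-05", "2024-01-08"]),
    "dgs10": [3.99, 4.05, 4.01],
})

# Reindex onto a full calendar-day range; the weekend rows appear as NaN.
daily = (
    raw.set_index("date")
       .reindex(pd.date_range("2024-01-04", "2024-01-08", freq="D"))
       .rename_axis("date")
)

# Forward-fill copies the last observed value into each gap, so a row never
# holds a value dated later than the row itself.
daily["dgs10"] = daily["dgs10"].ffill()
daily = daily.reset_index()
print(daily)
```

Saturday and Sunday take Friday's value, and Monday keeps its own observation.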

In [None]:
start_date = config['data']['start_date']
end_date = config['data']['end_date']

print(f"Fetching DGS10 from {start_date} to {end_date}...")
dgs10_data = fetch_dgs10(start_date, end_date)

print(f"\nDGS10 data shape: {dgs10_data.shape}")
print(f"Date range: {dgs10_data['date'].min()} to {dgs10_data['date'].max()}")
print(f"Missing values: {dgs10_data['dgs10'].isna().sum()}")
dgs10_data.head(10)

In [None]:
# Basic statistics
dgs10_data['dgs10'].describe()

## Fetch CPI Data (CPIAUCSL)

**CRITICAL**: CPI is a monthly series. The 'date' column in FRED represents the **reference month** (e.g., 2024-01-01 for January 2024), NOT the release date.

To avoid look-ahead bias:
- CPI for month M is typically released around the 15th of month M+1, so it must not be treated as known before then
- Point-in-time alignment is handled in notebook 5 using `make_macro_daily_features()`
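
To make the idea concrete, here is one way point-in-time alignment could look. It is a sketch under stated assumptions, not the implementation of `make_macro_daily_features()`: the `available_date` column, the fixed "15th of the following month" rule, and the sample values are all hypothetical.

```python
import pandas as pd

# Hypothetical CPI rows keyed by reference month (values are illustrative only).
cpi = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "cpiaucsl": [300.0, 301.0],
})

# Assumption: approximate the release date as the 15th of the following month.
# Actual release dates vary, so a real pipeline might use a release calendar.
cpi["available_date"] = cpi["date"] + pd.DateOffset(months=1, days=14)

# Daily calendar on which features are wanted.
days = pd.DataFrame({"date": pd.date_range("2024-02-10", "2024-03-20", freq="D")})

# For each day, take the latest CPI value whose available_date is on or before it.
aligned = pd.merge_asof(
    days,
    cpi[["available_date", "cpiaucsl"]],
    left_on="date",
    right_on="available_date",
    direction="backward",
)
print(aligned.head(10))
```

Days before the first assumed release have no CPI value, and the January figure only appears from 2024-02-15 onward, which is the behaviour needed to avoid look-ahead bias.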

In [None]:
print(f"Fetching CPIAUCSL from {start_date} to {end_date}...")
cpi_data = fetch_cpi(start_date, end_date)

print(f"\nCPI data shape: {cpi_data.shape}")
print(f"Date range: {cpi_data['date'].min()} to {cpi_data['date'].max()}")
print(f"Missing values: {cpi_data['cpiaucsl'].isna().sum()}")
print(f"\nCPI is monthly: {cpi_data.shape[0]} observations between {start_date} and {end_date}")
cpi_data.head(10)

In [None]:
# Verify monthly frequency without adding a temporary column to cpi_data
month_diff = cpi_data['date'].diff().dt.days
print("\nDays between CPI observations (should be ~28-31):")
print(month_diff.describe())

In [None]:
# Basic statistics
cpi_data['cpiaucsl'].describe()

## Upload to Hopsworks Feature Store

Create raw feature groups for FRED data. These will be read by notebook 5 for point-in-time correct feature engineering.

In [None]:
# Connect to Hopsworks
print("Connecting to Hopsworks...")
fs = get_feature_store()
print(f"✓ Connected to feature store: {fs.name}")

In [None]:
# Create DGS10 feature group
print("Creating DGS10 feature group...")
dgs10_fg = create_feature_group(
    fs,
    name='dgs10_raw',
    df=dgs10_data,
    primary_key=['date'],
    description='Raw 10-year Treasury Constant Maturity Rate (DGS10) from FRED - daily series'
)
print(f"✓ Created feature group: dgs10_raw (version {dgs10_fg.version})")

In [None]:
# Create CPI feature group
print("Creating CPI feature group...")
cpi_fg = create_feature_group(
    fs,
    name='cpi_raw',
    df=cpi_data,
    primary_key=['date'],
    description='Raw Consumer Price Index (CPIAUCSL) from FRED - monthly series with REFERENCE month dates (not release dates)'
)
print(f"✓ Created feature group: cpi_raw (version {cpi_fg.version})")

## Summary

✅ FRED data successfully uploaded to Hopsworks Feature Store:
- **dgs10_raw**: Daily 10-year Treasury yield
- **cpi_raw**: Monthly Consumer Price Index (reference month dates)

**⚠️ Important Note on CPI**:
- The dates in `cpi_raw` represent the **reference month** (data collection month)
- CPI for January 2024 (date=2024-01-01) is typically **released** in mid-February 2024
- Notebook 5 will handle release date alignment to ensure point-in-time correctness

**Next steps**:
- Run notebook 3 to backfill news sentiment data
- Run notebook 5 to create point-in-time correct macro features