# ERIS Course Pipeline — Full Walkthrough

This notebook walks through the **ML for empirical asset pricing** pipeline: data loading, expanding-window validation with baseline models, regime detection, portfolio construction, and regime-conditional evaluation. It mirrors the main steps of `scripts/run_offline_pipeline.py` and is provided as the **clean .ipynb deliverable** for the course. The regime-aware NN and the SHAP interpretability step are run by the script only (see the Summary).

**Prerequisites:** Parquet files in `Data1/` (2001–2021). Run from **project root** or set `sys.path` accordingly.

In [None]:
import sys
from pathlib import Path
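# Use the parent directory only when the notebook is launched from notebooks/ itself;
# a substring test on the full path would misfire on any ancestor named "notebooks".
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()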
sys.path.insert(0, str(ROOT))

## 1. Load panel and feature columns

In [None]:
from data.loaders.course_data import load_course_panel, get_feature_columns

panel = load_course_panel()
cols = get_feature_columns(panel)
feature_cols = cols["all_features"]
macro_cols = cols["macro"]
char_cols = cols["characteristic"]
print("Panel shape:", panel.shape)
print("Features:", len(feature_cols), "Macro:", len(macro_cols), "Char:", len(char_cols))

## 2. Expanding-window validation and baselines

In [None]:
from ml.baselines import run_expanding_window_baselines

predictions_df, baseline_metrics = run_expanding_window_baselines(
    panel, feature_cols, first_prediction_year=2010,
    model_names=["OLS", "Ridge", "RF", "XGBoost"],
)
print("OOS R²:", {k: round(v["oos_r2"], 4) for k, v in baseline_metrics.items()})
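
To make the validation scheme concrete, here is a minimal, standard-library sketch of expanding-window splits and an out-of-sample R². It is not the code in `ml.baselines`: the helper names `expanding_window_splits` and `oos_r2` are hypothetical, and the zero-forecast benchmark in the denominator is an assumption (a common choice in empirical asset pricing; the course module may use a historical-mean benchmark instead).

```python
def expanding_window_splits(years, first_prediction_year):
    """Yield (train_years, test_year); each fold trains on every year before the test year."""
    years = sorted(set(years))
    for test_year in years:
        if test_year < first_prediction_year:
            continue
        train_years = [y for y in years if y < test_year]
        if train_years:
            yield train_years, test_year


def oos_r2(y_true, y_pred):
    """Out-of-sample R² against a zero forecast (assumption: no demeaning of returns)."""
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum(y ** 2 for y in y_true)
    return 1.0 - sse / sst


splits = list(expanding_window_splits(range(2001, 2022), first_prediction_year=2010))
first_train, first_test = splits[0]
print(first_train[0], first_train[-1], first_test)  # first fold: train 2001..2009, test 2010
print(len(splits))  # 12 folds, test years 2010..2021
```

The training window only ever grows, so no fold sees data from its own test year or later.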

## 3. Regime detection (HMM) and stress index

In [None]:
from ml.regime_detection import run_regime_and_stress

regime_df, macro_monthly = run_regime_and_stress(panel, macro_cols)
print(regime_df["regime_label"].value_counts())
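
As an illustration of how an HMM assigns regimes, the sketch below decodes the most likely state path of a two-state Gaussian HMM with the Viterbi algorithm. Everything here is an assumption for demonstration: the parameters are fixed by hand rather than estimated from the macro series, the stress readings are made up, and the labels `calm` and `stress` are hypothetical. How `run_regime_and_stress` actually fits and decodes its model is not shown in this notebook.

```python
import math


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(obs, means, stds, log_start, log_trans):
    """Most likely state path for a Gaussian-emission HMM with known parameters."""
    n_states = len(means)
    # score[s]: best log-probability of any path that ends in state s at the current step
    score = [log_start[s] + gaussian_logpdf(obs[0], means[s], stds[s]) for s in range(n_states)]
    backptrs = []
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(
                score[best_prev] + log_trans[best_prev][s] + gaussian_logpdf(x, means[s], stds[s])
            )
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical monthly stress readings: calm, then turbulent, then calm again
stress = [0.1, -0.2, 0.0, 2.5, 3.1, 2.8, 0.2, -0.1]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
path = viterbi(stress, means=[0.0, 3.0], stds=[1.0, 1.0],
               log_start=[math.log(0.5), math.log(0.5)], log_trans=log_trans)
labels = ["calm" if s == 0 else "stress" for s in path]
print(labels)
```

The persistent transition matrix (0.9 on the diagonal) is what keeps the decoded path from flipping on every noisy observation.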

## 4. Portfolio (decile long–short)

In [None]:
from ml.portfolio import portfolio_metrics

# Prefer the XGBoost predictions; fall back to OLS if that column is missing
pred_col = "pred_XGBoost" if "pred_XGBoost" in predictions_df.columns else "pred_OLS"
port_df, port_metrics = portfolio_metrics(predictions_df, panel, pred_col=pred_col)
print("Sharpe:", port_metrics["sharpe_ratio"])
print("Max DD:", port_metrics["max_drawdown"])
print("Alpha (ann.):", port_metrics["annualized_alpha"])
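
For readers who want to see the mechanics, the following standard-library sketch builds one month of a decile long-short return and computes two of the reported statistics. It is a sketch under stated assumptions, not the `ml.portfolio` implementation: equal weighting within deciles, monthly returns annualized with a factor of √12, and drawdown reported as a positive fraction are all assumptions, and the function names are hypothetical.

```python
import math
import statistics


def decile_long_short(preds, rets):
    """Equal-weighted top-decile minus bottom-decile return for one cross-section."""
    ranked = sorted(zip(preds, rets), key=lambda pr: pr[0])
    k = max(1, len(ranked) // 10)  # number of stocks per decile
    short_leg = statistics.fmean([r for _, r in ranked[:k]])
    long_leg = statistics.fmean([r for _, r in ranked[-k:]])
    return long_leg - short_leg


def annualized_sharpe(monthly_returns):
    """Mean over sample standard deviation, scaled by sqrt(12).

    A long-short return is zero-investment, so no risk-free rate is subtracted.
    """
    mean = statistics.fmean(monthly_returns)
    std = statistics.stdev(monthly_returns)
    return mean / std * math.sqrt(12)


def max_drawdown(monthly_returns):
    """Largest peak-to-trough loss of the cumulative wealth curve, as a positive fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


# Toy cross-section of 20 stocks whose realized return rises with the prediction
preds = list(range(20))
rets = [0.001 * p for p in preds]
spread = decile_long_short(preds, rets)
print(round(spread, 4))
print(max_drawdown([0.10, -0.50, 0.20]))
```

In the real pipeline the sort is repeated every month, and the resulting series of spreads is what the Sharpe ratio and drawdown are computed on.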

## 5. Regime-conditional OOS R²

In [None]:
import pandas as pd
from ml.validation import regime_conditional_r2

r2_by_regime = regime_conditional_r2(predictions_df, regime_df, pred_col=pred_col)
print(pd.DataFrame(r2_by_regime).T)
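
The idea behind a regime-conditional R² can be sketched as the same out-of-sample R², computed separately on the observations that fall in each regime. The helper `oos_r2_by_group` below is hypothetical and uses a zero-forecast benchmark as an assumption; the regime labels and numbers are invented for illustration and are not results from the course data.

```python
from collections import defaultdict


def oos_r2_by_group(rows):
    """rows: iterable of (regime_label, realized_return, predicted_return)."""
    sse = defaultdict(float)  # squared forecast errors per regime
    sst = defaultdict(float)  # squared realized returns per regime (zero benchmark)
    for regime, y, p in rows:
        sse[regime] += (y - p) ** 2
        sst[regime] += y ** 2
    return {regime: 1.0 - sse[regime] / sst[regime] for regime in sse}


rows = [
    ("calm", 0.02, 0.02), ("calm", -0.01, -0.01),   # perfect forecasts in the calm regime
    ("stress", 0.10, 0.0), ("stress", -0.10, 0.0),  # zero forecasts in the stress regime
]
result = oos_r2_by_group(rows)
print(result)
```

A model can look acceptable on the pooled sample while its predictive power is concentrated in one regime, which is what this breakdown is meant to expose.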

## 6. Summary

The full pipeline (including Regime-Aware NN and SHAP by regime) is run via:

```bash
python scripts/run_offline_pipeline.py
```

Results are written to `data/processed/course/` and can be viewed in the Streamlit **Course ML** page or the static `dashboard/`.