# Yield Curve Lab: Exploration Notebook

This notebook is a guided tour of the pipeline outputs for anyone who has just cloned the repo.

You will:
- verify required files exist,
- inspect the US Treasury yield curve history,
- review Nelson-Siegel fitted parameters,
- review PCA factor structure,
- inspect scenario and risk outputs.

If any outputs are missing, run scripts `01` to `06` first (the exact commands are listed in section 8).

## 1) Setup and File Checks

In [None]:
from __future__ import annotations

from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("default")
pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 140)

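# Assumes the notebook runs from a folder one level below the project root
# (for example `notebooks/`); adjust if your working directory differs.
PROJECT_ROOT = Path(".").resolve().parent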
DATA_PROCESSED = PROJECT_ROOT / "data" / "processed"

print("Project root:", PROJECT_ROOT)
print("Processed data path:", DATA_PROCESSED)

In [None]:
required_outputs = [
    "yield_curve_long.parquet",
    "yield_curve_wide.parquet",
    "nelson_siegel_params.parquet",
    "pca_loadings.parquet",
    "pca_scores.parquet",
    "pca_explained_variance.parquet",
    "scenario_summary.parquet",
    "scenario_curves_parametric.parquet",
    "risk_metrics.csv",
]

status_rows = []
for name in required_outputs:
    path = DATA_PROCESSED / name
    status_rows.append({"file": name, "exists": path.exists()})

status_df = pd.DataFrame(status_rows)
display(status_df)

missing = status_df.loc[~status_df["exists"], "file"].tolist()
if missing:
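    print("\nMissing outputs detected. Run scripts 01 -> 06 before continuing:")
    for i in range(1, 7):
        # Resolve the real script name; fall back to the pattern if none is found.
        matches = sorted((PROJECT_ROOT / "scripts").glob(f"{i:02d}_*.py"))
        script = matches[0].name if matches else f"{i:02d}_*.py"
        print(f"python scripts/{script}")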
else:
    print("\nAll required outputs are present.")

## 2) Load Core Datasets

In [None]:
long_df = pd.read_parquet(DATA_PROCESSED / "yield_curve_long.parquet")
wide_df = pd.read_parquet(DATA_PROCESSED / "yield_curve_wide.parquet")

long_df["date"] = pd.to_datetime(long_df["date"])
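wide_df.index = pd.to_datetime(wide_df.index)
# Column labels may come back from Parquet as strings; cast maturities to float
# so they sort numerically and match float lookups such as `2.0` later on.
wide_df.columns = wide_df.columns.astype(float)
wide_df = wide_df.sort_index().sort_index(axis=1)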

print("Long shape:", long_df.shape)
print("Wide shape:", wide_df.shape)
print("Date range:", wide_df.index.min().date(), "to", wide_df.index.max().date())

display(long_df.head())
display(wide_df.tail())

In [None]:
# Latest curve plot (percent)
latest_curve = wide_df.iloc[-1]
x = latest_curve.index.astype(float)
y = latest_curve.values * 100.0

fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot(x, y, marker="o")
ax.set_title(f"Latest Treasury Curve ({wide_df.index[-1].date()})")
ax.set_xlabel("Maturity (Years)")
ax.set_ylabel("Yield (%)")
ax.grid(alpha=0.3)
plt.show()

## 3) Daily Yield Changes (for PCA intuition)

In [None]:
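# First differences of the decimal yields: drop all-NaN rows, fill remaining gaps
# by linear interpolation, then drop any row that is still incomplete.
delta = (
    wide_df.diff()
    .dropna(how="all")
    .interpolate(limit_direction="both")
    .dropna(how="any")
)
delta_bp = delta * 10_000.0  # decimal yield -> basis points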

print("Delta shape:", delta_bp.shape)
display(delta_bp.describe().T[["mean", "std", "min", "max"]].round(3))

fig, ax = plt.subplots(figsize=(9, 4.5))
for maturity in [2.0, 5.0, 10.0, 30.0]:
    if maturity in delta_bp.columns:
        ax.plot(delta_bp.index, delta_bp[maturity], label=f"{maturity:g}Y")
ax.set_title("Daily Yield Changes (bp)")
ax.set_xlabel("Date")
ax.set_ylabel("Change (bp)")
ax.grid(alpha=0.3)
ax.legend()
plt.show()

## 4) Nelson-Siegel Parameter Review
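
For reference when reading the `beta0`, `beta1`, `beta2` and `tau` columns plotted in this section, here is a minimal sketch of the standard Nelson-Siegel curve. It assumes `tau` is a decay time in years, so the loadings depend on `maturity / tau`. The fitting script may use a different convention (for example a decay rate equal to `1 / tau`). The function name and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def nelson_siegel_yield(maturity, beta0, beta1, beta2, tau):
    """Nelson-Siegel yield for strictly positive maturities (in years)."""
    m = np.asarray(maturity, dtype=float)
    x = m / tau
    # Slope loading: close to 1 at the short end, decays to 0 at the long end.
    slope_loading = (1.0 - np.exp(-x)) / x
    # Curvature loading: 0 at both ends, hump-shaped in between.
    curvature_loading = slope_loading - np.exp(-x)
    return beta0 + beta1 * slope_loading + beta2 * curvature_loading


# Hypothetical parameters (decimal yields), not taken from the fitted output.
maturities = np.array([0.25, 1.0, 2.0, 5.0, 10.0, 30.0])
example_curve = nelson_siegel_yield(
    maturities, beta0=0.045, beta1=-0.01, beta2=0.005, tau=2.0
)
print(np.round(example_curve * 100.0, 3))  # yields in percent
```

Under this convention `beta0` is the long-maturity level, `beta0 + beta1` is the short-end yield, and `beta2` controls the size of the hump whose location is set by `tau`.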

In [None]:
ns_params = pd.read_parquet(DATA_PROCESSED / "nelson_siegel_params.parquet")
ns_params.index = pd.to_datetime(ns_params.index)

display(ns_params.tail())

fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
axes = axes.flatten()
for ax, col in zip(axes, ["beta0", "beta1", "beta2", "tau"]):
    ax.plot(ns_params.index, ns_params[col])
    ax.set_title(col)
    ax.grid(alpha=0.3)
fig.suptitle("Nelson-Siegel Parameter History", y=1.02)
fig.tight_layout()
plt.show()

## 5) PCA Factors and Interpretation
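
To make the three files loaded in this section easier to interpret, the sketch below shows how loadings, scores and explained-variance ratios relate to each other. It uses synthetic yield changes and a plain NumPy SVD as a stand-in; the pipeline's own PCA script may be implemented differently. The loadings are laid out with one row per component and one column per maturity, matching how this notebook plots them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily changes (bp) for 6 maturities: a common "level" shock, a
# "slope" shock and small noise. Purely illustrative, not repo data.
maturities = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 30.0])
n_days = 500
level = rng.normal(0.0, 5.0, size=(n_days, 1)) * np.ones((1, maturities.size))
slope = rng.normal(0.0, 2.0, size=(n_days, 1)) * np.linspace(-1.0, 1.0, maturities.size)
noise = rng.normal(0.0, 0.5, size=(n_days, maturities.size))
changes = level + slope + noise

# PCA via SVD of the centered data matrix.
centered = changes - changes.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

loadings = vt                           # components x maturities
scores = centered @ vt.T                # days x components
explained_ratio = s**2 / np.sum(s**2)   # share of variance per component

# Keeping the first k components gives a low-rank approximation of the changes.
k = 3
reconstructed = scores[:, :k] @ loadings[:k]

print(np.round(explained_ratio[:k], 3))
```

In this synthetic setup the first component has loadings of the same sign at every maturity (a level move) and the second changes sign along the curve (a slope move). That pattern is a useful thing to look for in the real loadings plot.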

In [None]:
loadings = pd.read_parquet(DATA_PROCESSED / "pca_loadings.parquet")
scores = pd.read_parquet(DATA_PROCESSED / "pca_scores.parquet")
explained = pd.read_parquet(DATA_PROCESSED / "pca_explained_variance.parquet")
scores.index = pd.to_datetime(scores.index)

display(explained)

fig, ax = plt.subplots(figsize=(8, 4.5))
for pc in loadings.index:
    ax.plot(loadings.columns.astype(float), loadings.loc[pc], marker="o", label=pc)
ax.set_title("PCA Loadings by Maturity")
ax.set_xlabel("Maturity (Years)")
ax.set_ylabel("Loading")
ax.grid(alpha=0.3)
ax.legend()
plt.show()

fig, ax = plt.subplots(figsize=(10, 4.5))
for col in scores.columns:
    ax.plot(scores.index, scores[col], label=col)
ax.set_title("PCA Factor Scores")
ax.set_xlabel("Date")
ax.set_ylabel("Score")
ax.grid(alpha=0.3)
ax.legend()
plt.show()

## 6) Scenario Results
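
The two columns summarized in this section, `y10_change_bp` and `s2s10_change_bp`, can be read as follows. The sketch assumes yields are stored as decimals (consistent with the `* 10_000.0` conversion used earlier in this notebook) and that the 2s10s spread is defined as the 10Y yield minus the 2Y yield. The curves and the helper name are hypothetical.

```python
# Hypothetical base and shocked curves: maturity in years -> decimal yield.
base_curve = {2.0: 0.0410, 10.0: 0.0430}
scenario_curve = {2.0: 0.0425, 10.0: 0.0435}

BP_PER_UNIT = 10_000.0  # 1.00 in decimal yield terms equals 10,000 bp


def spread_2s10s(curve):
    """10Y yield minus 2Y yield, in decimal terms."""
    return curve[10.0] - curve[2.0]


y10_change_bp = (scenario_curve[10.0] - base_curve[10.0]) * BP_PER_UNIT
s2s10_change_bp = (spread_2s10s(scenario_curve) - spread_2s10s(base_curve)) * BP_PER_UNIT

print(f"10Y change:   {y10_change_bp:+.1f} bp")
print(f"2s10s change: {s2s10_change_bp:+.1f} bp")
```

Here the 10Y yield rises by 5 bp while the 2s10s spread narrows by 10 bp, so the scenario is a flattening with higher long-end yields.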

In [None]:
scenario_summary = pd.read_parquet(DATA_PROCESSED / "scenario_summary.parquet")
display(scenario_summary.groupby("method")[["y10_change_bp", "s2s10_change_bp"]].describe().round(2))

fig, axes = plt.subplots(1, 2, figsize=(11, 4.2))
axes[0].hist(scenario_summary["y10_change_bp"], bins=40)
axes[0].set_title("10Y Yield Change Distribution")
axes[0].set_xlabel("bp")
axes[0].grid(alpha=0.3)

axes[1].hist(scenario_summary["s2s10_change_bp"], bins=40)
axes[1].set_title("2s10s Spread Change Distribution")
axes[1].set_xlabel("bp")
axes[1].grid(alpha=0.3)

fig.tight_layout()
plt.show()

## 7) Portfolio Risk Summary
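
The table loaded in this section reports `VaR_95` and `ES_95`. As a reference for how such figures can be derived from a set of scenario outcomes, here is a minimal sketch using an empirical quantile. The P&L sample is synthetic, and the pipeline's risk script may use a different estimator or sign convention.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical scenario P&L (positive = gain), standing in for real scenario output.
pnl = rng.normal(loc=0.0, scale=1_000.0, size=10_000)
losses = -pnl  # positive = loss

confidence = 0.95
# VaR: the loss level exceeded in only (1 - confidence) of the scenarios.
var_95 = np.quantile(losses, confidence)
# ES: the average loss across scenarios at or beyond the VaR level.
tail_losses = losses[losses >= var_95]
es_95 = tail_losses.mean()

print(f"VaR_95: {var_95:,.0f}")
print(f"ES_95:  {es_95:,.0f}")
```

ES is never smaller than VaR at the same confidence level, because it averages only the losses in the tail beyond that threshold.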

In [None]:
risk_metrics = pd.read_csv(DATA_PROCESSED / "risk_metrics.csv")
display(risk_metrics)

print("Interpretation:")
print("- VaR_95 is the loss level exceeded only 5% of the time (under these scenarios).")
print("- ES_95 is the average loss in that worst 5% tail.")

## 8) Re-run Pipeline Commands (PowerShell)

Use these commands from the project root:

```powershell
python scripts/01_download_and_plot_curve.py
python scripts/02_build_history_dataset.py --n-days 252
python scripts/03_fit_nelson_siegel.py
python scripts/04_pca_factors.py --n-components 3
python scripts/05_generate_scenarios.py --n-scenarios 1000 --seed 42
python scripts/06_portfolio_risk_demo.py
pytest -q
```

Tip: Because `data/` is gitignored, each user should run the pipeline locally to generate outputs.