# Option Viz — Full Suite Test

End‑to‑end validation of the pipeline:

1. **Data**: fetch option chains (equity via `yfinance`, crypto via the OKX public API)
2. **Preprocess**: robust mids/flags → forward & log‑moneyness helpers
3. **SVI (per expiry)**: fit total variance smiles
4. **Surface (across maturities)**: smooth params over time to maturity `T`; calendar checks
5. **No‑arb diagnostics**: butterfly convexity on raw call mids
6. **Artifacts**: smile overlay plots and tidy CSV exports

> Designed to be **copy‑runnable** on a fresh clone after installing `requirements.txt`. No API keys required.
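
For orientation, step 3 fits a total variance smile per expiry. The sketch below evaluates the standard raw SVI parameterization. It assumes `vol.svi` uses this form, which the `rho` and `sigma` fields reported later suggest but do not confirm. The function names and parameter values are illustrative only.

```python
import numpy as np


def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI: w(k) = a + b * (rho * (k - m) + sqrt((k - m)^2 + sigma^2))."""
    k = np.asarray(k, dtype=float)
    return a + b * (rho * (k - m) + np.sqrt((k - m) ** 2 + sigma ** 2))


def svi_implied_vol(k, T, a, b, rho, m, sigma):
    """Convert total variance w = iv^2 * T into an annualized implied vol."""
    w = svi_total_variance(k, a, b, rho, m, sigma)
    return np.sqrt(np.maximum(w, 0.0) / T)


# Illustrative parameters only (not fitted to any market data)
params = dict(a=0.01, b=0.1, rho=-0.4, m=0.0, sigma=0.2)
k = np.linspace(-0.5, 0.5, 5)  # log-moneyness grid, k = ln(K / F)
w = svi_total_variance(k, **params)
iv = svi_implied_vol(k, T=0.25, **params)
```

With a negative `rho`, the left wing (`k < 0`) carries more total variance than the right, which is the typical equity skew shape.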

## 📦 What this notebook uses

- `src/data`: `registry`, `historical_loader`, `risk_free`
- `src/preprocess`: `midprice` (+ optional `pcp`, `forward` if present)
- `src/vol`: `svi`, `surface`, `no_arb`

If a helper is missing (e.g., `estimate_forward_from_pcp`), we **fall back** to a basic carry-model forward.
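
To make the two forward estimates concrete, here is a minimal sketch of a single-strike put-call parity forward next to the carry forward. `forward_from_pcp` is a hypothetical stand-in: the project's `estimate_forward_from_pcp` may combine several strikes or weight them differently. The quotes are made up so that parity holds exactly.

```python
import numpy as np


def forward_from_pcp(strike, call_mid, put_mid, r, T):
    """Parity: C - P = exp(-rT) * (F - K), hence F = K + exp(rT) * (C - P)."""
    return float(strike + np.exp(r * T) * (call_mid - put_mid))


def forward_from_carry(spot, r, T):
    """No-dividend carry forward: F = S * exp(rT)."""
    return float(spot * np.exp(r * T))


# Made-up inputs: build a put price that satisfies parity for the carry forward
r, T, spot = 0.05, 0.5, 100.0
F_true = forward_from_carry(spot, r, T)
K = 100.0
call_mid = 6.0
put_mid = call_mid - np.exp(-r * T) * (F_true - K)

F_pcp = forward_from_pcp(K, call_mid, put_mid, r, T)
```

When dividends or funding costs matter, the two estimates diverge, which is why the parity estimate is preferred whenever call and put quotes are both available.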

In [None]:
# Make src/ importable; patch asyncio so asyncio.run() works inside Jupyter's running event loop
import sys, platform
from pathlib import Path

repo = Path.cwd()
sys.path.insert(0, str((repo / 'src').resolve()))

import nest_asyncio

nest_asyncio.apply()

import asyncio
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

print(
    'Python:',
    sys.version.split()[0],
    '| OS:',
    platform.system(),
    platform.release(),
)
%matplotlib inline

## 🔧 Parameters

- You can test **multiple (asset_class, underlying)** pairs in one run.
- We fit a handful of **nearest expiries** to keep execution quick.
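
The `wide_rel_threshold` parameter set in this section flags quotes whose relative spread `(ask - bid) / mid` exceeds the threshold. The sketch below shows one way such a flag could be computed. The column names `bid`, `ask`, `rel_spread`, and `is_wide` are assumptions, and `add_midprice_columns` may apply additional robustness rules.

```python
import pandas as pd


def flag_wide_quotes(quotes: pd.DataFrame, wide_rel_threshold: float = 0.20) -> pd.DataFrame:
    """Add mid, relative spread, and a wide-quote flag (hypothetical column names)."""
    out = quotes.copy()
    out["mid"] = 0.5 * (out["bid"] + out["ask"])
    out["rel_spread"] = (out["ask"] - out["bid"]) / out["mid"]
    out["is_wide"] = out["rel_spread"] > wide_rel_threshold
    return out


# Made-up quotes: only the second row has a spread above 20% of mid
quotes = pd.DataFrame({"bid": [1.00, 0.50, 2.00], "ask": [1.10, 0.80, 2.10]})
flagged = flag_wide_quotes(quotes)
```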

In [None]:
# Pairs to run (toggle as you wish)
runs = [
    ("equity", "AAPL"),
    ("crypto", "BTC"),
]

# How many expiries per asset to fit
n_expiries = 3
expiry_offset = 0  # set to 1 if the very front is too illiquid

# Preprocess thresholds
wide_rel_threshold = 0.20  # flag wide if (ask-bid)/mid > 20%

# Output dir for artifacts
out_dir = Path('out')
out_dir.mkdir(exist_ok=True)

# Plotting
figsize = (6, 4)
show_scatter = True

## 🧱 Imports from the project

We import backends, preprocess helpers, and the SVI/surface/no‑arb modules.

In [None]:
from data.registry import get_fetcher
from data.historical_loader import chain_to_dataframe
from data.risk_free import RiskFreeProvider, RiskFreeConfig
from preprocess.midprice import add_midprice_columns

try:
    from preprocess.forward import estimate_forward_from_pcp

    HAVE_FORWARD = True
except Exception:
    HAVE_FORWARD = False
try:
    from preprocess.pcp import synthesize_missing_leg

    HAVE_PCP = True
except Exception:
    HAVE_PCP = False

from vol.svi import prepare_smile_data
from vol.surface import fit_surface_from_frames, smooth_params, sample_grid
from vol.no_arb import butterfly_violations

print('Helpers: forward=', HAVE_FORWARD, ', pcp=', HAVE_PCP)

## 🧮 Helper functions

Small utilities to compute year fractions, forwards, and to fetch & prepare frames/maps needed for SVI fits.

In [None]:
def yearfrac(asof: datetime, expiry: datetime) -> float:
    """ACT/365.25 approximation; guard against T≈0."""
    return max((expiry - asof).total_seconds() / 31557600.0, 1e-12)


def forward_from_carry(spot: float, r: float, T: float) -> float:
    """No-dividend forward: F = S * exp(rT)."""
    return float(spot) * float(np.exp(r * T))


def extract_call_curve(df: pd.DataFrame):
    """Return sorted (K, C_mid) arrays for call quotes only, if present."""
    d = df.copy()
    d = d[d['type'].astype(str).str.upper().str.startswith('C')]
    d = d.dropna(subset=['strike', 'mid'])
    d = d.sort_values('strike')
    return d['strike'].to_numpy(float), d['mid'].to_numpy(float)


async def build_frames(
    asset_class: str,
    underlying: str,
    *,
    n_expiries: int,
    expiry_offset: int,
    wide_rel_threshold: float,
):
    """
    Fetch `n_expiries` maturities for (asset_class, underlying), build mids/flags, and
    assemble per-expiry T/F/r maps.
    - r: SOFR via RiskFreeProvider if available, else 0.0 fallback
    - F: prefer PCP estimator if provided; otherwise carry model
    """
    fetcher = get_fetcher(asset_class)
    expiries = sorted(await fetcher.list_expiries(underlying))
    chosen = expiries[expiry_offset : expiry_offset + n_expiries]
    rf = RiskFreeProvider(RiskFreeConfig())

    frames = {}
    T_map, F_map, r_map = {}, {}, {}
    asof_global = None

    for exp in chosen:
        chain = await fetcher.fetch_chain(underlying, exp)
        if asof_global is None:
            asof_global = chain.asof_utc
        df = chain_to_dataframe(chain)
        df = add_midprice_columns(df, wide_rel_threshold=wide_rel_threshold)

        # Risk-free (graceful fallback to 0.0)
        try:
            r = float(rf.get_rate(chain.asof_utc.date()))
        except Exception:
            r = 0.0

        # Time to maturity
        T = yearfrac(chain.asof_utc, exp)

        # Forward estimate
        if HAVE_FORWARD:
            try:
                F_est = float(estimate_forward_from_pcp(df, r=r, T=T))
            except Exception:
                F_est = forward_from_carry(chain.spot, r, T)
        else:
            F_est = forward_from_carry(chain.spot, r, T)

        frames[exp] = df
        T_map[exp], F_map[exp], r_map[exp] = T, F_est, r

    return frames, T_map, F_map, r_map, asof_global

## 🚀 Run the suite for each (asset_class, underlying)

For each pair we:
1. Fetch & preprocess
2. Fit SVI per expiry and stitch a surface
3. Smooth params over T
4. Calendar and butterfly checks
5. Plot a smile overlay and export artifacts
6. Append metrics to a consolidated results table
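
The calendar check in step 4 rests on one condition: at a fixed log-moneyness `k`, total variance should not decrease as `T` grows. Below is a minimal sketch of that test on a grid. `calendar_violation_fraction` is a hypothetical helper, and the project's `calendar_violations` may define its tolerance and fraction differently.

```python
import numpy as np


def calendar_violation_fraction(w_by_T, tol=1e-9):
    """Fraction of (adjacent maturity pair, k) cells where total variance drops.

    w_by_T has shape (n_T, n_k) with rows ordered by increasing T.
    """
    w = np.asarray(w_by_T, dtype=float)
    if w.shape[0] < 2:
        return 0.0  # nothing to compare with a single maturity
    dw = np.diff(w, axis=0)  # w(T_{i+1}, k) - w(T_i, k)
    return float(np.mean(dw < -tol))


# Made-up slices on a 5-point k grid
k_grid = np.linspace(-1.0, 1.0, 5)
w_short = 0.02 + 0.05 * k_grid**2
w_long = 0.03 + 0.05 * k_grid**2
w_bad = w_long.copy()
w_bad[0] = 0.01  # force one crossing at k = -1

frac_ok = calendar_violation_fraction([w_short, w_long])
frac_bad = calendar_violation_fraction([w_short, w_bad])
```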

In [None]:
all_rows = []

for asset_class, underlying in runs:
    print(f"\n=== {asset_class.upper()} — {underlying} ===")
    try:
        frames_by_expiry, T_map, F_map, r_map, ASOF = asyncio.run(
            build_frames(
                asset_class,
                underlying,
                n_expiries=n_expiries,
                expiry_offset=expiry_offset,
                wide_rel_threshold=wide_rel_threshold,
            )
        )

        # Fit + smooth surface
        surface = fit_surface_from_frames(
            frames_by_expiry,
            T_by_expiry=T_map,
            F_by_expiry=F_map,
            r_by_expiry=r_map,
        )
        surf_smooth = smooth_params(surface, method='cubic_spline')

        # Calendar diag
        k_grid = np.linspace(-1.0, 1.0, 101)
        cal = surf_smooth.calendar_violations(k_grid=k_grid, tol=-1e-9)

        # Butterfly on first expiry raw calls
        first_exp = sorted(frames_by_expiry.keys())[0]
        df0 = frames_by_expiry[first_exp]
        K_calls, C_calls = extract_call_curve(df0)
        if K_calls.size >= 5:
            butter = butterfly_violations(K_calls, C_calls, tol=1e-8)
        else:
            # Too few call quotes to test convexity: report NaN, not a clean 0.0
            butter = {"fraction": np.nan, "count": 0, "n_interior": 0}

        # Plot smile overlay for first expiry
        F0 = float(F_map[first_exp])
        K_line = np.linspace(
            float(df0['strike'].min()), float(df0['strike'].max()), 200
        )
        iv_line = surf_smooth.iv(K_line, first_exp)
        k_line = np.log(K_line / F0)
        plt.figure(figsize=figsize)
        if show_scatter and 'iv' in df0.columns and df0['iv'].notna().any():
            plt.scatter(
                np.log(df0['strike'] / F0), df0['iv'], s=10, label='market iv'
            )
        plt.plot(k_line, iv_line, label='SVI fit (surface)')
        plt.xlabel('log-moneyness  k')
        plt.ylabel('implied vol')
        plt.legend()
        plt.tight_layout()
        plt.show()

        # Export artifacts
        png_path = out_dir / f'smile_{asset_class}_{underlying}.png'
        plt.figure(figsize=(6, 4))
        plt.plot(k_line, iv_line)
        plt.xlabel('log-moneyness  k')
        plt.ylabel('implied vol')
        plt.tight_layout()
        plt.savefig(png_path, dpi=144)

        # Grid export over all selected expiries
        K_list = np.linspace(
            float(df0['strike'].min()), float(df0['strike'].max()), 25
        )
        grid_df = sample_grid(
            surf_smooth,
            K_list=K_list,
            expiry_list=sorted(frames_by_expiry.keys()),
        )
        csv_path = out_dir / f'grid_{asset_class}_{underlying}.csv'
        grid_df.to_csv(csv_path, index=False)

        # Summarize per-expiry fit metrics
        for exp, fit in surf_smooth.fits.items():
            all_rows.append(
                {
                    "asset_class": asset_class,
                    "underlying": underlying,
                    "asof": ASOF.isoformat() if ASOF else None,
                    "expiry": exp.date().isoformat(),
                    "T": float(T_map[exp]),
                    "F": float(F_map[exp]),
                    "rho": float(fit.rho),
                    "sigma": float(fit.sigma),
                    "loss": float(getattr(fit, 'loss', np.nan)),
                    "n_used": int(getattr(fit, 'n_used', 0)),
                    "calendar_frac": float(cal.get('fraction', np.nan)),
                    "butterfly_frac": float(butter.get('fraction', 0.0)),
                    "notes": getattr(fit, 'notes', ''),
                }
            )

    except Exception as e:
        print('ERROR:', e)
        continue

summary_df = pd.DataFrame(all_rows)
summary_df

## 📊 Results summary

- `loss` is the (vega‑weighted) SVI fit objective (lower is better)
- `n_used` counts quotes included in the fit
- `calendar_frac` is the fraction of k‑grid points that violate calendar monotonicity (target **≈ 0**) 
- `butterfly_frac` is the share of interior points where raw call mids show negative discrete curvature (target **≈ 0**)
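
To illustrate what "negative discrete curvature" means for `butterfly_frac`, the sketch below computes second divided differences of call mids over strictly increasing strikes. `butterfly_fraction` is a hypothetical helper that reuses the result keys `fraction`, `count`, and `n_interior` seen in this notebook. The project's `butterfly_violations` may differ in detail.

```python
import numpy as np


def butterfly_fraction(K, C, tol=1e-8):
    """Share of interior strikes where call prices are locally concave in strike.

    Assumes K is strictly increasing (no duplicate strikes).
    """
    K = np.asarray(K, dtype=float)
    C = np.asarray(C, dtype=float)
    if K.size < 3:
        return {"fraction": 0.0, "count": 0, "n_interior": 0}
    slopes = np.diff(C) / np.diff(K)  # one slope per strike interval
    curvature = np.diff(slopes)       # change in slope at each interior strike
    count = int(np.sum(curvature < -tol))
    n_interior = int(curvature.size)
    return {"fraction": count / n_interior, "count": count, "n_interior": n_interior}


# Made-up call curves: the second one has a kink at K = 100
K = [90.0, 95.0, 100.0, 105.0, 110.0]
clean = butterfly_fraction(K, [12.0, 8.0, 5.0, 3.0, 2.0])
kinked = butterfly_fraction(K, [12.0, 8.0, 6.0, 3.0, 2.0])
```

A nonzero fraction on raw mids often reflects stale or wide quotes rather than a tradable arbitrage, so it is best read together with the wide-quote flags.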

In [None]:
sum_csv = out_dir / 'suite_summary.csv'
summary_df.to_csv(sum_csv, index=False)
sum_csv

## 🧩 Appendix — troubleshooting tips

- If the very front expiry is noisy/empty, set `expiry_offset = 1`.
- If `RiskFreeProvider` has no SOFR CSV in your environment, the rate falls back to **0.0** (fine for a quick test).
- If `estimate_forward_from_pcp` raises (e.g., too few call/put pairs), we fall back to **carry**: `F = S * exp(rT)`.
- Calendar checks use a simple grid `k ∈ [−1, 1]`. For assets with very wide moneyness, expand this grid.
- Plots use default Matplotlib styling to keep dependencies minimal; adapt them for your report.
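
For the calendar-grid tip, one option is to derive the grid from the quoted strikes instead of hard-coding `[-1, 1]`. The helper below is a hypothetical sketch: it never shrinks the grid below `[-1, 1]` and widens it, with a small pad, when the observed log-moneyness range is larger.

```python
import numpy as np


def k_grid_from_strikes(strikes, forward, n=101, pad=0.05):
    """Log-moneyness grid covering the quoted strikes, at least as wide as [-1, 1]."""
    k = np.log(np.asarray(strikes, dtype=float) / float(forward))
    lo = min(float(k.min()) - pad, -1.0)
    hi = max(float(k.max()) + pad, 1.0)
    return np.linspace(lo, hi, n)


# Made-up strikes: a wide crypto-style chain and a narrow equity-style chain
wide = k_grid_from_strikes([10_000, 60_000, 250_000], forward=60_000)
narrow = k_grid_from_strikes([90, 100, 110], forward=100)
```

The resulting array can be passed as `k_grid` to `calendar_violations` in place of the fixed `np.linspace(-1.0, 1.0, 101)`.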