# Overview of Project

# Data Generation 

# Explanation/Reasoning

## Data Generation for Worst-of Basket Options (Train & Test)

## 1) Product & Pricing Model

We generate labels for a **European worst-of call on an equity basket** with strike $K$:

$$
\text{payoff} \;=\; \max\!\big(0,\, \min_i S_i(T) - K\big).
$$

Under the risk-neutral measure, each asset $S_i$ follows geometric Brownian motion with **flat** vol and **constant** short rate $r$:

$$
\frac{dS_i(t)}{S_i(t)} \;=\; r\,dt \;+\; \sigma_i\,dW_i(t), 
\qquad \mathbb{E}[dW_i\,dW_j]=\rho_{ij}\,dt .
$$

For a maturity $T$, terminal prices along a Monte-Carlo path are simulated in log-space:

$$
S_i(T) \;=\; \exp\!\Big(\log S_i(0) + (r-\tfrac12\sigma_i^2)T + \sigma_i\sqrt{T}\,Y_i\Big),
$$

where $Y \sim \mathcal{N}(0,\Sigma)$ with $\Sigma=\rho$, the sampled correlation matrix (so each $Y_i$ has unit variance), and the discounted payoff is $e^{-rT}\max\!\big(0,\, \min_i S_i(T) - K\big)$.
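
As a minimal NumPy sketch of the terminal draw and discounted payoff above: the scenario values, path count, and seed are illustrative assumptions, not values taken from the dataset.

```python
import numpy as np

# Illustrative scenario (assumed values, chosen only for this example)
S0    = np.array([110.0, 120.0, 130.0])
sigma = np.array([0.20, 0.30, 0.25])
rho   = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
K, r, T = 100.0, 0.03, 0.5

rng = np.random.default_rng(0)
L = np.linalg.cholesky(rho)              # rho = L @ L.T
Z = rng.standard_normal((200_000, 3))    # independent N(0, 1) draws
Y = Z @ L.T                              # each row ~ N(0, rho)

# Terminal prices in log-space, one row per path
ST = np.exp(np.log(S0) + (r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Y)

# Discounted worst-of call payoff per path, then the Monte-Carlo average
disc_payoff = np.exp(-r * T) * np.maximum(ST.min(axis=1) - K, 0.0)
price = disc_payoff.mean()
```

A quick sanity check on such a sketch is the martingale property: `np.exp(-r * T) * ST.mean(axis=0)` should be close to `S0` up to Monte-Carlo noise.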

**Why worst-of?** It yields a moderately high-dimensional but still tractable example; its valuation is typically **Monte-Carlo-only**, and it exhibits strong curvature for **short-dated, near-ATM** scenarios, exactly the kind of landscape that exposes strengths and weaknesses in data sampling. This mirrors the canonical setup in Ferguson & Green (2018).

---

## 2) Scenario Sampling Strategy (Inputs → Features)

We follow the *“random scenarios, broad but purposeful coverage”* philosophy from Ferguson & Green, adapted to this implementation:

* **Spots $S_{0,i}$:**
  Sampled as $100\cdot e^Z$ with $Z\sim \mathcal{N}(\mu=0.5,\sigma^2=0.25)$.
  *Rationale:* lognormal draws keep prices positive and concentrate mass near realistic levels while exploring a wide range. (Same spirit as the paper.)

* **Volatilities $\sigma_i$:**
  Uniform on $[0.05, 0.8]$ (paper used $[0,1]$).
  *Rationale:* trim extremely low vols (numerical fragility for Greeks) and extremely high vols (rare in practice) while still covering a broad regime.

* **Maturity $T$:**
  $T = U^2/252$ with $U\sim\text{Uniform}\{1,\dots,43\}$.
  *Rationale:* the **quadratic** mapping biases samples toward **shorter maturities**, where convexity is highest and worst-of options change rapidly, matching the paper’s emphasis on short-dated richness.

* **Correlations $\rho$:**
  Drawn via a **C-vine** scheme with partial correlations $\tilde{\rho}=2\cdot \text{Beta}(5,2)-1$ (one draw per asset pair), then made positive definite by flooring the eigenvalues at $10^{-6}$ before Cholesky.
  *Rationale:* produces realistic, varied correlation structures over $[-1,1]$, skewed toward positive values because $\text{Beta}(5,2)$ has mean $5/7$, while ensuring positive-definiteness for multivariate normals; this mirrors the paper’s C-vine approach.

* **Strike & rate:**
  $K=100$ fixed; $r=0.03$ (paper used $r\approx 0$ for simplicity; using $3\%$ is a benign generalization).

> **Independence assumption.** Inputs are sampled independently (except correlations), as in the paper’s base setup; this is fast and broad-coverage. In production, one could add stratification/importance sampling to emphasize boundary/sensitive regions.
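
The short-maturity bias of $T = U^2/252$ can be checked by enumerating the grid directly. The sketch below needs no random draws and assumes only that each $U \in \{1,\dots,43\}$ is equally likely; the variable names are illustrative.

```python
# Enumerate the maturity grid T = U^2 / 252 for U = 1..43 (each U equally likely)
grid = [u * u / 252.0 for u in range(1, 44)]

t_min, t_max = grid[0], grid[-1]            # 1/252 and 1849/252 years
share_le_1y = sum(t <= 1.0 for t in grid) / len(grid)
median_T = sorted(grid)[len(grid) // 2]     # 22nd of 43 values, i.e. U = 22
```

Here 15 of the 43 grid points (about 35%) fall at or below one year, although the grid extends to roughly 7.3 years, and the median draw is $484/252 \approx 1.92$ years. A uniform draw over the same range would put only about 14% of samples at or below one year.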

---

## 3) Monte-Carlo Engine (Paths → Labels)

* **Multi-GPU + chunking.**
  A total of `n_paths` is split across visible GPUs; each device processes chunks of size `SIM_CHUNK` to control memory.

* **Terminal draw:**
  For each chunk, draw $Z\sim\mathcal{N}(0,I)$, map to correlated $Y=ZL^\top$ using **Cholesky** $L=\text{chol}(\rho)$, then form $S(T)$ with the closed-form GBM formula above.

* **Estimator:**
  $\widehat{V}=\exp(-rT)\cdot \frac1{N}\sum_{k=1}^N \max(0,\min_i S^{(k)}_i(T)-K)$.
  The code also tracks the **standard error** from the sample variance to quantify MC noise per scenario.

* **Precision:**
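  Pricing runs the path loop under CUDA autocast with `float16` for speed; in PyTorch, autocast lowers precision mainly for the matrix multiply $ZL^\top$, and the discounted payoffs are cast back to `float32` before the CPU reduction. Greeks routines run in **float64** for stability.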
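
The estimator and its standard error only need running sums over chunks, so memory stays bounded. A minimal NumPy sketch of that bookkeeping follows; the helper name `streaming_mean_se`, the stand-in payoff distribution, and the float64 accumulation are assumptions of this example rather than details of the pricing code.

```python
import math
import numpy as np

def streaming_mean_se(chunks):
    """Mean and standard error from running sums over chunks of discounted payoffs."""
    n, s, ss = 0, 0.0, 0.0
    for x in chunks:
        x = np.asarray(x, dtype=np.float64)   # accumulate in float64
        n  += x.size
        s  += x.sum()
        ss += (x * x).sum()
    mean = s / n
    var  = max(ss / n - mean * mean, 0.0)     # population variance, floored at 0
    return mean, math.sqrt(var / n)

rng = np.random.default_rng(1)
# Stand-in "discounted payoffs" (assumed distribution, for illustration only)
payoffs = np.maximum(rng.normal(5.0, 10.0, 400_000), 0.0)
mean, se = streaming_mean_se(np.array_split(payoffs, 8))
```

The result should match `payoffs.mean()` and `payoffs.std() / sqrt(n)` computed in one shot, which is a convenient check when changing the chunk size.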

---

## 4) Determinism & Seeding (CRNs)

To make finite-difference (FD) Greeks **variance-reduced and reproducible**, we use **Common Random Numbers (CRN)**: the *same* Gaussian draws drive up/down bumps.

* **Seed formula:**
  `manual_seed = base_seed + 1_000_003*dev_idx + chunk_idx`.
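  *Rationale:* the stride `1_000_003` keeps seeds distinct across **devices** and **chunks** within one scenario (as long as a device runs fewer than 1,000,003 chunks), and the per-scenario `base_seed` makes runs reproducible. Distinct seeds give different streams but do not formally guarantee non-overlapping ones. Because `base_seed` increases by 1 per row, chunk $c$ of row $n$ shares its seed with chunk 0 of row $n+c$ on the same device, so multi-chunk runs can reuse draws across neighbouring rows.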
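
The seed arithmetic can be explored without a GPU. In this sketch the helper name `chunk_seed` and the device and chunk counts are illustrative assumptions; the formula itself is the one above, and the driver sets the per-row base seed to `SEED_BASE + seed_offset + global_row_idx`.

```python
STRIDE = 1_000_003  # device stride from the seed formula

def chunk_seed(base_seed: int, dev_idx: int, chunk_idx: int) -> int:
    """Seed for one (device, chunk) pair, following the formula above."""
    return base_seed + STRIDE * dev_idx + chunk_idx

# Within one scenario: seeds are distinct across devices and chunks
# as long as each device runs fewer than STRIDE chunks.
seeds = {chunk_seed(42, d, c) for d in range(4) for c in range(100)}

# Across scenarios whose base seeds differ by 1 (hypothetical values 42 and 43):
# chunk 1 of the first scenario gets the same seed as chunk 0 of the second.
collision = chunk_seed(42, 0, 1) == chunk_seed(43, 0, 0)
```

If cross-row reuse matters, one possible remedy (not part of the original code) is to space the per-row base seeds by more than the largest chunk count.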

---

## 5) Greeks in the Dataset (Why we store both)

Although this section focuses on generation rather than learning, it’s useful context:

* **FD Δ/ν with CRN (`delta_vega_fd_crn`)**
  Central bumps with **relative step** $h=\max(\text{rel}\cdot x, 10^{-6})$. Reuse the same $Y$ for up/down to slash variance.

* **AAD Δ/ν (`delta_vega_aad`)**
  Build a scalar chunk price and call `torch.autograd.grad(price, (S0, sig))` to get all $2N$ **sensitivities** in one reverse pass. Chunk gradients are weighted by chunk size and averaged over all paths.

These labels support later comparisons (FD vs AAD speed/accuracy; model-based gradients vs MC).
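
To see why the two label sets should agree, the sketch below compares a central finite difference under common random numbers with the pathwise delta, written out by hand in NumPy. It is a CPU analogue under stated assumptions, not the project's GPU code: the scenario values and seed are made up, and the pathwise formula assumes the gradient of the minimum flows only to the worst performer, which is how reverse-mode differentiation of this payoff is expected to behave.

```python
import numpy as np

S0    = np.array([105.0, 110.0, 120.0])   # assumed scenario values
sigma = np.array([0.25, 0.30, 0.20])
rho   = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
K, r, T, rel = 100.0, 0.03, 0.25, 1e-4

rng = np.random.default_rng(7)
Y = rng.standard_normal((200_000, 3)) @ np.linalg.cholesky(rho).T
disc = np.exp(-r * T)

def terminal(S0_vec):
    """Terminal prices for every path, reusing the same correlated draws Y."""
    return np.exp(np.log(S0_vec) + (r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Y)

def price(S0_vec):
    return disc * np.maximum(terminal(S0_vec).min(axis=1) - K, 0.0).mean()

# Central finite differences; up and down bumps share Y (common random numbers)
delta_fd = np.zeros(3)
for i in range(3):
    h = max(rel * S0[i], 1e-6)
    up, dn = S0.copy(), S0.copy()
    up[i] += h
    dn[i] -= h
    delta_fd[i] = (price(up) - price(dn)) / (2.0 * h)

# Pathwise delta: on each in-the-money path only the worst performer contributes,
# with dS_i(T)/dS_i(0) = S_i(T)/S_i(0)
ST = terminal(S0)
worst = ST.argmin(axis=1)
itm = ST.min(axis=1) > K
delta_pw = np.zeros(3)
for i in range(3):
    hit = itm & (worst == i)
    delta_pw[i] = disc * (ST[hit, i] / S0[i]).sum() / len(ST)
```

On the same draws the two estimates differ only on the few paths where a bump crosses a payoff kink, so `delta_fd` and `delta_pw` should agree closely. Vega can be treated the same way by bumping `sigma` instead of `S0`.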

---

## 6) Train vs Test Configurations

### 6.1 Train (large rows, moderate paths)

* **Defaults:** `rows=5,000,000`, `paths=1,000,000`.
* **Purpose:** generate a **broad** training corpus where each row’s price/Greeks carry modest MC noise, but the sheer *volume* of scenarios covers the state space.
* **Outputs per row:**

  * Inputs: $\{S0_i\}$, $\{\sigma_i\}$, pairwise $\rho_{ij}$, $K, r, T$.
  * Labels: price, price standard error, **FD** $\Delta_i$, $\nu_i$, and (optionally) **AAD** $\Delta_i$, $\nu_i$.
* **I/O:** Streamed appends to a single Parquet via `pyarrow` with compression; periodic **timing checkpoints** to CSV + cumulative runtime plot.

### 6.2 Test (small rows, ultra-high paths)

* **Defaults:** `rows=50,000`, `paths=100,000,000`.
* **Purpose:** produce a **“gold”** test set with *negligible MC noise*—a faithful ground truth for out-of-sample evaluation and for quantifying how well learned models beat MC noise.
* **Provenance:** This mirrors the paper’s design: tiny test set, **huge** per-row path count (e.g., 100M paths → $\sim$ 1-cent accuracy) to decouple test error from MC randomness.
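
As a rough worked comparison of the two configurations, assume the per-path standard deviation $s$ of the discounted payoff is the same for a given scenario in both runs. The Monte-Carlo standard error then scales as

$$
\mathrm{SE}(N) \;\approx\; \frac{s}{\sqrt{N}},
\qquad
\frac{\mathrm{SE}(10^{6})}{\mathrm{SE}(10^{8})} \;=\; \sqrt{\frac{10^{8}}{10^{6}}} \;=\; 10 .
$$

Under that assumption a test label carries about one tenth of the standard error of a training label for the same scenario, at 100 times the simulation cost per row.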

---

## 7) Why These Distributions? (Design Rationale)

* **Short maturities emphasized.** Worst-of payoffs are most nonlinear near expiry; a quadratic map from integers ($U^2/252$) oversamples that region to give the learner rich curvature.
* **Wide spans for $S_0$, $\sigma$, $\rho$.** Keeps the network honest: it must interpolate across realistic but varied regimes (low/high vols, negative/positive correlations).
* **Independence (first pass).** Simple, fast, and surprisingly effective; the paper shows that **more, noisier** training examples can outperform **fewer, cleaner** ones, because the NN learns to average away MC noise. The **Train**/**Test** split reflects this insight.

---

## 8) Implementation Notes & Safeguards

* **Correlation PSD fix:** After C-vine sampling, eigenvalues are clipped to ensure a valid covariance (required for Cholesky).
* **Bump floors:** FD bumps use $\max(\text{rel}\cdot x,10^{-6})$ to avoid tiny denominators and catastrophic cancellation.
* **Mixed precision where safe:** MC price loops run under autocast; all Greeks are float64.
* **Streaming Parquet:** `CHUNK_MAX` rows per table to bound memory; one **schema** across batches for a single file.
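
The effect of the eigenvalue clip is easiest to see on a matrix that is deliberately invalid. In this sketch the helper name `clip_to_pd` and the example matrix are illustrative assumptions; the final renormalisation step is an optional extra that the original sampler does not perform.

```python
import numpy as np

def clip_to_pd(P, floor=1e-6):
    """Floor the eigenvalues of a symmetric matrix, as in the PSD fix above."""
    ev, evec = np.linalg.eigh(P)
    return evec @ np.diag(np.clip(ev, floor, None)) @ evec.T

# A symmetric, correlation-like matrix that is NOT positive semi-definite
# (illustrative values, not produced by the sampler)
bad = np.array([[ 1.0,  0.9, -0.9],
                [ 0.9,  1.0,  0.9],
                [-0.9,  0.9,  1.0]])

fixed = clip_to_pd(bad)
L = np.linalg.cholesky(fixed)      # succeeds after the clip

# Optional extra step (an assumption of this sketch, not in the original code):
# rescale so that the diagonal is exactly 1 again
d = np.sqrt(np.diag(fixed))
renorm = fixed / np.outer(d, d)
```

Clipping alone repairs positive-definiteness but can move the diagonal away from 1, as `np.diag(fixed)` shows for this extreme input, so the clipped matrix is a covariance rather than a strict correlation matrix unless it is rescaled.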

---

## 9) Differences vs. Ferguson & Green (and why they’re okay)

| Aspect      | Paper                | This code             | Comment                                                       |
| ----------- | -------------------- | --------------------- | ------------------------------------------------------------- |
| Basket size | 6 names              | 3 names               | Dimensionality reduced for dev speed; methodology identical.  |
| Vol range   | $[0,1]$              | $[0.05,0.8]$          | Slight trimming for numerical stability; still broad.         |
| Rate $r$    | \~0%                 | 3%                    | Adds realism; GBM formula unchanged.                          |
| Test design | 5k rows @ 100M paths | 50k rows @ 100M paths | Same principle: tiny noise, high-fidelity labels.             |

---

## 10) Column Schema (what lands in Parquet)

* **Features:**
  `S0_0..S0_{N-1}`, `sigma_0..sigma_{N-1}`, `corr_i_j` (upper triangle), `K`, `r`, `T`
* **Labels & diagnostics:**
  `price`, `price_se`, `delta_0..delta_{N-1}`, `vega_0..vega_{N-1}`, and (if enabled) `delta_aad_*`, `vega_aad_*`
* **Instrumentation (separate files):**
  `timelog.csv` (cumulative times), and PNG plot comparing **FD vs AAD** runtime scaling.
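
For reference, the column names implied by the schema above can be generated for the 3-asset basket as follows. This is a small sketch: the list names are illustrative, and the ordering follows the record assembly in the driver.

```python
N = 3  # basket size used in this code

features = (
    [f"S0_{i}" for i in range(N)]
    + [f"sigma_{i}" for i in range(N)]
    + [f"corr_{i}_{j}" for i in range(N) for j in range(i + 1, N)]  # upper triangle
    + ["K", "r", "T"]
)
labels = (
    ["price", "price_se"]
    + [f"delta_{i}" for i in range(N)]
    + [f"vega_{i}" for i in range(N)]
)
# Present only when AAD is enabled
aad_labels = [f"delta_aad_{i}" for i in range(N)] + [f"vega_aad_{i}" for i in range(N)]
```

With $N=3$ this gives 12 feature columns, 8 label columns, and 6 optional AAD columns.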

---

## 11) Summary

* The **Train** program creates **millions** of broadly sampled scenarios with **moderate** path counts per row—cheap, dense coverage whose MC noise a neural net can learn to smooth.
* The **Test** program creates a **small** but **ultra-accurate** set (100M paths/row) that isolates *true* generalization error from MC randomness.
* The sampling distributions (lognormal spots, uniform vols, short-biased maturities, C-vine correlations) and the **CRN + AAD/FD** labeling directly follow the successful recipe documented by Ferguson & Green, adapted to this project's hardware and dimensionality.

---

*Primary reference for methodology and sampling choices: Ferguson & Green (2018), “Deeply Learning Derivatives.”*

---


## Code

### Train

In [1]:
import os, math, time, argparse, pathlib, sys
import numpy as np
import torch
import pyarrow as pa, pyarrow.parquet as pq

# Optional plotting/logging deps
import matplotlib
matplotlib.use("Agg")  # safe for headless / servers
import matplotlib.pyplot as plt
import csv

# --------- global knobs ---------
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark       = True
torch.set_default_dtype(torch.float32)

N_ASSETS   = 3
R_RATE     = 0.03
SEED_BASE  = 42
CHUNK_MAX  = 100_000                 # rows before flushing Parquet
NGPU       = torch.cuda.device_count()
DEVICES    = [torch.device(f"cuda:{i}") for i in range(NGPU)]
SIM_CHUNK  = 1_000_000               # per-GPU sub-batch size (paths)

if not NGPU:
    sys.exit("No CUDA GPU visible – aborting.")

# --------- correlation sampler ---------
def cvine_corr_np(d, a: float = 5.0, b: float = 2.0) -> torch.Tensor:
    P = np.eye(d)
    for k in range(d - 1):
        for i in range(k + 1, d):
            rho = 2.0 * np.random.beta(a, b) - 1.0
            for m in range(k - 1, -1, -1):
                rho = rho * np.sqrt((1 - P[m, i] ** 2) * (1 - P[m, k] ** 2)) + P[m, i] * P[m, k]
            P[k, i] = P[i, k] = rho
    ev, evec = np.linalg.eigh(P)
    P = evec @ np.diag(np.clip(ev, 1e-6, None)) @ evec.T
    return torch.as_tensor(P, dtype=torch.float32, device="cuda")

# --------- random scenario generator ---------
def fg_sample():
    z     = np.random.normal(0.5, math.sqrt(0.25), N_ASSETS)
    S0    = 100 * np.exp(z)
    sigma = np.random.uniform(0.05, 0.8, N_ASSETS)
    T     = (np.random.randint(1, 44) ** 2) / 252.0
    return dict(S0=S0.astype(np.float32), sigma=sigma.astype(np.float32),
                T=float(T), rho=cvine_corr_np(N_ASSETS), K=100.0, r=R_RATE)

# --------- helpers ---------
def _split_across_devices(total: int, ndev: int):
    base = total // ndev
    rem  = total % ndev
    return [base + (1 if i < rem else 0) for i in range(ndev)]

@torch.no_grad()
def terminal_prices(S0, sigma, T, rho, *, n_paths, r, device, gen=None):
    L = torch.linalg.cholesky(rho.to(device))
    Z = torch.randn(n_paths, N_ASSETS, device=device, generator=gen)
    with torch.autocast('cuda', dtype=torch.float16):
        drift     = (r - 0.5 * sigma**2) * T
        diffusion = sigma * math.sqrt(T) * (Z @ L.T)
        return torch.exp(torch.log(S0) + drift + diffusion)

# --------- streaming MC price (fast path) ---------
def price_mc(params, n_paths, return_se=False):
    counts      = _split_across_devices(n_paths, NGPU)
    total_sum   = 0.0
    total_sumsq = 0.0
    disc        = math.exp(-float(params['r']) * float(params['T']))

    for dev, count in zip(DEVICES, counts):
        if count == 0: continue
        S0    = torch.tensor(params['S0'],    device=dev)
        sigma = torch.tensor(params['sigma'], device=dev)
        T     = torch.tensor(params['T'],     device=dev)
        r     = torch.tensor(params['r'],     device=dev)
        K     = torch.tensor(params['K'],     device=dev)

        for offset in range(0, count, SIM_CHUNK):
            sz  = min(SIM_CHUNK, count - offset)
            ST  = terminal_prices(S0, sigma, T, params['rho'], n_paths=sz, r=r, device=dev)
            pay = torch.clamp(ST.min(dim=1).values - K, 0.0)
            arr = (disc * pay).float().cpu().numpy()
            total_sum   += arr.sum()
            total_sumsq += (arr * arr).sum()

    mean = total_sum / n_paths
    if not return_se:
        return mean
    var  = (total_sumsq / n_paths) - mean * mean
    se   = math.sqrt(max(var, 0.0) / n_paths)
    return mean, se

# ===================== FD Δ & ν WITH CRN =====================
def delta_vega_fd_crn(params, n_paths, rel=1e-4, base_seed=None):
    """
    FD Δ/ν using common random numbers (same Z/Y reused for up/down).
    float64 throughout; chunked + multi-GPU; deterministic with base_seed.
    Returns (delta[N_ASSETS], vega[N_ASSETS]) as numpy float64 arrays.
    """
    disc = math.exp(-float(params['r']) * float(params['T']))
    delta_num = np.zeros(N_ASSETS, dtype=np.float64)
    vega_num  = np.zeros(N_ASSETS, dtype=np.float64)

    counts = _split_across_devices(n_paths, NGPU)
    for dev_idx, (dev, count) in enumerate(zip(DEVICES, counts)):
        if count == 0: continue

        S0f  = torch.tensor(params['S0'],    dtype=torch.float64, device=dev)
        sigf = torch.tensor(params['sigma'], dtype=torch.float64, device=dev)
        Tf   = torch.tensor(float(params['T']), dtype=torch.float64, device=dev)
        rf   = torch.tensor(float(params['r']), dtype=torch.float64, device=dev)
        Kf   = torch.tensor(float(params['K']), dtype=torch.float64, device=dev)
        L    = torch.linalg.cholesky(params['rho'].to(dev).to(torch.float64))

        for chunk_idx, offset in enumerate(range(0, count, SIM_CHUNK)):
            sz   = min(SIM_CHUNK, count - offset)

            gen = None
            if base_seed is not None:
                gen = torch.Generator(device=dev)
                gen.manual_seed(int(base_seed) + 1_000_003*dev_idx + chunk_idx)

            Z = torch.randn(sz, N_ASSETS, dtype=torch.float64, device=dev, generator=gen)
            Y = Z @ L.T

            drift_b = (rf - 0.5 * sigf**2) * Tf
            diff_b  = sigf * torch.sqrt(Tf) * Y

            for i in range(N_ASSETS):
                # Δ bump
                hS   = float(max(rel * float(S0f[i].item()), 1e-6))
                S_up = S0f.clone(); S_up[i] += hS
                S_dn = S0f.clone(); S_dn[i] -= hS

                ST_up = torch.exp(torch.log(S_up) + drift_b + diff_b)
                ST_dn = torch.exp(torch.log(S_dn) + drift_b + diff_b)
                pay_up = torch.clamp(ST_up.min(dim=1).values - Kf, 0.0)
                pay_dn = torch.clamp(ST_dn.min(dim=1).values - Kf, 0.0)

                delta_num[i] += (disc * (pay_up.sum().double() - pay_dn.sum().double()) / (2.0 * hS)).cpu().item()

                # ν bump (reuse Y)
                hV     = float(max(rel * float(sigf[i].item()), 1e-6))
                sig_up = sigf.clone(); sig_up[i] += hV
                sig_dn = sigf.clone(); sig_dn[i] -= hV

                drift_up = (rf - 0.5 * sig_up**2) * Tf
                diff_up  = sig_up * torch.sqrt(Tf) * Y
                drift_dn = (rf - 0.5 * sig_dn**2) * Tf
                diff_dn  = sig_dn * torch.sqrt(Tf) * Y

                ST_vup  = torch.exp(torch.log(S0f) + drift_up + diff_up)
                ST_vdn  = torch.exp(torch.log(S0f) + drift_dn + diff_dn)
                pay_vup = torch.clamp(ST_vup.min(dim=1).values - Kf, 0.0)
                pay_vdn = torch.clamp(ST_vdn.min(dim=1).values - Kf, 0.0)

                vega_num[i]  += (disc * (pay_vup.sum().double() - pay_vdn.sum().double()) / (2.0 * hV)).cpu().item()

    delta = delta_num / n_paths
    vega  = vega_num  / n_paths
    return delta, vega

# ===================== AAD Δ & ν (float64, chunked) =====================
def delta_vega_aad(params, n_paths, base_seed=None):
    counts = _split_across_devices(n_paths, NGPU)
    sum_dS = np.zeros(N_ASSETS, dtype=np.float64)
    sum_dV = np.zeros(N_ASSETS, dtype=np.float64)

    for dev_idx, (dev, count) in enumerate(zip(DEVICES, counts)):
        if count == 0: continue
        L  = torch.linalg.cholesky(params['rho'].to(dev).to(torch.float64))
        Tf = torch.tensor(float(params['T']), dtype=torch.float64, device=dev)
        rf = torch.tensor(float(params['r']), dtype=torch.float64, device=dev)
        Kf = torch.tensor(float(params['K']), dtype=torch.float64, device=dev)

        for chunk_idx, offset in enumerate(range(0, count, SIM_CHUNK)):
            sz  = min(SIM_CHUNK, count - offset)

            gen = None
            if base_seed is not None:
                gen = torch.Generator(device=dev)
                gen.manual_seed(int(base_seed) + 1_000_003*dev_idx + chunk_idx)

            # Leaf vars with grads
            S0  = torch.tensor(params['S0'],    dtype=torch.float64, device=dev, requires_grad=True)
            sig = torch.tensor(params['sigma'], dtype=torch.float64, device=dev, requires_grad=True)

            Z = torch.randn(sz, N_ASSETS, dtype=torch.float64, device=dev, generator=gen)
            Y = Z @ L.T

            drift = (rf - 0.5 * sig**2) * Tf
            diff  = sig * torch.sqrt(Tf) * Y
            ST    = torch.exp(torch.log(S0) + drift + diff)
            payoff = torch.clamp(ST.min(dim=1).values - Kf, 0.0)
            price  = torch.exp(-rf * Tf) * payoff.mean()

            dS, dV = torch.autograd.grad(price, (S0, sig), retain_graph=False, create_graph=False)
            # accumulate sums (note: host transfer happens here)
            sum_dS += dS.detach().double().cpu().numpy() * sz
            sum_dV += dV.detach().double().cpu().numpy() * sz

    return sum_dS / n_paths, sum_dV / n_paths

# ----------------- main driver -----------------
def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--rows',        type=int, default=5_000_000)
    ap.add_argument('--paths',       type=int, default=1_000_000)
    ap.add_argument('--seed_offset', type=int, default=0)
    ap.add_argument('--out',         type=str, default='Train.parquet')
    ap.add_argument('--no-aad',      dest='do_aad', action='store_false', default=True, help='disable AAD Δ/ν')

    # Instrumentation & outputs
    ap.add_argument('--checkpoint',  type=int, default=10_000, help='rows between timing checkpoints')
    ap.add_argument('--timelog',     type=str, default='timelog.csv', help='CSV to write timing checkpoints')
    ap.add_argument('--plot',        type=str, default='fd_vs_aad_runtime.png', help='PNG plot output')

    # (Jupyter sometimes injects -f/--f; we’ll ignore unknowns in notebooks)
    ap.add_argument('-f', '--f', default=None, help=argparse.SUPPRESS)

    # Robust parsing across CLI & notebooks
    in_notebook = False
    try:
        from IPython import get_ipython  # type: ignore
        in_notebook = get_ipython() is not None
    except Exception:
        in_notebook = False

    if in_notebook:
        args, _ = ap.parse_known_args([])   # ignore Jupyter's flags, use defaults
    else:
        args, _ = ap.parse_known_args()

    np.random.seed(SEED_BASE + args.seed_offset)
    torch.manual_seed(SEED_BASE + args.seed_offset)

    out_path  = pathlib.Path(args.out)
    writer    = None
    first     = True

    # cumulative timers
    total_t0  = time.time()
    sample_t = price_t = fd_t = aad_t = 0.0

    # checkpoint series (cumulative)
    ck_rows, ck_fd, ck_aad, ck_wall = [], [], [], []

    print(f"Launching Monte-Carlo for {args.rows:,} rows …", flush=True)
    global_row_idx = 0

    rows_left = args.rows
    try:
        while rows_left:
            batch = min(rows_left, CHUNK_MAX)
            records = []

            for _ in range(batch):
                # ---- sample scenario
                t0 = time.perf_counter()
                p  = fg_sample()
                sample_t += time.perf_counter() - t0

                # ---- price & SE
                t1 = time.perf_counter()
                price, price_se = price_mc(p, args.paths, return_se=True)
                price_t += time.perf_counter() - t1

                # ---- FD Δ/ν
                scen_seed = SEED_BASE + args.seed_offset + global_row_idx
                t2 = time.perf_counter()
                delta, vega = delta_vega_fd_crn(p, args.paths, rel=1e-4, base_seed=scen_seed)
                fd_t += time.perf_counter() - t2

                # ---- AAD Δ/ν (optional)
                if args.do_aad:
                    t3 = time.perf_counter()
                    delta_aad, vega_aad = delta_vega_aad(p, args.paths, base_seed=scen_seed)
                    aad_t += time.perf_counter() - t3
                else:
                    delta_aad = vega_aad = None

                # ---- flatten correlation matrix
                corr_mat = p['rho'].detach().cpu().numpy()
                corr_fields = {
                    f"corr_{i}_{j}": float(corr_mat[i, j])
                    for i in range(N_ASSETS) for j in range(i + 1, N_ASSETS)
                }

                # ---- assemble record
                rec = {
                    **{f"S0_{i}":     float(p['S0'][i])     for i in range(N_ASSETS)},
                    **{f"sigma_{i}":  float(p['sigma'][i])  for i in range(N_ASSETS)},
                    **corr_fields,
                    "K": float(p['K']),
                    "r": float(p['r']),
                    "T": float(p['T']),
                    "price":    float(price),
                    "price_se": float(price_se),
                    **{f"delta_{i}": float(delta[i]) for i in range(N_ASSETS)},
                    **{f"vega_{i}":  float(vega[i])  for i in range(N_ASSETS)},
                }
                if delta_aad is not None:
                    rec.update({f"delta_aad_{i}": float(delta_aad[i]) for i in range(N_ASSETS)})
                    rec.update({f"vega_aad_{i}":  float(vega_aad[i])  for i in range(N_ASSETS)})

                records.append(rec)
                global_row_idx += 1

                # ---- checkpoint logging
                if (global_row_idx % args.checkpoint) == 0:
                    ck_rows.append(global_row_idx)
                    ck_fd.append(fd_t)
                    ck_aad.append(aad_t if args.do_aad else 0.0)
                    ck_wall.append(time.time() - total_t0)

            # ---- write Parquet (single file)
            table = pa.Table.from_pylist(records)
            if first:
                writer = pq.ParquetWriter(str(out_path), table.schema, compression='zstd')
                first = False
            writer.write_table(table)
            rows_left -= batch

    finally:
        if writer is not None:
            writer.close()

    total_elapsed = time.time() - total_t0

    # ---- console summary (paper-ready numbers)
    print(f"Sampling: {sample_t:.2f}s | Pricing: {price_t:.2f}s | FD: {fd_t:.2f}s | AAD: {aad_t:.2f}s")
    print(f"TOTAL (wall): {total_elapsed:.2f}s for {args.rows:,} rows @ {args.paths:,} paths")

    # ---- write timing CSV
    if len(ck_rows) > 0:
        with open(args.timelog, "w", newline="") as f:
            wr = csv.writer(f)
            wr.writerow(["rows", "fd_time_sec_cum", "aad_time_sec_cum", "wall_time_sec_cum"])
            for r, fdv, aadv, w in zip(ck_rows, ck_fd, ck_aad, ck_wall):
                wr.writerow([r, f"{fdv:.6f}", f"{aadv:.6f}", f"{w:.6f}"])
        print(f"Wrote timing checkpoints → {args.timelog}")

    # ---- make plot (cumulative runtime vs rows)
    if len(ck_rows) > 0:
        plt.figure(figsize=(7,5))
        plt.plot(ck_rows, ck_fd,  label="FD cumulative runtime (s)")
        if args.do_aad:
            plt.plot(ck_rows, ck_aad, label="AAD cumulative runtime (s)")
        plt.xlabel("Rows (scenarios)")
        plt.ylabel("Cumulative runtime (seconds)")
        plt.title("FD vs AAD runtime scaling")
        plt.legend()
        plt.tight_layout()
        plt.savefig(args.plot, dpi=150)
        print(f"Wrote runtime plot → {args.plot}")

        # ---- find crossover (first rows where AAD faster than FD cumulatively)
        if args.do_aad:
            crossover_idx = None
            for i in range(len(ck_rows)):
                if ck_aad[i] < ck_fd[i]:
                    crossover_idx = i
                    break
            if crossover_idx is not None:
                print(f"AAD becomes faster than FD (cumulative) at ~{ck_rows[crossover_idx]:,} rows "
                      f"(FD={ck_fd[crossover_idx]:.2f}s, AAD={ck_aad[crossover_idx]:.2f}s).")
            else:
                print("No cumulative AAD<FD crossover within sampled rows.")

if __name__ == "__main__":
    main()


Launching Monte-Carlo for 5,000,000 rows …
Sampling: 1200.47s | Pricing: 22507.23s | FD: 54858.84s | AAD: 18505.20s
TOTAL (wall): 97596.84s for 5,000,000 rows @ 1,000,000 paths
Wrote timing checkpoints → timelog.csv
Wrote runtime plot → fd_vs_aad_runtime.png
AAD becomes faster than FD (cumulative) at ~10,000 rows (FD=112.40s, AAD=38.14s).


### Test

In [2]:
import os, math, time, argparse, pathlib, sys
import numpy as np
import torch
import pyarrow as pa, pyarrow.parquet as pq

# Optional plotting/logging deps
import matplotlib
matplotlib.use("Agg")  # safe for headless / servers
import matplotlib.pyplot as plt
import csv

# --------- global knobs ---------
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark       = True
torch.set_default_dtype(torch.float32)

N_ASSETS   = 3
R_RATE     = 0.03
SEED_BASE  = 42
CHUNK_MAX  = 100_000                 # rows before flushing Parquet
NGPU       = torch.cuda.device_count()
DEVICES    = [torch.device(f"cuda:{i}") for i in range(NGPU)]
SIM_CHUNK  = 1_000_000               # per-GPU sub-batch size (paths)

if not NGPU:
    sys.exit("No CUDA GPU visible – aborting.")

# --------- correlation sampler ---------
def cvine_corr_np(d, a: float = 5.0, b: float = 2.0) -> torch.Tensor:
    P = np.eye(d)
    for k in range(d - 1):
        for i in range(k + 1, d):
            rho = 2.0 * np.random.beta(a, b) - 1.0
            for m in range(k - 1, -1, -1):
                rho = rho * np.sqrt((1 - P[m, i] ** 2) * (1 - P[m, k] ** 2)) + P[m, i] * P[m, k]
            P[k, i] = P[i, k] = rho
    ev, evec = np.linalg.eigh(P)
    P = evec @ np.diag(np.clip(ev, 1e-6, None)) @ evec.T
    return torch.as_tensor(P, dtype=torch.float32, device="cuda")

# --------- random scenario generator ---------
def fg_sample():
    z     = np.random.normal(0.5, math.sqrt(0.25), N_ASSETS)
    S0    = 100 * np.exp(z)
    sigma = np.random.uniform(0.05, 0.8, N_ASSETS)
    T     = (np.random.randint(1, 44) ** 2) / 252.0
    return dict(S0=S0.astype(np.float32), sigma=sigma.astype(np.float32),
                T=float(T), rho=cvine_corr_np(N_ASSETS), K=100.0, r=R_RATE)

# --------- helpers ---------
def _split_across_devices(total: int, ndev: int):
    base = total // ndev
    rem  = total % ndev
    return [base + (1 if i < rem else 0) for i in range(ndev)]

@torch.no_grad()
def terminal_prices(S0, sigma, T, rho, *, n_paths, r, device, gen=None):
    L = torch.linalg.cholesky(rho.to(device))
    Z = torch.randn(n_paths, N_ASSETS, device=device, generator=gen)
    with torch.autocast('cuda', dtype=torch.float16):
        drift     = (r - 0.5 * sigma**2) * T
        diffusion = sigma * math.sqrt(T) * (Z @ L.T)
        return torch.exp(torch.log(S0) + drift + diffusion)

# --------- streaming MC price (fast path) ---------
def price_mc(params, n_paths, return_se=False):
    counts      = _split_across_devices(n_paths, NGPU)
    total_sum   = 0.0
    total_sumsq = 0.0
    disc        = math.exp(-float(params['r']) * float(params['T']))

    for dev, count in zip(DEVICES, counts):
        if count == 0: continue
        S0    = torch.tensor(params['S0'],    device=dev)
        sigma = torch.tensor(params['sigma'], device=dev)
        T     = torch.tensor(params['T'],     device=dev)
        r     = torch.tensor(params['r'],     device=dev)
        K     = torch.tensor(params['K'],     device=dev)

        for offset in range(0, count, SIM_CHUNK):
            sz  = min(SIM_CHUNK, count - offset)
            ST  = terminal_prices(S0, sigma, T, params['rho'], n_paths=sz, r=r, device=dev)
            pay = torch.clamp(ST.min(dim=1).values - K, 0.0)
            arr = (disc * pay).float().cpu().numpy()
            total_sum   += arr.sum()
            total_sumsq += (arr * arr).sum()

    mean = total_sum / n_paths
    if not return_se:
        return mean
    var  = (total_sumsq / n_paths) - mean * mean
    se   = math.sqrt(max(var, 0.0) / n_paths)
    return mean, se

# ===================== FD Δ & ν WITH CRN =====================
def delta_vega_fd_crn(params, n_paths, rel=1e-4, base_seed=None):
    """
    FD Δ/ν using common random numbers (same Z/Y reused for up/down).
    float64 throughout; chunked + multi-GPU; deterministic with base_seed.
    Returns (delta[N_ASSETS], vega[N_ASSETS]) as numpy float64 arrays.
    """
    disc = math.exp(-float(params['r']) * float(params['T']))
    delta_num = np.zeros(N_ASSETS, dtype=np.float64)
    vega_num  = np.zeros(N_ASSETS, dtype=np.float64)

    counts = _split_across_devices(n_paths, NGPU)
    for dev_idx, (dev, count) in enumerate(zip(DEVICES, counts)):
        if count == 0: continue

        S0f  = torch.tensor(params['S0'],    dtype=torch.float64, device=dev)
        sigf = torch.tensor(params['sigma'], dtype=torch.float64, device=dev)
        Tf   = torch.tensor(float(params['T']), dtype=torch.float64, device=dev)
        rf   = torch.tensor(float(params['r']), dtype=torch.float64, device=dev)
        Kf   = torch.tensor(float(params['K']), dtype=torch.float64, device=dev)
        L    = torch.linalg.cholesky(params['rho'].to(dev).to(torch.float64))

        for chunk_idx, offset in enumerate(range(0, count, SIM_CHUNK)):
            sz   = min(SIM_CHUNK, count - offset)

            gen = None
            if base_seed is not None:
                gen = torch.Generator(device=dev)
                gen.manual_seed(int(base_seed) + 1_000_003*dev_idx + chunk_idx)

            Z = torch.randn(sz, N_ASSETS, dtype=torch.float64, device=dev, generator=gen)
            Y = Z @ L.T

            drift_b = (rf - 0.5 * sigf**2) * Tf
            diff_b  = sigf * torch.sqrt(Tf) * Y

            for i in range(N_ASSETS):
                # Δ bump
                hS   = float(max(rel * float(S0f[i].item()), 1e-6))
                S_up = S0f.clone(); S_up[i] += hS
                S_dn = S0f.clone(); S_dn[i] -= hS

                ST_up = torch.exp(torch.log(S_up) + drift_b + diff_b)
                ST_dn = torch.exp(torch.log(S_dn) + drift_b + diff_b)
                pay_up = torch.clamp(ST_up.min(dim=1).values - Kf, 0.0)
                pay_dn = torch.clamp(ST_dn.min(dim=1).values - Kf, 0.0)

                delta_num[i] += (disc * (pay_up.sum().double() - pay_dn.sum().double()) / (2.0 * hS)).cpu().item()

                # ν bump (reuse Y)
                hV     = float(max(rel * float(sigf[i].item()), 1e-6))
                sig_up = sigf.clone(); sig_up[i] += hV
                sig_dn = sigf.clone(); sig_dn[i] -= hV

                drift_up = (rf - 0.5 * sig_up**2) * Tf
                diff_up  = sig_up * torch.sqrt(Tf) * Y
                drift_dn = (rf - 0.5 * sig_dn**2) * Tf
                diff_dn  = sig_dn * torch.sqrt(Tf) * Y

                ST_vup  = torch.exp(torch.log(S0f) + drift_up + diff_up)
                ST_vdn  = torch.exp(torch.log(S0f) + drift_dn + diff_dn)
                pay_vup = torch.clamp(ST_vup.min(dim=1).values - Kf, 0.0)
                pay_vdn = torch.clamp(ST_vdn.min(dim=1).values - Kf, 0.0)

                vega_num[i]  += (disc * (pay_vup.sum().double() - pay_vdn.sum().double()) / (2.0 * hV)).cpu().item()

    delta = delta_num / n_paths
    vega  = vega_num  / n_paths
    return delta, vega

# ===================== AAD Δ & ν (float64, chunked) =====================
def delta_vega_aad(params, n_paths, base_seed=None):
    counts = _split_across_devices(n_paths, NGPU)
    sum_dS = np.zeros(N_ASSETS, dtype=np.float64)
    sum_dV = np.zeros(N_ASSETS, dtype=np.float64)

    for dev_idx, (dev, count) in enumerate(zip(DEVICES, counts)):
        if count == 0: continue
        L  = torch.linalg.cholesky(params['rho'].to(dev).to(torch.float64))
        Tf = torch.tensor(float(params['T']), dtype=torch.float64, device=dev)
        rf = torch.tensor(float(params['r']), dtype=torch.float64, device=dev)
        Kf = torch.tensor(float(params['K']), dtype=torch.float64, device=dev)

        for chunk_idx, offset in enumerate(range(0, count, SIM_CHUNK)):
            sz  = min(SIM_CHUNK, count - offset)

            gen = None
            if base_seed is not None:
                gen = torch.Generator(device=dev)
                gen.manual_seed(int(base_seed) + 1_000_003*dev_idx + chunk_idx)

            # Leaf vars with grads
            S0  = torch.tensor(params['S0'],    dtype=torch.float64, device=dev, requires_grad=True)
            sig = torch.tensor(params['sigma'], dtype=torch.float64, device=dev, requires_grad=True)

            Z = torch.randn(sz, N_ASSETS, dtype=torch.float64, device=dev, generator=gen)
            Y = Z @ L.T

            drift = (rf - 0.5 * sig**2) * Tf
            diff  = sig * torch.sqrt(Tf) * Y
            ST    = torch.exp(torch.log(S0) + drift + diff)
            payoff = torch.clamp(ST.min(dim=1).values - Kf, 0.0)
            price  = torch.exp(-rf * Tf) * payoff.mean()

            dS, dV = torch.autograd.grad(price, (S0, sig), retain_graph=False, create_graph=False)
            # accumulate sums (note: host transfer happens here)
            sum_dS += dS.detach().double().cpu().numpy() * sz
            sum_dV += dV.detach().double().cpu().numpy() * sz

    return sum_dS / n_paths, sum_dV / n_paths

# ----------------- main driver -----------------
def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--rows',        type=int, default=50_000)
    ap.add_argument('--paths',       type=int, default=100_000_000)
    ap.add_argument('--seed_offset', type=int, default=0)
    ap.add_argument('--out',         type=str, default='Test.parquet')
    ap.add_argument('--no-aad',      dest='do_aad', action='store_false', default=True, help='disable AAD Δ/ν')

    # Instrumentation & outputs
    ap.add_argument('--checkpoint',  type=int, default=500, help='rows between timing checkpoints')
    ap.add_argument('--timelog',     type=str, default='timelog_test.csv', help='CSV to write timing checkpoints')
    ap.add_argument('--plot',        type=str, default='fd_vs_aad_runtime_test.png', help='PNG plot output')

    # (Jupyter sometimes injects -f/--f; we’ll ignore unknowns in notebooks)
    ap.add_argument('-f', '--f', default=None, help=argparse.SUPPRESS)

    # Robust parsing across CLI & notebooks
    in_notebook = False
    try:
        from IPython import get_ipython  # type: ignore
        in_notebook = get_ipython() is not None
    except Exception:
        in_notebook = False

    if in_notebook:
        args, _ = ap.parse_known_args([])   # ignore Jupyter's flags, use defaults
    else:
        args, _ = ap.parse_known_args()

    np.random.seed(SEED_BASE + args.seed_offset)
    torch.manual_seed(SEED_BASE + args.seed_offset)

    out_path  = pathlib.Path(args.out)
    writer    = None
    first     = True

    # cumulative timers
    total_t0  = time.time()
    sample_t = price_t = fd_t = aad_t = 0.0

    # checkpoint series (cumulative)
    ck_rows, ck_fd, ck_aad, ck_wall = [], [], [], []

    print(f"Launching Monte-Carlo for {args.rows:,} rows …", flush=True)
    global_row_idx = 0

    rows_left = args.rows
    try:
        while rows_left:
            batch = min(rows_left, CHUNK_MAX)
            records = []

            for _ in range(batch):
                # ---- sample scenario
                t0 = time.perf_counter()
                p  = fg_sample()
                sample_t += time.perf_counter() - t0

                # ---- price & SE
                t1 = time.perf_counter()
                price, price_se = price_mc(p, args.paths, return_se=True)
                price_t += time.perf_counter() - t1

                # ---- FD Δ/ν
                scen_seed = SEED_BASE + args.seed_offset + global_row_idx
                t2 = time.perf_counter()
                delta, vega = delta_vega_fd_crn(p, args.paths, rel=1e-4, base_seed=scen_seed)
                fd_t += time.perf_counter() - t2

                # ---- AAD Δ/ν (optional)
                if args.do_aad:
                    t3 = time.perf_counter()
                    delta_aad, vega_aad = delta_vega_aad(p, args.paths, base_seed=scen_seed)
                    aad_t += time.perf_counter() - t3
                else:
                    delta_aad = vega_aad = None

                # ---- flatten correlation matrix
                corr_mat = p['rho'].detach().cpu().numpy()
                corr_fields = {
                    f"corr_{i}_{j}": float(corr_mat[i, j])
                    for i in range(N_ASSETS) for j in range(i + 1, N_ASSETS)
                }

                # ---- assemble record
                rec = {
                    **{f"S0_{i}":     float(p['S0'][i])     for i in range(N_ASSETS)},
                    **{f"sigma_{i}":  float(p['sigma'][i])  for i in range(N_ASSETS)},
                    **corr_fields,
                    "K": float(p['K']),
                    "r": float(p['r']),
                    "T": float(p['T']),
                    "price":    float(price),
                    "price_se": float(price_se),
                    **{f"delta_{i}": float(delta[i]) for i in range(N_ASSETS)},
                    **{f"vega_{i}":  float(vega[i])  for i in range(N_ASSETS)},
                }
                if delta_aad is not None:
                    rec.update({f"delta_aad_{i}": float(delta_aad[i]) for i in range(N_ASSETS)})
                    rec.update({f"vega_aad_{i}":  float(vega_aad[i])  for i in range(N_ASSETS)})

                records.append(rec)
                global_row_idx += 1

                # ---- checkpoint logging
                if (global_row_idx % args.checkpoint) == 0:
                    ck_rows.append(global_row_idx)
                    ck_fd.append(fd_t)
                    ck_aad.append(aad_t if args.do_aad else 0.0)
                    ck_wall.append(time.time() - total_t0)

            # ---- write Parquet (single file)
            table = pa.Table.from_pylist(records)
            if first:
                writer = pq.ParquetWriter(str(out_path), table.schema, compression='zstd')
                first = False
            writer.write_table(table)
            rows_left -= batch

    finally:
        if writer is not None:
            writer.close()

    total_elapsed = time.time() - total_t0

    # ---- console summary (paper-ready numbers)
    print(f"Sampling: {sample_t:.2f}s | Pricing: {price_t:.2f}s | FD: {fd_t:.2f}s | AAD: {aad_t:.2f}s")
    print(f"TOTAL (wall): {total_elapsed:.2f}s for {args.rows:,} rows @ {args.paths:,} paths")

    # ---- write timing CSV
    if len(ck_rows) > 0:
        with open(args.timelog, "w", newline="") as f:
            wr = csv.writer(f)
            wr.writerow(["rows", "fd_time_sec_cum", "aad_time_sec_cum", "wall_time_sec_cum"])
            for r, fdv, aadv, w in zip(ck_rows, ck_fd, ck_aad, ck_wall):
                wr.writerow([r, f"{fdv:.6f}", f"{aadv:.6f}", f"{w:.6f}"])
        print(f"Wrote timing checkpoints → {args.timelog}")

    # ---- make plot (cumulative runtime vs rows)
    if len(ck_rows) > 0:
        plt.figure(figsize=(7,5))
        plt.plot(ck_rows, ck_fd,  label="FD cumulative runtime (s)")
        if args.do_aad:
            plt.plot(ck_rows, ck_aad, label="AAD cumulative runtime (s)")
        plt.xlabel("Rows (scenarios)")
        plt.ylabel("Cumulative runtime (seconds)")
        plt.title("FD vs AAD runtime scaling")
        plt.legend()
        plt.tight_layout()
        plt.savefig(args.plot, dpi=150)
        print(f"Wrote runtime plot → {args.plot}")

        # ---- find crossover (first checkpoint where cumulative AAD time is below cumulative FD time)
        if args.do_aad:
            crossover_idx = None
            for i in range(len(ck_rows)):
                if ck_aad[i] < ck_fd[i]:
                    crossover_idx = i
                    break
            if crossover_idx is not None:
                print(f"AAD becomes faster than FD (cumulative) at ~{ck_rows[crossover_idx]:,} rows "
                      f"(FD={ck_fd[crossover_idx]:.2f}s, AAD={ck_aad[crossover_idx]:.2f}s).")
            else:
                print("No cumulative AAD<FD crossover within sampled rows.")

if __name__ == "__main__":
    main()


Launching Monte-Carlo for 50,000 rows …
Sampling: 12.02s | Pricing: 22267.82s | FD: 50801.53s | AAD: 16086.47s
TOTAL (wall): 89169.91s for 50,000 rows @ 100,000,000 paths
Wrote timing checkpoints → timelog_test.csv
Wrote runtime plot → fd_vs_aad_runtime_test.png
AAD becomes faster than FD (cumulative) at ~500 rows (FD=512.34s, AAD=162.18s).
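
The cumulative timers in the log above can be turned into per-scenario costs and an FD/AAD ratio. The sketch below only reuses the numbers printed in that log; the variable names are illustrative. Note that the first checkpoint is taken at 500 rows, so the crossover message shows that AAD was already ahead by then, not the exact row at which it pulled ahead.

```python
# Timings copied from the console summary above (seconds, cumulative over all rows).
rows = 50_000
fd_total, aad_total = 50801.53, 16086.47

# Values at the first checkpoint (500 rows), taken from the crossover message.
fd_ck, aad_ck = 512.34, 162.18

fd_per_row = fd_total / rows          # average FD cost per scenario
aad_per_row = aad_total / rows        # average AAD cost per scenario
ratio_total = fd_total / aad_total    # FD/AAD ratio over the whole run
ratio_ck = fd_ck / aad_ck             # FD/AAD ratio at the first checkpoint

print(f"FD : {fd_per_row:.3f} s/row")
print(f"AAD: {aad_per_row:.3f} s/row")
print(f"FD/AAD ratio: {ratio_total:.2f} overall, {ratio_ck:.2f} at 500 rows")
```

The ratio is close to 3 both at the first checkpoint and at the end of the run, so in this log the gap between the two methods grows roughly linearly with the number of rows.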


## Cleaning 

In [4]:
import pandas as pd
from pathlib import Path

def clean_parquet(src_path, dst_path):
    """
    • Divides every S0_* and price by the row’s K
    • Stores the results as `S0_i/K` and `price/k`
    • Removes *_se, the raw S0_*, price, and K columns
    • Drops rows where price/k < 1e-6
    • Passes delta_* / vega_* columns through unchanged (no rescaling by K)
    """
    df = pd.read_parquet(src_path)

    # 1) locate relevant columns
    s0_cols = [c for c in df.columns if c.lower().startswith("s0_")]
    se_cols = [c for c in df.columns if c.endswith("_se")]

    # 2) create scaled columns with the requested names
    for col in s0_cols:
        df[f"{col}/K"] = df[col] / df["K"]
    df["price/k"] = df["price"] / df["K"]

    # 3) drop rows whose normalized price is below 1e-6
    df = df[df["price/k"] >= 1e-6]

    # 4) drop the original, now-redundant columns
    df = df.drop(columns=s0_cols + ["price", "K"] + se_cols)

    # 5) save the cleaned file
    df.to_parquet(dst_path, index=False)
    print(f"✔ Saved cleaned file → {dst_path!s}")

# Apply to both datasets
clean_parquet("Train.parquet", "Train_Clean.parquet")
clean_parquet("Test.parquet",  "Test_Clean.parquet")


✔ Saved cleaned file → Train_Clean.parquet
✔ Saved cleaned file → Test_Clean.parquet
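
`clean_parquet` rescales spots and price by $K$ but leaves any `delta_*` / `vega_*` columns as they are. Because the worst-of payoff is homogeneous of degree one in $(S, K)$, the price can be written as $P = K\,f(S/K, \sigma, r, T, \rho)$, which gives $\partial (P/K)/\partial (S_i/K) = \partial P/\partial S_i$ and $\partial (P/K)/\partial \sigma_i = (\partial P/\partial \sigma_i)/K$. Delta labels therefore carry over to the normalized problem unchanged, whereas vega labels would need dividing by $K$ to match gradients of `price/k`. The following is a minimal numerical check of those two identities. It uses a single-asset Black-Scholes call as a stand-in (a one-asset worst-of basket); `bs_call` and `central_diff` are helper names introduced here and are not part of the notebook.

```python
import math

def bs_call(S, K, r, sigma, T):
    """Black-Scholes call price; stand-in for a one-asset worst-of call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * cdf(d1) - K * math.exp(-r * T) * cdf(d2)

def central_diff(f, x, h):
    """Central finite difference of a scalar function."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Illustrative scenario (values are assumptions, not taken from the data set).
S, K, r, sigma, T = 110.0, 95.0, 0.02, 0.30, 1.5
rel = 1e-4

# Greeks of the raw price P(S, K).
raw_delta = central_diff(lambda s: bs_call(s, K, r, sigma, T), S, rel * S)
raw_vega = central_diff(lambda v: bs_call(S, K, r, v, T), sigma, rel * sigma)

# Greeks of the normalized price P/K as a function of x = S/K and sigma.
norm_price = lambda x, v: bs_call(x * K, K, r, v, T) / K
norm_delta = central_diff(lambda x: norm_price(x, sigma), S / K, rel * S / K)
norm_vega = central_diff(lambda v: norm_price(S / K, v), sigma, rel * sigma)

print(f"delta: raw={raw_delta:.6f}  normalized={norm_delta:.6f}")
print(f"vega : raw/K={raw_vega / K:.6f}  normalized={norm_vega:.6f}")
```

Whether the `*_aad_greeks` files used later in the notebook were already rescaled this way is not shown here, so it is worth verifying before comparing network gradients against stored vega labels. Since `clean_parquet` drops `K`, any such rescaling has to happen before that column is removed.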


# Adjoint Neural Pricer (ANP)

## Explanation/ Reasoning

## Code

In [None]:
# Cell 1 — imports & device setup
import os, math, random, time, pathlib, re
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

from torchmetrics import R2Score

# Use inline plots
%matplotlib inline

os.environ.setdefault("NCCL_P2P_LEVEL", "0")
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", DEVICE)

torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# 1) Read cleaned Parquet files
train_df = pd.read_parquet("Train_Clean.parquet", engine="pyarrow")
test_df  = pd.read_parquet("Test_Clean.parquet",  engine="pyarrow")

# 2) Identify Greek columns (delta*, vega*, incl. *_aad variants) so they are excluded from the features
def find_greek_cols(df: pd.DataFrame):
    return [c for c in df.columns if ("delta" in c.lower()) or ("vega" in c.lower())]

greeks_to_drop = sorted(set(find_greek_cols(train_df)) | set(find_greek_cols(test_df)))
if greeks_to_drop:
    print(f"Removing {len(greeks_to_drop)} Greek columns: {greeks_to_drop}")

# 3) Define feature columns = all non-target, non-Greeks
TARGET_COL = "price/k"
feature_cols = [c for c in train_df.columns if c not in greeks_to_drop and c != TARGET_COL]

# Sanity: ensure test has the same feature columns
missing_in_test = [c for c in feature_cols if c not in test_df.columns]
if missing_in_test:
    raise KeyError(f"Test set is missing expected feature columns: {missing_in_test}")

# >>> Print which columns ARE included
print(f"\nIncluded feature columns ({len(feature_cols)}):")
for col in feature_cols:
    print("  •", col)


# 4) Build X/y with the agreed feature set
X_full = train_df[feature_cols].values.astype(np.float32)
y_full = train_df[TARGET_COL].values.astype(np.float32)

X_test = test_df[feature_cols].values.astype(np.float32)
y_test = test_df[TARGET_COL].values.astype(np.float32)

# 5) Split → train/val
X_tr_np, X_val_np, y_tr_np, y_val_np = train_test_split(
    X_full, y_full, test_size=0.01, random_state=42
)

# 6) TensorDatasets
train_ds = TensorDataset(torch.from_numpy(X_tr_np),  torch.from_numpy(y_tr_np))
val_ds   = TensorDataset(torch.from_numpy(X_val_np), torch.from_numpy(y_val_np))
test_ds  = TensorDataset(torch.from_numpy(X_test),   torch.from_numpy(y_test))

print(f"\ntrain {len(train_ds):,} rows")
print(f"valid {len(val_ds):,} rows")
print(f" test {len(test_ds):,} rows")


Running on: cuda
Removing 12 Greek columns: ['delta_0', 'delta_1', 'delta_2', 'delta_aad_0', 'delta_aad_1', 'delta_aad_2', 'vega_0', 'vega_1', 'vega_2', 'vega_aad_0', 'vega_aad_1', 'vega_aad_2']

Included feature columns (11):
  • sigma_0
  • sigma_1
  • sigma_2
  • corr_0_1
  • corr_0_2
  • corr_1_2
  • r
  • T
  • S0_0/K
  • S0_1/K
  • S0_2/K

train 4,773,294 rows
valid 48,216 rows
 test 48,224 rows


In [None]:
# ---- imports (safe even if already imported in earlier cells) ----
import os, time, random
from datetime import datetime

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchmetrics import R2Score

# ---- helper for accurate timing on GPU ----
def _now_sync():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter()

# ---- model ----
class BasketNet(nn.Module):
    def __init__(self, in_dim: int, width: int, layers: int):
        super().__init__()
        blocks = [nn.Linear(in_dim, width), nn.ReLU()]
        for _ in range(layers - 1):
            blocks += [nn.Linear(width, width), nn.ReLU()]
        blocks.append(nn.Linear(width, 1))
        self.net = nn.Sequential(*blocks)
        # Xavier init
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x):
        return self.net(x)

# ---- training run (save FINAL only) ----
def run_experiment_updates_per_update(
    width: int = 300,
    layers: int = 4,
    batch_size: int = 50_000,
    n_updates: int = 100_000,
    lr: float = 1e-3,
    optimizer_name: str = "Adam",   # "Adam" | "SGD" | "LBFGS"
    seed: int = 42,
    save_model: bool = True,        # save final only
    save_dir: str = "checkpoints",
    log_every: int = 1
):
    """
    Train for exactly n_updates minibatches, time the run, and save ONLY the FINAL checkpoint.
    Returns a dict with timing and final metrics.
    """
    # 1) Seed + loaders
    torch.manual_seed(seed); np.random.seed(seed); random.seed(seed)
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True,  pin_memory=True)
    val_loader   = DataLoader(val_ds,   batch_size=batch_size, shuffle=False, pin_memory=True)
    test_loader  = DataLoader(test_ds,  batch_size=batch_size, shuffle=False, pin_memory=True)

    # 2) Model
    in_dim = train_ds[0][0].shape[0]
    model = BasketNet(in_dim, width, layers).to(DEVICE)
    if DEVICE.type == "cuda" and torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)

    # 3) Loss & optimizer
    criterion = nn.MSELoss()
    if optimizer_name == "Adam":
        optimizer = optim.Adam(model.parameters(), lr=lr)
    elif optimizer_name == "SGD":
        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    elif optimizer_name == "LBFGS":
        optimizer = optim.LBFGS(model.parameters(), lr=lr, max_iter=20)
    else:
        raise ValueError(f"Unknown optimizer: {optimizer_name}")

    # 4) Infinite train iterator
    def infinite_train_iter(loader):
        while True:
            for batch in loader:
                yield batch
    inf_train = infinite_train_iter(train_loader)

    # 5) Logging
    train_losses, valid_losses, steps_recorded = [], [], []
    print_every = max(1, n_updates // 10)

    # 6) Saving prep (FINAL only)
    os.makedirs(save_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    tag = f"w{width}_L{layers}_bs{batch_size}_upd{n_updates}_{optimizer_name}_lr{lr:g}_{timestamp}"
    final_path = os.path.join(save_dir, f"model_{tag}_final.pt")

    def _model_to_save(m):
        return m.module if isinstance(m, nn.DataParallel) else m

    def _save_final(val_mse_last):
        chk = {
            "model_state": _model_to_save(model).state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "config": {
                "in_dim": in_dim, "width": width, "layers": layers,
                "batch_size": batch_size, "n_updates": n_updates,
                "lr": lr, "optimizer_name": optimizer_name, "seed": seed
            },
            "val_mse": float(val_mse_last)
        }
        torch.save(chk, final_path)
        print(f"[SAVE:final] -> {final_path}  (val MSE={val_mse_last:.3e})")

    # 7) Train (timed, GPU-synced)
    t0 = _now_sync()
    model.train()
    for step in range(1, n_updates + 1):
        Xb, yb = next(inf_train)
        Xb, yb = Xb.to(DEVICE), yb.to(DEVICE).unsqueeze(1)

        if optimizer_name == "LBFGS":
            def closure():
                optimizer.zero_grad(set_to_none=True)
                out = model(Xb)
                loss = criterion(out, yb)
                loss.backward()
                return loss
            loss = optimizer.step(closure)
            loss_val = float(loss.item())
        else:
            optimizer.zero_grad(set_to_none=True)
            out = model(Xb)
            loss = criterion(out, yb)
            loss.backward()
            optimizer.step()
            loss_val = float(loss.item())

        # periodic validation
        if step == 1 or step == n_updates or (step % log_every == 0):
            model.eval()
            tot_val, cnt_val = 0.0, 0
            with torch.no_grad():
                for Xv, yv in val_loader:
                    Xv, yv = Xv.to(DEVICE), yv.to(DEVICE).unsqueeze(1)
                    ov = model(Xv)
                    tot_val += criterion(ov, yv).item() * Xv.size(0)
                    cnt_val += Xv.size(0)
            val_mse = tot_val / cnt_val

            steps_recorded.append(step)
            train_losses.append(loss_val)
            valid_losses.append(val_mse)

            if step == 1 or step == n_updates or (step % print_every == 0):
                print(f"[upd {step}/{n_updates}] train MSE={loss_val:.3e}  val MSE={val_mse:.3e}")

            model.train()

    train_time_sec = _now_sync() - t0
    updates_per_sec = n_updates / train_time_sec if train_time_sec > 0 else float("inf")
    samples_per_sec = (n_updates * batch_size) / train_time_sec if train_time_sec > 0 else float("inf")

    # 8) Plot curves (cum-min, same style as before)
    train_min = np.minimum.accumulate(train_losses)
    valid_min = np.minimum.accumulate(valid_losses)
    plt.figure(figsize=(7, 4))
    plt.plot(steps_recorded, np.log10(train_min), label="train (cum-min)")
    plt.plot(steps_recorded, np.log10(valid_min), label="valid (cum-min)")
    plt.xlabel("update step"); plt.ylabel("log10 MSE")
    plt.title(f"{layers}×{width} • bs={batch_size} • {optimizer_name}")
    plt.legend(); plt.tight_layout(); plt.show()

    # 9) Test (evaluates the last trained weights)
    model.eval()
    all_preds, all_truths = [], []
    with torch.no_grad():
        for Xb, yb in test_loader:
            pr = model(Xb.to(DEVICE)).cpu().squeeze(1)  # (B, 1) -> (B,); stays 1-D even for a batch of one
            all_preds.append(pr); all_truths.append(yb)
    preds, truths = torch.cat(all_preds), torch.cat(all_truths)
    test_mse = criterion(preds.unsqueeze(1), truths.unsqueeze(1)).item()
    test_r2  = R2Score()(preds, truths).cpu().item()

    print(f"\nTrain wall time: {train_time_sec:.2f}s | {updates_per_sec:.2f} upd/s | {samples_per_sec:.0f} samples/s")
    print(f"Test results (final weights) → MSE = {test_mse:.3e}   R² = {test_r2:.4f}")

    # 10) Save FINAL (only)
    if save_model:
        _save_final(valid_losses[-1])

    return {
        "train_time_sec": train_time_sec,
        "updates_per_sec": updates_per_sec,
        "samples_per_sec": samples_per_sec,
        "final_val_mse": valid_losses[-1],
        "test_mse": test_mse,
        "test_r2": test_r2,
        "final_path": final_path if save_model else None
    }

# ---- example call ----
res = run_experiment_updates_per_update(
    width=250,
    layers=5,
    batch_size=5_000,
    n_updates=100_000,
    lr=1e-3,
    optimizer_name="Adam",
    seed=42,
    save_model=True,        # FINAL only
    save_dir="checkpoints",
    log_every=100
)
print(res)


[upd 1/100000] train MSE=7.066e-02  val MSE=1.460e-01
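
The plot in step 8 of the training cell shows the running minimum of each loss series (`np.minimum.accumulate`), while `final_val_mse` and the saved checkpoint use the last recorded validation value. The two can differ whenever the loss rises after its best point. A minimal sketch of that effect, using made-up loss values and the standard-library equivalent of the NumPy call:

```python
from itertools import accumulate

# Hypothetical validation losses: improvement, then a rise at the end.
valid_losses = [0.50, 0.30, 0.40, 0.20, 0.60]

# Running minimum; same operation as np.minimum.accumulate(valid_losses).
valid_cum_min = list(accumulate(valid_losses, min))

print("raw     :", valid_losses)
print("cum-min :", valid_cum_min)
print("last raw value:", valid_losses[-1], "| last cum-min value:", valid_cum_min[-1])
```

In this toy series the cum-min curve ends at 0.20 although the final weights score 0.60, so the plotted curve is best read as "best loss seen so far" rather than the loss of the model that gets saved.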


### Out of Sample Test

In [None]:
# Cell — Predict price + AAD Greeks from the model and save

import torch, pandas as pd, numpy as np
from pathlib import Path
from torch import nn
from collections import OrderedDict

# ---------------- config ----------------
TEST_PARQUET   = "Test_Clean.parquet"      # or "Test_clean_5k.parquet"
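# NOTE: the training cell saves to checkpoints/model_<tag>_final.pt (the tag ends with a timestamp);
# make sure this path points at the file that was actually written.
MODEL_CKPT     = "model_w250_L5_bs5000_upd100000_Adam_lr0.001.pt"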
OUT_PARQUET    = "Test_ModelAAD_price+greeks.parquet"
TARGET_COL     = "price/k"
N_ASSETS       = 3
DEVICE         = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.set_default_dtype(torch.float32)

# ------------ helpers ------------
def drop_greek_cols(df: pd.DataFrame):
    """Remove any columns containing 'delta' or 'vega' (case-insensitive)."""
    bad = [c for c in df.columns if ("delta" in c.lower()) or ("vega" in c.lower())]
    return df.drop(columns=bad), bad

def load_features(df: pd.DataFrame):
    """Derive feature columns by removing Greeks + target."""
    df_no_greeks, dropped = drop_greek_cols(df)
    if dropped:
        print(f"Removed {len(dropped)} Greek columns: {sorted(dropped)}")
    feature_cols = [c for c in df_no_greeks.columns if c != TARGET_COL]
    print(f"Using {len(feature_cols)} feature columns:")
    for c in feature_cols: print("  •", c)
    return feature_cols

def ensure_spot_vol_indices(feature_cols):
    name_to_idx = {c: i for i, c in enumerate(feature_cols)}
    spot_names  = [f"S0_{i}/K" for i in range(N_ASSETS)]
    vol_names   = [f"sigma_{i}" for i in range(N_ASSETS)]
    missing = [n for n in (spot_names + vol_names) if n not in name_to_idx]
    if missing:
        raise KeyError(f"Required columns not found in features: {missing}")
    spot_idx = [name_to_idx[n] for n in spot_names]
    vol_idx  = [name_to_idx[n] for n in vol_names]
    return spot_idx, vol_idx

def flexible_load_state_dict(model: nn.Module, ckpt_path: str):
    """Load either a raw state_dict or a checkpoint dict with 'model_state'. Handles module. prefixes."""
    obj = torch.load(ckpt_path, map_location=DEVICE)
    state = obj.get("model_state", obj) if isinstance(obj, dict) else obj
    try:
        model.load_state_dict(state, strict=True)
        return
    except RuntimeError:
        # strip potential 'module.' prefixes (from DataParallel)
        fixed = OrderedDict((k[len("module."):] if k.startswith("module.") else k, v) for k, v in state.items())
        model.load_state_dict(fixed, strict=True)

# ------------ model ------------
class BasketNet(nn.Module):
    def __init__(self, d, w=250, L=5):
        super().__init__()
        layers = [nn.Linear(d,w), nn.ReLU()]
        for _ in range(L-1):
            layers += [nn.Linear(w,w), nn.ReLU()]
        layers.append(nn.Linear(w,1))
        self.net = nn.Sequential(*layers)
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
    def forward(self, x):
        return self.net(x).squeeze(-1)

# ------------ load data & features ------------
df = pd.read_parquet(TEST_PARQUET, engine="pyarrow")
feature_cols = load_features(df)
spot_idx, vol_idx = ensure_spot_vol_indices(feature_cols)

# ------------ build model & load weights ------------
model = BasketNet(d=len(feature_cols), w=250, L=5).to(DEVICE)
flexible_load_state_dict(model, MODEL_CKPT)
model.eval()

# ------------ predict in batches (price + AAD Greeks) ------------
BATCH = 100_000  # safe and fast; adjust if needed
n = len(df)
prices, deltas, vegas = [], [], []

for i in range(0, n, BATCH):
    block = df.iloc[i:i+BATCH][feature_cols].values.astype(np.float32)
    X = torch.from_numpy(block).to(DEVICE).requires_grad_(True)

    # forward price
    p = model(X)  # (B,)

    # AAD gradients wrt inputs
    g = torch.autograd.grad(outputs=p, inputs=X, grad_outputs=torch.ones_like(p))[0]  # (B, F)

    # slice Greeks of the normalized price: d(price/K)/d(S0_i/K) and d(price/K)/d(sigma_i)
    d_blk = g[:, spot_idx].detach().cpu().numpy()
    v_blk = g[:, vol_idx].detach().cpu().numpy()
    prices.append(p.detach().cpu().numpy())
    deltas.append(d_blk)
    vegas.append(v_blk)

# stack outputs
price_pred = np.concatenate(prices, axis=0)
delta_pred = np.vstack(deltas)
vega_pred  = np.vstack(vegas)

# ------------ assemble & save ------------
delta_cols = [f"delta_{i}" for i in range(N_ASSETS)]
vega_cols  = [f"vega_{i}"  for i in range(N_ASSETS)]
out = pd.DataFrame(
    np.column_stack([price_pred, delta_pred, vega_pred]),
    columns=[TARGET_COL] + delta_cols + vega_cols,
    index=df.index
)
out.to_parquet(OUT_PARQUET, engine="pyarrow", compression="zstd", index=True)
print(f"Wrote {OUT_PARQUET} with columns: {[TARGET_COL] + delta_cols + vega_cols}")
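
The prediction loop above calls `torch.autograd.grad` once per batch with `grad_outputs=torch.ones_like(p)`, which differentiates the sum of all outputs in the batch. Since each output depends only on its own input row, the gradient of that sum with respect to row k equals the gradient of output k alone, so one call yields per-row Greeks. The sketch below checks this with finite differences in plain Python; `toy_pricer` is a hypothetical stand-in for the network, not the trained model.

```python
import math

def toy_pricer(row):
    """Hypothetical stand-in for the network: one feature row -> one price."""
    x0, x1 = row
    return math.log1p(math.exp(0.8 * x0 - 0.3 * x1))  # softplus of a linear map

def batch_sum(rows):
    """Sum of outputs over the batch (what grad_outputs=ones differentiates)."""
    return sum(toy_pricer(r) for r in rows)

def fd_grad(f, rows, k, j, h=1e-6):
    """Central difference of f with respect to feature j of row k."""
    up = [list(r) for r in rows]
    dn = [list(r) for r in rows]
    up[k][j] += h
    dn[k][j] -= h
    return (f(up) - f(dn)) / (2.0 * h)

rows = [[1.0, 0.2], [0.5, 0.4], [1.5, 0.1]]
grads_from_sum, grads_per_row = [], []
for k in range(len(rows)):
    # gradient of the batch sum w.r.t. feature 0 of row k
    grads_from_sum.append(fd_grad(batch_sum, rows, k, 0))
    # gradient of output k alone w.r.t. the same entry
    grads_per_row.append(fd_grad(lambda rs, k=k: toy_pricer(rs[k]), rows, k, 0))

for k, (a, b) in enumerate(zip(grads_from_sum, grads_per_row)):
    print(f"row {k}: d(sum)/dx = {a:.6f}   d(out_k)/dx = {b:.6f}")
```

This argument relies on rows not interacting inside the model; a layer that mixes samples across the batch (for example batch normalization in training mode) would break it.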


# Dual Headed Neural Net 

## Explanation/ Reasoning

## Code

In [None]:
import os
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# 1) Device & seeds
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", DEVICE)
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# 2) Load dataset with price and Greek labels (the file name indicates AAD Greeks)
df = pd.read_parquet("Train_clean_5m_aad_greeks.parquet", engine="pyarrow")

# 3) Define target and feature columns
target_cols = ["price/k"] + [f"delta_{i}" for i in range(3)] + [f"vega_{i}" for i in range(3)]
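# Exclude every delta*/vega* column (e.g. *_aad variants) so no Greek label leaks into the inputs
feature_cols = [
    c for c in df.columns
    if c not in target_cols and "delta" not in c.lower() and "vega" not in c.lower()
]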

# 4) Extract NumPy arrays
X = df[feature_cols].values.astype(np.float32)
y = df[target_cols].values.astype(np.float32)

# 5) Train/val split (99/1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.01, random_state=42)

# 6) Separate price, delta, vega targets
price_tr, delta_tr, vega_tr = y_tr[:,0], y_tr[:,1:4], y_tr[:,4:7]
price_val, delta_val, vega_val = y_val[:,0], y_val[:,1:4], y_val[:,4:7]

# 7) Build PyTorch datasets
train_ds = TensorDataset(
    torch.from_numpy(X_tr),
    torch.from_numpy(price_tr),
    torch.from_numpy(delta_tr),
    torch.from_numpy(vega_tr)
)
val_ds = TensorDataset(
    torch.from_numpy(X_val),
    torch.from_numpy(price_val),
    torch.from_numpy(delta_val),
    torch.from_numpy(vega_val)
)
print(f"train {len(train_ds):,} rows")
print(f"valid {len(val_ds):,} rows")

# 8) Define single-output BasketNet model with Softplus activations
class BasketNet(nn.Module):
    def __init__(self, in_dim, width=300, layers=4):
        super().__init__()
        blocks = [nn.Linear(in_dim, width), nn.Softplus()]
        for _ in range(layers - 1):
            blocks += [nn.Linear(width, width), nn.Softplus()]
        blocks.append(nn.Linear(width, 1))
        self.net = nn.Sequential(*blocks)
        # Xavier init for all Linear layers
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x):
        # returns shape (batch,)
        return self.net(x).squeeze(1)

# 9) Feature indices for Greeks
delta_idx = [feature_cols.index(f"S0_{i}/K") for i in range(3)]
vega_idx  = [feature_cols.index(f"sigma_{i}") for i in range(3)]

# 10) Training loop with Sobolev (differential) loss
def run_differential(
    width=300,
    layers=4,
    batch_size=50_000,
    n_updates=100_000,
    lr=1e-3,
    λ=1.0,
    log_every=5_000
):
    torch.manual_seed(42)
    np.random.seed(42)
    random.seed(42)

    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, pin_memory=True)
    val_loader   = DataLoader(val_ds,   batch_size=batch_size, shuffle=False, pin_memory=True)

    model     = BasketNet(len(feature_cols), width, layers).to(DEVICE)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()

    train_losses, valid_losses, steps = [], [], []

    # infinite iterator for training
    def inf_iter(loader):
        while True:
            for batch in loader:
                yield batch
    train_iter = inf_iter(train_loader)

    model.train()
    for step in range(1, n_updates + 1):
        Xb, p_true, d_true, v_true = next(train_iter)
        Xb = Xb.to(DEVICE).requires_grad_(True)
        p_true, d_true, v_true = [t.to(DEVICE) for t in (p_true, d_true, v_true)]

        optimizer.zero_grad()

        # forward price
        p_pred = model(Xb)
        loss_p = criterion(p_pred, p_true)

        # input gradients via autograd; create_graph=True keeps them differentiable so loss_g can backpropagate
        grad = torch.autograd.grad(
            outputs=p_pred,
            inputs=Xb,
            grad_outputs=torch.ones_like(p_pred),
            create_graph=True
        )[0]
        # extract predictions
        d_pred = grad[:, delta_idx]
        v_pred = grad[:, vega_idx]
        loss_g = criterion(d_pred, d_true) + criterion(v_pred, v_true)

        # total loss
        loss = loss_p + λ * loss_g
        loss.backward()
        optimizer.step()

        # validation & logging
        if step == 1 or step % log_every == 0 or step == n_updates:
            model.eval()
            tot_val, cnt = 0.0, 0
            for Xv, pv, dv, vv in val_loader:
                Xv = Xv.to(DEVICE).requires_grad_(True)
                pv, dv, vv = [t.to(DEVICE) for t in (pv, dv, vv)]

                pp = model(Xv)
                lp = criterion(pp, pv)

                g = torch.autograd.grad(
                    outputs=pp,
                    inputs=Xv,
                    grad_outputs=torch.ones_like(pp),
                    create_graph=False
                )[0]
                lg = criterion(g[:, delta_idx], dv) + criterion(g[:, vega_idx], vv)

                tot_val += (lp + λ * lg).item() * Xv.size(0)
                cnt     += Xv.size(0)

            val_loss = tot_val / cnt
            train_losses.append(loss.item())
            valid_losses.append(val_loss)
            steps.append(step)
            print(f"[upd {step}/{n_updates}] train loss={loss.item():.3e}  val loss={val_loss:.3e}")
            model.train()

    # plot training curves
    plt.figure(figsize=(7,4))
    plt.plot(steps, np.log10(train_losses), label="train")
    plt.plot(steps, np.log10(valid_losses), label="valid")
    plt.xlabel("update step")
    plt.ylabel("log10 loss")
    plt.legend()
    plt.tight_layout()
    plt.show()

    return model

# 11) Run training
model = run_differential(
    width=250,
    layers=5,
    batch_size=5_000,
    n_updates=100_000,
    lr=1e-3,
    λ=1.0,
    log_every=5_000
)

# 12) Test evaluation
test_df = pd.read_parquet("Test_clean_5k_aad_greeks.parquet", engine="pyarrow")
X_test = torch.from_numpy(test_df[feature_cols].values.astype(np.float32)).to(DEVICE).requires_grad_(True)
y_test = test_df[target_cols].values.astype(np.float32)

model.eval()
# forward pass
p_pred_tensor = model(X_test)
# compute AAD Greeks
grad_test = torch.autograd.grad(
    outputs=p_pred_tensor,
    inputs=X_test,
    grad_outputs=torch.ones_like(p_pred_tensor)
)[0]

# convert to numpy
price_pred = p_pred_tensor.detach().cpu().numpy()
delta_pred = grad_test[:, delta_idx].detach().cpu().numpy()
vega_pred  = grad_test[:, vega_idx].detach().cpu().numpy()

# true targets
p_true = y_test[:, 0]
d_true = y_test[:, 1:4]
v_true = y_test[:, 4:7]

# 13) Metrics & plots
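def report_and_plot(name, pred, true):
    """Print MAE, RMSE and squared correlation for one target, then plot predicted vs true."""
    pred, true = np.ravel(pred), np.ravel(true)  # guard against (N, 1) vs (N,) broadcasting
    mae  = np.mean(np.abs(pred - true))
    rmse = np.sqrt(np.mean((pred - true)**2))
    # "R²" here is the squared Pearson correlation, which ignores a constant bias in pred
    r2   = np.corrcoef(pred, true)[0,1]**2
    print(f"{name:>8s}: MAE={mae:.3e}, RMSE={rmse:.3e}, R²={r2:.4f}")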
    plt.figure(figsize=(4,4))
    plt.scatter(true, pred, s=3, alpha=0.3)
    lo, hi = min(true.min(), pred.min()), max(true.max(), pred.max())
    plt.plot([lo,hi],[lo,hi],'k--')
    plt.title(f"{name}: MAE={mae:.3e}, RMSE={rmse:.3e}, R²={r2:.4f}")
    plt.xlabel("True")
    plt.ylabel("Predicted")
    plt.grid(ls=":")
    plt.tight_layout()
    plt.show()

print("\nTest set performance:")
report_and_plot("Price", price_pred, p_true)
for i in range(3):
    report_and_plot(f"Delta_{i}", delta_pred[:, i], d_true[:, i])
for i in range(3):
    report_and_plot(f"Vega_{i}", vega_pred[:, i], v_true[:, i])

# 14) Save NN predictions for Test and Train (5k) in comparison-ready format
pred_cols = ["price/k"] + [f"delta_{i}" for i in range(3)] + [f"vega_{i}" for i in range(3)]

def predict_df(df_like: pd.DataFrame, batch_size: int = 100_000) -> pd.DataFrame:
    """Return a DataFrame with columns ['price/k','delta_0..2','vega_0..2'] for rows in df_like."""
    model.eval()
    outs = []
    for i in range(0, len(df_like), batch_size):
        Xb_np = df_like.iloc[i:i+batch_size][feature_cols].values.astype(np.float32)
        # Inputs need requires_grad so the Greeks can be read off as input gradients
        Xb = torch.from_numpy(Xb_np).to(DEVICE).requires_grad_(True)

        pb = model(Xb)  # (B,)
        gb = torch.autograd.grad(outputs=pb, inputs=Xb, grad_outputs=torch.ones_like(pb))[0]  # (B, F)

        outs.append(np.column_stack([
            pb.detach().cpu().numpy(),
            gb[:, delta_idx].detach().cpu().numpy(),
            gb[:, vega_idx].detach().cpu().numpy()
        ]))

    # Batches are processed in row order, so the original index can be reused directly
    return pd.DataFrame(np.vstack(outs), columns=pred_cols, index=df_like.index)

# Test predictions saved as Test_clean_5k_NN.parquet
test_pred_df = predict_df(test_df)
test_pred_df.to_parquet("Test_clean_5k_NN.parquet", engine="pyarrow", index=True)

# Train 5k sample predictions saved as Train_clean_5k_NN.parquet
train_5k = df.sample(n=5000, random_state=42)
train_pred_df = predict_df(train_5k)
train_pred_df.to_parquet("Train_clean_5k_NN.parquet", engine="pyarrow", index=True)

print("Saved: Test_clean_5k_NN.parquet  and  Train_clean_5k_NN.parquet")
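
The Greeks in the prediction code above are read off as the gradient of the network's price output with respect to its inputs. One way to build confidence in such gradients is to compare them with central finite differences. The block below is a minimal sketch under stated assumptions: it uses a hypothetical one-hidden-layer softplus network written in NumPy (not the trained PyTorch `model`), an assumed input size of 11, and illustrative names (`price`, `input_gradient`, `fd_gradient`) that do not appear in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: 11 inputs -> 16 softplus units -> scalar price.
n_in, n_hidden = 11, 16
W1 = rng.normal(scale=0.3, size=(n_hidden, n_in))
b1 = rng.normal(scale=0.1, size=n_hidden)
w2 = rng.normal(scale=0.3, size=n_hidden)
b2 = 0.05

def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))

def price(x):
    """Forward pass for a batch x of shape (B, n_in); returns shape (B,)."""
    return softplus(x @ W1.T + b1) @ w2 + b2

def input_gradient(x):
    """Analytic d price / d x, the quantity autograd would return; shape (B, n_in)."""
    z = x @ W1.T + b1
    sig = 1.0 / (1.0 + np.exp(-z))  # derivative of softplus
    return (sig * w2) @ W1

def fd_gradient(x, h=1e-5):
    """Central finite differences, one input column at a time."""
    g = np.empty_like(x)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[j] = h
        g[:, j] = (price(x + step) - price(x - step)) / (2.0 * h)
    return g

x = rng.uniform(0.1, 1.5, size=(8, n_in))
max_gap = np.max(np.abs(input_gradient(x) - fd_gradient(x)))
print(f"max |analytic - FD| = {max_gap:.2e}")
```

The same comparison could be run against the real network by bumping one feature column of `X_test` at a time, at the cost of two extra forward passes per feature.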


# Results

In [1]:
import pandas as pd
import numpy as np

# ——— load as before ———
aad_mc     = pd.read_parquet("Test_clean_5k_aad_greeks.parquet")
fd_mc      = pd.read_parquet("Test_clean_5k_fd_greeks.parquet")
model_aad  = pd.read_parquet("Test_clean_5k_Model+AAD_greeks.parquet")
model_nn   = pd.read_parquet("Test_clean_5k_NN.parquet")   # <-- added NN predictions

delta_cols = [f"delta_{i}" for i in range(3)]
vega_cols  = [f"vega_{i}"  for i in range(3)]

def enhanced_stats(stored: pd.DataFrame, model: pd.DataFrame, cols):
    diffs     = model[cols] - stored[cols]
    abs_diffs = diffs.abs()
    sq_diffs  = diffs ** 2

    stats = pd.DataFrame(index=cols)
    stats['count']     = diffs.count().values
    stats['mean_diff'] = diffs.mean().values
    stats['std_diff']  = diffs.std().values
    stats['min_diff']  = diffs.min().values
    stats['25%']       = diffs.quantile(0.25).values
    stats['50%']       = diffs.median().values
    stats['75%']       = diffs.quantile(0.75).values
    stats['max_diff']  = diffs.max().values

    # additional error metrics
    stats['MAE']  = abs_diffs.mean().values
    stats['MSE']  = sq_diffs.mean().values
    stats['RMSE'] = np.sqrt(stats['MSE'])

    # R^2 = (Pearson r)^2
    r2_list = []
    for col in cols:
        r = stored[col].corr(model[col])
        r2_list.append(r**2)
    stats['R2'] = r2_list

    return stats

# ——— compute vs Model+AAD (existing) ———
delta_aad_vs_mc_aad = enhanced_stats(aad_mc,    model_aad, delta_cols)
delta_aad_vs_mc_fd  = enhanced_stats(fd_mc,     model_aad, delta_cols)
vega_aad_vs_mc_aad  = enhanced_stats(aad_mc,    model_aad, vega_cols)
vega_aad_vs_mc_fd   = enhanced_stats(fd_mc,     model_aad, vega_cols)

# ——— compute vs Model NN (added) ———
delta_nn_vs_mc_aad = enhanced_stats(aad_mc,    model_nn, delta_cols)
delta_nn_vs_mc_fd  = enhanced_stats(fd_mc,     model_nn, delta_cols)
vega_nn_vs_mc_aad  = enhanced_stats(aad_mc,    model_nn, vega_cols)
vega_nn_vs_mc_fd   = enhanced_stats(fd_mc,     model_nn, vega_cols)

# ——— display ———
print("=== Delta: Model AAD vs MC AAD ===")
display(delta_aad_vs_mc_aad)

print("\n=== Delta: Model AAD vs MC FD ===")
display(delta_aad_vs_mc_fd)

print("\n=== Vega: Model AAD vs MC AAD ===")
display(vega_aad_vs_mc_aad)

print("\n=== Vega: Model AAD vs MC FD ===")
display(vega_aad_vs_mc_fd)

print("\n=== Delta: Model NN vs MC AAD ===")
display(delta_nn_vs_mc_aad)

print("\n=== Delta: Model NN vs MC FD ===")
display(delta_nn_vs_mc_fd)

print("\n=== Vega: Model NN vs MC AAD ===")
display(vega_nn_vs_mc_aad)

print("\n=== Vega: Model NN vs MC FD ===")
display(vega_nn_vs_mc_fd)


=== Delta: Model AAD vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | -0.000707 | 0.014806 | -0.15729 | -0.005265 | -0.000404 | 0.004152 | 0.110328 | 0.008789 | 0.00022 | 0.014821 | 0.993985 |
| delta_1 | 4806 | 0.000362 | 0.016276 | -0.152547 | -0.004731 | 4.4e-05 | 0.004693 | 0.160013 | 0.009169 | 0.000265 | 0.016278 | 0.993185 |
| delta_2 | 4806 | -0.000707 | 0.014941 | -0.124199 | -0.005217 | -0.000312 | 0.003866 | 0.183293 | 0.008755 | 0.000224 | 0.014956 | 0.993885 |



=== Delta: Model AAD vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | -0.000707 | 0.014806 | -0.15729 | -0.005265 | -0.000404 | 0.004151 | 0.110327 | 0.008789 | 0.00022 | 0.014821 | 0.993985 |
| delta_1 | 4806 | 0.000362 | 0.016276 | -0.152546 | -0.004731 | 4.4e-05 | 0.004693 | 0.16001 | 0.009169 | 0.000265 | 0.016278 | 0.993185 |
| delta_2 | 4806 | -0.000707 | 0.014941 | -0.124199 | -0.005217 | -0.000312 | 0.003866 | 0.183294 | 0.008755 | 0.000224 | 0.014956 | 0.993885 |



=== Vega: Model AAD vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 0.000123 | 0.023004 | -0.176788 | -0.009497 | 0.000137 | 0.008973 | 0.197933 | 0.014824 | 0.000529 | 0.023002 | 0.985453 |
| vega_1 | 4806 | 0.000515 | 0.024158 | -0.311557 | -0.008432 | 0.000171 | 0.009013 | 0.248155 | 0.014766 | 0.000584 | 0.024161 | 0.982299 |
| vega_2 | 4806 | -0.000697 | 0.024758 | -0.297383 | -0.009155 | -0.000226 | 0.008687 | 0.199936 | 0.014985 | 0.000613 | 0.024765 | 0.982617 |



=== Vega: Model AAD vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 0.000123 | 0.023004 | -0.176788 | -0.009497 | 0.000138 | 0.008973 | 0.197929 | 0.014824 | 0.000529 | 0.023002 | 0.985453 |
| vega_1 | 4806 | 0.000515 | 0.024158 | -0.311557 | -0.008432 | 0.000171 | 0.009013 | 0.248155 | 0.014766 | 0.000584 | 0.024161 | 0.982299 |
| vega_2 | 4806 | -0.000697 | 0.024758 | -0.297384 | -0.009155 | -0.000226 | 0.008687 | 0.199935 | 0.014985 | 0.000613 | 0.024765 | 0.982617 |



=== Delta: Model NN vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | 0.000782 | 0.004059 | -0.0585 | -0.000192 | 0.00048 | 0.001433 | 0.108188 | 0.001932 | 1.7e-05 | 0.004133 | 0.999558 |
| delta_1 | 4806 | -0.000553 | 0.005777 | -0.113954 | -0.001832 | -0.000571 | 0.00026 | 0.109329 | 0.002547 | 3.4e-05 | 0.005803 | 0.99914 |
| delta_2 | 4806 | 0.000767 | 0.003752 | -0.045523 | -0.000206 | 0.000585 | 0.001585 | 0.061217 | 0.001995 | 1.5e-05 | 0.003829 | 0.999615 |



=== Delta: Model NN vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | 0.000782 | 0.004059 | -0.0585 | -0.000192 | 0.00048 | 0.001433 | 0.108188 | 0.001932 | 1.7e-05 | 0.004133 | 0.999558 |
| delta_1 | 4806 | -0.000553 | 0.005777 | -0.113954 | -0.001832 | -0.000571 | 0.00026 | 0.109329 | 0.002547 | 3.4e-05 | 0.005803 | 0.99914 |
| delta_2 | 4806 | 0.000767 | 0.003752 | -0.045524 | -0.000205 | 0.000584 | 0.001585 | 0.061217 | 0.001995 | 1.5e-05 | 0.003829 | 0.999615 |



=== Vega: Model NN vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 0.001093 | 0.00536 | -0.079406 | -0.000852 | 0.000774 | 0.002787 | 0.115532 | 0.003029 | 3e-05 | 0.00547 | 0.99921 |
| vega_1 | 4806 | 0.000312 | 0.006909 | -0.20994 | -0.001675 | -0.000168 | 0.001805 | 0.16233 | 0.003273 | 4.8e-05 | 0.006915 | 0.998535 |
| vega_2 | 4806 | -0.000766 | 0.006298 | -0.14359 | -0.002406 | -0.000534 | 0.001242 | 0.07115 | 0.003396 | 4e-05 | 0.006344 | 0.998876 |



=== Vega: Model NN vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 0.001093 | 0.00536 | -0.079406 | -0.000852 | 0.000773 | 0.002787 | 0.115532 | 0.003029 | 3e-05 | 0.00547 | 0.99921 |
| vega_1 | 4806 | 0.000312 | 0.006909 | -0.209942 | -0.001675 | -0.000168 | 0.001805 | 0.16233 | 0.003273 | 4.8e-05 | 0.006915 | 0.998535 |
| vega_2 | 4806 | -0.000766 | 0.006298 | -0.143589 | -0.002406 | -0.000534 | 0.001242 | 0.07115 | 0.003396 | 4e-05 | 0.006344 | 0.998876 |
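
The comparisons above subtract one DataFrame from another, and pandas aligns such arithmetic on index labels rather than on row position. Since `count` only tallies non-NaN differences, a mismatch between the saved indices would show up as a lower count rather than as an error. The following is a small sketch of that behaviour on toy data; `check_same_rows` is a hypothetical guard, not a function from the notebook.

```python
import pandas as pd

stored = pd.DataFrame({"delta_0": [0.10, 0.20, 0.30]}, index=[10, 11, 12])
# Same values per label, but the rows are stored in a different order
model = pd.DataFrame({"delta_0": [0.30, 0.10, 0.20]}, index=[12, 10, 11])

# Series arithmetic matches rows by index label
diff_by_label = model["delta_0"] - stored["delta_0"]
# Raw arrays match rows by position, which is wrong for this data
diff_by_position = model["delta_0"].to_numpy() - stored["delta_0"].to_numpy()

print(diff_by_label.abs().max())     # labels line up, so the differences vanish
print(abs(diff_by_position).max())   # positions do not line up, so this is nonzero

def check_same_rows(a: pd.DataFrame, b: pd.DataFrame) -> None:
    """Hypothetical guard: fail loudly if the two frames cover different rows."""
    unshared = a.index.symmetric_difference(b.index)
    if len(unshared) > 0:
        raise ValueError(f"{len(unshared)} index labels are not shared")

check_same_rows(stored, model)  # passes: both frames carry labels 10, 11, 12
```

Calling such a guard before `enhanced_stats` would turn a silent drop in `count` into an explicit failure.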


In [1]:
import pandas as pd
import numpy as np

# ——— load as before ———
aad_mc     = pd.read_parquet("Test_clean_5k_aad_greeks.parquet")
fd_mc      = pd.read_parquet("Test_clean_5k_fd_greeks.parquet")
model_aad  = pd.read_parquet("Test_clean_5k_Model+AAD_greeks.parquet")
model_nn   = pd.read_parquet("Test_clean_5k_NN.parquet")   # <-- added NN predictions

delta_cols = [f"delta_{i}" for i in range(3)]
vega_cols  = [f"vega_{i}"  for i in range(3)]

# --- helpers ---
def find_price_col(df: pd.DataFrame) -> str:
    candidates = ["price/k", "price_norm", "price", "option_price", "y_price"]
    for c in candidates:
        if c in df.columns:
            return c
    raise KeyError("No price column found. Tried: " + ", ".join(candidates))

PRICE_COL_AAD  = find_price_col(aad_mc)
PRICE_COL_FD   = find_price_col(fd_mc)
PRICE_COL_MAAD = find_price_col(model_aad)
PRICE_COL_MNN  = find_price_col(model_nn)

def enhanced_stats(stored: pd.DataFrame, model: pd.DataFrame, cols):
    """General stats for vectors (Greeks or price)."""
    diffs     = model[cols] - stored[cols]
    abs_diffs = diffs.abs()
    sq_diffs  = diffs ** 2

    stats = pd.DataFrame(index=cols)
    stats['count']     = diffs.count().values
    stats['mean_diff'] = diffs.mean().values
    stats['std_diff']  = diffs.std().values
    stats['min_diff']  = diffs.min().values
    stats['25%']       = diffs.quantile(0.25).values
    stats['50%']       = diffs.median().values
    stats['75%']       = diffs.quantile(0.75).values
    stats['max_diff']  = diffs.max().values

    # additional error metrics
    stats['MAE']  = abs_diffs.mean().values
    stats['MSE']  = sq_diffs.mean().values
    stats['RMSE'] = np.sqrt(stats['MSE'])

    # R^2 = (Pearson r)^2
    r2_list = []
    for col in cols:
        r = stored[col].corr(model[col])
        r2_list.append((0.0 if pd.isna(r) else r**2))
    stats['R2'] = r2_list

    return stats

def price_stats(stored: pd.Series, model: pd.Series) -> pd.DataFrame:
    """Price-focused stats including MAPE/sMAPE and bias."""
    diffs = model - stored
    abs_diffs = diffs.abs()
    sq_diffs = diffs**2

    # Avoid divide-by-zero in MAPE: exclude exact zeros (near-zero prices can still dominate the mean)
    nonzero = stored != 0
    mape = (abs_diffs[nonzero] / stored[nonzero].abs()).mean() if nonzero.any() else np.nan
    mpe  = (diffs[nonzero] / stored[nonzero]).mean() if nonzero.any() else np.nan
    # sMAPE: symmetric MAPE (robust when scale varies)
    denom = (stored.abs() + model.abs())
    smape = (2 * abs_diffs / denom.replace(0, np.nan)).mean()

    r = stored.corr(model)
    r2 = 0.0 if pd.isna(r) else r**2

    out = pd.DataFrame({
        "count":       [diffs.count()],
        "mean_diff":   [diffs.mean()],
        "std_diff":    [diffs.std()],
        "min_diff":    [diffs.min()],
        "25%":         [diffs.quantile(0.25)],
        "50%":         [diffs.median()],
        "75%":         [diffs.quantile(0.75)],
        "max_diff":    [diffs.max()],
        "MAE":         [abs_diffs.mean()],
        "MSE":         [sq_diffs.mean()],
        "RMSE":        [np.sqrt(sq_diffs.mean())],
        "MAPE":        [mape],
        "sMAPE":       [smape],
        "MPE_bias":    [mpe],
        "R2":          [r2],
    }, index=["price"])
    return out

# ——— Greeks vs Model+AAD (existing) ———
delta_aad_vs_mc_aad = enhanced_stats(aad_mc,    model_aad, delta_cols)
delta_aad_vs_mc_fd  = enhanced_stats(fd_mc,     model_aad, delta_cols)
vega_aad_vs_mc_aad  = enhanced_stats(aad_mc,    model_aad, vega_cols)
vega_aad_vs_mc_fd   = enhanced_stats(fd_mc,     model_aad, vega_cols)

# ——— Greeks vs Model NN (added) ———
delta_nn_vs_mc_aad = enhanced_stats(aad_mc,    model_nn, delta_cols)
delta_nn_vs_mc_fd  = enhanced_stats(fd_mc,     model_nn, delta_cols)
vega_nn_vs_mc_aad  = enhanced_stats(aad_mc,    model_nn, vega_cols)
vega_nn_vs_mc_fd   = enhanced_stats(fd_mc,     model_nn, vega_cols)

# ——— Price vs Model+AAD ———
price_aad_vs_mc_aad = price_stats(aad_mc[PRICE_COL_AAD], model_aad[PRICE_COL_MAAD])
price_aad_vs_mc_fd  = price_stats(fd_mc[PRICE_COL_FD],   model_aad[PRICE_COL_MAAD])

# ——— Price vs Model NN ———
price_nn_vs_mc_aad  = price_stats(aad_mc[PRICE_COL_AAD], model_nn[PRICE_COL_MNN])
price_nn_vs_mc_fd   = price_stats(fd_mc[PRICE_COL_FD],   model_nn[PRICE_COL_MNN])

# ——— display ———
print("=== Delta: Model AAD vs MC AAD ===")
display(delta_aad_vs_mc_aad)

print("\n=== Delta: Model AAD vs MC FD ===")
display(delta_aad_vs_mc_fd)

print("\n=== Vega: Model AAD vs MC AAD ===")
display(vega_aad_vs_mc_aad)

print("\n=== Vega: Model AAD vs MC FD ===")
display(vega_aad_vs_mc_fd)

print("\n=== Delta: Model NN vs MC AAD ===")
display(delta_nn_vs_mc_aad)

print("\n=== Delta: Model NN vs MC FD ===")
display(delta_nn_vs_mc_fd)

print("\n=== Vega: Model NN vs MC AAD ===")
display(vega_nn_vs_mc_aad)

print("\n=== Vega: Model NN vs MC FD ===")
display(vega_nn_vs_mc_fd)

print("\n=== PRICE: Model AAD vs MC AAD ===")
display(price_aad_vs_mc_aad)

print("\n=== PRICE: Model AAD vs MC FD ===")
display(price_aad_vs_mc_fd)

print("\n=== PRICE: Model NN vs MC AAD ===")
display(price_nn_vs_mc_aad)

print("\n=== PRICE: Model NN vs MC FD ===")
display(price_nn_vs_mc_fd)


KeyError: 'No price column found. Tried: price/k, price_norm, price, option_price, y_price'
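
The error above does not say which of the four frames lacked a price column, nor which columns it did have. A variant of the helper that reports both could look like the sketch below. This is an illustration only: `find_price_col_verbose`, `PRICE_CANDIDATES`, and the toy column lists are hypothetical, and the function takes any iterable of column names (for example `df.columns`) so that the example runs without the parquet files.

```python
PRICE_CANDIDATES = ["price/k", "price_norm", "price", "option_price", "y_price"]

def find_price_col_verbose(columns, frame_name):
    """Return the first candidate price column present in `columns`.

    `columns` is any iterable of column names (e.g. df.columns);
    `frame_name` is only used to make the error message actionable.
    """
    available = list(columns)
    for c in PRICE_CANDIDATES:
        if c in available:
            return c
    raise KeyError(
        f"No price column in {frame_name!r}. "
        f"Tried: {', '.join(PRICE_CANDIDATES)}. "
        f"Available: {available}"
    )

# Toy column lists (illustrative only, not the contents of the real files)
print(find_price_col_verbose(["delta_0", "vega_0", "price/k"], "with_price"))
try:
    find_price_col_verbose(["delta_0", "vega_0"], "greeks_only")
except KeyError as err:
    print(err)
```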

# Conclusion

In [3]:
import pandas as pd

# Load all the parquet files
aad_mc     = pd.read_parquet("Test_clean_5k_aad_greeks.parquet")
fd_mc      = pd.read_parquet("Test_clean_5k_fd_greeks.parquet")
model_aad  = pd.read_parquet("Test_clean_5k_Model+AAD_greeks.parquet")
model_nn   = pd.read_parquet("Test_clean_5k_NN.parquet")

# Print column names
print("=== Columns in aad_mc ===")
print(aad_mc.columns.tolist(), "\n")

print("=== Columns in fd_mc ===")
print(fd_mc.columns.tolist(), "\n")

print("=== Columns in model_aad ===")
print(model_aad.columns.tolist(), "\n")

print("=== Columns in model_nn ===")
print(model_nn.columns.tolist(), "\n")


=== Columns in aad_mc ===
['sigma_0', 'sigma_1', 'sigma_2', 'corr_0_1', 'corr_0_2', 'corr_1_2', 'r', 'T', 'S0_0/K', 'S0_1/K', 'S0_2/K', 'price/k', 'delta_0', 'delta_1', 'delta_2', 'vega_0', 'vega_1', 'vega_2'] 

=== Columns in fd_mc ===
['sigma_0', 'sigma_1', 'sigma_2', 'corr_0_1', 'corr_0_2', 'corr_1_2', 'r', 'T', 'S0_0/K', 'S0_1/K', 'S0_2/K', 'price/k', 'delta_0', 'delta_1', 'delta_2', 'vega_0', 'vega_1', 'vega_2'] 

=== Columns in model_aad ===
['delta_0', 'delta_1', 'delta_2', 'vega_0', 'vega_1', 'vega_2', 'price/k'] 

=== Columns in model_nn ===
['price/k', 'delta_0', 'delta_1', 'delta_2', 'vega_0', 'vega_1', 'vega_2'] 



In [5]:
import pandas as pd
import numpy as np

# ——— load as before ———
aad_mc     = pd.read_parquet("Test_clean_5k_aad_greeks.parquet")
fd_mc      = pd.read_parquet("Test_clean_5k_fd_greeks.parquet")
model_aad  = pd.read_parquet("Test_clean_5k_Model+AAD_greeks.parquet")
model_nn   = pd.read_parquet("Test_clean_5k_NN.parquet")   # <-- added NN predictions

delta_cols = [f"delta_{i}" for i in range(3)]
vega_cols  = [f"vega_{i}"  for i in range(3)]
price_col  = "price/k"   # all four files use this column name (see the column listing above)

# --- helpers ---
def enhanced_stats(stored: pd.DataFrame, model: pd.DataFrame, cols):
    diffs     = model[cols] - stored[cols]
    abs_diffs = diffs.abs()
    sq_diffs  = diffs ** 2

    stats = pd.DataFrame(index=cols)
    stats['count']     = diffs.count().values
    stats['mean_diff'] = diffs.mean().values
    stats['std_diff']  = diffs.std().values
    stats['min_diff']  = diffs.min().values
    stats['25%']       = diffs.quantile(0.25).values
    stats['50%']       = diffs.median().values
    stats['75%']       = diffs.quantile(0.75).values
    stats['max_diff']  = diffs.max().values

    # additional error metrics
    stats['MAE']  = abs_diffs.mean().values
    stats['MSE']  = sq_diffs.mean().values
    stats['RMSE'] = np.sqrt(stats['MSE'])

    # R^2 = (Pearson r)^2
    r2_list = []
    for col in cols:
        r = stored[col].corr(model[col])
        r2_list.append((0.0 if pd.isna(r) else r**2))
    stats['R2'] = r2_list

    return stats

def price_stats(stored: pd.Series, model: pd.Series) -> pd.DataFrame:
    diffs = model - stored
    abs_diffs = diffs.abs()
    sq_diffs = diffs**2

    # Avoid divide-by-zero in MAPE: exclude exact zeros (near-zero prices can still dominate the mean)
    nonzero = stored != 0
    mape = (abs_diffs[nonzero] / stored[nonzero].abs()).mean() if nonzero.any() else np.nan
    mpe  = (diffs[nonzero] / stored[nonzero]).mean() if nonzero.any() else np.nan
    # sMAPE
    denom = (stored.abs() + model.abs())
    smape = (2 * abs_diffs / denom.replace(0, np.nan)).mean()

    r = stored.corr(model)
    r2 = 0.0 if pd.isna(r) else r**2

    out = pd.DataFrame({
        "count":       [diffs.count()],
        "mean_diff":   [diffs.mean()],
        "std_diff":    [diffs.std()],
        "min_diff":    [diffs.min()],
        "25%":         [diffs.quantile(0.25)],
        "50%":         [diffs.median()],
        "75%":         [diffs.quantile(0.75)],
        "max_diff":    [diffs.max()],
        "MAE":         [abs_diffs.mean()],
        "MSE":         [sq_diffs.mean()],
        "RMSE":        [np.sqrt(sq_diffs.mean())],
        "MAPE":        [mape],
        "sMAPE":       [smape],
        "MPE_bias":    [mpe],
        "R2":          [r2],
    }, index=["price/k"])
    return out

# ——— Greeks vs Model+AAD ———
delta_aad_vs_mc_aad = enhanced_stats(aad_mc,    model_aad, delta_cols)
delta_aad_vs_mc_fd  = enhanced_stats(fd_mc,     model_aad, delta_cols)
vega_aad_vs_mc_aad  = enhanced_stats(aad_mc,    model_aad, vega_cols)
vega_aad_vs_mc_fd   = enhanced_stats(fd_mc,     model_aad, vega_cols)

# ——— Greeks vs Model NN ———
delta_nn_vs_mc_aad = enhanced_stats(aad_mc,    model_nn, delta_cols)
delta_nn_vs_mc_fd  = enhanced_stats(fd_mc,     model_nn, delta_cols)
vega_nn_vs_mc_aad  = enhanced_stats(aad_mc,    model_nn, vega_cols)
vega_nn_vs_mc_fd   = enhanced_stats(fd_mc,     model_nn, vega_cols)

# ——— Price vs Model+AAD ———
price_aad_vs_mc_aad = price_stats(aad_mc[price_col], model_aad[price_col])
price_aad_vs_mc_fd  = price_stats(fd_mc[price_col],   model_aad[price_col])

# ——— Price vs Model NN ———
price_nn_vs_mc_aad  = price_stats(aad_mc[price_col], model_nn[price_col])
price_nn_vs_mc_fd   = price_stats(fd_mc[price_col],   model_nn[price_col])

# ——— display ———
print("=== Delta: Model AAD vs MC AAD ===")
display(delta_aad_vs_mc_aad)

print("\n=== Delta: Model AAD vs MC FD ===")
display(delta_aad_vs_mc_fd)

print("\n=== Vega: Model AAD vs MC AAD ===")
display(vega_aad_vs_mc_aad)

print("\n=== Vega: Model AAD vs MC FD ===")
display(vega_aad_vs_mc_fd)

print("\n=== Delta: Model NN vs MC AAD ===")
display(delta_nn_vs_mc_aad)

print("\n=== Delta: Model NN vs MC FD ===")
display(delta_nn_vs_mc_fd)

print("\n=== Vega: Model NN vs MC AAD ===")
display(vega_nn_vs_mc_aad)

print("\n=== Vega: Model NN vs MC FD ===")
display(vega_nn_vs_mc_fd)

print("\n=== PRICE: Model AAD vs MC AAD ===")
display(price_aad_vs_mc_aad)

print("\n=== PRICE: Model AAD vs MC FD ===")
display(price_aad_vs_mc_fd)

print("\n=== PRICE: Model NN vs MC AAD ===")
display(price_nn_vs_mc_aad)

print("\n=== PRICE: Model NN vs MC FD ===")
display(price_nn_vs_mc_fd)


=== Delta: Model AAD vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | -0.000385 | 0.014825 | -0.155182 | -0.005033 | -0.000296 | 0.004435 | 0.111641 | 0.008801 | 0.00022 | 0.014829 | 0.993985 |
| delta_1 | 4806 | 0.000685 | 0.01634 | -0.152339 | -0.004484 | 0.000105 | 0.00489 | 0.161546 | 0.009199 | 0.000267 | 0.016353 | 0.993185 |
| delta_2 | 4806 | -0.00039 | 0.014958 | -0.122574 | -0.004973 | -0.000209 | 0.004159 | 0.184779 | 0.008755 | 0.000224 | 0.014962 | 0.993885 |



=== Delta: Model AAD vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | -0.000385 | 0.014825 | -0.155182 | -0.005033 | -0.000297 | 0.004435 | 0.11164 | 0.008801 | 0.00022 | 0.014829 | 0.993985 |
| delta_1 | 4806 | 0.000685 | 0.01634 | -0.152339 | -0.004483 | 0.000105 | 0.00489 | 0.161543 | 0.009199 | 0.000267 | 0.016353 | 0.993185 |
| delta_2 | 4806 | -0.00039 | 0.014958 | -0.122574 | -0.004973 | -0.000209 | 0.004159 | 0.18478 | 0.008755 | 0.000224 | 0.014962 | 0.993885 |



=== Vega: Model AAD vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 5.2e-05 | 0.023025 | -0.177189 | -0.009515 | 4.7e-05 | 0.008979 | 0.198514 | 0.014845 | 0.00053 | 0.023022 | 0.985453 |
| vega_1 | 4806 | 0.000466 | 0.024213 | -0.312705 | -0.008494 | 0.000133 | 0.00897 | 0.249046 | 0.014796 | 0.000586 | 0.024215 | 0.982299 |
| vega_2 | 4806 | -0.000751 | 0.024789 | -0.296425 | -0.009284 | -0.000253 | 0.008677 | 0.200021 | 0.015019 | 0.000615 | 0.024798 | 0.982617 |



=== Vega: Model AAD vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 5.2e-05 | 0.023025 | -0.177189 | -0.009515 | 4.7e-05 | 0.008979 | 0.19851 | 0.014845 | 0.00053 | 0.023022 | 0.985453 |
| vega_1 | 4806 | 0.000466 | 0.024213 | -0.312705 | -0.008493 | 0.000133 | 0.00897 | 0.249045 | 0.014796 | 0.000586 | 0.024215 | 0.982299 |
| vega_2 | 4806 | -0.00075 | 0.024789 | -0.296426 | -0.009284 | -0.000253 | 0.008677 | 0.200021 | 0.015019 | 0.000615 | 0.024798 | 0.982617 |



=== Delta: Model NN vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | 0.000782 | 0.004059 | -0.0585 | -0.000192 | 0.00048 | 0.001433 | 0.108188 | 0.001932 | 1.7e-05 | 0.004133 | 0.999558 |
| delta_1 | 4806 | -0.000553 | 0.005777 | -0.113954 | -0.001832 | -0.000571 | 0.00026 | 0.109329 | 0.002547 | 3.4e-05 | 0.005803 | 0.99914 |
| delta_2 | 4806 | 0.000767 | 0.003752 | -0.045523 | -0.000206 | 0.000585 | 0.001585 | 0.061217 | 0.001995 | 1.5e-05 | 0.003829 | 0.999615 |



=== Delta: Model NN vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| delta_0 | 4806 | 0.000782 | 0.004059 | -0.0585 | -0.000192 | 0.00048 | 0.001433 | 0.108188 | 0.001932 | 1.7e-05 | 0.004133 | 0.999558 |
| delta_1 | 4806 | -0.000553 | 0.005777 | -0.113954 | -0.001832 | -0.000571 | 0.00026 | 0.109329 | 0.002547 | 3.4e-05 | 0.005803 | 0.99914 |
| delta_2 | 4806 | 0.000767 | 0.003752 | -0.045524 | -0.000205 | 0.000584 | 0.001585 | 0.061217 | 0.001995 | 1.5e-05 | 0.003829 | 0.999615 |



=== Vega: Model NN vs MC AAD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 0.001093 | 0.00536 | -0.079406 | -0.000852 | 0.000774 | 0.002787 | 0.115532 | 0.003029 | 3e-05 | 0.00547 | 0.99921 |
| vega_1 | 4806 | 0.000312 | 0.006909 | -0.20994 | -0.001675 | -0.000168 | 0.001805 | 0.16233 | 0.003273 | 4.8e-05 | 0.006915 | 0.998535 |
| vega_2 | 4806 | -0.000766 | 0.006298 | -0.14359 | -0.002406 | -0.000534 | 0.001242 | 0.07115 | 0.003396 | 4e-05 | 0.006344 | 0.998876 |



=== Vega: Model NN vs MC FD ===


| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vega_0 | 4806 | 0.001093 | 0.00536 | -0.079406 | -0.000852 | 0.000773 | 0.002787 | 0.115532 | 0.003029 | 3e-05 | 0.00547 | 0.99921 |
| vega_1 | 4806 | 0.000312 | 0.006909 | -0.209942 | -0.001675 | -0.000168 | 0.001805 | 0.16233 | 0.003273 | 4.8e-05 | 0.006915 | 0.998535 |
| vega_2 | 4806 | -0.000766 | 0.006298 | -0.143589 | -0.002406 | -0.000534 | 0.001242 | 0.07115 | 0.003396 | 4e-05 | 0.006344 | 0.998876 |



=== PRICE: Model AAD vs MC AAD ===


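| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | MAPE | sMAPE | MPE_bias | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| price/k | 4806 | -2.115864e-09 | 0.002282 | -0.017784 | -0.001099 | 4.1e-05 | 0.001186 | 0.014469 | 0.001594 | 5e-06 | 0.002282 | 3.252345 | 0.06286 | -0.930328 | 0.999903 |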



=== PRICE: Model AAD vs MC FD ===


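| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | MAPE | sMAPE | MPE_bias | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| price/k | 4806 | -2.115864e-09 | 0.002282 | -0.017784 | -0.001099 | 4.1e-05 | 0.001186 | 0.014469 | 0.001594 | 5e-06 | 0.002282 | 3.252345 | 0.06286 | -0.930328 | 0.999903 |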
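
The price table reports a MAPE of 3.252345 next to an sMAPE of 0.06286 and an MAE of 0.001594. One plausible reading is that the mean relative error is dominated by scenarios whose stored price is close to zero, where a small absolute error becomes a very large ratio. The sketch below illustrates that mechanism on synthetic data only; the price distribution, the noise level, and the `floor` value in the hypothetical `floored_mape` helper are all assumptions, not properties of the test set.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic normalised prices: mostly moderate, plus a few nearly worthless options
true = np.concatenate([rng.uniform(0.05, 0.5, size=950),
                       rng.uniform(1e-6, 1e-4, size=50)])
# Same absolute noise level for every scenario
pred = true + rng.normal(scale=2e-3, size=true.size)

ape = np.abs(pred - true) / np.abs(true)   # absolute percentage error per scenario

mape = ape.mean()
median_ape = np.median(ape)

def floored_mape(pred, true, floor=1e-2):
    """MAPE with the denominator clipped from below (floor is an assumed choice)."""
    return np.mean(np.abs(pred - true) / np.maximum(np.abs(true), floor))

print(f"MAPE         = {mape:.3f}")
print(f"median APE   = {median_ape:.4f}")
print(f"floored MAPE = {floored_mape(pred, true):.4f}")
```

Reporting a median or floored relative error alongside MAE is one way to keep a handful of near-zero prices from dominating the summary.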



=== PRICE: Model NN vs MC AAD ===


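| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | MAPE | sMAPE | MPE_bias | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| price/k | 4806 | 0.004315 | 0.002367 | -0.009787 | 0.003036 | 0.004169 | 0.005366 | 0.028609 | 0.0044 | 2.4e-05 | 0.004921 | 7.744963 | 0.112067 | 7.744622 | 0.999901 |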



=== PRICE: Model NN vs MC FD ===


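| target | count | mean_diff | std_diff | min_diff | 25% | 50% | 75% | max_diff | MAE | MSE | RMSE | MAPE | sMAPE | MPE_bias | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| price/k | 4806 | 0.004315 | 0.002367 | -0.009787 | 0.003036 | 0.004169 | 0.005366 | 0.028609 | 0.0044 | 2.4e-05 | 0.004921 | 7.744963 | 0.112067 | 7.744622 | 0.999901 |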
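
The `R2` column in these tables is the squared Pearson correlation, which does not change when predictions are shifted by a constant. That may explain how the NN price can show a `mean_diff` of 0.004315 (larger than its `std_diff` of 0.002367) while still reporting an `R2` of 0.999901. The coefficient of determination, 1 - SS_res / SS_tot, does penalise such an offset. The sketch below contrasts the two on synthetic data; the offset of 0.05 and the helper names `pearson_r2` and `coef_determination` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

true = rng.uniform(0.0, 0.5, size=2000)
noise = rng.normal(scale=2e-3, size=true.size)
pred_unbiased = true + noise
pred_biased = true + noise + 0.05   # constant offset, assumed for illustration

def pearson_r2(pred, true):
    """Squared Pearson correlation, as used for the R2 column."""
    return np.corrcoef(pred, true)[0, 1] ** 2

def coef_determination(pred, true):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((true - pred) ** 2)
    ss_tot = np.sum((true - true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

for name, pred in [("unbiased", pred_unbiased), ("biased", pred_biased)]:
    print(f"{name:>8s}: r^2 = {pearson_r2(pred, true):.6f}, "
          f"R^2 = {coef_determination(pred, true):.6f}")
```

Reading `R2` together with `mean_diff` and `RMSE`, rather than on its own, avoids mistaking a well-correlated but offset model for an unbiased one.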
