# 3. Deep Hedging Model

In this notebook, we implement and train a neural network that learns to hedge a European call option dynamically using simulated market data.

We treat this as a reinforcement learning-style problem, where the model (policy) observes the market, makes hedging decisions at each time step, and aims to replicate the option payoff as closely as possible.

---

## 📘 3.1 Neural Network Architecture Setup

We begin by defining the architecture of the hedging model.

### Objective:

The neural network takes as input a **market state vector** at each time step and outputs a **hedging decision**: the amount of the underlying asset to hold (the hedge ratio).

This is a function:

$$
a_t = \pi_\theta(s_t)
$$

Where:
- $s_t \in \mathbb{R}^4$ is the state at time $t$ (from our feature vector)
- $\pi_\theta$ is the policy network (a neural net with parameters $\theta$)
- $a_t$ is the action: the number of units of the underlying asset to hold

---

### Input Features

Each state $s_t$ includes:

1. **Normalized Price**: $S_t / S_0$
2. **Time to Maturity**: $(T - t)/T$
3. **Log Return**: $\log(S_t / S_{t-1})$
4. **Simulated Volatility**: $\sigma_t$

These are taken from the training dataset constructed in Notebook 2.
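
As a rough illustration of how the four features listed above could be assembled from raw price paths, here is a minimal NumPy sketch. The function name `build_states`, the convention that the log return at the first step is zero, and the use of a step index for $t$ are assumptions made for this example; the actual construction lives in Notebook 2 and may differ.

```python
import numpy as np

def build_states(S, sigma):
    """Assemble a [N, T, 4] state tensor from price paths (illustrative only).

    S     : [N, T] simulated prices, with S[:, 0] playing the role of S_0
    sigma : [N, T] simulated volatilities
    """
    S = np.asarray(S, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    N, T = S.shape

    norm_price = S / S[:, :1]                         # S_t / S_0
    steps = np.arange(T)
    ttm = np.broadcast_to((T - steps) / T, (N, T))    # (T - t) / T, t as step index
    log_ret = np.zeros_like(S)                        # assumption: 0 at the first step
    log_ret[:, 1:] = np.log(S[:, 1:] / S[:, :-1])     # log(S_t / S_{t-1})

    return np.stack([norm_price, ttm, log_ret, sigma], axis=-1)

# Tiny example: one path with three time steps and constant volatility
S_demo = np.array([[100.0, 110.0, 99.0]])
sigma_demo = np.full_like(S_demo, 0.2)
states_demo = build_states(S_demo, sigma_demo)
print(states_demo.shape)
```

The last axis follows the feature order of the list above, so `states_demo[..., 0]` is the normalized price that the training cell later multiplies by `S0` to recover spot prices.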

---

### Neural Network Architecture

We use a simple feedforward (fully connected) neural network applied independently at each time step:

- **Input layer**: 4 features
- **Hidden layers**: 2 to 3 layers with ReLU activation
- **Output layer**: 1 value per time step (hedge ratio)
- **Output activation**: Linear (no constraint on hedge ratio)

This network learns a function from observed market conditions to trading decisions.
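
The feedforward design described above might look roughly like the following PyTorch sketch. The class name `HedgingMLP`, the hidden width of 32, and the choice of two hidden layers are assumptions for illustration and are not taken from this notebook; the training cell further down defines a different model, `HedgingGRU`.

```python
import torch
from torch import nn

class HedgingMLP(nn.Module):
    """Feedforward policy applied independently at each time step (illustrative)."""

    def __init__(self, input_dim=4, hidden_dim=32, n_hidden=2):
        super().__init__()
        layers = []
        d = input_dim
        for _ in range(n_hidden):
            layers += [nn.Linear(d, hidden_dim), nn.ReLU()]
            d = hidden_dim
        layers.append(nn.Linear(d, 1))      # linear output: unconstrained hedge ratio
        self.net = nn.Sequential(*layers)

    def forward(self, x):                   # x: [N, T, 4]
        # nn.Linear acts on the last axis only, so every (path, step) pair
        # is mapped through the same weights without seeing other steps.
        return self.net(x).squeeze(-1)      # [N, T]

torch.manual_seed(0)
mlp = HedgingMLP()
x_demo = torch.randn(5, 10, 4)              # 5 paths, 10 time steps, 4 features
hedge_demo = mlp(x_demo)
print(hedge_demo.shape)
```

Because the layers only act on the feature axis, feeding a single time step in isolation gives the same hedge ratio as feeding the whole path, which is what "applied independently at each time step" means in practice.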

---

### Time Step Independence

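Note: In this design the network is applied **separately at each time step**, treating each $s_t$ as independent.  
This is sufficient for vanilla deep hedging and simplifies training compared to using an RNN or transformer. The implementation later in this notebook, however, uses a recurrent model (`HedgingGRU`), which carries a hidden state across time steps and is therefore not time-step independent.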

---

### Summary

We now define the neural network and prepare it to learn a mapping:

$$
s_t \mapsto a_t \quad \text{for each time step } t \in \{0, \ldots, T-1\}
$$

In the next section, we simulate the portfolio evolution from these decisions and define the training objective based on hedging performance.
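
The portfolio evolution mentioned above can be sketched with the same cash accounting that the training cell in this notebook uses: start with zero cash, pay for every change in the hedge position (plus proportional transaction costs), and mark the final position to market. The helper name `terminal_wealth` and the toy numbers are assumptions for this example.

```python
import numpy as np

def terminal_wealth(H, S, tc=0.0):
    """Terminal value of a self-financing hedge that starts with zero cash.

    H  : [N, T] hedge ratios (units of the underlying held from step t)
    S  : [N, T] spot prices
    tc : proportional transaction cost rate
    """
    H = np.asarray(H, dtype=float)
    S = np.asarray(S, dtype=float)
    dH = np.diff(H, axis=1, prepend=0.0)            # trades, starting from a flat position
    trade_vol = np.abs(dH) * S                      # traded notional per step
    cash = (-dH * S - tc * trade_vol).sum(axis=1)   # cumulative cash account
    return cash + H[:, -1] * S[:, -1]               # cash plus final stock position

H_demo = [[0.5, 0.7, 0.6]]
S_demo = [[100.0, 110.0, 105.0]]
v_free = terminal_wealth(H_demo, S_demo)            # about 1.5, up to rounding
v_cost = terminal_wealth(H_demo, S_demo, tc=0.01)   # about 0.675, up to rounding
print(v_free, v_cost)
```

Without transaction costs the result equals the familiar sum of trading gains, $\sum_t H_t (S_{t+1} - S_t) = 0.5 \cdot 10 + 0.7 \cdot (-5) = 1.5$. With `tc=0.01` the traded notional of $50 + 22 + 10.5 = 82.5$ costs $0.825$.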

In [1]:
# import sys, torch, platform, os, site, json, textwrap, subprocess, pathlib, gc
# import torch.nn as nn
# import torch.nn.functional as F
# from tqdm import trange
# import time
# from IPython.display import clear_output
# import math

In [2]:
# torch.cuda.empty_cache();  gc.collect()

20

In [1]:
import numpy as np

data = np.load("data/deep_hedging_data.npz")
X_train = data["X_train"]            # (8000, T, 4)
Y_train = data["Y_train"]            # (8000,)
X_test  = data["X_test"]             # (2000, T, 4)
Y_test  = data["Y_test"]             # (2000,)

In [4]:
import torch, math, time, os, numpy as np
from torch import nn
from IPython.display import clear_output

# ------------------------------------------------------------
# 0.  Load tensors (already have X_train, Y_train, X_test, Y_test in RAM)
#     If not, load them here with np.load(...)
# ------------------------------------------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

X_train = torch.tensor(X_train, dtype=torch.float32, device=device)
Y_train = torch.tensor(Y_train, dtype=torch.float32, device=device).squeeze(-1)
X_test  = torch.tensor(X_test , dtype=torch.float32, device=device)
Y_test  = torch.tensor(Y_test , dtype=torch.float32, device=device).squeeze(-1)

N_train, T, _ = X_train.shape        # 8000 × T × 4
N_test  = X_test.shape[0]

# quick imbalance print
zero_pct = (Y_train == 0).float().mean().item() * 100
print(f"{zero_pct:.1f}% of training pay-offs are exactly zero")

# ------------------------------------------------------------
# Build spot matrices from feature-0 (normalised price)
# ------------------------------------------------------------
S0   = 100.0                         # same S0 you used in Notebook 2
S_t  = X_train[:, :, 0] * S0         # [8000, T], already on the device
S_te = X_test  [:, :, 0] * S0        # [2000, T]

# ------------------------------------------------------------
# 1.  GRU hedge network
# ------------------------------------------------------------
class HedgingGRU(nn.Module):
    def __init__(self, input_dim=4, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)
    def forward(self, x):            # x: [N, T, 4]
        h, _ = self.gru(x)
        return self.out(h).squeeze(-1)   # [N, T]

model = HedgingGRU().to(device)

# ------------------------------------------------------------
# 2.  Optimiser & scheduler
# ------------------------------------------------------------
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
              optimizer, factor=0.5, patience=20)

n_epochs = 300
tc       = 0.0
alpha    = 0.9
logfile  = open("training_log.txt", "w", encoding="utf-8")

def progress_bar(i, total, bar_len=40):
    pct  = i / total
    fill = "█" * int(bar_len * pct)
    bar  = f"[{fill:<{bar_len}}] {pct:6.1%}  ({i}/{total})"
    clear_output(wait=True); print(bar)

# ------------------------------------------------------------
# 3.  Training loop
# ------------------------------------------------------------
for epoch in range(n_epochs):
    optimizer.zero_grad()

    # hedge ratios
    H   = model(X_train)                                # [N, T]
    dH  = torch.diff(H, 1, prepend=torch.zeros_like(H[:, :1]))

    trade_vol = dH.abs() * S_t
    cash      = (-dH * S_t - tc * trade_vol).sum(1)
    V_T       = cash + H[:, -1] * S_t[:, -1]            # [N]

    # --- ITM-CVaR(α) + β·MSE (to penalise overshoot) ----------------
    itm   = Y_train > 0
    itm_err = (V_T - Y_train)[itm]        # P&L error on ITM paths

    beta = 0.10                           # 0.01-0.10 works; start with 0.05

    if itm_err.numel():
        # 1) CVaR part (same as before)
        short = torch.clamp(itm_err, max=0)               # only short-falls
        k     = max(int((1 - alpha) * itm_err.numel()), 1)
        VaR, _ = torch.kthvalue(short, k)
        cvar   = short[short <= VaR].mean()               # <= 0
        # 2) symmetric MSE part (penalise overshoot too)
        mse_itm = (itm_err**2).mean()
        # 3) combined objective
        loss = -cvar + beta * mse_itm                     # minimise both
    else:
        loss = H.sum() * 0.0                              # no ITM paths: zero loss that still supports backward()


    loss.backward()
    optimizer.step()
    scheduler.step(loss)
    
    if epoch % 10 == 0:
        msg = (f"epoch {epoch:4d}  loss {loss.item():10.4f}  "
               f"lr {optimizer.param_groups[0]['lr']:.1e}")
        logfile.write(msg + "\n"); logfile.flush()
        print(msg)
    progress_bar(epoch + 1, n_epochs)

logfile.close()

# ------------------------------------------------------------
# 4.  TEST EVALUATION & SAVE
# ------------------------------------------------------------
start = time.time()
with torch.no_grad():
    H_te   = model(X_test)                               # [2000, T]
    dH_te  = torch.diff(H_te, 1, prepend=torch.zeros_like(H_te[:, :1]))
    trade_vol = dH_te.abs() * S_te
    cash   = (-dH_te * S_te - tc * trade_vol).sum(1)
    V_T_te = cash + H_te[:, -1] * S_te[:, -1]            # [2000]

os.makedirs("results", exist_ok=True)
np.savez_compressed(
    "results/hedging_eval_test.npz",
    V_T = V_T_te.cpu().numpy().astype(np.float32),
    Z_T = Y_test.cpu().numpy().astype(np.float32)
)
print(f"\nSaved results to results/hedging_eval_test.npz  "
      f"(shape: {V_T_te.shape})  in {time.time()-start:.1f}s")


[████████████████████████████████████████] 100.0%  (300/300)

Saved results to results/hedging_eval_test.npz  (shape: torch.Size([2000]))  in 0.2s
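
To make the training objective in the cell above easier to check by hand, here is a small NumPy restatement of the same steps: restrict to in-the-money paths, take the mean of the worst shortfalls (the CVaR-style term), and add a β-weighted mean squared error. The function name `itm_cvar_mse_loss` and the toy numbers are assumptions for this example, and it uses `alpha=0.5` only so that the tail contains several points; the notebook itself uses `alpha = 0.9`.

```python
import numpy as np

def itm_cvar_mse_loss(V_T, payoff, alpha=0.9, beta=0.10):
    """NumPy restatement of the ITM-CVaR + beta * MSE objective (illustrative)."""
    V_T = np.asarray(V_T, dtype=float)
    payoff = np.asarray(payoff, dtype=float)
    err = (V_T - payoff)[payoff > 0]           # hedging error on ITM paths only
    if err.size == 0:
        return 0.0
    short = np.minimum(err, 0.0)               # keep shortfalls, zero out overshoots
    k = max(int((1 - alpha) * err.size), 1)    # number of tail observations
    var = np.sort(short)[k - 1]                # k-th smallest shortfall
    cvar = short[short <= var].mean()          # mean of the tail, always <= 0
    mse = (err ** 2).mean()                    # symmetric penalty, includes overshoot
    return float(-cvar + beta * mse)

errors = np.array([-5.0, -3.0, -1.0, 0.5, 2.0, -0.5, 1.0, 3.0, -2.0, 4.0])
payoff_demo = np.append(np.full(10, 10.0), 0.0)   # last path is out of the money
V_T_demo = np.append(10.0 + errors, 7.0)          # its value is ignored by the loss
loss_demo = itm_cvar_mse_loss(V_T_demo, payoff_demo, alpha=0.5, beta=0.10)
print(round(loss_demo, 3))
```

In this toy case the five worst shortfalls are −5, −3, −2, −1 and −0.5 (mean −2.3) and the mean squared error is 6.95, so the loss is 2.3 + 0.1 · 6.95 = 2.995.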


In [5]:
os.makedirs("models", exist_ok=True)
torch.save(model.state_dict(), "models/gru_beta010_final.pt")
print("✓  Saved weights to  models/gru_beta010_final.pt")

✓  Saved weights to  models/gru_beta010_final.pt
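
One possible way to inspect the evaluation file written earlier (`results/hedging_eval_test.npz`, with keys `V_T` and `Z_T`) is sketched below. The helper `summarize_hedge` and the three statistics it reports are assumptions for illustration; so that the snippet runs on its own, it writes a tiny stand-in file with made-up numbers to a temporary directory rather than reading the real results.

```python
import os
import tempfile
import numpy as np

def summarize_hedge(path):
    """Basic hedging-error statistics from a saved .npz file (illustrative)."""
    with np.load(path) as res:
        err = res["V_T"].astype(float) - res["Z_T"].astype(float)
    return {
        "mean_error": float(err.mean()),                # average over/under-hedge
        "rmse": float(np.sqrt((err ** 2).mean())),      # overall replication error
        "worst_shortfall": float(err.min()),            # most negative error
    }

# Stand-in data (not real results), saved with the same keys as the notebook
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "hedging_eval_test.npz")
    np.savez_compressed(
        demo_path,
        V_T=np.array([1.0, 2.0, 0.0], dtype=np.float32),
        Z_T=np.array([1.5, 1.0, 0.0], dtype=np.float32),
    )
    stats = summarize_hedge(demo_path)

print(stats)
```

To use it on the actual run, call `summarize_hedge("results/hedging_eval_test.npz")` after the evaluation cell has finished.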
