# 3. Deep Hedging Model

In this notebook, we implement and train a neural network that learns to hedge a European call option dynamically using simulated market data.

We treat this as a reinforcement learning-style problem, where the model (policy) observes the market, makes hedging decisions at each time step, and aims to replicate the option payoff as closely as possible.

---

## 📘 3.1 Neural Network Architecture Setup

We begin by defining the architecture of the hedging model.

### Objective:

The neural network takes as input a **market state vector** at each time step and outputs a **hedging decision** — specifically, the amount of the underlying asset to hold (hedge ratio).

This is a function:

$$
a_t = \pi_\theta(s_t)
$$

Where:
- $s_t \in \mathbb{R}^4$ is the state at time $t$ (from our feature vector)
- $\pi_\theta$ is the policy network (a neural net with parameters $\theta$)
- $a_t$ is the action: the number of units of the underlying asset to hold

---

### Input Features

Each state $s_t$ includes:

1. **Normalized Price**: $S_t / S_0$
2. **Time to Maturity**: $(T - t)/T$
3. **Log Return**: $\log(S_t / S_{t-1})$
4. **Simulated Volatility**: $\sigma_t$

These are taken from the training dataset constructed in Notebook 2.
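
As an illustration of how the four features fit together, here is a minimal NumPy sketch that builds the state matrix for one path. It is not the code from Notebook 2: the helper name `build_states`, the `vols` input, and the choice to set the log return to 0 at $t = 0$ (where $S_{t-1}$ does not exist) are assumptions made for this example.

```python
import numpy as np

def build_states(prices, vols):
    """Build one 4-feature state per hedging step (hypothetical helper).

    prices: shape (T + 1,), the path S_0 ... S_T
    vols:   shape (T + 1,), the volatility sigma_0 ... sigma_T
    Returns an array of shape (T, 4), one row per decision time t = 0 ... T-1.
    """
    prices = np.asarray(prices, dtype=np.float64)
    vols = np.asarray(vols, dtype=np.float64)
    T = len(prices) - 1
    t = np.arange(T)

    norm_price = prices[:T] / prices[0]          # S_t / S_0
    time_to_maturity = (T - t) / T               # (T - t) / T
    log_ret = np.zeros(T)                        # assumption: 0 at t = 0
    log_ret[1:] = np.log(prices[1:T] / prices[:T - 1])  # log(S_t / S_{t-1})

    return np.stack([norm_price, time_to_maturity, log_ret, vols[:T]], axis=1)

prices = np.array([100.0, 101.0, 99.0, 102.0])
vols = np.full(4, 0.2)
states = build_states(prices, vols)
print(states.shape)  # (3, 4)
```

Stacking such matrices over all simulated paths would give an array of shape `(n_paths, T, 4)`, which is the layout the training loop in this notebook indexes as `X_train_tensor[i, t]`.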

---

### Neural Network Architecture

We use a simple feedforward (fully connected) neural network applied independently at each time step:

- **Input layer**: 4 features
- **Hidden layers**: 2 layers (64 and 32 units) with ReLU activation
- **Output layer**: 1 value per time step (hedge ratio)
- **Output activation**: Linear (no constraint on hedge ratio)

This network learns a function from observed market conditions to trading decisions.

---

### Time Step Independence

Note: We apply the same network (shared weights) **separately at each time step**, so the decision at time $t$ depends only on $s_t$, not on earlier states or on the previous hedge.  
This is sufficient for vanilla deep hedging and simplifies training compared to using an RNN or transformer.
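
To make the point about time step independence concrete, the sketch below checks that feeding a whole `(paths, time steps, features)` tensor through a feedforward network gives the same result as calling it once per time step. The `nn.Sequential` stand-in only copies the layer sizes used by `HedgingNN` in this notebook, and the input is random placeholder data, not the training set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in policy with the layer sizes 4 -> 64 -> 32 -> 1 (illustrative only).
policy = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Random placeholder states: 5 paths, 30 time steps, 4 features.
states = torch.randn(5, 30, 4)

with torch.no_grad():
    # nn.Linear acts on the last dimension, so all time steps go through at once.
    all_at_once = policy(states)                       # shape (5, 30, 1)
    # The same computation, one time step at a time.
    step_by_step = torch.stack(
        [policy(states[:, t]) for t in range(states.shape[1])], dim=1
    )

print(all_at_once.shape)
print(torch.allclose(all_at_once, step_by_step, atol=1e-5))
```

Because no information flows between time steps inside the network, the two results agree up to floating-point error. Any path dependence has to come from the features themselves.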

---

### Summary

We now define the neural network and prepare it to learn a mapping:

$$
s_t \mapsto a_t \quad \text{for each time step } t \in \{0, \ldots, T-1\}
$$

In the next section, we simulate the portfolio evolution from these decisions and define the training objective based on hedging performance.
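
For orientation, the portfolio bookkeeping used by the training cell in this notebook can be written out as follows. The symbols $C_t$ (cash account), $c$ (proportional transaction cost rate, `transaction_cost_rate` in the code), and $\mathcal{L}$ (loss) are notation introduced here for illustration. The sketch also assumes that $T$ counts the hedging steps and that the last column of `price_paths` is $S_T$.

$$
C_t = C_{t-1} - (a_t - a_{t-1})\, S_t - c\, |a_t - a_{t-1}|\, S_t,
\qquad C_{-1} = 0,\quad a_{-1} = 0
$$

$$
V_T = C_{T-1} + a_{T-1}\, S_T,
\qquad
\mathcal{L}(\theta) = \sum_{i} \left( V_T^{(i)} - Z_T^{(i)} \right)^2
$$

Here $Z_T^{(i)}$ is the option payoff on path $i$ (the targets in `Y_train`), and the sum runs over all training paths.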

In [1]:
import numpy as np

price_paths = np.load("price_paths.npy")
data = np.load("data/deep_hedging_data.npz")
X_train = data["X_train"]
X_test = data["X_test"]
Y_train = data["Y_train"]
Y_test = data["Y_test"]

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [3]:
# Convert the training and test sets to float32 tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
Y_train_tensor = torch.tensor(Y_train, dtype=torch.float32)

X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
Y_test_tensor = torch.tensor(Y_test, dtype=torch.float32)

In [4]:
class HedgingNN(nn.Module):   # Inherit from PyTorch's neural net base
    """Feedforward policy: 4 state features -> 1 hedge ratio."""

    def __init__(self):       # Runs once, when the model is created
        super().__init__()

        # Two hidden layers followed by a linear output layer
        self.fc1 = nn.Linear(4, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)

    def forward(self, x):     # Runs every time data is passed in
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)    # No output activation: hedge ratio is unconstrained

In [5]:
from tqdm import trange
import sys

In [None]:
# Model, optimizer, and hyperparameters
model = HedgingNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
n_epochs = 100
transaction_cost_rate = 0.001  # proportional cost per unit of traded value

# Convert prices once instead of building a new tensor at every step.
# Assumes row i of price_paths corresponds to row i of X_train.
price_tensor = torch.tensor(price_paths, dtype=torch.float32)

# Training loop: one full-batch gradient step per epoch
for epoch in trange(n_epochs, desc="Training Epochs"):
    total_loss = torch.tensor(0.0, dtype=torch.float32)

    for i in range(X_train_tensor.shape[0]):
        # Each path starts with no cash and no position
        cash = torch.tensor(0.0, dtype=torch.float32)
        prev_hedge = torch.tensor(0.0, dtype=torch.float32)

        for t in range(X_train_tensor.shape[1]):
            state = X_train_tensor[i, t]
            hedge_ratio = model(state)
            S_t = price_tensor[i, t]

            # Rebalance to the new hedge ratio and pay proportional costs
            delta_hedge = hedge_ratio - prev_hedge
            trade_volume = torch.abs(delta_hedge) * S_t
            transaction_cost = transaction_cost_rate * trade_volume

            cash = cash - delta_hedge * S_t - transaction_cost
            prev_hedge = hedge_ratio

        # Terminal portfolio value vs. option payoff
        S_T = price_tensor[i, -1]
        V_T = cash + prev_hedge * S_T
        Z_T = Y_train_tensor[i]

        loss_i = ((V_T - Z_T) ** 2).squeeze()
        total_loss = total_loss + loss_i

    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
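
The training cell above processes one path at a time, which is slow for thousands of paths. Since the network treats every path the same way, the same bookkeeping can be done for all paths at once. The following is a minimal sketch of that idea, not a drop-in replacement: the helper `terminal_wealth`, the synthetic driftless GBM data, the strike of 100, and the use of the mean rather than the sum of squared errors are all assumptions made for this example. It also assumes the price tensor has one more column than there are hedging steps, so that the last column is $S_T$.

```python
import torch
import torch.nn as nn

def terminal_wealth(policy, states, prices, cost_rate=0.001):
    """Terminal hedge portfolio value for all paths at once (hypothetical helper).

    states: (N, T, 4) features; prices: (N, T + 1) with prices[:, -1] = S_T.
    """
    n_paths, n_steps, _ = states.shape
    cash = torch.zeros(n_paths)
    prev_hedge = torch.zeros(n_paths)
    for t in range(n_steps):
        hedge = policy(states[:, t]).squeeze(-1)   # (N,) hedge ratios at time t
        trade = hedge - prev_hedge
        s_t = prices[:, t]
        # Pay for the trade plus proportional transaction costs
        cash = cash - trade * s_t - cost_rate * trade.abs() * s_t
        prev_hedge = hedge
    return cash + prev_hedge * prices[:, -1]

# Synthetic placeholder data (NOT the notebook's dataset): driftless GBM paths.
torch.manual_seed(0)
n_paths, n_steps, s0, strike, sigma = 256, 10, 100.0, 100.0, 0.2
dt = 1.0 / n_steps
log_ret = -0.5 * sigma**2 * dt + sigma * dt**0.5 * torch.randn(n_paths, n_steps)
log_paths = torch.cat([torch.zeros(n_paths, 1), log_ret.cumsum(dim=1)], dim=1)
prices = s0 * log_paths.exp()                      # (N, T + 1)

# Features in the order used in this notebook; log return is set to 0 at t = 0.
t_idx = torch.arange(n_steps, dtype=torch.float32)
prev_log_ret = torch.cat([torch.zeros(n_paths, 1), log_ret[:, :-1]], dim=1)
states = torch.stack([
    prices[:, :-1] / s0,
    ((n_steps - t_idx) / n_steps).expand(n_paths, -1),
    prev_log_ret,
    torch.full((n_paths, n_steps), sigma),
], dim=-1)                                         # (N, T, 4)
payoff = (prices[:, -1] - strike).clamp(min=0.0)   # European call payoff

policy = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

losses = []
for epoch in range(20):
    optimizer.zero_grad()
    wealth = terminal_wealth(policy, states, prices)
    loss = ((wealth - payoff) ** 2).mean()         # mean instead of sum (assumption)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.2f}, last loss: {losses[-1]:.2f}")
```

Each epoch now costs `n_steps` forward passes on a batch of `n_paths` rows instead of `n_paths * n_steps` single-row passes. If memory becomes a concern, the same helper can be called on mini-batches of paths.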
