# 3. Deep Hedging Model

In this notebook, we implement and train a neural network that learns to hedge a European call option dynamically using simulated market data.

We frame this as a reinforcement-learning-style problem: the model (policy) observes the market, makes a hedging decision at each time step, and aims to replicate the option payoff as closely as possible.

---

## 📘 3.1 Neural Network Architecture Setup

We begin by defining the architecture of the hedging model.

### Objective

At each time step, the neural network takes a **market state vector** as input and outputs a **hedging decision**: the number of units of the underlying asset to hold (the hedge ratio).

This is a function:

$$
a_t = \pi_\theta(s_t)
$$

Where:
- $s_t \in \mathbb{R}^4$ is the state at time $t$ (from our feature vector)
- $\pi_\theta$ is the policy network (a neural net with parameters $\theta$)
- $a_t$ is the action: the number of units of the underlying asset to hold

---

### Input Features

Each state $s_t$ includes:

1. **Normalized Price**: $S_t / S_0$
2. **Time to Maturity**: $(T - t)/T$
3. **Log Return**: $\log(S_t / S_{t-1})$
4. **Simulated Volatility**: $\sigma_t$

These are taken from the training dataset constructed in Notebook 2.
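As an illustration of how the four features fit together, here is a minimal sketch that builds the state matrix for a single price path. The function name `build_states`, the array layout, and the example numbers are hypothetical and are not taken from Notebook 2. The sketch also assumes the log return at $t = 0$ is set to zero, since $S_{-1}$ does not exist.

```python
import numpy as np

def build_states(prices, sigmas):
    """Build the 4-feature state for each hedging step of one path.

    prices: shape (n_steps + 1,), the prices S_0 ... S_N (hypothetical layout)
    sigmas: shape (n_steps + 1,), the volatility at each time point
    Returns an array of shape (n_steps, 4) for t = 0 ... N - 1.
    """
    prices = np.asarray(prices, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    n_steps = len(prices) - 1
    t = np.arange(n_steps)

    norm_price = prices[:-1] / prices[0]          # S_t / S_0
    time_to_maturity = (n_steps - t) / n_steps    # (T - t) / T on a discrete grid
    log_ret = np.zeros(n_steps)                   # assumption: 0 at t = 0
    log_ret[1:] = np.log(prices[1:-1] / prices[:-2])  # log(S_t / S_{t-1})
    vol = sigmas[:-1]                             # sigma_t

    return np.column_stack([norm_price, time_to_maturity, log_ret, vol])

# Made-up example path with three hedging steps
states = build_states([100.0, 102.0, 101.0, 105.0], [0.2, 0.2, 0.2, 0.2])
print(states.shape)  # (3, 4)
```

Only the states for $t = 0, \ldots, N-1$ are produced here, because no hedging decision is made at maturity.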

---

### Neural Network Architecture

We use a simple feedforward (fully connected) neural network applied independently at each time step:

- **Input layer**: 4 features
- **Hidden layers**: 2–3 layers with ReLU activation
- **Output layer**: 1 value per time step (hedge ratio)
- **Output activation**: Linear (no constraint on hedge ratio)

This network learns a function from observed market conditions to trading decisions.
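The architecture above can be sketched as a plain NumPy forward pass. This is only an illustration under stated assumptions: the hidden width of 32, the choice of two hidden layers, the He-style initialization, and the names `init_params` and `policy` are all hypothetical. An actual training setup would more likely use an automatic-differentiation framework so that gradients with respect to $\theta$ are available.

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Create (W, b) pairs for a fully connected network (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # He-style scaling, a common choice for ReLU layers (an assumption here)
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def policy(params, s):
    """Map states of shape (..., 4) to hedge ratios of shape (...)."""
    h = s
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # hidden layer with ReLU
    W, b = params[-1]
    return (h @ W + b)[..., 0]           # linear output, no constraint

# 4 inputs -> 32 -> 32 -> 1 output (sizes are illustrative)
params = init_params([4, 32, 32, 1])
s_t = np.array([1.02, 0.5, 0.01, 0.2])   # made-up state vector
a_t = policy(params, s_t)
print(a_t)
```

Because the output layer is linear, the untrained network can return any real number, including negative positions (short the underlying) or positions above one.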

---

### Time Step Independence

Note: We apply the same network **separately at each time step**, so the action $a_t$ depends only on the current state $s_t$, not on earlier states or previous positions.  
This memoryless policy is sufficient for vanilla deep hedging and simplifies training compared to using an RNN or transformer.
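One practical consequence of time step independence is that all time steps of all paths can be pushed through the network in a single batch. The sketch below shows this with a small stand-in policy. The weights, the hidden width of 8, and the array shape `(n_paths, n_steps, 4)` are assumptions made for illustration, and the states are random placeholders rather than real features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in policy with one hidden layer (weights are random placeholders)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def policy(s):
    """Map states of shape (..., 4) to actions of shape (...)."""
    h = np.maximum(s @ W1 + b1, 0.0)
    return (h @ W2 + b2)[..., 0]

# Assumed layout: (n_paths, n_steps, n_features)
states = rng.normal(size=(3, 5, 4))
n_paths, n_steps, n_feat = states.shape

# Option A: flatten paths and time steps into one batch, then restore the shape
flat_actions = policy(states.reshape(-1, n_feat)).reshape(n_paths, n_steps)

# Option B: explicit loop over time steps, applying the same network each time
loop_actions = np.stack(
    [policy(states[:, t, :]) for t in range(n_steps)], axis=1
)

print(np.allclose(flat_actions, loop_actions))
```

Both options give the same actions because the policy carries no state from one step to the next. A recurrent model would not allow the flattening in option A.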

---

### Summary

We now define the neural network and prepare it to learn a mapping:

$$
s_t \mapsto a_t \quad \text{for each time step } t \in \{0, \ldots, T-1\}
$$

In the next section, we simulate the portfolio evolution from these decisions and define the training objective based on hedging performance.

In [2]:
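import numpy as np

# Load the dataset saved in Notebook 2 (a NumPy .npz archive)
data = np.load("data/deep_hedging_data.npz")

# Extract the train/test splits stored under these keys
X_train = data["X_train"]
X_test = data["X_test"]
Y_train = data["Y_train"]
Y_test = data["Y_test"]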