# 1. Theory and Background

Welcome to the **Deep Hedging Neural Network for Derivatives Pricing** project.  
This notebook lays the theoretical foundation for understanding how modern machine learning can be used to hedge financial derivatives under realistic market conditions.

> **Goal.** Learn a *data-driven hedging policy* that minimizes tail risk of the terminal hedging error under discrete rebalancing and proportional costs. We follow the “deep hedging” framework (policy = neural net, objective = risk of residual P&L).

---

## 1.1 Introduction

In modern finance, hedging refers to strategies that reduce the risk of adverse price movements in an asset. Traditionally, this is done using models like Black-Scholes, which assume idealized market conditions — no transaction costs, continuous trading, and perfect liquidity.

However, real markets are noisy, trade in discrete time, and contain frictions such as bid-ask spreads and slippage. In such settings, hedges derived from frictionless analytic models leave residual risk and can be costly to maintain.  
**Deep Hedging** replaces handcrafted formulas with neural networks that learn optimal hedging strategies from data, taking market imperfections into account.

In this project we learn a hedging policy $\pi_\theta$ that maps the observable state $s_t$ to a position $a_t$ at each trading step, and we train it to minimize tail hedging losses (CVaR) under transaction costs and discrete re-hedging.

---

## 1.2 Options & Derivatives Overview

Options are financial contracts whose value depends on the price of an underlying asset. They give the right, but not the obligation, to buy or sell an asset at a predetermined price.

### 1.2.1 Vanilla Options

- **Call Option**: Right to **buy** an asset at a strike price $K$, either only at expiry $T$ (European) or at any time up to $T$ (American).
- **Put Option**: Right to **sell** an asset at a strike price $K$, either only at expiry $T$ (European) or at any time up to $T$ (American).

European option payoffs at maturity:

$$
\text{Call:} \quad \max(S_T - K, 0)
$$

$$
\text{Put:} \quad \max(K - S_T, 0)
$$

where $S_T$ is the underlying asset price at maturity.

### 1.2.2 Exotic Options

These include:
- **Barrier options**: Activated or extinguished when the price hits a barrier.
- **Asian options**: Payoff depends on the average price over time.
- **Lookback options**: Payoff depends on the maximum or minimum price.
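
To make the payoff definitions concrete, here is a minimal sketch that evaluates them on a single price path. The specific variants (arithmetic-average Asian call, floating-strike lookback call, up-and-out barrier call) and the function name `payoffs` are illustrative assumptions; the text above does not fix a particular contract.

```python
import numpy as np

def payoffs(path, strike, barrier=None):
    """Terminal payoffs for one price path (1-D sequence, path[0] = S_0)."""
    path = np.asarray(path, dtype=float)
    s_T = path[-1]
    out = {
        "call": max(s_T - strike, 0.0),
        "put": max(strike - s_T, 0.0),
        # Assumed variant: arithmetic-average Asian call.
        "asian_call": max(path.mean() - strike, 0.0),
        # Assumed variant: floating-strike lookback call (buy at the path minimum).
        "lookback_call": s_T - path.min(),
    }
    if barrier is not None:
        # Assumed variant: up-and-out call, extinguished once the path reaches the barrier.
        knocked_out = path.max() >= barrier
        out["up_and_out_call"] = 0.0 if knocked_out else out["call"]
    return out

example = payoffs([100, 104, 98, 110], strike=100, barrier=120)
print(example)
```

Unlike the vanilla payoffs, the Asian, lookback, and barrier payoffs need the whole path, which is one reason path-wise simulation is the natural setting for the rest of this project.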


---

## 1.3 Stochastic Processes & Market Models

Asset prices evolve stochastically — they follow random paths over time.

### 1.3.1 Brownian Motion

A continuous-time stochastic process used to model randomness in price movements.

Properties:
- $( B_0 = 0 )$
- Continuous paths
- Independent, normally distributed increments:

$$
B_t - B_s \sim \mathcal{N}(0, t - s)
$$

### 1.3.2 Geometric Brownian Motion

A common model for asset prices is geometric Brownian motion (GBM):

$$
dS_t = \mu S_t \, dt + \sigma S_t \, dB_t
$$

Where:
- $( \mu )$: drift (expected return)
- $( \sigma )$: volatility
- $( dB_t )$: increment of Brownian motion

The solution:

$$
S_t = S_0 \exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma B_t\right]
$$
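
The closed-form solution can be sampled exactly on a time grid by accumulating Brownian increments. The sketch below is one way to do that; the function name `simulate_gbm` and all parameter values are illustrative assumptions, not the project's settings.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Sample GBM paths from S_t = S_0 exp((mu - sigma^2/2) t + sigma B_t)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Brownian increments are independent N(0, dt).
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    # B_0 = 0, then cumulative sums of the increments.
    B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
    t = np.linspace(0.0, T, n_steps + 1)
    return s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B)

paths = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, T=1.0, n_steps=252, n_paths=20_000)
# The sample mean of S_T should be close to S_0 * exp(mu * T).
print(paths[:, -1].mean(), 100.0 * np.exp(0.05))
```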

### 1.3.3 The Black-Scholes Model

Assumes:
- Continuous trading
- No transaction costs
- Log-normal prices (GBM with constant volatility $\sigma$) and a constant risk-free rate $r$

The Black-Scholes price of a European call option:

$$
C = S_0 N(d_1) - K e^{-rT} N(d_2)
$$

Where $N(\cdot)$ is the standard normal cumulative distribution function and:

$$
d_1 = \frac{\ln(S_0 / K) + (r + \frac{\sigma^2}{2})T}{\sigma \sqrt{T}}, \quad
d_2 = d_1 - \sigma \sqrt{T}
$$
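
As a quick numerical check of the formula, the following sketch evaluates the call price using only the standard library. The helper names `norm_cdf` and `bs_call` and the example inputs are assumptions for illustration.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF N(x), written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(s0, strike, r, sigma, T):
    """Black-Scholes price of a European call."""
    d1 = (log(s0 / strike) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return s0 * norm_cdf(d1) - strike * exp(-r * T) * norm_cdf(d2)

# At-the-money example with r = 0, sigma = 20%, one year to expiry.
price = bs_call(s0=100.0, strike=100.0, r=0.0, sigma=0.2, T=1.0)
print(round(price, 4))
```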

### 1.3.4 Heston

We can optionally simulate *stochastic volatility* with the Heston model:

$$
dv_t = \kappa (\theta - v_t) \, dt + \xi \sqrt{v_t} \, dW_t^v
$$

$$
dS_t = \mu S_t \, dt + \sqrt{v_t} S_t \, dW_t^S
$$

$$
dW_t^S \, dW_t^v = \rho \, dt
$$

Where:
- $v_t$ is the instantaneous variance at time $t$.
- $\kappa$ is the rate of mean reversion.
- $\theta$ is the long-term mean variance.
- $\xi$ is the volatility of volatility.
- $\mu$ is the drift rate of the asset.
- $\rho$ is the correlation between the Brownian motions $W_t^S$ and $W_t^v$.

In discrete form, with time step $\Delta t$:

$$
v_{t+1} = v_t + \kappa (\theta - v_t) \, \Delta t + \xi \sqrt{v_t} \, \sqrt{\Delta t} \, \eta_t
$$

$$
S_{t+1} = S_t \, \exp\left( \left( \mu - \frac{1}{2} v_t \right) \Delta t + \sqrt{v_t} \, \sqrt{\Delta t} \, \varepsilon_t \right)
$$

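where the variance is floored at zero after each step so that $\sqrt{v_t}$ stays well defined,

$$
v_{t+1} \leftarrow \max(v_{t+1}, 0),
$$

and the shocks are standard normal with correlation $\rho$:

$$
\varepsilon_t, \eta_t \sim \mathcal{N}(0,1), \qquad \text{corr}(\varepsilon_t, \eta_t) = \rho.
$$

(Notebook 2 provides a `market_model = "gbm" | "heston"` switch.)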
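
The discrete scheme above translates almost line by line into NumPy. This is a sketch under assumed parameter values, and `simulate_heston` is a hypothetical name; the simulator actually used in Notebook 2 may differ in its details.

```python
import numpy as np

def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, T, n_steps, n_paths, seed=0):
    """Discretised Heston paths with the variance floored at zero after each step."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty((n_paths, n_steps + 1))
    v = np.empty((n_paths, n_steps + 1))
    S[:, 0] = s0
    v[:, 0] = v0
    for t in range(n_steps):
        eps = rng.standard_normal(n_paths)
        # Build eta with corr(eps, eta) = rho from an independent normal draw.
        eta = rho * eps + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        vt = v[:, t]
        S[:, t + 1] = S[:, t] * np.exp((mu - 0.5 * vt) * dt + np.sqrt(vt * dt) * eps)
        v_next = vt + kappa * (theta - vt) * dt + xi * np.sqrt(vt * dt) * eta
        v[:, t + 1] = np.maximum(v_next, 0.0)
    return S, v

# Illustrative parameters only (not calibrated, not the project's values).
S, v = simulate_heston(s0=100.0, v0=0.04, mu=0.0, kappa=2.0, theta=0.04,
                       xi=0.3, rho=-0.7, T=1.0, n_steps=252, n_paths=5000)
print(S.shape, v.min())
```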

---

## 1.4 Hedging: Basic Concepts

**Hedging** is the act of reducing financial risk by taking an offsetting position in a related asset.

- **Static hedging**: Construct once and hold (e.g. buying protective puts).
- **Dynamic hedging**: Frequently adjust positions in response to price changes.

### Delta Hedging

Adjusting holdings of the underlying asset to neutralize small price movements:

$$
\Delta = \frac{\partial C}{\partial S}
$$

Limitations:
- Exact replication requires continuous rebalancing
- Sensitive to volatility estimates
- Breaks down with transaction costs or jumps
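
The first limitation can be seen in a small experiment: under Black-Scholes dynamics with no costs, the delta hedge is only approximate when rebalancing is discrete, and the spread of the hedging error shrinks as rebalancing becomes more frequent. The sketch below assumes $r = 0$, uses the known Black-Scholes call delta $N(d_1)$, and picks illustrative parameters; all function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF, vectorised with NumPy (no SciPy dependency).
norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))

def bs_call_and_delta(S, K, sigma, tau):
    """Black-Scholes call price and delta with r = 0 and time to expiry tau > 0."""
    d1 = (np.log(S / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    n1 = norm_cdf(d1)
    return S * n1 - K * norm_cdf(d1 - sigma * np.sqrt(tau)), n1

def delta_hedge_error(n_steps, n_paths=2000, s0=100.0, K=100.0, sigma=0.2, T=1.0, seed=0):
    """Premium + trading P&L - payoff when re-hedging n_steps times, without costs."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, s0)
    pnl = np.zeros(n_paths)
    for t in range(n_steps):
        _, delta = bs_call_and_delta(S, K, sigma, T - t * dt)
        z = rng.standard_normal(n_paths)
        S_next = S * np.exp(-0.5 * sigma**2 * dt + sigma * sqrt(dt) * z)
        pnl += delta * (S_next - S)      # hold delta shares over [t, t+1]
        S = S_next
    premium, _ = bs_call_and_delta(s0, K, sigma, T)
    return premium + pnl - np.maximum(S - K, 0.0)

err_monthly = delta_hedge_error(n_steps=12)
err_daily = delta_hedge_error(n_steps=252)
print(err_monthly.std(), err_daily.std())
```

With transaction costs, more frequent rebalancing is no longer free, which is the trade-off the learned policy has to manage.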

### 1.4.1 Self‑financing with transaction costs

Let $a_t$ be the position after trading at time $t$ and $\Delta a_t := a_t - a_{t-1}$ the trade at $t$ (with $a_{-1} = 0$).

The proportional trading cost at $t$ is

$$
c_t(\Delta a_t) = \lambda |\Delta a_t|S_t.
$$

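Cash account update (assuming $r = 0$ for simplicity) and portfolio value, where $C_t$ denotes cash rather than the call price $C$ of Section 1.3.3:

$$
C_t = C_{t-1} - \Delta a_t S_t - c_t(\Delta a_t), \qquad V_t = C_t + a_t S_t.
$$

At maturity, with $Z_T$ the option payoff owed by the hedger,

$$
G = V_T - Z_T, \qquad L := -G,
$$

so positive values of $L$ are hedging losses, which we want to control in the tail.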
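
The bookkeeping can be traced on a single path with a short loop. This is a sketch: `terminal_gain` and the initial cash argument `c0` (for example, a premium received) are hypothetical, and setting the last position to zero to liquidate at maturity is an assumption of the example, not something the equations require.

```python
def terminal_gain(S, a, payoff, lam, c0=0.0):
    """G = V_T - Z_T for prices S_0..S_T and positions a_0..a_T held after each trade."""
    cash, prev = c0, 0.0                      # a_{-1} = 0
    for s_t, a_t in zip(S, a):
        trade = a_t - prev                    # Delta a_t
        cash -= trade * s_t + lam * abs(trade) * s_t
        prev = a_t
    V_T = cash + prev * S[-1]                 # cash plus mark-to-market position
    return V_T - payoff                       # the loss is L = -G

S = [100.0, 102.0, 105.0]
a = [0.5, 0.6, 0.0]                           # final 0.0: liquidate at T (assumption)
Z_T = max(S[-1] - 100.0, 0.0)                 # call payoff with K = 100
G = terminal_gain(S, a, Z_T, lam=0.001)
print(G)
```

Here the trading P&L is $0.5 \cdot 2 + 0.6 \cdot 3 = 2.8$, costs total $0.1232$, and the payoff is $5$, so $G \approx -2.3232$.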

---

## 1.5 Deep Hedging: Core Idea

Deep Hedging is a data-driven framework that learns optimal trading (hedging) strategies for derivatives under **realistic market conditions** such as transaction costs, discrete time steps, and illiquid markets.

At each time step $( t )$, the agent observes the market state $( s_t )$, including:
- Current asset prices
- Volatility or other indicators
- Time remaining
- Any additional relevant information

It outputs an action $( a_t )$, the amount of each hedging asset to hold.

### Objective

Instead of replicating the payoff exactly, Deep Hedging **maximizes a utility function** of the final portfolio value. For the seller of the option, the total gain is:

$$
G(a) = -Z_T + (a \star H)_T - C_T(a)
$$

Where:
- $Z_T$: option payoff owed at maturity (e.g. $\max(S_T - K, 0)$ for a call)
- $(a \star H)_T$: accumulated P&L from trading the hedging instruments $H$
- $C_T(a)$: total transaction costs

This agrees with $G = V_T - Z_T$ from Section 1.4.1, with $V_T = (a \star H)_T - C_T(a)$.

The **policy** $( a(s_t) )$ is represented by a neural network with parameters $( \theta )$:

$$
a_t = a_\theta(s_t)
$$


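We minimize the Conditional Value-at-Risk at level $\alpha \in (0,1)$ applied to losses $L=-G$:

$$
\mathrm{CVaR}_\alpha(L) = \min_{q \in \mathbb{R}} \left\{ q + \frac{1}{1-\alpha}\,\mathbb{E}\left[(L-q)_+\right] \right\}
$$

Here the minimizing $q$ is the $\alpha$-VaR of $L$, and for continuous loss distributions $\mathrm{CVaR}_\alpha(L)$ is the expected loss over the worst $1-\alpha$ fraction of outcomes. Focusing on CVaR trains the policy to reduce **tail hedging shortfalls** under frictions.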
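
On a finite sample the minimization over $q$ can be done by brute force, because the objective is convex and piecewise linear with kinks at the sample points. The sketch below compares the result with the plain average of the largest losses; `cvar_ru` is a hypothetical helper and the normal sample is a stand-in, not project data.

```python
import numpy as np

def cvar_ru(losses, alpha):
    """Minimise q + E[(L - q)_+] / (1 - alpha) over q, searching the sample points."""
    losses = np.asarray(losses, dtype=float)
    objective = [q + np.maximum(losses - q, 0.0).mean() / (1.0 - alpha) for q in losses]
    i = int(np.argmin(objective))
    return objective[i], losses[i]           # (CVaR estimate, minimising q)

rng = np.random.default_rng(0)
losses = rng.standard_normal(2000)            # stand-in losses
cvar, q_star = cvar_ru(losses, alpha=0.90)
tail_mean = np.sort(losses)[-200:].mean()     # average of the worst 10% of 2000 samples
print(cvar, tail_mean, q_star)
```

The quadratic-time search is only for illustration; in training, $q$ is typically treated as one more parameter updated by gradient descent.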

---

## 1.6 Reinforcement Learning in Deep Hedging

This becomes a **policy optimization problem** in reinforcement learning.

- **Agent**: the neural network
- **State $( s_t )$**: current market info
- **Action $( a_t )$**: how to adjust hedge
- **Reward**: utility of terminal value minus trading cost

The final goal becomes:

$$
\max_\theta \; \mathbb{E}[U(G(a_\theta))]
$$

In our implementation we instantiate $U$ as tail-risk control by directly minimizing $\mathrm{CVaR}_\alpha$ of $L=-G$ on simulated paths (see Notebook 3).

---

## 1.7 Setup & Notation

- Discrete trading dates: $(t = 0,1,\dots,T)$.
- Underlying price path: $(S_t)$.
- Payoff at maturity: $(Z_T)$ (here: European call with strike $(K)$).
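- Trading policy (our model): positions $\Delta_t \in [-H_{\max}, H_{\max}]$ held over $[t, t+1]$ for $t = 0,\dots,T-1$. This is the position $a_t$ of Section 1.4.1; from here on $\Delta_t$ denotes a position, not a trade.
- Proportional transaction costs with rate $\gamma$ (the $\lambda$ of Section 1.4.1) on turnover, with $\Delta_{-1} = 0$:
  $$
  \mathrm{Turnover}_t = |\Delta_t - \Delta_{t-1}|,\quad
  \mathrm{Cost} = \gamma \sum_{t=0}^{T-1} \mathrm{Turnover}_t \cdot S_t.
  $$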
- Hedging P\&L:
  $$
  V_T \;=\; \sum_{t=0}^{T-1} \Delta_t\,(S_{t+1}-S_t)\;-\;\mathrm{Cost}.
  $$
- **Residual (hedging error)**:
  $$
  X \;=\; V_T - Z_T.
  $$
  Negative $(X)$ = shortfall (bad). Positive $(X)$ = surplus (not prioritized).
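
For a batch of paths, the residual can be computed in a few vectorized lines. This sketch charges costs at each trade date $t = 0,\dots,T-1$ with $\Delta_{-1} = 0$, as in the rollout described in Section 1.10; the function name `residual` and the toy numbers are assumptions.

```python
import numpy as np

def residual(S, delta, strike, gamma):
    """X = V_T - Z_T for S of shape (n_paths, T+1) and positions delta of shape (n_paths, T)."""
    dS = np.diff(S, axis=1)                                            # S_{t+1} - S_t
    prev = np.concatenate([np.zeros((len(S), 1)), delta[:, :-1]], axis=1)  # Delta_{-1} = 0
    turnover = np.abs(delta - prev)
    cost = gamma * (turnover * S[:, :-1]).sum(axis=1)
    V_T = (delta * dS).sum(axis=1) - cost
    Z_T = np.maximum(S[:, -1] - strike, 0.0)                           # European call payoff
    return V_T - Z_T

S = np.array([[100.0, 102.0, 105.0]])
delta = np.array([[0.5, 0.6]])
X = residual(S, delta, strike=100.0, gamma=0.001)
X_zero = residual(S, np.zeros_like(delta), strike=100.0, gamma=0.001)
print(X, X_zero)
```

In the toy path, P&L is $2.8$, cost is $0.001 \cdot (50 + 10.2) = 0.0602$, and the payoff is $5$, so $X \approx -2.2602$; with no hedge, $X = -Z_T = -5$.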

**Final knobs (used later):**
- Tail level $( \alpha = 0.90 )$
- MAE weight $( \beta = 0.01 )$ (small center penalty)
- Position bound $( H_{\max} = 5 )$
- Cost rate $( \gamma )$ between $(5\!\times\!10^{-4})$ and $(10^{-3})$ in our runs

---


## 1.8 Risk Objective (CVaR/RU Head)

We minimize the *tail* of $-X$ via CVaR. Using the Rockafellar–Uryasev formulation with a *learned* threshold $\tau$ (the negative of $q$ in Section 1.5, not to be confused with the time-to-maturity feature $\tau_t$),

$$
\ell(X;\tau) \;=\; \frac{(\tau - X)_+}{1-\alpha} - \tau,
\qquad
\text{and optimize } \mathbb{E}[\ell(X;\tau)].
$$

- This focuses learning on the worst $1-\alpha$ fraction of outcomes (left tail of $X$).
- We add a **small** stabilization toward the center:
  $$
  \mathcal{L} \;=\; \underbrace{\mathbb{E}[\ell(X;\tau)]}_{\text{CVaR (tail)}}\;+\;\underbrace{\beta\,\mathbb{E}[|X|]}_{\text{small MAE term, }\beta=0.01}.
  $$

**What’s optimized vs. what’s monitored**
- **Optimized**: the CVaR surrogate (RU loss) + $(\beta)$·MAE.
- **Monitored** (for reporting): mean of $X$, VaR/CVaR of $X$, variance/MAE of $X$, average turnover, average cost, baseline comparisons.
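
To see what "learned threshold" means, the sketch below holds a sample of residuals fixed and runs plain subgradient descent on $\tau$ alone. The subgradient of $\mathbb{E}[\ell(X;\tau)]$ in $\tau$ is $P(X < \tau)/(1-\alpha) - 1$, so $\tau$ settles near the $(1-\alpha)$-quantile of $X$. The function names, learning rate, and the normal stand-in sample are assumptions; in training, $\tau$ is updated jointly with the policy weights.

```python
import numpy as np

def ru_loss(X, tau, alpha=0.90, beta=0.01):
    """RU tail loss plus a small MAE term, averaged over paths."""
    ru = np.maximum(tau - X, 0.0) / (1.0 - alpha) - tau
    return ru.mean() + beta * np.abs(X).mean()

def fit_tau(X, alpha=0.90, lr=0.01, n_iter=2000, tau0=0.0):
    """Subgradient descent on tau for fixed residuals X."""
    tau = tau0
    for _ in range(n_iter):
        grad = np.mean(X < tau) / (1.0 - alpha) - 1.0
        tau -= lr * grad
    return tau

rng = np.random.default_rng(0)
X = rng.standard_normal(5000)                 # stand-in residuals, not project results
tau_hat = fit_tau(X)
print(tau_hat, np.quantile(X, 0.10))
print(ru_loss(X, tau_hat, beta=0.0), -np.sort(X)[:500].mean())
```

At the fitted $\tau$, the RU term equals (up to sampling granularity) the average shortfall over the worst 10% of paths, which is the quantity reported as CVaR.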

---

## 1.9 Data Generation & Features

- **Paths**: Simulated under Heston (stochastic volatility) with $(r=0)$ and fixed strike $(K)$ (baseline).
- **Splits**: Train / Validation / Test; each item is a *full path*.
- **Features** per time step (causal):
  1) **Price (normalized)**: $(S_t / S_0)$  
  2) **Time-to-maturity proxy**: $(\tau_t = (T-t)/T)$  
  3) **Lagged log return** (to keep causality): $( \log S_t - \log S_{t-1} )$  
  4) **Realized vol (annualized)** from a rolling window (e.g., 20 steps)

- **Labels**: We do *not* predict a label. The model outputs $(\Delta_t)$, loss is computed path-wise from $(X)$.
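
A possible construction of the four features is sketched below. Several details are assumptions that the list above leaves open: the first lagged return is padded with zero, realized volatility is the root mean square of the returns in the window, and annualization uses 252 steps per year. `build_features` is a hypothetical name.

```python
import numpy as np

def build_features(S, window=20, steps_per_year=252):
    """S: (n_paths, T+1) prices. Returns (n_paths, T, 4) features for t = 0..T-1."""
    n_paths, n_dates = S.shape
    T = n_dates - 1
    logret = np.diff(np.log(S), axis=1)                       # log S_{t+1} - log S_t
    # Lagged return at t is log S_t - log S_{t-1}; padded with 0 at t = 0 (assumption).
    lagged = np.concatenate([np.zeros((n_paths, 1)), logret[:, :-1]], axis=1)
    price = S[:, :T] / S[:, [0]]
    ttm = np.broadcast_to((T - np.arange(T)) / T, (n_paths, T))
    rv = np.zeros((n_paths, T))
    for t in range(1, T):
        w = lagged[:, max(1, t - window + 1): t + 1]          # returns known at time t
        rv[:, t] = np.sqrt(np.mean(w**2, axis=1) * steps_per_year)
    return np.stack([price, ttm, lagged, rv], axis=-1)

rng = np.random.default_rng(0)
steps = 0.01 * rng.standard_normal((8, 30))
S = 100.0 * np.exp(np.concatenate([np.zeros((8, 1)), np.cumsum(steps, axis=1)], axis=1))
F = build_features(S)

# Causality check: changing prices after t = 5 must not change features up to t = 5.
S_future = S.copy()
S_future[:, 6:] *= 1.1
F_future = build_features(S_future)
print(F.shape, np.allclose(F[:, :6], F_future[:, :6]))
```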

---

## 1.10 Policy Architecture (Hedger)

- Recurrent backbone to absorb path information and output *positions*:
  - GRU(32, return_sequences=True) → GRU(16, return_sequences=True) → Dense(32, ReLU)
  - Final 1-unit linear head → **tanh clamp** (a $\tanh$ output scaled by $H_{\max}$) so that positions $\Delta_t$ lie in $[-H_{\max}, H_{\max}]$.
- **Rollout inside the graph**:
  - Turnover $(= |\Delta_t - \Delta_{t-1}|)$ (with $(\Delta_{-1}=0)$).
  - P\&L $= \sum_t \Delta_t\,\Delta S_t$, with $\Delta S_t = S_{t+1} - S_t$.
  - Cost $(= \gamma \sum S_t \cdot \mathrm{Turnover}_t)$.
  - Residual $(X = V_T - Z_T)$.
  - RU loss $(\ell(X;\tau))$ + $(\beta |X|)$ added to the model via `add_loss`.

This makes the whole trading path **differentiable**, so gradients flow from the tail risk objective back into the policy.
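
The chain from policy output to scalar loss can be mimicked in NumPy. In the sketch below a toy linear map stands in for the GRU stack, so this is not the Keras model of Notebook 3; it only shows the scaled-tanh clamp, the rollout, and the objective as one function of the policy weights, with a finite difference in place of backpropagation. All names, the two toy features, and the fixed `tau` are assumptions.

```python
import numpy as np

H_MAX, GAMMA, ALPHA, BETA = 5.0, 5e-4, 0.90, 0.01

def objective(w, feats, S, strike, tau):
    """Toy linear policy z_t = feats_t . w, followed by clamp, rollout, and RU + MAE loss."""
    z = feats @ w                                    # (n_paths, T) pre-activations
    delta = H_MAX * np.tanh(z)                       # positions in [-H_MAX, H_MAX]
    prev = np.concatenate([np.zeros((len(S), 1)), delta[:, :-1]], axis=1)
    cost = GAMMA * (np.abs(delta - prev) * S[:, :-1]).sum(axis=1)
    V_T = (delta * np.diff(S, axis=1)).sum(axis=1) - cost
    X = V_T - np.maximum(S[:, -1] - strike, 0.0)     # residual
    ru = np.maximum(tau - X, 0.0) / (1.0 - ALPHA) - tau
    return ru.mean() + BETA * np.abs(X).mean(), delta

rng = np.random.default_rng(0)
n_paths, T = 256, 30
steps = 0.01 * rng.standard_normal((n_paths, T))
S = 100.0 * np.exp(np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1))
feats = np.stack([S[:, :-1] / S[:, [0]] - 1.0,
                  np.broadcast_to((T - np.arange(T)) / T, (n_paths, T))], axis=-1)

w = np.array([0.5, 0.1])
loss, delta = objective(w, feats, S, strike=100.0, tau=-1.0)

# Central finite difference in the first weight: the loss depends on the policy parameters.
e0 = np.array([1e-4, 0.0])
grad0 = (objective(w + e0, feats, S, 100.0, -1.0)[0]
         - objective(w - e0, feats, S, 100.0, -1.0)[0]) / 2e-4
print(loss, grad0)
```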

---

## 1.11 Baselines & What “Good” Looks Like

- **Zero-hedge baseline**: $( \Delta_t \equiv 0 \Rightarrow X = -Z_T )$.

**Success criteria**:
- **Tail improvement**: VaR/CVaR of $(X)$ improves **materially** vs. baseline.
- **Cost/Turnover**: kept moderate (reflecting realistic frictions).
- **Stability**: similar improvements on validation and test sets.
- **Center error** (variance/MAE): may improve slightly; large gains usually require either extra assets (e.g., option hedge) or explicitly up-weighting center losses—*not the focus here*.
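
The comparison can be organized around one small report function applied to each strategy's residuals. In the sketch below VaR and CVaR are measured on the left tail of $X$, so larger (less negative) values are better. The static half-share hedge is a made-up comparison strategy used only to exercise the function, and the simulated terminal prices are illustrative, not project results.

```python
import numpy as np

def tail_report(X, alpha=0.90):
    """Summary statistics of the residual X; VaR/CVaR refer to the left tail."""
    X = np.asarray(X, dtype=float)
    var = np.quantile(X, 1.0 - alpha)
    return {
        "mean": X.mean(),
        "VaR": var,
        "CVaR": X[X <= var].mean(),       # average of the worst outcomes
        "variance": X.var(),
        "MAE": np.abs(X).mean(),
    }

rng = np.random.default_rng(0)
S_T = 100.0 * np.exp(-0.02 + 0.2 * rng.standard_normal(10_000))  # lognormal terminal prices
Z_T = np.maximum(S_T - 100.0, 0.0)

X_zero = -Z_T                                                    # zero-hedge baseline
X_static = 0.5 * (S_T - 100.0) - 5e-4 * 0.5 * 100.0 - Z_T        # hold 0.5 shares, one trade cost

report_zero = tail_report(X_zero)
report_static = tail_report(X_static)
print(report_zero["CVaR"], report_static["CVaR"])
```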

---


## 1.12 Project Plan (Notebooks)

1. **Notebook 1 — Theory & Background** *(this)*  
   Motivation, objective, data/feature design, risk metrics, and evaluation plan.

2. **Notebook 2 — Data & Simulation**  
   Generate Heston paths; build $(S, Z_T)$ pairs for train/val/test; create causal features and per-step $\Delta S_t$.

3. **Notebook 3 — Model & Training**  
   - Build the hedger (GRU backbone + position clamp).  
   - In-graph rollout: P\&L, costs, residual $(X)$.  
   - RU head with learned $(\tau)$; training objective $(\mathbb{E}[\ell(X;\tau)] + \beta \mathbb{E}[|X|])$.  
   - Callbacks (plateau LR, early stop, checkpoint on best tail).

4. **Notebook 4 — Backtesting & Validation**  
   - Compute and compare: mean$(X)$, VaR/CVaR, variance, MAE, cost, turnover vs. baseline.  
   - Plots: histograms and tail overlays, empirical CDFs, turnover/cost diagnostics, and tail scatter/QQ.

---

## 1.13 Practical Choices (used in training)

- **ALPHA** $(=0.90)$: focus on the worst 10% of outcomes.  
- **BETA** $(=0.01)$: small encouragement toward center error control without sacrificing tail.  
- **H\_MAX** $(=5)$: realistic risk cap on position.  
- **GAMMA** $(= 0.0005 \text{ to } 0.001)$: proportional cost per unit of notional traded, i.e. 5–10 bps one way, or roughly 10–20 bps for a round trip on 1× notional.  
- **Batching** by *paths*, not by time steps (each batch = many independent paths).
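
Collected as constants, with the arithmetic behind the "worst 10%" and "10–20 bps" readings spelled out. The constant names follow the list above; `GAMMA_RANGE` and the helper `round_trip_bps` are hypothetical.

```python
ALPHA = 0.90                  # tail level
BETA = 0.01                   # weight of the MAE term
H_MAX = 5.0                   # position bound
GAMMA_RANGE = (5e-4, 1e-3)    # proportional cost rates used in runs

def round_trip_bps(gamma):
    """Cost of buying and then selling one unit of notional, in basis points."""
    return 2.0 * gamma * 1e4

tail_fraction = 1.0 - ALPHA                            # share of paths in the tail
costs_bps = [round_trip_bps(g) for g in GAMMA_RANGE]   # roughly 10 and 20 bps
print(tail_fraction, costs_bps)
```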

---

## 1.14 Limitations & Extensions

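- **Spot-only hedging** cannot neutralize gamma/vega exposure, so some residual risk is unavoidable; this is why we target the tail of the hedging error rather than exact replication.  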
- To reduce MAE or variance *significantly*, add one option to the hedging set (policy outputs a vector), or up-weight center errors.  
- Extensions: regime mixes, jumps, robust losses, richer cost models/market impact, multi-asset portfolios, or historical data.

---


## 1.15 Literature Overview

### Bühler et al. – *Deep Hedging (2018)*

- Formalized the use of neural networks to hedge options in incomplete markets.
- Handles realistic frictions and trading costs.
- Trains on simulated paths via Monte Carlo.
- Shows strong improvement over delta hedging, especially when transaction costs are high.

Next: we simulate realistic price and volatility data to train our Deep Hedging agent.

## 1.16 Glossary

- $( S_t )$: Asset price at time $( t )$
- $( K )$: Strike price
- $( Z_T )$: Option payoff at maturity
- $( a_t )$: Hedge position after trading at time $( t )$
- $( s_t )$: State of the market
- $( C_T(a) )$: Total transaction costs
- $( a_\theta )$: Neural network with parameters $( \theta )$
- $( U )$: Utility function
- $\Delta a_t = a_t - a_{t-1}$: trade size at $t$
- $\lambda$: proportional transaction-cost rate
- $\tau_t$: time to maturity, $(T-t)\,\Delta t$ (the feature in Section 1.9 uses the normalized proxy $(T-t)/T$)
- $G = V_T - Z_T$: terminal hedging P&L
- $L = -G$: loss used in CVaR

---

## 1.17 Further Reading

- Bühler et al., *Deep Hedging* (2018): https://arxiv.org/abs/1802.03042  
- TUM Slides (2022): https://ssrn.com/abstract=4151041  
- Hans Bühler – Deep Hedging Lectures: http://deep-hedging.com  