# Objective Function

## Mean Variance Risk Measurement

When formulating the DRL hedging problem, our goal is to optimize the distribution of the terminal hedged P&L according to our risk appetite. The risk appetite and risk tolerance are expressed through the objective function. The most common risk measure formulation is the mean-variance objective (here in its mean-standard-deviation form):
$$F(PnL_T)=E[PnL_T]-c\times Std[PnL_T]$$  
Here $PnL_T$ is the terminal P&L, accumulated from the daily P&L as $PnL_T=\sum_{i=0}^{T} PnL_i$.
The coefficient $c$ sets the degree of risk aversion.
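As a minimal sketch, the objective can be estimated from simulated paths by summing each path's daily P&L and taking the sample mean and standard deviation. The function names and the three-path daily P&L below are hypothetical illustrations, and the use of the sample (n-1) standard deviation is an assumption:

```python
import math
from statistics import fmean, stdev

def terminal_pnl(daily_pnl):
    """Accumulate one path's daily P&L into PnL_T = sum_i PnL_i."""
    return math.fsum(daily_pnl)

def mean_std_objective(terminal_pnls, c):
    """Sample estimate of F(PnL_T) = E[PnL_T] - c * Std[PnL_T] (sample std, n - 1)."""
    return fmean(terminal_pnls) - c * stdev(terminal_pnls)

# Hypothetical daily P&L for three simulated paths (illustration only).
paths = [
    [0.5, -1.0, 0.25],
    [-0.5, 0.0, -1.5],
    [1.0, 0.5, -0.5],
]
pnl_T = [terminal_pnl(p) for p in paths]
print(pnl_T, mean_std_objective(pnl_T, c=1.644854))
```

A larger $c$ penalizes dispersion more heavily, so for a fixed set of paths the objective decreases as $c$ grows.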

If we assume $PnL_T$ is Normally distributed (which is unlikely to hold exactly in reality), then its VaR and CVaR both take the form $E[PnL_T]-c\times Std[PnL_T]$, so the mean-variance objective $F(PnL_T)$ can target each of these statistics through the choice of $c$. The following table lists the $c$ value for each corresponding $PnL_T$ distribution statistic:

| Normal Distribution Statistic | $c$ value |
|---|:---:|
| 95VaR | 1.644854 |
| 99VaR | 2.326348 |
| 95CVaR | 2.062713 |
| 99CVaR | 2.665214 |
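For a Normal $PnL_T$ with mean $\mu$ and standard deviation $\sigma$, the $\alpha$-level VaR (as a P&L level) is $\mu - z_\alpha\sigma$ and the CVaR is $\mu - \frac{\varphi(z_\alpha)}{1-\alpha}\sigma$, where $z_\alpha$ and $\varphi$ are the standard Normal quantile and density. The sketch below, using Python's standard-library `statistics.NormalDist` (the helper names are hypothetical), should reproduce the four $c$ values in the table:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mu = 0, sigma = 1

def c_for_var(alpha):
    """c such that E - c*Std equals the (1 - alpha) lower P&L quantile of a Normal PnL_T."""
    return STD_NORMAL.inv_cdf(alpha)

def c_for_cvar(alpha):
    """c such that E - c*Std equals the mean P&L over the worst (1 - alpha) tail."""
    z = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.pdf(z) / (1.0 - alpha)

for alpha in (0.95, 0.99):
    print(f"{alpha:.0%}VaR  c = {c_for_var(alpha):.6f}")
    print(f"{alpha:.0%}CVaR c = {c_for_cvar(alpha):.6f}")
```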

## Test With Different $c$

In this test, we use the usual test case:
- BSM market (r = 2%, drift = 10%, sigma = 30%)
- 3-month (3M) ATM call
- 0.5% transaction cost on stock trades
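A Greek (delta) hedging baseline for this test case can be sketched as below. This is a hypothetical setup, not the author's simulator: it assumes a short position in one call with $S_0 = K = 100$, 63 daily rebalances, a proportional cost of 0.5% of traded notional (also paid when unwinding at expiry), and cash accruing at $r$. The source does not state its P&L units, so this sketch is not expected to reproduce the Greek Hedging row in the table below.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

PHI = NormalDist()  # standard Normal

def bs_d1(s, k, tau, r, sigma):
    return (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))

def bs_call_price(s, k, tau, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = bs_d1(s, k, tau, r, sigma)
    d2 = d1 - sigma * math.sqrt(tau)
    return s * PHI.cdf(d1) - k * math.exp(-r * tau) * PHI.cdf(d2)

def bs_call_delta(s, k, tau, r, sigma):
    """Black-Scholes delta of a European call."""
    return PHI.cdf(bs_d1(s, k, tau, r, sigma))

def delta_hedge_pnl(n_paths=2000, s0=100.0, k=100.0, maturity=0.25, n_steps=63,
                    r=0.02, mu=0.10, sigma=0.30, cost=0.005, seed=0):
    """Terminal P&L of a short call, delta-hedged once per step under real-world GBM."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    pnl_T = []
    for _ in range(n_paths):
        s, shares = s0, 0.0
        cash = bs_call_price(s0, k, maturity, r, sigma)      # premium received
        for step in range(n_steps):
            target = bs_call_delta(s, k, maturity - step * dt, r, sigma)
            trade = target - shares
            cash -= trade * s + cost * abs(trade) * s          # trade stock, pay proportional cost
            shares = target
            z = rng.gauss(0.0, 1.0)                            # GBM step with real-world drift mu
            s *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
            cash *= math.exp(r * dt)                           # cash accrues at the risk-free rate
        cash += shares * s - cost * abs(shares) * s            # unwind the hedge at expiry
        cash -= max(s - k, 0.0)                                # pay the call payoff
        pnl_T.append(cash)
    return pnl_T

pnl = delta_hedge_pnl()
print(f"mean={fmean(pnl):.3f}  std={stdev(pnl):.3f}")
```

Because the random draws do not depend on `cost`, running with the same `seed` and `cost=0.0` isolates the drag that transaction costs add to each path.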

Inspecting $PnL_T$, we find that its distribution is close to Normal. We then vary $c$ in the objective function to represent different risk measures and test the performance of the DRL agent under each risk appetite. The following table shows the test results.

| Objective Risk Measurement | Mean | S.D. | 95VaR | 99VaR | 95CVaR | 99CVaR |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| Greek Hedging | -13.92 | 10.48 | -32.46 | -44.86 | -39.68 | -50.79 |
| 95VaR | **-11.89** | 9.79 | **-29.83** | -43.56 | -38.40 | -51.17 |
| 99VaR | -12.41 | 9.99 | -30.65 | -42.78 | -38.28 | -49.81 |
| 95CVaR | -12.45 | 9.76 | -30.35 | **-42.08** | **-38.09** | -50.20 |
| 99CVaR | -16.26 | **8.78** | -32.18 | -43.17 | -38.55 | **-48.37** |
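The table statistics can be computed empirically from terminal P&L samples. The sketch below assumes one common convention (VaR as the worst-tail order statistic, CVaR as the average of the worst outcomes, both reported as P&L levels); the helper names and the Normal sample in the usage lines are hypothetical. When $PnL_T$ is close to Normal, the empirical 95VaR should sit near the Normal proxy Mean minus 1.644854 times S.D.

```python
import math
import random
from statistics import fmean, stdev

TAILS = {"95": 0.05, "99": 0.01}  # confidence label -> tail probability

def tail_count(n, tail):
    """Number of worst outcomes in the tail; the epsilon guards against float noise in tail * n."""
    return max(1, math.ceil(tail * n - 1e-9))

def empirical_var(pnl, tail):
    """VaR as a P&L level: the worst-tail order statistic."""
    xs = sorted(pnl)
    return xs[tail_count(len(xs), tail) - 1]

def empirical_cvar(pnl, tail):
    """CVaR as the average P&L over the worst tail_count outcomes."""
    xs = sorted(pnl)
    return fmean(xs[:tail_count(len(xs), tail)])

def summary_row(pnl):
    """One row of the results table: Mean, S.D., 95/99 VaR and CVaR."""
    row = {"Mean": fmean(pnl), "S.D.": stdev(pnl)}
    for label, tail in TAILS.items():
        row[label + "VaR"] = empirical_var(pnl, tail)
        row[label + "CVaR"] = empirical_cvar(pnl, tail)
    return row

# Usage with hypothetical Normal P&L samples (not the document's data).
rng = random.Random(42)
sample = [rng.gauss(-12.0, 10.0) for _ in range(100_000)]
row = summary_row(sample)
print({key: round(val, 2) for key, val in row.items()})
print("Normal proxy 95VaR:", round(row["Mean"] - 1.644854 * row["S.D."], 2))
```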

## Analysis

The best value of each $PnL_T$ distribution statistic is shown in bold. As we can see, each choice of $c$ targets its corresponding objective risk measure well: in most cases, the agent trained for a given statistic achieves the best value of that statistic. The only exception is the agent trained with the 99VaR objective, whose 99VaR (-42.78) is beaten by the agent trained with 95CVaR (-42.08); the $c$ values of these two objectives are close (2.33 vs 2.06).

Tuning $c$ in the mean-variance objective to represent our risk appetite (e.g. VaR or CVaR) relies on the assumption that, under our optimal hedging strategy, the terminal P&L is Normally distributed. This works in the BSM market, but it is not guaranteed under other underlying dynamics or in the *real* market. Alternatively, we can model the terminal P&L in the critic torso with another parametric distribution (e.g. Student's t-distribution) whose quantiles and expected shortfalls are also available in closed form in terms of the distribution parameters. Either way, the approach still relies on an assumed terminal P&L distribution.
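As an illustration of the parametric alternative, the sketch below computes the $c$ values for a Student's t P&L with $\nu$ degrees of freedom rescaled to unit variance, using the closed-form t expected shortfall $\frac{g_\nu(q)}{1-\alpha}\cdot\frac{\nu+q^2}{\nu-1}$ with $q = t_\nu^{-1}(\alpha)$. To stay dependency-free it assumes a hand-rolled CDF (Simpson's rule) and quantile (bisection); all helper names are hypothetical, and `scipy.stats.t` offers `pdf` and `ppf` if SciPy is available.

```python
import math

def t_pdf(x, nu):
    """Density of the standard Student t distribution with nu degrees of freedom."""
    log_norm = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, n=1000):
    """CDF via Simpson's rule on [0, |x|] and symmetry about 0 (n must be even)."""
    a = abs(x)
    h = a / n
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    half_mass = total * h / 3
    return 0.5 + half_mass if x >= 0 else 0.5 - half_mass

def t_ppf(p, nu):
    """Quantile for 0.5 < p < 1 by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:
        hi *= 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t_c_values(alpha, nu):
    """(c_VaR, c_CVaR) for a unit-variance Student t PnL_T (requires nu > 2)."""
    q = t_ppf(alpha, nu)
    scale = math.sqrt((nu - 2) / nu)   # the standard t has std sqrt(nu / (nu - 2))
    es = t_pdf(q, nu) / (1 - alpha) * (nu + q * q) / (nu - 1)
    return q * scale, es * scale

for nu in (5, 30):
    for alpha in (0.95, 0.99):
        c_var, c_cvar = t_c_values(alpha, nu)
        print(f"nu={nu} {alpha:.0%}: c_VaR={c_var:.4f}  c_CVaR={c_cvar:.4f}")
```

With heavy tails ($\nu = 5$), the unit-variance 95% quantile is actually milder than the Normal one (about 1.56 vs 1.64), while the 99% quantile is more severe (about 2.61 vs 2.33), so Normal-based $c$ values can mis-target both levels.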

The best approach is to estimate the terminal P&L quantiles empirically by fitting the inverse CDF (e.g. with the quantile regression loss of QR-DQN, Dabney et al., 2018), which removes the parametric P&L distribution assumption. For expected-shortfall measures, we can apply a distortion to the quantile embedding layer in the critic torso to reweight the iCDF (e.g. Implicit Quantile Networks, IQN, Dabney et al., 2018).
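A toy, dependency-free sketch of both ideas follows. It assumes unconditional scalar quantiles instead of a state-action critic, uses the plain pinball loss (the $\kappa \to 0$ limit of QR-DQN's quantile Huber loss) with SGD, and omits IQN's cosine embedding; for IQN it uses a known inverse CDF as a stand-in for the learned one and midpoint $\tau$ values instead of random draws. All function names are hypothetical.

```python
import random
from statistics import NormalDist

def pinball_grad(theta, x, tau):
    """d/d(theta) of rho_tau(x - theta), where rho_tau(u) = u * (tau - 1{u < 0})."""
    return (1.0 if x < theta else 0.0) - tau

def fit_quantiles(samples, n_quantiles=20, lr=0.01, epochs=10, seed=0):
    """SGD fit of quantiles at midpoints tau_i = (2i + 1) / (2N), as in QR-DQN."""
    taus = [(2 * i + 1) / (2 * n_quantiles) for i in range(n_quantiles)]
    thetas = [0.0] * n_quantiles
    data = list(samples)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            for i, tau in enumerate(taus):
                thetas[i] -= lr * pinball_grad(thetas[i], x, tau)
    return taus, thetas

def cvar_from_quantiles(taus, thetas, tail):
    """Lower-tail CVaR approximated by averaging the quantile estimates with tau_i < tail."""
    tail_q = [q for t, q in zip(taus, thetas) if t < tail]
    return sum(tail_q) / len(tail_q)

def iqn_cvar(quantile_fn, eta, n=10_000):
    """IQN-style CVaR_eta: distort tau ~ U(0, 1) to beta(tau) = eta * tau, average Z at beta(tau)."""
    return sum(quantile_fn(eta * (j + 0.5) / n) for j in range(n)) / n

rng = random.Random(1)
pnl = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # hypothetical standardized P&L
taus, thetas = fit_quantiles(pnl)
print("QR CVaR(10%):", round(cvar_from_quantiles(taus, thetas, tail=0.10), 3))
print("IQN CVaR(5%):", round(iqn_cvar(NormalDist().inv_cdf, eta=0.05), 4))
```

For a standard Normal, the IQN-style estimate should land close to $-2.0627$, matching the 95CVaR $c$ value in the first table with the sign of a P&L level.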