In [1]:
import warnings
warnings.filterwarnings('ignore')

<h1 style = "fontsize:400%;text-align:center;">QBUS3850: Time Series and Forecasting</h1>
<h2 style = "fontsize:300%;text-align:center;">Expected Shortfall</h2>
<h3 style = "fontsize:200%;text-align:center;">Lecture Notes</h3>

<h2 style = "fontsize:300%;text-align:center;">Problems with VaR</h2>

# Which is riskier?

  - Return of -2 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

  - Return of -100 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

What is the 1% VaR of each investment (define quantile as $q:\textrm{Pr}(R\leq q)=\alpha$)?

# Value at Risk

- One shortcoming of Value at Risk is that it only gives the *minimum* amount lost on the worst 1 out of 100 (or 1 out of 20, 1 out of 1000, etc.) days.
- It does not account for how much could be lost when there is a violation.
- In the example above, both assets have the same VaR.
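
The claim that both assets have the same 1% VaR can be checked with a few lines of Python. This is a minimal sketch: `var_discrete` is a hypothetical helper name, and it assumes the quantile is taken as the smallest return $q$ with $\textrm{Pr}(R\leq q)\geq\alpha$, which coincides with the definition above for these two assets.

```python
def var_discrete(outcomes, alpha):
    """Smallest return q with Pr(R <= q) >= alpha, given (return, probability) pairs."""
    cumulative = 0.0
    for value, prob in sorted(outcomes):
        cumulative += prob
        # Small tolerance guards against floating-point error in the running sum.
        if cumulative >= alpha - 1e-12:
            return value
    raise ValueError("probabilities sum to less than alpha")

asset_1 = [(-2, 0.005), (-1, 0.005), (0, 0.99)]
asset_2 = [(-100, 0.005), (-1, 0.005), (0, 0.99)]

print(var_discrete(asset_1, 0.01))  # -1
print(var_discrete(asset_2, 0.01))  # -1
```

The return of -100 never enters the calculation, which is why VaR cannot tell the two assets apart.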

# Two more assets

- Consider two assets, $X$ (first list) and $Y$ (second list)

  - Political Crisis: Return of -3 with probability 0.005
  - Financial Crisis: Return of -1 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

  - Political Crisis: Return of -1 with probability 0.005
  - Financial Crisis: Return of -9 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% VaR of each investment?

# Portfolio

- Imagine you invest in both assets ($X+Y$).

  - Political Crisis: Return of -4 with probability 0.005
  - Financial Crisis: Return of -10 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% VaR of the sum of both assets?

Does this make sense when compared to the results for the single assets?

# Sub-additivity

- Diversification implies that portfolios should be less risky.
- For a risk measure $\rho$ and two assets $X$ and $Y$ this implies that
$$\rho(X)+\rho(Y)\geq \rho(X+Y)$$
- A portfolio should be no riskier than the sum of the risks of its assets held separately.
- The previous examples show that Value at Risk **may not** be subadditive.
- VaR is still subadditive in some cases, for example when returns are jointly normally distributed.
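
The failure of subadditivity in the crisis example can be verified numerically. The sketch below is illustrative only: the scenario table is copied from the slides, `var_discrete` is a hypothetical helper, and risk is measured as a positive loss (the negative of the VaR return) so that it lines up with the inequality above.

```python
def var_discrete(outcomes, alpha):
    """Smallest return q with Pr(R <= q) >= alpha, given (return, probability) pairs."""
    cumulative = 0.0
    for value, prob in sorted(outcomes):
        cumulative += prob
        if cumulative >= alpha - 1e-12:  # tolerance for floating-point sums
            return value
    raise ValueError("probabilities sum to less than alpha")

# (probability, return of first asset, return of second asset) per scenario
scenarios = [
    (0.005, -3, -1),  # political crisis
    (0.005, -1, -9),  # financial crisis
    (0.99, 0, 0),     # good economy
]

var_x = var_discrete([(rx, p) for p, rx, ry in scenarios], 0.01)
var_y = var_discrete([(ry, p) for p, rx, ry in scenarios], 0.01)
var_port = var_discrete([(rx + ry, p) for p, rx, ry in scenarios], 0.01)
print(var_x, var_y, var_port)  # -1 -1 -4

# Expressed as losses: 1 + 1 = 2 for the separate assets versus 4 for the portfolio.
subadditive = (-var_x) + (-var_y) >= -var_port
print(subadditive)  # False
```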

<h2 style = "fontsize:300%;text-align:center;">Expected Shortfall</h2>

# Expected Shortfall

- Expected Shortfall at level $\alpha$ is the average of the VaR over all levels up to $\alpha$:

$$ES^{(\alpha)}(X)=\frac{1}{\alpha}\int_0^\alpha VaR^{(u)}du$$

- For a continuous return distribution, this is the same as the expected loss given that there is a violation, i.e.

$$ES^{(\alpha)}(X)=E(X|X\leq VaR^{(\alpha)})$$

- This is also called conditional VaR, tail VaR, tail expectation or tail risk.
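
As a worked illustration of the integral definition, take a hypothetical asset (not one of the examples in these notes) with a return of $-4$ with probability 0.02, $-2$ with probability 0.03 and $0$ with probability 0.95, and set $\alpha=0.05$. Assuming the quantile is taken as the smallest $q$ with $\textrm{Pr}(X\leq q)\geq u$, we have $VaR^{(u)}=-4$ for $u\leq 0.02$ and $VaR^{(u)}=-2$ for $0.02<u\leq 0.05$, so

$$ES^{(0.05)}(X)=\frac{1}{0.05}\left(\int_0^{0.02}(-4)\,du+\int_{0.02}^{0.05}(-2)\,du\right)=\frac{-0.08-0.06}{0.05}=-2.8$$

Unlike the 5% VaR of $-2$, this value changes if the worst outcome becomes more severe.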

# Which is riskier?

  - Return of -2 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

  - Return of -100 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

What is the 1% ES of each investment?

# Two more assets

- Consider the same two assets as before

  - Political Crisis: Return of -3 with probability 0.005
  - Financial Crisis: Return of -1 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

  - Political Crisis: Return of -1 with probability 0.005
  - Financial Crisis: Return of -9 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% ES of each investment?

# Portfolio

- Imagine you invest in both assets ($X+Y$).

  - Political Crisis: Return of -4 with probability 0.005
  - Financial Crisis: Return of -10 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% ES of the sum of both assets?

How does this compare to the sum of ES of the individual assets?
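
The comparison can be checked numerically. This is a sketch under stated assumptions: `es_discrete` is a hypothetical helper that applies the integral definition of ES to a discrete distribution by averaging the returns in the worst $\alpha$ of probability mass, and the scenario table is copied from the slides.

```python
def es_discrete(outcomes, alpha):
    """(1/alpha) * integral of VaR(u) over (0, alpha] for (return, probability) pairs."""
    remaining = alpha
    total = 0.0
    for value, prob in sorted(outcomes):
        weight = min(prob, remaining)  # only count mass inside the alpha tail
        total += weight * value
        remaining -= weight
        if remaining <= 1e-12:
            break
    return total / alpha

# (probability, return of first asset, return of second asset) per scenario
scenarios = [
    (0.005, -3, -1),  # political crisis
    (0.005, -1, -9),  # financial crisis
    (0.99, 0, 0),     # good economy
]

es_x = es_discrete([(rx, p) for p, rx, ry in scenarios], 0.01)
es_y = es_discrete([(ry, p) for p, rx, ry in scenarios], 0.01)
es_port = es_discrete([(rx + ry, p) for p, rx, ry in scenarios], 0.01)
print(round(es_x, 6), round(es_y, 6), round(es_port, 6))  # -2.0 -5.0 -7.0
```

Expressed as losses, the separate assets sum to 2 + 5 = 7, which equals the portfolio loss of 7, so the subadditivity inequality holds (with equality) in this example.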

# Subadditivity of ES

- This one example is not a proof.
- However, the subadditivity of ES has been rigorously proven.
- *Coherent* risk measures satisfy four sensible properties: monotonicity, translation invariance, positive homogeneity and subadditivity.
- ES also satisfies the other three properties (as does VaR), so ES is coherent while VaR is not.
- For this reason, Basel III (the successor to Basel II) prefers ES over VaR as a risk measure.

<h2 style = "fontsize:300%;text-align:center;">Problems with ES</h2>

# Elicitability of a risk measure

- It is necessary to evaluate whether forecasts of a risk measure are accurate.
- Recall that for VaR we could use the check loss function.
- Why could we use this?
- Answering this requires an understanding of the role of scoring rules.

# Scoring rule

- For a random variable $X$ and a 'forecast' $v$, a scoring rule is any function $S(v,x)$ that takes $x$ (the realised value of $X$) and $v$ as inputs and returns a single number.
- Smaller values of the 'score' indicate better forecasts.
- A good property for the scoring rule to have is
$$\rho=\underset{v}{\operatorname{argmin}}\,E_X(S(v,X))$$
- where $\rho$ is our risk measure. A risk measure for which such a scoring rule exists is called *elicitable*.

# Why is this a good property?

- Suppose $\rho$ is the true VaR (i.e. a quantile).
- The 'forecast' that gives the smallest score (on average) will give us the true value of the risk measure.
- In practice we can evaluate the score over a rolling window.
  - Methods with smaller values of the score give something closer to the true value of the risk measure.
- For VaR, one example of a scoring rule is the check loss (covered last week).
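
The following sketch illustrates the property for the check loss. It is not the code used in the lectures: the returns are simulated standard normal draws, the grid of candidate forecasts is arbitrary, and `check_loss` is a hypothetical helper written in the form $(I(x\leq v)-\alpha)(v-x)$.

```python
import random

def check_loss(v, x, alpha):
    """Check (pinball) loss of the quantile forecast v when the return x is realised."""
    indicator = 1.0 if x <= v else 0.0
    return (indicator - alpha) * (v - x)

random.seed(0)
alpha = 0.05
returns = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated, not real data

# Average score for each candidate forecast on a grid from -3.00 to -0.50.
grid = [-3.0 + 0.01 * i for i in range(251)]
avg_score = [sum(check_loss(v, x, alpha) for x in returns) / len(returns) for v in grid]
best_v = grid[avg_score.index(min(avg_score))]

# The forecast with the smallest average score should sit at the sample quantile.
empirical_quantile = sorted(returns)[int(alpha * len(returns))]
print(best_v, empirical_quantile)
```

Both printed values should be close to the true 5% quantile of a standard normal, roughly -1.645, up to sampling error.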

# ES is not elicitable

- It has been proven that ES is not elicitable.
- There is no objective function that can be minimised to get the 'true' ES value.
- Scores based on ES alone therefore cannot be used to compare different ES forecasts.
- This is a weakness of ES.

# What to do?

- Let $r_t$ be returns and $ES^{(\alpha)}_{t|t-h}$ be $\alpha$ level ES forecasts.
- Compute $\xi_t=r_t-ES^{(\alpha)}_{t|t-h}$ using **only** violations.
- Do a t-test that the mean of $\xi_t$ is 0.
- Do a t-test that the mean of $\xi_t/\hat{\sigma}_{t|t-h}$ is 0.
- Use the RMSE and MAD of $\xi_t$ and $\xi_t/\hat{\sigma}_{t|t-h}$ (measured against a target of 0) to compare different methods.
- The tests in particular are generally weak or based on invalid assumptions of independence.

# Joint elicitability

- Fissler and Ziegel published a paper in 2016 showing that ES and VaR are jointly elicitable.
- The scoring function takes the VaR forecast $v$, the ES forecast $e$ and the realised return $x$ as inputs.
- The function is
$$S(v,e,x)=(I(x\leq v)-\alpha)(G_1(v)-G_1(x))+\frac{1}{\alpha}G_2(e)I(x\leq v)(v-x)+G_2(e)(e-v)-\mathcal{G}_2(e)$$

- where $G_1,G_2$ are monotonic and $\mathcal{G}'_2=G_2$.
- Choices that work are $G_1(v)=v$ and $G_2(e)=\mathcal{G}_2(e)=\exp(e)$.
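
A minimal sketch of this score with $G_1(v)=v$ and $G_2(e)=\mathcal{G}_2(e)=\exp(e)$ is given below. The names `fz_score` and `mean_score` are hypothetical, the indicator is taken as $I(x\leq v)$ in both terms, and the returns are simulated standard normal draws so that the true VaR and ES are known.

```python
import math
import random
from statistics import NormalDist

def fz_score(v, e, x, alpha):
    """Joint VaR/ES score with G_1(v) = v and G_2(e) = script-G_2(e) = exp(e)."""
    hit = 1.0 if x <= v else 0.0
    return ((hit - alpha) * (v - x)
            + math.exp(e) * hit * (v - x) / alpha
            + math.exp(e) * (e - v)
            - math.exp(e))

def mean_score(v, e, returns, alpha):
    """Average score of a constant (VaR, ES) forecast pair over a sample of returns."""
    return sum(fz_score(v, e, x, alpha) for x in returns) / len(returns)

random.seed(1)
alpha = 0.05
returns = [random.gauss(0.0, 1.0) for _ in range(20000)]  # simulated N(0, 1) returns

# True VaR and ES of a standard normal return at level alpha.
true_var = NormalDist().inv_cdf(alpha)
true_es = -NormalDist().pdf(true_var) / alpha

score_true = mean_score(true_var, true_es, returns, alpha)
score_too_mild = mean_score(-1.0, -1.3, returns, alpha)
score_too_severe = mean_score(-2.5, -3.0, returns, alpha)
print(score_true, score_too_mild, score_too_severe)
```

The average score at the true (VaR, ES) pair should be the smallest of the three, which is what joint elicitability predicts.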

# In practice

- Compute VaR and ES for method A and method B
- Compute scores for both methods
- Carry out a Diebold-Mariano test (a t-test on the difference in scores that allows for serial correlation) to test whether the two methods differ in average score.
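
The last step can be sketched as follows. This is one common form of the Diebold-Mariano statistic rather than a definitive implementation: it assumes a Bartlett (Newey-West) estimate of the long-run variance, a normal approximation for the p-value, and a user-chosen `max_lag`. The function name and the simulated scores are hypothetical.

```python
import math
import random

def diebold_mariano(scores_a, scores_b, max_lag=0):
    """DM statistic and two-sided p-value for H0: both methods have equal mean score."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of the score differences at lag k.
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Bartlett-weighted long-run variance allows for serial correlation in d.
    long_run_var = autocov(0)
    for k in range(1, max_lag + 1):
        long_run_var += 2.0 * (1.0 - k / (max_lag + 1)) * autocov(k)

    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # standard normal approximation
    return stat, p_value

# Simulated scores in which method A is better (smaller) on average by 0.5.
random.seed(2)
scores_a = [random.gauss(0.0, 1.0) for _ in range(500)]
scores_b = [a + 0.5 + random.gauss(0.0, 1.0) for a in scores_a]

stat, p_value = diebold_mariano(scores_a, scores_b, max_lag=5)
print(stat, p_value)
```

A negative statistic with a small p-value indicates that method A has a significantly smaller average score than method B.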

<h2 style = "fontsize:300%;text-align:center;">Forecasting ES</h2>