In [1]:
import warnings
warnings.filterwarnings('ignore')

<h1 style = "fontsize:400%;text-align:center;">QBUS3850: Time Series and Forecasting</h1>
<h2 style = "fontsize:300%;text-align:center;">Expected Shortfall</h2>
<h3 style = "fontsize:200%;text-align:center;">Lecture Notes</h3>

<h2 style = "fontsize:300%;text-align:center;">Problems with VaR</h2>

# Which is riskier?

  - Return of -2 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

  - Return of -100 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

What is the 1% VaR of each investment (define quantile as $q:\textrm{Pr}(R\leq q)=\alpha$)?

# Value at Risk

- One shortcoming of Value at Risk is that it only accounts for the *minimum* amount lost on 1 out of 100 (or 1 out of 20, 1 out of 1000, etc.) days.
- It does not account for how much could be lost when there is a violation.
- In the example above, both assets have the same VaR.

# Two more assets

- Consider two assets

  - Political Crisis: Return of -3 with probability 0.005
  - Financial Crisis: Return of -1 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

  - Political Crisis: Return of -1 with probability 0.005
  - Financial Crisis: Return of -9 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% VaR of each investment

# Portfolio

- Imagine you invest in both assets ($X+Y$).

  - Political Crisis: Return of -4 with probability 0.005
  - Financial Crisis: Return of -10 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% VaR of the sum of both assets?

Does this make sense when compared to the results for the single assets?

# Sub-additivity

- Diversification implies that portfolios should be less risky.
- For a risk measure $\rho$ and two assets $X$ and $Y$ this implies that
$$\rho(X)+\rho(Y)\geq \rho(X+Y)$$
- A portfolio should be at most as risky (or less risky) than the sum of riskiness of each asset alone.
- The previous examples show that Value at Risk **may not** be subadditive.
- This does not mean that there are not cases where VaR is subadditive (e.g. normality).

<h2 style = "fontsize:300%;text-align:center;">Expected Shortfall</h2>

# Expected Shortfall

- Expected Shortfall is given by.

$$ES^{(\alpha)}(X)=\frac{1}{\alpha}\int_0^\alpha VaR^{(u)}du$$

- This is the same as the expected loss given that there is a violation, i.e.

$$ES^{(\alpha)}(X)=E(X|X<VaR^{(\alpha)})$$

- This is also called conditional VaR, tail VaR, tail expectation or tail risk.

# Which is riskier?

  - Return of -2 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

  - Return of -100 with probability 0.005
  - Return of -1 with probability 0.005
  - Return of 0 with probability 0.99

What is the 1% ES of each investment?

# Two more assets

- Consider the same two assets as before

  - Political Crisis: Return of -3 with probability 0.005
  - Financial Crisis: Return of -1 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

  - Political Crisis: Return of -1 with probability 0.005
  - Financial Crisis: Return of -9 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% ES of each investment?

# Portfolio

- Imagine you invest in both assets ($X+Y$).

  - Political Crisis: Return of -4 with probability 0.005
  - Financial Crisis: Return of -10 with probability 0.005
  - Good economy: Return of 0 with probability 0.99

What is the 1% ES of the sum of both assets?

How does this compare to the sum of ES of the individual assets?

# Subadditivity of ES

- This one example is not a proof.
- However the subadditivity of ES has been rigorously proven.
- *Coherent* risk measures have a number of properties that follow commonsense
  - Subadditivity is one of these
- ES also satisfies the other properties (as does VaR)
- For this reason Basel III (the successor to Basel II) prefers ES over VaR as a risk measure.

<h2 style = "fontsize:300%;text-align:center;">Problems with ES</h2>

# Elicitability of a risk measure

- It is necessary to evaluate whether forecasts of a risk measure are accurate.
- Recall for VaR we could use the check loss function
- Why could we use this?
- This requires an understanding of the role of scoring rules

# Scoring rule

- For a random variable $X$ and a 'forecast' $v$, a scoring rule is any function $S(v,x)$ that takes $x$ (the realised value of $X$) and $v$ as inputs and returns a single number.
- Smaller values of the 'score' indicate better forecasts.
- A good property for the scoring rule to have is
$$\rho=\underset{v}{argmin}E_X(S(v,x))$$
- Where $\rho$ is our risk measure.

# Why is  this a good property?

- Suppose $\rho$ is the true VaR (i.e. a quantile).
- The 'forecast' that gives the smallest score (on average) will give us the true value of the risk measure.
- In practice we can evaluate the score over a rolling window
  - Methods with smaller average values of the score are give something closer to the true value of the risk measure. 
- For VaR, one example of a scoring rule is the check loss (covered last week).
- The check loss makes the VaR an *elicitable* risk measure.

# ES is not elicitable

- It has been proven that ES in not elicitable.
- There is no objective function that can be minimised to get the 'true' ES value.
- Even if we knew the true ES, there is not score that would be guaranteed to show that the true ES is better than some other ES.
- This is a weakness in evaluating different ES forecasts.


# What to do?

- Let $r_t$ be returns and $ES^{(\alpha)}_{t|t-h}$ be $\alpha$ level ES forecasts.
- Compute $\xi_t=r_t-ES^{(\alpha)}_{t|t-h}$ using **only** violations.
- Do a t-test that the mean of $\xi_t=0$
- Do a t-test that the mean of $\xi_t/\hat{\sigma}_{t|t-h}=0$
- Use RMSE and MAD on $\xi_t=0$ and $\xi_t/\hat{\sigma}_{t|t-h}=0$ to compare different methods.
- The tests in particular are generally weak or based on invalid assumptions of independence.

# Joint elicitability

- Fissler and Ziegel, published a paper in 2016 showing that ES and VaR are jointly elicitable.
- The function needs to take ES, VaR and the value of returns as an input.
- The function is
$$S(v,e,x)=(I(x\leq v)-\alpha)(G_1(v)-G_1(x))+\frac{1}{\alpha}G_2(e)I(x<v)(v-x)+G_2(e)(e-v)-\mathcal{G}_2(e)$$

- where $G_1,G_2$ are monotonic and $\mathcal{G}'_2=G_2$.
- choices that work are $G_1(v)=v$ and $G_2(e)=\mathcal{G}_2(e)=exp(e)$

# In practice

- Compute VaR and ES for method A and method B
- Compute scores for both methods
- Carry out a Diebold Mariano test (a version of a two-sample t-test for serially correlated data) to test whether there are differences between the methods.

<h2 style = "fontsize:300%;text-align:center;">Forecasting ES</h2>

# ES for 1-step ahead

- For 1-step ahead forecasts and for Gaussian errors the ES is given by

$$\hat{\mu}_{t|t-1}-\hat{\sigma}_{t|t-1}\frac{\phi(\Phi^{-1}(\alpha))}{\alpha}$$

- Here $\phi$ is the density (pdf) and $\Phi^{-1}(.)$ is the inverse cdf of a standard normal.
- For 1-step ahead forecasts and for t errors the ES is given by

$$\hat{\mu}_{t|t-1}-\hat{\sigma}_{t|t-1}\frac{t_{\nu}(T_{\nu}^{-1}(\alpha))}{\alpha}\frac{\nu + (T^{-1}_{\nu}(\alpha))^2}{\nu-1}\sqrt{\frac{\nu-2}{\nu}}$$

- Here $t_{\nu}$ is the pdf and $T_{\nu}^{-1}(.)$ is the inverse cdf of a t-distribution with $\nu$ df.

# ES for multi-step ahead

- These formulas no longer work for h-step ahead forecasts.
- Instead Monte Carlo simulation can be used
- Simulate a path of future values by
  - Simulate $\tilde{r}_{t+1}$ conditional on all observed $r_t$ 
  - Simulate $\tilde{r}_{t+2}$ conditional on all observed $r_t$ and $\tilde{r}_{t+1}$ as if it were $r_{t+1}$
  - Continue until simulating $\tilde{r}_{t+h}$
- Repeat this $B$ times

# ES for multi-step ahead

- Take the $B$ values of $\tilde{r}_{t+h}$
- Select only the lowest 5\% (or more generally $100\alpha$ \%) of values.
- Take the mean.
- This is a Monte Carlo estimate of ES
- Often $B$ needs to be big to ensure stable results.

# Import Data

In [2]:
import pandas as pd
import numpy as np
import scipy
bhp = pd.read_csv('BHP.AX.csv')
bhp['Date'] = pd.to_datetime(bhp['Date'])
ret=np.log(bhp['Close']).diff()[1:]
from arch import arch_model
Teval = 400
alpha = 0.05
B=1000
q = scipy.stats.norm.ppf(alpha)
e = scipy.stats.norm.pdf(q)/alpha
Actual = ret.tail(Teval)

# One step ahead

In [3]:
Sigma_GARCH = np.zeros(Teval)
VaR_GARCH = np.zeros(Teval)
ES_GARCH = np.zeros(Teval)
for j in range(Teval):
    ret_train = ret[:-Teval+j]
    garch = arch_model(ret_train,mean='Constant',vol='GARCH',p=1,q=1)
    garchfit=garch.fit(disp='off')
    fc=garchfit.forecast(horizon = 1)
    Sigma_GARCH[j] = np.sqrt(fc.variance['h.1'].tail(1))
    VaR_GARCH[j] = fc.mean['h.1'].tail(1) + Sigma_GARCH[j] * q
    ES_GARCH[j] = fc.mean['h.1'].tail(1) - Sigma_GARCH[j] * e

# One step ahead

In [4]:
xi = (Actual-ES_GARCH)[Actual<VaR_GARCH]
t = np.mean(xi)/(np.std(xi)/np.sqrt(len(xi)))
print(t)
scipy.stats.t.cdf(t,df = len(xi-1))

-1.5288859008430489


0.06996520957629498

Unable to reject validity of model at 5% level of significance

# One step ahead

In [5]:
xi = (Actual-ES_GARCH)[Actual<VaR_GARCH]/Sigma_GARCH[Actual<VaR_GARCH]
t = np.mean(xi)/(np.std(xi)/np.sqrt(len(xi)))
print(t)
scipy.stats.t.cdf(t,df = len(xi-1))

-1.6881217861873286


0.0524516144683017

Unable to reject validity of model at 5% level of significance

# Same for ARCH

In [6]:
Sigma_ARCH = np.zeros(Teval)
VaR_ARCH = np.zeros(Teval)
ES_ARCH = np.zeros(Teval)
for j in range(Teval):
    ret_train = ret[:-Teval+j]
    arch = arch_model(ret_train,mean='Constant',vol='ARCH',p=1)
    archfit=arch.fit(disp='off')
    fc=archfit.forecast(horizon = 1)
    Sigma_ARCH[j] = np.sqrt(fc.variance['h.1'].tail(1))
    VaR_ARCH[j] = fc.mean['h.1'].tail(1) + Sigma_ARCH[j] * q
    ES_ARCH[j] = fc.mean['h.1'].tail(1) - Sigma_ARCH[j] * e

# One step ahead

In [7]:
xi = (Actual-ES_ARCH)[Actual<VaR_ARCH]
t = np.mean(xi)/(np.std(xi)/np.sqrt(len(xi)))
print(t)
scipy.stats.t.cdf(t,df = len(xi-1))

-1.3475478416946887


0.09774509238987435

Unable to reject validity of model at 5% level of significance

# Fissler Ziegel

In [8]:
def fsscore(v,e,x,a):
    fs = ((v<=x)-a)*(v-x)+(1/a)*np.exp(e)*(v<=x)*(v-x)+np.exp(e)*(v-x)-np.exp(e)
    return fs
totfs_ARCH = 0.0
totfs_GARCH = 0.0
for j in range(Teval):
    totfs_ARCH += fsscore(VaR_ARCH[j],ES_ARCH[j],Actual.iloc[j],0.05)
    totfs_GARCH += fsscore(VaR_GARCH[j],ES_GARCH[j],Actual.iloc[j],0.05)
print(totfs_ARCH)
print(totfs_GARCH)

-642.6625960112602
-606.7872701205615


ARCH performs better for ES and VaR forecasts combined

# Multi-step ahead

In [9]:
B=1000
ES_GARCH10=np.zeros(Teval)
VaR_GARCH10=np.zeros(Teval)
for j in range(Teval):
    ret_train = ret[:-Teval+j]
    garch = arch_model(ret_train,mean='Constant',vol='GARCH',p=1,q=1)
    garchfit=garch.fit(disp='off')
    fc = garchfit.forecast(horizon=10,method='simulation',simulations=B)
    rtilde = fc.simulations.values[-1,:,9]
    var = np.percentile(rtilde,5)
    VaR_GARCH10[j]=var
    ES_GARCH10[j]=np.mean(rtilde[rtilde<=var])


# Evaluate

In [10]:
xi = (Actual-ES_GARCH10)[Actual<VaR_GARCH10]
t = np.mean(xi)/(np.std(xi)/np.sqrt(len(xi)))
print(t)
scipy.stats.t.cdf(t,df = len(xi-1))

-1.2846872555132873


0.10613216578260404

# Wrap up
- Expected Shorfall has many attractive properties relative to Value at Risk
- Basel III is currently being implemented by banks, so there is demand for people who know how to estimate it.
- The score for comparing different models for ES (the Fissler Ziegel score) is quite new in the literature.
- Work remains to be done for backtesting.