## Value-at-Risk: Forecast Evaluation

**Functions**

`sm.OLS`, `stats.bernoulli`

### Exercise 60
    
Compare the FHS VaR to the HS VaR from the previous exercise.

In [None]:
import pandas as pd
# Load Data
sp500 = pd.read_hdf("./data/arch-data.h5", "sp500")        
eurusd = pd.read_hdf("./data/arch-data.h5", "eurusd")

sp500_returns = 100 * sp500.SP500.pct_change().dropna()
eurusd_returns = 100 * eurusd.DEXUSEU.pct_change().dropna()

with pd.HDFStore("./data/hs-var.h5", mode="r") as hdf:
    sp500_hs = hdf.get("sp500_var")
    eurusd_hs = hdf.get("eurusd_var")
with pd.HDFStore("./data/fhs-var.h5", mode="r") as hdf:
    sp500_fhs = hdf.get("sp500_var")
    eurusd_fhs = hdf.get("eurusd_var")

# Rename columns to distinguish
sp500_hs.columns = [c.replace("VaR","HS VaR") for c in sp500_hs.columns]
eurusd_hs.columns = [c.replace("VaR","HS VaR") for c in eurusd_hs.columns]

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("darkgrid")
plt.rc("figure", figsize=(16,12))
plt.rc("font", size=16)

sp500_var = pd.concat([sp500_hs, sp500_fhs],axis=1)
eurusd_var = pd.concat([eurusd_hs, eurusd_fhs], axis=1)

for h in (1,5,10):
    fig, axes = plt.subplots(2,1)
    cols =[f"{h}-day 5% VaR", f"{h}-day 5% HS VaR"]
    ax = sp500_var[cols].plot(ax=axes[0],legend=False)
    ax.set_title(f"Comparing {h}-day S&P 500 VaRs")
    ax.set_xlabel(None)
    ax.set_ylabel("% Value-at-Risk")
    ax.legend(frameon=False)
    ax.set_xlim(sp500_var.index.min(), sp500_var.index.max())
    
    ax = eurusd_var[cols].plot(ax=axes[1], legend=False)
    ax.set_title(f"Comparing {h}-day EUR/USD VaRs")
    ax.set_xlabel(None)
    ax.set_ylabel("% Value-at-Risk")
    ax.set_xlim(eurusd_var.index.min(), eurusd_var.index.max())
    ax.legend(frameon=False)
    fig.tight_layout(pad=1.0)

#### Explanation

The HS VaRs are very smooth, while the FHS VaRs are more dynamic because they are mostly driven by changes in volatility. The dynamics in the EUR/USD data are substantially different, with long swings evident in volatility.

### Exercise 61
Evaluate the FHS and HS VaR forecasts constructed in the previous exercises using:

* HIT tests
* The Bernoulli test for unconditionally correct VaR
* Christoffersen’s test for conditionally correct VaR


In [None]:
# Construct HITs
import numpy as np

cols = ["1-day 5% VaR", "1-day 5% HS VaR"]
combined = pd.concat([sp500_returns, sp500_var[cols]], axis=1).dropna()
combined.columns = ["ret", "fhs", "hs"]
# A HIT occurs when the return falls below -1 times the VaR
hit_fhs = combined.ret < -combined.fhs
hit_hs = combined.ret < -combined.hs
hits = pd.DataFrame({"hit_fhs": hit_fhs, "hit_hs": hit_hs}).astype("float")
print(hits.mean())
print(hits.corr())
# Keep only the violations and offset the two series vertically so both are visible
temp = hits.replace(0.0, np.nan)
temp.iloc[:, 0] += 0.05
temp.iloc[:, 1] -= 0.05
temp.columns = ["Filtered HS", "Historical Simulation"]
plt.rc("figure", figsize=(16, 6))
ax = temp.plot(marker="o", linestyle="none", legend=False, markersize=12)
ax.set_ylim(0.9, 1.10)
ax.set_yticks([])
ax.set_ylabel("VaR Violations")
ax.set_xlabel(None)
ax.legend(frameon=False)

#### Explanation


We start by constructing the HITs.  These are computed by comparing the returns to -1 times the VaRs: a HIT occurs when the return is below the negative of the VaR. We previously aligned the VaRs so we don't need to shift them here.

We see that both produce fewer HITs than they should, and that the FHS is particularly bad. They are mildly correlated, with about half of the HITs being observed in the same period. The HS violations appear to be clustered in 2016 and 2019.
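
The HIT construction can be illustrated on a handful of made-up numbers. The sketch below uses hypothetical returns and VaR forecasts (not the S&P 500 data) and assumes, as the comparison `ret < -var` in the cell above suggests, that the VaR is stored as a positive percentage loss.

```python
# Hypothetical percentage returns and positive-valued 5% VaR forecasts,
# already aligned so that var_forecast[t] is the forecast for returns[t].
returns = [0.4, -2.1, 0.1, -0.3, -1.8, 0.9]
var_forecast = [1.5, 1.5, 1.6, 1.6, 1.7, 1.7]

# A HIT occurs when the loss exceeds the VaR, i.e. the return is below -VaR.
hits = [1.0 if r < -v else 0.0 for r, v in zip(returns, var_forecast)]
hit_rate = sum(hits) / len(hits)

print(hits)
print(f"Empirical HIT rate: {hit_rate:.3f}")
```

With a correctly specified 5% VaR the empirical HIT rate should be close to 0.05 in a long sample; this toy sample is far too short to say anything about that.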

In [None]:
from scipy import stats

for col in hits:
    hit = hits[col]
    # The MLE of the violation probability is the sample mean of the HITs
    phat = hit.mean()
    # Log-likelihood at the MLE and under the null of a 5% violation rate
    llf = stats.bernoulli(phat).logpmf(hit).sum()
    llf0 = stats.bernoulli(0.05).logpmf(hit).sum()
    lr = 2 * (llf - llf0)
    pval = 1 - stats.chi2(1).cdf(lr)
    print(f"Method: {col} LR: {lr} P-value: {pval}")

#### Explanation
This is the simplest test and only requires evaluating the Bernoulli log-likelihood at the MLE ($\overline{HIT}$, the average) and at the value under the null (5%).  The test statistic is 2 times the difference between the two log-likelihoods and has a $\chi^2_1$ distribution under the null.  The null of correct specification is rejected for both models.
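
As a cross-check on the likelihood-ratio calculation, the same statistic can be written directly in terms of the violation counts. This is a minimal sketch using only the standard library; the function name `bernoulli_lr_test` and the example counts are hypothetical, and the p-value uses the identity that the $\chi^2_1$ survival function equals $\mathrm{erfc}(\sqrt{x/2})$.

```python
import math

def bernoulli_lr_test(hits, p0=0.05):
    """LR test of H0: P(HIT = 1) = p0 for a sequence of 0/1 HITs."""
    n = len(hits)
    n1 = sum(hits)   # number of violations
    n0 = n - n1      # number of non-violations
    phat = n1 / n    # MLE of the violation probability

    def loglik(p):
        # Terms with a zero count are dropped so that 0 * log(0) is treated as 0.
        ll = 0.0
        if n1 > 0:
            ll += n1 * math.log(p)
        if n0 > 0:
            ll += n0 * math.log(1 - p)
        return ll

    lr = 2 * (loglik(phat) - loglik(p0))
    # Survival function of a chi-square(1): P(Z^2 > lr) = erfc(sqrt(lr / 2)).
    pval = math.erfc(math.sqrt(max(lr, 0.0) / 2))
    return lr, pval

# Hypothetical sample: 10 violations in 500 observations (2% rather than 5%).
lr, pval = bernoulli_lr_test([1] * 10 + [0] * 490)
print(f"LR: {lr:.3f}, p-value: {pval:.4f}")
```

Too few violations are penalized just like too many, since the statistic only measures how far $\overline{HIT}$ is from 5%.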

In [None]:
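for col in hits:
    hit = hits[col]
    # Pair each HIT (time t+1) with its predecessor (time t)
    hit_t = hit.shift(1)
    hit_tp1 = hit
    # n_ij counts transitions from state i at time t to state j at time t+1
    n00 = ((1-hit_t) * (1-hit_tp1)).sum()
    n10 = (hit_t * (1-hit_tp1)).sum()
    n01 = ((1-hit_t) * hit_tp1).sum()
    n11 = (hit_t * hit_tp1).sum()

    # Unrestricted log-likelihood evaluated at the MLE
    p00_hat = n00 / (n00 + n01)
    p11_hat = n11 / (n11 + n10)
    p00 = p00_hat
    p11 = p11_hat
    # 0 -> 1 transitions (n01) have probability 1 - p00; 1 -> 0 transitions (n10) have probability 1 - p11
    llf = n00 * np.log(p00) + n01 * np.log(1-p00) + n11 * np.log(p11) + n10 * np.log(1-p11)

    # Restricted log-likelihood under the null: i.i.d. HITs with a 5% violation rate
    p11 = .05
    p00 = 1 - p11
    llf0 = n00 * np.log(p00) + n01 * np.log(1-p00) + n11 * np.log(p11) + n10 * np.log(1-p11)

    lr = 2 * (llf - llf0)
    pval = 1 - stats.chi2(2).cdf(lr)
    print(f"Christoffersen's test, Method: {col} LR: {lr} P-value: {pval}")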


#### Explanation
Christoffersen's test is also fairly simple.  We first construct the vectors of HITs at times $t$ and $t+1$ to use the formula in the notes.  We then compute $n_{ij}$ for $i,j\in\{0,1\}$, the number of transitions from state $i$ at time $t$ to state $j$ at time $t+1$. These are then used to estimate the model parameters, which allow the log-likelihood to be computed at both the MLE and the null.  Two times the difference is the test statistic, which has a $\chi^2_2$ distribution.
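
The transition counts and the two log-likelihoods can also be written as a small standalone function. This is a sketch under stated assumptions rather than the notes' exact formula: the name `christoffersen_test` and the example sequence are hypothetical, the sample is assumed to contain at least one observation in each state before the final period, and the p-value uses the fact that the $\chi^2_2$ survival function is $e^{-x/2}$.

```python
import math

def christoffersen_test(hits, p0=0.05):
    """Joint LR test of i.i.d. Bernoulli(p0) HITs against a first-order Markov chain."""
    # n[i][j] counts transitions from state i at time t to state j at time t+1.
    n = [[0, 0], [0, 0]]
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[int(prev)][int(curr)] += 1
    n00, n01 = n[0]
    n10, n11 = n[1]

    def loglik(p00, p11):
        # Pair each count with its transition probability; empty cells are
        # skipped so that 0 * log(0) is treated as 0.
        terms = [(n00, p00), (n01, 1 - p00), (n11, p11), (n10, 1 - p11)]
        return sum(c * math.log(p) for c, p in terms if c > 0)

    p00_hat = n00 / (n00 + n01)
    p11_hat = n11 / (n10 + n11)
    lr = 2 * (loglik(p00_hat, p11_hat) - loglik(1 - p0, p0))
    # Survival function of a chi-square(2) is exp(-lr / 2).
    pval = math.exp(-max(lr, 0.0) / 2)
    return lr, pval

# Hypothetical sequence with a 5% violation rate but two tight clusters.
clustered = [0] * 90 + [1] * 5 + [0] * 95 + [1] * 5 + [0] * 5
lr, pval = christoffersen_test(clustered)
print(f"LR: {lr:.3f}, p-value: {pval:.2e}")
```

The example has exactly the right unconditional violation rate, so any rejection comes from the clustering of the HITs, which is what the conditional test is designed to detect.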

In [None]:
import statsmodels.api as sm

results = {} 
for col in hits:
    hit = hits[col] - 0.05
    lags = [hit.shift(i+1) for i in range(5)]
    lags = pd.concat(lags,axis=1)
    var_col = "fhs" if "fhs" in col else "hs"
    var = combined[var_col]
    data = pd.concat([hit, var, lags], axis=1).dropna()
    y = data.iloc[:,0]
    x = sm.add_constant(data.iloc[:,1:])
    x.columns = ["const","var"] + [f"hit_L_{i}" for i in range(1,6)]
    res = sm.OLS(y, x).fit()
    r = np.eye(7)
    joint = res.wald_test(r)
    stat = np.squeeze(joint.statistic)
    results[col] = {"summary": res.summary()}
    results[col]["stat"] = f"Stat: {stat}, P-value: {joint.pvalue}"

In [None]:
results["hit_fhs"]["summary"]


In [None]:
print(results["hit_fhs"]["stat"])

In [None]:
results["hit_hs"]["summary"]

In [None]:
print(results["hit_hs"]["stat"])

#### Explanation
Finally, we can run the dynamic quantile test (or HIT regression). The model estimated is

$$ HIT_{t+1} = \gamma_0 + \gamma_1 VaR_{t+1|t} + \sum_{i=1}^5 \gamma_{1+i} HIT_{t-i+1} + \epsilon_{t+1} $$

where, as in the code, $HIT$ is the violation indicator minus 5%. We construct the lags using `shift`.  Both models are rejected.  The HS model appears to have substantial
serial correlation, while the FHS model has serial correlation, the wrong level, and excess sensitivity to the
forecast VaR.
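
To make the mechanics of the HIT regression explicit, the sketch below builds the regressor matrix and the joint Wald statistic with plain NumPy on simulated data. It is an illustration under stated assumptions, not the `statsmodels` implementation: the function name `hit_regression_wald` and the simulated series are hypothetical, and the covariance is the classical homoskedastic OLS one.

```python
import numpy as np

def hit_regression_wald(hits, var, n_lags=5, p0=0.05):
    """OLS of the demeaned HIT on a constant, the VaR and lagged HITs, plus a joint Wald statistic."""
    hits = np.asarray(hits, dtype=float) - p0  # demeaned HITs
    var = np.asarray(var, dtype=float)
    n = len(hits)
    y = hits[n_lags:]
    # Column i holds the HIT lagged i periods, aligned with y.
    lag_cols = [hits[n_lags - i : n - i] for i in range(1, n_lags + 1)]
    x = np.column_stack([np.ones(n - n_lags), var[n_lags:]] + lag_cols)

    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    k = x.shape[1]
    sigma2 = resid @ resid / (len(y) - k)
    # With R = I_k and covariance sigma2 * (X'X)^{-1}, the Wald statistic
    # beta' Cov^{-1} beta simplifies to beta' X'X beta / sigma2.
    wald = beta @ (x.T @ x) @ beta / sigma2
    return beta, wald

# Simulated i.i.d. HITs with a 5% rate and an unrelated, hypothetical VaR series.
rng = np.random.default_rng(0)
sim_hits = (rng.random(2000) < 0.05).astype(float)
sim_var = 1.5 + 0.1 * rng.standard_normal(2000)
beta, wald = hit_regression_wald(sim_hits, sim_var)
print(f"Wald (chi-square form): {wald:.3f}, F form: {wald / len(beta):.3f}")
```

The chi-square form is compared with a $\chi^2_k$ distribution, while dividing by $k$ gives the F form; which of the two `wald_test` reports depends on its `use_f` setting.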
    

In [None]:
# Construct HITS
cols =["10-day 5% VaR", f"10-day 5% HS VaR"]
rets_10 = 100 * sp500.SP500.pct_change(10)
combined = pd.concat([rets_10, sp500_var[cols]], axis=1).dropna()
combined.columns = ["ret","fhs","hs"]
hit_fhs = combined.ret < -combined.fhs
hit_hs = combined.ret < -combined.hs
hits = pd.DataFrame({"hit_fhs":hit_fhs,"hit_hs":hit_hs}).astype("float")
print("Mean HIT %")
print(hits.mean())
print("Correlation across HITs")
print(hits.corr())
import numpy as np
temp = hits.replace(0.0,np.nan)
temp.iloc[:,0] += 0.05
temp.iloc[:,1] -= 0.05
temp.columns = ["Filtered HS", "Historical Simulation"]
ax = temp.plot(marker="o",linestyle="none",legend=False, markersize=12)
ax.set_ylim(0.9,1.10)
ax.set_yticks([])
ax.set_ylabel("VaR Violations")
ax.set_xlabel(None)
ax.legend(frameon=False)

#### Explanation

We construct the 10-day HITs using the VaR forecasts and the 10-day returns
computed using `pct_change(10)`. The forecasts are already aligned, so
the HITs are just violations. The plot shows that both models seem to have
important problems.  The FHS model has a 2-year period with no HITs.  The HS
forecast has two distinct periods with no HITs.

Both models produce close to the correct number of HITs, and the HITs are
fairly correlated across the two models.


In [None]:
horizon=10
for col in hits:
    hit = hits[col] - 0.05
    lags = [hit.shift(i+horizon) for i in range(5)]
    lags = pd.concat(lags,axis=1)
    var_col = "fhs" if "fhs" in col else "hs"
    var = combined[var_col]
    data = pd.concat([hit, var, lags], axis=1).dropna()
    y = data.iloc[:,0]
    x = sm.add_constant(data.iloc[:,1:])
    # Label the lag columns with the actual lags used: horizon, ..., horizon + 4
    x.columns = ["const","var"] + [f"hit_L_{horizon+i}" for i in range(5)]
    bw = int(1.2 * y.shape[0] ** (2/5))
    res = sm.OLS(y, x).fit(cov_type="HAC", cov_kwds={"maxlags": bw})
    r = np.eye(7)
    joint = res.wald_test(r)
    stat = np.squeeze(joint.statistic)

    joint_ex = res.wald_test(r[1:,:])
    stat_ex = np.squeeze(joint_ex.statistic)

    results[col] = {"summary": res.summary()}
    results[col]["stat"] = f"Stat: {stat}, P-value: {joint.pvalue}"
    results[col]["stat ex"] = f"Stat ex. constant: {stat_ex}, P-value ex. constant: {joint_ex.pvalue}"


In [None]:
results["hit_fhs"]["summary"]

In [None]:
print(results["hit_fhs"]["stat"])


In [None]:
print(results["hit_fhs"]["stat ex"])

In [None]:
results["hit_hs"]["summary"]

In [None]:
print(results["hit_hs"]["stat"])

In [None]:
print(results["hit_hs"]["stat ex"])


#### Explanation

First, note that the lagged HITs start at lag 10. This is necessary
since everything on the right-hand side of the model must be known
at time $t$, when the forecast is made, and the 10-day HITs at lags 1,...,9
depend on returns that are only realized after time $t$.

The tests use `wald_test` and restrict all coefficients to be zero, so
that the restriction matrix $R$ in $R\hat{\beta}$ is $I_k$, where $k$ is the number
of variables in the model. I also included a test that ignores the intercept,
which uses the same $R$ excluding the first row.
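
A short sketch of the two restriction matrices may help. The coefficient vector below is hypothetical and only serves to show which elements each matrix selects; `r[1:, :]` drops the row that restricts the constant.

```python
import numpy as np

k = 7  # constant, VaR and five lagged HITs, as in the regression above
r_full = np.eye(k)          # restricts all k coefficients jointly
r_ex_const = r_full[1:, :]  # drops the first row, i.e. the restriction on the constant

beta = np.arange(1.0, k + 1)  # hypothetical coefficients [1, 2, ..., 7]
print(r_full @ beta)          # every coefficient is restricted
print(r_ex_const @ beta)      # the constant (first element) is left unrestricted
```

Each row of $R$ is one restriction, so removing a row removes a restriction and lowers the degrees of freedom of the test from $k$ to $k-1$.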

The regressions use Newey-West (Bartlett) covariance estimators since
the 10-day returns used to produce the HITs overlap.
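
The bandwidth rule used in the cell above, `int(1.2 * n ** (2/5))`, and the Bartlett weights it implies can be sketched as follows. The helper name `bartlett_weights` is hypothetical, and the weights are written in the standard textbook form $1 - j/(bw+1)$, which is assumed here to match the kernel used by the HAC estimator.

```python
def bartlett_weights(n_obs):
    """Bandwidth from the rule int(1.2 * n ** (2/5)) and the matching Bartlett weights."""
    bw = int(1.2 * n_obs ** (2 / 5))
    # The weight on the j-th autocovariance declines linearly to zero at lag bw + 1.
    weights = [1 - j / (bw + 1) for j in range(bw + 1)]
    return bw, weights

bw, weights = bartlett_weights(1000)
print(f"Bandwidth: {bw}")
print(f"First and last weights: {weights[0]:.2f}, {weights[-1]:.2f}")
```

Because 10-day returns that are up to 9 days apart share some of the same daily returns, the bandwidth should be at least 9 here; for a sample of 1,000 observations this rule gives a larger value.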

Both models are rejected in the complete test.  The FHS model appears to
have slightly less serial correlation, although both test specifications
reject the null that the VaR is correct.