# __Tests of Superior Predictive Ability__

---

<br>

__DATA 5610__ <br>
Author:      Tyler J. Brough <br>
Last Update: April 19, 2022 <br>

---

<br>

---

__Notice__

* These notes follow the notes by Kevin Sheppard very closely

* See also:
    - White, _Econometrica_ 2000
    - Sullivan, Timmermann, and White, _Journal of Finance_ 1999

---

## __Overview__

* Multiple hypothesis testing
    - White's Reality Check (RC) and Hansen's test of Superior Predictive Ability (SPA)
    - StepM procedure (we won't cover)
    - Model Confidence Set
    - False Discovery Rate Control
 
* Bayesian methods
    - ROPE methods (Kruschke, Kuhn)
    - Bandit-model approaches (a Reinforcement Learning approach)

<br>

## __White's Reality Check__

* The _Reality Check_ extends DMW to testing for _Superior Predictive Ability_ (SPA)

* Tests of SPA examine whether or not a set of predictive models can outperform a benchmark model

* Suppose forecasts were available from $m$ models, $j = 1, \ldots, m$

* The vector of loss differentials _relative to a benchmark_ could be constructed

<br>

$$
{\large \delta_{t} = \begin{bmatrix}
                      L(y_{t+h}, \hat{y}_{t+h,BM|t}) - L(y_{t+h}, \hat{y}_{t+h,1|t}) \\
                      L(y_{t+h}, \hat{y}_{t+h,BM|t}) - L(y_{t+h}, \hat{y}_{t+h,2|t}) \\
                      \vdots \\
                      L(y_{t+h}, \hat{y}_{t+h,BM|t}) - L(y_{t+h}, \hat{y}_{t+h,m|t}) \\
                     \end{bmatrix}}
$$

<br>

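* $\hat{y}_{t+h, BM|t}$ is the _benchmark forecast_, so $L(y_{t+h}, \hat{y}_{t+h,BM|t})$ is the loss from the benchmark
    - A positive element of $\delta_{t}$ means that model $j$ had a lower loss than the benchmark in that period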
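
As a rough illustration of how the $P$ by $m$ array of loss differentials might be assembled, here is a minimal sketch. The function name `loss_differentials`, the squared-error loss, and the simulated forecasts are all assumptions made for this example; they are not part of the original notes.

```python
import numpy as np

def loss_differentials(y, bench_fcst, model_fcsts, loss=lambda y, f: (y - f) ** 2):
    """Return the P x m array: benchmark loss minus each model's loss.

    Hypothetical helper; squared error is assumed as the default loss.
    """
    y = np.asarray(y, dtype=float)
    bench_loss = loss(y, np.asarray(bench_fcst, dtype=float))            # shape (P,)
    model_loss = loss(y[:, None], np.asarray(model_fcsts, dtype=float))  # shape (P, m)
    return bench_loss[:, None] - model_loss

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
P, m = 200, 3
y = rng.standard_normal(P)
bench = np.zeros(P)                                           # naive benchmark forecast
models = 0.5 * y[:, None] + 0.1 * rng.standard_normal((P, m))  # informative forecasts
delta = loss_differentials(y, bench, models)
print(delta.shape, delta.mean(axis=0))
```

Each column of `delta` corresponds to one model, and a positive column mean indicates that the model out-performed the benchmark on average.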

<br>

## __Implementing the Reality Check__

* The Reality Check is implemented using the $P$ by $m$ matrix of loss differentials
    - $P$ out-of-sample periods
    - $m$ models
    
* The original article describes two methods
    - Monte Carlo Reality Check
    - Bootstrap Reality Check
    
* In practice, only the Bootstrap Reality Check is used

* The distribution of the _maximum_ of normals is not normal, and so only the percentile method is applicable
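
A small simulation can make the last point concrete. The sketch below assumes $m = 10$ independent standard normals (the independence and the value of $m$ are choices made only for this illustration) and looks at the simulated distribution of their maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_draws = 10, 100_000

# Each row is one draw of m independent N(0, 1) variables
z = rng.standard_normal((n_draws, m))
max_z = z.max(axis=1)

mean_max = max_z.mean()
skew_max = ((max_z - mean_max) ** 3).mean() / max_z.std() ** 3
crit_95 = np.quantile(max_z, 0.95)  # percentile-based 95% critical value

print(f"mean of max: {mean_max:.3f}")
print(f"skewness of max: {skew_max:.3f}")
print(f"95% critical value of max: {crit_95:.3f} (single N(0,1): 1.645)")
```

The maximum has a positive mean and is right-skewed, so normal critical values would be too small; critical values have to be read off the simulated (or bootstrapped) distribution instead.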

<br>

### __The Algorithm (Bootstrap Reality Check)__

---

1. _Compute_ $\quad T^{RC} = \max{(\bar{\delta})}$ 

2. _For_ $b = 1, \ldots, B$ _re-sample the vector of loss differentials_ $\mathbf{\delta}_{t}$ _to construct a bootstrap sample_ $\{\mathbf{\delta}_{b,t}^{\ast} \}$ _using the stationary bootstrap_

3. _Using the bootstrap sample, compute_

<br>

$$
{\large T_{b}^{\ast RC} = \max{\left( \frac{1}{P} \sum\limits_{t=R+1}^{T} (\mathbf{\delta}_{b,t}^{\ast} - \mathbf{\bar{\delta}}) \right) }}
$$

<br>

4. _Compute the Reality Check p-value as the percentage of the bootstrapped maxima which are larger than the sample maximum_

<br>

$$
{\large p\mbox{-value} = \frac{1}{B} \sum\limits_{b=1}^{B} I[T_{b}^{\ast RC} > T^{RC}]}
$$

---
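
The four steps above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the helper names `stationary_bootstrap_indices` and `reality_check`, the default mean block length `w=10`, and the simulated loss differentials are all assumptions introduced here.

```python
import numpy as np

def stationary_bootstrap_indices(n, w, rng):
    """Indices for one stationary-bootstrap resample with mean block length w."""
    new_block = rng.random(n) < 1.0 / w   # start a new block with probability 1/w
    new_block[0] = True
    starts = rng.integers(n, size=n)      # candidate random block starts
    idx = np.empty(n, dtype=int)
    for t in range(n):
        # either jump to a random start or continue the block, wrapping around
        idx[t] = starts[t] if new_block[t] else (idx[t - 1] + 1) % n
    return idx

def reality_check(delta, B=1000, w=10, seed=0):
    """Bootstrap Reality Check p-value for a P x m array of loss differentials."""
    delta = np.asarray(delta, dtype=float)
    P = delta.shape[0]
    rng = np.random.default_rng(seed)
    delta_bar = delta.mean(axis=0)
    t_rc = delta_bar.max()                                        # step 1
    t_star = np.empty(B)
    for b in range(B):
        idx = stationary_bootstrap_indices(P, w, rng)             # step 2
        t_star[b] = (delta[idx].mean(axis=0) - delta_bar).max()   # step 3
    return t_rc, float(np.mean(t_star > t_rc))                    # step 4

# Simulated example: no model beats the benchmark vs. one clearly better model
rng = np.random.default_rng(1)
null_delta = rng.standard_normal((250, 5))
good_delta = null_delta.copy()
good_delta[:, 0] += 0.5
_, p_null = reality_check(null_delta)
_, p_good = reality_check(good_delta)
print(p_null, p_good)
```

In the second data set the first model has a mean loss differential of about 0.5, far above its sampling error, so its p-value should be close to zero.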

## __Intuition__

* The bootstrap means are like draws (simulation) from the asymptotic distribution $N(\mathbf{0}, \mathbf{\Sigma})$

* Taking the maximum of these draws simulates the distribution of the maximum of a set of correlated normals

* Each bootstrap mean is centered at the sample mean
    - This is known as using the _Least Favorable Configuration_ (LFC) point
    - Simulation is done assuming any model could be as good as the benchmark
    
* Since the asymptotic distribution can be simulated, asymptotic critical values and p-values can be constructed directly

* The Monte Carlo Reality Check works by first estimating $\Sigma$ using a HAC estimator, and then simulating random normals directly
    - MCRC is equivalent to BRC, except that it requires:
        - Estimating a potentially large covariance matrix if $m$ is big
        - Computing the Cholesky decomposition of this covariance matrix
        - Drawing $B$ simulated normal vectors using this Cholesky factor
    - In practice, $m$ may be so large that the covariance matrix won't fit in a normal computer's memory
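
For comparison, the Monte Carlo variant might look like the sketch below. The notes only say "a HAC estimator"; the Bartlett-kernel (Newey-West) estimator, the `lags=5` default, and the function names are assumptions chosen for this illustration.

```python
import numpy as np

def newey_west_cov(delta, lags):
    """Bartlett-kernel HAC estimate of the long-run covariance (assumed estimator)."""
    x = delta - delta.mean(axis=0)
    P = x.shape[0]
    sigma = x.T @ x / P
    for i in range(1, lags + 1):
        gamma = x[i:].T @ x[:-i] / P                     # lag-i autocovariance matrix
        sigma += (1.0 - i / (lags + 1)) * (gamma + gamma.T)
    return sigma

def monte_carlo_reality_check(delta, B=1000, lags=5, seed=0):
    """Simulate the max of correlated normals with covariance Sigma / P."""
    delta = np.asarray(delta, dtype=float)
    P, m = delta.shape
    rng = np.random.default_rng(seed)
    t_rc = delta.mean(axis=0).max()
    sigma = newey_west_cov(delta, lags)                  # m x m: costly if m is large
    chol = np.linalg.cholesky(sigma / P)                 # covariance of the mean vector
    draws = rng.standard_normal((B, m)) @ chol.T         # B draws from N(0, Sigma / P)
    t_star = draws.max(axis=1)
    return t_rc, float(np.mean(t_star > t_rc))

rng = np.random.default_rng(1)
null_delta = rng.standard_normal((250, 5))
good_delta = null_delta.copy()
good_delta[:, 0] += 0.5
_, p_null_mc = monte_carlo_reality_check(null_delta)
_, p_good_mc = monte_carlo_reality_check(good_delta)
print(p_null_mc, p_good_mc)
```

The $m \times m$ matrix `sigma` and its Cholesky factor are exactly the objects that become impractical when $m$ is in the thousands, which is why the bootstrap version is preferred in practice.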
    
<br>

## __Hansen's Test of SPA__

* Hansen was White's doctoral student at UCSD

* Hansen (2005, JBES) provided two refinements of the RC
    1. Studentized loss differentials
    2. Omission of very bad models from the distribution of the test statistic
    
* From a practical point-of-view, the first is a very important consideration

* From a theoretical point-of-view, the second is the important issue
    - The second can be ignored if no models are very poor
    - This may be difficult if using automated model generation schemes
    
<br>

## __Studentization of Loss Differentials__

* The RC uses the loss differentials directly

* This can lead to a loss of power if there is a large amount of cross-sectional heteroskedasticity

* A bad, high-variance model can mask a good, low-variance model

* The solution is to use the Studentized loss differential

* The test statistic is based on

<br>

$$
{\large T^{SPA} = \max_{j=1,\ldots,m}{ \left( \frac{\bar{\delta}_{j}}{\sqrt{\hat{\omega}_{j}^{2} / P}} \right)}}
$$

<br>

* $\hat{\omega}_{j}^{2}$ is an estimator of the asymptotic (long-run) variance of $\sqrt{P}\,\bar{\delta}_{j}$

<br>

$$
{\large \hat{\omega}_{j}^{2} = \hat{\gamma}_{j,0} + 2 \sum_{i=1}^{P-1} k_{i} \hat{\gamma}_{j,i}}
$$

* $\phantom{ }$
    - $\hat{\gamma}_{j,i}$ is the $i^{th}$ sample autocovariance of the sequence $\{\delta_{j,t}\}$
    - $k_{i} = \frac{P-i}{P} \left(1 - \frac{1}{w}\right)^{i} + \frac{i}{P} \left(1 - \frac{1}{w}\right)^{P-i}$ where $w$ is the window length in the Stationary Bootstrap
    
<br>

* Alternatively use bootstrap variance $\hat{\omega}_{j}^{2} = \frac{P}{B} \sum_{b=1}^{B} \left(\mathbf{\bar{\delta}}_{b,j}^{\ast} - \mathbf{\bar{\delta}}_{j}\right)^{2}$
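
The kernel-weighted variance estimator above can be sketched directly from the two formulas. The function names `sb_kernel_weights` and `long_run_variance` and the simulated AR(1) series are assumptions made for this example.

```python
import numpy as np

def sb_kernel_weights(P, w):
    """Weights k_i for i = 1, ..., P-1 implied by the stationary bootstrap."""
    i = np.arange(1, P)
    q = 1.0 - 1.0 / w
    return (P - i) / P * q ** i + i / P * q ** (P - i)

def long_run_variance(delta_j, w):
    """omega_j^2 = gamma_0 + 2 * sum_i k_i * gamma_i for one model's differentials."""
    x = np.asarray(delta_j, dtype=float)
    P = x.size
    x = x - x.mean()
    # sample autocovariances gamma_0, ..., gamma_{P-1}
    gamma = np.array([x[i:] @ x[:P - i] / P for i in range(P)])
    k = sb_kernel_weights(P, w)
    return gamma[0] + 2.0 * np.sum(k * gamma[1:])

# Simulated AR(1) loss differential with positive autocorrelation
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
x_ar1 = np.empty(500)
x_ar1[0] = e[0]
for t in range(1, 500):
    x_ar1[t] = 0.5 * x_ar1[t - 1] + e[t]

lrv_ar1 = long_run_variance(x_ar1, w=10)
print(lrv_ar1, x_ar1.var())
```

With $w = 1$ every weight $k_{i}$ is zero and the estimator collapses to the ordinary sample variance $\hat{\gamma}_{j,0}$; with positively autocorrelated differentials and a longer window, the long-run variance is larger than the sample variance.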

<br>

### __The Algorithm (Studentized Bootstrap Reality Check)__

---

1. _Estimate_ $\hat{\omega}_{j}^{2}$ _and compute_ $T^{SPA} = \max_{j=1,\ldots,m}{\left( \bar{\delta}_{j} / \sqrt{\hat{\omega}_{j}^{2} / P} \right)}$


2. _For_ $b = 1, \ldots, B$ _re-sample the vector of loss differentials_ $\mathbf{\delta}_{t}$ _to construct a bootstrap sample_ $\{\mathbf{\delta}_{b,t}^{\ast}\}$ _using the stationary bootstrap_ 


3. _Using the bootstrap sample, compute_

<br>

$$
{\large T_{u,b}^{\ast SPA} = \max{\left( \frac{P^{-1} \sum_{t=R+1}^{T} (\delta_{j,b,t}^{\ast} - \bar{\delta}_{j})}{\sqrt{\hat{\omega}_{j}^{2} / P}} \right)}} 
$$

<br>

4. _Compute the Studentized Reality Check p-value as the percentage of the bootstrapped maxima which are larger than the sample maximum_

<br>

$$
{\large p\mbox{-value} = \frac{1}{B} \sum_{b=1}^{B} I[T_{u,b}^{\ast SPA} > T_{u}^{SPA}]}
$$

---

<br>

### __The__ $u$ __in__ $T_{u}^{SPA}$ __is for__ ___Upper___

* The $u$ is included to indicate that the p-value derived using the LFC may not be the best p-value

* Suppose that some of the models have a very low mean and a high standard deviation

* In the RC and SPA-U, all models are assumed to be as good as the benchmark

* This is implemented by always re-centering the bootstrap samples around $\bar{\delta}_{j}$

* If a model is rejectably bad, then it may be possible to improve the power of the RC/SPA-U by excluding this model

* This is implemented using a "pre-test" of the form

<br>

$$
{\large I_{j}^{u} = 1, \quad I_{j}^{c} = I\left[\frac{\bar{\delta}_{j}}{\sqrt{\hat{\omega}_{j}^{2}/P}} > - \sqrt{2 \ln{\ln{P}}}\right], \quad I_{j}^{l} = I\left[\bar{\delta}_{j} > 0\right]}
$$

<br>

* The first pre-test ($c$ is for _consistent_) re-centers only if the standardized mean loss differential is greater than an HQ-like lower bound

* The second ($l$ is for _lower_) re-centers only if the loss differential is positive (i.e., the benchmark is out-performed)
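
A small worked example may help to see how the three indicators differ. The function name `recentering_indicators` and the numbers passed to it are hypothetical, chosen so the arithmetic is easy to follow.

```python
import numpy as np

def recentering_indicators(delta_bar, omega2, P):
    """Return the u, c, and l re-centering indicators for each model."""
    delta_bar = np.asarray(delta_bar, dtype=float)
    t_stat = delta_bar / np.sqrt(np.asarray(omega2, dtype=float) / P)
    threshold = -np.sqrt(2.0 * np.log(np.log(P)))  # HQ-like lower bound
    return {
        "u": np.ones(delta_bar.shape, dtype=bool),  # always re-center
        "c": t_stat > threshold,                    # drop only rejectably bad models
        "l": delta_bar > 0,                         # keep only out-performing models
    }

# Three hypothetical models: good, marginally worse, and very poor
ind = recentering_indicators(delta_bar=[0.10, -0.01, -0.50],
                             omega2=[1.0, 1.0, 1.0], P=100)
for name, flags in ind.items():
    print(name, flags)
```

With $P = 100$ the bound is about $-1.75$ and the standardized means are $1.0$, $-0.1$, and $-5.0$. The $u$ version re-centers all three models, the $c$ version excludes only the very poor third model, and the $l$ version re-centers only the first.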

<br>

## __General SPA__

### __Algorithm (Test of SPA)__

---

1. _Estimate_ $\hat{\omega}_{j}^{2}$ _and compute_ $T^{SPA} = \max_{j=1,\ldots,m}{\left( \bar{\delta}_{j} / \sqrt{\hat{\omega}_{j}^{2} / P} \right)}$

2. _For_ $b = 1, \ldots, B$ _re-sample the vector of loss differentials_ $\mathbf{\delta}_{t}$ _to construct a bootstrap sample_ $\{ \mathbf{\delta}_{b,t}^{\ast} \}$ _using the stationary bootstrap_

3. _Using the bootstrap sample, compute_

<br>

$$
{\large T_{s,b}^{\ast SPA} = \max{\left( \frac{P^{-1} \sum_{t=R+1}^{T} (\delta_{j,b,t}^{\ast} - I_{j}^{s}\bar{\delta}_{j})}{\sqrt{\hat{\omega}_{j}^{2} / P}} \right)}, \quad s = l, c, u} 
$$

<br>

4. _Compute the Studentized Reality Check p-values as the percentage of the bootstrapped maxima which are larger than the sample maximum_

<br>

$$
{\large p\mbox{-value} = \frac{1}{B} \sum_{b=1}^{B} I[T_{s,b}^{\ast SPA} > T^{SPA}], \quad s = l, c, u}
$$

---
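
Putting the pieces together, the general algorithm could be sketched as below. This follows the steps as written in these notes and is not a substitute for a tested library implementation. The names `stationary_bootstrap_indices` and `spa_test`, the default `w=10`, the use of the bootstrap variance for $\hat{\omega}_{j}^{2}$, and the simulated data are all assumptions made for this illustration.

```python
import numpy as np

def stationary_bootstrap_indices(n, w, rng):
    """Indices for one stationary-bootstrap resample with mean block length w."""
    new_block = rng.random(n) < 1.0 / w
    new_block[0] = True
    starts = rng.integers(n, size=n)
    idx = np.empty(n, dtype=int)
    for t in range(n):
        idx[t] = starts[t] if new_block[t] else (idx[t - 1] + 1) % n
    return idx

def spa_test(delta, B=1000, w=10, seed=0):
    """Studentized SPA statistic and the lower, consistent, and upper p-values."""
    delta = np.asarray(delta, dtype=float)
    P = delta.shape[0]
    rng = np.random.default_rng(seed)
    delta_bar = delta.mean(axis=0)
    # step 2: bootstrap means, reused for the variance and for all three p-values
    boot_means = np.array([
        delta[stationary_bootstrap_indices(P, w, rng)].mean(axis=0) for _ in range(B)
    ])
    omega2 = P * ((boot_means - delta_bar) ** 2).mean(axis=0)  # bootstrap variance
    se = np.sqrt(omega2 / P)
    t_spa = (delta_bar / se).max()                             # step 1
    indicators = {
        "u": np.ones_like(delta_bar),
        "c": (delta_bar / se > -np.sqrt(2.0 * np.log(np.log(P)))).astype(float),
        "l": (delta_bar > 0).astype(float),
    }
    p_values = {}
    for s, ind in indicators.items():
        t_star = ((boot_means - ind * delta_bar) / se).max(axis=1)  # step 3
        p_values[s] = float(np.mean(t_star > t_spa))                # step 4
    return t_spa, p_values

# Simulated example with one very poor, high-variance model in the last column
rng = np.random.default_rng(2)
delta = rng.standard_normal((250, 5))
delta[:, 4] = 5.0 * rng.standard_normal(250) - 2.0
t_spa, p = spa_test(delta)
print(t_spa, p)
```

Because the same bootstrap means are reused and each indicator only removes re-centering from models with non-positive mean differentials, the p-values are ordered by construction: lower $\le$ consistent $\le$ upper.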

<br>

### __Comments on SPA__

* The three versions only differ on whether or not a model is re-centered

* If a model is _not_ re-centered, then it is unlikely to be the maximum in the re-sample distribution
    - This is how "bad" models are discarded in the SPA
    
* Six different p-values can be computed
    - Studentized or unmodified
    - Indicator function in $l, c, u$
        - Test statistic does not depend on $l, c, u$, only p-value does

* Reality Check uses unmodified loss differentials and $u$

* In practice, Studentization brings important gains

* Using $c$ is important when applying SPA to a large universe of automated rules, some of which may be very poor

<br>

## __Application of RC to Technical Trading Rules__

* Sullivan, Timmermann, and White (1999) apply the RC to a large universe of technical trading rules

* Rules include:
    - Filter rules
    - Moving Average Oscillators
    - Support and Resistance
    - Channel Breakout
    - On-balance Volume Averages
        - Tracks volume times return sign
        - Similar to Moving Average rules for prices
       
* Total of 7,846 trading rules

* Only use 1 at a time

* Use DJIA as in BLL (Brock, Lakonishok, and LeBaron), updated to 1996

* Consider mean return criteria and Sharpe Ratio

<br>

### __Mean Return Performance BLL Universe__

<br>

![STW Table III](images/STW-Table-III.png)

<br>

### __Sheppard's Notebook for SPA Demo__

See here: https://github.com/bashtage/arch/blob/main/examples/multiple-comparison_examples.ipynb

In [1]:
from arch.bootstrap import SPA

In [2]:
SPA?

Init signature:
SPA(
    benchmark: 'ArrayLike',
    models: 'ArrayLike',
    block_size: 'Optional[int]' = None,
    reps: 'int' = 1000,
    bootstrap: "Literal[('stationary', 'sb', 'circular', 'cbb', 'moving block', 'mbb')]" = 'stationary',
    studentize: 'bool' = True,
    nested: 'bool' = False,
    *,
[0;34m[0m    [0mseed[0m[0;34m:[0m [0;34m'Union[N