
There are two ways of backtesting. One is the walk-forward (WF) method, and the other is the cross-validation (CV) method.

## WF
WF is a historical simulation of how the strategy would have performed in the
past.

Disadvantages
- First, a single scenario is tested (the historical
path), which can be easily overfit (Bailey et al. [2014]). 
- Second, WF is not
necessarily representative of future performance, as results can be biased by the particular
sequence of datapoints.
- Third, the initial decisions are made on a smaller
portion of the total sample.
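
The third point can be seen in a minimal sketch of an expanding-window walk-forward split. The function name `walk_forward_splits` and the parameters `warm_up` and `step` are hypothetical, chosen only for illustration; this is one simple way to lay out WF splits, not a reference implementation.

```python
def walk_forward_splits(T, warm_up, step):
    """Yield (train, test) index ranges; training always precedes testing."""
    start = warm_up
    while start < T:
        stop = min(start + step, T)
        # Train on everything observed so far, test on the next block
        yield range(0, start), range(start, stop)
        start = stop


splits = list(walk_forward_splits(T=100, warm_up=20, step=20))
train_sizes = [len(train) for train, _ in splits]
n_tested = sum(len(test) for _, test in splits)

print(train_sizes)  # [20, 40, 60, 80]
print(n_tested)     # 80
```

The earliest decisions rely on only 20 of the 100 observations, and the warm-up block is never tested.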

## CV

The goal of backtesting through cross-validation (CV) is not to derive historically accurate performance, but to infer future performance from a number of out-of-sample scenarios.


Advantages
1. The test is not the result of a particular (historical) scenario. In fact, CV
tests k alternative scenarios, of which only one corresponds with the historical
sequence.
2. Every decision is made on sets of equal size. This makes outcomes comparable
across periods, in terms of the amount of information used to make those
decisions.
3. Every observation is part of one and only one testing set. There is no warm-up
subset, thereby achieving the longest possible out-of-sample simulation.

Disadvantages
1. Like WF, a single backtest path is simulated (although not the historical one).
There is one and only one forecast generated per observation.
2. CV has no clear historical interpretation. The output does not simulate how the
strategy would have performed in the past, but how it may perform in the future
under various stress scenarios (a useful result in its own right).
3. Because the training set does not trail the testing set, leakage is possible.
Extreme care must be taken to avoid leaking testing information into the training
set. See Chapter 7 for a discussion on how purging and embargoing can
help prevent informational leakage in the context of CV.
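
As a rough illustration of the points above, the sketch below builds k contiguous, unshuffled folds in plain Python. The name `kfold_splits` is hypothetical, and purging and embargoing are deliberately left out, so this shows only the layout of the splits, not a leakage-safe procedure.

```python
def kfold_splits(T, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    bounds = [T * i // k for i in range(k + 1)]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Training uses every observation outside the test block,
        # including observations that come after it in time
        train = [t for t in range(T) if t < lo or t >= hi]
        yield train, list(range(lo, hi))


splits = list(kfold_splits(T=100, k=5))
tested = sorted(t for _, test in splits for t in test)
train_sizes = [len(train) for train, _ in splits]

print(tested == list(range(100)))  # True: each observation is tested exactly once
print(train_sizes)                 # [80, 80, 80, 80, 80]
```

In the first fold the training indices all come after the test block, which is the situation where leakage must be controlled.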

## THE COMBINATORIAL PURGED CROSS-VALIDATION METHOD

Given a number 𝜑 of backtest
paths targeted by the researcher, CPCV generates the precise number of combinations
of training/testing sets needed to generate those paths, while purging training
observations that contain leaked information.

CPCV stages
1. Partition T observations into N groups without shuffling, where groups
n = 1,…,N − 1 are of size ⌊T∕N⌋, and the Nth group is of size T −
⌊T∕N⌋ (N − 1).
2. Compute all possible training/testing splits, where for each split N − k groups
constitute the training set and k groups constitute the testing set.
3. For any pair of labels $(y_i, y_j)$, where $y_i$ belongs to the training set and $y_j$ belongs
to the testing set, apply the PurgedKFold class to purge $y_i$ if $y_i$ spans over a
period used to determine label $y_j$. This class will also apply an embargo, should
some testing samples predate some training samples.
4. Fit classifiers on the $\binom{N}{N-k}$ training sets, and produce forecasts on the respective
$\binom{N}{N-k}$ testing sets.
5. Compute the 𝜑[N, k] backtest paths. You can calculate one Sharpe ratio from
each path, and from that derive the empirical distribution of the strategy’s
Sharpe ratio (rather than a single Sharpe ratio, like WF or CV).
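
Stages 1, 2 and 5 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the function names `partition`, `cpcv_splits` and `assign_paths` are hypothetical, purging and embargoing (stage 3) and model fitting (stage 4) are omitted, and paths are assembled by giving each group's i-th appearance in a testing set to path i. The path count uses the identity $\frac{k}{N}\binom{N}{N-k} = \binom{N-1}{k-1}$.

```python
from itertools import combinations
from math import comb


def partition(T, N):
    """Stage 1: contiguous groups; the last group absorbs the remainder."""
    size = T // N
    groups = [range(i * size, (i + 1) * size) for i in range(N - 1)]
    groups.append(range((N - 1) * size, T))
    return groups


def cpcv_splits(N, k):
    """Stage 2: every way of choosing k testing groups out of N."""
    return list(combinations(range(N), k))


def assign_paths(N, k):
    """Stage 5: paths[p][g] is the split supplying group g's forecasts on path p."""
    splits = cpcv_splits(N, k)
    n_paths = comb(N - 1, k - 1)  # equals (k / N) * C(N, N - k)
    paths = [[None] * N for _ in range(n_paths)]
    seen = [0] * N  # how many times each group has been a testing group so far
    for s, test_groups in enumerate(splits):
        for g in test_groups:
            paths[seen[g]][g] = s
            seen[g] += 1
    return splits, paths


splits, paths = assign_paths(N=6, k=2)
print(len(splits), len(paths))  # 15 5
```

With N = 6 and k = 2 there are 15 splits and 5 paths, and every path draws each group's forecasts from a split in which that group was held out.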

# Exercises

## 1. 
Q. Suppose that you develop a momentum strategy on a futures contract, where
the forecast is based on an AR(1) process. You backtest this strategy using the
WF method, and the Sharpe ratio is 1.5. You then repeat the backtest on the
reversed series and achieve a Sharpe ratio of –1.5. What would be the mathematical
grounds for disregarding the second result, if any?

A. It is as easy to overfit a
walk-forward backtest as to overfit a walk-backward backtest, and the fact that changing
the sequence of observations yields inconsistent outcomes is evidence of that
overfitting.
If proponents of WF were right, we should observe that walk-backward
backtests systematically outperform their walk-forward counterparts. That is not the
case, hence the main argument in favor of WF is rather weak.

## 2.
Q. You develop a mean-reverting strategy on a futures contract. Your WF backtest
achieves a Sharpe ratio of 1.5. You increase the length of the warm-up period,
and the Sharpe ratio drops to 0.7. You go ahead and present only the result with
the higher Sharpe ratio, arguing that a strategy with a shorter warm-up is more
realistic. Is this selection bias?

A. Even if a warm-up period is set, most of the information
is used by only a small portion of the decisions.
Although this problem is attenuated
by increasing the warm-up period, doing so also reduces the length of the
backtest.

## 3.
Q. Your strategy achieves a Sharpe ratio of 1.5 on a WF backtest, but a Sharpe
ratio of 0.7 on a CV backtest. You go ahead and present only the result with
the higher Sharpe ratio, arguing that the WF backtest is historically accurate,
while the CV backtest is a scenario simulation, or an inferential exercise. Is this
selection bias?

## 4.
Q. Your strategy produces 100,000 forecasts over time. You would like to derive
the CPCV distribution of Sharpe ratios by generating 1,000 paths. What are the
possible combinations of parameters (N, k) that will allow you to achieve that?

In [0]:
# paths = testing sets = 1000
# 1000 = (k / N) * C(N, N - k)

In [0]:
from scipy.special import comb

In [0]:
def path(N, k):
  # Number of CPCV backtest paths: (k / N) * C(N, N - k), returned as a float
  return k / N * comb(N, N - k)

Number of paths: $\varphi[N, k] = \frac{k}{N}\binom{N}{N-k} = 1000$

Number of forecasts = number of paths × N = 100,000

N $\leq$ 100

In [52]:
# Print every (N, k) whose floating-point path count lies in [1000, 1001)
for N in range(1, 101):
  for k in range(1, N):
    if 1000 <= path(N, k) < 1001:
      print('(N,k)=', (N, k))

(N,k)= (15, 11)
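
The search above compares floating-point values against the half-open interval [1000, 1001), so it is worth cross-checking the hit with exact integer arithmetic. The sketch below relies on the identity $\frac{k}{N}\binom{N}{N-k} = \binom{N-1}{k-1}$ and on Python's `math.comb`. The helper names `exact_paths` and `find_exact` are hypothetical, and widening the search beyond N $\leq$ 100 is an assumption made here for illustration, not part of the solution above.

```python
from math import comb


def exact_paths(N, k):
    """Exact number of CPCV paths: (k / N) * C(N, k) = C(N - 1, k - 1)."""
    return comb(N - 1, k - 1)


def find_exact(target, max_N):
    """All (N, k) with 1 <= k < N <= max_N whose exact path count equals target."""
    hits = set()
    for N in range(2, max_N + 1):
        # j = k - 1; only j <= (N - 1) / 2 is scanned, using C(n, j) = C(n, n - j)
        for j in range((N - 1) // 2 + 1):
            c = comb(N - 1, j)
            if c > target:
                break  # C(n, j) keeps growing for j up to n / 2
            if c == target:
                for k in (j + 1, N - j):
                    if 1 <= k < N:
                        hits.add((N, k))
    return sorted(hits)


print(exact_paths(15, 11))     # 1001
print(find_exact(1000, 100))   # []
print(find_exact(1000, 1001))  # [(1001, 2), (1001, 1000)]
```

Under exact arithmetic, (15, 11) yields 1,001 paths rather than 1,000, and no pair with N $\leq$ 100 gives exactly 1,000. If N is allowed to reach 1,001, the pairs (1001, 2) and (1001, 1000) give exactly 1,000 paths; whether such a fine partition is practical is a separate question.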


## 5.
