
# Backtest statistics



1. General characteristics
  - **Time range**: Time range specifies the start and end dates. The period used to
test the strategy should be **sufficiently long to include a comprehensive number**
of regimes (Bailey and Lopez de Prado [2012]).

  - **Average AUM**: This is the average dollar value of the assets under management.
For the purpose of computing this average, the dollar value of long and
short positions is considered to be a positive real number.

  - **Capacity**: A strategy’s capacity can be measured as the **highest AUM that delivers a target risk-adjusted performance**. A minimum AUM is needed to ensure proper bet sizing and risk diversification. Beyond that minimum AUM, performance will decay as AUM increases, due to higher transaction costs and lower turnover.

  - **Leverage**: Leverage measures the amount of borrowing needed to achieve the reported performance. If leverage takes place, costs must be assigned to it. One way to measure leverage is as the ratio of **average dollar position size to average AUM**.

  - **Maximum dollar position**: Maximum dollar position size informs us
whether the strategy at times took dollar positions that greatly exceeded the average AUM. In general we will **prefer strategies that take maximum dollar positions close to the average AUM**, indicating that they do not rely on the occurrence of extreme events (possibly outliers).

  - **Ratio of longs**: The ratio of longs shows what proportion of the bets involved
long positions. In long-short, market neutral strategies, ideally this value is close
to **0.5**. **If not, the strategy may have a position bias**, or the backtested period may
be too short and unrepresentative of future market conditions.

  - **Frequency of bets**: The frequency of bets is the number of bets per year in
the backtest. A sequence of positions on the same side is considered part of the
same bet. A bet ends when the position is flattened or flipped to the opposite
side. The number of bets is always smaller than the number of trades. A trade
count would overestimate the number of independent opportunities discovered
by the strategy.

  - **Average holding period**: The average holding period is the average number of
days a bet is held. High-frequency strategies may hold a position for a fraction
of seconds, whereas low frequency strategies may hold a position for months
or even years. Short holding periods may limit the capacity of the strategy. The
holding period is related to, but different from, the frequency of bets. For example,
a strategy may place bets on a monthly basis, around the release of nonfarm
payrolls data, where each bet is held for only a few minutes.

  - **Annualized turnover**: Annualized turnover measures the ratio of the average
dollar amount traded per year to the average annual AUM. High turnover may
occur even with a low number of bets, as the strategy may require constant
tuning of the position. High turnover may also occur with a low number of trades, if every trade involves flipping the position between maximum long and
maximum short.

  - **Correlation of underlying**: This is the correlation between strategy returns
and the returns of the underlying investment universe. When the correlation is
significantly positive or negative, the strategy is essentially holding or short-selling
the investment universe, without adding much value.
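
The bet definition above (a bet ends when the position is flattened or flipped) can be sketched as follows. This is a minimal illustration on a made-up position series; `bet_end_times` and the sample data are hypothetical and not part of the notebook.

```python
import pandas as pd

def bet_end_times(positions: pd.Series) -> pd.DatetimeIndex:
    """Timestamps at which a bet ends (hypothetical helper for illustration)."""
    prev = positions.shift(1).fillna(0)
    flattened = (positions == 0) & (prev != 0)  # position closed out
    flipped = (positions * prev) < 0            # sign change: long <-> short
    ends = positions.index[(flattened | flipped).to_numpy()]
    # A position still open on the last bar is counted as a final bet.
    if positions.iloc[-1] != 0 and (len(ends) == 0 or ends[-1] != positions.index[-1]):
        ends = ends.append(positions.index[-1:])
    return ends

# Made-up target positions: build a long, flatten, go short, flip to long.
idx = pd.date_range("2020-01-01", periods=9, freq="D")
positions = pd.Series([0, 1, 2, 1, 0, -1, -1, 1, 1], index=idx)

ends = bet_end_times(positions)
n_bets = len(ends)
# Every change in position size is a trade (the series starts flat here).
n_trades = int((positions.diff().fillna(0) != 0).sum())
print(n_bets, n_trades)
```

On this toy series the number of bets is smaller than the number of trades, because resizing a position on the same side does not start a new bet.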

2. Performance: dollar and return figures without risk adjustments
  
  - **PnL**: The total amount of dollars (or the equivalent in the currency of denomination) generated over the entirety of the backtest, including liquidation costs from the terminal position.
  - **PnL from long positions**: The portion of the PnL dollars that was generated exclusively by long positions. This is an interesting value for assessing the bias of long-short, market neutral strategies.
  - **Annualized rate of return**: The time-weighted average annual rate of total return, including dividends, coupons, costs, etc.
  - **Hit ratio**: The fraction of bets that resulted in a positive PnL.
  - **Average return from hits**: The average return from bets that generated a profit.
  - **Average return from misses**: The average return from bets that generated a loss.
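
As a small sketch of the last three statistics, assuming a made-up array of per-bet returns (the numbers below are illustrative only):

```python
import numpy as np

# Hypothetical per-bet returns, for illustration only.
bet_returns = np.array([0.02, -0.01, 0.03, -0.02, 0.01])

hits = bet_returns[bet_returns > 0]    # bets that made money
misses = bet_returns[bet_returns < 0]  # bets that lost money

hit_ratio = hits.size / bet_returns.size
avg_hit = hits.mean()
avg_miss = misses.mean()
print(hit_ratio, avg_hit, avg_miss)
```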

3. Runs 

  Investment strategies rarely generate returns drawn from an IID process. In the absence of this property, strategy returns series exhibit frequent runs. Runs are uninterrupted sequences of returns of the same sign. Consequently, runs increase downside risk, which needs to be evaluated with proper metrics, such as the concentration of returns (HHI), drawdown (DD), and time under water (TuW).


4. Efficiency
  - **Annualized Sharpe ratio**: This is the SR value, annualized by a factor
$\sqrt{a}$,
where $a$ is the average number of returns observed per year. This common annualization
method relies on the assumption that returns are IID.
  - **Information ratio**: This is the SR equivalent of a portfolio that measures its performance
relative to a benchmark. It is the annualized ratio between the average
excess return and the tracking error. The excess return is measured as the portfolio’s
return in excess of the benchmark’s return. The tracking error is estimated
as the standard deviation of the excess returns.
  - **Probabilistic Sharpe ratio**: PSR corrects SR for inflationary effects caused
by non-Normal returns or track record length. It should exceed 0.95, for the
standard significance level of 5%. It can be computed on absolute or relative
returns.
  - **Deflated Sharpe ratio**: DSR corrects SR for inflationary effects caused by
non-Normal returns, track record length, and multiple testing/selection bias.
It should exceed 0.95, for the standard significance level of 5%. It can be computed
on absolute or relative returns.
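
The first two definitions can be sketched on synthetic data. The return series, the 252 periods per year, and the function names below are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year):
    # Mean over standard deviation, scaled by sqrt(a) under the IID assumption.
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def information_ratio(returns, benchmark, periods_per_year):
    excess = returns - benchmark            # excess return over the benchmark
    tracking_error = excess.std(ddof=1)     # std of the excess returns
    return np.sqrt(periods_per_year) * excess.mean() / tracking_error

rng = np.random.default_rng(0)
n = 252 * 3                                 # three years of synthetic daily data
portfolio = rng.normal(0.0005, 0.010, size=n)
benchmark = rng.normal(0.0003, 0.009, size=n)

sr = annualized_sharpe(portfolio, 252)
ir = information_ratio(portfolio, benchmark, 252)
# With a zero benchmark, the information ratio collapses to the Sharpe ratio.
ir_zero = information_ratio(portfolio, np.zeros(n), 252)
print(sr, ir, ir_zero)
```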

# Exercises

## 1 
A strategy exhibits a high turnover, high leverage, and high number of bets, with
a short holding period, low return on execution costs, and a high Sharpe ratio.
Is it likely to have large capacity? What kind of strategy do you think it is?

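Capacity is likely low. With high turnover, a short holding period, and a low return on execution costs, transaction costs grow quickly as AUM increases and erode the edge. This profile is typical of a high-frequency strategy.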


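## 2
On the dollar bars dataset for E-mini S&P 500 futures, compute the statistics (a) to (l) below. The cells that follow build the dollar bars from the tick file `clean_IVE_tickbidask2.csv`.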

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
import numpy as np
import pandas as pd

In [4]:
!pip install -q mlfinlab
from mlfinlab import data_structures, features, filters, labeling, util

In [5]:
raw_dollar_bars = data_structures.get_dollar_bars('/content/drive/My Drive/Colab Notebooks/csv/clean_IVE_tickbidask2.csv', threshold=1000000)
dollar_bars = raw_dollar_bars.set_index(pd.to_datetime(raw_dollar_bars.date_time))
dollar_bars = dollar_bars.drop(columns='date_time')
dollar_bars = dollar_bars.reset_index().drop_duplicates(subset='date_time', keep='last').set_index('date_time')

Reading data in batches:
Batch number: 0
Returning bars 



### (a) HHI index on positive returns.

This measures the concentration of positive returns.

In [0]:
#————————————————————————————————————————
def getHHI(betRet):
  if betRet.shape[0]<=2:return np.nan
  wght=betRet/betRet.sum()
  hhi=(wght**2).sum()
  hhi=(hhi-betRet.shape[0]**-1)/(1.-betRet.shape[0]**-1)
  return hhi
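
To get a feel for the normalization, the sketch below restates the same formula under a hypothetical name (`hhi_normalized`) and evaluates it at the two extremes: equal-sized returns and a single dominant return. The input values are made up.

```python
import numpy as np
import pandas as pd

def hhi_normalized(values: pd.Series) -> float:
    """Normalized Herfindahl-Hirschman index (same formula as getHHI above)."""
    n = values.shape[0]
    if n <= 2:
        return np.nan
    weights = values / values.sum()
    hhi = (weights ** 2).sum()
    # Rescale so that equal weights map to 0 and full concentration maps to 1.
    return (hhi - 1.0 / n) / (1.0 - 1.0 / n)

uniform = pd.Series([0.01, 0.01, 0.01, 0.01])       # every return contributes equally
concentrated = pd.Series([0.04, 0.0, 0.0, 0.0])     # one return carries everything

hhi_uniform = hhi_normalized(uniform)
hhi_concentrated = hhi_normalized(concentrated)
print(hhi_uniform, hhi_concentrated)
```

Values close to 0, like the ones reported in this exercise, indicate that no small group of bars dominates the total.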

In [7]:
ret = dollar_bars.close.pct_change().dropna()
ret.head()

date_time
2009-09-28 09:53:49    0.001371
2009-09-28 09:55:26    0.000000
2009-09-28 10:02:52    0.002151
2009-09-28 10:10:21    0.000780
2009-09-28 10:19:36   -0.001170
Name: close, dtype: float64

In [8]:
rHHIPos = getHHI(ret[ret>=0]) # concentration of positive returns per bet
rHHIPos

0.00017091496355296337

### (b) HHI index on negative returns.

In [9]:
rHHINeg = getHHI(ret[ret<0]) # concentration of negative returns per bet
rHHINeg

0.0001529867361240445

### (c) HHI index on time between bars.


In [10]:
ret.groupby(pd.Grouper(freq='M')).count()

date_time
2009-09-30     52
2009-10-31    320
2009-11-30    246
2009-12-31    245
2010-01-31    207
             ... 
2019-02-28    575
2019-03-31    473
2019-04-30    459
2019-05-31    559
2019-06-30    440
Freq: M, Name: close, Length: 118, dtype: int64

In [11]:
tHHI = getHHI(ret.groupby(pd.Grouper(freq='M')).count()) # concentr. bets/month
tHHI

0.0032921257799305814

### (d) The 95-percentile DD.


In [0]:
def computeDD_TuW(series,dollars=False):
  # compute series of drawdowns and the time under water associated with them
  df0=series.to_frame('pnl')
  df0['hwm']=series.expanding().max() # running high-water mark
  df1=df0.groupby('hwm').min().reset_index() # lowest value seen under each hwm
  df1.columns=['hwm','min']
  df1.index=df0['hwm'].drop_duplicates(keep='first').index # time of hwm
  df1=df1[df1['hwm']>df1['min']] # hwm followed by a drawdown
  if dollars:dd=df1['hwm']-df1['min']
  else:dd=1-df1['min']/df1['hwm']
  # in years; 365.2425 days matches np.timedelta64(1,'Y'), a unit that recent pandas rejects
  tuw=((df1.index[1:]-df1.index[:-1])/pd.Timedelta(days=365.2425)).values
  tuw=pd.Series(tuw,index=df1.index[:-1])
  return dd,tuw
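
The central quantity in the function above is the distance from the running high-water mark. A minimal sketch on a made-up PnL series (the values and dates are hypothetical) shows the per-observation drawdown directly:

```python
import pandas as pd

# Hypothetical PnL path, for illustration only.
pnl = pd.Series(
    [100, 110, 105, 95, 112, 108, 120],
    index=pd.date_range("2020-01-01", periods=7, freq="D"),
)

hwm = pnl.cummax()           # running high-water mark
drawdown = 1 - pnl / hwm     # fractional loss relative to the high-water mark
max_dd = drawdown.max()      # deepest drawdown: from 110 down to 95
print(drawdown.round(4).tolist(), max_dd)
```

`computeDD_TuW` goes one step further by keeping only the deepest point under each high-water mark and timing the gap between successive marks.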

In [0]:
series = dollar_bars.close

In [0]:
dd = computeDD_TuW(series)[0]

In [15]:
np.percentile(dd,95)

0.020718013511119477

### (e) The 95-percentile TuW.


In [0]:
tuw = computeDD_TuW(series)[1]

In [17]:
np.percentile(tuw,95)

0.037676976851249655

### (f) Annualized average return.


In [0]:
y = (series.index[-1] - series.index[0]).days/365

In [0]:
p= (1 + ret).prod()

In [20]:
R = (p**(1/y))-1
R

0.088298960150196

### (g) Average returns from hits (positive returns).

In [0]:
pos_ret = ret[ret>0]

In [22]:
(1+pos_ret).prod()**(1/len(pos_ret))-1

0.0011348316679109516

In [23]:
pos_ret.mean()

0.0011376924589268614

### (h) Average return from misses (negative returns).

In [0]:
neg_ret = ret[ret<0]

In [25]:
(1+neg_ret).prod()**(1/len(neg_ret))-1

-0.001137347181231152

In [26]:
neg_ret.mean()

-0.0011342641708143742

### (i) Annualized SR.

In [27]:
a = len(series)/y
a

6008.863764044944

In [28]:
np.sqrt(a) * ret.mean() / ret.std()

0.5142999702472026

### (j) Information ratio, where the benchmark is the risk-free rate.

In [33]:
# assume a risk-free rate of 0.005% per bar
rf = 0.005 * 0.01
# rf is constant, so the tracking error equals ret.std()
np.sqrt(a) * (ret.mean() - rf) / ret.std()

-0.9466371550639973

### (k) PSR


$$ \widehat{PSR}[SR^*] = Z\left[\frac{(\widehat{SR}-SR^*)\sqrt{T-1}}{\sqrt{1-\hat{\gamma}_3\widehat{SR}+\frac{\hat{\gamma}_4-1}{4}\widehat{SR}^2}}\right] $$

Bailey and López de Prado [2012]

$Z[\cdot]$ is the cumulative distribution function (CDF) of the standard Normal distribution,
$T$ is the number of observed returns, $\hat{\gamma}_3$ is the skewness of the returns, and
$\hat{\gamma}_4$ is the kurtosis of the returns ($\hat{\gamma}_4 = 3$ for Gaussian returns). For a given $SR^*$, $\widehat{PSR}$
increases with greater $\widehat{SR}$ (in the original sampling frequency, i.e. non-annualized),
longer track records ($T$), or positively skewed returns ($\hat{\gamma}_3$), but it decreases with fatter tails ($\hat{\gamma}_4$).

In [0]:
import scipy.stats as ss

#https://github.com/esvhd/pypbo/blob/master/pypbo/pbo.py

def psr(sharpe, T, skew, kurtosis, target_sharpe=0):
    """
    Probabilistic Sharpe Ratio.
    Parameters:
        sharpe:
            observed sharpe ratio, in same frequency as T.
        T:
            no. of observations, should match return / sharpe sampling period.
        skew:
            skewness of the returns
        kurtosis:
            kurtosis of the returns (Pearson convention: 3 for Normal returns)
        target_sharpe:
            target sharpe ratio
    Returns:
        Cumulative probabilities for observed sharpe ratios under standard
        Normal distribution.
    """
    value = (
        (sharpe - target_sharpe)
        * np.sqrt(T - 1)
        / np.sqrt(1.0 - skew * sharpe + sharpe ** 2 * (kurtosis - 1) / 4.0)
    )
    # print(value)
    psr = ss.norm.cdf(value, 0, 1)
    return psr

In [0]:
sharpe = ret.mean() / ret.std()

In [76]:
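skew = ss.skew(ret)
# note: ss.kurtosis returns excess kurtosis by default (Normal = 0);
# fisher=False gives the Pearson kurtosis (Normal = 3) used in the PSR formula
kurt = ss.kurtosis(ret)
skew,kurt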

(10.04305600603968, 5203.254150903189)

In [75]:
psr(sharpe, len(ret),skew,kurt)

0.9467085634660752
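
The PSR formula expects $\hat{\gamma}_4 = 3$ for Gaussian returns, whereas `scipy.stats.kurtosis` returns excess kurtosis unless `fisher=False` is passed. The sketch below shows the two conventions side by side on synthetic fat-tailed returns. The data, the seed, and the name `psr_sketch` are assumptions for illustration, and no result from the notebook is reproduced here.

```python
import numpy as np
import scipy.stats as ss

def psr_sketch(sr, n_obs, skew, kurt, sr_benchmark=0.0):
    """PSR with `kurt` in the Pearson convention (3 for Normal returns)."""
    z = (sr - sr_benchmark) * np.sqrt(n_obs - 1)
    z /= np.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return ss.norm.cdf(z)

rng = np.random.default_rng(42)
# Synthetic returns with fat tails (Student-t with 5 degrees of freedom).
returns = 0.0005 + 0.01 * rng.standard_t(df=5, size=5000)

sr = returns.mean() / returns.std(ddof=1)           # non-annualized Sharpe ratio
skew = ss.skew(returns)
excess_kurt = ss.kurtosis(returns)                  # default: Normal = 0
pearson_kurt = ss.kurtosis(returns, fisher=False)   # Normal = 3

psr_value = psr_sketch(sr, len(returns), skew, pearson_kurt)
print(excess_kurt, pearson_kurt, psr_value)
```

The two kurtosis values differ by exactly 3. With a per-bar Sharpe ratio as small as the one in this exercise, the effect on PSR is minor, but it grows with the size of the Sharpe ratio.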

### (l) DSR, where we assume there were 100 trials, and the variance of the trials' SR was 0.5.

In [0]:
def expected_max(N):
    """
    Expected maximum of IID random variables X_n ~ Z, n = 1,...,N,
    where Z is the CDF of the standard Normal distribution,
    E[MAX_n] = E[max{x_n}]. Computed for a large N.
    """
    if N < 5:
        raise AssertionError("Condition N >> 1 not satisfied.")
    return (1 - np.euler_gamma) * ss.norm.ppf(
        1 - 1.0 / N
    ) + np.euler_gamma * ss.norm.ppf(1 - np.exp(-1) / N)
#---------------------------------------------------------------------------
def dsr(test_sharpe, sharpe_std, N, T, skew, kurtosis):
    """
    Deflated Sharpe Ratio statistic. DSR = PSR(SR_0).
    See paper for definition of SR_0. http://ssrn.com/abstract=2460551
    Parameters:
        test_sharpe :
            reported sharpe, to be tested.
        sharpe_std :
            standard deviation of sharpe ratios from N trials / configurations
        N :
            number of backtest configurations
        T :
            number of observations
        skew :
            skew of returns
        kurtosis :
            kurtosis of returns
    Returns:
        DSR statistic
    """
    # sharpe_std = np.std(sharpe_n, ddof=1)
    target_sharpe = sharpe_std * expected_max(N)

    dsr_stat = psr(test_sharpe, T, skew, kurtosis, target_sharpe)

    return dsr_stat

In [77]:
dsr(sharpe,np.sqrt(0.5),100,len(ret),skew,kurt)

0.0

Bailey and López de Prado [2014]

## 3. Consider a strategy that is long one futures contract on even years, and is short one futures contract on odd years.

In [0]:
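bi = series.copy()
# note: this assigns +1 (long) to odd years and -1 (short) to even years,
# i.e. the mirror image of the exercise statement; swap the signs to match it
bi[bi.index.year % 2 != 0] = 1
bi[bi.index.year % 2 == 0] = -1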

In [0]:
ret2 = ret*bi.dropna()

In [55]:
rHHIPos2 = getHHI(ret2[ret2>=0])
rHHINeg2 = getHHI(ret2[ret2<0])
tHHI2 = getHHI(ret2.groupby(pd.Grouper(freq='M')).count())
print(rHHIPos2, rHHINeg2, tHHI2)

0.0001491558922275327 0.00017538563866829847 0.0032921257799305814


In [53]:
p2 = (1 + ret2).prod()
AnnR = p2**(1/y) - 1  # annualize with exponent 1/y, as in exercise 2(f)
AnnR

In [52]:
pos_ret2 = ret2[ret2>=0]
neg_ret2 = ret2[ret2<0]
print(
pos_ret2.mean(),
neg_ret2.mean())

0.0010704985436283609 -0.001143628547932107


In [56]:
np.sqrt(a) * ret2.mean() / ret2.std()
#AnnSR

0.012315168582883873

## 4. The results from a 2-year backtest are that monthly returns have a mean of 3.6%, and a standard deviation of 0.079%.

### (a) What is the SR? (b) What is the annualized SR?

In [58]:
a_4 = 12

SR_4 = 3.6/0.079
ASR_4= np.sqrt(a_4)*SR_4
print(SR_4, ASR_4)

45.56962025316456 157.85779512020147


## 5. Following on exercise 4:
(a) The returns have a skewness of 0 and a kurtosis of 3. What is the PSR?

(b) The returns have a skewness of -2.448 and a kurtosis of 10.164. What is the
PSR?

In [64]:
psr(SR_4,24,0,3)

0.9999999999939523

In [65]:
psr(SR_4,24,-2.448,10.164)

0.9991308782532012

## 6. What would be the PSR from 5(b) if the backtest had been 3 years long?

In [66]:
psr(SR_4,36,-2.448,10.164)

0.9999440377053758

## 7. A 5-year backtest has an annualized SR of 2.5, computed on daily returns. The skewness is -3 and the kurtosis is 10.
(a) What is the PSR?

(b) In order to find that best result, 100 trials were conducted. The variance of
the Sharpe ratios on those trials is 0.5. What is the DSR?

In [0]:
SR_7 = 2.5/np.sqrt(365)  # de-annualize to daily frequency, assuming 365 daily returns per year

In [68]:
psr(SR_7, 5*365,-3,10)

0.99999850616158

In [72]:
dsr(SR_7,np.sqrt(0.5),100,5*365,-3,10)

0.0
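
A DSR of exactly 0.0 arises when a daily Sharpe ratio is compared against a benchmark built from a much larger standard deviation. One possible reading of the exercise, labeled here as an assumption, is that the variance of 0.5 refers to annualized Sharpe ratios; in that case the standard deviation should be brought to daily frequency before it is used. The sketch below follows that reading. The names `psr_sketch` and `expected_max_sketch` are hypothetical restatements of the formulas used above.

```python
import numpy as np
import scipy.stats as ss

def psr_sketch(sr, n_obs, skew, kurt, sr_benchmark=0.0):
    """PSR with `kurt` in the Pearson convention (3 for Normal returns)."""
    z = (sr - sr_benchmark) * np.sqrt(n_obs - 1)
    z /= np.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return ss.norm.cdf(z)

def expected_max_sketch(n_trials):
    """Approximate expected maximum of n_trials standard Normal draws."""
    g = np.euler_gamma
    return (1 - g) * ss.norm.ppf(1 - 1.0 / n_trials) + g * ss.norm.ppf(
        1 - np.exp(-1) / n_trials
    )

periods_per_year = 365   # same convention as the cells above
n_obs = 5 * periods_per_year

sr_daily = 2.5 / np.sqrt(periods_per_year)
# ASSUMPTION: 0.5 is the variance of *annualized* trial SRs, so its
# standard deviation is scaled down to daily frequency as well.
sr_std_daily = np.sqrt(0.5) / np.sqrt(periods_per_year)

sr_benchmark = sr_std_daily * expected_max_sketch(100)
dsr_value = psr_sketch(sr_daily, n_obs, -3, 10, sr_benchmark)
print(sr_daily, sr_benchmark, dsr_value)
```

Under this reading the benchmark Sharpe ratio stays below the observed one, so the DSR is strictly between 0 and 1 instead of collapsing to zero. If the 0.5 variance is meant at daily frequency, the cell above already reflects that interpretation.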