* By: Illya Barziy
* Email: illyabarziy@gmail.com
* Reference: __Backtesting__ _by_ Campbell R. Harvey _and_ Yan Liu

## Haircut Sharpe Ratios and Profit Hurdle algorithms

### A general overview of the framework

It is a common practice to discount reported Sharpe ratios by 50% as a result of data mining. The authors of the research developed an analytical way to determine the haircut's magnitude. 

The haircut is the percentage difference between the original Sharpe ratio and the Sharpe ratio adjusted for the effect of data mining.

The authors explain that their framework relies on the concept of multiple testing. 
- If a set of data $X$ explains $Y$ and the relation is significant with a t-ratio of 2.0 (a p-value of about 0.05), we refer to it as a single test. 
- If multiple sets of data $X_1, X_2, \ldots, X_n$ are tested to explain $Y$, the same criterion for significance cannot be used, because some of the variables can produce t-ratios of 2.0 and higher purely by chance. Then, what is the appropriate cutoff for statistical significance? 

Generally speaking, the more sets that are tested, the higher the t-ratio cutoff required for significance.

When a strategy produces a Sharpe ratio, it is transformed into a t-ratio and then into a p-value that takes multiple testing into account. 

In order to use the framework, one has to decide on the number of previous tests. The research of _Harvey, C.R._, _Y. Liu_, and _H. Zhu_ __“… and the Cross-section of Expected Returns.”__  [available here](https://faculty.fuqua.duke.edu/~charvey/Research/Published_Papers/P118_and_the_cross.PDF) documents at least 316 factors that have been tested to explain the cross-sectional patterns in equity returns.


In the provided approach, the haircut is nonlinear: marginal Sharpe ratios are penalized much more heavily than high Sharpe ratios. The researchers state that this makes economic sense, as strategies with high Sharpe ratios have a higher probability of being true discoveries.

Researchers point to the following caveats of the method:
- High Sharpe ratios may be a result of a non-normal distribution of returns. Therefore, Sharpe ratios should be viewed in the context of the distribution of returns.
- Sharpe ratios are not the only measures of risk; however, the approach also applies to information ratios.
- Need for determining the significance level for multiple testing.
- Need to choose between the adjustment methods used in the framework provided (there are four of them).


### Method description

Let $r_t$ denote the return for an investment strategy between time $t-1$ and $t$. The strategy can consist of returns from both long and short positions. 

In order to conclude if the strategy is able to maintain true profits, a statistical test is formed to see if the expected excess returns are different from zero. 

Given a set of returns $(r_1, r_2, \ldots, r_T)$, we denote $\mu$ as the mean and $\sigma$ as the standard deviation. The t-statistic to test the null hypothesis that the average return is zero is:

$\text{t-statistic} = \frac{\mu}{\sigma/\sqrt{T}}$

_The returns are assumed to be i.i.d. normal_, so the described t-statistic follows a t-distribution with $T-1$ degrees of freedom. This way we can assess the statistical significance of the investment strategy. 

At the same time, the Sharpe ratio is defined as:

$SR = \frac{\mu}{\sigma}$

Therefore, based on the previous equation, 

$SR = \frac{\text{t-statistic}}{\sqrt{T}}$

This shows that, for a fixed $T$, a higher Sharpe ratio implies a higher t-statistic, which in turn implies greater statistical significance (a lower p-value).
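The relation between the two quantities can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `sharpe_and_t_stat` and the simulated monthly returns are illustrative assumptions, not part of the paper.

```python
import math
import random
import statistics


def sharpe_and_t_stat(returns):
    """Return (per-period Sharpe ratio, t-statistic) for a list of excess returns."""
    n_obs = len(returns)
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)  # sample standard deviation
    sharpe = mu / sigma
    t_stat = mu / (sigma / math.sqrt(n_obs))
    return sharpe, t_stat


# Simulated returns, for illustration only (hypothetical mean and volatility).
random.seed(0)
monthly_returns = [random.gauss(0.01, 0.04) for _ in range(120)]

sharpe, t_stat = sharpe_and_t_stat(monthly_returns)
print(f"SR = {sharpe:.3f}, t-statistic = {t_stat:.3f}")
print(f"SR * sqrt(T) = {sharpe * math.sqrt(len(monthly_returns)):.3f}")
```

The last two printed values agree, which is the identity $\text{t-statistic} = SR \cdot \sqrt{T}$. Note that the Sharpe ratio here is per period (monthly), not annualized.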







To adjust the Sharpe ratio for data mining bias, first we calculate the p-value of a single test:

$p^S = Pr(|r| > \text{t-ratio}) = Pr(|r| > SR \cdot \sqrt{T})$, where $r$ is a random variable that follows a t-distribution with $T-1$ degrees of freedom.

This metric is misleading when hundreds of strategies were tested and only the most profitable one is presented. 
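The single-test p-value defined above can be sketched in a few lines. As a simplifying assumption, this sketch replaces the t-distribution with the standard normal, which is a close approximation when $T$ is in the hundreds; an exact version could use `scipy.stats.t.sf` instead. The function name `single_test_p_value` and the input numbers are illustrative.

```python
import math
from statistics import NormalDist


def single_test_p_value(sharpe, n_obs):
    """Two-sided p-value for the t-ratio SR * sqrt(T).

    Assumption: the normal distribution is used as an approximation
    to the t-distribution with T - 1 degrees of freedom.
    """
    t_ratio = sharpe * math.sqrt(n_obs)
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_ratio)))


# A per-period Sharpe ratio of 0.2 over 100 periods gives a t-ratio of 2.0.
p_single = single_test_p_value(0.2, 100)
print(f"p-value of a single test: {p_single:.4f}")
```

A t-ratio of 2.0 yields a p-value slightly below 0.05, which matches the conventional single-test cutoff mentioned in the overview.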

If N strategies were tested (and we assume the test statistics for N strategies to be independent), under the null hypothesis that none of the strategies can generate non-zero returns, multiple testing p-value is:

$p^M = Pr(\max\{|r_i|, i = 1, \ldots, N\} > \text{t-ratio}) = 1 - \prod^N_{i=1}Pr(|r_i| \le \text{t-ratio}) = 1 - (1 - p^S)^N$

For example, with $N=10$ and $p^S=0.05$, we get $p^M = 1 - 0.95^{10} \approx 0.401$. Multiple testing greatly reduces the statistical significance of a single test. 
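The formula for independent tests is easy to verify directly. This is a small sketch; the function name `multiple_test_p_value` is an illustrative choice.

```python
def multiple_test_p_value(p_single, n_tests):
    """p-value of the best of n_tests independent tests: 1 - (1 - p_single)^N."""
    return 1.0 - (1.0 - p_single) ** n_tests


p_multiple = multiple_test_p_value(0.05, 10)
print(round(p_multiple, 3))  # 0.401
```

With a single test ($N = 1$) the function returns $p^S$ unchanged, and the adjusted p-value grows toward 1 as $N$ increases.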

Equating the p-value of a single test to $p^M$ will provide the equation for calculating the adjusted (haircut) Sharpe ratio $HSR$:

$p^M = Pr(|r|>HSR * \sqrt{T})$

__A numerical example:__ 

$T = 240$ - twenty years of monthly observations, $SR = 0.75$ - observed annual Sharpe ratio, which gives a t-ratio of $0.75 \cdot \sqrt{20} \approx 3.35$ and a p-value of $0.0008$ in a single test. When we assume the number of other strategies tested is $N = 200$, and therefore $p^M = 0.15$, we can calculate the adjusted Sharpe ratio $HSR = 0.32$, a reduction of roughly $60\%$. 
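The chain of calculations in this example can be sketched as follows. Two assumptions are made here and are not taken from the text: the horizon is set to twenty years, because a t-ratio of $0.75 \cdot \sqrt{20}$ is what reproduces a single-test p-value of about $0.0008$, and the normal distribution is used in place of the t-distribution. Working with the annual Sharpe ratio and the number of years is equivalent to working with the monthly Sharpe ratio and the number of months. The function name `haircut_sharpe` is hypothetical.

```python
import math
from statistics import NormalDist


def haircut_sharpe(annual_sharpe, years, n_tests):
    """Haircut Sharpe ratio under the independence assumption.

    Assumptions: normal approximation to the t-distribution, and
    independent test statistics across the n_tests strategies.
    """
    norm = NormalDist()
    t_ratio = annual_sharpe * math.sqrt(years)
    p_single = 2.0 * (1.0 - norm.cdf(t_ratio))
    p_multiple = 1.0 - (1.0 - p_single) ** n_tests
    # Invert p_multiple = Pr(|r| > HSR * sqrt(years)) for HSR.
    t_adjusted = norm.inv_cdf(1.0 - p_multiple / 2.0)
    hsr = t_adjusted / math.sqrt(years)
    haircut = (annual_sharpe - hsr) / annual_sharpe
    return p_single, p_multiple, hsr, haircut


p_single, p_multiple, hsr, haircut = haircut_sharpe(0.75, 20, 200)
print(f"p^S = {p_single:.4f}, p^M = {p_multiple:.2f}")
print(f"HSR = {hsr:.2f}, haircut = {haircut:.0%}")
```

Under these assumptions the results land close to the quoted $p^M = 0.15$ and $HSR = 0.32$. The sketch does not handle the edge case where $p^S$ underflows to zero, in which the inverse normal CDF is undefined.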

This calculation holds when the N strategies are independent; however, this assumption rarely holds in real-life cases. For this reason, in the paper _Harvey, C.R._, _Y. Liu_, and _H. Zhu_ __“… and the Cross-section of Expected Returns.”__  [available here](https://faculty.fuqua.duke.edu/~charvey/Research/Published_Papers/P118_and_the_cross.PDF) the authors provide a multiple testing framework to find the appropriate p-value adjustment. This model is referred to as the HLZ model.

### The HLZ model

This model adjusts p-values for multiple testing taking into account that the strategies are not independent. It consists of three methods.

#### Bonferroni method

First, the p-values are sorted in ascending order.

$p_{(1)} \le p_{(2)} \le ... \le p_{(M)}$

This method adjusts each p-value equally - inflates the original p-value by the number of tests $M$:

$p^{Bonferroni}_{(i)} = \min[M \cdot p_{(i)}, 1], \quad i = 1, \ldots, M$
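The Bonferroni adjustment can be sketched as follows. The function name `bonferroni_adjust` and the input p-values are illustrative assumptions.

```python
def bonferroni_adjust(p_values):
    """Sort p-values ascending and inflate each by the number of tests M, capped at 1."""
    m = len(p_values)
    return [min(m * p, 1.0) for p in sorted(p_values)]


# Hypothetical p-values from four tests.
adjusted = bonferroni_adjust([0.04, 0.001, 0.3, 0.02])
print(adjusted)
```

Because every p-value is multiplied by the same factor $M$, the sorting step does not change the adjusted values; it only fixes the order in which they are reported. In this example only the smallest p-value stays below a 0.05 significance level after the adjustment.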
