### Deviation From Normality
- Actual time series of returns on different asset classes
    - They are not normally distributed.
    - That asset returns follow a normal distribution is only a simplifying assumption, not an empirical fact.

##### The Gaussian Assumption
- The standard simplifying assumption is that asset returns are **normally distributed**.
- If we look at real-life stocks, however, returns and losses are not normally distributed.
    - Under the Gaussian assumption, we underestimate the magnitude of extreme returns.
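
To make the last point concrete, here is a minimal sketch that compares left-tail probabilities under a standard normal with those of a fat-tailed alternative. The choice of a Student-t distribution with 4 degrees of freedom, rescaled to unit variance, is an illustrative assumption and is not taken from the data used in these notes.

```python
from scipy import stats

# Illustrative assumption: a Student-t with 4 degrees of freedom stands in
# for a fat-tailed return distribution. Var(t_df) = df / (df - 2), so this
# scale factor rescales it to unit variance, like the standard normal.
df = 4
t_scale = ((df - 2) / df) ** 0.5

tail_probs = {}
for k in (3, 4, 5):
    p_normal = stats.norm.cdf(-k)                      # P(Z < -k)
    p_fat = stats.t.cdf(-k, df, loc=0, scale=t_scale)  # same threshold, fat tails
    tail_probs[k] = (p_normal, p_fat)
    print(f"P(return < -{k} s.d.): normal={p_normal:.2e}, fat-tailed={p_fat:.2e}")
```

With both distributions scaled to the same variance, the fat-tailed one assigns more probability to moves of three or more standard deviations, which is the sense in which the Gaussian assumption understates extreme outcomes.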

##### Higher Order Moments
- Skewness
    - a measure of the asymmetry of a distribution
    - negative value
        - the left tail is longer, so outcomes far below the mean are more likely than under a symmetric distribution
    - positive value
        - the right tail is longer, so outcomes far above the mean are more likely than under a symmetric distribution
    - The skewness of a return distribution can be calculated as follows:
        $$ S(R) = \frac{E[(R-E(R))^3]}{[Var(R)]^{\frac{3}{2}}} = \frac{E[(R-E(R))^3]}{[s.d.(R)]^3}$$
        - where:
            - $E[(R-E(R))^3]$ is the third central moment of the return distribution
            - It measures how far each value is from the mean; because the deviation is cubed, its sign is preserved, so negative deviations count as negative.
- Kurtosis
    - a measure of the thickness of the tails of the distribution, i.e. its tailedness.
    - The normal distribution has very thin tails, i.e. its density decreases very sharply to zero.
        - The probability of getting very large negative or positive outcomes is very small under a normal distribution.
    - In real life, return distributions tend to have fatter tails.
    - The kurtosis can be calculated as follows:
        $$ K(R) = \frac{E[(R - E(R))^4]}{[Var(R)]^2} $$
        - where:
            - the numerator is the fourth central moment of the return distribution
            - which is normalized by dividing by the variance squared, i.e. the standard deviation raised to the fourth power
    - A portfolio with high kurtosis, i.e. a fat-tailed profile, has an amplified risk of extreme returns.
        - an increased likelihood of substantial gains or losses.
        - In other words, a leptokurtic (high-kurtosis) distribution suggests that extreme events are more probable.

##### Jarque-Bera Test
- A statistical goodness-of-fit test that assesses whether sample data exhibit **skewness** and **kurtosis** consistent with a *normal distribution*.
    - In other words, the Jarque-Bera Test evaluates whether the **skewness** and **kurtosis** of a dataset match those expected in a normal distribution.
- The JB statistic is always non-negative.
- If JB deviates significantly from zero, it indicates that the data do not follow a normal distribution.
- Formula:
    $$ JB = \frac{n}{6}{(S^2+\frac{1}{4}(K-3)^2)} $$
    - where:
        - `n`: the number of observations (or, more generally, the degrees of freedom)
        - `S`: the sample Skewness
        - `K`: the sample Kurtosis
        - the estimates of third and fourth central moments are involved in the calculation.
- Hypothesis Testing
    - Null Hypothesis, $H_0$: The data come from a normal distribution
        - skewness = 0
        - kurtosis = 3 (i.e. excess kurtosis = 0)
    - Alternative Hypothesis, $H_1$: The data do not follow a normal distribution.
    - If the data are indeed normally distributed, the JB statistic asymptotically follows a chi-squared distribution with two degrees of freedom.
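
The formula and the chi-squared result can be checked numerically. The sketch below computes the JB statistic by hand and compares it with `scipy.stats.jarque_bera`. The helper name `jarque_bera_manual` and the synthetic sample are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

def jarque_bera_manual(x):
    """Hypothetical helper: JB statistic and p-value from the formula above."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    demeaned = x - x.mean()
    sigma = x.std(ddof=0)                   # population standard deviation
    S = (demeaned**3).mean() / sigma**3     # sample skewness
    K = (demeaned**4).mean() / sigma**4     # sample kurtosis (3 for a normal)
    jb = n / 6 * (S**2 + 0.25 * (K - 3) ** 2)
    p_value = stats.chi2.sf(jb, df=2)       # upper tail, chi-squared with 2 d.o.f.
    return jb, p_value

rng = np.random.default_rng(0)              # fixed seed, synthetic data
sample = rng.normal(0, 0.15, 263)

jb, p = jarque_bera_manual(sample)
ref_stat, ref_p = stats.jarque_bera(sample)
print(jb, p)
print(ref_stat, ref_p)
```

The two pairs of numbers should agree up to floating-point error, since both use the population (biased) moment estimates.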

In [4]:
%load_ext autoreload
%autoreload 2



In [5]:
import pandas as pd
import numpy as np
import edhec_risk_kit as erk

In [6]:
hfi = erk.get_hfi_returns()
hfi.head()


| date | Convertible Arbitrage | CTA Global | Distressed Securities | Emerging Markets | Equity Market Neutral | Event Driven | Fixed Income Arbitrage | Global Macro | Long/Short Equity | Merger Arbitrage | Relative Value | Short Selling | Funds Of Funds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997-01 | 0.0119 | 0.0393 | 0.0178 | 0.0791 | 0.0189 | 0.0213 | 0.0191 | 0.0573 | 0.0281 | 0.015 | 0.018 | -0.0166 | 0.0317 |
| 1997-02 | 0.0123 | 0.0298 | 0.0122 | 0.0525 | 0.0101 | 0.0084 | 0.0122 | 0.0175 | -0.0006 | 0.0034 | 0.0118 | 0.0426 | 0.0106 |
| 1997-03 | 0.0078 | -0.0021 | -0.0012 | -0.012 | 0.0016 | -0.0023 | 0.0109 | -0.0119 | -0.0084 | 0.006 | 0.001 | 0.0778 | -0.0077 |
| 1997-04 | 0.0086 | -0.017 | 0.003 | 0.0119 | 0.0119 | -0.0005 | 0.013 | 0.0172 | 0.0084 | -0.0001 | 0.0122 | -0.0129 | 0.0009 |
| 1997-05 | 0.0156 | -0.0015 | 0.0233 | 0.0315 | 0.0189 | 0.0346 | 0.0118 | 0.0108 | 0.0394 | 0.0197 | 0.0173 | -0.0737 | 0.0275 |


##### Skewness
Intuitively, a negative skew means that we get more negative returns than we would have expected if the returns were distributed like the normal distribution.

Another way of thinking about it: if returns are normally distributed, the mean and the median are very close.

However, if they are negatively skewed, the expected value (the mean) is less than the median. If they are positively skewed, the expected value (the mean) is greater than the median.

In [7]:
# The following DataFrame shows, for each index:
#     - column 0: the mean
#     - column 1: the median
#     - column 2: True if the mean is greater than the median, otherwise False
#         - mean > median: the data is right/positively skewed
#         - mean < median: the data is left/negatively skewed
pd.concat([hfi.mean(), hfi.median(), hfi.mean()>hfi.median()], axis=1)

| | 0 (mean) | 1 (median) | 2 (mean > median) |
|---|---|---|---|
| Convertible Arbitrage | 0.005508 | 0.0065 | False |
| CTA Global | 0.004074 | 0.0014 | True |
| Distressed Securities | 0.006946 | 0.0089 | False |
| Emerging Markets | 0.006253 | 0.0096 | False |
| Equity Market Neutral | 0.004498 | 0.0051 | False |
| Event Driven | 0.006344 | 0.0084 | False |
| Fixed Income Arbitrage | 0.004365 | 0.0055 | False |
| Global Macro | 0.005403 | 0.0038 | True |
| Long/Short Equity | 0.006331 | 0.0079 | False |
| Merger Arbitrage | 0.005356 | 0.006 | False |


In [8]:
def skewness(r):
    '''
    Alternative to scipy.stats.skew()
    Compute the skewness of the given Series or DataFrame based on the equation above.
    '''
    # R - E(R)
    demeaned_r = r - r.mean()

    # Use the population standard deviation, so set ddof=0.
    # ddof stands for "delta degrees of freedom": the divisor used
    # in the calculation is (N - ddof), where N is the number of elements.
    # s.d.(R), so that sigma_r**3 equals (Var(R))^(3/2)
    sigma_r = r.std(ddof=0)

    # E[(R - E(R))^3]
    exp = (demeaned_r**3).mean()
    return exp/(sigma_r**3)

In [9]:
skewness(hfi).sort_values()

Fixed Income Arbitrage   -3.940320
Convertible Arbitrage    -2.639592
Equity Market Neutral    -2.124435
Relative Value           -1.815470
Event Driven             -1.409154
Merger Arbitrage         -1.320083
Distressed Securities    -1.300842
Emerging Markets         -1.167067
Long/Short Equity        -0.390227
Funds Of Funds           -0.361783
CTA Global                0.173699
Short Selling             0.767975
Global Macro              0.982922
dtype: float64

In [10]:
'''
Let's use skewness function that is built into scipy.stats
'''
import scipy.stats

In [11]:
sorted(scipy.stats.skew(hfi))

[-3.940320291190085,
 -2.6395922251089274,
 -2.1244353839421204,
 -1.8154697489380174,
 -1.4091535635547947,
 -1.3200833333543787,
 -1.3008420437912207,
 -1.1670674947992332,
 -0.39022677418839474,
 -0.36178308368373274,
 0.1736986449903901,
 0.7679748443026674,
 0.9829218839470764]

Let's look at the skewness that we would expect from a truly random, normally distributed sequence of returns.
We will use NumPy's random normal generator to generate the same number of returns as we have for the hedge fund data.

In [12]:
hfi.shape

(263, 13)

In [13]:
normal_rets = np.random.normal(0, 0.15, (263, 1))

In [14]:
normal_rets.mean(), normal_rets.std()

(0.004755211682899364, 0.15737000972814627)

In [15]:
erk.skewness(normal_rets)

0.26406491199050625

In [16]:
scipy.stats.skew(normal_rets)

array([0.26406491])
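
The value above is not exactly zero even though the data were drawn from a normal distribution, because a finite sample only estimates the skewness. The sketch below, using simulated data with a fixed seed (an illustrative setup, not part of the original notebook), compares the spread of the sample skewness across many simulated series with the large-sample approximation $\sqrt{6/n}$ that holds under normality.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(42)   # fixed seed for reproducibility
n_obs, n_trials = 263, 2000

# Each column is one simulated series of 263 normally distributed returns
samples = rng.normal(0, 0.15, (n_obs, n_trials))
skews = scipy.stats.skew(samples, axis=0)

simulated_sd = skews.std()
approx_sd = np.sqrt(6 / n_obs)    # large-sample approximation under normality
print(f"simulated s.d. of sample skewness: {simulated_sd:.3f}")
print(f"sqrt(6/n) approximation:           {approx_sd:.3f}")
```

For 263 observations the approximation gives roughly 0.15, so a sample skewness of 0.264 lies within about two standard errors of zero.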

##### Kurtosis
The *kurtosis* measures the **"fatness"** of the tails of the distribution. The normal distribution has a kurtosis of 3, so if the kurtosis of your returns is less than 3, the distribution tends to have thinner tails, and if the kurtosis is greater than 3, the distribution is considered to have fat tails.

In [17]:
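def kurtosis(r):
    '''
    Alternative to scipy.stats.kurtosis()
    Compute the (raw, not excess) kurtosis of the given Series or DataFrame.
    '''
    # R - E(R)
    demeaned_r = r - r.mean()

    # s.d.(R), population version (ddof=0), so that sigma_r**4 equals (Var(R))^2
    sigma_r = r.std(ddof=0)

    # E[(R - E(R))^4]
    exp = (demeaned_r**4).mean()
    return exp/(sigma_r**4)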

In [18]:
erk.kurtosis(hfi).sort_values()

CTA Global                 2.952960
Long/Short Equity          4.523893
Global Macro               5.741679
Short Selling              6.117772
Funds Of Funds             7.070153
Distressed Securities      7.889983
Event Driven               8.035828
Merger Arbitrage           8.738950
Emerging Markets           9.250788
Relative Value            12.121208
Equity Market Neutral     17.218555
Convertible Arbitrage     23.280834
Fixed Income Arbitrage    29.842199
dtype: float64

In [19]:
scipy.stats.kurtosis(hfi)

array([20.28083446, -0.04703963,  4.88998336,  6.25078841, 14.21855526,
        5.03582817, 26.84219928,  2.74167945,  1.52389258,  5.73894979,
        9.12120787,  3.11777175,  4.07015278])

Note that the values produced by the `scipy.stats` module are lower by 3 than the numbers we computed (and appear in column order rather than sorted). That is because the expected kurtosis of a normally distributed series of numbers is 3, and `scipy.stats` returns the **excess kurtosis**, i.e. the kurtosis minus 3:
- if the kurtosis > 3 (excess kurtosis > 0): the tails are fat
- if the kurtosis <= 3 (excess kurtosis <= 0): the tails are not fat
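
As a small check of this convention, `scipy.stats.kurtosis` accepts a `fisher` argument: the default `fisher=True` gives the excess kurtosis, while `fisher=False` gives the raw kurtosis that matches the formula used in these notes. The synthetic sample below is only for illustration.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(1)               # fixed seed, synthetic data
x = rng.normal(0, 0.15, 263)

excess = scipy.stats.kurtosis(x)             # default fisher=True: excess kurtosis
raw = scipy.stats.kurtosis(x, fisher=False)  # Pearson definition: normal -> 3
print(excess, raw, raw - excess)
```

The difference between the two values is exactly 3, whatever the sample.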

##### Running the Jarque-Bera Test for Normality
The `scipy.stats` module contains a function that runs the *Jarque-Bera Test* on a sequence of numbers. Let's apply that to the normally generated returns.

- The function returns two numbers:
    - The first number: the test statistic
    - The second number: the `p-value` for the hypothesis test.
        - If we want to run the test at a 1% level of significance, this number must be greater than 0.01 for us not to reject the hypothesis that the data are normally distributed. If it is less than 0.01, we must reject the hypothesis of normality.

In [20]:
scipy.stats.jarque_bera(normal_rets)

SignificanceResult(statistic=3.0815882503621155, pvalue=0.21421092357619348)

In this case, the p-value (about 0.214) is higher than 0.01, so we cannot reject the hypothesis that the numbers are normally distributed.

In [21]:
scipy.stats.jarque_bera(hfi)

SignificanceResult(statistic=25656.585999171337, pvalue=0.0)

In [26]:
erk.is_normal(normal_rets)

True

In [25]:
erk.is_normal(hfi)

Convertible Arbitrage     False
CTA Global                 True
Distressed Securities     False
Emerging Markets          False
Equity Market Neutral     False
Event Driven              False
Fixed Income Arbitrage    False
Global Macro              False
Long/Short Equity         False
Merger Arbitrage          False
Relative Value            False
Short Selling             False
Funds Of Funds            False
dtype: bool
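
Note that calling `scipy.stats.jarque_bera` directly on the whole DataFrame returned a single statistic, because the values of all columns were pooled into one sample, whereas `erk.is_normal` reports one result per column. The implementation of `erk.is_normal` is not shown in these notes; the following is a hypothetical sketch of how such a helper could work, assuming it applies the test at a 1% level and handles DataFrames column by column. The function name `is_normal_sketch` and the demo data are made up for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats

def is_normal_sketch(r, level=0.01):
    """
    Hypothetical stand-in for erk.is_normal (the real implementation may differ).
    Returns True when the Jarque-Bera test does not reject normality at `level`.
    DataFrames are tested one column at a time.
    """
    if isinstance(r, pd.DataFrame):
        return r.apply(lambda col: is_normal_sketch(col, level=level))
    statistic, p_value = scipy.stats.jarque_bera(np.asarray(r).ravel())
    return p_value > level

# Synthetic demo data: one normal column and one fat-tailed column
rng = np.random.default_rng(7)
demo = pd.DataFrame({
    "normal": rng.normal(0, 0.15, 500),
    "fat_tailed": rng.standard_t(2, 500) * 0.05,
})
result = is_normal_sketch(demo)
print(result)
```

Returning `p_value > level` keeps the reading consistent with the test described above: `True` means normality was not rejected, not that it was proven.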

##### Testing CRSP SmallCap and LargeCap Returns for Normality
Let's see whether any of the returns we've been studying so far pass the normality test.

In [27]:
ffme = erk.get_ffme_returns()
erk.skewness(ffme)

SmallCap    4.410739
LargeCap    0.233445
dtype: float64

In [28]:
erk.kurtosis(ffme)

SmallCap    46.845008
LargeCap    10.694654
dtype: float64

In [29]:
erk.is_normal(ffme)

SmallCap    False
LargeCap    False
dtype: bool