# Skewness and Kurtosis of Investment Returns

In this exercise, we will see how historical investment returns violate the assumption of normality.

I will use the data from the *__EDHEC Risk Hedge Fund Indices__*.

Source: https://risk.edhec.edu/all-downloads-hedge-funds-indices


In [1]:
%load_ext autoreload
%autoreload 2


In [2]:
import pandas as pd
import edhec_risk_kit_105 as erk
hfi = erk.get_hfi_returns()
hfi.head()

date,Convertible Arbitrage,CTA Global,Distressed Securities,Emerging Markets,Equity Market Neutral,Event Driven,Fixed Income Arbitrage,Global Macro,Long/Short Equity,Merger Arbitrage,Relative Value,Short Selling,Funds Of Funds
1997-01,0.0119,0.0393,0.0178,0.0791,0.0189,0.0213,0.0191,0.0573,0.0281,0.015,0.018,-0.0166,0.0317
1997-02,0.0123,0.0298,0.0122,0.0525,0.0101,0.0084,0.0122,0.0175,-0.0006,0.0034,0.0118,0.0426,0.0106
1997-03,0.0078,-0.0021,-0.0012,-0.012,0.0016,-0.0023,0.0109,-0.0119,-0.0084,0.006,0.001,0.0778,-0.0077
1997-04,0.0086,-0.017,0.003,0.0119,0.0119,-0.0005,0.013,0.0172,0.0084,-0.0001,0.0122,-0.0129,0.0009
1997-05,0.0156,-0.0015,0.0233,0.0315,0.0189,0.0346,0.0118,0.0108,0.0394,0.0197,0.0173,-0.0737,0.0275


# A. Skewness
## A.1 Calculate the Skewness of Investment Returns

In [3]:
List1=[hfi.mean(),hfi.median(),hfi.mean()<hfi.median()]
List2=["Mean", "Median", "Mean < Median"]
Dist_Check=pd.concat(List1, axis="columns")
Dist_Check.columns=List2
Dist_Check

Strategy,Mean,Median,Mean < Median
Convertible Arbitrage,0.005508,0.0065,True
CTA Global,0.004074,0.0014,False
Distressed Securities,0.006946,0.0089,True
Emerging Markets,0.006253,0.0096,True
Equity Market Neutral,0.004498,0.0051,True
Event Driven,0.006344,0.0084,True
Fixed Income Arbitrage,0.004365,0.0055,True
Global Macro,0.005403,0.0038,False
Long/Short Equity,0.006331,0.0079,True
Merger Arbitrage,0.005356,0.006,True


In [4]:
import scipy.stats
Dist_Check["Skewness"]=scipy.stats.skew(hfi)
Dist_Check

Strategy,Mean,Median,Mean < Median,Skewness
Convertible Arbitrage,0.005508,0.0065,True,-2.639592
CTA Global,0.004074,0.0014,False,0.173699
Distressed Securities,0.006946,0.0089,True,-1.300842
Emerging Markets,0.006253,0.0096,True,-1.167067
Equity Market Neutral,0.004498,0.0051,True,-2.124435
Event Driven,0.006344,0.0084,True,-1.409154
Fixed Income Arbitrage,0.004365,0.0055,True,-3.94032
Global Macro,0.005403,0.0038,False,0.982922
Long/Short Equity,0.006331,0.0079,True,-0.390227
Merger Arbitrage,0.005356,0.006,True,-1.320083
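A mean below the median hints at a long left tail, which is what a negative skewness measures. As a minimal sketch of what `scipy.stats.skew` computes by default (the population, or biased, version), skewness is the average cubed deviation from the mean divided by the cubed standard deviation. The helper name `skewness` and the small `sample` array are made up for illustration and are not part of the data set.

```python
import numpy as np
import scipy.stats

def skewness(r):
    """Population (biased) skewness: E[(R - mean)^3] / sigma^3."""
    r = np.asarray(r, dtype=float)
    demeaned = r - r.mean()
    sigma = r.std(ddof=0)  # population standard deviation
    return (demeaned**3).mean() / sigma**3

# Illustrative data only: one large loss among several small gains
sample = np.array([0.01, 0.02, 0.015, 0.012, -0.10, 0.018, 0.011])

print("mean < median:", sample.mean() < np.median(sample))
print("by hand:", skewness(sample))
print("scipy  :", scipy.stats.skew(sample))
```

The two skewness values should agree up to floating-point error, and the single large loss is enough to make the skewness negative.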


## A.2 The Skewness of Normal Random Sample
Let's use NumPy's `np.random.normal` to generate a sample of a normally distributed variable.

To make it comparable with the observations in the 'hfi' dataframe,
let's first find out the size of 'hfi'.


In [5]:
hfi.shape

(263, 13)

The dataframe has 263 rows, so we can generate 263 normal random samples.

In [6]:
import numpy as np
normal_returns = np.random.normal(0, .15, size=(hfi.shape[0],1))

Let's check the skewness of the freshly populated normal random samples.

In [7]:
scipy.stats.skew(normal_returns)

array([-0.37733551])
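A single draw of 263 observations will rarely have a skewness of exactly zero, and the value changes every time the sample is regenerated. As a rough sketch of that sampling variability, the following repeats the draw many times with a seeded generator. The seed, the number of trials, and the variable names are arbitrary choices for illustration, and `sqrt(6/n)` is only a common large-sample approximation for the standard deviation of the sample skewness of normal data.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
n_obs, n_trials = 263, 2000

# Each column is an independent normal sample of 263 observations
samples = rng.normal(0, 0.15, size=(n_obs, n_trials))
skews = scipy.stats.skew(samples, axis=0)

print("average skewness across trials:", skews.mean())
print("spread of skewness across trials:", skews.std())
print("rule-of-thumb spread sqrt(6/n):", np.sqrt(6 / n_obs))
```

The average should sit near zero, while individual draws scatter around it, so a value a few tenths away from zero is unusual but possible for one sample of this size.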

# B. Kurtosis
## B.1 Kurtosis of Investment Returns

### <span style='color: red;'> *__Caution__* </span>

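By default, *__scipy.stats.kurtosis__* does not return the raw Kurtosis.

It returns *__Excess Kurtosis__*, which is Kurtosis minus 3. A normal distribution has a Kurtosis of 3, so its Excess Kurtosis is 0. Pass `fisher=False` to get the raw Kurtosis instead.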


## <span style='color: blue;'> *__scipy.stats.kurtosis  ==> 'Excess Kurtosis' or Kurtosis - 3__*  </span>



In [8]:
Dist_Check["Excess Kurtosis"]=scipy.stats.kurtosis(hfi)
Dist_Check

Strategy,Mean,Median,Mean < Median,Skewness,Excess Kurtosis
Convertible Arbitrage,0.005508,0.0065,True,-2.639592,20.280834
CTA Global,0.004074,0.0014,False,0.173699,-0.04704
Distressed Securities,0.006946,0.0089,True,-1.300842,4.889983
Emerging Markets,0.006253,0.0096,True,-1.167067,6.250788
Equity Market Neutral,0.004498,0.0051,True,-2.124435,14.218555
Event Driven,0.006344,0.0084,True,-1.409154,5.035828
Fixed Income Arbitrage,0.004365,0.0055,True,-3.94032,26.842199
Global Macro,0.005403,0.0038,False,0.982922,2.741679
Long/Short Equity,0.006331,0.0079,True,-0.390227,1.523893
Merger Arbitrage,0.005356,0.006,True,-1.320083,5.73895
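The two conventions can be checked against each other directly. The sketch below computes the raw (population) kurtosis by hand as the average fourth-power deviation divided by the fourth power of the standard deviation, then compares it with `scipy.stats.kurtosis` under both settings of its `fisher` parameter. The helper name `kurtosis` and the `sample` array are illustrative assumptions, not part of the data set.

```python
import numpy as np
import scipy.stats

def kurtosis(r):
    """Population (biased) kurtosis: E[(R - mean)^4] / sigma^4 (3 for a normal distribution)."""
    r = np.asarray(r, dtype=float)
    demeaned = r - r.mean()
    sigma = r.std(ddof=0)  # population standard deviation
    return (demeaned**4).mean() / sigma**4

# Illustrative data only
sample = np.array([0.01, 0.02, 0.015, 0.012, -0.10, 0.018, 0.011])

raw = kurtosis(sample)
excess = scipy.stats.kurtosis(sample)                 # default fisher=True: excess kurtosis
pearson = scipy.stats.kurtosis(sample, fisher=False)  # raw (Pearson) kurtosis

print("by hand        :", raw)
print("fisher=False   :", pearson)
print("fisher=True + 3:", excess + 3)
```

All three printed numbers should match up to floating-point error.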


## B.2 Kurtosis of Normal Random Sample

### <span style='color: red;'> *__Caution__* </span>

As before, *__scipy.stats.kurtosis__* returns *__Excess Kurtosis__* (Kurtosis minus 3) by default, not the raw Kurtosis.

So, for the normal random sample, the output of *__scipy.stats.kurtosis__* should be close to 0 rather than 3.

In [9]:
scipy.stats.kurtosis(normal_returns)

array([0.16322147])

# C. Testing Normality: Jarque-Bera test

What can we do in order to assess the <span style='color: blue;'>*__Normality__* </span> of a distribution?

> The Jarque-Bera test tests whether the sample data has the skewness and kurtosis matching a normal distribution.

> Note that this test only works for a large enough number of data samples (>2000) as the test statistic asymptotically has a Chi-squared distribution with 2 degrees of freedom.


 * Source (scipy document): https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.jarque_bera.html
 
> *__The null hypothesis__* for the test is that the data is normally distributed; the alternate hypothesis is that the data does not come from a normal distribution.

> In general, a large J-B value indicates that errors are not normally distributed.

> Unfortunately, most statistical software does not support this test. In order to interpret results, you may need to do a little comparison (and so you should be intimately familiar with hypothesis testing). Checking p-values is always a good idea. For example, a tiny p-value and a large chi-square value from this test means that you can reject the null hypothesis that the data is normally distributed.


 * Source (Statistics): Stephanie Glen. "Jarque-Bera Test" From StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/jarque-bera-test/
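The sample-size caveat is relevant here because the hedge fund data has only 263 observations per column. One rough way to gauge how the test behaves at that size is to simulate many truly normal samples and count how often the test rejects at the chosen level. This is only a sketch: the seed, the number of trials, and the variable names are arbitrary assumptions, and the simulated rate will itself vary from run to run with a different seed.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_obs, n_trials, level = 263, 2000, 0.01

rejections = 0
for _ in range(n_trials):
    # A sample that really is normal, the same size as one hfi column
    sample = rng.normal(0, 0.15, size=n_obs)
    statistic, p_value = scipy.stats.jarque_bera(sample)
    rejections += int(p_value < level)

rejection_rate = rejections / n_trials
print(f"nominal level: {level}, simulated rejection rate: {rejection_rate:.4f}")
```

If the asymptotic approximation were exact at this sample size, the simulated rejection rate would be close to the nominal level; a noticeable gap indicates how approximate the reported p-values are.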



In [10]:
scipy.stats.jarque_bera(normal_returns)

(6.533025188894738, 0.03813920228592793)

In [11]:
scipy.stats.jarque_bera(normal_returns)[0]

6.533025188894738

In [12]:
scipy.stats.jarque_bera(normal_returns)[1]

0.03813920228592793
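The statistic ties together the skewness and excess kurtosis computed earlier: JB = n/6 × (S² + K²/4), where S is the sample skewness, K is the sample excess kurtosis, and n is the number of observations. The p-value is the upper tail of a chi-squared distribution with 2 degrees of freedom. The sketch below reproduces this by hand; `jarque_bera_by_hand` is a hypothetical helper name, and the final lines simply plug in the skewness and excess kurtosis reported above for the normal random sample.

```python
import numpy as np
import scipy.stats

def jarque_bera_by_hand(x):
    """JB = n/6 * (S^2 + K^2/4), with S the skewness and K the excess kurtosis."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    s = scipy.stats.skew(x)
    k = scipy.stats.kurtosis(x)  # excess kurtosis (fisher=True by default)
    statistic = n / 6 * (s**2 + k**2 / 4)
    # Asymptotically chi-squared with 2 degrees of freedom under normality
    p_value = scipy.stats.chi2.sf(statistic, df=2)
    return statistic, p_value

# Plugging in the values reported above for the normal sample (n = 263)
n, s, k = 263, -0.37733551, 0.16322147
statistic_from_outputs = n / 6 * (s**2 + k**2 / 4)
p_value_from_outputs = scipy.stats.chi2.sf(statistic_from_outputs, df=2)
print(statistic_from_outputs, p_value_from_outputs)  # roughly 6.533 and 0.038
```

The plugged-in numbers agree with the output of `scipy.stats.jarque_bera(normal_returns)` shown above, which confirms that the test is just a combined check on skewness and excess kurtosis.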

Now, we define a function to assess the normality of data, using the *__Jarque-Bera__* test.

Let's use 0.01 as the threshold for the *__p-value__*.

In [13]:
def JB_normality_test(r, level=0.01):
    """
    Applies the Jarque-Bera test to determine if a Series is normal or not
    Test is applied at the 1% level by default
    Returns True if the hypothesis of normality is not rejected, False otherwise
    """
    if isinstance(r, pd.DataFrame):
        # Apply the test column by column
        return r.aggregate(JB_normality_test)
    else:
        statistic, p_value = scipy.stats.jarque_bera(r)
        return p_value > level


Use *__.aggregate()__* to apply the function *__JB_normality_test__* to each column of the data.

> Aggregate using one or more operations over the specified axis.

Source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html


In [14]:
Dist_Check["JB Normality Test"]=hfi.aggregate(JB_normality_test)
Dist_Check

Strategy,Mean,Median,Mean < Median,Skewness,Excess Kurtosis,JB Normality Test
Convertible Arbitrage,0.005508,0.0065,True,-2.639592,20.280834,False
CTA Global,0.004074,0.0014,False,0.173699,-0.04704,True
Distressed Securities,0.006946,0.0089,True,-1.300842,4.889983,False
Emerging Markets,0.006253,0.0096,True,-1.167067,6.250788,False
Equity Market Neutral,0.004498,0.0051,True,-2.124435,14.218555,False
Event Driven,0.006344,0.0084,True,-1.409154,5.035828,False
Fixed Income Arbitrage,0.004365,0.0055,True,-3.94032,26.842199,False
Global Macro,0.005403,0.0038,False,0.982922,2.741679,False
Long/Short Equity,0.006331,0.0079,True,-0.390227,1.523893,False
Merger Arbitrage,0.005356,0.006,True,-1.320083,5.73895,False
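The same column-wise pattern can be sanity-checked on synthetic data where the answer is known in advance. The sketch below is an illustration under stated assumptions: `passes_jb` is a hypothetical stand-in for the test function, and the two columns are constructed deterministically (evenly spaced normal quantiles, and the same values cubed to fatten the tails) rather than drawn from real returns.

```python
import numpy as np
import pandas as pd
import scipy.stats

def passes_jb(series, level=0.01):
    """True if the Jarque-Bera test fails to reject normality at the given level."""
    statistic, p_value = scipy.stats.jarque_bera(np.asarray(series, dtype=float))
    return p_value > level

# Synthetic, deterministic columns (not real returns)
n = 1000
quantiles = scipy.stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
demo = pd.DataFrame({
    "normal_like": quantiles,    # symmetric, with near-normal tails
    "fat_tailed": quantiles**3,  # cubing stretches the tails
})

result = demo.aggregate(passes_jb)
print(result)
```

The near-normal column should pass and the fat-tailed column should fail, mirroring the pattern in the table above where only CTA Global is not rejected.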
