In [1]:
import pandas as pd
from pyplotz.pyplotz import PyplotZ, plt
pltz = PyplotZ()
pltz.enable_chinese()

In [2]:
# Parameter
return_path = "../data/clean_data/return_20200906.csv"

In [8]:
df_ret = pd.read_csv(return_path)
df_ret["trade_date"] = pd.to_datetime(df_ret["trade_date"])
df_ret = df_ret.set_index("trade_date")

In [9]:
df_ret.head()

trade_date,伊利股份,恒瑞医药,海螺水泥
1999-08-01,,,
1999-09-01,-0.196442,,
1999-10-01,-0.046369,,
1999-11-01,0.004881,,
1999-12-01,-0.177359,,


# Skewness

Intuitively, a negative skew means that you get more large negative returns than you would expect if the returns were normally distributed.

Another way of thinking about it: if returns are normally distributed, the mean and the median are very close.

However, if they are negatively skewed, the expected value (i.e. the mean) is typically less than the median. If they are positively skewed, the mean is typically greater than the median.
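
To make the mean-versus-median intuition concrete, here is a minimal sketch on a small made-up return series (the numbers are illustrative and are not taken from the data above). A single large loss should drag the mean below the median and give a negative skew.

```python
import pandas as pd
import scipy.stats

# Hypothetical monthly returns: mostly small gains plus one large loss
toy_returns = pd.Series([-0.30, -0.05, 0.01, 0.02, 0.03, 0.04, 0.05])

toy_mean = toy_returns.mean()
toy_median = toy_returns.median()
toy_skew = scipy.stats.skew(toy_returns)  # population (biased) skewness

print(f"mean={toy_mean:.4f}, median={toy_median:.4f}, skew={toy_skew:.4f}")
```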

Recall that the skewness is given by:

$$ S(R) = \frac{E[ (R-E(R))^3 ]}{\sigma_R^3} $$


In [10]:
import scipy.stats

def skewness(r):
    """
    Alternative to scipy.stats.skew()
    Computes the skewness of the supplied Series or DataFrame
    Returns a float or a Series
    """
    demeaned_r = r - r.mean()
    # use the population standard deviation, so set ddof=0
    sigma_r = r.std(ddof=0)
    exp = (demeaned_r ** 3).mean()
    return exp / sigma_r ** 3

In [16]:
skewness(df_ret.dropna())

伊利股份   -0.856970
恒瑞医药    0.585239
海螺水泥   -0.067847
dtype: float64

In [19]:
print(df_ret.columns)
scipy.stats.skew(df_ret.dropna())

Index(['伊利股份', '恒瑞医药', '海螺水泥'], dtype='object')


array([-0.85696965,  0.58523937, -0.06784677])

# Kurtosis

Intuitively, the kurtosis measures the "fatness" of the tails of the distribution. The normal distribution has a kurtosis of 3, so returns with a kurtosis below 3 tend to have thinner tails than normal, and returns with a kurtosis above 3 have fatter tails. Note that `scipy.stats.kurtosis` reports the excess kurtosis (kurtosis minus 3) by default, so the values it prints should be compared against 0 rather than 3.

Kurtosis is given by:

$$ K(R) = \frac{E[ (R-E(R))^4 ]}{\sigma_R^4} $$
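
The formula can be coded the same way as `skewness()` above. The sketch below uses a hypothetical helper name, `raw_kurtosis`, and made-up data; it is meant only to show how the raw kurtosis relates to what `scipy.stats.kurtosis` returns with and without `fisher=False`.

```python
import numpy as np
import scipy.stats

def raw_kurtosis(r):
    """
    Hypothetical helper mirroring skewness(): fourth central moment
    divided by the fourth power of the population standard deviation.
    Expects a 1-D array-like; a normal distribution gives 3.
    """
    r = np.asarray(r, dtype=float)
    demeaned_r = r - r.mean()
    sigma_r = r.std(ddof=0)  # population standard deviation
    return (demeaned_r ** 4).mean() / sigma_r ** 4

# Made-up returns, for illustration only
toy_returns = np.array([-0.30, -0.05, 0.01, 0.02, 0.03, 0.04, 0.05])

k_raw = raw_kurtosis(toy_returns)
k_scipy_raw = scipy.stats.kurtosis(toy_returns, fisher=False)  # raw kurtosis
k_scipy_excess = scipy.stats.kurtosis(toy_returns)  # excess kurtosis (default)

print(k_raw, k_scipy_raw, k_scipy_excess)
```

The first two values should agree, and the third should be smaller by exactly 3.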

In [18]:
from scipy.stats import kurtosis

print(df_ret.columns)
kurtosis(df_ret.dropna())

Index(['伊利股份', '恒瑞医药', '海螺水泥'], dtype='object')


array([3.26407443, 1.75061037, 1.3182084 ])

# Jarque-Bera Test for Normality

In [22]:
from scipy.stats import jarque_bera

def is_normal(r, level=0.05):
    """
    Applies the Jarque-Bera test to determine if a Series is normal or not
    Test is applied at the 5% level by default
    Returns True if the hypothesis of normality is not rejected, False otherwise
    """
    statistic, p_value = jarque_bera(r)
    return p_value > level
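
# Jarque-Bera Statistic by Hand

For context, the Jarque-Bera statistic combines the two moments defined above: $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, which under normality is asymptotically chi-squared with 2 degrees of freedom. The sketch below recomputes it by hand on made-up data and compares the result with scipy; `jb_by_hand` is a hypothetical helper name, not part of any library.

```python
import numpy as np
from scipy.stats import chi2, jarque_bera, kurtosis, skew

def jb_by_hand(r):
    """Hypothetical helper: Jarque-Bera statistic and p-value from S and K."""
    r = np.asarray(r, dtype=float)
    n = r.size
    s = skew(r)                    # population skewness
    k = kurtosis(r, fisher=False)  # raw kurtosis (normal = 3)
    statistic = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    p_value = chi2.sf(statistic, df=2)  # upper tail of chi-squared(2)
    return statistic, p_value

# Made-up returns, for illustration only
toy_returns = np.array([-0.30, -0.05, 0.01, 0.02, 0.03, 0.04, 0.05])

stat_hand, p_hand = jb_by_hand(toy_returns)
stat_scipy, p_scipy = jarque_bera(toy_returns)

print(stat_hand, stat_scipy)
print(p_hand, p_scipy)
```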

In [24]:
df_ret.agg(is_normal)

伊利股份    False
恒瑞医药    False
海螺水泥    False
dtype: bool
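
One caveat about the call above: `df_ret` still contains NaN rows (see the `df_ret.head()` output), and a NaN in the input to `jarque_bera` yields a NaN p-value, so `p_value > level` evaluates to False regardless of the data. Below is a minimal sketch of a NaN-aware variant. It uses a hypothetical helper, `is_normal_dropna`, and synthetic data rather than the actual returns.

```python
import numpy as np
import pandas as pd
from scipy.stats import jarque_bera, norm

def is_normal_dropna(r, level=0.05):
    """
    Hypothetical NaN-aware variant of is_normal: drops missing values
    before applying the Jarque-Bera test at the given level.
    """
    _, p_value = jarque_bera(r.dropna().to_numpy())
    return p_value > level

# Synthetic, near-normal sample: evenly spaced normal quantiles
n = 200
clean = pd.Series(norm.ppf((np.arange(1, n + 1) - 0.5) / n))
with_nan = pd.concat([pd.Series([np.nan]), clean], ignore_index=True)

_, p_nan = jarque_bera(with_nan.to_numpy())  # NaN input gives a NaN p-value
naive_result = bool(p_nan > 0.05)            # NaN > 0.05 evaluates to False
robust_result = bool(is_normal_dropna(with_nan))

print(naive_result, robust_result)
```

On the real data, something like `df_ret.agg(is_normal_dropna)` would test each column on its own available history, whereas `df_ret.dropna().agg(is_normal)` would keep only the dates on which all three stocks have a return. Which one is appropriate depends on the analysis.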