# Hypothesis Tests in Python

*A short introduction to the most common hypothesis tests available in Python.*

Source: https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/

## Normality Tests

*Tests for checking whether data has a Gaussian distribution.*

<div class="alert alert-block alert-warning"><b>Shapiro-Wilk</b></div>

Assumptions:

- each sample is independent and identically distributed (all samples are taken from the same distribution)

Interpretation:

- H0: the sample has a Gaussian distribution.
- H1: the sample does not have a Gaussian distribution.

In [4]:
from scipy.stats import shapiro
data = [1,2,3,4,5,6,7,8,9] #an easy example
data2 = [1,2,3,4,5,6,7,8,9, -100] #I will ruin my first data on purpose :)
data_col = [data,data2]


for data in data_col: 
    stat, p = shapiro(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably Gaussian')
    else:
        print('Probably not Gaussian')

stat=0.972, p=0.914
Probably Gaussian
stat=0.441, p=0.000
Probably not Gaussian


*For this simple data the test behaves as expected: the second sample contains an extreme outlier (-100) added on purpose, so it is no longer Gaussian.*

<div class="alert alert-block alert-warning"><b>D’Agostino’s K^2 Test</b></div>

*Same assumptions and interpretation as for Shapiro-Wilk.*

In [6]:
import warnings
warnings.filterwarnings('ignore')

from scipy.stats import normaltest
data = [1,2,3,4,5,6,7,8,9] #an easy example
data2 = [1,2,3,4,5,6,7,8,9, -100] #I will ruin my first data on purpose :)
data_col = [data,data2]


for data in data_col: 
    stat, p = normaltest(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably Gaussian')
    else:
        print('Probably not Gaussian')

stat=1.861, p=0.394
Probably Gaussian
stat=28.106, p=0.000
Probably not Gaussian


<div class="alert alert-block alert-warning"><b>Anderson-Darling Test</b></div>

*Same assumptions and interpretation as for Shapiro-Wilk.*

In [8]:
from scipy.stats import anderson
data = [1,2,3,4,5,6,7,8,9] #an easy example
data2 = [1,2,3,4,5,6,7,8,9, -100] #I will ruin my first data on purpose :)
data_col = [data,data2]

for data in data_col: 
    result = anderson(data)

    print('stat=%.3f' % (result.statistic))
    for i in range(len(result.critical_values)):
        sl, cv = result.significance_level[i], result.critical_values[i]
        if result.statistic < cv:
            print('Probably Gaussian at the %.1f%% level' % (sl))
        else:
            print('Probably not Gaussian at the %.1f%% level' % (sl))

stat=0.137
Probably Gaussian at the 15.0% level
Probably Gaussian at the 10.0% level
Probably Gaussian at the 5.0% level
Probably Gaussian at the 2.5% level
Probably Gaussian at the 1.0% level
stat=2.654
Probably not Gaussian at the 15.0% level
Probably not Gaussian at the 10.0% level
Probably not Gaussian at the 5.0% level
Probably not Gaussian at the 2.5% level
Probably not Gaussian at the 1.0% level


## Correlation Tests

*Tests for checking relation between samples.*

<div class="alert alert-block alert-warning"><b>Pearson’s Correlation Coefficient</b></div>

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.

Interpretation:

- H0: the two samples are independent.
- H1: there is a dependency between the samples.

In [15]:
from scipy.stats import pearsonr
data1, data2 = [1, 2, 3,4,5,6], [2,4,6,8,10,12]
data3, data4 = [0,0,1,0,0], [1,2,3,4,5] #again I am ruining my data on purpose
data_col = [[data1,data2],[data3,data4]]

for data_A, data_B in data_col:
    stat, p = pearsonr(data_A, data_B)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably independent')
    else:
        print('Probably dependent')

stat=1.000, p=0.000
Probably dependent
stat=0.000, p=1.000
Probably independent
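
The coefficient reported by `pearsonr` can be checked by hand: it is the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below is a minimal pure-Python version for the two pairs used above; the helper name `pearson_r` is made up for this example and it does not compute a p-value.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations of each sample.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

r_linear = pearson_r([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12])
r_flat = pearson_r([0, 0, 1, 0, 0], [1, 2, 3, 4, 5])
print('r_linear=%.3f' % r_linear)
print('abs(r_flat)=%.3f' % abs(r_flat))
```

The first pair is an exact linear relation, so the coefficient is 1; in the second pair the positive and negative products cancel, so it is 0 up to floating-point rounding.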


<div class="alert alert-block alert-warning"><b>Spearman’s Rank Correlation</b></div>

Assumptions:

- Observations in each sample are independent and identically distributed (iid)
- Observations in each sample can be ranked.

Interpretation:

- H0: the two samples are independent.
- H1: there is a dependency between the samples.

In [18]:
from scipy.stats import spearmanr
data1, data2 = [1, 2, 3,4,5,6], [2,4,6,8,10,12]
data3, data4 = [0,0,1,0,0], [1,2,3,4,5] #again I am ruining my data on purpose
data_col = [[data1,data2],[data3,data4]]

for data_A, data_B in data_col:
    stat, p = spearmanr(data_A, data_B)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably independent')
    else:
        print('Probably dependent')

stat=1.000, p=0.000
Probably dependent
stat=0.000, p=1.000
Probably independent
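
Spearman's coefficient is Pearson's coefficient computed on ranks, with tied values sharing the average of their positions. The following sketch, with the hypothetical helpers `average_ranks`, `pearson_r` and `spearman_rho`, shows this for the two pairs above; it only computes the coefficient, not the p-value.

```python
from math import sqrt

def average_ranks(values):
    """Rank values starting from 1; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

ranks_flat = average_ranks([0, 0, 1, 0, 0])
rho_linear = spearman_rho([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12])
rho_flat = spearman_rho([0, 0, 1, 0, 0], [1, 2, 3, 4, 5])
print(ranks_flat)
print('rho_linear=%.3f' % rho_linear)
print('abs(rho_flat)=%.3f' % abs(rho_flat))
```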


<div class="alert alert-block alert-warning"><b>Chi-Squared Test</b></div>

*Testing the relationship between categorical variables.*

Assumptions:

- Observations used in the calculation of the contingency table are independent (a contingency table simply counts the occurrences of events).
- 25 or more examples in each cell of the contingency table.

Interpretation:

- H0: the two samples are independent.
- H1: there is a dependency between the samples.

In [24]:
from scipy.stats import chi2_contingency
# contingency table of observed counts (2 rows x 3 columns)
# an all-zero table has no observations, so the test is undefined for it
table = [[10, 20, 30],[6,  9,  17]]

stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably independent')
else:
    print('Probably dependent')

stat=0.272, p=0.873
Probably independent


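*This one needs some more explanation in the future :)* The chi-squared statistic is easy to calculate: sum (observed - expected)^2 / expected over all cells of the table, where expected is the count a cell would have if the two variables were independent.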
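
The statistic can be reproduced by hand for the 2 x 3 table used above. In this minimal sketch the expected count of a cell is taken as row total x column total / grand total, and the helper name `chi_square_statistic` is made up for the example. No continuity correction is applied, which should agree with `chi2_contingency` here because that function only applies Yates' correction when there is one degree of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-squared statistic for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

table = [[10, 20, 30], [6, 9, 17]]
stat = chi_square_statistic(table)
# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(table) - 1) * (len(table[0]) - 1)
print('stat=%.3f, dof=%d' % (stat, dof))
```

The printed statistic rounds to 0.272, the same value reported by `chi2_contingency` for this table.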

## Stationary Tests

*Tests for checking whether a time series is stationary or not. A stationary time series is one whose statistical properties, such as mean, variance and autocorrelation, are all constant over time. The image is self-explanatory.*

![Loss](tt1.png)
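
As a rough illustration of the definition, one can compare the mean and variance of the first and second half of a series. This is only an informal check, not a substitute for a proper test; the helper name `half_summaries` and the two toy series are assumptions made for this sketch.

```python
from statistics import mean, pvariance

def half_summaries(series):
    """Return (mean, variance) of the first half and of the second half."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

trend = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # mean drifts upwards over time
flat = [1, 2, 1, 2, 1, 2, 1, 2]          # same pattern repeated throughout

trend_first, trend_second = half_summaries(trend)
flat_first, flat_second = half_summaries(flat)
print('trend:', trend_first, trend_second)
print('flat: ', flat_first, flat_second)
```

For the trending series the mean differs between the two halves, while for the repeating series both halves have the same mean and variance.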

<div class="alert alert-block alert-warning"><b>Augmented Dickey-Fuller Unit Root Test</b></div>

*Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive.*

Assumptions:

- Observations are temporally ordered.

Interpretation:

- H0: a unit root is present (series is non-stationary)
- H1: a unit root is not present (series is stationary).

In [30]:
from statsmodels.tsa.stattools import adfuller
data1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] #just a line
data2 = [1,2,1,2,1,2,1,2]  #and here we know for sure we have same mean and variance! :)
data_col = [data1,data2]

for data in data_col:
    stat, p, lags, obs, crit, t = adfuller(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably not Stationary')
    else:
        print('Probably Stationary')

stat=0.992, p=0.994
Probably not Stationary
stat=-1898884332263514.000, p=0.000
Probably Stationary


<div class="alert alert-block alert-warning"><b>Kwiatkowski-Phillips-Schmidt-Shin</b></div>

*Tests whether a time series is trend stationary or not.*

Assumptions:

- Observations are temporally ordered.

Interpretation:

- H0: the time series is stationary (around a constant with the default `regression='c'`, around a trend with `regression='ct'`).
- H1: the time series has a unit root (is not stationary).

In [31]:
from statsmodels.tsa.stattools import kpss
data1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] #just a line
data2 = [1,2,1,2,1,2,1,2]  #and here we know for sure we have same mean and variance! :)
data_col = [data1,data2]

for data in data_col:
    # kpss returns four values: statistic, p-value, lags used, critical values
    stat, p, lags, crit = kpss(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    # H0 of KPSS is stationarity (the opposite of ADF), so a large p-value keeps it
    if p > 0.05:
        print('Probably Stationary')
    else:
        print('Probably not Stationary')


## Parametric Statistical Hypothesis Tests

*Tests for comparing data samples.*

<div class="alert alert-block alert-warning"><b>Student’s t-test</b></div>

*Tests whether the means of two independent samples are significantly different.*

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.

Interpretation:

- H0: the means of the samples are equal.
- H1: the means of the samples are unequal.

In [36]:
from scipy.stats import ttest_ind

data1 = [[10, 20, 30],[6,  9,  17]]
data2 = [[0,0,0],[5,8,12]]
data_col = [data1,data2]


for data1, data2 in data_col:
    stat, p = ttest_ind(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')

stat=1.405, p=0.233
Probably the same distribution
stat=-4.110, p=0.015
Probably different distributions
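
The t statistic itself is simple to reproduce: the difference of the sample means divided by a standard error built from the pooled variance. This minimal sketch assumes equal variances, which is also the default of `ttest_ind`; the helper name `pooled_t_statistic` is made up for the example and no p-value is computed.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances (Student's t-test)."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error

t_first = pooled_t_statistic([10, 20, 30], [6, 9, 17])
t_second = pooled_t_statistic([0, 0, 0], [5, 8, 12])
print('t_first=%.3f, t_second=%.3f' % (t_first, t_second))
```

Both values should round to the statistics printed above (1.405 and -4.110).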


<div class="alert alert-block alert-warning"><b>Analysis of Variance Test (ANOVA)</b></div>

*Tests whether the means of two or more independent samples are significantly different.*

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.

Interpretation:

- H0: the means of the samples are equal.
- H1: one or more of the means of the samples are unequal.

In [37]:
from scipy.stats import f_oneway
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = f_oneway(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')

stat=0.096, p=0.908
Probably the same distribution
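
The F statistic of a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it in pure Python for the three samples above; the helper name `one_way_f_statistic` is an assumption for this example and the p-value is left to `f_oneway`.

```python
from statistics import mean

def one_way_f_statistic(*groups):
    """F statistic of a one-way ANOVA for two or more groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between, ss_within = 0.0, 0.0
    for g in groups:
        group_mean = mean(g)
        # Spread of the group means around the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Spread of the observations around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
f_stat = one_way_f_statistic(data1, data2, data3)
print('F=%.3f' % f_stat)
```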


## Nonparametric Statistical Hypothesis Tests

<div class="alert alert-block alert-warning"><b>Mann-Whitney U Test</b></div>

*Tests whether the distributions of two independent samples are equal or not.*

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.

Interpretation:

- H0: the distributions of both samples are equal.
- H1: the distributions of both samples are not equal.

In [39]:
from scipy.stats import mannwhitneyu

data1 = [[10, 20, 30],[6,  9,  17]]
data2 = [[0,0,0],[5,8,12]]
data_col = [data1,data2]

for data1, data2 in data_col:
    stat, p = mannwhitneyu(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')

stat=1.000, p=0.095
Probably the same distribution
stat=0.000, p=0.032
Probably different distributions


<div class="alert alert-block alert-warning"><b>Wilcoxon Signed-Rank Test</b></div>

*Tests whether the distributions of two paired samples are equal or not.*

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
- Observations across each sample are paired.

Interpretation:

- H0: the distributions of both samples are equal.
- H1: the distributions of both samples are not equal.

In [40]:
from scipy.stats import wilcoxon

data1 = [[10, 20, 30],[6,  9,  17]]
data2 = [[0,0,0],[5,8,12]]
data_col = [data1,data2]

for data1, data2 in data_col:
    stat, p = wilcoxon(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')

stat=0.000, p=0.250
Probably the same distribution
stat=0.000, p=0.250
Probably the same distribution


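*Interestingly, we use the same data as in the Mann-Whitney U test but get a different result. The Wilcoxon test only looks at the three paired differences, and with three pairs the smallest possible exact two-sided p-value is 2 × (1/2)^3 = 0.25, so it can never fall below 0.05 here.*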
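
The p=0.250 above is a consequence of the tiny sample size. Under H0 every assignment of signs to the ranked differences is equally likely, so the smallest exact two-sided p-value can be found by enumerating all sign patterns. The sketch below does this under the assumption of no ties and no zero differences; the helper name `smallest_two_sided_p` is made up for the example.

```python
from itertools import product

def smallest_two_sided_p(n_pairs):
    """Smallest exact two-sided Wilcoxon p-value for n untied, non-zero differences."""
    ranks = range(1, n_pairs + 1)
    total = sum(ranks)
    stats_under_h0 = []
    for signs in product([1, -1], repeat=n_pairs):
        # Sum of the ranks that belong to positive differences.
        w_plus = sum(r for r, s in zip(ranks, signs) if s > 0)
        stats_under_h0.append(min(w_plus, total - w_plus))
    # The most extreme statistic is 0: all differences share the same sign.
    extreme = sum(1 for w in stats_under_h0 if w == 0)
    return extreme / len(stats_under_h0)

p_by_n = {n: smallest_two_sided_p(n) for n in (3, 5, 6)}
for n, p_min in p_by_n.items():
    print(n, p_min)
# prints:
# 3 0.25
# 5 0.0625
# 6 0.03125
```

With this reasoning at least six pairs are needed before the exact two-sided test can reject at the 0.05 level.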

<div class="alert alert-block alert-warning"><b>Kruskal-Wallis H Test</b></div>

*Tests whether the distributions of two or more independent samples are equal or not.*

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.

Interpretation:

- H0: the distributions of all samples are equal.
- H1: the distributions of one or more samples are not equal.

In [41]:
from scipy.stats import kruskal

data1 = [[10, 20, 30],[6,  9,  17]]
data2 = [[0,0,0],[5,8,12]]
data_col = [data1,data2]

for data1, data2 in data_col:
    stat, p = kruskal(data1, data2)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')

stat=2.333, p=0.127
Probably the same distribution
stat=4.355, p=0.037
Probably different distributions
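
The H statistic is computed from the rank sums of the pooled data, with a correction when values are tied. The following sketch reproduces the two statistics above in pure Python; the helper names `average_ranks` and `kruskal_h` are assumptions for this example, and the p-value is left to `kruskal`.

```python
def average_ranks(values):
    """Rank values starting from 1; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        # Squared rank sum of this group, divided by the group size.
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction

h_first = kruskal_h([10, 20, 30], [6, 9, 17])
h_second = kruskal_h([0, 0, 0], [5, 8, 12])
print('h_first=%.3f, h_second=%.3f' % (h_first, h_second))
```

The second pair contains three tied zeros, which is why the tie correction matters for matching the value printed above.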


<div class="alert alert-block alert-warning"><b>Friedman Test</b></div>

*Tests whether the distributions of two or more paired samples are equal or not.*

Assumptions:

- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
- Observations across each sample are paired.

Interpretation:

- H0: the distributions of all samples are equal.
- H1: the distributions of one or more samples are not equal.

In [42]:
from scipy.stats import friedmanchisquare
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = friedmanchisquare(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')

stat=0.800, p=0.670
Probably the same distribution
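
The Friedman statistic ranks the paired observations within each block (here: each position across the three samples) and compares the rank sums of the samples. This minimal sketch assumes there are no ties within a block, which holds for the data above; the helper name `friedman_rank_sums_and_statistic` is made up for the example.

```python
def friedman_rank_sums_and_statistic(*samples):
    """Rank sums per sample and the Friedman chi-squared statistic (no ties)."""
    k, n = len(samples), len(samples[0])
    rank_sums = [0] * k
    for block in zip(*samples):
        # Rank the k paired observations within this block (1 = smallest).
        order = sorted(range(k), key=lambda j: block[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    stat = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return rank_sums, stat

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
rank_sums, stat = friedman_rank_sums_and_statistic(data1, data2, data3)
print(rank_sums)
print('stat=%.3f' % stat)
```

The rank sums are close to each other (18, 22 and 20), which is why the statistic is small and H0 is not rejected.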


<div class="alert alert-block alert-warning"><b>Summary:</b></div>

- Normality Tests - checking whether data has Gaussian Distribution or not: Shapiro-Wilk Test, D’Agostino’s K^2 Test, Anderson-Darling Test
- Correlation Tests - checking whether two samples are somehow related to each other: Pearson’s Correlation Coefficient, Spearman’s Rank Correlation, Kendall’s Rank Correlation, Chi-Squared Test
- Stationary Tests - checking whether a time series is stationary or not: Augmented Dickey-Fuller Unit Root Test, Kwiatkowski-Phillips-Schmidt-Shin
- Parametric Statistical Hypothesis Tests - comparing the means of samples: Student’s t-test, Paired Student’s t-test, Analysis of Variance Test (ANOVA)
- Nonparametric Statistical Hypothesis Tests - testing whether the distributions of samples are equal or not: Mann-Whitney U Test, Wilcoxon Signed-Rank Test, Kruskal-Wallis H Test, Friedman Test

*This is quite a good reference for statistical testing. Another thing is when to use which test and how to interpret it correctly :).*

***The End***