# Hypothesis testing in Python

blog : https://chiricutosnpython.blogspot.com/2020/09/hypothesis-testing-scipy.html

> All of this is based on lecture notes by Dennis Hartmann (University of Washington)
> and the Objective Analysis class organized by Jinho Yoon (GIST)

Before we move on: the data used here are the Nino34 and SOI indices.
For more details, please refer to the links below.

https://www.ncdc.noaa.gov/teleconnections/enso/indicators/sst/
<br>
https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/

In [1]:
# libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# load data
# https://psl.noaa.gov/data/correlation/nina34.data
# https://psl.noaa.gov/data/correlation/soi.data

nino34 = pd.read_csv('./data/nino34.csv',
        names=['year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
        index_col='year')
soi    = pd.read_csv('./data/soi.csv',
        names=['year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
        index_col='year')

# remove missing values
# -99.99 marks a missing value in both files.
# there are actually many other ways to deal with missing values;
# here, we simply drop every year (row) that has at least one missing month

nino34 = nino34.replace(-99.99, np.nan).dropna().reset_index(drop=True)
soi    = soi.replace(-99.99, np.nan).dropna().reset_index(drop=True)

## 0. Hypothesis testing

- When using a statistical significance test, there are five basic steps that should be followed in order.
    
    1. State the significance level
    
    2. State the null hypothesis ${H_0}$ and its alternative ${H_1}$
    
    3. State the statistic used
    
    4. State the critical region
    
    5. Evaluate the statistic and state the conclusion.

<br>    
<br>
- For example, in a sample of 10 winters, the mean January temperature is 42 degF and the standard deviation is 5 degF. What are the 95% confidence limits on the true mean January temperature?
<br><br>   
1. Desired confidence level is 95%.
<br>
2. ${H_0}$: the true mean is between 42 ${\pm}$ ${\Delta}$T
   <br>&emsp;${H_1}$: it is outside this region
<br>
3. We will use ${t}$ statistic.
<br>
4. The critical region is ${|t| < t_{0.025}}$, which for ${n=N-1=9}$ degrees of freedom is ${|t|<2.26}$. Stated in terms of confidence limits on
<br>&emsp;the mean we have 
$${\bar{x}-2.26\frac{s}{\sqrt{N-1}} < \mu < \bar{x}+2.26\frac{s}{\sqrt{N-1}}}$$<br>
5. Putting in the numbers we get ${38.23 < \mu < 45.77}$. We have 95% certainty that the true mean lies between these <br>&emsp;values.    
<br><br>
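
The arithmetic in the example above can be reproduced numerically. This is a minimal sketch that assumes `scipy.stats.t.ppf` for the critical value; the variable names are illustrative and do not come from the lecture notes.

```python
import numpy as np
from scipy.stats import t

n_winters = 10      # sample size N
mean_jan = 42.0     # sample mean January temperature (degF)
std_jan = 5.0       # sample standard deviation (degF)
confidence = 0.95   # desired confidence level

# two-sided critical value of t with N-1 degrees of freedom
dof = n_winters - 1
t_crit = t.ppf(1 - (1 - confidence) / 2, dof)

# confidence limits, using s / sqrt(N-1) as in the formula above
half_width = t_crit * std_jan / np.sqrt(n_winters - 1)
lower, upper = mean_jan - half_width, mean_jan + half_width

print(f"t critical value : {t_crit:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

The printed limits should agree with step 5 to two decimals.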
 ________________________________________________________________________________________________

## 1. Normality Tests

- Many geophysical variables are known to be approximately normally distributed. However, the appropriate statistical test differs depending on whether the data follow a normal distribution or not. The data should therefore be checked for normality first.

<br>
<br>
Here we use 3 normality tests.
<br><br>
1. ${Shapiro-Wilk}$ ${Test}$
<br>
2. ${D'Agostino's}$ ${K^2}$ ${Test}$
<br>
3. ${Anderson-Darling}$ ${Test}$
<br><br>   
    
- Assumptions
    * All of them assume that the samples in the data are **I**ndependent and **I**dentically **D**istributed (IID).
<br><br>
- Hypothesis
    * ${H_0}$: the data follow a normal distribution
    * ${H_1}$: the data do not follow a normal distribution


**!!! [Notice] !!!**

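- **shapiro** accepts multi-dimensional data such as a pandas DataFrame and, by default, treats all values as a single sample. **normaltest (D'Agostino's K^2 test)** works along one axis (`axis=0` by default), so to test the entire data as one sample we flatten it to 1 dimension first. We pass the same flattened array to **anderson**.
<br>
- Also, **anderson** returns the test statistic together with critical values at several significance levels, while the others return a p-value.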
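
The difference in input shape and return values can be sketched as follows. The array below is synthetic (random normal numbers standing in for a 70-year by 12-month table), so the printed numbers say nothing about nino34 or soi.

```python
import numpy as np
from scipy.stats import shapiro, normaltest, anderson

rng = np.random.default_rng(0)
data = rng.normal(size=(70, 12))   # synthetic stand-in: 70 years x 12 months

# normaltest on 2-D input works column by column (axis=0 by default)
stat_cols, p_cols = normaltest(data)
print(p_cols.shape)                # one p-value per column

# flattening first tests all values as a single sample
flat = data.ravel()
stat_all, p_all = normaltest(flat)
stat_sw, p_sw = shapiro(flat)

# anderson returns a statistic plus critical values, not a p-value
result = anderson(flat, dist='norm')
for level, crit in zip(result.significance_level, result.critical_values):
    verdict = "reject" if result.statistic > crit else "fail to reject"
    print(f"{level:4.1f}% : critical value {crit:.3f} -> {verdict} H0")
```

The last entry of `critical_values` belongs to the 1% significance level, which is why index 4 is used for the 99% confidence level.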

In [2]:
from scipy.stats import shapiro, normaltest, anderson

stat_shapiro,  p_shapiro  = shapiro(nino34)
stat_agostino, p_agostino = normaltest(nino34.values.ravel())
results_anderson = anderson(nino34.values.ravel())

print(f"nino34's p-value from Shapiro-Wilk test : {p_shapiro:.3f}")
print(f"nino34's p-value from D'Agostino's K^2 test : {p_agostino:.3f}")
print(f"from Anderson-Darling test, nino34's statistic is {results_anderson[0]:.3f} \
and critical value for 99% is {results_anderson[1][4]:.3f}")

nino34's p-value from Shapiro-Wilk test : 0.013
nino34's p-value from D'Agostino's K^2 test : 0.011
from Anderson-Darling test, nino34's statistic is 0.758 and critical value for 99% is 1.087


In [3]:
stat_shapiro,  p_shapiro  = shapiro(soi)
stat_agostino, p_agostino = normaltest(soi.values.ravel())
results_anderson = anderson(soi.values.ravel())

print(f"soi's p-value from Shapiro-Wilk test : {p_shapiro:.3f}")
print(f"soi's p-value from D'Agostino's K^2 test : {p_agostino:.3f}")
print(f"from Anderson-Darling test, soi's statistic is {results_anderson[0]:.3f} \
and critical value for 99% is {results_anderson[1][4]:.3f}")

soi's p-value from Shapiro-Wilk test : 0.003
soi's p-value from D'Agostino's K^2 test : 0.002
from Anderson-Darling test, soi's statistic is 0.781 and critical value for 99% is 1.087


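- The results show that nino34's p-values in the first two tests are greater than 0.01, while soi's are smaller. 
- In other words, at the 99% confidence level the Shapiro-Wilk and D'Agostino tests do not reject normality for nino34, but they do reject it for soi. The Anderson-Darling statistic is below the 1% critical value for both data sets, so that test rejects normality for neither. 
- If the data are not normally distributed, **non-parametric** statistical hypothesis tests should be used. 
- By that rule soi, not nino34, would be the candidate for non-parametric tests. The sections below are for practice only: they use soi for the **parametric** statistical hypothesis tests and nino34 for the **non-parametric** statistical hypothesis tests, so the parametric results for soi should be read with that caveat in mind.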
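
Reading the printed p-values against a significance level of 0.01 can be sketched as below. The p-values are copied from the output above; the `decide` helper is a hypothetical name introduced only for this illustration.

```python
alpha = 0.01  # significance level for a 99% confidence level

# p-values copied from the printed output above
p_values = {
    ("nino34", "Shapiro-Wilk"): 0.013,
    ("nino34", "D'Agostino K^2"): 0.011,
    ("soi", "Shapiro-Wilk"): 0.003,
    ("soi", "D'Agostino K^2"): 0.002,
}

def decide(p_value, alpha):
    """Return the decision for H0 (the data follow a normal distribution)."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decisions = {key: decide(p, alpha) for key, p in p_values.items()}
for (name, test), decision in decisions.items():
    print(f"{name:7s} {test:15s} p={p_values[(name, test)]:.3f} -> {decision}")
```

A p-value below the significance level rejects normality; a p-value above it only means that normality could not be rejected, not that it was shown.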
    
 <br>
 ________________________________________________________________________________________________

## 2. Parametric Statistical Hypothesis Tests

- Which method to apply depends on the hypothesis being tested. Here, we introduce the 3 most common methods. 
    <br><br>
    1. ${Student's}$ ${t-test}$
    ; Tests whether the means of two independent samples are significantly different.
    
        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample are normally distributed.
            * The samples have the same variance.

        - Hypothesis

            * ${H_0}$: the means of the samples are equal.
            * ${H_1}$: the means of the samples are not equal.
    <br><br><br>    
    2. ${Paired}$ ${Student's}$ ${t-test}$
    ; Tests whether the means of two paired samples are significantly different.
    
        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample are normally distributed.
            * The samples have the same variance.
            * The observations in the two samples are paired.
       
        - Hypothesis

            * ${H_0}$: the means of the samples are equal.
            * ${H_1}$: the means of the samples are not equal.
    <br><br><br>    
    3. ${Analysis}$ ${of}$ ${Variance}$ ${Test}$ ${(ANOVA)}$
    ; Tests whether the means of two or more independent samples are significantly different.

        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample are normally distributed.
            * The samples have the same variance.
       
        - Hypothesis

            * ${H_0}$: the means of the samples are equal.
            * ${H_1}$: the means of the samples are not equal.
    <br><br><br>
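
To make the first two tests concrete, the sketch below computes Student's t statistic by hand from the pooled variance and compares it with `ttest_ind`, then checks that the paired test equals a one-sample t-test on the pairwise differences. The two samples are synthetic random numbers, not the soi data, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel, ttest_1samp

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=60)   # synthetic sample 1
b = rng.normal(loc=0.3, scale=1.0, size=60)   # synthetic sample 2

# Student's t by hand: pooled variance, then the t statistic
n_a, n_b = len(a), len(b)
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_scipy, p_scipy = ttest_ind(a, b)            # equal_var=True by default

# paired t-test = one-sample t-test on the pairwise differences
t_paired, p_paired = ttest_rel(a, b)
t_diff, p_diff = ttest_1samp(a - b, 0.0)

print(f"independent : manual {t_manual:.4f}, scipy {t_scipy:.4f}")
print(f"paired      : ttest_rel {t_paired:.4f}, ttest_1samp {t_diff:.4f}")
```

The paired test only looks at the differences within each pair, which is why it needs samples of equal length in matching order.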

In [4]:
from scipy.stats import ttest_ind, ttest_rel, f_oneway

stat_student, p_student = ttest_ind(soi['Mar'], soi['Apr'])
stat_paired,  p_paired  = ttest_rel(soi['Mar'], soi['Sep'])
stat_anova,   p_anova   = f_oneway(soi['Mar'], soi['Jun'], soi['Sep'])

print(f"soi's p-value from student t test : {p_student:.3f}")
print(f"soi's p-value from paired student t test : {p_paired:.3f}")
print(f"soi's p-value from anova test : {p_anova:.3f}")

soi's p-value from student t test : 0.248
soi's p-value from paired student t test : 0.033
soi's p-value from anova test : 0.109


## 3. Non-Parametric Statistical Hypothesis Tests

- Here, we introduce 4 different methods. 
    <br><br>
    1. ${Mann-Whitney}$ ${U}$ ${test}$
    ; Tests whether the distributions of two independent samples are equal or not.
    
        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample can be ranked.

        - Hypothesis
            * ${H_0}$: the distributions of the samples are equal.
            * ${H_1}$: the distributions of the samples are not equal.
    <br><br><br>      
    2. ${Wilcoxon}$ ${Signed-Rank}$ ${test}$
    ; Tests whether the distributions of two paired samples are equal or not.
    
        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample can be ranked.
            * The observations in the two samples are paired.
       
        - Hypothesis
            * ${H_0}$: the distributions of the samples are equal.
            * ${H_1}$: the distributions of the samples are not equal.
    <br><br><br>     
    3. ${Kruskal-Wallis}$ ${H}$ ${Test}$
    ; Tests whether the distributions of two or more independent samples are equal or not.
    
        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample can be ranked.
       
        - Hypothesis
            * ${H_0}$: the distributions of all samples are equal.
            * ${H_1}$: the distributions of one or more samples are not equal.
    <br><br><br>   
    4. ${Friedman}$ ${Test}$
    ; Tests whether the distributions of three or more paired samples are equal or not (scipy's `friedmanchisquare` requires at least three samples).
    
        - Assumptions
            * Observations in each sample are independent and identically distributed (IID).
            * Observations in each sample can be ranked.
            * The observations in the samples are paired.
       
        - Hypothesis
            * ${H_0}$: the distributions of all samples are equal.
            * ${H_1}$: the distributions of one or more samples are not equal.
    <br><br><br>        
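
Paired and unpaired tests can give very different answers on the same numbers. The sketch below uses synthetic data (an assumption made for illustration, not the nino34 record): the second sample is the first one plus a small, consistent shift, so the two overall distributions overlap heavily while every pairwise difference has the same sign.

```python
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon

rng = np.random.default_rng(2)
first = rng.normal(loc=0.0, scale=1.0, size=70)          # synthetic "month 1"
second = first + 0.1 + rng.normal(scale=0.02, size=70)   # same years, small shift

# unpaired: compares the two distributions as a whole
stat_mann, p_mann = mannwhitneyu(first, second, alternative='two-sided')

# paired: ranks the differences within each pair
stat_wilc, p_wilc = wilcoxon(first, second)

print(f"Mann-Whitney U (unpaired) p-value : {p_mann:.3f}")
print(f"Wilcoxon signed-rank (paired) p-value : {p_wilc:.3g}")
```

Here the unpaired test does not detect the shift, while the paired test does. Which test is appropriate depends on whether the observations are really paired, for example values from the same year.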

In [5]:
from scipy.stats import mannwhitneyu, wilcoxon, kruskal, friedmanchisquare

stat_mann, p_mann = mannwhitneyu(nino34['Jan'], nino34['Feb'])
stat_wilc, p_wilc = wilcoxon(nino34['Feb'], nino34['Mar'])
stat_krus, p_krus = kruskal(nino34['Jan'], nino34['Feb'], nino34['Mar'])
stat_frie, p_frie = friedmanchisquare(nino34['Jan'], nino34['Feb'], nino34['Mar'])

print(f"nino34's p-value from mann-whitney test : {p_mann:.3f}")
print(f"nino34's p-value from wilcoxon signed-rank test : {p_wilc:.3f}")
print(f"nino34's p-value from kruskal-wallis H test : {p_krus:.3f}")
print(f"nino34's p-value from Friedman test : {p_frie:.3f}")

nino34's p-value from mann-whitney test : 0.106
nino34's p-value from wilcoxon signed-rank test : 0.000
nino34's p-value from kruskal-wallis H test : 0.000
nino34's p-value from Friedman test : 0.000
