# Exercise 8: Hypothesis Testing 



## Table of Contents

* Parametric tests
    * One-sample t-test
    * Paired t-test
    * Two-sample t-test
    * ANOVA
* Testing parametric assumptions
    * Verifying assumptions
    * Modifying data 
* Non-parametric tests
    * Signed-rank tests
    * Bootstrapping and estimation plots

## Setup

In [2]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import scipy.stats as sps
import os
import statsmodels.formula.api as smf
import statsmodels.api as sm

%matplotlib inline
# For retina displays only
%config InlineBackend.figure_format = 'retina'

## Parametric tests

### One-sample t-test

In [37]:
# Generating data 

Y = np.random.normal(loc=3, scale=2, size=100)

In [39]:
# Exercise: What p-value do you expect if popmean = 3? What happens as popmean moves away from 3?
mu = 2.5
sps.ttest_1samp(Y, popmean=mu)

Ttest_1sampResult(statistic=2.8094140624415154, pvalue=0.005981223770258623)
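
As a minimal sketch of what `ttest_1samp` computes, the statistic and two-sided p-value can be rebuilt by hand from the sample mean and its standard error. The seeded `default_rng` generator is an assumption added for reproducibility; the cells above use unseeded `np.random.normal`, so the numbers will not match the output shown.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
Y = rng.normal(loc=3, scale=2, size=100)
mu = 2.5

# t = (sample mean - hypothesised mean) / standard error of the mean
n = Y.size
t_manual = (Y.mean() - mu) / (Y.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * sps.t.sf(np.abs(t_manual), df=n - 1)

result = sps.ttest_1samp(Y, popmean=mu)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```

Changing `mu` and rerunning shows how the p-value responds as the hypothesised mean moves relative to the true mean of 3.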

In [40]:
# Exercise: find the p-value using statsmodels by thinking of the test as a model
df_one_samp = pd.DataFrame(data = {'Y': Y - mu})
one_samp_model = smf.ols('Y ~ 1', data=df_one_samp)
one_samp_results = one_samp_model.fit()
one_samp_results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.0 |
| Model: | OLS | Adj. R-squared: | 0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Fri, 30 Oct 2020 | Prob (F-statistic): | |
| Time: | 12:08:40 | Log-Likelihood: | -214.46 |
| No. Observations: | 100 | AIC: | 430.9 |
| Df Residuals: | 99 | BIC: | 433.5 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.5834 | 0.208 | 2.809 | 0.006 | 0.171 | 0.995 |

| | | | |
|---|---|---|---|
| Omnibus: | 0.774 | Durbin-Watson: | 1.925 |
| Prob(Omnibus): | 0.679 | Jarque-Bera (JB): | 0.815 |
| Skew: | -0.024 | Prob(JB): | 0.665 |
| Kurtosis: | 2.56 | Cond. No. | 1.0 |


### Paired t-test

In [42]:
# Generating data 
N = 100
X1 = np.random.normal(loc=0, scale=2, size=N)
X2 = X1 + np.random.normal(loc=2, scale=1, size=N)


In [43]:
sps.ttest_rel(X1, X2)

Ttest_relResult(statistic=-20.602944036083002, pvalue=1.3992129116501313e-37)

In [None]:
# Exercise: Use ttest_1samp and show that you get the same result if you do a one-sample t-test of the difference
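
One possible approach, sketched under the assumption of freshly generated (seeded) data rather than the arrays from the cells above: a paired t-test is a one-sample t-test of the pairwise differences against zero, so the two calls should agree.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
N = 100
X1 = rng.normal(loc=0, scale=2, size=N)
X2 = X1 + rng.normal(loc=2, scale=1, size=N)

# Paired test on the two columns
paired = sps.ttest_rel(X1, X2)

# One-sample test of the differences against a population mean of 0
one_samp = sps.ttest_1samp(X1 - X2, popmean=0)

print(paired.statistic, paired.pvalue)
print(one_samp.statistic, one_samp.pvalue)
```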






### Two-sample t-test 

In [51]:
# Generating data 
N = 100
effect_size = 1 
std_dev = 2
X1 = np.random.normal(loc=0, scale=std_dev, size=N)
X2 = np.random.normal(loc=effect_size, scale=std_dev, size=N)

In [52]:
sps.ttest_ind(X1, X2)

Ttest_indResult(statistic=-2.6370898479597686, pvalue=0.009025793791669342)

In [55]:
# Exercise: find the p-value using statsmodels by thinking of the test as a model
df_two_samp = pd.DataFrame(data={'X1': X1, 'X2': X2, 'Y': X2 - X1})
two_samp_model = smf.ols('Y ~ 1', data=df_two_samp)
two_samp_results = two_samp_model.fit()
two_samp_results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | -0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Fri, 30 Oct 2020 | Prob (F-statistic): | |
| Time: | 12:25:10 | Log-Likelihood: | -240.62 |
| No. Observations: | 100 | AIC: | 483.2 |
| Df Residuals: | 99 | BIC: | 485.8 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.7591 | 0.270 | 2.814 | 0.006 | 0.224 | 1.294 |

| | | | |
|---|---|---|---|
| Omnibus: | 2.78 | Durbin-Watson: | 1.899 |
| Prob(Omnibus): | 0.249 | Jarque-Bera (JB): | 2.778 |
| Skew: | 0.389 | Prob(JB): | 0.249 |
| Kurtosis: | 2.749 | Cond. No. | 1.0 |
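
Note that the intercept-only fit on `X2 - X1` is the model form of a paired test, which is why its t-statistic (2.814) differs in magnitude from the `ttest_ind` value (-2.637). A sketch of a model that should reproduce the independent two-sample test instead stacks both samples in long format and regresses on a 0/1 group indicator. The column names `Y` and `group` and the seeded generator are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats as sps
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
N = 100
X1 = rng.normal(loc=0, scale=2, size=N)
X2 = rng.normal(loc=1, scale=2, size=N)

# Long format: one row per observation, with a 0/1 indicator for group membership
df_long = pd.DataFrame({
    'Y': np.concatenate([X1, X2]),
    'group': np.repeat([0, 1], N),
})

# The slope on `group` estimates mean(X2) - mean(X1) with a pooled variance
fit = smf.ols('Y ~ group', data=df_long).fit()
ind = sps.ttest_ind(X1, X2)

print(fit.tvalues['group'], fit.pvalues['group'])
print(ind.statistic, ind.pvalue)
```

The t-statistics have opposite signs because `ttest_ind(X1, X2)` tests `X1 - X2` while the slope measures `X2 - X1`; the p-values should agree.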


### ANOVA

In [None]:
# Generating data 
N = 100
effect_size_2 = 1
effect_size_3 = -3
std_dev = 2
X1 = np.random.normal(loc=0, scale=std_dev, size=N)
X2 = np.random.normal(loc=effect_size_2, scale=std_dev, size=N)
X3 = np.random.normal(loc=effect_size_3, scale=std_dev, size=N)

In [None]:
sps.f_oneway(X1, X2, X3)

In [None]:
# Exercise (tougher): find the p-value using statsmodels by thinking of the test as a model



## Testing parametric assumptions

In [None]:
# Exercise: Generate some non-Gaussian data. Make sure you know how to specify the mean of the
# population and your sample size
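
Two possible choices of skewed distribution, sketched here as an illustration: for the exponential the population mean equals the `scale` parameter, and for the log-normal the population mean is `exp(mu + sigma**2 / 2)`, so `mu` can be solved for a target mean. The names `pop_mean`, `Y_exp` and `Y_lognorm` are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
pop_mean = 3.0  # target population mean
n = 100         # sample size

# Exponential: the population mean equals the scale parameter
Y_exp = rng.exponential(scale=pop_mean, size=n)

# Log-normal: the population mean is exp(mu + sigma**2 / 2), so solve for mu
sigma = 1.0
mu_log = np.log(pop_mean) - sigma**2 / 2
Y_lognorm = rng.lognormal(mean=mu_log, sigma=sigma, size=n)

print(Y_exp.mean(), Y_lognorm.mean())
```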










In [None]:
# Exercise: Test it ASSUMING it meets assumptions of parametric tests. Do you get the correct result?
# 1. Play around with sample size, variances, etc. 
# 2. If you are feeling really into it, draw a graph showing what happens as one population gets 
# further and further from normal. 
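
A rough sketch of one way to explore this: simulate many samples from a skewed population whose true mean is known, test against that true mean, and count how often the t-test rejects. If the test behaved as advertised, the rejection rate would sit near `alpha`. The exponential population, the sample sizes and the helper name `false_positive_rate` are all assumptions chosen for illustration.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
pop_mean = 3.0
alpha = 0.05
n_sims = 2000

def false_positive_rate(n):
    """Fraction of simulated samples of size n that reject the true mean."""
    # Each row is one simulated sample from an exponential population
    samples = rng.exponential(scale=pop_mean, size=(n_sims, n))
    pvals = sps.ttest_1samp(samples, popmean=pop_mean, axis=1).pvalue
    return np.mean(pvals < alpha)

rates = {n: false_positive_rate(n) for n in (5, 20, 100, 500)}
print(rates)
```

Plotting `rates` against sample size, or repeating the loop over increasingly skewed populations, gives the graph suggested in step 2.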







In [None]:
# Exercise: Look up one of the following and apply it to your data: 
# 
# Data are normally distributed
#    D’Agostino-Pearson
#    Shapiro-Wilk
#    Kolmogorov-Smirnov
#    Lilliefors Test
# Equal variance between groups
#    Levene’s Test
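
For orientation, a sketch of how each listed test can be called, assuming SciPy and statsmodels are installed. The sample arrays are made up for the example. Each call returns a statistic and a p-value; a small p-value is evidence against the assumption being tested.

```python
import numpy as np
import scipy.stats as sps
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
Y_skewed = rng.exponential(scale=3, size=100)
Y_normal = rng.normal(loc=3, scale=2, size=100)

# Normality tests
dagostino = sps.normaltest(Y_skewed)   # D'Agostino-Pearson
shapiro_res = sps.shapiro(Y_skewed)    # Shapiro-Wilk

# Kolmogorov-Smirnov against a normal with parameters estimated from the sample.
# Estimating the parameters makes this p-value too lenient; Lilliefors corrects for that.
ks_res = sps.kstest(Y_skewed, 'norm', args=(Y_skewed.mean(), Y_skewed.std(ddof=1)))
lillie_stat, lillie_p = lilliefors(Y_skewed, dist='norm')

# Equal variance between groups
levene_res = sps.levene(Y_skewed, Y_normal)

for name, res in [('normaltest', dagostino), ('shapiro', shapiro_res),
                  ('kstest', ks_res), ('lilliefors', (lillie_stat, lillie_p)),
                  ('levene', levene_res)]:
    print(name, res[0], res[1])
```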





## Non-parametric tests

### Signed-rank tests
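
As a starting point, here is a minimal sketch using `scipy.stats.wilcoxon`, the Wilcoxon signed-rank test, as the rank-based counterpart of the paired and one-sample t-tests above. The data are regenerated with a seeded generator, which is an assumption for reproducibility.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
N = 100
X1 = rng.normal(loc=0, scale=2, size=N)
X2 = X1 + rng.normal(loc=2, scale=1, size=N)

# Paired version: tests whether the differences X1 - X2 are symmetric about zero
paired = sps.wilcoxon(X1, X2)

# Equivalent call on the differences themselves
on_diff = sps.wilcoxon(X1 - X2)

# One-sample version: subtract the hypothesised location first
Y = rng.normal(loc=3, scale=2, size=N)
one_samp = sps.wilcoxon(Y - 2.5)

print(paired.statistic, paired.pvalue)
print(one_samp.statistic, one_samp.pvalue)
```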

### Bootstrapping and estimation plots
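
A minimal sketch of a percentile bootstrap for the difference in means between two independent samples, using only NumPy. The number of resamples, the 95% level and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility
N = 100
X1 = rng.normal(loc=0, scale=2, size=N)
X2 = rng.normal(loc=1, scale=2, size=N)

n_boot = 5000

# Resample each group with replacement; each row is one bootstrap replicate
boot_X1 = rng.choice(X1, size=(n_boot, N), replace=True)
boot_X2 = rng.choice(X2, size=(n_boot, N), replace=True)
boot_diffs = boot_X2.mean(axis=1) - boot_X1.mean(axis=1)

# Observed effect and 95% percentile confidence interval
observed = X2.mean() - X1.mean()
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(observed, (ci_low, ci_high))
```

An estimation plot shows the raw data next to this bootstrap distribution and its interval; `plt.hist(boot_diffs)` gives a quick first look at the distribution.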