## Hypothesize tutorial

This notebook shows how to use Hypothesize with a few common statistical designs. Many other functions in the library also work for these designs; the examples here are meant as a starting point.



In [0]:
!pip install hypothesize

In [0]:
import pandas as pd  # needed if you load your own data with pd.read_csv
from hypothesize.utilities import create_example_data

### How to compare two groups

#### Load data from a CSV or create some random data

In [3]:
#df=pd.read_csv("/home/allan/two_groups_data.csv")
df=create_example_data(design_values=2)

df.head()

| | cell_1 | cell_2 |
|---|---|---|
| 0 | 0.608798 | 0.582123 |
| 1 | 0.622826 | 0.854637 |
| 2 | 0.264165 | 0.655077 |
| 3 | 0.794185 | 0.37808 |
| 4 | 0.907687 | 0.468066 |


#### Import the desired function and pass in the data for each group
- This example uses the bootstrap-t method with 20% trimmed means
- The output is a dictionary containing the results (95% confidence interval, p-value, test statistic, etc.)
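To make "20% trimmed mean" concrete, here is a minimal pure-Python sketch. The helper `trimmed_mean` and the sample data are hypothetical and only illustrate the idea; this is not Hypothesize's own implementation.

```python
import math

def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    g = math.floor(proportion * n)  # number of observations cut from each tail
    kept = ordered[g:n - g]
    return sum(kept) / len(kept)

sample = [1, 2, 3, 4, 100]          # hypothetical data with one outlier
print(sum(sample) / len(sample))    # ordinary mean: 22.0
print(trimmed_mean(sample))         # 20% trimmed mean: 3.0
```

The single outlier pulls the ordinary mean far from the bulk of the data, while the trimmed mean ignores it.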

In [4]:
from hypothesize.compare_groups_with_single_factor import yuenbt

# Compare the two groups; the result is a dictionary
results = yuenbt(df.cell_1, df.cell_2)

# 95% confidence interval
results['ci']

[-0.09190770159731171, 0.25635146839797]
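One common way to read such an interval is to check whether it contains zero. The sketch below uses a hypothetical helper, `excludes_zero`, on the interval printed above; it assumes the interval describes the difference between the two groups.

```python
def excludes_zero(ci):
    """Return True when both ends of the interval lie on the same side of zero."""
    low, high = ci
    return low > 0 or high < 0

ci = [-0.09190770159731171, 0.25635146839797]  # interval printed above
print(excludes_zero(ci))  # False
```

Here the interval spans zero, so under that reading the random example data give no evidence of a difference at the 0.05 level.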

---

### How to compare three groups

#### Load data from a CSV or create some random data

In [5]:
import pandas as pd

#df=pd.read_csv("/home/allan/one_way_data.csv")
df=create_example_data(design_values=3)

df.head()

| | cell_1 | cell_2 | cell_3 |
|---|---|---|---|
| 0 | 0.265109 | 0.088914 | 0.480468 |
| 1 | 0.119988 | 0.482773 | 0.079476 |
| 2 | 0.109533 | 0.521834 | 0.762804 |
| 3 | 0.152454 | 0.177596 | 0.741767 |
| 4 | 0.355403 | 0.520991 | 0.380219 |


#### Import the desired functions and pass in the inputs
- One approach is to use a set of linear contrasts that test all pairwise comparisons
- The contrasts are then tested with the bootstrap-t method and 20% trimmed means
- Confidence intervals (CIs) are adjusted to control the family-wise error rate (FWE)
- All pairwise contrasts can be created automatically with the `con1way` function
- The results are a dictionary of DataFrames containing various statistics (p-value, CIs, standard error, test statistic, etc.)
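To illustrate what "all pairwise contrasts" means for three groups, here is a small sketch. The helper `pairwise_contrasts` is hypothetical; `con1way` may lay out its matrix differently (for example, with one contrast per column), so treat this only as a picture of the idea.

```python
from itertools import combinations

def pairwise_contrasts(num_groups):
    """One contrast per pair of groups: +1 for the first group, -1 for the second."""
    contrasts = []
    for first, second in combinations(range(num_groups), 2):
        coefficients = [0] * num_groups
        coefficients[first] = 1
        coefficients[second] = -1
        contrasts.append(coefficients)
    return contrasts

for contrast in pairwise_contrasts(3):
    print(contrast)
# [1, -1, 0]
# [1, 0, -1]
# [0, 1, -1]
```

Each contrast sums to zero, and J groups give J(J-1)/2 pairwise contrasts, so three groups give three.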

In [0]:
from hypothesize.compare_groups_with_single_factor import linconb
from hypothesize.utilities import con1way

# Test all pairwise contrasts among the three groups
results = linconb(df, con=con1way(3))

In [7]:
results['test']

| | contrast_index | test | se | p_value |
|---|---|---|---|---|
| 0 | 0.0 | 0.417745 | 0.081921 | 0.691152 |
| 1 | 1.0 | -0.043381 | 0.085225 | 0.959933 |
| 2 | 2.0 | -0.501332 | 0.075636 | 0.602671 |


In [8]:
results['psihat']

| | contrast_index | psihat | ci_low | ci_up |
|---|---|---|---|---|
| 0 | 0.0 | 0.034222 | -0.168168 | 0.236612 |
| 1 | 1.0 | -0.003697 | -0.214251 | 0.206857 |
| 2 | 2.0 | -0.037919 | -0.224784 | 0.148946 |


---

### How to compare groups in a factorial design

#### Load data from a CSV or create some random data

In [9]:
import pandas as pd

#df=pd.read_csv("/home/allan/two_way_data.csv")
df=create_example_data(design_values=[2,3])

df.head()

| | cell_1_1 | cell_1_2 | cell_1_3 | cell_2_1 | cell_2_2 | cell_2_3 |
|---|---|---|---|---|---|---|
| 0 | 0.827524 | 0.476294 | 0.13172 | 0.410999 | 0.320306 | 0.370742 |
| 1 | 0.632281 | 0.588368 | 0.662648 | 0.242547 | 0.270292 | 0.700103 |
| 2 | 0.073064 | 0.472047 | 0.053942 | 0.069097 | 0.851596 | 0.962723 |
| 3 | 0.843377 | 0.095956 | 0.617434 | 0.765279 | 0.420772 | 0.993871 |
| 4 | 0.190709 | 0.013727 | 0.255385 | 0.577916 | 0.218277 | 0.125772 |


#### Import the desired function and pass in the data
- This example uses a 2-by-3 design (factor A has two levels, factor B has three); note that `bwmcp` is intended for between-by-within designs
- One approach is to use a set of linear contrasts that test all main effects and interactions
- The contrasts are then tested with the bootstrap-t method and 20% trimmed means
- The results are a dictionary of DataFrames containing various statistics for each factor and for the interaction
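As a rough illustration of how contrasts for main effects and interactions can be built in a 2-by-3 design, the sketch below combines pairwise contrasts with a Kronecker product. The helpers `pairwise_contrasts`, `kron`, and `factorial_contrasts` are hypothetical and are not taken from Hypothesize; the coefficients are assumed to follow the column order `cell_1_1, cell_1_2, ..., cell_2_3`.

```python
from itertools import combinations

def pairwise_contrasts(num_levels):
    """One contrast per pair of levels: +1 for the first level, -1 for the second."""
    contrasts = []
    for first, second in combinations(range(num_levels), 2):
        coefficients = [0] * num_levels
        coefficients[first] = 1
        coefficients[second] = -1
        contrasts.append(coefficients)
    return contrasts

def kron(left, right):
    """Kronecker product of two coefficient vectors."""
    return [a * b for a in left for b in right]

def factorial_contrasts(J, K):
    """Contrasts for factor A, factor B, and the A-by-B interaction in a J-by-K design."""
    ones_j, ones_k = [1] * J, [1] * K
    factor_a = [kron(c, ones_k) for c in pairwise_contrasts(J)]
    factor_b = [kron(ones_j, c) for c in pairwise_contrasts(K)]
    interaction = [kron(a, b)
                   for a in pairwise_contrasts(J)
                   for b in pairwise_contrasts(K)]
    return factor_a, factor_b, interaction

factor_a, factor_b, interaction = factorial_contrasts(2, 3)
print(factor_a)        # [[1, 1, 1, -1, -1, -1]]
print(factor_b[0])     # [1, -1, 0, 1, -1, 0]
print(interaction[0])  # [1, -1, 0, -1, 1, 0]
```

For a 2-by-3 design this gives one contrast for factor A, three for factor B, and three for the interaction.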

In [0]:
from hypothesize.compare_groups_with_two_factors import bwmcp

# J and K are the number of levels of factor A and factor B
results = bwmcp(J=2, K=3, x=df)

In [11]:
results['factor_A']

| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.173207 | 0.128072 | 1.352418 | 1.960025 | 0.15192 |


In [12]:
results['factor_B']

| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0.0 | -0.067502 | 0.120091 | -0.562091 | 2.494032 | 0.559265 |
| 1 | 1.0 | 0.039398 | 0.116328 | 0.33868 | 2.494032 | 0.721202 |
| 2 | 2.0 | 0.1069 | 0.098491 | 1.085373 | 2.494032 | 0.307179 |


In [13]:
results['factor_AB']

| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0.0 | -0.183242 | 0.120091 | -1.525869 | 2.3983 | 0.118531 |
| 1 | 1.0 | -0.163525 | 0.116328 | -1.40572 | 2.3983 | 0.186978 |
| 2 | 2.0 | 0.019718 | 0.098491 | 0.200196 | 2.3983 | 0.833055 |
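One plausible way to read the `test` and `crit_value` columns is the usual two-sided rule: flag a contrast when the absolute test statistic reaches the critical value. This rule is an assumption here, not something the notebook states, and the helper `exceeds_critical` is hypothetical.

```python
# (test statistic, critical value) pairs copied from the factor_AB table above
rows = [(-1.525869, 2.3983), (-1.40572, 2.3983), (0.200196, 2.3983)]

def exceeds_critical(test, crit_value):
    """Two-sided check: is the test statistic at least as extreme as the critical value?"""
    return abs(test) >= crit_value

flags = [exceeds_critical(test, crit) for test, crit in rows]
print(flags)  # [False, False, False]
```

Under that reading, none of the interaction contrasts would be flagged for this random example data.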


---

### How to compute a robust correlation

#### Load data from a CSV or create some random data

In [14]:
import pandas as pd

#df=pd.read_csv("/home/allan/two_groups_data.csv")
df=create_example_data(design_values=2)

df.head()

| | cell_1 | cell_2 |
|---|---|---|
| 0 | 0.402284 | 0.049092 |
| 1 | 0.208278 | 0.550764 |
| 2 | 0.958482 | 0.986547 |
| 3 | 0.957759 | 0.277685 |
| 4 | 0.702811 | 0.749065 |


#### Import the desired function and pass in the data for each group
- One approach is to winsorize the x and y data
- A heteroscedastic method for testing zero correlation is also provided in this package but not shown here
  - See the function `corb`, which uses the percentile bootstrap to compute a (1 - alpha) CI and p-value for any correlation
- The output is a dictionary containing various statistics (the winsorized correlation, winsorized covariance, etc.)
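The idea behind a winsorized correlation can be sketched in pure Python: pull the extreme values in each variable in to the nearest retained value, then compute an ordinary Pearson correlation. The helpers below are hypothetical, the 20% winsorizing amount is an assumption, and this is not `wincor`'s own implementation.

```python
import math

def winsorize(values, proportion=0.2):
    """Replace the lowest and highest `proportion` of values with the nearest kept value."""
    ordered = sorted(values)
    n = len(ordered)
    g = math.floor(proportion * n)  # number of observations winsorized in each tail
    low, high = ordered[g], ordered[n - g - 1]
    return [min(max(v, low), high) for v in values]  # original order is preserved

def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def winsorized_correlation(x, y, proportion=0.2):
    """Pearson correlation computed on the winsorized values."""
    return pearson(winsorize(x, proportion), winsorize(y, proportion))

x = [1, 2, 3, 4, 100]   # hypothetical data with one outlier
print(winsorize(x))     # [2, 2, 3, 4, 4]
```

Unlike trimming, winsorizing keeps every observation but limits how far any single value can pull the estimate.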

In [15]:
from hypothesize.measuring_associations import wincor

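# Winsorized correlation between the two columns; the result is a dictionary
results = wincor(df.cell_1, df.cell_2)

# The winsorized correlation coefficient
results['cor']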

0.2025744763450888