### How to compare two groups

#### Load data from a CSV

In [1]:
import pandas as pd

df = pd.read_csv("/home/allan/two_groups_data.csv")

df.head()

| | Group_1 | Group_2 |
|---|---|---|
| 0 | 0.044652 | 0.90675 |
| 1 | 0.763458 | 0.291555 |
| 2 | 0.71039 | 0.59828 |
| 3 | 0.175208 | 0.268073 |
| 4 | 0.957819 | 0.222688 |


#### Import the desired function and pass in the data for each group
- This example uses the bootstrap-t method with 20% trimmed means
- The output is a dictionary containing the results (95% confidence interval, p_value, test statistic, etc.)
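
To make the "20% trimmed mean" concrete, here is a minimal standard-library sketch. The helper name `trimmed_mean` and the sample scores are hypothetical and used only for illustration; this is not the code that `yuenbt` runs internally.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean of the values left after dropping a proportion from each tail.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    g = math.floor(proportion * n)   # number of values cut from each end
    ordered = sorted(values)
    kept = ordered[g:n - g]          # drop the g smallest and g largest
    return sum(kept) / len(kept)


# Made-up scores with one extreme value.
scores = [2, 3, 4, 5, 6, 7, 8, 9, 10, 200]
print(sum(scores) / len(scores))   # ordinary mean: 25.4
print(trimmed_mean(scores))        # 20% trimmed mean: 6.5
```

The single extreme score pulls the ordinary mean far from the bulk of the data, while the trimmed mean stays near the center. This is the motivation for using trimmed means when comparing groups.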

In [2]:
from hypothesize.compare_groups_with_single_factor import yuenbt

results = yuenbt(df.Group_1, df.Group_2)

print(results['ci'])

[-0.32294783884298334, 0.11774331261500753]
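
One common way to read this interval is to check whether it contains zero. The sketch below does that with the two values printed above; the helper name `interval_excludes_zero` is hypothetical and not part of the library.

```python
def interval_excludes_zero(ci):
    """Return True when zero lies outside the [low, high] interval."""
    low, high = ci
    return not (low <= 0.0 <= high)


# Confidence interval copied from the output above.
ci = [-0.32294783884298334, 0.11774331261500753]
print(interval_excludes_zero(ci))  # False
```

Because the 95% interval spans zero, these data do not indicate a difference between the two groups' trimmed means at the 0.05 level.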


---

### How to compare three groups

#### Load data from a CSV

In [3]:
import pandas as pd

df = pd.read_csv("/home/allan/one_way_data.csv")

df.head()

| | Group_1 | Group_2 | Group_3 |
|---|---|---|---|
| 0 | 0.044652 | 0.90675 | 0.795696 |
| 1 | 0.763458 | 0.291555 | 0.84158 |
| 2 | 0.71039 | 0.59828 | 0.110407 |
| 3 | 0.175208 | 0.268073 | 0.888728 |
| 4 | 0.957819 | 0.222688 | 0.834161 |


#### Import the desired functions and pass in the inputs
- One approach is to use a set of linear contrasts that tests all pairwise comparisons
- Then, the bootstrap-t method and the 20% trimmed mean can be used
- Confidence intervals (CIs) are adjusted to control the familywise error rate (FWE)
- All pairwise contrasts can be created automatically using the `con1way` function
- The results are a dictionary of DataFrames that contain various statistics (p_value, CIs, standard error, test statistics, etc.)
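
To show what "all pairwise contrasts" means for three groups, here is a minimal sketch that builds a contrast matrix by hand. The helper name `pairwise_contrasts` is hypothetical, and the layout (one row per group, one column per contrast) is an assumption about the shape that `con1way` produces, not a copy of its implementation.

```python
from itertools import combinations


def pairwise_contrasts(n_groups):
    """Build a contrast matrix with one column per pair of groups.

    Hypothetical helper: each column holds +1 for the first group of the
    pair, -1 for the second, and 0 elsewhere.
    """
    pairs = list(combinations(range(n_groups), 2))
    matrix = [[0] * len(pairs) for _ in range(n_groups)]
    for col, (first, second) in enumerate(pairs):
        matrix[first][col] = 1
        matrix[second][col] = -1
    return matrix


for row in pairwise_contrasts(3):
    print(row)
# [1, 1, 0]
# [-1, 0, 1]
# [0, -1, -1]
```

Read column by column, the three contrasts are group 1 vs. group 2, group 1 vs. group 3, and group 2 vs. group 3. Every column sums to zero, which is what makes it a contrast.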

In [4]:
from hypothesize.compare_groups_with_single_factor import linconb
from hypothesize.utilities import con1way

results = linconb(df, con=con1way(3))

In [5]:
results['test']

| | contrast_index | test | se | p_value |
|---|---|---|---|---|
| 0 | 0.0 | -0.999892 | 0.102613 | 0.33389 |
| 1 | 1.0 | -0.65811 | 0.09961 | 0.522538 |
| 2 | 2.0 | 0.362839 | 0.102106 | 0.709516 |


In [6]:
results['psihat']

| | contrast_index | psihat | ci_low | ci_up |
|---|---|---|---|---|
| 0 | 0.0 | -0.102602 | -0.36885 | 0.163646 |
| 1 | 1.0 | -0.065554 | -0.324009 | 0.1929 |
| 2 | 2.0 | 0.037048 | -0.227883 | 0.301979 |
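
As a rough illustration of why the intervals are adjusted for multiple comparisons, the sketch below computes the chance of at least one false positive across several unadjusted tests. It assumes the tests are independent, which pairwise contrasts generally are not, so treat the number as a back-of-the-envelope figure. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Three pairwise contrasts, each tested at the 0.05 level without adjustment.
print(round(familywise_error_rate(0.05, 3), 4))  # 0.1426
```

Under that assumption, three unadjusted tests carry roughly a 14% chance of at least one false positive, well above the nominal 5%.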


---

### How to compare groups in a factorial design

#### Load data from a CSV

In [7]:
import pandas as pd

df = pd.read_csv("/home/allan/two_way_data.csv")

df.head()

| | cell_1_1 | cell_1_2 | cell_1_3 | cell_2_1 | cell_2_2 | cell_2_3 |
|---|---|---|---|---|---|---|
| 0 | 0.044652 | 0.90675 | 0.795696 | 0.519486 | 0.333636 | 0.232153 |
| 1 | 0.763458 | 0.291555 | 0.84158 | 0.033989 | 0.511235 | 0.732503 |
| 2 | 0.71039 | 0.59828 | 0.110407 | 0.898072 | 0.769496 | 0.048401 |
| 3 | 0.175208 | 0.268073 | 0.888728 | 0.287442 | 0.100153 | 0.210394 |
| 4 | 0.957819 | 0.222688 | 0.834161 | 0.599158 | 0.655308 | 0.203486 |


#### Import the desired function and pass in the data
- This example uses a 2-by-3 design
- One approach is to use a set of linear contrasts that will test all main effects and interactions
- Then, the bootstrap-t method and the 20% trimmed mean can be used
- CIs are adjusted to control FWE within each family of tests (factor A, factor B, and the interactions)
- The contrasts are created internally using the `con2way` function
- The results are a dictionary of DataFrames that contain various statistics for each factor and the interactions
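
One standard way to build main-effect and interaction contrasts for a J-by-K design is with Kronecker products of the one-way pairwise contrasts. The sketch below does this for the 2-by-3 case. The helper names are hypothetical, and the construction is an assumption about what `con2way` does internally rather than its actual code.

```python
from itertools import combinations


def pairwise_contrasts(n_levels):
    """One column per pair of levels: +1 for the first, -1 for the second."""
    pairs = list(combinations(range(n_levels), 2))
    matrix = [[0] * len(pairs) for _ in range(n_levels)]
    for col, (first, second) in enumerate(pairs):
        matrix[first][col] = 1
        matrix[second][col] = -1
    return matrix


def kron(left, right):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [l_val * r_val for l_val in l_row for r_val in r_row]
        for l_row in left
        for r_row in right
    ]


J, K = 2, 3
ones_j = [[1] for _ in range(J)]   # column vector of ones, length J
ones_k = [[1] for _ in range(K)]   # column vector of ones, length K

con_a = kron(pairwise_contrasts(J), ones_k)                  # factor A
con_b = kron(ones_j, pairwise_contrasts(K))                  # factor B
con_ab = kron(pairwise_contrasts(J), pairwise_contrasts(K))  # interactions

# Rows follow the cell order cell_1_1, cell_1_2, ..., cell_2_3.
print(len(con_a[0]), len(con_b[0]), len(con_ab[0]))  # 1 3 3
```

The column counts (one contrast for factor A, three for factor B, three for the interactions) match the number of rows in the result tables for this 2-by-3 example.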

In [8]:
from hypothesize.compare_groups_with_two_factors import bwmcp

results = bwmcp(J=2, K=3, x=df)

In [9]:
results['factor_A']

| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.039358 | 0.169849 | 0.231726 | 3.458663 | 0.924875 |


In [10]:
results['factor_B']

| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0.0 | -0.104506 | 0.126135 | -0.828529 | 2.613274 | 0.400668 |
| 1 | 1.0 | -0.093136 | 0.151841 | -0.613382 | 2.613274 | 0.540902 |
| 2 | 2.0 | 0.01137 | 0.135392 | 0.083978 | 2.613274 | 0.943239 |


In [11]:
results['factor_AB']

| | con_num | psihat | se | test | crit_value | p_value |
|---|---|---|---|---|---|---|
| 0 | 0.0 | -0.100698 | 0.126135 | -0.798336 | 2.412767 | 0.470785 |
| 1 | 1.0 | -0.037972 | 0.151841 | -0.250078 | 2.412767 | 0.759599 |
| 2 | 2.0 | 0.062726 | 0.135392 | 0.463291 | 2.412767 | 0.612688 |


---

### How to compute a robust correlation

#### Load data from a CSV

In [12]:
import pandas as pd

df = pd.read_csv("/home/allan/two_groups_data.csv")

df.head()

| | Group_1 | Group_2 |
|---|---|---|
| 0 | 0.044652 | 0.90675 |
| 1 | 0.763458 | 0.291555 |
| 2 | 0.71039 | 0.59828 |
| 3 | 0.175208 | 0.268073 |
| 4 | 0.957819 | 0.222688 |


#### Import the desired function and pass in the data for each variable
- One approach is to winsorize the x and y data
- A heteroscedastic method for testing zero correlation is also provided in this package but is not shown here
  - See the function `corb`, which uses the percentile bootstrap to compute a 1-alpha CI and p_value for any correlation
- The output is a dictionary containing various statistics (the winsorized correlation, winsorized covariance, etc.)

In [13]:
from hypothesize.measuring_associations import wincor

results = wincor(df.Group_1, df.Group_2)

results['wcor']

-0.05690314435050796
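
To make the winsorizing step concrete, here is a minimal standard-library sketch of a winsorized correlation. The helper names and the 20% winsorizing proportion are assumptions chosen for illustration; this is not the implementation of `wincor`. It uses only the five rows shown by `df.head()`, so it will not reproduce the value above.

```python
import math


def winsorize(values, proportion=0.2):
    """Pull the lowest and highest values in to the nearest retained value."""
    n = len(values)
    g = math.floor(proportion * n)       # number of values altered per tail
    ordered = sorted(values)
    low, high = ordered[g], ordered[n - g - 1]
    return [min(max(v, low), high) for v in values]


def pearson(x, y):
    """Ordinary Pearson correlation computed from sums of deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def winsorized_correlation(x, y, proportion=0.2):
    """Pearson correlation of the separately winsorized x and y values."""
    return pearson(winsorize(x, proportion), winsorize(y, proportion))


# First five rows of Group_1 and Group_2 from the table above.
x = [0.044652, 0.763458, 0.71039, 0.175208, 0.957819]
y = [0.90675, 0.291555, 0.59828, 0.268073, 0.222688]
print(winsorize(x))
print(winsorized_correlation(x, y))
```

Unlike trimming, winsorizing keeps every observation but replaces the extremes with the nearest retained value, so x and y stay paired row by row.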