
# 🗓Section 15: Hypothesis A/B Testing

- online-ds-pt-100719
- 01/23/20



# Topics 
 

- Workflow: choosing the correct hypothesis test.
    - T-Test & ANOVA Assumptions
    - Parametric vs Non-Parametric Tests

- Hands-On Hypothesis Testing

# Statistical Power

## QUESTIONS/INSIGHTS?

- What is power? 

- Tips/tricks for succinctly articulating hypotheses.

- When to use what test and what assumptions do those tests have?

- Why do 1-sample t-tests only need n > 20 to relax the assumption of normality?


## Resources
**Overviews/Cheatsheets**
- [Codecademy Hypothesis Testing Slideshow](https://drive.google.com/open?id=1p4R2KCErq_iUO-wnfDrGPukTgQDBNoc7)
- [Cheatsheet: Hypothesis Testing with Scipy](https://drive.google.com/open?id=1EY4UCg20HawWlWa50M2tFauoKBQcFFAW)


- [Choosing Between Parametric and Non-Parametric Tests](https://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test)

**Trustworthy Stat References**:
- [Graphpad Prism's Stat Guide](https://www.graphpad.com/guides/prism/8/statistics/index.htm)
- [LAERD Statistics Test Selector](https://statistics.laerd.com/premium/sts/index.php)


# Choosing the Correct Hypothesis Test

## STEP 1: Determine the category/type of test based on your data.

### Q1: What type of data do I have (Numeric or categorical?)

### Q2: How many samples/groups am I comparing?

- Using the answers to the above 2 questions: select the type of test from this table.

| What type of comparison? | Numeric Data | Categorical Data|
| --- | --- | --- |
|Sample vs Known Quantity/Target|1 Sample T-Test| Binomial Test|
|2 Samples | 2 Sample T-Test| Chi-Square|
|More than 2| ANOVA and/or Tukey | Chi-Square|
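
The lookup in the table above can be sketched as a small helper. This is only an illustration: the function name `choose_test` and its argument values (`'numeric'`, `'categorical'`, a group count) are made up for this example and are not part of any library.

```python
def choose_test(data_type, n_groups):
    """Return the test suggested by the table above.

    data_type: 'numeric' or 'categorical'
    n_groups:  1 (sample vs. known quantity/target), 2, or more than 2
    """
    table = {
        'numeric':     {1: '1 Sample T-Test', 2: '2 Sample T-Test', 3: 'ANOVA and/or Tukey'},
        'categorical': {1: 'Binomial Test',   2: 'Chi-Square',      3: 'Chi-Square'},
    }
    if data_type not in table:
        raise ValueError("data_type must be 'numeric' or 'categorical'")
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Anything above 2 groups falls into the "More than 2" row
    return table[data_type][min(n_groups, 3)]


print(choose_test('numeric', 2))
print(choose_test('categorical', 5))
```

The helper only picks a candidate test; the assumptions in STEP 2 still decide whether that test can be used as-is.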

## STEP 2:  Do we meet the assumptions of the chosen test?

### TEST ASSUMPTIONS SUMMARY


> CORRECTION 01/26/20:<br>I was incorrect to say that independent 2-sample t-tests do not require outlier removal. The list below has been corrected.


- [One-Sample T-Test](https://statistics.laerd.com/spss-tutorials/one-sample-t-test-using-spss-statistics.php)
    - No significant outliers
    - Normality

- [Independent t-test (2-sample)](https://statistics.laerd.com/statistical-guides/independent-t-test-statistical-guide.php)
    - No significant outliers
    - Normality
    - Equal Variance

- [One Way ANOVA](https://statistics.laerd.com/spss-tutorials/one-way-anova-using-spss-statistics.php)
    - No significant outliers
    - Equal variance
    - Normality

- [Chi-Square test](https://statistics.laerd.com/spss-tutorials/chi-square-test-for-association-using-spss-statistics.php)
    - Both variables are categorical

### Testing Assumptions of Normality & Equal Variance

1. **Test for Normality**
    - D'Agostino-Pearson's normality test<br>
    ```scipy.stats.normaltest```
    - Shapiro-Wilk Test<br>
    ```scipy.stats.shapiro```<br>
    
    
2. **If you pass the assumption of normality: test for equal variance**
     - Levene's Test<br>
    ```scipy.stats.levene```
    - **If you fail the assumption of equal variance use Welch's T-Test.** (for scipy, add `equal_var=False` to `ttest_ind`)
    
    
    
3. **If you DON'T have normal data: are your group sizes big enough to ignore normality assumption?**

    - If your group N's are sufficiently large (as defined by the table below), you can disregard the normality assumption and continue with the chosen test.


| Parametric analyses| Sample size guidelines for nonnormal data| 
| --- | --- |
| 1-sample t test| Greater than 20|
| 2-sample t test| Each group should be greater than 15| 
| One-Way ANOVA| If you have 2-9 groups, each group n > 15. <br>If you have 10-12 groups, each group n > 20.|


- **If your group N's are NOT large enough, select the non-parametric version of your test from the table below:**

##### Parametric  T-Tests vs Non-Parametric Alternatives 
- [Choosing Between Parametric and Non-Parametric Tests](https://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test)
 





| Parametric tests (means) | Nonparametric tests (medians) |
 | --- | --- |
 | 1-sample t test | 1-sample Wilcoxon |
 | 2-sample t test | Mann-Whitney U test |
 | One-Way ANOVA | Kruskal-Wallis |
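
The normality, equal-variance, and sample-size steps above can be strung together for the two-group case. The sketch below is one possible reading of that workflow, not a definitive implementation: the function name `compare_two_groups` and the `min_n` parameter are assumptions, and the demo data are synthetic.

```python
import numpy as np
from scipy import stats


def compare_two_groups(grp1, grp2, alpha=0.05, min_n=15):
    """Pick and run a 2-sample test following the steps above."""
    grp1 = np.asarray(grp1, dtype=float)
    grp2 = np.asarray(grp2, dtype=float)

    # Step 1. Normality: D'Agostino-Pearson test on each group
    normal = all(stats.normaltest(g).pvalue >= alpha for g in (grp1, grp2))

    # Step 3. Non-normal data is tolerated when every group is larger than min_n
    big_enough = min(len(grp1), len(grp2)) > min_n

    if not (normal or big_enough):
        # Small, non-normal groups: fall back to the non-parametric test
        stat, p = stats.mannwhitneyu(grp1, grp2, alternative='two-sided')
        return 'Mann-Whitney U', stat, p

    # Step 2. Equal variance: Levene's test picks Student's vs. Welch's t-test
    equal_var = bool(stats.levene(grp1, grp2).pvalue >= alpha)
    stat, p = stats.ttest_ind(grp1, grp2, equal_var=equal_var)
    return ('2-sample t-test' if equal_var else "Welch's t-test"), stat, p


# Synthetic demo: same mean, very different spread
rng = np.random.default_rng(0)
narrow = rng.normal(0, 1, 200)
wide = rng.normal(0, 50, 200)
name, stat, p = compare_two_groups(narrow, wide)
print(name, round(p, 3))
```

Because the two demo groups have very different variances, Levene's test should reject equal variance and the helper should fall through to Welch's t-test.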
 
 

## STEP 3: Interpret Result & Post-Hoc Tests

- If p value is < $\alpha$:
    - Calculate effect size (e.g. Cohen's $d$)
    - If you have multiple groups (i.e. ANOVA) you must **run a pairwise Tukey's test to know which groups were different.**
- [Tukey pairwise comparison test](https://www.statsmodels.org/stable/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html)
    - `statsmodels.stats.multicomp.pairwise_tukeyhsd`
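
As a rough illustration of the ANOVA-then-Tukey sequence described above, the sketch below runs `scipy.stats.f_oneway` followed by `pairwise_tukeyhsd` on synthetic data. The DataFrame `df_demo`, its column names, and the group labels are invented for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)

# Three synthetic groups; group 'C' is shifted upward on purpose
df_demo = pd.DataFrame({
    'value': np.concatenate([rng.normal(10, 1, 30),
                             rng.normal(10, 1, 30),
                             rng.normal(15, 1, 30)]),
    'group': np.repeat(['A', 'B', 'C'], 30),
})

# Omnibus test: is at least one group mean different?
samples = [g['value'].to_numpy() for _, g in df_demo.groupby('group')]
f_stat, p = stats.f_oneway(*samples)
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.3g}")

if p < 0.05:
    # Post-hoc test: which pairs of groups differ?
    tukey = pairwise_tukeyhsd(endog=df_demo['value'],
                              groups=df_demo['group'],
                              alpha=0.05)
    print(tukey.summary())
```

The ANOVA only says that some difference exists; the `reject` column of the Tukey summary shows which pairs drive it (here the comparisons involving group `C`).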


#  **Stating our Hypothesis:**

> **What question are you attempting to answer?**


- $H_1$ : 

- $H_0$ :

<br>

## HYPOTHESIS TESTING STEPS

- Separate data into group variables.
- Visualize data and calculate group n (size)

    
* Select the appropriate test based on type of comparison being made, the number of groups, the type of data.


- For t-tests: test for the assumptions of normality and homogeneity of variance.

    1. Check if sample sizes allow us to ignore assumptions, and if not:
    2. **Test Assumption of Normality**

    3. **Test for Homogeneity of Variance**

    4. **Choose appropriate test based upon the above** 
    
    
* **Perform chosen statistical test, calculate effect size, and any post-hoc tests.**
    - Post-hoc pairwise comparison testing (when comparing more than 2 groups)
        - Tukey's test
    - Effect size calculation
        - Cohen's d

# Statistical Tests Summary Table



| Parametric tests (means) | Function | Nonparametric tests (medians) | Function |
 | --- | --- | --- | --- |
 | 1-sample t test |`scipy.stats.ttest_1samp()`|  1-sample Wilcoxon |`scipy.stats.wilcoxon`|
 | 2-sample t test |`scipy.stats.ttest_ind()` | Mann-Whitney U test |`scipy.stats.mannwhitneyu()` |
 | One-Way ANOVA | `scipy.stats.f_oneway()` | Kruskal-Wallis | `scipy.stats.kruskal` | 
 | Factorial DOE with one factor and one blocking variable | | Friedman test | `scipy.stats.friedmanchisquare()` |
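
To make the summary table concrete, here is a sketch that calls each parametric function and its non-parametric counterpart on synthetic data. The arrays `a`, `b`, `c` and the `target` value are made up. Note that `scipy.stats.wilcoxon` tests a single sample against zero, so the target is subtracted first.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(50, 5, 25)
b = rng.normal(55, 5, 25)
c = rng.normal(60, 5, 25)
target = 50  # hypothetical known quantity for the 1-sample tests

results = {
    '1-sample t test':     stats.ttest_1samp(a, popmean=target).pvalue,
    '1-sample Wilcoxon':   stats.wilcoxon(a - target).pvalue,
    '2-sample t test':     stats.ttest_ind(a, b).pvalue,
    'Mann-Whitney U test': stats.mannwhitneyu(a, b, alternative='two-sided').pvalue,
    'One-Way ANOVA':       stats.f_oneway(a, b, c).pvalue,
    'Kruskal-Wallis':      stats.kruskal(a, b, c).pvalue,
}

for name, p in results.items():
    print(f"{name:<20} p = {p:.4f}")
```

Each parametric/non-parametric pair takes the same inputs, so swapping one for the other after the assumption checks is usually a one-line change.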


# Hypothesis Testing: Mouse Data

## Hypothesis
> Question: does stimulation of CRF Neurons in the central amygdala increase alcohol consumption?

- Metric: licks for alcohol
- Two groups: Control and Experimental (ChR2)


- $H_1$: There is a significant difference in average licks for alcohol between control and experimental/stimulated mice.

- $H_0$: There is no significant difference in licks for alcohol between control and experimental/stimulated mice.

$\alpha$=0.05


### Step 1: which type of test?

- What type of data?
    - Numerical
- How many groups?
    - 2 groups

In [1]:
from IPython.display import HTML
HTML('<img src="https://raw.githubusercontent.com/jirvingphd/fsds_100719_cohort_notes/master/images/sect_20_neuro_data.png">')

## Obtaining/Preprocessing Data

In [2]:
!pip install -U fsds_100719
from fsds_100719.imports import *
dp.clear_output()

In [3]:
plt.style.use('seaborn-notebook')

In [4]:
df = pd.read_csv('datasets/neuro_drinking_data.csv')
df = df.drop(columns=['Batch'])

pd.set_option('display.max_columns',0)
pd.set_option('display.precision',3)
df.head()

Unnamed: 0,Mouse ID,Group,Sex,BL1,BL2,BL3,BL4,S1,S2,S3,S4,PS1,PS2,PS3,PS4,R1,R2,R3,R4,R5,R6,R7,R8
0,Con 4,Control,F,665,863,631,629,583,801,723,707,732,680,684,485,65,301,351,441,675,554,541,545
1,Con 5,Control,F,859,849,685,731,854,1103,645,633,733,662,605,623,128,268,462,569,988,728,933,564
2,Con 6,Control,F,589,507,635,902,699,743,761,949,872,952,828,806,129,311,669,666,516,579,913,736
3,CON 2.1,Control,M,939,909,850,756,807,617,526,736,743,625,690,759,281,357,386,585,565,550,806,732
4,ChR2 2.2,ChR2,F,710,505,494,596,620,589,676,537,779,537,581,515,477,659,737,606,713,682,709,759


#### Laying Out Our Approach

1. Make a **dict/lists of the column names** that should be **averaged together** (`col_dict`)

2. Make a new df of means using `col_dict`

3. Make a grp dict using  `df_means.groupby('Group').groups` 

- Visualize the two populations

- Prepare for hypothesis tests
    - Use the `grps` dict to reference the correct rows and columns to pass into the tests
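
A toy version of this approach, under the assumption of a much smaller table than the mouse data (two days per phase, two mice per group, invented numbers), might look like this:

```python
import pandas as pd

# Toy stand-in for the mouse data (all values are made up)
toy = pd.DataFrame({
    'Group': ['Control', 'Control', 'Experimental', 'Experimental'],
    'BL1': [10, 20, 30, 40], 'BL2': [20, 30, 40, 50],
    'R1':  [1, 2, 3, 4],     'R2':  [3, 4, 5, 6],
})

# 1. Column names that should be averaged together
col_dict = {'BL': ['BL1', 'BL2'], 'R': ['R1', 'R2']}

# 2. New df of per-row means, one column per phase
df_means = pd.DataFrame({phase: toy[cols].mean(axis=1)
                         for phase, cols in col_dict.items()})
df_means.insert(0, 'Group', toy['Group'])

# 3. groupby(...).groups maps each group name to its row index labels
grps = df_means.groupby('Group').groups

# Use the row labels to pull one group's values for one phase
control_bl = df_means.loc[grps['Control'], 'BL']
print(control_bl.tolist())   # [15.0, 25.0]
```

The `grps` dict stores row labels, not data, so the same dict can be reused with `.loc` for every phase column.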

<!---
**Variables:**

- `col_dict` (dict): dict of column names to be grouped together for means
- `df_means` (df): df of col_dict column means.
- `grps` (dict): groupby dict where keys = 'Group' column and values = row indices

- `data` (dict): Dictionary of...
    - Series of each phase by group? --->

In [5]:
df['Group'].unique()

array(['Control', 'ChR2'], dtype=object)

In [6]:
## Rename groups
group_mapper = {'ChR2':'Experimental'}
df['Group'] = df['Group'].replace(group_mapper)
df.head()

Unnamed: 0,Mouse ID,Group,Sex,BL1,BL2,BL3,BL4,S1,S2,S3,S4,PS1,PS2,PS3,PS4,R1,R2,R3,R4,R5,R6,R7,R8
0,Con 4,Control,F,665,863,631,629,583,801,723,707,732,680,684,485,65,301,351,441,675,554,541,545
1,Con 5,Control,F,859,849,685,731,854,1103,645,633,733,662,605,623,128,268,462,569,988,728,933,564
2,Con 6,Control,F,589,507,635,902,699,743,761,949,872,952,828,806,129,311,669,666,516,579,913,736
3,CON 2.1,Control,M,939,909,850,756,807,617,526,736,743,625,690,759,281,357,386,585,565,550,806,732
4,ChR2 2.2,Experimental,F,710,505,494,596,620,589,676,537,779,537,581,515,477,659,737,606,713,682,709,759


In [7]:
## Remake Mouse Index 
mouse_ids = list(df['Mouse ID'].unique())
new_ids = [f"Mouse{i}" for i in range(1,len(mouse_ids)+1)]
new_ids

['Mouse1',
 'Mouse2',
 'Mouse3',
 'Mouse4',
 'Mouse5',
 'Mouse6',
 'Mouse7',
 'Mouse8',
 'Mouse9',
 'Mouse10',
 'Mouse11',
 'Mouse12',
 'Mouse13',
 'Mouse14',
 'Mouse15',
 'Mouse16',
 'Mouse17',
 'Mouse18',
 'Mouse19',
 'Mouse20',
 'Mouse21',
 'Mouse22']

In [8]:
df['Mouse ID'] = df['Mouse ID'].map(dict(zip(mouse_ids,new_ids)))
df.head()

Unnamed: 0,Mouse ID,Group,Sex,BL1,BL2,BL3,BL4,S1,S2,S3,S4,PS1,PS2,PS3,PS4,R1,R2,R3,R4,R5,R6,R7,R8
0,Mouse1,Control,F,665,863,631,629,583,801,723,707,732,680,684,485,65,301,351,441,675,554,541,545
1,Mouse2,Control,F,859,849,685,731,854,1103,645,633,733,662,605,623,128,268,462,569,988,728,933,564
2,Mouse3,Control,F,589,507,635,902,699,743,761,949,872,952,828,806,129,311,669,666,516,579,913,736
3,Mouse4,Control,M,939,909,850,756,807,617,526,736,743,625,690,759,281,357,386,585,565,550,806,732
4,Mouse5,Experimental,F,710,505,494,596,620,589,676,537,779,537,581,515,477,659,737,606,713,682,709,759


In [9]:
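## Get a dict of which cols belong to which phase
## NOTE: `phase in col` is a substring match, so 'S' also matches 'PS1'-'PS4'
## (see the output below) and the S means computed later include the PS days.
## Use `col.startswith(phase)` instead to keep the S and PS phases separate.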
col_dict = {}
phases = ['BL','S','R','PS']
for phase in phases:
    col_dict[phase] = [col for col in df.drop('Sex',axis=1).columns if phase in col]
col_dict

{'BL': ['BL1', 'BL2', 'BL3', 'BL4'],
 'S': ['S1', 'S2', 'S3', 'S4', 'PS1', 'PS2', 'PS3', 'PS4'],
 'R': ['R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8'],
 'PS': ['PS1', 'PS2', 'PS3', 'PS4']}
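
In the output above, the `'S'` entry also contains the `PS` columns because `phase in col` matches substrings. A minimal sketch of the difference between substring and prefix matching, using a shortened, made-up column list:

```python
columns = ['BL1', 'BL2', 'S1', 'S2', 'PS1', 'PS2', 'R1', 'R2']
phases = ['BL', 'S', 'R', 'PS']

# Substring match: 'S' is also found inside 'PS1' and 'PS2'
by_substring = {p: [c for c in columns if p in c] for p in phases}

# Prefix match: a column belongs to a phase only if its name starts with it
by_prefix = {p: [c for c in columns if c.startswith(p)] for p in phases}

print(by_substring['S'])   # ['S1', 'S2', 'PS1', 'PS2']
print(by_prefix['S'])      # ['S1', 'S2']
```

With prefix matching each column lands in exactly one phase, so per-phase means are not mixed.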

In [10]:
## Use col_dict to calculate each mouse's mean per phase
mean_to_df = {}
for k, cols in col_dict.items():
    mean_to_df[k] = df[cols].mean(axis=1)
df_means = pd.concat(mean_to_df,axis=1)
df_means

Unnamed: 0,BL,S,R,PS
0,697.0,674.375,434.125,645.25
1,781.0,732.25,580.0,655.75
2,658.25,826.25,564.875,864.5
3,863.5,687.875,532.75,704.25
4,576.25,604.25,667.75,603.0
5,639.25,565.875,813.125,618.5
6,795.75,730.5,727.5,645.75
7,559.5,524.625,148.0,525.25
8,706.25,621.625,389.5,547.75
9,806.5,901.875,667.375,890.0


In [11]:
len(df_means),len(df)

(22, 22)

In [12]:
df_means = pd.concat([df[['Mouse ID','Group','Sex']],
                      df_means],axis=1)
df_means

Unnamed: 0,Mouse ID,Group,Sex,BL,S,R,PS
0,Mouse1,Control,F,697.0,674.375,434.125,645.25
1,Mouse2,Control,F,781.0,732.25,580.0,655.75
2,Mouse3,Control,F,658.25,826.25,564.875,864.5
3,Mouse4,Control,M,863.5,687.875,532.75,704.25
4,Mouse5,Experimental,F,576.25,604.25,667.75,603.0
5,Mouse6,Experimental,F,639.25,565.875,813.125,618.5
6,Mouse7,Experimental,M,795.75,730.5,727.5,645.75
7,Mouse8,Experimental,M,559.5,524.625,148.0,525.25
8,Mouse9,Experimental,M,706.25,621.625,389.5,547.75
9,Mouse10,Experimental,M,806.5,901.875,667.375,890.0


### Getting Group Data For EDA & Testing

In [13]:
df_means.head()

Unnamed: 0,Mouse ID,Group,Sex,BL,S,R,PS
0,Mouse1,Control,F,697.0,674.375,434.125,645.25
1,Mouse2,Control,F,781.0,732.25,580.0,655.75
2,Mouse3,Control,F,658.25,826.25,564.875,864.5
3,Mouse4,Control,M,863.5,687.875,532.75,704.25
4,Mouse5,Experimental,F,576.25,604.25,667.75,603.0


In [14]:
df_means.to_csv('datasets/mouse_data_processed.csv',index=False)

In [15]:
df.head()

Unnamed: 0,Mouse ID,Group,Sex,BL1,BL2,BL3,BL4,S1,S2,S3,S4,PS1,PS2,PS3,PS4,R1,R2,R3,R4,R5,R6,R7,R8
0,Mouse1,Control,F,665,863,631,629,583,801,723,707,732,680,684,485,65,301,351,441,675,554,541,545
1,Mouse2,Control,F,859,849,685,731,854,1103,645,633,733,662,605,623,128,268,462,569,988,728,933,564
2,Mouse3,Control,F,589,507,635,902,699,743,761,949,872,952,828,806,129,311,669,666,516,579,913,736
3,Mouse4,Control,M,939,909,850,756,807,617,526,736,743,625,690,759,281,357,386,585,565,550,806,732
4,Mouse5,Experimental,F,710,505,494,596,620,589,676,537,779,537,581,515,477,659,737,606,713,682,709,759


In [16]:
phase_dict = {}
for phase,colnames in col_dict.items():
    for col in colnames:
        phase_dict[col] = phase
        
phase_dict

{'BL1': 'BL',
 'BL2': 'BL',
 'BL3': 'BL',
 'BL4': 'BL',
 'S1': 'S',
 'S2': 'S',
 'S3': 'S',
 'S4': 'S',
 'PS1': 'PS',
 'PS2': 'PS',
 'PS3': 'PS',
 'PS4': 'PS',
 'R1': 'R',
 'R2': 'R',
 'R3': 'R',
 'R4': 'R',
 'R5': 'R',
 'R6': 'R',
 'R7': 'R',
 'R8': 'R'}

In [17]:
## Two way anova
id_cols = ['Mouse ID','Group']
df2 = df.melt(id_vars=id_cols,
              value_vars=df.drop(columns=[*id_cols,'Sex']).columns,
              var_name='Day',value_name='Licks')
df2['Phase'] = df2['Day'].map(phase_dict)
df2

Unnamed: 0,Mouse ID,Group,Day,Licks,Phase
0,Mouse1,Control,BL1,665,BL
1,Mouse2,Control,BL1,859,BL
2,Mouse3,Control,BL1,589,BL
3,Mouse4,Control,BL1,939,BL
4,Mouse5,Experimental,BL1,710,BL
...,...,...,...,...,...
435,Mouse18,Experimental,R8,880,R
436,Mouse19,Experimental,R8,1047,R
437,Mouse20,Experimental,R8,293,R
438,Mouse21,Experimental,R8,900,R


In [18]:
phases

['BL', 'S', 'R', 'PS']

In [29]:
# TO DO: OUTLIER REMOVAL

In [30]:
## Check Assumptions

grps = df2.groupby('Group').groups
for grp_name in grps:
    grps[grp_name] = {}
    
    grp_df = df2.groupby('Group').get_group(grp_name)[['Day','Phase','Licks']]
    
    for phase in phases:
        grps[grp_name][phase] = grp_df.groupby('Phase').get_group(phase)
        
print(grps.keys())
grps['Control'].keys()

dict_keys(['Control', 'Experimental'])


dict_keys(['BL', 'S', 'R', 'PS'])

In [20]:
test = grps['Experimental']['S']['Licks']
test

92      620
93      591
94      882
95      546
96      759
97     1106
98      820
103     492
104     505
105     646
106     533
107     232
108     483
114     589
115     580
116     723
117     493
118     791
119     812
120     752
125     594
126     475
127     619
128     721
129      34
130     491
136     676
137     419
138     764
139     456
140     568
141     902
142     759
147     695
148     519
149     885
150     651
151      65
152     394
158     537
159     463
160     892
161     601
162     664
163     835
164     564
169     568
170     578
171     626
172     546
173     158
174     673
Name: Licks, dtype: int64

In [21]:
import scipy as sp
import scipy.stats  # make sure the stats submodule is loaded before using sp.stats
z_scores = sp.stats.zscore(test)
idx_outliers = np.abs(z_scores)>3

In [25]:
clean_data = {}

for grp, grp_data in grps.items():
    clean_data[grp] = {}

    for phase, data in grp_data.items():
        licks = data['Licks']
        z_scores = sp.stats.zscore(licks)
        idx_outliers = np.abs(z_scores)>3
        
        
        clean_data[grp][phase] = licks[~idx_outliers]
clean_data['Control'].keys()

dict_keys(['BL', 'S', 'R', 'PS'])
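
The outlier rule used above (drop values whose absolute z-score exceeds 3) can be seen in isolation on a small made-up Series. The numbers below are invented; only the masking pattern mirrors the cell above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# 29 ordinary values plus one extreme value (made-up numbers)
licks = pd.Series(list(range(600, 629)) + [5000])

z_scores = stats.zscore(licks)          # (value - mean) / std for every value
idx_outliers = np.abs(z_scores) > 3     # boolean mask of flagged values
clean = licks[~idx_outliers]            # keep only the unflagged values

print(int(idx_outliers.sum()), len(clean))   # 1 29
```

One caveat with this rule: for a group of n values the largest possible |z| (with the default population standard deviation) is sqrt(n - 1), so a group of 10 or fewer values can never have a point flagged at the |z| > 3 threshold.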

In [28]:
# pd.concat(clean_data.values())

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

## Two-way ANOVA: main effects of Group and Phase plus their interaction
formula = "Licks~C(Group)*C(Phase)"
model = ols(formula=formula, data=df2).fit()
## Use sm.stats so the name `stats` stays free for scipy.stats below
results = sm.stats.anova_lm(model)
results

In [None]:
sns.barplot(data=df2, x='Phase',y='Licks',hue='Group',ci=68)

In [None]:
# help(sms)

In [None]:
# ## Get grps 
# grps = df_means.groupby('Group').groups

# for grp_name,grp_idx in grps.items():
#     grps[grp_name] = df_means.loc[grp_idx]
    
# grps

In [None]:
# from scipy.stats import sem
# ## showing off SEM bars
# f,ax = plt.subplots(figsize=(5,5))
# ax.bar('Exp',grp_exp['BL'],yerr=sem(grp_exp['BL']))
# ax.bar('Control',grp_control['BL'],yerr=sem(grp_control['BL']))

In [None]:
# def plot_dists(grp1,grp2,col='BL',name1='Exp',name2='Control'):

#     ## Defining "gridspec_kws" for plt.subplots()
#     ## This will make our first plot 3 times wider than the second.
#     gs_kw = dict(width_ratios=[3, 1])
    
#     fig, axes = plt.subplots(figsize=(10,4),ncols=2,
#                              gridspec_kw=gs_kw,constrained_layout=True)

#     ## Defining the data 
#     group1 = {'name':name1, 
#              'data':grp1[col],
#              'plot_specs':{
#                  'hist_kws':dict(color='b', lw=2,ls='-'),
#                  'kde_kws':dict(color='b',lw=1,ls='-'),
#                  'label':f"{name1} (n={len(grp1[col])})"}
#              }
    
#     group2 = {'name':name2, 
#              'data':grp2[col],
#               'plot_specs':{
#                   'hist_kws':dict(color='orange', lw=2,ls='-'),
#                   'kde_kws':dict(color='orange',lw=1,ls='-'),
#                    'label':f"{name2} (n={len(grp2[col])})"}
#              }
    
    
#     ax=axes[0]
#     sns.distplot(group1['data'], **group1['plot_specs'],ax=axes[0])
#     sns.distplot(group2['data'], **group2['plot_specs'],ax=axes[0])
#     ax.legend()
    
#     ax.set(ylabel="Density")
#     ax.set(xlabel='Number of Licks')
    
    
#     ax = axes[1]
#     ax.bar(group1['name'],group1['data'].mean(),
#           yerr=sem(group1['data']))

#     ax.bar(group2['name'],group2['data'].mean(),
#           yerr=sem(group2['data']))    
    
#     return fig,ax

In [None]:
# fig,ax = plot_dists(grp_control,grp_exp,col='R',name1='ChR2', name2='Control')

### Writing functions to test assumptions

In [None]:
df

In [None]:
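import scipy.stats as stats
grp_control = df_means[df_means['Group']=='Control']   # per-mouse phase means, control mice
grp_exp = df_means[df_means['Group']=='Experimental']  # per-mouse phase means, stimulated mice
stats.normaltest(grp_control['BL'])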

In [None]:
import scipy.stats as stats

def test_normality(grp, col='BL', alpha=0.05):
    """Run D'Agostino-Pearson's normality test on grp[col] and report the result."""
    stat, p = stats.normaltest(grp[col])
    if p < alpha:
        print(f"Normal test p value of {np.round(p,3)} is < {alpha}, therefore data is NOT normal.")
    else:
        print(f"Normal test p value of {np.round(p,3)} is >= {alpha}, therefore data is treated as normal.")
    return p

def test_equal_variance(grp1, grp2, alpha=.05):
    """Run Levene's test on two groups and report the result."""
    stat, p = stats.levene(grp1, grp2)
    if p < alpha:
        print(f"Levene's test p value of {np.round(p,3)} is < {alpha}, therefore groups do NOT have equal variance.")
    else:
        print(f"Levene's test p value of {np.round(p,3)} is >= {alpha}, therefore groups are treated as having equal variance.")
    return p



# def test_assumptions(*args, normal=True,equal_var=True):
#     pass

In [None]:
test_normality(grp_control,col='S'), test_normality(grp_exp,col='S');

In [None]:
def Cohen_d(group1, group2):
    """
    Compute Cohen's d.
    
    Args:
        group1: Series or NumPy array
        group2: Series or NumPy array

    Returns:
        d (float): effect size statistic

    Interpretation:
    > Small effect = 0.2
    > Medium Effect = 0.5
    > Large Effect = 0.8
    """
    # Convert to arrays so Series and NumPy inputs are handled the same way
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    diff = group1.mean() - group2.mean()

    n1, n2 = len(group1), len(group2)
    # Sample variances (ddof=1)
    var1 = group1.var(ddof=1)
    var2 = group2.var(ddof=1)

    # Calculate the pooled variance, weighting each group by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    
    # Calculate Cohen's d statistic
    d = diff / np.sqrt(pooled_var)
    
    return d
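
For reference, here is a small worked example of Cohen's d in its conventional form, where the pooled variance weights each group's sample variance by its degrees of freedom (n - 1). The name `cohen_d_pooled` and the two tiny input lists are hypothetical and chosen so the arithmetic can be checked by hand.

```python
import numpy as np


def cohen_d_pooled(group1, group2):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    var1, var2 = g1.var(ddof=1), g2.var(ddof=1)
    # Pooled variance: each group weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)


# Means are 4 and 3; both sample variances are 4, so the pooled SD is 2
d = cohen_d_pooled([2, 4, 6], [1, 3, 5])
print(d)   # 0.5
```

A mean difference of 1 divided by a pooled standard deviation of 2 gives d = 0.5, a "medium" effect on the scale in the docstring above.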

In [None]:
def test_assumptions(df_means,grps=None,
                     group_col='Group',
                     grp1='Experimental',
                     grp2='Control',
                     data_col='BL',
                    plot_data=False):
    """MASSIVE FUNCTION PASTED IN DUE TO VERY LATE STUDY GROUP
    WE WILL CONSTRUCT A BETTER/SIMPLER VERSION OF THIS TOGETHER IN NEXT STUDY GROUP."""
    
    if grps is None:
        grps = df_means.groupby(group_col).groups
        
        
    group1 = {'name':grp1,
              'data':df_means.loc[grps[grp1],data_col]}
    
    group2 = {'name':grp2,
              'data':df_means.loc[grps[grp2],data_col]}
    
    results = [['Col','Test','Group(s)','Stat','p','p<.05']]
    
    ## Normality testing
    stat,p = stats.normaltest(group1['data'])
    results.append([data_col,'Normality',group1['name'],
                  stat, p, p<.05])
    
    stat,p = stats.normaltest(group2['data'])    
    results.append([data_col,'Normality',group2['name'],
                  stat, p, p<.05])
    ## Homo. of Variance Testing
    stat,p = stats.levene(group1['data'],group2['data'])
    results.append([data_col,'Equal Variance','Both',
                  stat, p, p<.05])
    
    
    ## Parametric T-Test
    stat,p = stats.ttest_ind(group1['data'],group2['data'])
    results.append([data_col,'T-Test 2samp','Both',stat,p,p<.05])
    
    ## Non-Parametric MWU
    stat,p = stats.mannwhitneyu(group1['data'],group2['data'])
    results.append([data_col,'Mann Whitney U','Both',stat,p,p<.05])
    
    ## Effect size with Cohen's d
    d = Cohen_d(group1['data'],group2['data'])
    results.append([data_col, "Cohen's d", 'Both','','',d])
    
#     if plot_data:
#         plot_dists(grp, col=data_col)
    
    return pd.DataFrame(results[1:],columns=results[0])

res = test_assumptions(df_means)


In [None]:
for phase in ['BL','S','PS','R']:
    print('---'*30)

    res = test_assumptions(df_means,data_col=phase)
    display(res)
    
    fig,ax = plot_statplot(df_means, data_col=phase)
    plt.show()

# fig,ax = plot_dists(grp_control,grp_exp,col='R',name1='ChR2 Mice', name2='Control Mice')


## CONCLUSION
- Choosing the test that matches your data (normality, equal variance, and group size) protects you from drawing the wrong conclusion from an inappropriate test.

- Notice how the last phase (R) did NOT come back as significant when we ran the t-test, but DID come back as significant when we ran the Mann-Whitney U test instead. 
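
The same kind of disagreement between the two tests can be reproduced on a tiny made-up dataset. This is not the mouse data: the values below are invented so that one extreme value inflates the variance enough to hide a clear shift from the t-test, while the rank-based test still detects it.

```python
from scipy import stats

a = [1, 2, 3, 4, 5, 6, 7, 8]
# Every value in b exceeds every value in a; 500 is an extreme outlier
b = [9, 10, 11, 12, 13, 14, 15, 500]

t_stat, t_p = stats.ttest_ind(a, b)
u_stat, u_p = stats.mannwhitneyu(a, b, alternative='two-sided')

print(f"t-test p = {t_p:.3f}")            # above 0.05
print(f"Mann-Whitney U p = {u_p:.5f}")    # below 0.05
```

The t-test compares means relative to the spread, so a single outlier can swamp it; the Mann-Whitney U test only uses the ordering of the values, which here is perfectly separated.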



# APPENDIX

- [Tukey pairwise comparison test](https://www.statsmodels.org/stable/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html)

## Statistical Analysis Pipeline

1. **Test for Normality**
    - D'Agostino-Pearson's normality test<br>
    ```scipy.stats.normaltest```
    - Shapiro-Wilk Test<br>
    ```scipy.stats.shapiro```<br>
    
    
2. **Test for Homogeneity of Variance**

    - Levene's Test<br>
    ```scipy.stats.levene```


3. **Choose appropriate test based upon 1. and 2.** <br> 
    - T Test (1-sample)
        - `stats.ttest_1samp()`
    - T Test (2-sample)
        - `stats.ttest_ind()`
        - [docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
    - Welch's T-Test (2-sample)
        - `stats.ttest_ind(equal_var=False)`
        - [docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
        
    - Mann Whitney U
        - `stats.mannwhitneyu()`
        - [docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html)
    - ANOVA 
        - `stats.f_oneway()`
        - [docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html)
    - Tukey's
        - `statsmodels.stats.multicomp.pairwise_tukeyhsd`
        - [docs](https://www.statsmodels.org/stable/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html)
    

4. **Calculate effect size for significant results.**
    - Effect size: [Cohen's d](https://stackoverflow.com/questions/21532471/how-to-calculate-cohens-d-in-python)
    - Interpretation:
        - Small effect = 0.2 ( cannot be seen by naked eye)
        - Medium effect  = 0.5
        - Large Effect = 0.8 (can be seen by naked eye)
        
5. **If significant, follow up with post-hoc tests (if you have more than 2 groups)**
    - [Tukey's](https://www.statsmodels.org/stable/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html)

