<img src="./img/HWNI_logo.svg"/>

# Lab 05b - Multiple Comparisons and ANOVA

In [1]:
# makes our plots show up inside Jupyter
%matplotlib inline

import numpy as np
import pandas as pd

import scipy.stats

import matplotlib.pyplot as plt
import seaborn as sns

# choose colors that work for most color-blind folks
# (passed to sns.set directly, since sns.set would otherwise reset the palette)
sns.set(palette="colorblind", color_codes=True)

import util.lab05utils as utils 

# this makes our tables easier to read
utils.formatDataframes()

When N-way ANOVAs are performed in an exploratory fashion (i.e., without pre-specifying which interactions are of interest), there is a 
[rarely-acknowledged multiple-comparisons effect](https://arxiv.org/pdf/1412.3416)
that
[massively increases the error rate](http://deevybee.blogspot.co.uk/2013/06/interpreting-unexpected-significant.html).

Below, we'll repeatedly simulate null-distributed data for an N-way ANOVA and then perform statistical testing. Because we know *a priori* that the null hypothesis is true, we can interpret any significant result as a false positive and so get a sense of our error rate.
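To get a feel for the size of this effect before simulating full datasets, the sketch below relies on the fact that p-values are uniformly distributed when the null hypothesis is true. It assumes the tests are independent, which is only approximately the case for the terms of an ANOVA, and compares the simulated family-wise error rate with the closed-form value $1 - (1 - \alpha)^m$ for the $m = 2^N - 1$ terms of a full N-way model. The name `simulate_null_pvalues` is hypothetical and is not part of `util.lab05utils`.

```python
import numpy as np


def simulate_null_pvalues(num_experiments, num_tests, seed=0):
    """Draw p-values for tests where the null is true (uniform on [0, 1]).

    Assumption: the tests within an experiment are independent.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(size=(num_experiments, num_tests))


alpha = 0.05
num_factors = 3
# a full N-way model has one test per main effect and interaction
num_tests = 2 ** num_factors - 1

pvalues = simulate_null_pvalues(10_000, num_tests)
significant = pvalues < alpha

# fraction of all tests that were (falsely) significant
simulated_fpr = significant.mean()
# fraction of experiments with at least one (falsely) significant test
simulated_fwer = significant.any(axis=1).mean()
# closed-form family-wise error rate under independence
theoretical_fwer = 1 - (1 - alpha) ** num_tests

print(f"simulated FPR:    {simulated_fpr:.3f}")
print(f"simulated FWER:   {simulated_fwer:.3f}")
print(f"theoretical FWER: {theoretical_fwer:.3f}")
```

Changing `num_factors` shows how quickly the number of tests, and with it the family-wise error rate, grows.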

## Simulating Data

We simulate data by drawing a Gaussian random variable, `outcome`, for each subject and assigning that subject at random to one of two levels in each of `numFactors` factors. The factor levels are coded as `0` and `1`, and the factors are labeled with letters `a-z`. The results are returned as a pandas dataframe.
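The source of `utils.generateData` is not shown here, so the following is only a minimal sketch of what such a generator could look like. The function name `generate_null_data`, the `seed` argument, and the choice of a zero-mean outcome are all assumptions made for illustration.

```python
import string

import numpy as np
import pandas as pd


def generate_null_data(num_subjects, num_factors, standard_dev, seed=None):
    """Hypothetical stand-in for utils.generateData.

    Each subject gets a random 0/1 level on every factor and a Gaussian
    outcome that does not depend on any factor (so the null is true).
    """
    rng = np.random.default_rng(seed)
    factor_names = list(string.ascii_lowercase[:num_factors])
    # random assignment to one of two levels, stored as floats like the lab's table
    levels = rng.integers(0, 2, size=(num_subjects, num_factors)).astype(float)
    data = pd.DataFrame(levels, columns=factor_names)
    # assumption: the outcome has mean zero
    data["outcome"] = rng.normal(loc=0.0, scale=standard_dev, size=num_subjects)
    return data


example = generate_null_data(num_subjects=1000, num_factors=3, standard_dev=1, seed=0)
print(example.head())
```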

The ANOVA is run using the linear-model-fitting tools in `statsmodels`. The results are also returned as a pandas dataframe. The `p` value is in the column `PR(>F)`.

We can control the power of the study by changing the number of subjects and the standard deviation.

Note that if you're looking at ANOVAs with more factors, you might start coming across linear algebra errors due to the sample size being too low. If that happens, increase the number of subjects.
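One plausible reason for those errors: a full model with N two-level factors has $2^N$ parameters (including the intercept), and it can only be fit uniquely if every one of the $2^N$ combinations of factor levels contains at least one subject. The sketch below counts the parameters and checks for empty combinations under random assignment. Both function names are hypothetical.

```python
import numpy as np


def count_model_parameters(num_factors):
    """Intercept plus every main effect and interaction of two-level factors."""
    return 2 ** num_factors


def has_empty_cells(num_subjects, num_factors, seed=0):
    """Check whether random assignment leaves any factor-level combination empty."""
    rng = np.random.default_rng(seed)
    levels = rng.integers(0, 2, size=(num_subjects, num_factors))
    # number of distinct combinations of levels that actually occur
    occupied = np.unique(levels, axis=0).shape[0]
    return occupied < 2 ** num_factors


print(count_model_parameters(7))
print(has_empty_cells(100, 7))     # fewer subjects than combinations
print(has_empty_cells(10_000, 7))
```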

In [2]:
numSubjects = 1000
numFactors = 7
standardDev = 1

nullData = utils.generateData(numSubjects,numFactors,standardDev)

nullData.sample(10)

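| | a | b | c | d | e | f | g | outcome |
|---|---|---|---|---|---|---|---|---|
| 3226 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.737226 |
| 3821 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.504359 |
| 1020 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.230438 |
| 4992 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.168202 |
| 1702 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.923733 |
| 6877 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 2.22631 |
| 3099 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.480522 |
| 581 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.565842 |
| 6621 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.073004 |
| 8435 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.144795 |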


In [3]:
results = utils.runANOVA(nullData)
results.tail()

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| c:d:e:f:g | 0.658016 | 1.0 | 0.654196 | 0.418636 |
| a:c:d:e:f:g | 0.244107 | 1.0 | 0.242689 | 0.622281 |
| b:c:d:e:f:g | 1.108725 | 1.0 | 1.102288 | 0.29379 |
| a:b:c:d:e:f:g | 0.073581 | 1.0 | 0.073154 | 0.786805 |
| Residual | 9929.654059 | 9872.0 | | |


Write code to run this simulated experiment multiple times, tracking the number of false positives in each experiment. Run the code with the settings below.

In [25]:
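def runExperiments(numExperiments, numSubjects, numFactors, standardDev):

    # one entry per experiment:
    #   falsePositiveRate: fraction of the ANOVA's tests that are significant
    #   familywiseError:   1 if at least one test is significant, else 0
    #   omnibus:           1 if the omnibus test is significant, else 0
    familywiseError = np.zeros(numExperiments)
    falsePositiveRate = np.zeros(numExperiments)
    omnibus = np.zeros(numExperiments)

    # your code here

    return falsePositiveRate, familywiseError, omnibus

def omnibusTest(result):
    # your code here
    return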

In [27]:
numExperiments = 100

numSubjects = 1000
numFactors = 3
standardDev = 1

FPR, FWER, omni = runExperiments(numExperiments,numSubjects,numFactors,standardDev)

In [28]:
print(np.mean(FPR))

print(np.mean(FWER))

print(np.mean(omni))

0.0642857142857
0.4
0.05


## Simulations

#### Q1 Use your results to calculate both a family-wise error rate, the chance you get at least one (falsely) significant result, and a false positive rate, the fraction of tests that come out (falsely) significant. Use an $\alpha$ of 0.05. Explain your results below.

#### Q2 Note that you did not calculate a false discovery rate. What is the FDR for these experiments?

#### Q3 What do you predict would be the effect of increasing the number of factors (to, e.g., 7) on the FPR and FWER? Check your prediction against the simulation and report the results.

#### Q4 Make a prediction for the effect of increasing the power (by adding more subjects or decreasing the standard deviation). Check your prediction against the simulation and explain the results.

An "omnibus test" works as follows: first, check whether the model as a whole has a statistically significant mean-square. The model mean-square is found by adding the sum-of-squares for each component of the model (except the residuals) and dividing by the sum of the degrees of freedom for each component of the model (again, except the residuals). Dividing the model mean-square by the residual mean-square gives the `F` statistic for the model as a whole. Then, if the overall result is significant at a level $\alpha$, perform follow-up F-tests. This is exactly how we performed one-way ANOVAs.
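The arithmetic described above can be sketched on a small hand-made table with the same `sum_sq` and `df` columns that `runANOVA` returns. The function name `omnibus_f_test` and the numbers in `toy_table` are made up for illustration, and the p-value comes from the upper tail of the F distribution via `scipy.stats.f.sf`.

```python
import pandas as pd
import scipy.stats


def omnibus_f_test(anova_table):
    """Pool every model term into a single F-test against the residuals."""
    model_rows = anova_table.drop(index="Residual")
    residual = anova_table.loc["Residual"]

    model_df = model_rows["df"].sum()
    model_mean_square = model_rows["sum_sq"].sum() / model_df
    residual_mean_square = residual["sum_sq"] / residual["df"]

    f_stat = model_mean_square / residual_mean_square
    # probability of an F at least this large if the null is true
    p_value = scipy.stats.f.sf(f_stat, model_df, residual["df"])
    return f_stat, p_value


# made-up numbers: model mean-square = 6 / 3 = 2, residual mean-square = 96 / 96 = 1
toy_table = pd.DataFrame(
    {"sum_sq": [2.0, 1.0, 3.0, 96.0], "df": [1.0, 1.0, 1.0, 96.0]},
    index=["a", "b", "a:b", "Residual"],
)

f_stat, p_value = omnibus_f_test(toy_table)
print(f_stat, p_value)
```

Follow-up tests on individual terms would only be examined when `p_value` falls below $\alpha$.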

#### Q5 How often would you expect the omnibus test to fail (that is, to report a significant result) if the null hypothesis is true for all main effects and interactions? Implement an omnibus test (all the numbers you need to calculate `F` are in the results returned by `runANOVA`) and check your prediction.

#### Q6 What happens to the omnibus test if there is exactly one strong effect in the data? Does it still protect against false positives?