<img src="./img/HWNI_logo.svg"/>

# Lab A - Two-Way ANOVA

In [None]:
# makes our plots show up inside Jupyter
%matplotlib inline

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import scipy.stats

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import util.utils as utils
import util.shared as shared

shared.format_plots()
shared.format_dataframes()

## About the Lab

In part A of the lab for this section, we run ANOVA on a simulated dataset using `statsmodels` and connect ANOVA outputs to features of our data visualizations.

## About the Dataset

The data for this lab is simulated.

After learning of [Nemeroff, Heim, et al.](http://www.pnas.org/content/100/24/14293.full)'s finding that psychotherapy is more effective than pharmacotherapy at treating depression in individuals with childhood trauma, you become interested in whether the same might be true for treating anxiety.

You run a clinical experiment in which individuals with and without childhood trauma are treated with a placebo, the standard of care for psychotherapy (CBT, or 
["Cognitive Behavioral Therapy"](https://en.wikipedia.org/wiki/Cognitive_behavioral_therapy)),
or the standard of care for pharmacotherapy
[(an anxiolytic GABA agonist)](https://en.wikipedia.org/wiki/Anxiolytic).
As an aside: in real clinical studies, treating patients with a placebo is generally considered unethical when a standard of care already exists, so a proper clinical trial compares alternative treatments against the standard treatment.

The results of your experiment are summarized in the table `anxiety_dataset.csv`.

## Loading the Data

In [None]:
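# load the table; the first column of the CSV holds the row index
df = pd.read_csv('./data/anxiety_dataset.csv', index_col=0)

# work on a copy so the original table stays untouched
data = df.copy()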

In [None]:
data.sample(10)

For the independent variables `treatment` and `trauma`, use the pandas `Series` method `unique` to determine the levels and store these in variables called `treatments` and `traumas`.

In [None]:
treatments = data.treatment.unique()
traumas = data.trauma.unique()

## Visualizing the Data

Visualize the data. Aim for a visualization that makes it possible to see any main effects along with the interaction effect you're interested in. Examples include: histograms, factorial plots (make sure to include error bars), and strip/swarm/violin plots.
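As one possible starting point, here is a minimal sketch of a factorial plot with error bars. It runs on a small synthetic table rather than the lab dataset: the level names (`placebo`, `cbt`, `drug`, `no_trauma`, `trauma`), the sample size, and the random values are all made up for illustration, and only the column names follow the lab's data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# hypothetical levels and values, for illustration only
toy_treatments = ["placebo", "cbt", "drug"]
toy_traumas = ["no_trauma", "trauma"]
toy = pd.DataFrame([
    {"treatment": t, "trauma": tr, "anxiety_reduction": rng.normal()}
    for t in toy_treatments
    for tr in toy_traumas
    for _ in range(20)
])

# cell means and standard errors for every treatment x trauma combination
summary = (toy.groupby(["trauma", "treatment"])["anxiety_reduction"]
              .agg(["mean", "sem"])
              .reset_index())

# one line per trauma level, one x position per treatment level
treatment_order = sorted(toy["treatment"].unique())
positions = list(range(len(treatment_order)))

fig, ax = plt.subplots()
for trauma_level, group in summary.groupby("trauma"):
    group = group.set_index("treatment").loc[treatment_order]
    ax.errorbar(positions, group["mean"], yerr=group["sem"],
                marker="o", capsize=3, label=trauma_level)

ax.set_xticks(positions)
ax.set_xticklabels(treatment_order)
ax.set_xlabel("treatment")
ax.set_ylabel("mean anxiety_reduction (+/- SEM)")
ax.legend(title="trauma")
```

In a plot like this, lines that are roughly parallel suggest little interaction, while lines that cross or diverge suggest that the effect of one factor depends on the level of the other.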

#### Q1 Discuss the connections between visual aspects of your graph and 1) the assumptions of ANOVA and 2) the outcomes you expect.

## Running ANOVA

Because ANOVA can be viewed as a test of the goodness-of-fit of a linear model, Python's statistical modeling packages provide the tools for performing ANOVAs more complicated than the one-way between-subjects design.

In this course, we'll be using the `statsmodels` package.

In [None]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

We specify models by describing them with strings that look like:

```
    "outcome ~ factor"
```

for one-way ANOVAs,

```
    "outcome ~ factor1*factor2*...factorN"
```

for N-way ANOVAs where we want to compute all main effects and interactions (the `*` operator expands to every main effect plus every interaction),
and

```
    "outcome ~ factor1:randomFactor1:randomFactor2:...randomFactorN"
```

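for ANOVAs where we're only interested in interactions with one or more random factors (e.g., a subject factor). The `:` operator specifies an interaction term without the corresponding main effects.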

In [None]:
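# two-way model: main effects of treatment and trauma plus their interaction
ols_lm = smf.ols('anxiety_reduction ~ treatment*trauma', data=data)

fit = ols_lm.fit()

# ANOVA table computed with Type II sums of squares
table = sm.stats.anova_lm(fit, typ=2)

table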

#### Q2 Interpret the pattern of significant and non-significant results that you see. First, phrase your answer as you would write it in the results section of a paper (e.g. using the [APA guidelines](https://depts.washington.edu/psych/files/writing_center/stats.pdf)), then describe the results less formally, as in a research talk.

Just as we performed an overall $F$-test before doing $t$-tests in a one-way ANOVA, we can also perform an "omnibus" $F$-test before performing the individual $F$-tests of a multi-way ANOVA. In this test, we are essentially checking whether the model as a whole has a significant between-groups mean square.

Just as the one-way ANOVA allowed us to perform follow-up $t$-tests without worrying as much about multiple comparisons, the omnibus $F$-test lets us perform an ANOVA with many terms without worrying as much about multiple comparisons. The issue of multiple comparisons in ANOVA and the role of the omnibus test are explored in more detail in the second half of the lab.

We can calculate the model's overall between-groups mean square by adding up the sums of squares for each component of the model and dividing by the sum of the degrees of freedom of each component of the model. Comparing this to the residual mean square gives us an $F$-statistic for which we can compute a $p$-value.
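Written as a formula, the calculation described above looks roughly like the following. The notation is introduced here only for illustration: $SS_i$ and $df_i$ stand for the sum of squares and degrees of freedom of the $i$-th model term (each main effect and interaction), and $SS_{\text{res}}$ and $df_{\text{res}}$ for those of the residual.

$$
F_{\text{omnibus}} = \frac{\left(\sum_i SS_i\right) / \left(\sum_i df_i\right)}{SS_{\text{res}} / df_{\text{res}}}
$$

The numerator is the model's overall mean square and the denominator is the residual mean square, so the statistic is compared against an $F$ distribution with $\sum_i df_i$ and $df_{\text{res}}$ degrees of freedom.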

Implement an omnibus test and run it on the results table above. You'll need the `cdf` method of `scipy.stats.f`. Note that the results table contains all the information you need to run an omnibus test.

In [None]:
#scipy.stats.f?
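As a quick illustration of what the `cdf` method returns, here is a minimal sketch on made-up numbers. The values of `f_value`, `dof_between`, and `dof_within` are hypothetical and not taken from the lab's results table. The `cdf` gives the probability of an $F$ value at or below the one supplied, so the upper-tail probability is one minus that; `scipy.stats.f.sf` returns the same upper tail directly.

```python
import scipy.stats

# hypothetical F statistic and degrees of freedom, for illustration only
f_value = 3.0
dof_between = 2
dof_within = 30

# probability of observing an F value at or below f_value
below = scipy.stats.f.cdf(f_value, dof_between, dof_within)

# probability of observing an F value above f_value (the upper tail)
above = 1 - below

# the survival function computes the upper tail in one call
above_sf = scipy.stats.f.sf(f_value, dof_between, dof_within)

print(below, above, above_sf)
```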

Template:

```python
def compute_p(f_value, dof_b, dof_w):
    cdf = scipy.stats.f.cdf
    ...
    return p

def omnibus_test(result):
    
    # get residual and model sum of squares from result table
    
    # calculate residual and model degrees of freedom from result table
    
    # compute explained and unexplained mean squares from the above
    
    # compute F from explained and unexplained mean squares
    
    # compute p from F and the degrees of freedom using compute_p

    return (F,p)
```

#### Q3 Is the omnibus test significant? What does this mean?