# 1-way ANOVA

In [1]:
import sweepystats as sw
import numpy as np
import pandas as pd
import patsy

Suppose we are given an example data set, and we want to know:

> Do samples in different `Group` have different `Outcome`s?

In [2]:
df = pd.DataFrame({
    'Outcome': [3.6, 3.5, 4.2, 2.7, 4.1, 5.2, 3.0, 4.8, 4.0],
    'Group': pd.Categorical(["A", "A", "B", "B", "A", "C", "B", "C", "C"]), 
})
df

Unnamed: 0,Outcome,Group
0,3.6,A
1,3.5,A
2,4.2,B
3,2.7,B
4,4.1,A
5,5.2,C
6,3.0,B
7,4.8,C
8,4.0,C


Statistically, we want to test whether the group means (i.e. categories A vs B vs C) differ. The null hypothesis is $\mu_A = \mu_B = \mu_C$, and the alternative is that at least one group mean differs from the others. For this, we can conduct a 1-way ANOVA.
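To make the test concrete, here is a minimal sketch that computes the 1-way ANOVA F-statistic by hand with NumPy, using the textbook formula $F = \frac{SS_{between}/(k-1)}{SS_{within}/(n-k)}$ for $k$ groups and $n$ observations. It is only an illustration of the arithmetic, not how `sweepystats` computes it internally; the variable names (`groups`, `ss_between`, `ss_within`, `f_manual`) are chosen here for the example.

```python
import numpy as np

# Same observations as the example data set, split by group.
groups = {
    "A": np.array([3.6, 3.5, 4.1]),
    "B": np.array([4.2, 2.7, 3.0]),
    "C": np.array([5.2, 4.8, 4.0]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k = len(groups)        # number of groups
n = all_values.size    # total number of observations

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())

# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# Ratio of mean squares, with k - 1 and n - k degrees of freedom.
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
print(ss_between, ss_within, f_manual)
```

A large `f_manual` means the group means vary more than the within-group noise would explain.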

`Sweepystats` accepts patsy's [formula](https://patsy.readthedocs.io/en/latest/formulas.html) syntax to specify the model: the response goes on the left of `~` and the grouping variable on the right.
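To see what such a formula expands to, the sketch below calls patsy directly on the same data. Whether `sweepystats` builds its design matrix in exactly this way is an assumption; the point is only to show the terms and columns patsy derives from `"Outcome ~ Group"`. The name `df_demo` is introduced here for illustration.

```python
import pandas as pd
import patsy

# Copy of the example data set, so this block runs on its own.
df_demo = pd.DataFrame({
    "Outcome": [3.6, 3.5, 4.2, 2.7, 4.1, 5.2, 3.0, 4.8, 4.0],
    "Group": pd.Categorical(["A", "A", "B", "B", "A", "C", "B", "C", "C"]),
})

# dmatrices returns the response vector y and the design matrix X.
y, X = patsy.dmatrices("Outcome ~ Group", df_demo)

# Model terms (what a per-term F-test would refer to) ...
term_names = X.design_info.term_names
# ... and the actual columns, using treatment coding with A as the reference level.
column_names = X.design_info.column_names
print(term_names)
print(column_names)
```

With patsy's default treatment coding, the `Intercept` column corresponds to the reference group `A`, and the two `Group` columns encode the offsets of `B` and `C` from it.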

In [3]:
formula = "Outcome ~ Group"
one_way = sw.ANOVA(df, formula)
one_way.fit()

100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 8519.24it/s]


The F-statistic and p-value can be extracted as:

In [6]:
f_stat, pval = one_way.f_test("Intercept")
f_stat, pval

(np.float64(113.34939759036061), np.float64(4.047672672725335e-05))

In [7]:
f_stat, pval = one_way.f_test("Group")
f_stat, pval

(np.float64(3.9668674698794675), np.float64(0.07984562357182835))

Since the p-value for `Group` (about 0.080) is greater than $\alpha = 0.05$, we fail to reject the null hypothesis. So no, at this level there is no statistically significant evidence that the group means differ.
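The decision rule can be sketched as follows. This uses `scipy.stats.f_oneway` as a stand-in to obtain the 1-way ANOVA F-statistic and p-value (it is not part of `sweepystats`), and the threshold `alpha` is the significance level assumed in the text.

```python
from scipy import stats

# Observations from the example data set, one list per group.
group_a = [3.6, 3.5, 4.1]
group_b = [4.2, 2.7, 3.0]
group_c = [5.2, 4.8, 4.0]

# One-way ANOVA across the three groups.
result = stats.f_oneway(group_a, group_b, group_c)

alpha = 0.05
# Reject the null of equal means only if the p-value falls below alpha.
reject_null = result.pvalue < alpha
print(result.statistic, result.pvalue, reject_null)
```

Here `reject_null` comes out false, which matches the conclusion drawn from the `Group` p-value.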

## Check that the answer is correct

We can verify that the result obtained via the sweep operator is correct by comparing it against the `statsmodels` package:

In [8]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit the model
model = ols('Outcome ~ Group', data=df).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=3)  # Type III ANOVA
anova_table

Unnamed: 0,sum_sq,df,F,PR(>F)
Intercept,41.813333,1.0,113.349398,4e-05
Group,2.926667,2.0,3.966867,0.079846
Residual,2.213333,6.0,,
