# $k$-way ANOVA

In [1]:
import sweepystats as sw
import numpy as np
import pandas as pd

Continuing our 1-way ANOVA example, suppose a second covariate, `Factor`, was also measured, and we want to know:

> Do samples in different `Group` and `Factor` have different `Outcome`s?

In [2]:
df = pd.DataFrame({
    'Outcome': [3.6, 3.5, 4.2, 2.7, 4.1, 5.2, 3.0, 4.8, 4.0],
    'Group': pd.Categorical(["A", "A", "B", "B", "A", "C", "B", "C", "C"]), 
    'Factor': pd.Categorical(["X", "X", "Y", "X", "Y", "Y", "X", "Y", "X"])
})
df

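|   | Outcome | Group | Factor |
|---|---------|-------|--------|
| 0 | 3.6 | A | X |
| 1 | 3.5 | A | X |
| 2 | 4.2 | B | Y |
| 3 | 2.7 | B | X |
| 4 | 4.1 | A | Y |
| 5 | 5.2 | C | Y |
| 6 | 3.0 | B | X |
| 7 | 4.8 | C | Y |
| 8 | 4.0 | C | X |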


We previously saw, using 1-way ANOVA, that `Group` alone is not significant. Let's additionally adjust for `Factor` and for the interaction between `Group` and `Factor`.

In [3]:
formula = "Outcome ~ Group + Factor + Group:Factor"
two_way = sw.ANOVA(df, formula)
two_way.fit()

100%|███████████████████████████████████████████| 6/6 [00:00<00:00, 12627.11it/s]


Now we can test the significance of the intercept, `Group`, `Factor`, and the `Group:Factor` interaction, each with an F-test:

In [4]:
f_stat, pval = two_way.f_test("Intercept")
f_stat, pval

(np.float64(581.653846153789), np.float64(0.00015624006011285467))

In [5]:
f_stat, pval = two_way.f_test("Group")
f_stat, pval

(np.float64(11.561538461537285), np.float64(0.0389175406918902))

In [6]:
f_stat, pval = two_way.f_test("Factor")
f_stat, pval

(np.float64(4.65384615384568), np.float64(0.11988267006105519))

In [7]:
f_stat, pval = two_way.f_test("Group:Factor")
f_stat, pval

(np.float64(2.4743589743587298), np.float64(0.23186556325015506))

Note that when testing the `Group` variable, we are **NOT** refitting the reduced model internally; we simply *sweep* the (one-hot encoded) `Group` columns out of the full model!
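To make the sweeping idea concrete, here is a minimal NumPy sketch on the same data. It is not the `sweepystats` implementation: the `sweep` helper, the hand-built treatment-coded design matrix, and the sign convention (one common textbook form of the sweep operator) are all assumptions made for illustration. Sweeping every predictor into the augmented Gram matrix leaves the full model's residual sum of squares in the bottom-right corner, and inverse-sweeping the two `Group` columns gives the reduced model's residual sum of squares without a refit.

```python
import numpy as np

def sweep(A, k, inverse=False):
    """Return a copy of symmetric matrix A swept (or inverse-swept) on pivot k.
    Hypothetical helper for illustration, not part of sweepystats."""
    A = A.copy()
    d = A[k, k]
    col = A[:, k].copy()
    A -= np.outer(col, col) / d          # A_ij - A_ik * A_kj / A_kk
    sign = -1.0 if inverse else 1.0
    A[:, k] = sign * col / d             # pivot column
    A[k, :] = sign * col / d             # pivot row (A is symmetric)
    A[k, k] = -1.0 / d
    return A

# Same data as the DataFrame above
y = np.array([3.6, 3.5, 4.2, 2.7, 4.1, 5.2, 3.0, 4.8, 4.0])
group = np.array(list("AABBACBCC"))
factor = np.array(list("XXYXYYXYX"))

# Treatment-coded design (assumed coding): intercept, B, C, Y, B:Y, C:Y
gB = (group == "B").astype(float)
gC = (group == "C").astype(float)
fY = (factor == "Y").astype(float)
X = np.column_stack([np.ones(len(y)), gB, gC, fY, gB * fY, gC * fY])
p = X.shape[1]

# Augmented Gram matrix [[X'X, X'y], [y'X, y'y]]
Z = np.column_stack([X, y])
S = Z.T @ Z

# Sweep in all p predictors: bottom-right entry becomes the full-model RSS
for k in range(p):
    S = sweep(S, k)
rss_full = S[-1, -1]

# Inverse-sweep the two Group main-effect columns: RSS of the reduced model
R = S
for k in (1, 2):
    R = sweep(R, k, inverse=True)
rss_reduced = R[-1, -1]

df_num, df_res = 2, len(y) - p
f_group = ((rss_reduced - rss_full) / df_num) / (rss_full / df_res)
print(rss_full, rss_reduced, f_group)
```

Under these assumptions, `f_group` agrees with the `Group` F-statistic reported above up to floating-point error, at the cost of two rank-one updates rather than a second least-squares fit.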

## Check answer is correct

We can check that the answer is correct by comparing against the `statsmodels` package:

In [27]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit the model
model = ols('Outcome ~ Group + Factor + Group:Factor', data=df).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=3)  # Type III ANOVA (note: use type 2 if no interaction term)
print(anova_table)

                 sum_sq   df           F    PR(>F)
Intercept     25.205000  1.0  581.653846  0.000156
Group          1.002000  2.0   11.561538  0.038918
Factor         0.201667  1.0    4.653846  0.119883
Group:Factor   0.214444  2.0    2.474359  0.231866
Residual       0.130000  3.0         NaN       NaN
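As a sanity check on how the columns of this table relate, the sketch below recomputes the `Group` row by hand from the printed sums of squares and degrees of freedom. The numbers are copied from the table above; `scipy` is an assumed extra import (it is already a dependency of `statsmodels`). Each F-statistic is the term's mean square divided by the residual mean square, and the p-value is the upper tail of the corresponding F distribution.

```python
from scipy import stats

# Values copied from the ANOVA table above
ss_group, df_group = 1.002, 2
ss_resid, df_resid = 0.130, 3

# F = (SS_term / df_term) / (SS_resid / df_resid)
f_group = (ss_group / df_group) / (ss_resid / df_resid)

# p-value: upper-tail probability of F(df_term, df_resid)
p_group = stats.f.sf(f_group, df_group, df_resid)

print(f_group, p_group)
```

The same two lines of arithmetic apply to the `Intercept`, `Factor`, and `Group:Factor` rows after swapping in their `sum_sq` and `df` values.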
