<strong>One way ANOVA</strong>
<br>Basic assumptions: independence, normality, and equal variances (homoscedasticity)
<ul>
    <li>H0: the means of all groups under comparison are equal</li>
    <li>H1: the group means are not all equal; at least one group's mean differs from the others</li>
</ul>
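To make the hypotheses concrete, the sketch below computes the one-way ANOVA F statistic by hand as the ratio of the between-group mean square, SS_between / (k - 1), to the within-group mean square, SS_within / (N - k), and compares it with `stats.f_oneway`. The helper name `one_way_f`, the seed, and the sample parameters are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

def one_way_f(*groups):
    """Return (F, p) for a one-way ANOVA computed from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # variation of the observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper tail of the F distribution
    return f_stat, p_value

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
a = rng.normal(20, 10, 50)
b = rng.normal(20, 10, 50)
c = rng.normal(10, 10, 50)

f_manual, p_manual = one_way_f(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```

The two printed lines should agree up to floating-point error, which shows that `f_oneway` is this same ratio of mean squares.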

In [1]:
import numpy as np

# simulated fertilizer data as an example
# values are plant heights; the arguments are (mean, standard deviation, sample size)
treat1 = np.random.normal(20, 10, 50)
treat2 = np.random.normal(20, 5, 50)   # same mean as treat1, smaller variance
treat3 = np.random.normal(10, 10, 50)  # same variance as treat1, lower mean

In [2]:
# normality check (Shapiro-Wilk test; H0: the sample comes from a normal distribution)
from scipy import stats
print(stats.shapiro(treat1))
print(stats.shapiro(treat2))
print(stats.shapiro(treat3))

ShapiroResult(statistic=0.9853936049034514, pvalue=0.7883915180226313)
ShapiroResult(statistic=0.9740867520796885, pvalue=0.33659504444052835)
ShapiroResult(statistic=0.9693441086006391, pvalue=0.21763473664493965)


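All three p-values are well above 0.05, so normality is not rejected. This is expected, since the samples were generated from normal distributions.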

In [3]:
# equal-variance check (Levene's test; H0: the groups have equal variances)
print(stats.levene(treat1, treat2))
print(stats.levene(treat1, treat3)) # equal variances not rejected
print(stats.levene(treat2, treat3))

LeveneResult(statistic=18.694266810379226, pvalue=3.69100881519706e-05)
LeveneResult(statistic=1.0773087756996325, pvalue=0.3018545723880243)
LeveneResult(statistic=10.1328240743497, pvalue=0.0019523208770349974)
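Pairwise checks work, but `stats.levene` also accepts more than two samples, so the equal-variance assumption can be tested across all groups in a single call. The sketch below does this on freshly simulated data; the variable names `g1`, `g2`, `g3` and the seed are assumptions for illustration. `stats.bartlett` is shown as an alternative that is more sensitive to departures from normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)   # arbitrary seed, only for reproducibility
g1 = rng.normal(20, 10, 50)
g2 = rng.normal(20, 5, 50)       # smaller variance than g1 and g3
g3 = rng.normal(10, 10, 50)

# one test across all groups; H0: all group variances are equal
levene_all = stats.levene(g1, g2, g3)      # centers on the median by default
bartlett_all = stats.bartlett(g1, g2, g3)  # assumes normally distributed samples
print(levene_all)
print(bartlett_all)
```

A small p-value here means that at least one group's variance differs, without saying which one.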


In [4]:
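# run ANOVA on the pair treat1 and treat2 (the two groups passed in the call below)
# there are two ways to do this: stats.f_oneway, or ols from statsmodels
stats.f_oneway(treat1, treat2)

F_onewayResult(statistic=0.05257425321748644, pvalue=0.8191207863826159)

Given the p-value (about 0.82), H0 is not rejected: there is no evidence that the two means differ. Because treat1 and treat2 were both generated with mean 20, this is the correct decision and not an error. Note that Levene's test above rejected equal variances for this pair, so the equal-variance assumption does not hold here. Had the two means really differed, failing to reject H0 would have been a type II (beta) error, not a type I (alpha) error.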
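With only two groups, one-way ANOVA is equivalent to the pooled-variance two-sample t-test: the F statistic equals the squared t statistic and the p-values coincide. The sketch below illustrates this on simulated data (the names `x`, `y` and the seed are assumptions for illustration) and also runs Welch's t-test, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)   # arbitrary seed, only for reproducibility
x = rng.normal(20, 10, 50)
y = rng.normal(10, 10, 50)

f_res = stats.f_oneway(x, y)
t_res = stats.ttest_ind(x, y)                   # equal_var=True by default (pooled variance)
welch = stats.ttest_ind(x, y, equal_var=False)  # Welch's t-test, no equal-variance assumption

print(f_res.statistic, t_res.statistic ** 2)    # F equals t squared
print(f_res.pvalue, t_res.pvalue)               # same p-value
print(welch)
```

When Levene's test rejects equal variances for a pair, the Welch variant is the more cautious choice for comparing the two means.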

In [9]:
# ols needs the data in long format: one row per observation, with a column holding the group label
import pandas as pd
treat1_df = pd.DataFrame({'height':treat1, 'name':np.repeat('treat1',50)})
treat2_df = pd.DataFrame({'height':treat2, 'name':np.repeat('treat2',50)})
treat3_df = pd.DataFrame({'height':treat3, 'name':np.repeat('treat3',50)})
plant = pd.concat([treat1_df, treat2_df, treat3_df])
plant

,height,name
0,34.356188,treat1
1,22.937798,treat1
2,8.713438,treat1
3,22.609524,treat1
4,44.635738,treat1
...,...,...
45,11.744999,treat3
46,4.726503,treat3
47,22.211026,treat3
48,10.597587,treat3
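The same long-format table can also be built from a wide table with one column per treatment by using `DataFrame.melt`. This is a sketch on freshly simulated data; the names `wide` and `long_df` and the seed are assumptions for illustration. Unlike a plain `pd.concat`, which keeps each piece's own 0 to 49 index, `melt` returns a fresh, unique index.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)   # arbitrary seed, only for reproducibility
wide = pd.DataFrame({
    'treat1': rng.normal(20, 10, 50),
    'treat2': rng.normal(20, 5, 50),
    'treat3': rng.normal(10, 10, 50),
})

# one row per observation: the former column name becomes the group label
long_df = wide.melt(var_name='name', value_name='height')
print(long_df.shape)
print(long_df.groupby('name')['height'].agg(['mean', 'std', 'count']))
```

The per-group summary is a quick sanity check that the reshaping kept 50 observations in each treatment.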


In [12]:
# with ols, the ANOVA table is obtained from the fitted model:
from statsmodels.stats.anova import anova_lm
from statsmodels.formula.api import ols
model = ols('height ~ C(name)', data=plant).fit()  # C(name) treats name as a categorical factor
anova_lm(model)

,df,sum_sq,mean_sq,F,PR(>F)
C(name),2.0,3125.029363,1562.514682,20.37942,1.542394e-08
Residual,147.0,11270.667356,76.671207,,


With a p-value of about 1.5e-08, reject H0 and conclude that mean height differs for at least one of the three treatments.
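The F-test only indicates that some group mean differs, not which one, so a post-hoc pairwise comparison is a common follow-up. The sketch below uses `scipy.stats.tukey_hsd`, assuming SciPy 1.8 or later, on freshly simulated data; the names `g1`, `g2`, `g3` and the seed are assumptions for illustration. Tukey's HSD also assumes equal variances, so its results should be read with care when that assumption is in doubt.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)   # arbitrary seed, only for reproducibility
g1 = rng.normal(20, 10, 50)
g2 = rng.normal(20, 5, 50)
g3 = rng.normal(10, 10, 50)

# all pairwise comparisons of group means with family-wise error control
res = stats.tukey_hsd(g1, g2, g3)
print(res)

pvals = res.pvalue   # 3 x 3 matrix; entry [i, j] compares group i with group j
print(pvals)
```

Pairs with a p-value below the chosen significance level are the ones whose means are judged to differ.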