## One-way ANOVA

The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent groups.

Thus we want to test the following null hypothesis,

$$H_0 : \mu_1 = \mu_2 = \dots = \mu_k$$

where $\mu_i$ is the population mean of group $i$ and $k$ is the number of groups. The alternative hypothesis is that at least one of the means differs from the others. A common mistake is to read the alternative as saying that all of the means are different.


Like all other tests, ANOVA makes certain assumptions about the data. It requires:

* Normality of residuals. The residuals need to be normally distributed.
* Homogeneity of variances. In other words, the population variances of the groups are equal.
* Independence of observations. This is a feature of the experimental design.
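
The first two assumptions can be checked roughly in R. The sketch below uses simulated data (the data frame `toy` and its columns `y` and `group` are made up for illustration) and picks the Shapiro-Wilk and Bartlett tests. These are one common choice rather than the only one, and neither is named in the text above.

```r
# Simulated data for illustration only (not the fruit fly dataset).
set.seed(1)
toy <- data.frame(
  y     = c(rnorm(20, mean = 10, sd = 2),
            rnorm(20, mean = 12, sd = 2),
            rnorm(20, mean = 15, sd = 2)),
  group = factor(rep(c("A", "B", "C"), each = 20))
)

fit <- aov(y ~ group, data = toy)

# Normality of residuals: Shapiro-Wilk test (null: residuals are normal).
normality <- shapiro.test(residuals(fit))

# Homogeneity of variances: Bartlett's test (null: equal group variances).
homogeneity <- bartlett.test(y ~ group, data = toy)

print(normality$p.value)
print(homogeneity$p.value)
```

A small p-value in either test is evidence against the corresponding assumption. Independence cannot be tested this way, since it depends on how the data were collected.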


## Mathematical formulation



First we define our notation for the ANOVA calculation.

* $k$ = the number of groups
* $n_i$ = the sample size taken from group $i$
* $x_{ij}$ = the $j^{\text{th}}$ sample from the $i^{\text{th}}$ group
* $\bar{x}_i$ = the sample mean of group $i$ = $\frac{1}{n_{i}} \sum_{j=1}^{n_{i}} x_{i j}$
* $s_i^2$ = the sample variance of group $i$ = $\frac{1}{n_{i}-1} \sum_{j=1}^{n_{i}}\left(x_{i j}-\overline{x}_{i}\right)^{2}$
* $n$ = the total number of samples = $\sum_{i=1}^{k} n_{i}$
* $\bar{x}$ = the mean of all samples = $\frac{1}{n} \sum_{i j} x_{i j}$
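
To make the notation concrete, here is a minimal sketch that computes each quantity for a small made-up dataset. The list `x` and the variable names are illustrative and are not part of the original analysis.

```r
# Toy data, made up for illustration: k = 3 groups of unequal size.
x <- list(
  A = c(4, 5, 6),
  B = c(7, 9, 8, 8),
  C = c(10, 12)
)

k      <- length(x)          # number of groups
n_i    <- sapply(x, length)  # sample size of each group
xbar_i <- sapply(x, mean)    # sample mean of each group
s2_i   <- sapply(x, var)     # sample variance of each group (divisor n_i - 1)
n      <- sum(n_i)           # total number of samples
xbar   <- mean(unlist(x))    # mean of all samples

print(xbar_i)
print(s2_i)
print(xbar)
```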

The total variability among observations is measured by summing the squares of the differences between $x_{i j}$ and $\overline{x}$. This is the total sum of squares,

$$\text{SST} = \sum_{i=1}^{k} \sum_{j=1}^{n_{i}}\left(x_{i j}-\overline{x}\right)^{2}$$

We can separate this variability into two distinct sources.

The first is the variability between group means, specifically their variation around the overall mean $\overline{x}$,

$$\mathrm{SSG} :=\sum_{i=1}^{k} n_{i}\left(\overline{x}_{i}-\overline{x}\right)^{2}$$

The second is the variability within groups, specifically the variation of the observations about their group means $\overline{x}_{i}$,

$$\mathrm{SSE} :=\sum_{i=1}^{k} \sum_{j=1}^{n_{i}}\left(x_{i j}-\overline{x}_{i}\right)^{2}=\sum_{i=1}^{k}\left(n_{i}-1\right) s_{i}^{2}$$
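
The two sources add up to the total, $\text{SST} = \mathrm{SSG} + \mathrm{SSE}$. The sketch below checks this numerically on made-up data (the data frame `toy` is illustrative only). It also forms the standard one-way F statistic, $F = \frac{\mathrm{SSG}/(k-1)}{\mathrm{SSE}/(n-k)}$, which is not defined in the text above but is the quantity `aov` reports as `F value`.

```r
# Toy data, made up for illustration (not the fruit fly dataset).
toy <- data.frame(
  y     = c(4, 5, 6, 7, 9, 8, 8, 10, 12),
  group = factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C"))
)

k      <- nlevels(toy$group)                 # number of groups
n      <- nrow(toy)                          # total number of samples
xbar   <- mean(toy$y)                        # overall mean
xbar_i <- tapply(toy$y, toy$group, mean)     # group means
n_i    <- tapply(toy$y, toy$group, length)   # group sizes

SST <- sum((toy$y - xbar)^2)
SSG <- sum(n_i * (xbar_i - xbar)^2)
# ave() returns, for each observation, the mean of its own group.
SSE <- sum((toy$y - ave(toy$y, toy$group))^2)

# F statistic: between-group mean square over within-group mean square.
F_stat <- (SSG / (k - 1)) / (SSE / (n - k))

# Compare with the value computed by aov().
fit   <- aov(y ~ group, data = toy)
F_aov <- summary(fit)[[1]][["F value"]][1]

print(c(SST = SST, SSG = SSG, SSE = SSE))
print(c(by_hand = F_stat, aov = F_aov))
```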

In [1]:
df <- read.csv('fruitfly.csv')

In [2]:
head(df)

| fecundity | line |
|-----------|------|
| 12.8 | R |
| 21.6 | R |
| 14.8 | R |
| 23.1 | R |
| 34.6 | R |
| 19.7 | R |


In [3]:
mod <- aov(fecundity ~ line, data = df)

In [4]:
summary(mod)

            Df Sum Sq Mean Sq F value   Pr(>F)    
line         2   1362   681.1   8.666 0.000424 ***
Residuals   72   5659    78.6                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
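
The entries of this table are related in a simple way. The `line` row has $k - 1 = 2$ degrees of freedom and the `Residuals` row has $n - k = 72$, so there are 3 lines and 75 observations. As a rough check, the sketch below rebuilds the mean squares, the F value and the p-value from the printed sums of squares. The variable names are illustrative, and because the printed values are rounded the results only agree approximately.

```r
# Values copied from the summary table above (rounded as printed).
df_line  <- 2
ss_line  <- 1362
df_resid <- 72
ss_resid <- 5659

ms_line  <- ss_line / df_line     # mean square between groups
ms_resid <- ss_resid / df_resid   # mean square within groups
F_stat   <- ms_line / ms_resid    # close to the printed F value

# Upper-tail probability of the F distribution with (2, 72) degrees of freedom.
p_value <- pf(F_stat, df_line, df_resid, lower.tail = FALSE)

print(c(ms_line = ms_line, ms_resid = ms_resid, F = F_stat, p = p_value))
```

Since the p-value is far below 0.05, the null hypothesis that all lines have the same mean fecundity is rejected at the 5% level. The test does not say which lines differ.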