Analysis of Variance (ANOVA) is a statistical method for testing whether the means of several populations differ.


The Hypothesis Test of Analysis of Variance: Assumptions

+  We assume independent random sampling from each of the r populations

+ We assume that the r populations under study: 

    + are normally distributed, 

    + with means $\mu_i$ that may or may not be equal, 

    + but with equal variances



$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_r$$
$$H_1: \text{Not all } \mu_i\ (i = 1, 2, \dots, r) \text{ are equal}$$
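
The assumptions and hypotheses above can be collected into a single model statement. The notation here is introduced for illustration and is not part of the original text: $X_{ij}$ is the $j$-th observation from population $i$, $\varepsilon_{ij}$ is its random error, $\sigma^2$ is the common variance, and $n_i$ is the sample size from population $i$.

$$X_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)\ \text{independently}, \qquad i = 1, \dots, r,\ \ j = 1, \dots, n_i$$

Under $H_0$ every $\mu_i$ equals one common mean, so the $r$ populations would share the same normal distribution.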

Exercise 1: The presence of harmful insects in farm fields is detected by erecting boards covered with a sticky material and then examining the insects trapped on the boards. To investigate which colours are most attractive to cereal leaf beetles, researchers placed six boards of each of four colours in a field of oats in July. The table below gives the number of cereal leaf beetles trapped on each board.

|Colour|Insects trapped|
|--|--|
|Lemon yellow|45 59 48 46 38 47|
|White|21 12 14 17 13 17|
|Green|37 32 15 25 39 41|
|Blue|16 11 20 21 14 7|

Do these data suggest that board colour influences the number of beetles trapped?

$$H_0: \text{Mean abundance is the same for all colours of boards}$$


$$H_1: \text{At least one pair of means differs}$$

In [None]:
# Create lists containing the number of insects trapped in each treatment
Yellow = [45,  59,  48,  46,  38,  47 ]
White = [21,  12,  14,  17,  13,  17 ]
Green = [37,  32,  15,  25,  39,  41 ]
Blue = [16,  11,  20,  21,  14,   7]
Total = Yellow + White + Green + Blue


In [5]:
# Compute the average for each treatment and the grand mean
import statistics
Yellow_bar = statistics.mean(Yellow)
print("The average number of insects trapped on yellow boards, Yellow_bar =", Yellow_bar)

White_bar = statistics.mean(White)
print("The average number of insects trapped on white boards, White_bar =", White_bar)

Green_bar = statistics.mean(Green)
print("The average number of insects trapped on green boards, Green_bar =", Green_bar)

Blue_bar = statistics.mean(Blue)
print("The average number of insects trapped on blue boards, Blue_bar =", Blue_bar)

Grand_Mean = statistics.mean(Total)
print("Grand mean =", Grand_Mean)

The average number of insects trapped on yellow boards, Yellow_bar = 47.166666666666664
The average number of insects trapped on white boards, White_bar = 15.666666666666666
The average number of insects trapped on green boards, Green_bar = 31.5
The average number of insects trapped on blue boards, Blue_bar = 14.833333333333334
Grand mean = 27.291666666666668


In [21]:
# Compute each observation's deviation from its own treatment mean (the error deviation)
def ComputeError(Color):
    treatment_mean = statistics.mean(Color)  # computed once rather than on every iteration
    return [count - treatment_mean for count in Color]

print("List error deviation of Yellow treatment =", ComputeError(Yellow))
print("List error deviation of White treatment =", ComputeError(White))
print("List error deviation of Green treatment =", ComputeError(Green))
print("List error deviation of Blue treatment =", ComputeError(Blue))


List error deviation of Yellow treatment = [-2.1666666666666643, 11.833333333333336, 0.8333333333333357, -1.1666666666666643, -9.166666666666664, -0.1666666666666643]
List error deviation of White treatment = [5.333333333333334, -3.666666666666666, -1.666666666666666, 1.333333333333334, -2.666666666666666, 1.333333333333334]
List error deviation of Green treatment = [5.5, 0.5, -16.5, -6.5, 7.5, 9.5]
List error deviation of Blue treatment = [1.166666666666666, -3.833333333333334, 5.166666666666666, 6.166666666666666, -0.8333333333333339, -7.833333333333334]


In [23]:
Error_Total = ComputeError(Yellow) + ComputeError(White) + ComputeError(Green) + ComputeError(Blue)

print("List error deviation of all treatments =", Error_Total)

List error deviation of all treatments = [-2.1666666666666643, 11.833333333333336, 0.8333333333333357, -1.1666666666666643, -9.166666666666664, -0.1666666666666643, 5.333333333333334, -3.666666666666666, -1.666666666666666, 1.333333333333334, -2.666666666666666, 1.333333333333334, 5.5, 0.5, -16.5, -6.5, 7.5, 9.5, 1.166666666666666, -3.833333333333334, 5.166666666666666, 6.166666666666666, -0.8333333333333339, -7.833333333333334]


In [26]:
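# Compute the sum of squares of error (SSE)
import math

Square_Error_List = [deviation**2 for deviation in Error_Total]

# math.fsum is the documented floating-point sum; fsum is not a documented function of the statistics module
SSE = math.fsum(Square_Error_List)
print("SSE =",SSE)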

SSE = 920.5


$$MSE\ (Mean\ Square\ Error) = \frac{SSE}{n-r}$$
where:
+ r = number of treatments (or groups)
+ n = total observations
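
The divisor $n - r$ is the error degrees of freedom. One way to see where it comes from: within each treatment the deviations from the treatment mean must sum to zero, so each of the $r$ groups loses one free value. The sketch below checks this on the exercise data; the dictionary layout and variable names are illustrative choices, not part of the original notebook.

```python
import statistics

# Beetle counts copied from the exercise table
groups = {
    "Yellow": [45, 59, 48, 46, 38, 47],
    "White": [21, 12, 14, 17, 13, 17],
    "Green": [37, 32, 15, 25, 39, 41],
    "Blue": [16, 11, 20, 21, 14, 7],
}

# Sum of the deviations from the treatment mean, one value per treatment
deviation_sums = {}
for name, counts in groups.items():
    group_mean = statistics.mean(counts)
    deviation_sums[name] = sum(x - group_mean for x in counts)

n = sum(len(counts) for counts in groups.values())  # total observations
r = len(groups)                                     # number of treatments
df_error = n - r                                    # one constraint per treatment

for name, total in deviation_sums.items():
    # Rounded because floating-point sums are only zero up to rounding error
    print(name, "deviations sum to", round(total, 10))
print("Error degrees of freedom =", df_error)
```

Each sum is zero up to floating-point rounding, which is why only $n - r$ of the $n$ error deviations are free to vary.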

In [28]:
# Total number of observations (n = 24)
n = len(Total)

# Number of treatments (r = 4)
r = 4

MSE = SSE/(n-r)
print("MSE =",MSE)

MSE = 46.025
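
MSE can also be read as a pooled estimate of the common variance that the ANOVA assumptions require: it is the average of the sample variances, weighted by their degrees of freedom $n_i - 1$. The following is a minimal sketch of that equivalence using only the standard library; the variable names are illustrative.

```python
import statistics

# Beetle counts copied from the exercise table
groups = [
    [45, 59, 48, 46, 38, 47],  # Lemon yellow
    [21, 12, 14, 17, 13, 17],  # White
    [37, 32, 15, 25, 39, 41],  # Green
    [16, 11, 20, 21, 14, 7],   # Blue
]

n = sum(len(g) for g in groups)  # total observations
r = len(groups)                  # number of treatments

# Weight each sample variance by its degrees of freedom (n_i - 1)
weighted_variances = sum((len(g) - 1) * statistics.variance(g) for g in groups)

# Dividing by the total error degrees of freedom gives the pooled variance
pooled_variance = weighted_variances / (n - r)
print("Pooled variance =", pooled_variance)
```

The pooled variance agrees with the MSE computed above, up to floating-point rounding.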


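$$SSTr = \sum_{i=1}^{r} n_i(\bar{X}_i - \bar{\bar{X}})^2 = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_r(\bar{X}_r - \bar{\bar{X}})^2$$
where:
+ $n_i$ = number of observations in treatment $i$
+ $\bar{X}_i$ = sample mean of treatment $i$
+ $\bar{\bar{X}}$ = Grand mean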

In [33]:
n_1 = len(Yellow)
print("n_1 =",n_1)

n_2 = len(White)
print("n_2 =",n_2)

n_3 = len(Green)
print("n_3 =",n_3)

n_4 = len(Blue)
print("n_4 =",n_4)

n_1 = 6
n_2 = 6
n_3 = 6
n_4 = 6


In [35]:
SSTr = n_1*(Yellow_bar - Grand_Mean)**2 + n_2*(White_bar - Grand_Mean)**2 + n_3*(Green_bar - Grand_Mean)**2 + n_4*(Blue_bar - Grand_Mean)**2
print("SSTr =", SSTr)

SSTr = 4218.458333333332
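
A useful sanity check on SSE and SSTr is the sum-of-squares decomposition: the total sum of squares (written SST here, a label not used in the original notebook) equals SSTr + SSE. The sketch below recomputes all three quantities from the raw data with the standard library; the lowercase variable names are illustrative.

```python
import math
import statistics

# Beetle counts copied from the exercise table
groups = [
    [45, 59, 48, 46, 38, 47],  # Lemon yellow
    [21, 12, 14, 17, 13, 17],  # White
    [37, 32, 15, 25, 39, 41],  # Green
    [16, 11, 20, 21, 14, 7],   # Blue
]

all_counts = [x for g in groups for x in g]
grand_mean = statistics.mean(all_counts)

# Total variation: every observation against the grand mean
sst = math.fsum((x - grand_mean) ** 2 for x in all_counts)

# Within-treatment variation: every observation against its own treatment mean
sse = math.fsum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# Between-treatment variation: treatment means against the grand mean
sstr = math.fsum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)

print("SST =", sst)
print("SSTr + SSE =", sstr + sse)
```

If the two printed values disagree by more than rounding error, one of the sums of squares has been computed incorrectly.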


$$MSTr\ (Mean\ Square\ Treatment) = \frac{SSTr}{r - 1} $$

In [36]:
MSTr = SSTr / (r - 1)
print("MSTr =", MSTr)

MSTr = 1406.1527777777774


$$F_{(r - 1, n - r)} = \frac{MSTr}{MSE}$$

In [39]:
F_score = MSTr/MSE
print("F_score =", F_score)

# Compute the critical value; the ANOVA F test is right-tailed
alpha = 0.05

from scipy.stats import f
critical_value = f.ppf(1 - alpha, r - 1, n - r)
print("Critical value =", critical_value)

F_score = 30.55193433520429
Critical value = 3.09839121214078


In [40]:
# Decision making
if F_score < critical_value:
    print("Do not reject Ho")
else:
    print("Reject Ho")

Reject Ho
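
As a cross-check on the manual calculation, SciPy's `scipy.stats.f_oneway` runs the same one-way ANOVA directly from the raw samples and also returns a p-value. The sketch below assumes SciPy is installed (the notebook already imports `scipy.stats.f`); the lowercase variable names are illustrative.

```python
from scipy.stats import f, f_oneway

# Beetle counts copied from the exercise table
yellow = [45, 59, 48, 46, 38, 47]
white = [21, 12, 14, 17, 13, 17]
green = [37, 32, 15, 25, 39, 41]
blue = [16, 11, 20, 21, 14, 7]

# One-way ANOVA on the four treatments
result = f_oneway(yellow, white, green, blue)
f_statistic = result.statistic
p_value = result.pvalue

# The same p-value from the right tail of the F distribution,
# with r - 1 = 3 and n - r = 20 degrees of freedom
p_from_sf = f.sf(f_statistic, 3, 20)

alpha = 0.05
print("F statistic =", f_statistic)
print("p-value =", p_value)
print("Reject Ho" if p_value < alpha else "Do not reject Ho")
```

Comparing the p-value with alpha leads to the same decision as comparing the F statistic with the critical value.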


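Conclusion: Because the F statistic (30.55) exceeds the critical value (3.10) at $\alpha = 0.05$, we reject $H_0$. There is evidence of significant differences among the mean abundances, so the data suggest that board colour influences the number of beetles trapped.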