### <b>|Q.|</b> &emsp; ANOVA (ANalysis Of VAriance)

<hr>
Three archers – Pat, Jack, and Alex are participating in an archery contest. They are shooting at
targets with 10 evenly spaced concentric rings. The rings have score values from 1 through 10
assigned to them, with 10 being the highest. Each participant shoots 6 arrows, scoring the
following points:<br>
Pat – 5, 4, 4, 3, 9, 4<br>
Jack – 4, 8, 7, 5, 1, 5<br>
Alex – 9, 8, 8, 10, 5, 10<br>
Based on these results we would like to know who the best archer is. As a first step, we test whether the three archers share the same mean score, i.e. our null hypothesis is that the means of all populations are equal.
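Before running a formal test, it helps to look at each archer's sample mean and spread. The following is a minimal NumPy sketch that uses the same scores as the `data` array in the ANOVA code cell. The dictionary name `scores` is only illustrative.

```python
import numpy as np

# Scores per archer (same values as the ANOVA code cell)
scores = {
    "Pat": np.array([5, 4, 4, 3, 9, 4]),
    "Jack": np.array([4, 8, 7, 5, 1, 5]),
    "Alex": np.array([9, 8, 8, 10, 5, 10]),
}

# Sample mean and sample variance (ddof=1) for each archer
summary = {name: (s.mean(), s.var(ddof=1)) for name, s in scores.items()}

for name, (mean, var) in summary.items():
    print(f"{name:<5} mean = {mean:.3f}  variance = {var:.3f}")
```

Alex's mean (about 8.33) is clearly above Pat's and Jack's (about 4.83 and 5.00). ANOVA asks whether that gap is larger than the scatter within each archer's own shots would explain.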

# <b>|Ans.| </b>
### <font color='red'>NULL HYPOTHESIS</font>
# <b>H<sub>0</sub> : </b><font color='blue'>  μ<sub>1</sub> = μ<sub>2</sub> = μ<sub>3</sub> </font>
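i.e., our null hypothesis is that the population mean scores of Pat, Jack and Alex are all equal. The alternative hypothesis H<sub>1</sub> is that at least one of these means differs from the others. Rejecting the null hypothesis would therefore mean that there is a significant difference between at least two of the archers.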

The decision to reject the null hypothesis in favor of the alternative hypothesis depends on two quantities. The first is the significance level of the test <b><i>(&alpha;)</i></b>. The second is the p-value, which is the probability of observing an effect at least as large as the one in the data, assuming the null hypothesis is true. If <b><i>"p &leq; &alpha;"</i></b>, the null hypothesis is rejected. We typically use <b><i>'&alpha; = 0.05'</i></b>, which corresponds to a 95% confidence level.

```python
import numpy as np
from scipy import stats

# Each record is (archer name, points scored with one arrow)
data = np.rec.array([
    ('Pat', 5), ('Pat', 4), ('Pat', 4), ('Pat', 3), ('Pat', 9), ('Pat', 4),
    ('Jack', 4), ('Jack', 8), ('Jack', 7), ('Jack', 5), ('Jack', 1), ('Jack', 5),
    ('Alex', 9), ('Alex', 8), ('Alex', 8), ('Alex', 10), ('Alex', 5), ('Alex', 10)],
    dtype=[('Archer', '|U5'), ('Score', '<i8')])

# One-way ANOVA on the three groups of scores
f, p = stats.f_oneway(data[data['Archer'] == 'Pat'].Score,
                      data[data['Archer'] == 'Jack'].Score,
                      data[data['Archer'] == 'Alex'].Score)

print('One-way ANOVA')
print('=============')
print('F value:', round(f, 5))   # F is a ratio of mean squares, not a percentage
print('P value:', p, '\n')
```

```
One-way ANOVA
=============
F value: 5.0
P value: 0.021683749320078414
```
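`f_oneway` hides the arithmetic. The following sketch recomputes the F statistic by hand using the same scores. F is the ratio of the between-group mean square to the within-group mean square. This is meant as an illustration of the formula, not as a replacement for the library call.

```python
import numpy as np

groups = [
    np.array([5, 4, 4, 3, 9, 4]),    # Pat
    np.array([4, 8, 7, 5, 1, 5]),    # Jack
    np.array([9, 8, 8, 10, 5, 10]),  # Alex
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k = len(groups)            # number of groups (archers)
n_total = all_scores.size  # total number of arrows

# Between-group sum of squares: distance of each group mean from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: scatter of scores around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1        # 2
df_within = n_total - k   # 15

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS_between = {ss_between:.4f}, SS_within = {ss_within:.4f}")
print(f"F({df_between}, {df_within}) = {f_stat:.4f}")
```

This gives F(2, 15) = 5.0, matching the value reported by `f_oneway`.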



As <b>p &asymp; 0.022 &leq; 0.05</b>, we reject the null hypothesis and conclude that at least one archer's mean score differs from at least one other archer's mean (i.e. not all archers perform equally).
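The p-value is the upper-tail probability of an F distribution with (k - 1, N - k) = (2, 15) degrees of freedom, evaluated at the observed F. The following short sketch uses `scipy.stats.f.sf` to reproduce it and then applies the p &leq; &alpha; rule. The F value and degrees of freedom are taken from the one-way ANOVA above.

```python
from scipy import stats

f_observed = 5.0               # F value from the one-way ANOVA
df_between, df_within = 2, 15  # k - 1 and N - k for 3 archers x 6 arrows
alpha = 0.05

# Survival function = P(F >= f_observed) if the null hypothesis is true
p_value = stats.f.sf(f_observed, df_between, df_within)
reject_h0 = p_value <= alpha

print(f"p = {p_value:.4f}, reject H0: {reject_h0}")
```

When the numerator has 2 degrees of freedom, this tail probability has the closed form (1 + 2F/df<sub>within</sub>)<sup>-df<sub>within</sub>/2</sup>. Here that is (5/3)<sup>-7.5</sup> = 0.6<sup>7.5</sup> &asymp; 0.0217, which is a handy check on the printed p-value.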

The limitation of one-way ANOVA is that it only tells us that the archers do not all perform equally. It does not tell us which archers differ from each other, or who performs best or worst.<hr>

# <font color='Indigo'><b>Tukey’s range test</b></font>
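Tukey's HSD (honestly significant difference) test compares every possible pair of group means. It flags a pair as significantly different when the difference between the two means is larger than sampling variability alone would be expected to produce. At the same time, it keeps the family-wise error rate (FWER) at &alpha;, where the FWER is the chance of at least one false rejection across all pairs.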
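To make the idea concrete, here is a minimal from-scratch sketch of Tukey's HSD for equal group sizes. The critical difference is q(&alpha;; k, N - k) &times; sqrt(MS<sub>within</sub> / n), where q comes from the studentized range distribution. The sketch assumes SciPy 1.7 or later, which provides `scipy.stats.studentized_range`. It does not handle unequal group sizes, and the variable names are only illustrative.

```python
import itertools
import numpy as np
from scipy import stats

scores = {
    "Alex": np.array([9, 8, 8, 10, 5, 10]),
    "Jack": np.array([4, 8, 7, 5, 1, 5]),
    "Pat": np.array([5, 4, 4, 3, 9, 4]),
}
alpha = 0.05
k = len(scores)                                        # number of groups
n = 6                                                  # arrows per archer (equal group sizes)
df_within = sum(s.size for s in scores.values()) - k   # N - k = 15

# Pooled within-group variance (MS_within from the ANOVA)
ms_within = sum(((s - s.mean()) ** 2).sum() for s in scores.values()) / df_within

# Critical studentized range value and the resulting "honestly significant difference"
q_crit = stats.studentized_range.ppf(1 - alpha, k, df_within)
hsd = q_crit * np.sqrt(ms_within / n)

results = {}
for a, b in itertools.combinations(scores, 2):
    diff = scores[b].mean() - scores[a].mean()  # group2 - group1, like statsmodels' meandiff
    results[(a, b)] = (diff, abs(diff) > hsd)
    print(f"{a:>4} vs {b:<4} meandiff = {diff:+.4f}  reject = {abs(diff) > hsd}")

print(f"HSD = {hsd:.4f}")
```

A pair is declared different only when its absolute mean difference exceeds the single threshold `hsd`. This is why the critical difference is the same for all three comparisons when the group sizes are equal.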

```python
from statsmodels.stats.multicomp import MultiComparison

# Compare every pair of archers, controlling the family-wise error rate at 5%
multi_comp = MultiComparison(data['Score'], data['Archer'])
result = multi_comp.tukeyhsd(alpha=0.05)

print(result)
print(multi_comp.groupsunique)  # group labels in index order (0, 1, 2)
```

```
Multiple Comparison of Means - Tukey HSD, FWER=0.05 
====================================================
group1 group2 meandiff p-adj   lower   upper  reject
----------------------------------------------------
  Alex   Jack  -3.3333 0.0435 -6.5755 -0.0911   True
  Alex    Pat     -3.5 0.0337 -6.7422 -0.2578   True
  Jack    Pat  -0.1667    0.9 -3.4089  3.0755  False
----------------------------------------------------
['Alex' 'Jack' 'Pat']
```
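If statsmodels is not available, SciPy offers `scipy.stats.tukey_hsd`, which runs the same all-pairs comparison. This assumes SciPy 1.8 or later. The following is a short sketch. SciPy returns its results as k &times; k matrices indexed by the order in which the samples are passed. Its `statistic[i, j]` is mean<sub>i</sub> - mean<sub>j</sub>, which is the opposite sign of statsmodels' `meandiff` for the same pair. SciPy also computes the adjusted p-values directly, so they can differ from the table above in the later digits. Large p-values, such as the one for Jack vs Pat, need not come out as exactly 0.9.

```python
import numpy as np
from scipy import stats

alex = np.array([9, 8, 8, 10, 5, 10])
jack = np.array([4, 8, 7, 5, 1, 5])
pat = np.array([5, 4, 4, 3, 9, 4])
names = ["Alex", "Jack", "Pat"]

# All pairwise comparisons; res.statistic[i, j] = mean_i - mean_j
res = stats.tukey_hsd(alex, jack, pat)
ci = res.confidence_interval(confidence_level=0.95)

for i in range(3):
    for j in range(i + 1, 3):
        print(f"{names[i]} vs {names[j]}: diff = {res.statistic[i, j]:+.4f}, "
              f"p-adj = {res.pvalue[i, j]:.4f}, "
              f"95% CI = [{ci.low[i, j]:.4f}, {ci.high[i, j]:.4f}]")
```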


The results above show that Alex (group 0 in `groupsunique`) differs significantly from the other two archers. The `reject` column (the last one) tells us that there is significant evidence to reject the null hypothesis of equal means for the pairs Alex-Jack (0-1) and Alex-Pat (0-2). There is no such evidence for Jack-Pat (1-2).

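The test also reports the difference between the group means (the `meandiff` column, computed as group2 minus group1).<br> <br>
<b>
x̄<sub>Jack</sub> - x̄<sub>Alex</sub> = -3.3333 <br>
x̄<sub>Pat</sub> - x̄<sub>Alex</sub> = -3.5 <br>
x̄<sub>Pat</sub> - x̄<sub>Jack</sub> = -0.1667 <br>
So, in terms of sample means,&emsp; x̄<sub>Alex</sub> > x̄<sub>Jack</sub> > x̄<sub>Pat</sub>
</b><br>
This leads to the conclusion that Alex is the best archer in the group. Pat's sample mean is slightly lower than Jack's, but that difference is not significant (reject = False). The data therefore do not let us say which of those two archers is worse.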