
In [7]:
import numpy as np
import pandas as pd

import scipy.stats as stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Using One-way ANOVA and Tukey's test to compare data sets

http://cleverowl.uk/2015/07/01/using-one-way-anova-and-tukeys-test-to-compare-data-sets/

In [2]:
df = pd.DataFrame({'Archer': ['Pat']*6 + ['Jack']*6 + ['Alex']*6,
                  'Score': [5,4,4,3,9,4,4,8,7,5,1,5,9,8,8,10,5,10]})
df

,Archer,Score
0,Pat,5
1,Pat,4
2,Pat,4
3,Pat,3
4,Pat,9
5,Pat,4
6,Jack,4
7,Jack,8
8,Jack,7
9,Jack,5
10,Jack,1
11,Jack,5
12,Alex,9
13,Alex,8
14,Alex,8
15,Alex,10
16,Alex,5
17,Alex,10


In [4]:
# Run a one-way ANOVA on the three archers' scores.
f, p = stats.f_oneway(df[df['Archer'] == 'Pat']['Score'],
                      df[df['Archer'] == 'Jack']['Score'],
                      df[df['Archer'] == 'Alex']['Score'])

print('One-way ANOVA')
print('=============')

print('F value:', f)
print('P value:', p, '\n')

One-way ANOVA
=============
F value: 5.0
P value: 0.0216837493201



As p ≈ 0.02 ≤ 0.05, we reject the null hypothesis that all archers have the same mean score and conclude that at least one population mean differs from at least one other (i.e. not all archers perform equally).
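The F value reported above can be reproduced by hand from the between-group and within-group sums of squares. The following is a minimal sketch using only the standard library; the `scores` dictionary and the variable names are introduced here for illustration and simply repeat the data from the DataFrame.

```python
from statistics import mean

# Scores copied from the DataFrame defined earlier in this notebook.
scores = {
    'Pat':  [5, 4, 4, 3, 9, 4],
    'Jack': [4, 8, 7, 5, 1, 5],
    'Alex': [9, 8, 8, 10, 5, 10],
}

all_scores = [s for group in scores.values() for s in group]
grand_mean = mean(all_scores)

k = len(scores)      # number of groups
n = len(all_scores)  # total number of observations

# Between-group sum of squares: distance of each group mean from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in scores.values())

# Within-group sum of squares: spread of the scores around their own group mean.
ss_within = sum((s - mean(g)) ** 2 for g in scores.values() for s in g)

ms_between = ss_between / (k - 1)  # mean square between, k - 1 degrees of freedom
ms_within = ss_within / (n - k)    # mean square within, n - k degrees of freedom
f_manual = ms_between / ms_within

print('F value (by hand):', round(f_manual, 4))
```

The result should agree with the F value from `stats.f_oneway`, which confirms that the statistic is the ratio of between-group to within-group variability.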

The limitation of one-way ANOVA is that, although we now know there is a difference in the performance of the archers, we do not know exactly who performs best or worst.

This is why the analysis of variance is often followed by a post hoc analysis.

# Tukey's range test
Tukey's range test, named after the American mathematician John Tukey, is a common post hoc analysis after one-way ANOVA. The test compares all possible pairs of group means and flags each pair whose difference is larger than would be expected from the standard error alone.

The statsmodels library provides an easy-to-use implementation of Tukey's range test. First, we import the required names:

In [5]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison

In [6]:
mc = MultiComparison(df['Score'], df['Archer'])
result = mc.tukeyhsd()
 
print(result)
print(mc.groupsunique)

Multiple Comparison of Means - Tukey HSD,FWER=0.05
group1 group2 meandiff  lower   upper  reject
---------------------------------------------
 Alex   Jack  -3.3333  -6.5755 -0.0911  True 
 Alex   Pat     -3.5   -6.7422 -0.2578  True 
 Jack   Pat   -0.1667  -3.4089  3.0755 False 
---------------------------------------------
['Alex' 'Jack' 'Pat']


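Note the last line in the snippet: printing `mc.groupsunique` shows the order in which the groups were arranged. We need this to see which group ID belongs to which archer, because that order does not necessarily match the order of the groups in the data (here they come out alphabetically, not as Pat, Jack, Alex).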

Also note that the tukeyhsd() method has a parameter named alpha, which we do not set explicitly because we are happy with its default value (α=0.05).
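If a different significance level is wanted, `alpha` can be passed explicitly. The sketch below assumes the same data as above, rebuilt as plain lists so that it runs on its own; the value 0.01 and the variable names are chosen only for illustration.

```python
from statsmodels.stats.multicomp import MultiComparison

# Same scores and archer labels as in the DataFrame defined earlier.
scores = [5, 4, 4, 3, 9, 4, 4, 8, 7, 5, 1, 5, 9, 8, 8, 10, 5, 10]
archers = ['Pat'] * 6 + ['Jack'] * 6 + ['Alex'] * 6

mc = MultiComparison(scores, archers)

default_result = mc.tukeyhsd()           # alpha defaults to 0.05
strict_result = mc.tukeyhsd(alpha=0.01)  # stricter family-wise error rate

print(default_result)
print(strict_result)

# Width of each confidence interval (upper minus lower). A smaller alpha
# gives wider intervals, so it becomes harder to declare a pair different.
default_width = default_result.confint[:, 1] - default_result.confint[:, 0]
strict_width = strict_result.confint[:, 1] - strict_result.confint[:, 0]
print(default_width)
print(strict_width)
```

Comparing the two tables shows how the choice of alpha affects the `lower`, `upper` and `reject` columns while leaving `meandiff` unchanged.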

The results above reveal that Alex (group 0) significantly differs from the other two archers. 

The reject column tells us that there is significant evidence to reject the null hypothesis for the pairs Alex-Jack (0-1) and Alex-Pat (0-2), but not for Jack-Pat (1-2).

The test also shows the difference between the group means (the meandiff column), computed as group2 minus group1:

μJack − μAlex = −3.3333  
μPat − μAlex = −3.5  

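Both differences are negative and significant, so Alex's mean score is higher than Jack's and Pat's. This leads to the conclusion that Alex is the best archer in the group.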

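Notation: a = Alex, j = Jack, p = Pat.

NOTE: meandiff is group2 − group1, so the first row reads μJack − μAlex = −3.33. Reading it as μAlex − μJack = −3.33 is WRONG.

If meandiff is negative, the first group has the higher mean:

a > j (significant)  
a > p (significant)  
j > p (by sample means only; the difference of 0.17 is not significant, reject = False)  

By sample means a > j > p, so a is the best. Only Alex differs significantly from the others; Jack and Pat cannot be told apart by this test.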
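The sign convention can be checked directly by computing group2 − group1 from the sample means. This is a small sketch using only the standard library; the `scores` dictionary and the variable names are introduced here for illustration and repeat the data from the DataFrame.

```python
from itertools import combinations
from statistics import mean

# Scores copied from the DataFrame defined earlier in this notebook.
scores = {
    'Pat':  [5, 4, 4, 3, 9, 4],
    'Jack': [4, 8, 7, 5, 1, 5],
    'Alex': [9, 8, 8, 10, 5, 10],
}

# Sample mean of each archer.
means = {name: mean(vals) for name, vals in scores.items()}

# Pairs in alphabetical order, matching the group order in the Tukey table.
diffs = {(g1, g2): means[g2] - means[g1]  # group2 minus group1
         for g1, g2 in combinations(sorted(means), 2)}

for (g1, g2), diff in diffs.items():
    higher = g1 if diff < 0 else g2
    print(f'{g1:>5} {g2:>5}  meandiff = {diff:8.4f}  higher sample mean: {higher}')
```

The differences should match the meandiff column, with a negative value meaning that the first group of the pair has the higher sample mean. Whether a difference is significant still has to be read from the reject column.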