# Post-hoc Tests 

After conducting an ANOVA and finding a significant difference, a post hoc test is needed to determine exactly which groups differ from each other.<br>
The examples below demonstrate post hoc tests for one-way, two-way, and N-way ANOVA.

1. One-way ANOVA

In [1]:
# post hoc test for multiple comparisons 
import seaborn as sb
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Load the dataset
data = sb.load_dataset('titanic')
# Perform one-way ANOVA 
model = ols('age ~ C(survived)', data=data).fit() 
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table) 
# Perform Tukey's HSD test for multiple comparisons
# (age is passed as-is here, including any missing values)
tukey = pairwise_tukeyhsd(endog=data['age'], groups=data['survived'], alpha=0.05) 
print(tukey)

                    sum_sq     df         F    PR(>F)
C(survived)     897.187582    1.0  4.271195  0.039125
Residual     149559.448362  712.0       NaN       NaN
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------
     0      1      nan   nan   nan   nan  False
-----------------------------------------------


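Interpretation:
The ANOVA table shows a significant effect of survival status on age at the 0.05 level (F = 4.27, p = 0.039). Tukey's HSD is a post hoc test used to compare all pairs of groups after a significant ANOVA result.

In the Tukey output, meandiff, p-adj, lower, and upper are all NaN, and reject is False.

The most likely cause is missing data in the age column:

i. ols drops rows with missing values when it fits the model (the residual df of 712 implies 714 usable rows), but pairwise_tukeyhsd receives the raw age column, so the NaN values carry through into the group means and every statistic derived from them.

ii. The grouping variable survived has two valid groups (0 and 1), so the number of groups is not the problem. With only two groups, Tukey's HSD produces a single pairwise comparison.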
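
One way to avoid the NaN result is to drop incomplete rows before calling pairwise_tukeyhsd. The sketch below assumes that complete-case analysis is acceptable for the question at hand. The helper name `tukey_complete_cases` and the small `toy` frame are made up for illustration and are not part of the original analysis.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def tukey_complete_cases(df, value_col, group_col, alpha=0.05):
    """Run Tukey's HSD using only rows where value and group are both present.

    Hypothetical helper: it exists only to illustrate the dropna step.
    """
    clean = df.dropna(subset=[value_col, group_col])
    return pairwise_tukeyhsd(
        endog=clean[value_col], groups=clean[group_col], alpha=alpha
    )


# Toy data (made up): two of the ten ages are missing.
toy = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0, 27.0, np.nan, 14.0, 58.0],
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 1],
})

result = tukey_complete_cases(toy, "age", "survived")
print(result)
```

With the Titanic frame loaded above, the same helper could be called as `tukey_complete_cases(data, 'age', 'survived')`.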

2. Two-way ANOVA

In [12]:
# post-hoc test for two-way ANOVA 
model2 = ols('age ~ C(survived) + C(pclass)', data=data).fit()
anova_table2 = sm.stats.anova_lm(model2, typ=2)
print(anova_table2)
# Perform Tukey's HSD on the combined survived x pclass cells
# (each row gets one label such as '0_1'; missing ages are not removed here)
tukey2 = pairwise_tukeyhsd(endog=data['age'], groups=data[['survived', 'pclass']].apply(lambda x: '_'.join(x.astype(str)), axis=1), alpha=0.05)
print(tukey2)

                    sum_sq     df          F        PR(>F)
C(survived)    7523.129884    1.0  43.780758  7.235629e-11
C(pclass)     27555.570057    2.0  80.179643  4.015342e-32
Residual     122003.878306  710.0        NaN           NaN
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------
   0_1    0_2      nan   nan   nan   nan  False
   0_1    0_3      nan   nan   nan   nan  False
   0_1    1_1      nan   nan   nan   nan  False
   0_1    1_2      nan   nan   nan   nan  False
   0_1    1_3      nan   nan   nan   nan  False
   0_2    0_3      nan   nan   nan   nan  False
   0_2    1_1      nan   nan   nan   nan  False
   0_2    1_2      nan   nan   nan   nan  False
   0_2    1_3      nan   nan   nan   nan  False
   0_3    1_1      nan   nan   nan   nan  False
   0_3    1_2      nan   nan   nan   nan  False
   0_3    1_3      nan   nan   nan   nan  False
   1_1    1_2      nan   nan   nan   nan
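
Every comparison above is NaN, which is consistent with missing ages being present in each survived x pclass cell. A minimal sketch of the same cell-wise comparison on complete cases follows. The helper `tukey_on_cells` and the `toy2` frame are hypothetical names and made-up data. Note that this approach treats each combination of factor levels as one group in a one-way layout, which is not the same as testing the main effects of the additive model fitted above.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def tukey_on_cells(df, value_col, factor_cols, alpha=0.05):
    """Tukey's HSD across all combinations of the given factors.

    Hypothetical helper: rows with a missing value or factor are dropped,
    then the factor levels are joined into one label per row, e.g. '0_1'.
    """
    clean = df.dropna(subset=[value_col, *factor_cols])
    labels = clean[factor_cols].astype(str).agg("_".join, axis=1)
    return pairwise_tukeyhsd(endog=clean[value_col], groups=labels, alpha=alpha)


# Toy data (made up): a 2 x 2 layout with three complete rows per cell
# plus two rows whose age is missing.
toy2 = pd.DataFrame({
    "age": [40, 45, 50, 20, 25, 30, 35, 38, 41, 18, 22, 26, np.nan, np.nan],
    "survived": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1],
    "pclass": [1, 1, 1, 3, 3, 3, 1, 1, 1, 3, 3, 3, 1, 3],
})

result2 = tukey_on_cells(toy2, "age", ["survived", "pclass"])
print(result2)
```

The same helper would accept three factors, for example `['survived', 'pclass', 'sex']`, without changes.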

3. N-way (3-way) ANOVA

In [4]:
#post hoc test for 3-way ANOVA 
model3 = ols('age ~ C(survived) + C(pclass) + C(sex)',data=data).fit()
anova_table3 = sm.stats.anova_lm(model3, typ=2)
print(anova_table3)
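# Perform Tukey's HSD on the combined survived x pclass x sex cells
# (each row gets one label such as '0_1_female'; missing ages are not removed here)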
tukey3 = pairwise_tukeyhsd(
    endog=data['age'],
    groups=data[['survived', 'pclass', 'sex']].apply(lambda x: '_'.join(x.astype(str)), axis=1),
    alpha=0.05)
print(tukey3)

                    sum_sq     df          F        PR(>F)
C(survived)    4360.534586    1.0  25.387474  5.957424e-07
C(pclass)     27216.635455    2.0  79.229003  8.810723e-32
C(sex)          226.537529    1.0   1.318924  2.511725e-01
Residual     121777.340777  709.0        NaN           NaN
  Multiple Comparison of Means - Tukey HSD, FWER=0.05  
  group1     group2   meandiff p-adj lower upper reject
-------------------------------------------------------
0_1_female   0_1_male      nan   nan   nan   nan  False
0_1_female 0_2_female  10.3333   nan   nan   nan  False
0_1_female   0_2_male      nan   nan   nan   nan  False
0_1_female 0_3_female      nan   nan   nan   nan  False
0_1_female   0_3_male      nan   nan   nan   nan  False
0_1_female 1_1_female      nan   nan   nan   nan  False
0_1_female   1_1_male      nan   nan   nan   nan  False
0_1_female 1_2_female      nan   nan   nan   nan  False
0_1_female   1_2_male      nan   nan   nan   nan  False
0_1_female 1_3_female      nan   
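
In the output above, one comparison (0_1_female vs 0_2_female) has a numeric meandiff while the others are NaN. A mean difference can only be computed when neither group mean is NaN, so those two cells presumably contain no missing ages. The sketch below counts missing values per cell to check this kind of pattern. The function name `missing_by_cell` and the `toy3` frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_by_cell(df, value_col, factor_cols):
    """Count rows and missing values of value_col in each factor combination.

    Hypothetical diagnostic helper, not part of statsmodels or pandas.
    """
    is_missing = df[value_col].isna()
    grouped = is_missing.groupby([df[c] for c in factor_cols])
    return pd.DataFrame({
        "n": grouped.size(),                    # rows in the cell
        "n_missing": grouped.sum().astype(int)  # rows with a missing value
    })


# Toy data (made up): only the (0, 'male') cell has no missing age.
toy3 = pd.DataFrame({
    "age": [30.0, np.nan, 25.0, 40.0, np.nan, np.nan, 33.0],
    "survived": [0, 0, 0, 1, 1, 1, 1],
    "sex": ["female", "female", "male", "male", "male", "female", "female"],
})

report = missing_by_cell(toy3, "age", ["survived", "sex"])
print(report)
```

Cells with `n_missing` equal to zero are the ones whose means, and therefore mean differences, can be computed without dropping rows.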

4. Pairwise Comparison

In [3]:
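# Tukey's HSD pairwise comparisons
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Note: pclass (int) * who (str) repeats the string pclass times, so the groups
# become 'child', 'childchild', 'childchildchild', ... instead of readable labels.
# Missing ages are also passed through unchanged.
tukey = pairwise_tukeyhsd(data['age'], data['pclass'] * data['who'], alpha=0.05)
tukey.summary()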

group1,group2,meandiff,p-adj,lower,upper,reject
child,childchild,-3.2763,,,,False
child,childchildchild,-1.0024,,,,False
child,man,,,,,False
child,manman,,,,,False
child,manmanman,,,,,False
child,woman,,,,,False
child,womanwoman,,,,,False
child,womanwomanwoman,,,,,False
childchild,childchildchild,2.2739,,,,False
childchild,man,,,,,False
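
The group names in this table (child, childchild, childchildchild) come from multiplying an integer by a string, which repeats the string. A short sketch of the behaviour, and of one way to build readable labels instead, is shown below. The `toy4` frame is made up, and the `'_'` separator is an arbitrary choice that matches the labels used in the earlier sections.

```python
import pandas as pd

# In plain Python, an integer times a string repeats the string.
repeated = 2 * "child"
print(repeated)

# Toy data (made up) with the same column names as the Titanic frame.
toy4 = pd.DataFrame({
    "pclass": [1, 2, 3],
    "who": ["child", "man", "woman"],
})

# Convert the class to text first, then concatenate with a separator.
labels = toy4["pclass"].astype(str) + "_" + toy4["who"]
print(labels.tolist())
```

Labels built this way can be passed as the `groups` argument of pairwise_tukeyhsd, ideally after rows with a missing age have been dropped.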
