# Robust Independent Groups ANOVA 

In this notebook I will demonstrate how to conduct a robust independent groups ANOVA. This will include testing for homogeneity of variance using Levene's test, conducting Welch's ANOVA when Levene's test is significant and equal variances cannot be assumed, and following this up with robust post-hoc tests using Games-Howell pairwise comparisons.

The dataset that I will be using contains the responses of 152 police officers to a large questionnaire study investigating stress in the police force. I will conduct a one-way robust ANOVA with length of service as the independent variable (IV). This variable records an officer's length of service and has been transformed into a categorical variable by binning it into three groups: Group 1: up to 10 years of service; Group 2: 10-20 years of service; Group 3: over 20 years of service. The dependent variable (DV) is each participant's mean score on a 7-item Likert scale measuring operational uplifts, with each item scored on a 5-point (0-4) scale. The scale measured a series of operational factors and whether they were rated positively or negatively over the previous month of service.
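The dataset is supplied with `servegrp` already coded as 1, 2 and 3, so the exact binning rule used to create it is not shown. The sketch below shows how `pd.cut` could produce the same three groups. It makes two assumptions. The raw years-of-service values are made up for illustration. Exactly 10 and exactly 20 years are placed in the lower group.

```python
import numpy as np
import pandas as pd

# Hypothetical raw years of service; the real dataset only contains the binned servegrp column.
years = pd.Series([3, 10, 15, 20, 25, 0])

# Right-closed bins [0, 10], (10, 20], (20, inf). Placing exactly 10 and 20 years
# in the lower group is an assumption, not something stated by the study.
servegrp = pd.cut(years, bins=[0, 10, 20, np.inf], labels=[1, 2, 3], include_lowest=True)
print(servegrp.tolist())
```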


In [1]:
# Importing key software libraries. 

import pandas as pd
from scipy.stats import levene
import numpy as np
import pingouin as pg

In [2]:
# Importing the dataset.

uplifts = pd.read_csv('uplifts.csv')

uplifts.head()

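|   | children | sex | servegrp | age | operupm | orgupm |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 2.0 | 38.0 | 3.0 | 3.5 |
| 1 | 2.0 | 2.0 | 3.0 | 42.0 | 4.0 | 4.0 |
| 2 | 2.0 | 2.0 | 1.0 | 34.0 | 2.43 | 3.5 |
| 3 | 2.0 | 1.0 | 1.0 | 27.0 | 2.86 | 2.0 |
| 4 | 1.0 | 1.0 | 2.0 | 32.0 | 3.0 | 3.0 |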


### Levene's Test

Conducting Levene's test to check the assumption of equal variances for the three length of service categories. 

Firstly, I need to create three new objects, one for each length of service category. These are then passed to SciPy's `levene` function to check for equality of variances.

In [3]:
# Using the Series.map method to add labels to the servegrp categorical variable.

a = [1, 2, 3]
b = ["upto10", "10-20", "20plus"]

uplifts['service_cat'] = uplifts['servegrp'].map(dict(zip(a, b)))

uplifts.head()

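|   | children | sex | servegrp | age | operupm | orgupm | service_cat |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 2.0 | 38.0 | 3.0 | 3.5 | 10-20 |
| 1 | 2.0 | 2.0 | 3.0 | 42.0 | 4.0 | 4.0 | 20plus |
| 2 | 2.0 | 2.0 | 1.0 | 34.0 | 2.43 | 3.5 | upto10 |
| 3 | 2.0 | 1.0 | 1.0 | 27.0 | 2.86 | 2.0 | upto10 |
| 4 | 1.0 | 1.0 | 2.0 | 32.0 | 3.0 | 3.0 | 10-20 |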
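Because `service_cat` is a plain string column, pandas orders its levels alphabetically (`10-20`, `20plus`, `upto10`) rather than by length of service. The sketch below stores the labels as an ordered categorical, so that pandas operations such as `groupby` follow the natural service order. It uses a small toy frame in place of `uplifts`. Whether pingouin's output tables respect this order is not checked here.

```python
import pandas as pd

# Toy stand-in for the uplifts DataFrame (values are illustrative only).
df = pd.DataFrame({"servegrp": [2.0, 3.0, 1.0, 1.0, 2.0],
                   "operupm": [3.0, 4.0, 2.43, 2.86, 3.0]})

labels = {1: "upto10", 2: "10-20", 3: "20plus"}
order = ["upto10", "10-20", "20plus"]

# Map the numeric codes to labels, then fix the category order explicitly.
df["service_cat"] = pd.Categorical(df["servegrp"].map(labels),
                                   categories=order, ordered=True)

# groupby on an ordered categorical returns groups in category order, not alphabetically.
means = df.groupby("service_cat", observed=True)["operupm"].mean()
print(means)
```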


In [4]:
# I want to compare the variance of operupm scores around each group's mean. As a first step I extract
# the grouping variable and the DV as two Series; these are split into three per-group objects below.

cat = uplifts['service_cat']
scale = uplifts['operupm']

In [5]:
# Next, creating three boolean masks that split the group variable (cat) up by level/condition.

cat_1 = cat == 'upto10'
cat_2 = cat == '10-20'
cat_3 = cat == '20plus'

In [6]:
# Finally, select the operupm scores for each category using the boolean masks above (each result is a pandas Series).

cat_oper_upto10 = scale[cat_1]
cat_oper_10_20 = scale[cat_2]
cat_oper_20plus = scale[cat_3]
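The three-step split above (extract, build masks, index) can also be written more compactly with `groupby`. The sketch below assumes the same column names. It uses a small synthetic frame instead of `uplifts.csv`.

```python
import pandas as pd
from scipy.stats import levene

# Synthetic stand-in for uplifts; the real scores come from uplifts.csv.
df = pd.DataFrame({
    "service_cat": ["upto10"] * 4 + ["10-20"] * 4 + ["20plus"] * 4,
    "operupm": [3.0, 3.2, 2.9, 3.1, 2.5, 3.4, 2.8, 3.0, 1.8, 3.6, 2.2, 3.0],
})

# One array of DV scores per group, equivalent to cat_oper_upto10 and friends.
groups = [g.to_numpy() for _, g in df.groupby("service_cat")["operupm"]]

# Unpack the list so each group is passed to levene as a separate sample.
result = levene(*groups, center="mean")
print(result.statistic, result.pvalue)
```

Group order does not affect the Levene statistic, so the alphabetical order produced by `groupby` is harmless here.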

In [7]:
# Now running Levene's test by passing the three per-group operupm Series above.
# center='mean' gives the original Levene's test; scipy's default, center='median', is the Brown-Forsythe variant.

levene(cat_oper_upto10, cat_oper_10_20, cat_oper_20plus, center = 'mean')

LeveneResult(statistic=4.680483599568664, pvalue=0.010680464772079585)

We can see from the above output that Levene's test was significant (p = 0.011 < 0.05), so homogeneity of variance cannot be assumed for this dataset. As a consequence, we need a robust ANOVA test that does not rely on the assumption of equal variances between groups and instead adjusts the test to allow for violation of this assumption.
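It can help to see what Levene's test actually computes. With `center='mean'`, it is a one-way ANOVA on each score's absolute deviation from its own group mean. The sketch below uses made-up group data to show that `scipy.stats.f_oneway` on those deviations reproduces `scipy.stats.levene`.

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Illustrative groups with visibly different spreads (not the police data).
g1 = np.array([3.0, 3.2, 2.9, 3.1, 3.0])
g2 = np.array([2.5, 3.4, 2.8, 3.0, 2.6])
g3 = np.array([1.8, 3.6, 2.2, 3.0, 3.9])

# Absolute deviation of every score from its own group mean.
devs = [np.abs(g - g.mean()) for g in (g1, g2, g3)]

anova_on_devs = f_oneway(*devs)
lev = levene(g1, g2, g3, center="mean")
print(anova_on_devs.statistic, lev.statistic)  # the two statistics agree
```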

To report the above Levene's test we also need the degrees of freedom. Because the Levene statistic is compared against an F-distribution, we report both the between-groups and within-groups degrees of freedom. These can be calculated as follows:

- df(between) = k - 1 
- df(within) = n - k 
- df(total) = n - 1

Where k is the number of categories/ groups and n is the total number of scores/ participants.

For this dataset, with 152 participants and 3 groups for the independent variable, df(between) = 3 - 1 = 2 and df(within) = 152 - 3 = 149.

We would then report the result of the Levene's test as follows:

F(2, 149) = 4.68, p = 0.01
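The sketch below works through the df arithmetic and checks that the reported p-value is consistent with F(2, 149). It uses `scipy.stats.f` and assumes, as the text states, that all 152 officers contributed a score.

```python
from scipy.stats import f

n = 152  # total participants (from the text)
k = 3    # number of length-of-service groups

df_between = k - 1  # 2
df_within = n - k   # 149
df_total = n - 1    # 151

# Upper-tail probability of the Levene statistic under F(df_between, df_within).
levene_stat = 4.680483599568664
p = f.sf(levene_stat, df_between, df_within)
print(df_between, df_within, df_total, round(p, 4))
```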

### Welch's ANOVA

As we cannot assume homogeneity of variance between the three groups in the data, we need to conduct a robust ANOVA test. Here I am going to use the `welch_anova` function from the pingouin library.

In [8]:
# Running the Welch robust ANOVA test. This robust ANOVA is suitable for independent groups designs and the pingouin method 
# takes parameters for the data, the DV, and the between subjects factor. 
pg.welch_anova(data = uplifts, dv = 'operupm', between = 'service_cat')

|   | Source | ddof1 | ddof2 | F | p-unc | np2 |
|---|---|---|---|---|---|---|
| 0 | service_cat | 2 | 83.207651 | 4.630646 | 0.012393 | 0.068745 |


The above analysis showed a significant result, indicating that mean operational uplift scores differ between at least two of the three length of service groups. The omnibus test does not tell us which groups differ; that is the job of the post-hoc tests.

To report this we would state that: Welch's F(2, 83) = 4.63, p = 0.01

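Note: The df(within) have been reduced to 83.21 for this test, compared with 149 for a standard ANOVA on the same data. This is part of how Welch's test copes with unequal variances. Each group is weighted by the inverse of its variance (n/s²), so groups with noisier scores carry less weight in the F statistic. The denominator degrees of freedom are reduced according to how unequal the group variances and sizes are. With fewer degrees of freedom, the reference F-distribution has a heavier tail, so a larger F is needed to reach significance. This keeps the Type I error rate (the chance of a significant result when no true difference exists) close to its nominal level when variances are unequal.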
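To make the adjustment concrete, here is a from-scratch sketch of Welch's F. It uses the standard textbook formulas (inverse-variance weights and an adjusted denominator df) and is not pingouin's source code. It needs only NumPy and SciPy. It checks itself against a known identity: with two groups, Welch's F equals the square of Welch's t. Applied to the three service groups, its `df2` should correspond to the `ddof2` column shown above.

```python
import numpy as np
from scipy.stats import f, ttest_ind


def welch_anova(*groups):
    """Welch's one-way ANOVA. Returns (F, df1, df2, p)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    # Each group is weighted by n_i / s_i^2, so noisier groups count for less.
    w = n / variances
    w_sum = w.sum()
    grand_mean = (w * means).sum() / w_sum

    # Weighted between-groups mean square.
    a = (w * (means - grand_mean) ** 2).sum() / (k - 1)
    # Correction term that drives both the denominator of F and the reduced df2.
    lam = ((1 - w / w_sum) ** 2 / (n - 1)).sum()
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    F = a / b
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return F, df1, df2, f.sf(F, df1, df2)


# Two illustrative groups with unequal variances.
x = [3.0, 3.2, 2.9, 3.1, 3.0, 3.3]
y = [1.8, 3.6, 2.2, 3.0, 3.9]

F, df1, df2, p = welch_anova(x, y)
t = ttest_ind(x, y, equal_var=False)
print(F, t.statistic ** 2)  # equal: Welch's F is Welch's t squared when k = 2
print(p, t.pvalue)          # equal p-values
```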

We can now follow up this significant result with post-hoc tests. Because the significant Levene's test means we cannot assume equal variances, Games-Howell is an appropriate post-hoc test.

### Games-Howell post-hoc tests

Before conducting the Games-Howell post-hoc comparisons, it is useful to print the mean scores for the three groups, as having these available helps with interpretation of the results.

In [9]:
# Printing the mean scores on the operational uplifts DV for each of the three length of service groups. 

print(f"Upto 10 mean oper: {cat_oper_upto10.mean():.2f}")
print(f"10 - 20 mean oper: {cat_oper_10_20.mean():.2f}")
print(f"20 Plus mean oper: {cat_oper_20plus.mean():.2f}")

Upto 10 mean oper: 3.05
10 - 20 mean oper: 2.93
20 Plus mean oper: 2.64
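The three print statements can be replaced by a single `groupby` summary. This also gives each group's size and standard deviation, which are worth inspecting when variances are unequal. The sketch below uses a synthetic frame; on the real data the equivalent call would be `uplifts.groupby('service_cat')['operupm'].agg(['count', 'mean', 'std'])`.

```python
import pandas as pd

# Synthetic stand-in for uplifts (values are illustrative only).
df = pd.DataFrame({
    "service_cat": ["upto10", "upto10", "10-20", "10-20", "20plus", "20plus"],
    "operupm": [3.0, 3.2, 2.8, 3.0, 2.4, 2.8],
})

# Size, mean and standard deviation of the DV for each service group.
summary = df.groupby("service_cat")["operupm"].agg(["count", "mean", "std"]).round(2)
print(summary)
```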


In [10]:
# Next conducting Games-Howell pairwise comparisons. 

pg.pairwise_gameshowell(data = uplifts, dv = 'operupm', between = 'service_cat', effsize = 'hedges')

|   | A | B | mean(A) | mean(B) | diff | se | T | df | pval | hedges |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10-20 | 20plus | 2.934386 | 2.64 | 0.294386 | 0.142973 | 2.059029 | 66.900971 | 0.106298 | 0.448419 |
| 1 | 10-20 | upto10 | 2.934386 | 3.051724 | -0.117338 | 0.102443 | -1.145396 | 107.981907 | 0.488436 | -0.212574 |
| 2 | 20plus | upto10 | 2.64 | 3.051724 | -0.411724 | 0.135174 | -3.045889 | 57.22931 | 0.009667 | -0.690258 |


The above pairwise comparisons table indicates a significant difference between the 20plus service group and the up to 10 years service group; all other comparisons are non-significant. We could report these results as below:

- Up to 10 v 10-20 years: t(108) = 1.15, p = 0.49. No significant difference between these groups in terms of mean uplift scores (Up to 10 years: Mean = 3.05; 10-20 years: Mean = 2.93).
- 10-20 years v 20plus years: t(67) = 2.06, p = 0.11. No significant difference between these two service groups in terms of mean uplift scores (10-20 years: Mean = 2.93; 20plus years: Mean = 2.64).
- Up to 10 v 20plus years: t(57) = 3.05, p = 0.010. A statistically significant difference was found between these two length of service groups, with the up to 10 years service group having significantly higher operational uplift scores (Up to 10 years: Mean = 3.05) than the 20plus years service group (20plus years: Mean = 2.64).
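Games-Howell tests each pair with a Welch t statistic and Welch-Satterthwaite df. The p-value, however, comes from the studentized range distribution for k groups rather than from the t distribution, which is how the procedure accounts for making several comparisons. The sketch below applies that calculation to one pair. It uses `scipy.stats.studentized_range` (available in SciPy 1.7 and later) and illustrative data rather than the police scores. `games_howell_pair` is a hypothetical helper written for this example.

```python
import numpy as np
from scipy.stats import studentized_range, ttest_ind


def games_howell_pair(a, b, k):
    """Games-Howell test for one pair out of k groups. Returns (t, df, p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Convert t to the studentized range scale (q = |t| * sqrt(2)).
    p = studentized_range.sf(np.abs(t) * np.sqrt(2), k, df)
    return t, df, p


upto10 = [3.0, 3.2, 2.9, 3.1, 3.4, 2.8]
plus20 = [1.8, 3.6, 2.2, 3.0, 2.1, 2.5]

t, df, p_gh = games_howell_pair(upto10, plus20, k=3)
p_welch = ttest_ind(upto10, plus20, equal_var=False).pvalue
print(round(t, 3), round(df, 2), round(p_gh, 4), round(p_welch, 4))
```

With k = 2 the studentized range p-value reduces to the ordinary two-sided Welch p-value. With k = 3 it is larger, reflecting the extra comparisons.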
