# Tests of Assumptions for ANOVA

Before running an Analysis of Variance (ANOVA), several assumptions about the data should be checked. Standard ANOVA is a parametric test, so it assumes that the data being compared are suitable for parametric analysis. It is worth checking these assumptions first, because unsuitable data can lead to erroneous results and conclusions.

### Common tests of assumptions for ANOVA are:

- Tests of homogeneity of variance (equal variances) for independent groups (between-subjects) designs. This is to check the assumption that the different groups that are being compared have the same (roughly equal) variances around the group means. A frequently used test for this is Levene's test for homogeneity of variance. 

- Tests of sphericity for repeated measures (within-subjects) designs. This is the repeated measures equivalent of the independent groups test for homogeneity of variance. Sphericity is the assumption that the variances of the difference scores between every pair of conditions, calculated within each participant, are equal. A commonly used test of sphericity in repeated measures designs is Mauchly's test of sphericity.

- Tests of normality. These tests assess the extent to which the distribution of scores for a variable differs from a normal distribution. Normally distributed data is an assumption of all parametric statistical tests, so it applies to both independent groups and repeated measures ANOVA designs. The Shapiro-Wilk test is a test of normality, where a significant value indicates deviation of the data from a normal distribution.
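The definition of sphericity above can be made concrete with a small sketch. The scores and the condition names "A", "B", and "C" below are made up for illustration and are not taken from the datasets used later. The code computes the variance of the difference scores for each pair of conditions; sphericity is the assumption that these variances are roughly equal in the population.

```python
from itertools import combinations
from statistics import variance

# Hypothetical repeated measures scores: position i in every list
# belongs to the same participant.
scores = {
    "A": [8, 7, 6, 9, 5],
    "B": [6, 6, 4, 7, 4],
    "C": [5, 3, 4, 4, 2],
}


def difference_variances(conditions):
    """Sample variance of the paired difference scores for each pair of conditions."""
    result = {}
    for (name_1, x), (name_2, y) in combinations(conditions.items(), 2):
        diffs = [a - b for a, b in zip(x, y)]
        result[(name_1, name_2)] = variance(diffs)
    return result


diff_vars = difference_variances(scores)
for pair, var in diff_vars.items():
    print(pair, round(var, 2))
```

Markedly different values across the pairs would suggest that sphericity may not hold, which is what Mauchly's test assesses formally.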

All of the above tests have limitations and there are many debates about their value. For example, statistical tests assessing both normality and the equality of variances can be sensitive to differences in sample size, with larger sample sizes making it possible to detect as statistically significant quite small differences (in terms of variance or deviation from normality) that will not actually be problematic or invalidate the results of the ANOVA model. 

In this notebook I will demonstrate how to conduct Levene's test (homogeneity of variance), Mauchly's test (sphericity), and the Shapiro-Wilk test (normality) using the scipy and pingouin statistical packages in Python.

## Levene's Test for Homogeneity of Variance

I will run Levene's test using the levene function from the scipy.stats library, which lets me specify the mean as the measure of centre (scipy's levene also defaults to the median, so center = 'mean' has to be passed explicitly). The pingouin package can also run Levene's test, but by default it uses the median, which makes it a Brown-Forsythe test. Here we want a standard Levene's test based on the mean, so we will use scipy.stats.
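As a rough illustration of what the choice of centre means, the sketch below runs scipy's levene with both settings on made-up scores (the values are hypothetical, not from the notebook's data). It also shows that the mean-centred Levene statistic is the same as a one-way ANOVA F computed on the absolute deviations of each score from its group mean.

```python
import numpy as np
from scipy.stats import f_oneway, levene

# Hypothetical scores for three independent groups (illustrative values only).
groups = [
    np.array([4.0, 6.0, 5.0, 9.0, 3.0]),
    np.array([10.0, 7.0, 8.0, 12.0, 6.0]),
    np.array([2.0, 3.0, 9.0, 5.0, 11.0]),
]

# Standard Levene's test (mean) and the Brown-Forsythe variant (median).
levene_mean = levene(*groups, center="mean")
levene_median = levene(*groups, center="median")

# Levene's test with the mean is a one-way ANOVA on the absolute
# deviations of each score from its own group mean.
abs_dev = [np.abs(g - g.mean()) for g in groups]
anova_on_dev = f_oneway(*abs_dev)

print("mean-centred:  ", levene_mean.statistic, levene_mean.pvalue)
print("median-centred:", levene_median.statistic, levene_median.pvalue)
print("ANOVA on |dev|:", anova_on_dev.statistic, anova_on_dev.pvalue)
```

The first and third lines of output should agree, while the median-centred result will generally differ slightly.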

In [1]:
# Importing the key software libraries.

import pandas as pd
from scipy.stats import levene

In [2]:
# Importing a dataset to work with. 

df_ig = pd.read_csv('Puppies.csv')

df_ig

Unnamed: 0,Person,Dose,Happiness
0,1,1,3
1,2,1,2
2,3,1,1
3,4,1,1
4,5,1,4
5,6,2,5
6,7,2,2
7,8,2,4
8,9,2,2
9,10,2,3


As we can see this is only a small dataset with 15 participants. It contains a categorical variable 'Dose' which is currently shown as numerical values (1, 2, and 3). These three values correspond to categories of Low, Medium, and High. The first thing I will do is create a further variable where these numbers are actually labelled with the appropriate category name. 

In [3]:
# Using the series.map function to add labels to the Dose variable.

a = [1, 2, 3]
b = ["Low", "Medium", "High"]

df_ig['Dose_cat'] = df_ig['Dose'].map(dict(zip(a, b)))

df_ig

Unnamed: 0,Person,Dose,Happiness,Dose_cat
0,1,1,3,Low
1,2,1,2,Low
2,3,1,1,Low
3,4,1,1,Low
4,5,1,4,Low
5,6,2,5,Medium
6,7,2,2,Medium
7,8,2,4,Medium
8,9,2,2,Medium
9,10,2,3,Medium


In [4]:
# I want to compare variance around the mean happiness score for each of the groups, so to do this I will
# create three new variable objects that can then be passed to the scipy levene method for comparison.

cat = df_ig['Dose_cat']
scale = df_ig['Happiness']

In [5]:
# Next, creating three Boolean masks that split cat up by level/condition.

cat_1 = cat == 'Low'
cat_2 = cat == 'Medium'
cat_3 = cat == 'High'

In [6]:
# Finally, use the Boolean masks above to create a Series of happiness scores for each category.

cat_happ_low = scale[cat_1]
cat_happ_med = scale[cat_2]
cat_happ_high = scale[cat_3]

In [7]:
# Now running the Levene's test by passing the above three happiness scores by group objects.
# Specifying the mean as measure of centre.

levene(cat_happ_low, cat_happ_med, cat_happ_high, center = 'mean')

LeveneResult(statistic=0.09169054441260736, pvalue=0.9130204455480186)
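The manual splitting in the cells above can be condensed with a pandas groupby. The following is a minimal sketch under the assumption that the data are in long format with one grouping column and one score column; `df_demo` and its values are hypothetical stand-ins that borrow the column names used above.

```python
import pandas as pd
from scipy.stats import levene

# Hypothetical long-format data reusing the notebook's column names.
df_demo = pd.DataFrame({
    "Dose_cat": ["Low"] * 4 + ["Medium"] * 4 + ["High"] * 4,
    "Happiness": [3, 2, 1, 4, 5, 2, 4, 3, 7, 4, 5, 6],
})

# One array of scores per group, in the order groupby yields them.
samples = [g.to_numpy() for _, g in df_demo.groupby("Dose_cat")["Happiness"]]

# Unpack the list so each group is passed as a separate sample.
result = levene(*samples, center="mean")
print(result.statistic, result.pvalue)
```

This scales to any number of groups without creating one mask and one variable per level.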

### Degrees of freedom when using scipy levene

We can see above that we have a Levene's statistic of F = 0.09 and an associated p-value of p = 0.91, so this was a non-significant Levene's test. The null hypothesis for Levene's test is that the variances of the groups are equal, which is the assumption we need to meet to be able to conduct an ANOVA on the data. As the result was non-significant, we can assume homogeneity of variance for our data and are probably safe to run and interpret a standard ANOVA test of whatever our research hypothesis was.

Note that the output for Levene's test using scipy does not give us the degrees of freedom. To formally report this result we would also want to report the degrees of freedom, as the test statistic for Levene's test follows an F-distribution. The degrees of freedom can be calculated as follows:

- df(between) = k - 1 
- df(within) = n - k 
- df(total) = n - 1

Where k is the number of categories/ groups and n is the total number of scores/ participants.

The degrees of freedom can easily be calculated on a small dataset like that above with only 15 participants and 3 groups.

- df(between) = 3 - 1 = 2 
- df(within) = 15 - 3 = 12 
- df(total) = 15 - 1 = 14

To formally report the above result of the Levene's test we only need to include the df(between) and df(within) and so would format the result of Levene's test as: F(2, 12) = 0.09, p = 0.91.

We could write some code to work out the degrees of freedom for us. This may be helpful with a large dataset where a large number of groups are being compared. The code below illustrates how to do this using the existing dataset.

In [8]:
# Firstly, we find k (number of groups) and n (number of scores or participants).

k = cat.nunique()        # number of distinct groups
n = int(scale.count())   # number of non-missing scores

k, n

(3, 15)

In [9]:
# With these values we can find the degrees of freedom by adding the objects k and n into the degrees of freedom formula above.

df_between = k - 1
df_within = n - k
df_total = n - 1

df_between, df_within, df_total

(2, 12, 14)
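Putting the pieces together, a small helper can build the formal report string from the test statistic, the p-value, k, and n. This is only a sketch: the function name `format_levene_report` is hypothetical, and the numbers passed in are the ones reported earlier in this section.

```python
def format_levene_report(statistic, pvalue, k, n):
    """Format a Levene's test result as 'F(df_between, df_within) = ..., p = ...'."""
    df_between = k - 1
    df_within = n - k
    return f"F({df_between}, {df_within}) = {statistic:.2f}, p = {pvalue:.2f}"


# Statistic and p-value from the scipy output above, with k = 3 groups and n = 15 scores.
report = format_levene_report(0.09169054441260736, 0.9130204455480186, k=3, n=15)
print(report)  # F(2, 12) = 0.09, p = 0.91
```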

## Mauchly's Test of Sphericity

As stated in the opening section, Mauchly's test of sphericity is the repeated measures equivalent of Levene's test. It checks whether the variances of the difference scores, taken from the same participants, are equal across all pairs of levels of the categorical variable.

Below I demonstrate Mauchly's test using the sphericity method from the pingouin package. 

In [10]:
# Importing pingouin.

import pingouin as pg

In [11]:
# Importing a dataset with a repeated measures categorical variable to analyse. 

df_eda = pd.read_csv('EDA Data Long Thin.csv')

df_eda.head()

Unnamed: 0,PersonID,Framing,Questions_LTCol,Arousal
0,1,1,1,1.277222
1,2,1,1,6.444483
2,3,1,1,1.082127
3,4,1,1,2.704305
4,5,1,1,2.852132


In [12]:
# Running Mauchly's test using the pg.sphericity method.

pg.sphericity(data = df_eda, dv = 'Arousal', subject = 'PersonID', within = 'Questions_LTCol')

SpherResults(spher=False, W=0.9177060268963375, chi2=6.698497399305895, dof=2, pval=0.03511072289245792)

The result shows Mauchly's test to be statistically significant (p = 0.04), indicating that the assumption of sphericity has been violated. As with Levene's test, the null hypothesis is the assumption itself: that the variances of the difference scores between all pairs of conditions are equal. A significant Mauchly's test therefore means we cannot assume sphericity, and we probably need to apply a correction or conduct some type of robust statistical analysis to test our research hypothesis, rather than run a standard ANOVA.

You can see from the above output that Mauchly's test returns a Boolean value (spher) indicating whether the assumption of sphericity is met (here False), a test statistic (W), a chi2 value, the degrees of freedom for the chi2 test (dof), and a p-value (pval). This could be formally reported as follows:

Mauchly's W = 0.92, χ²(2) = 6.70, p = 0.04.
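For intuition about where W and the chi2 value come from, here is a rough sketch of the calculation in numpy. It is not pingouin's implementation: the function name `mauchly_w` is hypothetical, the data are simulated, the input is assumed to be in wide format (one row per participant, one column per condition), and only the usual first-order chi-square approximation is used.

```python
import numpy as np
from scipy.stats import chi2


def mauchly_w(wide):
    """Mauchly's W with its chi-square approximation for an (n x k) array."""
    wide = np.asarray(wide, dtype=float)
    n, k = wide.shape
    d = k - 1

    # Covariance between conditions, double-centred so that only the
    # variation between conditions (the contrasts) remains.
    cov = np.cov(wide, rowvar=False)
    centred = (cov
               - cov.mean(axis=0, keepdims=True)
               - cov.mean(axis=1, keepdims=True)
               + cov.mean())

    # The centred matrix has one (near-)zero eigenvalue; keep the other k - 1.
    eig = np.linalg.eigvalsh(centred)[1:]

    # W compares the product of the eigenvalues with their mean raised to d.
    w = np.prod(eig) / (eig.sum() / d) ** d
    dof = d * (d + 1) // 2 - 1
    correction = 1 - (2 * d**2 + d + 2) / (6 * d * (n - 1))
    chi_sq = -(n - 1) * correction * np.log(w)
    return w, chi_sq, dof, chi2.sf(chi_sq, dof)


# Simulated scores: 30 hypothetical participants, 3 conditions.
rng = np.random.default_rng(0)
demo = rng.normal(size=(30, 3)) * np.array([1.0, 1.5, 2.5])

w, chi_sq, dof, pval = mauchly_w(demo)
print(w, chi_sq, dof, pval)
```

W is 1 when the eigenvalues are all equal (perfect sphericity in the sample) and moves towards 0 as they become more unequal, which is why smaller values of W go with larger chi2 values and smaller p-values.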

## Shapiro-Wilk Test of Normality

The Shapiro-Wilk test can be used to test the normality of a variable's distribution. Below I run the test on a variable from an independent groups design, and then check for normality at each level of a repeated measures factor.

In [13]:
# Test of normality on the happiness variable from the puppies dataset. 

pg.normality(df_ig['Happiness'])

Unnamed: 0,W,pval,normal
Happiness,0.953505,0.581245,True


This returns a pandas DataFrame showing the Shapiro-Wilk test statistic (W), the p-value, and a Boolean value stating whether the variable's distribution can be classed as normal (True/False).

As a test of assumption, the null hypothesis for the test is that the distribution is normal. A significant result would indicate deviation from normality. In this case the result is non-significant, indicating that it is probably safe to assume the variable does not deviate excessively from a normal distribution. This could be formally reported as:

W = 0.95, p = 0.58. 

In [14]:
# Test of normality for the repeated measures design. Note here that we need to include the data, the DV and the 
# categorical/ grouping variable as parameters. 

pg.normality(data = df_eda, dv = 'Arousal', group = 'Questions_LTCol')

Questions_LTCol,W,pval,normal
1,0.988241,0.680608,True
2,0.984343,0.437271,True
3,0.983814,0.408795,True


The output we get is similar to when we just used a single scale variable. This time, however, each group of the categorical variable has been assessed for normality individually. In this case we can see that the distribution of scores for all three groups appears to not deviate from normality as all three have non-significant p-values. 
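The same per-group check can be sketched directly with scipy, which provides the underlying Shapiro-Wilk test as `scipy.stats.shapiro`. The data frame `df_demo`, its column names, and its values below are hypothetical, and the 0.05 cut-off for the normal column is an assumption chosen to mirror the conventional alpha level.

```python
import pandas as pd
from scipy.stats import shapiro

# Hypothetical long-format data: two conditions with eight scores each.
df_demo = pd.DataFrame({
    "condition": [1] * 8 + [2] * 8,
    "score": [2.1, 3.4, 2.9, 3.8, 3.1, 2.5, 3.3, 4.0,
              4.2, 5.1, 4.8, 5.9, 5.0, 4.4, 5.3, 6.1],
})

# Run the Shapiro-Wilk test separately on the scores of each condition.
rows = []
for level, values in df_demo.groupby("condition")["score"]:
    w_stat, p_value = shapiro(values)
    rows.append({"condition": level, "W": w_stat, "pval": p_value,
                 "normal": p_value > 0.05})

summary = pd.DataFrame(rows).set_index("condition")
print(summary.round(3))
```

Running the test per level like this makes it explicit that each condition is assessed on its own scores only.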