# Levene Test
*By P. Stikker*<br>
https://PeterStatistics.com<br>
https://www.youtube.com/stikpet<br>

## Introduction

The one-way ANOVA results show whether the nominal variable has an influence on the mean of the scale variable. The next step is then to find out which categories differ significantly from each other, using a post-hoc analysis that compares each possible pair.

Since we are doing multiple tests, each test carries a 5% risk of a wrong decision (rejecting a null hypothesis that is actually true). Although this seems low, the risk of making at least one wrong decision compounds quickly. Therefore the regular significance level for each pairwise test gets adjusted. There are various methods for this adjustment, and they fall into two camps: those for when the variances are the same in the population, and those for when the variances are not the same. Before we can test the means, we therefore first need to test whether the variances in the population could be the same. One possible test for this is the Levene F-test (Levene, 1960).
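To get a feel for how quickly that risk compounds, here is a minimal sketch. It assumes the pairwise tests are independent (which they are not exactly in practice) and uses three categories, so the numbers are an illustration only:

```python
from math import comb

alpha = 0.05                 # significance level of each single test
k = 3                        # number of categories (assumed for illustration)
m = comb(k, 2)               # number of possible pairs to compare

# chance of at least one wrong rejection across m independent tests
fwer = 1 - (1 - alpha) ** m

print(m, round(fwer, 4))
```

With three pairs the chance of at least one wrong decision is already about 14%, well above the 5% of a single test.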

## Example

To show an example, I'll load some data as a pandas dataframe. So I'll need the '<a href="https://pandas.pydata.org">pandas</a>' library:

In [1]:
#!pip install pandas
import pandas as pd

And then load the example data using <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html">'read_csv'</a>:

In [2]:
myDf = pd.read_csv('../../Data/csv/StudentStatistics.csv')
myDf.head()

Unnamed: 0,RespNr,Location,OAA_ObjCourse,OAA_ObjClass,OAA_CourseExec,OAA_RelActObj,OAA_RelActExa,OAA_RelObjExa,OAA_LearProcAct,OAA_LearProcPrep,...,Over_Grade,Over_Strong,Over_Impr,Gen_Gender,Gen_Age,Gen_SecSchool,Gen_Classes,Gen_NumberSubj,Gen_Time,Comments
0,1.0,Rotterdam,Fully Disagree,Fully Disagree,Fully Disagree,Disagree,Fully Disagree,Fully Disagree,Fully Disagree,Fully Disagree,...,20.0,"None, if there was a teacher that teaches how ...",A better teacher/teaching method,Female,22.0,,,Fully agree,20 < 30,Even when I revise my work I still cannot unde...
1,2.0,Haarlem,Disagree,Disagree,,Fully Disagree,Neither disagree nor agree,Agree,Disagree,Neither disagree nor agree,...,50.0,Blackboard,More motivation! Clearer explanation in class,Male,,The Netherlands,6.0,Disagree,10 < 20,"If the survey is anonymous, there shouldn't be..."
2,3.0,Diemen,Fully agree,Fully agree,Agree,Fully agree,Fully agree,Fully agree,Fully agree,Agree,...,80.0,Notably it has motivated alot about my study c...,,Male,37.0,Africa,7.0,Agree,10 < 20,
3,4.0,Rotterdam,Fully Disagree,Neither disagree nor agree,Disagree,Neither disagree nor agree,Neither disagree nor agree,Fully Disagree,Fully Disagree,Neither disagree nor agree,...,15.0,The clearly layout of every subject eacht week,The explanation of the teacher and motivation,Female,24.0,The Netherlands,6.0,Agree,10 < 20,Practice exams
4,5.0,Haarlem,Disagree,Agree,Fully Disagree,Neither disagree nor agree,Fully agree,Fully agree,Neither disagree nor agree,Fully agree,...,40.0,The online learning material,Classes were just really bad and were very con...,Male,19.0,The Netherlands,7.0,Fully agree,10 < 20,


The example uses 'Location' as the nominal field and 'Over_Grade' (the overall grade the student gave for the course) as the scale field. So we'll select those.

In [3]:
myNom = myDf['Location']
myScale = myDf['Over_Grade']

Then we create a series of booleans (True/False) for each location, which is True wherever a row belongs to that location:

In [4]:
myCat1 = myNom == 'Diemen'
myCat2 = myNom == 'Haarlem'
myCat3 = myNom == 'Rotterdam'

And finally we create a list of the scores in each category, using those boolean series and dropping any missing values:

In [5]:
myCatScores1 = myScale[myCat1].dropna()
myCatScores2 = myScale[myCat2].dropna()
myCatScores3 = myScale[myCat3].dropna()
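With many categories, writing one line per category becomes tedious. As a possible alternative, the sketch below uses pandas' 'groupby' to build the lists in one go. The small dataframe 'toy' and the name 'scores_by_cat' are made up for illustration; they only mimic the two columns of the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey file
toy = pd.DataFrame({
    "Location": ["Diemen", "Haarlem", "Rotterdam", "Diemen", "Haarlem", "Rotterdam"],
    "Over_Grade": [80.0, 50.0, 20.0, None, 40.0, 15.0],
})

# One list of non-missing scores per category, keyed by the category name
scores_by_cat = {
    name: group.dropna().tolist()
    for name, group in toy.groupby("Location")["Over_Grade"]
}

print(scores_by_cat)
```

The values of this dictionary could then be passed to a test function in the same way as the three separate lists.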

To perform the Levene test we can then import the '<a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html">levene</a>' function from the '<a href="https://docs.scipy.org/doc/scipy/reference/stats.html">stats</a>' library in the '<a href="https://www.scipy.org/">scipy</a>' package.

In [6]:
from scipy.stats import levene

To use the function, we simply fill in the different lists with scores:

In [7]:
levene(myCatScores1, myCatScores2, myCatScores3, center='mean')

LeveneResult(statistic=4.683385718785412, pvalue=0.014199623741220082)

The results show a p-value of 0.0142. This is the probability of a W value as in the sample, or even more extreme, if the variances were equal in the population. This probability is below .05 (the usual threshold), so the assumption of equal variances is rejected, and we conclude that the variances in the population are most likely not equal.

Pingouin also offers a Levene test (the 'levene' method of its 'homoscedasticity' function), but that actually performs a Brown–Forsythe test.
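The two tests differ only in the centre used for the absolute deviations: the mean for the Levene test and the median for the Brown–Forsythe test. In scipy this is controlled by the 'center' argument, whose default is 'median'. The sketch below compares both on three small made-up groups (not the survey data):

```python
from scipy.stats import levene

# Hypothetical scores for three categories
g1 = [1, 2, 3, 4, 10]
g2 = [5, 5, 6, 7, 7]
g3 = [2, 8, 9, 9, 12]

mean_based = levene(g1, g2, g3, center='mean')      # Levene test
median_based = levene(g1, g2, g3, center='median')  # Brown-Forsythe test

print(mean_based.statistic, mean_based.pvalue)
print(median_based.statistic, median_based.pvalue)
```

The two statistics differ, so it matters which version a package uses when comparing results across software.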

## Degrees of Freedom
Usually when reporting the test results, the degrees of freedom need to be added. The W-value of the Levene test follows an F-distribution. For its degrees of freedom we have the following formulas:


\begin{equation*}
df_{between} = k - 1
\end{equation*}
\begin{equation*}
df_{within} = n - k
\end{equation*}
\begin{equation*}
df_{total} = n - 1
\end{equation*}

The $k$ is the number of categories, which we can simply get by taking the list of unique categories in the nominal field, using '<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html">unique</a>', and then Python's '<a href="https://docs.python.org/3/library/functions.html#len">len</a>' to get the length (i.e. the number of elements).

The $n$ is the total number of scores. We can get this by creating a cross table from the nominal and scale variable, using pandas' '<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html">crosstab</a>', and then summing twice (the first sum gives the column totals, and the sum of those is the grand total).

So first we get k and n:

In [8]:
k = len(pd.unique(myNom))
n = pd.crosstab(myNom, myScale).sum().sum()
k, n

(3, 48)

The degrees of freedom are then easily found by filling out the formulas:

In [9]:
dfBetween = k - 1
dfWithin = n - k
dfTotal = n - 1

dfBetween, dfWithin, dfTotal

(2, 45, 47)

The df<sub>between</sub> and df<sub>within</sub> are the ones usually reported. See https://PeterStatistics.com for more details on reporting the results.
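As a small illustration, the results from above could be combined into one line for a report. The exact layout (two decimals for F, three for p) is an assumption of a common reporting style, not a fixed rule:

```python
# Values copied from the levene output and the degrees of freedom above
W = 4.683385718785412
p = 0.014199623741220082
df_between, df_within = 2, 45

# Assumed reporting layout: F(df_between, df_within) = value, p = value
report = f"F({df_between}, {df_within}) = {W:.2f}, p = {p:.3f}"
print(report)
```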

In the appendix I'll show how you can avoid packages almost entirely by going over the formulas of the Levene test. Only for the F-distribution itself is a package still needed.

## References

Levene, H. (1960). Robust tests for equality of variances. In I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, & H. B. Mann (Eds.), *Contributions to probability and statistics: Essays in honor of Harold Hotelling* (pp. 278–292). Stanford University Press.

## Appendix: The Hard Way

First we convert the three series of scores to plain Python lists, and store them together in one list:

In [None]:
scores1 = list(myCatScores1)
scores2 = list(myCatScores2)
scores3 = list(myCatScores3)

scores = [scores1, scores2, scores3]
print(scores)

We can also get the number of categories using Python's '<a href="https://docs.python.org/3/library/functions.html#len">len</a>':

In [None]:
k = len(scores)
k

We will need the mean of each category. The formula for the mean is:

\begin{equation*}
\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i} 
\end{equation*}

Where $\bar{x}_i$ indicates the mean of category i, $x_{i,j}$ the j-th score in category i, and $n_i$ the number of scores in category i.

First the numerator of this fraction, which is the sum of the scores in each category. Let's also keep the number of scores in each category:

In [None]:
ns = []
sums = []

for i in scores:
    ns.append(len(i))
    sums.append(sum(i))

ns, sums

The means can now simply be calculated:

In [None]:
means = []

for i in range(k):
    means.append(sums[i]/ns[i])

means

We'll now convert each score using the following formula:

\begin{equation*}
z_{i,j} = |x_{i,j} - \bar{x}_i|
\end{equation*}

Where $x_{i,j}$ is the j-th score in category i, and $\bar{x}_i$ the mean of category i. The |...| indicates that we take the absolute value of the difference.

In [None]:
zScores = []
for i in range(k):
    zScoresCat = []
    for j in range(len(scores[i])):
        z = abs(scores[i][j] - means[i])
        zScoresCat.append(z)
    zScores.append(zScoresCat)
    
print(zScores)    
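To see what this conversion does, here is the same step on two tiny made-up categories, written with list comprehensions. The names 'toy_scores', 'toy_means' and 'toy_z' are only for this illustration:

```python
# Hypothetical scores for two small categories
toy_scores = [[2, 4, 6], [1, 1, 4]]

# mean of each category
toy_means = [sum(cat) / len(cat) for cat in toy_scores]

# absolute deviation of each score from its own category mean
toy_z = [[abs(x - m) for x in cat] for cat, m in zip(toy_scores, toy_means)]

print(toy_means)
print(toy_z)
```

The first category has a mean of 4, so its scores 2, 4 and 6 become 2, 0 and 2: each value now measures how far a score lies from the centre of its own category.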

We will need the mean of each category again, but now for those z-values. The formula for the mean is:

\begin{equation*}
\bar{z}_i = \frac{\sum_{j=1}^{n_i} z_{i,j}}{n_i} 
\end{equation*}

Where $\bar{z}_i$ indicates the mean of z-scores from category i, $z_{i,j}$ the j-th z-score in category i, and $n_i$ the number of z-scores in category i.

First the numerator of this fraction. It contains the sum for each category. We can re-use the number of scores we stored earlier:

In [None]:
sums = []
for i in zScores:
    sums.append(sum(i))

sums

We can now replace our old means, with the new ones based on the z-scores:

In [None]:
means = []
for i in range(k):
    means.append(sums[i]/ns[i])

means

Next we'll need the sum of squares within, which has the scary formula of:

\begin{equation*}
SS_w = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( z_{i,j} - \bar{z}_i \right)^2
\end{equation*}

But we actually have all the information we need for this:

In [None]:
SSw = 0

for i in range(k):
    for j in range(ns[i]):
        SSw = SSw + (zScores[i][j] - means[i])**2

SSw

We will also need the mean of all z-scores:

\begin{equation*}
\bar{z} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} z_{i,j}}{n}
\end{equation*}

where

\begin{equation*}
n = \sum_{i=1}^{k} n_i
\end{equation*}


In [None]:
nTot = sum(ns)
sumTot = sum(sums)
meanTot = sumTot / nTot

nTot, sumTot, meanTot

Then we can calculate the sum of squares between using:

\begin{equation*}
SS_b = \sum_{i=1}^{k} n_i \times \left( \bar{z}_i - \bar{z} \right)^2
\end{equation*}

Again, we have all we need already so:

In [None]:
SSb = 0
for i in range(k):
    SSb = SSb + ns[i] * (means[i] - meanTot)**2
    
SSb
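A useful sanity check at this point is the standard ANOVA identity that the total sum of squares equals the sum of squares between plus the sum of squares within. The sketch below verifies it on a small set of made-up z-values (not the survey data); all names are hypothetical:

```python
# Hypothetical z-values for three categories
z = [[3, 2, 1, 0, 6], [1, 1, 0, 1, 1], [6, 0, 1, 1, 4]]

n_i = [len(cat) for cat in z]                 # scores per category
z_bar_i = [sum(cat) / len(cat) for cat in z]  # mean per category
n = sum(n_i)
z_bar = sum(sum(cat) for cat in z) / n        # mean of all z-values

# sum of squares within, between, and total
ss_w = sum((v - m) ** 2 for cat, m in zip(z, z_bar_i) for v in cat)
ss_b = sum(ni * (m - z_bar) ** 2 for ni, m in zip(n_i, z_bar_i))
ss_t = sum((v - z_bar) ** 2 for cat in z for v in cat)

print(ss_w, ss_b, ss_t)
```

If the within and between parts do not add up to the total (apart from tiny rounding differences), something went wrong in one of the earlier steps.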

Almost there. We need those degrees of freedom. As before:

\begin{equation*}
df_b = k - 1
\end{equation*}


\begin{equation*}
df_w = n - k
\end{equation*}



In [None]:
dfb = k - 1
dfw = nTot - k

dfb, dfw

The mean square within is then defined as:

\begin{equation*}
MS_w = \frac{SS_w}{df_w}
\end{equation*}


In [None]:
MSw = SSw / dfw
MSw

And the mean square between as:


\begin{equation*}
MS_b = \frac{SS_b}{df_b}
\end{equation*}



In [None]:
MSb = SSb / dfb
MSb

Finally the F-value is given by:

\begin{equation*}
F = \frac{MS_b}{MS_w}
\end{equation*}


In [None]:
F = MSb / MSw
F
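All the steps so far can also be collected in one function. This is only a compact sketch of the same mean-centred calculation; the function name 'levene_w' and the example groups are made up for illustration:

```python
def levene_w(groups):
    """Return (W, df_between, df_within) for the mean-centred Levene test."""
    k = len(groups)

    # absolute deviations of each score from its own category mean
    z = []
    for g in groups:
        g_mean = sum(g) / len(g)
        z.append([abs(x - g_mean) for x in g])

    n = sum(len(cat) for cat in z)
    z_means = [sum(cat) / len(cat) for cat in z]
    z_grand = sum(sum(cat) for cat in z) / n

    # sums of squares between and within, computed on the z-values
    ss_b = sum(len(cat) * (m - z_grand) ** 2 for cat, m in zip(z, z_means))
    ss_w = sum((v - m) ** 2 for cat, m in zip(z, z_means) for v in cat)

    df_b, df_w = k - 1, n - k
    return (ss_b / df_b) / (ss_w / df_w), df_b, df_w


# Hypothetical scores for three categories
W, df_b, df_w = levene_w([[1, 2, 3, 4, 10], [5, 5, 6, 7, 7], [2, 8, 9, 9, 12]])
print(W, df_b, df_w)
```

Called with the three lists of scores from the example, it should give the same value as the step-by-step calculation.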

To find the corresponding p-value we need the F-distribution. This becomes a bit too complicated to work out, so here's where we will need a package. The scipy.stats module has an '<a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html">f</a>' distribution object, so let's import that one.

In [None]:
from scipy.stats import f

Then to find the corresponding p-value we can use its 'sf' (survival function) method, with the F-value, the degrees of freedom between, and the degrees of freedom within:

In [None]:
f.sf(F, dfb, dfw)

Et voilà. The same p-value as we saw when using the levene function from scipy.
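As a side note, for the special case in this example (three categories, so two degrees of freedom between) the right-tail probability of the F-distribution has a simple closed form: $P(F > x) = \left(1 + \frac{2x}{df_w}\right)^{-df_w/2}$. The sketch below uses it as a cross-check with the values reported earlier. It only holds when the degrees of freedom between equal 2, so it is not a general replacement for the scipy function:

```python
# Values reported earlier in this notebook
W = 4.683385718785412
df_b, df_w = 2, 45

# The closed form below is only valid for two numerator degrees of freedom
if df_b != 2:
    raise ValueError("closed form only applies when df_between equals 2")

p = (1 + 2 * W / df_w) ** (-df_w / 2)
print(p)
```

This gives approximately 0.0142, in line with the p-value from the levene function.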