# One-way Repeated Measures ANOVA

ANOVA lets us compare multiple group means on a scale dependent variable while controlling the family-wise error rate. Repeated measures (or within-subjects) ANOVA is used when the scores in every group come from the same participants, recorded at different time points or under different conditions.

In this notebook I will demonstrate how to run a one-way repeated measures ANOVA using the AnovaRM class from the statsmodels library. The design has one independent variable (IV) with three levels and a scale dependent variable (DV).

An important thing to note when running this analysis is that a participant/subject identification variable (column) is needed, so that the model can account for the structure created by each participant contributing multiple data points. To make the analysis easier to run, it helps to have the data in a 'long and thin' format: the DV measure sits in a single variable (column) rather than being spread over several columns, and a second variable records the level of the grouping (IV) variable for each score.
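If the data start out in a wide layout (one column per condition), they can be reshaped into the long format described above. The following is a minimal sketch using `pandas.DataFrame.melt`; the data frame, its values, and the `Condition` column name are invented for illustration and are not taken from the EDA dataset.

```python
import pandas as pd

# Hypothetical wide-format data: one row per participant, one column per condition.
# Column names and values are illustrative only.
wide_df = pd.DataFrame({
    "PersonID": [1, 2, 3],
    "Lie": [2.1, 3.4, 1.8],
    "Truth": [1.9, 3.0, 1.5],
    "Col_Sqr": [1.2, 2.2, 1.1],
})

# melt() stacks the three condition columns into a single DV column ("Arousal")
# and adds a second column ("Condition") recording where each score came from.
long_df = wide_df.melt(
    id_vars="PersonID",
    value_vars=["Lie", "Truth", "Col_Sqr"],
    var_name="Condition",
    value_name="Arousal",
)

print(long_df.sort_values(["PersonID", "Condition"]))
```

Each participant now contributes three rows, one per condition, which is the shape AnovaRM expects.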

In [1]:
# Starting with the key software library imports.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

In [2]:
# Importing the dataset for use in the analysis.

eda_df = pd.read_csv('EDA Data Long Thin.csv')

eda_df.head()

Unnamed: 0  PersonID  Framing  Questions_LTCol   Arousal
         0         1        1                1  1.277222
         1         2        1                1  6.444483
         2         3        1                1  1.082127
         3         4        1                1  2.704305
         4         5        1                1  2.852132


The variable Questions_LTCol has three levels (Lie, Truth, Col_Sqr), representing trials in which a participant answered a question by lying, answered by telling the truth, or completed a control task of looking at a coloured sheet of paper.

We are going to look at differences in mean arousal, as measured through electrodermal activity (EDA), between these three conditions. The IV is the questions variable, with three levels, and the DV is the arousal variable, a scale measure of each participant's mean arousal in each of the three question conditions.
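Before fitting the model, it can be worth confirming that every participant has exactly one score in each condition, since AnovaRM expects a balanced within-subject design, and looking at the condition means. This is a small sketch on invented data; `demo_df` and its values are assumptions for illustration, with column names borrowed from the dataset above.

```python
import pandas as pd

# Illustrative long-format data (values invented for this sketch).
demo_df = pd.DataFrame({
    "PersonID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Questions_LTCol": [1, 2, 3] * 3,
    "Arousal": [2.1, 1.9, 1.2, 3.4, 3.0, 2.2, 1.8, 1.5, 1.1],
})

# Count observations per participant per condition.
# A balanced design has exactly one observation in every cell;
# a missing cell shows up as NaN after unstack() and fails the check.
cell_counts = demo_df.groupby(["PersonID", "Questions_LTCol"]).size().unstack()
is_balanced = bool((cell_counts == 1).all().all())
print("Balanced design:", is_balanced)

# Mean arousal in each condition, the quantities the ANOVA compares.
condition_means = demo_df.groupby("Questions_LTCol")["Arousal"].mean()
print(condition_means)
```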

In [3]:
# Running the RM ANOVA: specify the model, fit it, then print the result.

aovrm = AnovaRM(eda_df, 'Arousal', 'PersonID', within=['Questions_LTCol'])
res = aovrm.fit()

print(res)

                    Anova
                F Value Num DF  Den DF  Pr > F
----------------------------------------------
Questions_LTCol 14.2167 2.0000 158.0000 0.0000



We get a very simple table of output for the RM ANOVA model. The analysis returned a statistically significant result: F(2, 158) = 14.22, p < 0.001 (the 0.0000 shown in the table is the p-value rounded to four decimal places).
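The printed table rounds its values, but the fitted result also exposes them as a data frame through its `anova_table` attribute. The sketch below fits AnovaRM to synthetic data (invented for illustration, with a hypothetical `Condition` column) and reads the unrounded statistics. It also shows where the degrees of freedom come from: with k conditions and n participants, Num DF = k - 1 and Den DF = (k - 1)(n - 1). Assuming complete data, the 158 reported above with three conditions would correspond to 80 participants.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n_subjects, conditions = 10, [1, 2, 3]

# Synthetic long-format data: a per-subject baseline plus a condition effect and noise.
rows = []
for pid in range(1, n_subjects + 1):
    baseline = rng.normal(3.0, 1.0)
    for cond in conditions:
        rows.append({
            "PersonID": pid,
            "Condition": cond,
            "Arousal": baseline + 0.5 * cond + rng.normal(0.0, 0.5),
        })
demo_df = pd.DataFrame(rows)

res = AnovaRM(demo_df, "Arousal", "PersonID", within=["Condition"]).fit()

# anova_table is a DataFrame indexed by factor name, holding unrounded values.
table = res.anova_table
f_value = table.loc["Condition", "F Value"]
num_df = table.loc["Condition", "Num DF"]
den_df = table.loc["Condition", "Den DF"]
p_value = table.loc["Condition", "Pr > F"]

print(f"F({num_df:.0f}, {den_df:.0f}) = {f_value:.2f}, p = {p_value:.3g}")
```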

As this result was significant, we would then need to perform post-hoc follow-up analyses to identify which levels of the IV differ significantly in mean arousal.
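The notebook does not specify which post-hoc procedure to use. One common option, sketched here as an assumption rather than as the author's method, is a set of pairwise paired-samples t-tests with a Bonferroni correction. The data below are invented for illustration, and `Condition` is a hypothetical column name.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

# Illustrative long-format data (values invented for this sketch).
demo_df = pd.DataFrame({
    "PersonID": [p for p in range(1, 7) for _ in range(3)],
    "Condition": ["Lie", "Truth", "Col_Sqr"] * 6,
    "Arousal": [2.9, 2.1, 1.2, 3.8, 3.0, 2.2, 2.4, 1.5, 1.1,
                3.1, 2.6, 1.9, 2.7, 2.4, 1.4, 3.5, 2.8, 2.0],
})

# Pivot to one row per participant so that scores are paired correctly.
wide = demo_df.pivot(index="PersonID", columns="Condition", values="Arousal")

pairs = list(combinations(wide.columns, 2))
results = []
for a, b in pairs:
    t_stat, p_raw = stats.ttest_rel(wide[a], wide[b])
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1.
    p_adj = min(p_raw * len(pairs), 1.0)
    results.append({"pair": f"{a} vs {b}", "t": t_stat, "p_bonferroni": p_adj})

posthoc = pd.DataFrame(results)
print(posthoc)
```

Bonferroni is conservative; other corrections, such as Holm, follow the same pattern with a different adjustment of the raw p-values.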