# Higher-order Repeated Measures ANOVA
In the previous part of this lesson, we examined the simplest case of the repeated measures ANOVA. Despite the simplicity of the design, we saw how this analysis had several complications around the correct partition of the error, as well as the assumptions made about the covariance structure. These alone were enough to suggest that the repeated measures ANOVA framework is problematic to apply in practice. Yet there are even more complex situations where this framework can be applied. In this final part of the lesson, we will see how the repeated measures ANOVA is used in situations where there are additional *between-subjects* factors, as well as multiple *within-subject* factors. This is not to condone the use of the repeated measures ANOVA in these situations. Rather, it is to help you understand (a) how this method should be used correctly and (b) why an approach such as mixed-effects modelling provides a much better alternative.

## Adding Between-subjects Factors
The first additional complexity we may come across is when we have a *between-subjects* factor alongside the repeated measurements. For example, the `datarium` package contains the dataset `anxiety`. Here, anxiety has been measured at 3 different time points, and each subject belongs to one of 3 groups practising different exercise regimes. So `time` is the repeated-measures factor and `group` is the between-subjects factor. This is effectively a $3 \times 3$ ANOVA, where our interest falls on the main effect of `group`, the main effect of `time` and the `group:time` interaction. We can see this dataset below in its original form:

```r
library('datarium')
data('anxiety')
print(head(anxiety))
```

```
# A tibble: 6 × 5
  id    group    t1    t2    t3
  <fct> <fct> <dbl> <dbl> <dbl>
1 1     grp1   14.1  14.4  14.1
2 2     grp1   14.5  14.6  14.3
3 3     grp1   15.7  15.2  14.9
4 4     grp1   16    15.5  15.3
5 5     grp1   16.5  15.8  15.7
6 6     grp1   16.9  16.5  16.2
```


We then rework it into long format for univariate modelling:

```r
library('reshape2')

# number of repeated measurements per subject and number of subjects
t <- 3
n <- 45

# reshape wide -> long
anxiety.long <- melt(anxiety,                 # wide data frame
                     id.vars=c('id','group'), # what stays fixed?
                     variable.name='time',    # name for the new predictor
                     value.name='anxiety')    # name for the new outcome

anxiety.long           <- anxiety.long[order(anxiety.long$id),] # order by ID
rownames(anxiety.long) <- seq(1,n*t)                            # fix row names
anxiety.long$id        <- as.factor(anxiety.long$id)            # id as factor
```

```r
print(head(anxiety.long))
```

```
  id group time anxiety
1  1  grp1   t1    14.1
2  1  grp1   t2    14.4
3  1  grp1   t3    14.1
4  2  grp1   t1    14.5
5  2  grp1   t2    14.6
6  2  grp1   t3    14.3
```

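The error strata produced by `aov()` assume a complete, balanced design (the `aov()` documentation notes that it is designed for balanced designs), so before partitioning the error it can be worth confirming that every subject appears exactly once at every time point. The helper `check_balanced()` below is a hypothetical function written for illustration, not part of any package. It simply cross-tabulates the subject identifier against the within-subject factor(s) with `table()`, and it is demonstrated on a small made-up data frame so that it runs without `datarium`.

```r
# Hypothetical helper (not from any package): TRUE if every subject has
# exactly one observation in every cell of the within-subject factor(s)
check_balanced <- function(data, id, within) {
  counts <- table(data[c(id, within)])  # cross-tabulate id by the within factors
  all(counts == 1)
}

# Toy long-format data: 3 subjects, each measured at 2 time points
toy <- data.frame(id   = factor(rep(1:3, each = 2)),
                  time = factor(rep(c('t1', 't2'), times = 3)),
                  y    = c(5.1, 4.8, 6.0, 5.7, 5.5, 5.2))

check_balanced(toy, id = 'id', within = 'time')        # complete data
check_balanced(toy[-2, ], id = 'id', within = 'time')  # subject 1 is missing t2
```

Applied to the lesson data, `check_balanced(anxiety.long, id = 'id', within = 'time')` should return `TRUE`. The same call extends to designs with several within-subject factors by passing a vector such as `within = c('diet', 'exercises', 'time')`.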

### The Between-subjects Error Term



```r
anxiety.lm <- lm(anxiety ~ group*time + id, data=anxiety.long)
anova(anxiety.lm)
```

```
Analysis of Variance Table

Response: anxiety
           Df  Sum Sq Mean Sq F value    Pr(>F)
group       2  61.992  30.996 367.701 < 2.2e-16 ***
time        2  66.579  33.289 394.909 < 2.2e-16 ***
id         42 299.146   7.123  84.494 < 2.2e-16 ***
group:time  4  37.154   9.288 110.188 < 2.2e-16 ***
Residuals  84   7.081   0.084
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```



... Getting this wrong could be disastrous for our inference: with an error term that is too small, we might falsely conclude that the between-subjects factor has an effect. Considering that these factors could correspond to something like treatment effects in patients and controls, concluding a false difference could have serious implications.

```r
anxiety.aov <- aov(anxiety ~ group*time + Error(id), data=anxiety.long)
summary(anxiety.aov)
```

```
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
group      2  61.99  30.996   4.352 0.0192 *
Residuals 42 299.15   7.123
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
time        2  66.58   33.29   394.9 <2e-16 ***
group:time  4  37.15    9.29   110.2 <2e-16 ***
Residuals  84   7.08    0.08
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

So now, even though the effect of `group` is still significant, it is much weaker when we use the correct error term ($F = 4.35$ rather than $F = 367.70$).
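The difference comes entirely from the denominator of the $F$-ratio. Both tables share the same mean square for `group`. What changes is whether it is divided by the residual mean square, which only reflects within-subject variability, or by the mean square for `id`, which reflects variability between subjects. The sketch below recomputes both ratios by hand from the sums of squares printed above, deriving the degrees of freedom from the 3 groups, 45 subjects and 3 time points.

```r
# Design sizes for the anxiety data
n.groups   <- 3
n.subjects <- 45
n.times    <- 3

# Degrees of freedom implied by the design
df.group    <- n.groups - 1                            # 2
df.subject  <- n.subjects - n.groups                   # 42: subjects within groups
df.residual <- (n.subjects - n.groups) * (n.times - 1) # 84: within-subject residual

# Mean squares from the sums of squares in the tables above
ms.group    <- 61.992  / df.group
ms.subject  <- 299.146 / df.subject
ms.residual <- 7.081   / df.residual

# Wrong denominator: group tested against the within-subject residual
F.wrong <- ms.group / ms.residual
# Correct denominator: group tested against the between-subjects (id) error
F.right <- ms.group / ms.subject
p.right <- pf(F.right, df1 = df.group, df2 = df.subject, lower.tail = FALSE)

round(c(F.wrong = F.wrong, F.right = F.right, p.right = p.right), 4)
```

The two ratios should reproduce the $F$ values of roughly 367.7 and 4.35 from the tables, and the $p$-value of about 0.019 from the `Error: id` stratum.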

## Adding More Within-subject Factors
This is where things start to get *really* tricky. Remember, as we go through this, that this is *not* something you would ever need to do in practice, because we are not condoning the use of the repeated measures ANOVA. Rather, this complexity is shown to help you understand why you really do *not* want to do this. This is one of the clearest cases where the automation provided by software such as SPSS actively *hides* so much of the complexity that researchers do not think twice about designing studies that require these approaches. As we will see below, we can also do this using `ezANOVA()`, but using `aov()` forces us to directly address how complex these methods become and why we really want a much more flexible and less cumbersome approach. For this example we use the `weightloss` dataset from `datarium`, where each of 12 subjects has `weight` recorded at 3 time points under every combination of `diet` (no/yes) and `exercises` (no/yes), so all three factors vary within subjects.

```r
library('datarium')
library('reshape2')
data('weightloss')

# repeated measurements per subject (2 diet x 2 exercises x 3 time) and number of subjects
t <- 12
n <- 12

# reshape wide -> long
weightloss.long <- melt(weightloss,                         # wide data frame
                        id.vars=c('id','diet','exercises'), # what stays fixed?
                        variable.name="time",               # name for the new predictor
                        value.name="weight")                # name for the new outcome

weightloss.long <- weightloss.long[order(weightloss.long$id),] # order by ID
rownames(weightloss.long) <- seq(1,n*t)                        # fix row names

print(head(weightloss.long))
```

```
  id diet exercises time weight
1  1   no        no   t1  10.43
2  1   no       yes   t1  11.12
3  1  yes        no   t1  10.20
4  1  yes       yes   t1  10.43
5  1   no        no   t2  13.21
6  1   no       yes   t2  12.51
```

  id diet exercises time weight
1  1   no        no   t1  10.43
2  1   no       yes   t1  11.12
3  1  yes        no   t1  10.20
4  1  yes       yes   t1  10.43
5  1   no        no   t2  13.21
6  1   no       yes   t2  12.51


### Multiple Within-subject Error Terms

```r
library('ez')
weightloss.ez <- ezANOVA(data=weightloss.long, dv=weight, wid=id, within=.(diet,exercises,time))
print(weightloss.ez)
```

```
$ANOVA
               Effect DFn DFd          F            p p<.05        ges
2                diet   1  11   6.021440 3.202562e-02     * 0.02774675
3           exercises   1  11  58.928078 9.650954e-06     * 0.28434954
4                time   2  22 110.941583 3.218470e-12     * 0.54133853
5      diet:exercises   1  11  75.356051 2.980284e-06     * 0.15716889
6           diet:time   2  22   0.602562 5.561945e-01       0.01332945
7      exercises:time   2  22  20.825889 8.408790e-06     * 0.27352201
8 diet:exercises:time   2  22  14.246076 1.074451e-04     * 0.14663048

$`Mauchly's Test for Sphericity`
               Effect         W          p p<.05
4                time 0.9833425 0.91944157
6           diet:time 0.5493166 0.05001654
7      exercises:time 0.6835227 0.14919857
8 diet:exercises:time 0.9589434 0.81089547

$`Sphericity Corrections`
               Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF]
4                time 0.9836155 4.732515e
```
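The `GGe` column above is the Greenhouse-Geisser epsilon. Assuming the standard definition, for a within-subject factor with $k$ levels, covariance matrix $\mathbf{S}$ of the repeated measurements and a $(k-1) \times k$ matrix $\mathbf{C}$ of orthonormal contrasts,

$$
\hat{\varepsilon} = \frac{\left\{\mathrm{tr}(\mathbf{C}\mathbf{S}\mathbf{C}')\right\}^{2}}{(k-1)\,\mathrm{tr}\left\{(\mathbf{C}\mathbf{S}\mathbf{C}')^{2}\right\}},
$$

which equals 1 when sphericity holds and can fall as low as $1/(k-1)$. The corrected test multiplies both degrees of freedom of the $F$-ratio by $\hat{\varepsilon}$. The `gg_epsilon()` function below is our own minimal sketch of this formula, not the code used inside `ez`. It borrows `contr.poly()` purely as a convenient set of orthonormal contrasts.

```r
# Greenhouse-Geisser epsilon for a k x k covariance matrix S of repeated measures
gg_epsilon <- function(S) {
  k <- nrow(S)
  C <- t(contr.poly(k))   # (k-1) x k orthonormal contrasts, orthogonal to the mean
  M <- C %*% S %*% t(C)   # covariance matrix of the contrast scores
  sum(diag(M))^2 / ((k - 1) * sum(diag(M %*% M)))
}

k <- 3

# Compound symmetry (equal variances, equal covariances) satisfies sphericity
S.cs <- diag(0.5, k) + matrix(0.5, k, k)
gg_epsilon(S.cs)

# All contrast variability on a single contrast: the worst case, 1/(k-1)
C <- t(contr.poly(k))
S.bad <- t(C) %*% diag(c(1, 0)) %*% C
gg_epsilon(S.bad)
```

In the output above, `time` has a `GGe` of about 0.98, so the correction barely changes its test. This is in line with its non-significant Mauchly test ($p = 0.92$).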

```r
weightloss.aov <- aov(weight ~ diet*exercises*time +
                               Error(id                +
                                     id:diet           +
                                     id:exercises      +
                                     id:time           +
                                     id:exercises:time +
                                     id:exercises:diet +
                                     id:time:diet      +
                                     id:time:exercises:diet),
                        data=weightloss.long)
summary(weightloss.aov)
```

```
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 11   27.1   2.463

Error: id:diet
          Df Sum Sq Mean Sq F value Pr(>F)
diet       1  5.111   5.111   6.021  0.032 *
Residuals 11  9.337   0.849
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:exercises
          Df Sum Sq Mean Sq F value   Pr(>F)
exercises  1  71.16   71.16   58.93 9.65e-06 ***
Residuals 11  13.28    1.21
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:time
          Df Sum Sq Mean Sq F value   Pr(>F)
time       2 211.39  105.69   110.9 3.22e-12 ***
Residuals 22  20.96    0.95
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:exercises:time
               Df Sum Sq Mean Sq F value   Pr(>F)
exercises:time  2  67.43   33.72   20.83 8.41e-06 ***
Residuals      22  35.62    1.62
---
Signif. codes:  0 ‘***’ 0.00
```

So now we have 8 ANOVA tables to deal with!
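The count of eight follows from the structure of the design. With $k$ within-subject factors there are $2^{k} - 1$ within-subject terms (every main effect and interaction). Each is tested against its own interaction with `id`, and the `id` stratum itself adds one more:

$$
\underbrace{1}_{\text{id}} + \underbrace{\left(2^{k} - 1\right)}_{\text{id} \times \text{within-subject terms}} = 2^{k}, \qquad k = 3 \;\Rightarrow\; 2^{3} = 8 .
$$

Adding a fourth within-subject factor would double this to 16 error strata.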

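There is an easier way to write this, using the nesting operator `/` inside the formula. The term `id/(diet*exercises*time)` expands to `id` plus `id` crossed with every term in `diet*exercises*time`, which gives the same eight error strata as the longer version: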

```r
weightloss.aov <- aov(weight ~ diet*exercises*time + Error(id/(diet*exercises*time)), data=weightloss.long)
```
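To check what the `/` shorthand expands to, we can ask `R` to expand both versions of the error formula with `terms()` and compare the resulting term labels. This only manipulates formulas, so it runs without any data. It relies on `terms()` writing each interaction label with its variables in a consistent order, so that, for example, `id:exercises:diet` and `id:diet:exercises` are recognised as the same term.

```r
# Nested error term written with the '/' operator
nested <- ~ id/(diet*exercises*time)

# Explicit version, listing every id-by-factor stratum by hand
explicit <- ~ id + id:diet + id:exercises + id:time +
              id:exercises:time + id:exercises:diet +
              id:time:diet + id:time:exercises:diet

# terms() expands each formula into its individual model terms
nested.terms   <- attr(terms(nested),   'term.labels')
explicit.terms <- attr(terms(explicit), 'term.labels')

print(nested.terms)

# Both formulas should describe the same set of eight error strata
setequal(nested.terms, explicit.terms)
```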

### Using the `ezANOVA()` Function
As we can see above, using a partitioned error model with `aov()` is a tricky business, and it would be very easy to get this wrong. As an alternative, we can use the `ezANOVA()` function from the `ez` package. As the name implies, this is designed to allow for an RM ANOVA without the usual difficulties associated with the `aov()` or `lm()` functions. Unfortunately, the aim of this package is largely to make the `R` output the same as SPSS, so it does away with the linear model framework. This means no residuals, no parameter estimates, no diagnostic plots or anything else we have made use of so far. If you *have* to use an RM ANOVA, this is the simplest way to get it *right*. However, as we will discuss below, we would dissuade you from ever considering RM ANOVA as an option in the future. About the only utility of this is showing doubtful researchers that our better options of GLS and mixed-effects models are, in fact, giving them the same answer as an RM ANOVA.
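As a sketch of that last point, the example below simulates a small balanced mixed design. This is made-up data, not the `anxiety` dataset. It fits the data twice: once as an RM ANOVA with `aov()` and an `Error(id)` term, and once as a mixed-effects model with a random intercept per subject, using `lme()` from the `nlme` package (which ships with standard `R` installations). For a balanced design like this, and provided the estimated between-subject variance is positive, the $F$-test for the between-subjects factor should agree between the two. The mixed model, however, also comes with residuals, parameter estimates and everything else a fitted model provides.

```r
library('nlme')

set.seed(2024)

# Simulated balanced mixed design: 3 groups x 10 subjects, each measured at 3 times
groups <- c('g1', 'g2', 'g3')
times  <- c('t1', 't2', 't3')
subj   <- data.frame(id    = factor(1:30),
                     group = factor(rep(groups, each = 10)))
subj$u <- rnorm(nrow(subj), sd = 2)                    # subject-specific intercepts

sim <- merge(subj, data.frame(time = factor(times)))   # every subject at every time
group.eff <- c(g1 = 0, g2 = 1, g3 = 2)
time.eff  <- c(t1 = 0, t2 = -0.5, t3 = -1)
sim$y <- unname(15 + group.eff[as.character(sim$group)] +
                time.eff[as.character(sim$time)] + sim$u) +
         rnorm(nrow(sim), sd = 0.5)

# RM ANOVA: group is tested in the 'Error: id' stratum
rm.fit <- aov(y ~ group*time + Error(id), data = sim)
rm.tab <- summary(rm.fit)[['Error: id']][[1]]
F.rm   <- rm.tab[trimws(rownames(rm.tab)) == 'group', 'F value']

# Mixed-effects model: random intercept for each subject
lme.fit <- lme(y ~ group*time, random = ~ 1 | id, data = sim)
F.lme   <- anova(lme.fit)['group', 'F-value']

c(RM.ANOVA = F.rm, mixed.model = F.lme)
```

From here, `summary(lme.fit)` gives the parameter estimates and `plot(lme.fit)` a residual plot, which is exactly what `ezANOVA()` does not provide.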

```r
library('ez')

# the mixed design from earlier: time varies within subjects, group between subjects
anxiety.ez <- ezANOVA(data=anxiety.long, dv=anxiety, wid=id, between=.(group), within=.(time))
print(anxiety.ez)
```

## Why We Should *Not* Use RM ANOVA
Everything we have discussed above has really been an exercise in telling you why you do not want to use RM ANOVA. All the unnecessary fiddling with error terms, and different tests requiring different errors, is a complication that we could simply do without. Even if we do manage to successfully work out what needs to go where (or get a function like `ezANOVA()` to sort it for us), we are still left with a method that has a number of meaningful restrictions. ... Because of this, the RM ANOVA is tricky to understand, tricky to use correctly and massively inflexible. It is no wonder that statisticians abandoned this method decades ago! And yet, this is the method that has persisted in psychology until relatively recently.

... testing assumptions and follow-up tests...

This section has largely been motivational to understand why we want to use something more flexible and more modern, but it is important to recognise that you may well end up working with someone who knows nothing beyond the RM ANOVA. In those situations, it is useful to (a) motivate the need for something better and (b) understand how to get the RM ANOVA results in `R`, in case they require further convincing. So, we do not condone the use of the RM ANOVA, but we understand its place in psychology and also understand that there are times where you may want to see what the RM ANOVA says, even if you do not wish to use it.

[^submodel-foot]: An alternative perspective here is that each error term represents a different *sub-model*. So, we can think of specifying *multiple* models, some of which require us to *average-over* certain factors. For instance, if we were to average-over the repeated measurements and then fit a model on the resultant outcome variable, this model would automatically have $\sigma^{2}_{b}$ as its error term. This does make the whole procedure feel a little less of a hack; however, it is very impractical to do this, especially when the number of factors and interactions gets larger. You can read more about this approach in [McFarquhar (2019)](https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2019.00352/full).

[^noterr-foot]: Note that this is *not* the errors from the linear model, even though the Greek letter is the same.