# PCHN63112 Workshop: Linear Models with Correlated Errors
In the associated lesson, we examined the theory behind traditional methods for analysing repeated measures data. In that lesson, we introduced the paired $t$-test and then discussed the one-way repeated measures ANOVA as a generalisation of the paired $t$-test. In this workshop, we will take things further and see how more complex repeated measures ANOVAs fit into this framework. As we will see, things start to become much more complicated, providing even further motivation for finding a better approach. The examples here are therefore *not* an endorsement of the repeated measures ANOVA. Rather, they serve as the clearest evidence of why you probably want to steer clear of this method.

## Factorial Repeated Measures ANOVA Models
In the associated lesson, we examined the simplest case of the repeated measures ANOVA. Despite the simplicity of the design, we saw how this analysis had several complications around the correct partition of the error, as well as the assumptions made about the covariance structure. These alone were enough to suggest that the repeated measures ANOVA framework was problematic to apply in practice. Yet, there are even more complex situations where this framework can be applied. In this workshop, we will see how the repeated measures ANOVA is used in situations where there are additional *between-subjects* factors, as well as multiple *within-subject* factors.

## Adding Between-subjects Factors
The first complexity we may come across is when we have a *between-subjects* factor alongside the repeated measurements. For example, the `datarium` package contains the dataset `anxiety`. Here, repeated measurements of anxiety have been taken at 3 different time points. The 45 subjects are split between 3 different exercise regimes and the experimental question concerns how exercise `group` and `time` affect anxiety `score`. So, `time` is the repeated measurement and `group` is the between-subjects factor. This is effectively a $3 \times 3$ ANOVA, as illustrated in the table below.

|             | Group: Low | Group: Moderate | Group: High | 
|-------------|------------|-----------------|-------------|
| **Time: 1** | $\mu_{11}$ | $\mu_{12}$      | $\mu_{13}$  |
| **Time: 2** | $\mu_{21}$ | $\mu_{22}$      | $\mu_{23}$  |
| **Time: 3** | $\mu_{31}$ | $\mu_{32}$      | $\mu_{33}$  |

Our interest falls on the main effect of `group`, main effect of `time` and the `group:time` interaction. Based on what we covered in the lesson, we can apply the following two-way ANOVA model with partitioned errors

$$
\begin{alignat*}{1}
    y_{ijk}    &=    \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + S_{i} + \eta_{ijk} \\
    S_{i}      &\sim \mathcal{N}(0,\sigma^{2}_{b}) \\ 
    \eta_{ijk} &\sim \mathcal{N}(0,\sigma^{2}_{w})
\end{alignat*}
$$

Here we have added a term for the *between-subjects* effect (denoted $\beta_{k}$), as well as the *between* $\times$ *within* interaction (denoted $(\alpha\beta)_{jk}$). So now $i$ indexes the subject ($i = 1,\dots,45$), $j$ indexes the repeated measurements ($j = 1,\dots,3$) and $k$ indexes the exercise group ($k = 1,\dots,3$).

We can see this dataset below in its original form

In [1]:
library('datarium')
data('anxiety')
print(head(anxiety))

  id group   t1   t2   t3
1  1  grp1 14.1 14.4 14.1
2  2  grp1 14.5 14.6 14.3
3  3  grp1 15.7 15.2 14.9
4  4  grp1 16.0 15.5 15.3
5  5  grp1 16.5 15.8 15.7
6  6  grp1 16.9 16.5 16.2


<div class="alert alert-block alert-info"> 
<b>ACTIVITY 1</b> Try to convert these data into <i>long</i>-format yourself, before looking at the code below. You can refer to the examples in the lesson to try and work this out.
</div>

This can be reworked into long-format for univariate modelling using the code below

In [2]:
library('reshape2')

# repeats and number of subjects
t <- 3
n <- 45

# reshape wide -> long
anxiety.long <- melt(anxiety,                 # wide data frame
                     id.vars=c('id','group'), # both id and group stay fixed?
                     variable.name='time',    # name for the new predictor
                     value.name='score')      # name for the new outcome

anxiety.long           <- anxiety.long [order(anxiety.long$id),] # order by ID
rownames(anxiety.long) <- seq(1,n*t)                             # fix row names
anxiety.long$id        <- as.factor(anxiety.long$id)             # id as factor
anxiety.long$group     <- as.factor(anxiety.long$group)          # group as factor
anxiety.long$time      <- as.factor(anxiety.long$time)           # time as factor

print(head(anxiety.long))

  id group time score
1  1  grp1   t1  14.1
2  1  grp1   t2  14.4
3  1  grp1   t3  14.1
4  2  grp1   t1  14.5
5  2  grp1   t2  14.6
6  2  grp1   t3  14.3
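
For readers without `reshape2`, the same reshaping can be sketched with base `R`'s `reshape()`. This is an illustration only: it hard-codes the six rows printed earlier rather than loading the full dataset, and the object names `anxiety.head` and `anxiety.head.long` are ours.

```r
# the six rows printed by head(anxiety), typed in by hand for illustration
anxiety.head <- data.frame(id    = 1:6,
                           group = 'grp1',
                           t1    = c(14.1, 14.5, 15.7, 16.0, 16.5, 16.9),
                           t2    = c(14.4, 14.6, 15.2, 15.5, 15.8, 16.5),
                           t3    = c(14.1, 14.3, 14.9, 15.3, 15.7, 16.2))

# reshape wide -> long using base R only
anxiety.head.long <- reshape(anxiety.head,
                             direction = 'long',
                             varying   = c('t1', 't2', 't3'), # columns to stack
                             v.names   = 'score',             # name for the new outcome
                             timevar   = 'time',              # name for the new predictor
                             times     = c('t1', 't2', 't3'), # labels for the new predictor
                             idvar     = 'id')                # subject identifier

anxiety.head.long           <- anxiety.head.long[order(anxiety.head.long$id), ] # order by ID
rownames(anxiety.head.long) <- NULL                                             # fix row names
anxiety.head.long$id        <- as.factor(anxiety.head.long$id)                  # id as factor
anxiety.head.long$time      <- as.factor(anxiety.head.long$time)                # time as factor

print(head(anxiety.head.long))
```

The result should have one row per subject and time point, in the same layout as `anxiety.long`.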


So far, nothing has changed from what we have seen previously. However, we now have to think more carefully about the possible error terms and which is most sensible to use for each test. 

### Error Terms for Tests
Recall from the lesson that a repeated measures ANOVA is effectively a linear model with *partitioned errors*. By splitting the errors into $\epsilon_{ijk} = S_{i} + \eta_{ijk}$, we are effectively specifying a model with *two* variance terms. So, the variance function of the model becomes

$$
\text{Var}(y_{ijk}) = \sigma^{2} = \sigma^{2}_{b} + \sigma^{2}_{w}.
$$

The point of doing this was two-fold: 
1. It removes the correlation from the model errors and thus the model now meets the $i.i.d.$ assumptions. 
2. It removes error variance associated with the differences *between* the subjects, allowing any inferential tests to only use the variance associated with measurements *within* a subject.
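
To make the first point concrete, it helps to write out what the partition implies for two measurements from the *same* subject. Assuming, as is standard for this model, that $S_{i}$ and $\eta_{ijk}$ are independent of each other and that the $\eta_{ijk}$ are mutually independent, then for two different time points $j \neq j'$

$$
\begin{alignat*}{1}
    \text{Cov}(y_{ijk}, y_{ij'k})  &= \text{Cov}(S_{i} + \eta_{ijk},\; S_{i} + \eta_{ij'k}) = \text{Var}(S_{i}) = \sigma^{2}_{b} \\
    \text{Corr}(y_{ijk}, y_{ij'k}) &= \frac{\sigma^{2}_{b}}{\sigma^{2}_{b} + \sigma^{2}_{w}}
\end{alignat*}
$$

So all of the correlation between repeated measurements is carried by the shared subject term $S_{i}$, and what is left over ($\eta_{ijk}$) is uncorrelated. Note that this correlation is the same for *every* pair of time points, which is the compound symmetry structure.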

Practically, this means there are *two* possible error terms available for our hypothesis tests. So, we need to think about which source of variance is *most appropriate* for each test: $\sigma^{2}_{b}$ or $\sigma^{2}_{w}$. In the one-way ANOVA example, the only effects were repeated measurement effects and the error term was derived from $\sigma^{2}_{w}$. This makes sense from the name alone, as the *within-subject* effects are tested using the *within-subject* variance.

<div class="alert alert-block alert-info"> 
<b>ACTIVITY 2</b> Given the logic above, which error do you think is most suitable for tests on <i>between-subjects</i> terms?
</div>

### Specifying Between-subjects Error in `R`
As is hopefully now clear, the introduction of a between-subjects effect into the model requires us to use the *within-subject* error term for some tests, as well as the *between-subjects* error term for other tests. As we established in the lesson, we can use `aov()` to express how we want the arithmetic of the ANOVA table to be organised. Again, this is a limitation of the repeated measures ANOVA because this requires some degree of manual intervention to state which terms we want treated as *error* and which we want treated as effects of interest. This can get complex very quickly and it is up to the analyst to indicate which terms are which. 

For this example, the specification is fairly simple because we just denote the subject term as error and `aov()` will automatically arrange the tests for us. This is shown below

In [3]:
anxiety.aov <- aov(score ~ group*time + Error(id), data=anxiety.long)
summary(anxiety.aov)


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)  
group      2  61.99  30.996   4.352 0.0192 *
Residuals 42 299.15   7.123                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)    
time        2  66.58   33.29   394.9 <2e-16 ***
group:time  4  37.15    9.29   110.2 <2e-16 ***
Residuals  84   7.08    0.08                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As we can see, the effect of `group` is now associated with `Error: id`, which represents the estimate of $\sigma^{2}_{b}$. The effect of `time` remains associated with `Error: Within`, which represents the estimate of $\sigma^{2}_{w}$. Notice as well that the interaction term `group:time` also uses $\sigma^{2}_{w}$.

<div class="alert alert-block alert-info"> 
<b>ACTIVITY 3</b> Can you work out why the interaction term is associated with the <i>within-subject</i> error, rather than the <i>between-subjects</i> error? <b>HINT</b> This is quite tricky, but think about the <i>definition</i> of an interaction as well as what happens to the variance of the difference between two correlated random variables.
</div>

We can also calculate the above using `ezANOVA()`, without having to explicitly specify the error terms. We show this below, where we have just printed the ANOVA table for simplicity.

In [4]:
library('ez')
anxiety.ez <- ezANOVA(data=anxiety.long, dv=score, wid=id, within=time, between=group)
print(anxiety.ez$ANOVA) # just the ANOVA table w/o sphericity info

      Effect DFn DFd          F            p p<.05       ges
2      group   2  42   4.351811 1.916093e-02     * 0.1683558
3       time   2  84 394.909490 1.905584e-43     * 0.1785886
4 group:time   4  84 110.187610 1.384653e-32     * 0.1081997


<div class="alert alert-block alert-info"> 
<b>ACTIVITY 4</b> Check the outputs from <code>aov()</code> and <code>ezANOVA()</code> to see the similarities and differences. What aspects is <code>ezANOVA()</code> <i>hiding</i> from you? If you were told to just use <code>ezANOVA()</code>, what aspects of the repeated measures ANOVA would not be immediately obvious?
</div>

### Getting the Errors Wrong
Although both the above approaches are reasonably simple to specify, it is worth considering the implications of getting this *wrong*. To see this, let us specify this model using `lm()` and then look at the traditional ANOVA table.

In [5]:
anxiety.wrong.lm <- lm(score ~ group*time + id, data=anxiety.long)
anova(anxiety.wrong.lm)

Analysis of Variance Table

Response: score
           Df  Sum Sq Mean Sq F value    Pr(>F)    
group       2  61.992  30.996 367.701 < 2.2e-16 ***
time        2  66.579  33.289 394.909 < 2.2e-16 ***
id         42 299.146   7.123  84.494 < 2.2e-16 ***
group:time  4  37.154   9.288 110.188 < 2.2e-16 ***
Residuals  84   7.081   0.084                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Without any further information, both `lm()` and `anova()` treat every term in the formula as part of the *mean function* and thus assume we are interested in each term for the purpose of inference. As we saw earlier, we can just ignore the tests on `id`. Its inclusion only serves to partition $\sigma^{2}_{b}$ out from $\sigma^{2}$. However, this does mean that the `Residuals` term in the ANOVA table only represents $\sigma^{2}_{w}$.

<div class="alert alert-block alert-info"> 
<b>ACTIVITY 5</b> Compare this table to the one from either <code>ezANOVA()</code> or <code>aov()</code>. Look specifically at the <i>F</i>-values and ignore the test on <code>id</code>. What are the biggest discrepancies you can see? Can you explain <i>why</i> before moving on?
</div>

The use of $\sigma^{2}_{w}$ is fine for both the test on `time` and `group:time`, which are no different to the tests produced by `aov()` and `ezANOVA()`. However, the crucial difference comes from the test of `group`. Notice that the *correct* test has $F = 4.35$, whereas the test above has $F = 367.70$. This is approximately 85 times *larger* than it should be! Such a huge discrepancy comes from the fact that the *incorrect* test is using $\sigma^{2}_{w}$ as the denominator, which is *much smaller* than $\sigma^{2}_{b}$. This makes the test *far too liberal*. 
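
The size of this discrepancy can be reproduced by hand from the sums of squares and degrees of freedom printed in the `lm()` table. The sketch below is only arithmetic on those printed (rounded) values, so it agrees with the tables to around two or three decimal places. The variable names are ours.

```r
# sums of squares and degrees of freedom copied from the anova() table
ss <- c(group = 61.992, id = 299.146, within = 7.081)
df <- c(group = 2,      id = 42,      within = 84)

ms <- ss / df                                 # mean squares

F.correct <- ms[['group']] / ms[['id']]       # between-subjects error as denominator
F.wrong   <- ms[['group']] / ms[['within']]   # within-subject error as denominator

print(round(c(correct = F.correct, wrong = F.wrong), 2))

# the inflation factor is just the ratio of the two error mean squares
print(F.wrong / F.correct)
print(ms[['id']] / ms[['within']])
```

The inflation factor is the ratio of the two error mean squares, which is also the $F$-value that `anova()` reports for `id` (84.494).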

Although both versions of this test are still "significant", it is not hard to imagine a situation where you could get a significant effect of `group` simply by specifying the wrong error. Getting this wrong could be disastrous because we might falsely claim that the between-subjects factor has an effect when it does not. Considering that these factors may correspond to something like patients vs controls, concluding a false difference could have serious implications. You might think this would be well understood, but only relatively recently was it brought up as an issue for fMRI data analysis as if it were some sudden new discovery (see poster 763 on page 67 of the following [Abstracts Booklet](https://www.humanbrainmapping.org/files/2011MeetingFiles/HBM2011AbstractBook.pdf)). This actually shows how *little* the way the repeated measures ANOVA works is understood by most researchers.

We could also get this "wrong" by not partitioning the error at all and instead working with a *pooled* error of $\sigma^{2}_{b} + \sigma^{2}_{w}$ for every test. An example of this is shown below

In [6]:
anxiety.pooled.lm <- lm(score ~ group*time, data=anxiety.long)
anova(anxiety.pooled.lm)

Analysis of Variance Table

Response: score
            Df  Sum Sq Mean Sq F value    Pr(>F)    
group        2  61.992  30.996 12.7536 9.038e-06 ***
time         2  66.579  33.289 13.6973 4.143e-06 ***
group:time   4  37.154   9.288  3.8218  0.005753 ** 
Residuals  126 306.227   2.430                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

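Notice that, by not acknowledging or accommodating the correlation or the different variance sources, the within-subject tests have become *substantially* weaker: `time` goes from $F = 394.9 \rightarrow 13.70$ and `group:time` from $F = 110.2 \rightarrow 3.82$. The test of `group`, by contrast, becomes more liberal ($F = 4.35 \rightarrow 12.75$), because the pooled mean square (2.43) is smaller than the between-subjects mean square (7.12). So, by not partitioning the error at all, we kill the power advantage of having a repeated measures experiment in the first place.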
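
To see that the pooled error really is the two partitions added back together, we can combine the `id` and `Residuals` rows of the partitioned table by hand. As before, this is a sketch using only the printed values, with variable names of our own choosing.

```r
# values copied from the two anova() tables
ss.id     <- 299.146; df.id     <- 42   # between-subjects variation
ss.within <- 7.081;   df.within <- 84   # within-subject residual

ss.pooled <- ss.id + ss.within          # pooled residual sum of squares
df.pooled <- df.id + df.within          # pooled residual degrees of freedom
ms.pooled <- ss.pooled / df.pooled      # pooled error mean square

ms.time <- 66.579 / 2                   # mean square for time

print(c(ss.pooled = ss.pooled, df.pooled = df.pooled, ms.pooled = ms.pooled))
print(c(F.time = ms.time / ms.pooled))
```

The pooled sum of squares and degrees of freedom match the `Residuals` row of the pooled model (306.227 on 126), and dividing the mean square for `time` by the pooled mean square recovers the much weaker test.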

## Adding More Within-subject Factors
If all of the above was not bad enough, things get even trickier when we add *more* within-subject factors. We will not dwell on this too much because, as should be clear by now, we do not condone using the repeated measures ANOVA at all. However, this will provide the final and clearest demonstration of *why* we want a much better framework for these types of data. The complexity demonstrated below is there to help you understand why you really do *not* want to do this. This is one of the clearest cases where the automation provided by software such as SPSS (and, to an extent, `ezANOVA`) actively *hides* so much of the complexity that researchers do not think twice about designing studies that require these methods.

As our example, the `datarium` package contains a dataset called `weightloss` that represents a fully within-subject $2 \times 2 \times 3$ design. There were 12 subjects whose weight was measured under each combination of `diet` (`yes` or `no`) and `exercises` (`yes` or `no`). For each combination of `diet` and `exercises`, the trial lasted 9 weeks, with measurements taken at 3 time-points. As such, every subject has 12 repeated measurements, representing every combination of `diet` and `exercises` across 3 values of `time`. The original dataset is shown below

In [7]:
library('datarium')
data('weightloss')
print(head(weightloss))

# A tibble: 6 × 6
  id    diet  exercises    t1    t2    t3
  <fct> <fct> <fct>     <dbl> <dbl> <dbl>
1 1     no    no         10.4 13.2   11.6
2 2     no    no         11.6 10.7   13.2
3 3     no    no         11.4 11.1   11.4
4 4     no    no         11.1  9.5   11.1
5 5     no    no          9.5  9.73  12.3
6 6     no    no          9.5 12.7   10.4


We again need to convert these data to long format. They are arranged slightly strangely because they are long-format, but per time point. So `time` is wide-formatted, but both `diet` and `exercises` are long-formatted. This half-way house is not actually very useful for *any* type of modelling. The code below will get these data into a clearer long-format across all the variables.

In [8]:
library('reshape2')

# repeats and number of subjects
t <- 12 # 2 * 2 * 3
n <- 12

# reshape wide -> long
weightloss.long <- melt(weightloss,                         # wide data frame
                        id.vars=c('id','diet','exercises'), # what stays fixed?
                        variable.name="time",               # name for the new predictor
                        value.name="weight")                # name for the new outcome

weightloss.long           <- weightloss.long[order(weightloss.long$id),] # order by ID
rownames(weightloss.long) <- seq(1,n*t)                                  # fix row names

weightloss.long$id        <- as.factor(weightloss.long$id)               # id as factor
weightloss.long$diet      <- as.factor(weightloss.long$diet)             # diet as factor
weightloss.long$exercises <- as.factor(weightloss.long$exercises)        # exercises as factor
weightloss.long$time      <- as.factor(weightloss.long$time)             # time as factor

print(head(weightloss.long))

  id diet exercises time weight
1  1   no        no   t1  10.43
2  1   no       yes   t1  11.12
3  1  yes        no   t1  10.20
4  1  yes       yes   t1  10.43
5  1   no        no   t2  13.21
6  1   no       yes   t2  12.51


<div class="alert alert-block alert-info"> 
<b>ACTIVITY 6</b> Can you think about how these data would be arranged if they were <i>purely</i> wide-formatted? If they were, could you convert them cleanly to long-format, or would you require the specification of <i>new</i> variables?
</div>

### Multiple Within-subject Error Terms
In order to understand what we have to do when there are *multiple* within-subject factors, let us start by specifying the same partitioned error model we used above with `aov()`.

In [9]:
weightloss.aov <- aov(weight ~ diet*exercises*time + Error(id), data=weightloss.long)
summary(weightloss.aov)


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 11   27.1   2.463               

Error: Within
                     Df Sum Sq Mean Sq F value   Pr(>F)    
diet                  1   5.11    5.11   4.069   0.0459 *  
exercises             1  71.16   71.16  56.647 1.03e-11 ***
time                  2 211.39  105.69  84.134  < 2e-16 ***
diet:exercises        1  33.40   33.40  26.586 9.96e-07 ***
diet:time             2   2.42    1.21   0.963   0.3846    
exercises:time        2  67.43   33.72  26.839 2.25e-10 ***
diet:exercises:time   2  30.77   15.39  12.249 1.43e-05 ***
Residuals           121 152.01    1.26                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this scenario, `diet`, `exercises` and `time` are all using the *same* within-subject error term. But is this appropriate? 

If we focus on the test of `diet`, the error term is $\sigma^{2}_{w}$. This *will* contain the variation associated with the different diet conditions within each subject, which is what we want. However, it will *also* contain variation associated with the different levels of `exercises` and the different levels of `time`. If our focus is `diet` alone, it seems inappropriate to include the uncertainty around `exercises` and `time`. 

Similarly, if we are interested in the `diet:time` interaction, using $\sigma^{2}_{w}$ means including variation associated with both `diet` and `time`, as we want, but will *also* include variation associated with `exercises`. This additional variation is of no relevance if we are only interested in the `diet:time` effect. 

What this means is that whenever there are *multiple* within-subject factors, we can *further partition* $\sigma^{2}_{w}$ into *more error terms* that are *more suitable* for each of these effects. The way this is achieved is by specifying *interactions* between the subject effects and the factors associated with each effect. Doing so will create subject-specific errors for just the terms of interest, averaging-over all other terms. So, for instance, `id:diet` produces subject-specific effects for `diet`, averaging-over both `time` and `exercises`. This produces a more suitable error term for testing the effect of `diet` compared to just using $\sigma^{2}_{w}$. 
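
One way to convince yourself of what the `id:diet` error term is doing is to compare it with a paired $t$-test on subject-by-`diet` means. The sketch below uses *simulated* data with the same layout as `weightloss` (the effect sizes, seed and object names are arbitrary choices of ours, not values from the real dataset). Because `diet` has only two levels, the $F$-statistic from the `id:diet` stratum should equal the square of the paired $t$-statistic computed after averaging over `exercises` and `time`.

```r
set.seed(1)  # simulated data, purely for illustration

# 12 subjects measured in every diet x exercises x time cell
sim <- expand.grid(id        = factor(1:12),
                   diet      = factor(c('no', 'yes')),
                   exercises = factor(c('no', 'yes')),
                   time      = factor(c('t1', 't2', 't3')))

subject.effect <- rnorm(12)                          # one random shift per subject
sim$weight <- 10 + subject.effect[as.integer(sim$id)] +
              0.5 * (sim$diet == 'yes') +            # an arbitrary diet effect
              rnorm(nrow(sim))                       # within-subject noise

# F-test for diet using the id:diet error stratum
sim.aov <- aov(weight ~ diet + Error(id + id:diet), data = sim)
F.diet  <- summary(sim.aov)[['Error: id:diet']][[1]][['F value']][1]

# the same test from subject-by-diet means (averaging over exercises and time)
cell.means <- with(sim, tapply(weight, list(id, diet), mean))
t.diet     <- t.test(cell.means[, 'yes'], cell.means[, 'no'], paired = TRUE)

print(c(F = F.diet, t.squared = unname(t.diet$statistic)^2))
```

In other words, the `id:diet` error term reduces the problem to the paired $t$-test from the lesson, applied to each subject's average under each level of `diet`.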

If we continue this logic for all other terms, we get an error structure where we can partition $\sigma^{2}_{w}$ into further terms by taking all possible interactions between `id` and the within-subject factors. Using `aov()`, this becomes

In [10]:
weightloss.aov <- aov(weight ~ diet*exercises*time + 
                               Error(id                + 
                                     id:diet           +
                                     id:exercises      +
                                     id:time           +
                                     id:exercises:time +
                                     id:exercises:diet +
                                     id:time:diet      +
                                     id:time:exercises:diet), 
                        data=weightloss.long)
summary(weightloss.aov)


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 11   27.1   2.463               

Error: id:diet
          Df Sum Sq Mean Sq F value Pr(>F)  
diet       1  5.111   5.111   6.021  0.032 *
Residuals 11  9.337   0.849                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:exercises
          Df Sum Sq Mean Sq F value   Pr(>F)    
exercises  1  71.16   71.16   58.93 9.65e-06 ***
Residuals 11  13.28    1.21                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:time
          Df Sum Sq Mean Sq F value   Pr(>F)    
time       2 211.39  105.69   110.9 3.22e-12 ***
Residuals 22  20.96    0.95                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:exercises:time
               Df Sum Sq Mean Sq F value   Pr(>F)    
exercises:time  2  67.43   33.72   20.83 8.41e-06 ***
Residuals      22  35.62    1.62                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[output truncated: the remaining error strata are not shown]

<div class="alert alert-block alert-info"> 
<b>ACTIVITY 7</b> Take a moment to examine the output above. It looks very complicated, because it <i>is</i> very complicated. This is the logical conclusion of a method where error partitioning is performed within a model that only really has <i>one</i> error term and then requires <i>manual assignment</i> of error terms for different tests. 
<br><br>
If this output is surprising, it might be worth convincing yourself that this is <i>exactly</i> how a repeated measures ANOVA works in other software. For instance, consider the output table from SPSS in the image below. Notice that this is a single ANOVA table, but split into multiple <i>sub-tables</i>, depending upon the error terms. Now you know why.
<br><br>
<img src="images/SPSS-ANOVA.png">
</div>

We are making no excuses for this approach. It is hideous to specify, difficult to understand, hard to interpret and very easy to get wrong. You may see examples where a short-hand is used to construct the error terms, as shown below

In [11]:
aov.alt <- aov(weight ~ diet*exercises*time + Error(id/(diet*exercises*time)), data=weightloss.long)

The syntax `id/(diet*exercises*time)` can be read as a request to include the main effect of the term on the *left* of `/` alongside all possible interactions with the terms on the *right* of `/`. You will often see `Error()` code written like this, but it is not very informative unless you understand what this evaluates to, as shown *explicitly* in the earlier example.
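
If you want to check what the shorthand evaluates to, one option is to ask `R` to expand the formula symbolically using `terms()`. This is a small sketch rather than anything `aov()` requires; no data are needed, and the object name `error.terms` is ours.

```r
# expand the shorthand symbolically; no data are needed for this
error.terms <- attr(terms(~ id/(diet*exercises*time)), 'term.labels')

print(error.terms)          # the individual error terms
print(length(error.terms))  # how many error terms there are
```

This should list `id` plus its interaction with every main effect and interaction among the three within-subject factors, giving the same eight terms that were typed out by hand earlier. The order of the factors within each label may differ from the hand-written version, but `id:diet:time` and `id:time:diet` denote the same term.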

In general, it is *not recommended* to use `aov()` like this. Not only is it *difficult* and prone to *mistakes*, but the output is *unwieldy*. A better approach, if you *must* use a repeated measures ANOVA, is the `ezANOVA()` function

In [12]:
library('ez')
weightloss.ez <- ezANOVA(data=weightloss.long, dv=weight, wid=id, within=.(diet,exercises,time))
print(weightloss.ez$ANOVA)

               Effect DFn DFd          F            p p<.05        ges
2                diet   1  11   6.021440 3.202562e-02     * 0.02774675
3           exercises   1  11  58.928078 9.650954e-06     * 0.28434954
4                time   2  22 110.941583 3.218470e-12     * 0.54133853
5      diet:exercises   1  11  75.356051 2.980284e-06     * 0.15716889
6           diet:time   2  22   0.602562 5.561945e-01       0.01332945
7      exercises:time   2  22  20.825889 8.408790e-06     * 0.27352201
8 diet:exercises:time   2  22  14.246076 1.074451e-04     * 0.14663048


The output will match the result of `aov()`, but in a much nicer format, with the additional advantage of the sphericity corrections. 

In [13]:
print(weightloss.ez$`Sphericity Corrections`)

               Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF]
4                time 0.9836155 4.732515e-12         * 1.1960214 3.218470e-12
6           diet:time 0.6893303 5.008306e-01           0.7558161 5.144265e-01
7      exercises:time 0.7596029 7.470601e-05         * 0.8559657 3.105108e-05
8 diet:exercises:time 0.9605626 1.395812e-04         * 1.1594775 1.074451e-04
  p[HF]<.05
4         *
6          
7         *
8         *
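
To see what these corrections do, we can reproduce one of the corrected $p$-values by hand. The sketch below assumes the usual Greenhouse-Geisser recipe, where both the numerator and denominator degrees of freedom are multiplied by the estimated $\epsilon$ (`GGe`) before looking up the $F$-distribution. The values are copied from the `diet:time` rows of the tables above, and the variable names are ours.

```r
# values copied from the ezANOVA() output for diet:time
F.val <- 0.602562
DFn   <- 2
DFd   <- 22
GGe   <- 0.6893303

# uncorrected p-value, then the version with both df scaled by epsilon
p.uncorrected <- pf(F.val, DFn, DFd, lower.tail = FALSE)
p.GG          <- pf(F.val, GGe * DFn, GGe * DFd, lower.tail = FALSE)

print(c(uncorrected = p.uncorrected, GG = p.GG))
```

Notice that the $F$-statistic itself is untouched; only the reference distribution changes. Notice also that, for `time`, the Huynh-Feldt estimate is greater than 1 and `p[HF]` is identical to the uncorrected $p$-value, which suggests the estimate is capped at 1 when the correction is applied.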


However, this approach does hide the fundamental difficulty with assigning tests to different error terms and thus hides much of the complexity and the disadvantages of the repeated measures ANOVA framework. We also have the problem that, by going down this route, we have effectively thrown away the linear model framework we have been working so hard to build. We end up with an ANOVA table and nothing else. If this is all you want, then so be it. However, there are better approaches to this situation, as we will learn in the coming weeks.

## Why Bother with the Repeated Measures ANOVA?
Everything we have discussed in this workshop has been an exercise in telling you why you really *do not want to use a repeated measures ANOVA*. All the unnecessary fiddling with different error terms and different tests is a complication that we could simply do without. In the past, this was the *only* way to deal with repeated measures, but that is no longer the case. Ultimately, the concepts of multiple error terms and multiple sources of error variance are perfectly sound. Indeed, we will see that this is the basis of mixed-effects models. However, getting this to work within the linear models framework, and within the traditional ANOVA framework, is fraught with difficulties. The repeated measures ANOVA is one solution to this, but is really an *ad hoc* workaround for an analysis that is ill-suited for this framework. We can use software to hide much of this complexity, but that does not mean it goes away.

Even if we are comfortable generating the correct output using `R`, particularly with access to something like `ezANOVA()`, we are still left with a method that has a number of meaningful restrictions. Perhaps most concerning is the assumption of *compound symmetry*, because this will always be dubious in real-world data. Yes, we can use some sort of correction for this, but this correction only applies to the inferential tests (not the model) and remains a further *ad hoc* adjustment to make the framework *approximately* correct.

Taking all this together, we can conclude that the repeated measures ANOVA is tricky to understand, tricky to use correctly and largely inflexible. It is perhaps no wonder that statisticians abandoned this method decades ago. Unfortunately, you are likely to come across people still using the repeated measures ANOVA to this day. You may even work for some of them who will *insist* that you use a repeated measures ANOVA to analyse their data. In those situations, it is useful to (a) motivate the need for something better and (b) understand how to get the ANOVA results in `R`. So, we do not condone the use of the repeated measures ANOVA, but we understand its place in psychology and also understand that there are times where you may need to see what the repeated measures ANOVA says, even if you do not wish to use it. Over the next few weeks, we will show you how to replace this framework with something much better in the form of *mixed-effects models*.