# Higher-order Repeated Measures ANOVA
In the previous part of this lesson, we examined the simplest case of the repeated measures ANOVA. Despite the simplicity of the design, we saw how this analysis had several complications around the correct partition of the error, as well as the assumptions made about the covariance structure. These alone were enough to suggest that the repeated measures ANOVA framework was problematic to apply in practice. Yet, there are even more complex situations where this framework can be applied. In this final part of the lesson, we will see how the repeated measures ANOVA is used in situations where there are additional *between-subjects* factors, as well as multiple *within-subject* factors. This is not to condone the use of the repeated measures ANOVA in these situations. Rather, it is to help you understand (a) how the repeated measures ANOVA generalises and (b) why an approach such as mixed-effects models will provide a better alternative.

## Adding Between-subjects Factors
The first additional complexity we may come across is when we have a *between-subjects* factor alongside the repeated measurements. For example, the `datarium` package contains the dataset `anxiety`. Here, repeated measurements of anxiety have been taken at 3 different time points. The 45 subjects are split between 3 different exercise regimes and the experimental question concerns the effects of exercise and time on anxiety. So, `time` is the repeated measurement and `group` is the between-subjects factor. This is effectively a $3 \times 3$ ANOVA, as illustrated in the table below.

|             | Group: Low | Group: Moderate | Group: High | 
|-------------|------------|-----------------|-------------|
| **Time: 1** | $\mu_{11}$ | $\mu_{12}$      | $\mu_{13}$  |
| **Time: 2** | $\mu_{21}$ | $\mu_{22}$      | $\mu_{23}$  |
| **Time: 3** | $\mu_{31}$ | $\mu_{32}$      | $\mu_{33}$  |

As such, our interest falls on the main effect of `group`, the main effect of `time` and the `group:time` interaction. Based on what we have covered so far, we can easily apply the following two-way ANOVA model with partitioned errors

$$
\begin{alignat*}{1}
    y_{ijk}    &=    \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + S_{i} + \eta_{ijk} \\
    S_{i}      &\sim \mathcal{N}(0,\sigma^{2}_{b}) \\ 
    \eta_{ijk} &\sim \mathcal{N}(0,\sigma^{2}_{w})
\end{alignat*}
$$

Here we have added a term for the *between-subject* effect (denoted $\beta_{k}$), as well as the *between* $\times$ *within* interaction (denoted $(\alpha\beta)_{jk}$). So now $i$ indexes the subject ($i = 1,\dots,45$), $j$ indexes the repeated measurements ($j = 1,\dots,3$) and $k$ indexes the groups ($k = 1,\dots,3$).

We can see this dataset below in its original form

In [1]:
library('datarium')
data('anxiety')
print(head(anxiety))

  id group   t1   t2   t3
1  1  grp1 14.1 14.4 14.1
2  2  grp1 14.5 14.6 14.3
3  3  grp1 15.7 15.2 14.9
4  4  grp1 16.0 15.5 15.3
5  5  grp1 16.5 15.8 15.7
6  6  grp1 16.9 16.5 16.2


and then reworked into long-format for univariate modelling

In [2]:
library('reshape2')

# repeats and number of subjects
t <- 3
n <- 45

# reshape wide -> long
anxiety.long <- melt(anxiety,                 # wide data frame
                     id.vars=c('id','group'), # what stays fixed?
                     variable.name='time',    # name for the new predictor
                     value.name='anxiety')    # name for the new outcome

anxiety.long           <- anxiety.long[order(anxiety.long$id),]  # order by ID
rownames(anxiety.long) <- seq(1,n*t)                             # fix row names
anxiety.long$id        <- as.factor(anxiety.long$id)             # id as factor

In [3]:
print(head(anxiety.long))

  id group time anxiety
1  1  grp1   t1    14.1
2  1  grp1   t2    14.4
3  1  grp1   t3    14.1
4  2  grp1   t1    14.5
5  2  grp1   t2    14.6
6  2  grp1   t3    14.3


So far, nothing has changed from what we have seen previously. However, we now have to think more carefully about our possible error terms and which is most sensible to use as the denominator for each test. 

### The Between-subjects Error Term
Recall from the previous part of this lesson that a repeated measures ANOVA is effectively a linear model with *partitioned errors*. We stated earlier that by splitting the errors into $\epsilon_{ijk} = S_{i} + \eta_{ijk}$ we are effectively specifying a model with *two* variance terms. So, the variance function becomes

$$
\text{Var}(y_{ijk}) = \sigma^{2} = \sigma^{2}_{b} + \sigma^{2}_{w}.
$$

The point of doing this was two-fold. Firstly, it *removes* the correlation from the model errors and thus the model now meets the $i.i.d.$ assumptions. Secondly, it removes error variance associated with the differences *between* the subjects, allowing any inferential tests to only use the variance associated with measurements *within* a subject. This is obviously the most suitable error to use for inference on the repeated measures because it captures the random fluctuations in measurements *within* an individual. 
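
To make the first of these points concrete, the covariance implied by the partitioned errors can be written out. This is only a short sketch, and it assumes (as the model above does) that $S_{i}$ and $\eta_{ijk}$ are independent of each other. For two different measurements $j \neq j'$ from the same subject

$$
\begin{alignat*}{1}
    \text{Cov}(y_{ijk}, y_{ij'k})  &= \text{Cov}(S_{i} + \eta_{ijk},\; S_{i} + \eta_{ij'k}) = \text{Var}(S_{i}) = \sigma^{2}_{b} \\
    \text{Corr}(y_{ijk}, y_{ij'k}) &= \frac{\sigma^{2}_{b}}{\sigma^{2}_{b} + \sigma^{2}_{w}}
\end{alignat*}
$$

So the shared $S_{i}$ is what induces the correlation between measurements from the same subject. Once it is modelled explicitly, the remaining errors $\eta_{ijk}$ are uncorrelated.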

Conceptually, the repeated measurements from subject $i$ can be considered multiple draws from a distribution with a variance $\sigma^{2}_{w}$ and a mean $E(y_{ijk}) = \mu_{jk} + S_{i}$. The distribution of each subject is therefore conceptualised as having the *same* variance but a *different* mean, unique to each subject. These means are then further conceptualised as random draws from a larger distribution of *different subjects* with a variance $\sigma^{2}_{b}$. Thus, the variation *between* subjects is represented by this larger distribution, whereas the variation *within* each subject is represented by the smaller individual distributions. This is illustrated in {numref}`mixed-sampling-fig` for two separate group-level distributions.

```{figure} ./images/mixed-measures-sampling.png
---
width: 600px
name: mixed-sampling-fig
---
An illustration of the sampling model that underlies a repeated measures design with 3 measurements per-subject, as taken from 2 independent groups. Two example subjects are shown for each group. The most important element here is seeing the different variance sources ($\sigma^{2}_{b}$ and $\sigma^{2}_{w}$) and how these correspond to inference for either the between-subjects or within-subject effects.
```
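
The sampling model can also be made concrete with a small simulation. The following is a minimal sketch in base `R` using made-up values: `sigma.b`, `sigma.w`, the number of subjects and the decision to set every cell mean to 0 are all assumptions chosen for illustration, not estimates from `anxiety`. It draws one $S_{i}$ per subject, adds independent within-subject noise to each measurement and then checks that the total variance and the correlation between two measurements behave as the model implies.

```r
set.seed(1)

n       <- 2000  # hypothetical number of subjects (large, so estimates are stable)
t       <- 3     # repeated measurements per subject
sigma.b <- 2     # assumed between-subjects SD
sigma.w <- 0.5   # assumed within-subject SD

S   <- rnorm(n, mean=0, sd=sigma.b)                    # one S_i per subject
eta <- matrix(rnorm(n*t, mean=0, sd=sigma.w), nrow=n)  # noise for every measurement
y   <- S + eta                                         # row i holds subject i (cell means = 0)

total.var <- var(as.vector(y))  # compare with sigma.b^2 + sigma.w^2 = 4.25
rho       <- cor(y[,1], y[,2])  # compare with sigma.b^2 / (sigma.b^2 + sigma.w^2)
print(c(total.var=total.var, rho=rho))
```

With these assumed values, most of the total variance comes from differences *between* subjects, which is why measurements from the same subject are so strongly correlated.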

Importantly, we need to think about which measure of *uncertainty* is most suitable for inference about the different elements of this sampling model. For the repeated measurements, we can see that these all come from distributions with variance $\sigma^{2}_{w}$. Thus, our uncertainty around our estimates of these effects is tied to how much we expect them to vary for each subject. This means that $\sigma^{2}_{w}$ is the most suitable error term. Now, in terms of looking at the *group* effects, we can see that these come from distributions with variance $\sigma^{2}_{b}$. These are distributions of the *subject means*, capturing how much the subjects differ from each other. This is parameterised in terms of the subject-specific errors given by $S_{i}$, which capture the magnitude of the discrepancy between each subject and the group average. Thus, the variance of the subject means is given by $\sigma^{2}_{b}$. Any inference about subjects *as a whole* needs to take the variability of the subjects into account. Thus, for inference about the group effects, $\sigma^{2}_{b}$ is the most suitable error term. So, taking this all together, we now have a model where *different effects* require *different error terms*[^submodel-foot].
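
One way to convince yourself that the group effect really is judged against the variability of the *subject means* is to average over the repeated measurements and run an ordinary one-way ANOVA on those means. The sketch below does this on simulated, balanced data (the group sizes, variances and absence of any true effects are assumptions made purely for illustration) and compares the result with the between-subjects stratum of a partitioned-error `aov()` fit.

```r
set.seed(2)

n.per <- 15          # hypothetical subjects per group
g     <- 3           # number of groups
t     <- 3           # repeated measurements per subject
n     <- n.per * g   # total subjects

# balanced long-format data: time varies fastest within each subject
sim       <- expand.grid(time=factor(paste0('t', 1:t)), id=factor(1:n))
sim$group <- factor(paste0('grp', (as.integer(sim$id) - 1) %/% n.per + 1))
S         <- rnorm(n, sd=2)                                # assumed sigma.b = 2
sim$y     <- S[as.integer(sim$id)] + rnorm(n*t, sd=0.5)    # assumed sigma.w = 0.5

# (1) partitioned errors: group is tested in the 'id' stratum
fit.rm  <- aov(y ~ group*time + Error(id), data=sim)
F.strat <- summary(fit.rm)[['Error: id']][[1]][['F value']][1]

# (2) average over time, then a one-way ANOVA on the subject means
means   <- aggregate(y ~ id + group, data=sim, FUN=mean)
F.means <- anova(lm(y ~ group, data=means))[['F value']][1]

print(c(stratum=F.strat, subject.means=F.means))
```

In a balanced design such as this one, the two $F$ values should agree, because both the numerator and denominator of the stratum test are simply the subject-means sums of squares multiplied by the number of repeats.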

### Specifying Between-subjects Error in `R`
As we established earlier, we can use `aov()` to express how we want the arithmetic of the ANOVA table to be organised. Again, this is a serious limitation of the repeated measures ANOVA because this requires some degree of manual intervention to state which terms we want treated as *error* and which we want treated as effects of interest. This can get complex very quickly and it is up to the analyst to indicate which terms are which[^ems-foot]. For this example, the specification is fairly simple because we just denote the subject term as error and `aov()` will automatically arrange the tests for us

In [4]:
anxiety.aov <- aov(anxiety ~ group*time + Error(id), data=anxiety.long)
summary(anxiety.aov)


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)  
group      2  61.99  30.996   4.352 0.0192 *
Residuals 42 299.15   7.123                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)    
time        2  66.58   33.29   394.9 <2e-16 ***
group:time  4  37.15    9.29   110.2 <2e-16 ***
Residuals  84   7.08    0.08                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As we can see, the effect of `group` is now associated with `Error: id`, which represents the estimate of $\sigma^{2}_{b}$. The effect of `time` remains associated with `Error: Within`, which represents the estimate of $\sigma^{2}_{w}$. Notice as well that the interaction term `group:time` also uses $\sigma^{2}_{w}$. This may seem puzzling, given that this contains a comparison *across* the groups. However, remember that when calculating the variance of the difference between two random variables, twice the covariance is subtracted. Because $\sigma^{2}_{b}$ also acts as the covariance, it cancels out when comparing the repeated measurements and thus cancels out when calculating the interaction.
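
This cancellation can be written out as a short worked step. It is a sketch under the model given earlier, where two measurements $j \neq j'$ from the same subject each have variance $\sigma^{2}_{b} + \sigma^{2}_{w}$ and share covariance $\sigma^{2}_{b}$

$$
\begin{alignat*}{1}
    \text{Var}(y_{ijk} - y_{ij'k}) &= \text{Var}(y_{ijk}) + \text{Var}(y_{ij'k}) - 2\,\text{Cov}(y_{ijk}, y_{ij'k}) \\
                                   &= 2(\sigma^{2}_{b} + \sigma^{2}_{w}) - 2\sigma^{2}_{b} \\
                                   &= 2\sigma^{2}_{w}
\end{alignat*}
$$

Equivalently, $S_{i}$ appears in both $y_{ijk}$ and $y_{ij'k}$ and so drops out of their difference. The `group:time` interaction compares these within-subject differences across groups, so it inherits the same property. By contrast, the mean of a subject's $t$ measurements retains $S_{i}$ and has variance $\sigma^{2}_{b} + \sigma^{2}_{w}/t$, which is why the variability between subjects drives the error for `group`.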

We can also calculate the above using `ezANOVA()`, without having to explicitly specify the error terms, as shown below

In [5]:
library('ez')
anxiety.ez <- ezANOVA(data=anxiety.long, dv=anxiety, wid=id, within=time, between=group)
print(anxiety.ez)

$ANOVA
      Effect DFn DFd          F            p p<.05       ges
2      group   2  42   4.351811 1.916093e-02     * 0.1683558
3       time   2  84 394.909490 1.905584e-43     * 0.1785886
4 group:time   4  84 110.187610 1.384653e-32     * 0.1081997

$`Mauchly's Test for Sphericity`
      Effect         W          p p<.05
3       time 0.8836439 0.07919252      
4 group:time 0.8836439 0.07919252      

$`Sphericity Corrections`
      Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
3       time 0.8957715 3.484600e-39         * 0.9330916 1.037156e-40         *
4 group:time 0.8957715 1.966104e-29         * 0.9330916 1.461019e-30         *
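
As a reminder of what the `Sphericity Corrections` block is doing, the corrected $p$-values come from the *same* $F$ statistic, but referred to an $F$-distribution whose degrees of freedom have both been multiplied by the estimated $\epsilon$. A minimal sketch for the Greenhouse-Geisser correction of `time`, using only numbers copied from the output above, is

```r
# Values copied from the ezANOVA() output above for the effect of time
F.time <- 394.909490
df.num <- 2
df.den <- 84
gg.eps <- 0.8957715   # Greenhouse-Geisser epsilon (the GGe column)

p.raw <- pf(F.time, df.num, df.den, lower.tail=FALSE)                # uncorrected
p.gg  <- pf(F.time, gg.eps*df.num, gg.eps*df.den, lower.tail=FALSE)  # corrected

print(c(uncorrected=p.raw, GG=p.gg))
```

Up to rounding, these should correspond to the `p` and `p[GG]` columns for `time`. Note that only the reference distribution changes, not the model or the $F$ statistic itself.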




Although both the above approaches are reasonably simple to specify, it is worth considering the implications of getting this *wrong*. To see this, let us specify this model using `lm()` and then look at the traditional ANOVA table.

In [6]:
anxiety.wrong.lm <- lm(anxiety ~ group*time + id, data=anxiety.long)
anova(anxiety.wrong.lm)

Analysis of Variance Table

Response: anxiety
           Df  Sum Sq Mean Sq F value    Pr(>F)    
group       2  61.992  30.996 367.701 < 2.2e-16 ***
time        2  66.579  33.289 394.909 < 2.2e-16 ***
id         42 299.146   7.123  84.494 < 2.2e-16 ***
group:time  4  37.154   9.288 110.188 < 2.2e-16 ***
Residuals  84   7.081   0.084                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Without any further information, both `lm()` and `anova()` treat every term in the formula as part of the *mean function* and thus assume we are interested in each term for the purpose of inference. As we saw earlier, we can just ignore the tests on `id`. Its inclusion has partitioned $\sigma^{2}_{b}$ out from $\sigma^{2}$, meaning that the `Residuals` term in the ANOVA table only represents $\sigma^{2}_{w}$. This is fine for both the test on `time` and `group:time`, which are no different to the tests produced by `aov()`. However, the crucial difference comes from the test of `group`. Notice that the *correct* test has $F = 4.35$, whereas the test above has $F = 367.70$. This is approximately 84.5 times *larger*, which is the ratio of the two error mean squares (the same value reported as the $F$ for `id`)! Such a huge discrepancy comes from the fact that the *incorrect* test is using $\sigma^{2}_{w}$ as the denominator, which is *much smaller* than $\sigma^{2}_{b}$. This makes the test *far too liberal*.
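
The arithmetic behind this discrepancy is easy to reproduce by hand. The sketch below uses only sums of squares and degrees of freedom copied from the ANOVA tables above, so the results will differ from the printed tables by rounding error alone.

```r
# Quantities copied from the ANOVA tables above
ms.group  <- 30.996         # mean square for group (numerator of both tests)
ms.id     <- 299.146 / 42   # between-subjects error mean square
ms.within <- 7.081 / 84     # within-subject error mean square

F.correct <- ms.group / ms.id       # roughly 4.35
F.wrong   <- ms.group / ms.within   # roughly 367.7
inflation <- F.wrong / F.correct    # identical to ms.id / ms.within

print(round(c(F.correct=F.correct, F.wrong=F.wrong, inflation=inflation), 2))
```

The numerator never changes. The only thing separating a modest $F$ from an enormous one is the choice of denominator.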

Although both versions of this test are still "significant", it is not hard to imagine a situation where you could get a significant effect of `group` simply by specifying the wrong error. Getting this wrong could be disastrous because we might falsely claim that the between-subjects factor has an effect when it does not. Considering that these factors may correspond to something like patients vs controls, concluding a false difference could have serious implications.

We could also get this "wrong" by not partitioning the error at all and instead working with a *pooled* error of $\sigma^{2}_{b} + \sigma^{2}_{w}$ for every test. An example of this is shown below

In [7]:
anxiety.pooled.lm <- lm(anxiety ~ group*time, data=anxiety.long)
anova(anxiety.pooled.lm)

Analysis of Variance Table

Response: anxiety
            Df  Sum Sq Mean Sq F value    Pr(>F)    
group        2  61.992  30.996 12.7536 9.038e-06 ***
time         2  66.579  33.289 13.6973 4.143e-06 ***
group:time   4  37.154   9.288  3.8218  0.005753 ** 
Residuals  126 306.227   2.430                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

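Notice that, by not acknowledging or accommodating the correlation or different variance sources, the within-subject tests have become *substantially* weaker. So we have `time`: $F = 394.9 \rightarrow 13.70$ and `group:time`: $F = 110.2 \rightarrow 3.82$. At the same time, the test of `group` has become too liberal ($F = 4.35 \rightarrow 12.75$), because the pooled error is smaller than the between-subjects error. So, by not partitioning the error at all, we kill the power advantage of having a repeated measures experiment in the first place.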
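
To see that the pooled error really is just the two partitioned errors added back together, we can combine the residual rows of the `aov()` output by hand. Again, this is only a sketch using numbers copied from the tables above.

```r
# Residual rows copied from the partitioned aov() output
ss.between <- 299.146; df.between <- 42   # Error: id
ss.within  <- 7.081;   df.within  <- 84   # Error: Within

ss.pooled <- ss.between + ss.within   # matches the pooled Residuals Sum Sq
df.pooled <- df.between + df.within   # matches the pooled Residuals Df
ms.pooled <- ss.pooled / df.pooled    # roughly 2.43

print(c(ss=ss.pooled, df=df.pooled, ms=ms.pooled))
```

The pooled mean square sits between the two partitioned mean squares, so it is too large for the within-subject tests and too small for the between-subjects test.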

## Adding More Within-subject Factors
If all of the above was not bad enough, things get even trickier when we add *more* within-subject factors. We will not dwell on this too much because, as should be clear by now, we do not condone using the repeated measures ANOVA at all. However, this will provide the final, and clearest, demonstration of *why* we want a much better framework for this type of data. The complexity demonstrated below is there to help you understand why you really do *not* want to do this. This is one of the clearest cases where the automation provided by software such as SPSS actively *hides* so much of the complexity that researchers do not think twice about designing studies that require these methods.

As our example, the `datarium` package contains a dataset called `weightloss` that represents a fully within-subject $2 \times 2 \times 3$ design. There were 12 subjects whose weight was measured under each combination of `diet` (`yes` or `no`) and `exercises` (`yes` or `no`). For each combination of `diet` and `exercises`, the trial lasted 9 weeks, with measurements taken at 3 time-points. As such, every subject has 12 repeated measurements, representing every combination of `diet` and `exercises` across 3 values of `time`. The original dataset is shown below

In [24]:
library('datarium')
data('weightloss')
print(head(weightloss))

# A tibble: 6 × 6
  id    diet  exercises    t1    t2    t3
  <fct> <fct> <fct>     <dbl> <dbl> <dbl>
1 1     no    no         10.4 13.2   11.6
2 2     no    no         11.6 10.7   13.2
3 3     no    no         11.4 11.1   11.4
4 4     no    no         11.1  9.5   11.1
5 5     no    no          9.5  9.73  12.3
6 6     no    no          9.5 12.7   10.4


and again after conversion to long format

In [28]:
library('reshape2')

# repeats and number of subjects
t <- 12 # 2 * 2 * 3
n <- 12

# reshape wide -> long
weightloss.long <- melt(weightloss,                         # wide data frame
                        id.vars=c('id','diet','exercises'), # what stays fixed?
                        variable.name="time",               # name for the new predictor
                        value.name="weight")                # name for the new outcome

weightloss.long           <- weightloss.long[order(weightloss.long$id),] # order by ID
rownames(weightloss.long) <- seq(1,n*t)                                  # fix row names
weightloss.long$id        <- as.factor(weightloss.long$id)               # id as factor

print(head(weightloss.long))

  id diet exercises time weight
1  1   no        no   t1  10.43
2  1   no       yes   t1  11.12
3  1  yes        no   t1  10.20
4  1  yes       yes   t1  10.43
5  1   no        no   t2  13.21
6  1   no       yes   t2  12.51


### Multiple Within-subject Error Terms
In order to facilitate understanding what we have to do when there are multiple within-subject factors, let us start by specifying the same partitioned error model we used above with `aov()`.

In [9]:
weightloss.aov <- aov(weight ~ diet*exercises*time + Error(id), data=weightloss.long)
summary(weightloss.aov)


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 11   27.1   2.463               

Error: Within
                     Df Sum Sq Mean Sq F value   Pr(>F)    
diet                  1   5.11    5.11   4.069   0.0459 *  
exercises             1  71.16   71.16  56.647 1.03e-11 ***
time                  2 211.39  105.69  84.134  < 2e-16 ***
diet:exercises        1  33.40   33.40  26.586 9.96e-07 ***
diet:time             2   2.42    1.21   0.963   0.3846    
exercises:time        2  67.43   33.72  26.839 2.25e-10 ***
diet:exercises:time   2  30.77   15.39  12.249 1.43e-05 ***
Residuals           121 152.01    1.26                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this scenario, `diet`, `exercises` and `time` are all using the *same* within-subject error term. But is this appropriate? If we focus on `diet`, $\sigma^{2}_{w}$ will contain the variation associated with the different diet conditions within each subject, which is what we want. However, it will *also* contain variation associated with the different levels of `exercises` and the different levels of `time`. If our focus is `diet` alone, it seems inappropriate to include the uncertainty around `exercises` and `time`. Similarly, if we are interested in the `diet:time` interaction, $\sigma^{2}_{w}$ will contain variation associated with both `diet` and `time`, as we want, but it will *also* include variation associated with `exercises`. This additional variation is of no relevance if we are only interested in the `diet:time` effect. What this means is that whenever there are *multiple* within-subject factors, we can *further partition* $\sigma^{2}_{w}$ into more error terms that are more suitable for each of these effects. 

Going back to `diet` as an example, the errors we want are only those associated with the different levels of `diet` for each subject. This means we want to *average-over* all other repeated measurements associated with both `time` and `exercises`. This can be achieved by specifying an *interaction* between the subject effects and the levels of `diet`. This will create subject-specific errors (just like $S_{i}$) for each level of `diet`. The variance from these errors will then reflect within-subject variation associated with `diet`, ignoring both `time` and `exercises`. If we continue this logic for all other terms, we get an error structure where we can partition $\sigma^{2}_{w}$ into further terms by taking all possible interactions between the subject term (`id`) and the within-subject factors. Using `aov()`, this becomes

In [10]:
weightloss.aov <- aov(weight ~ diet*exercises*time + 
                               Error(id                + 
                                     id:diet           +
                                     id:exercises      +
                                     id:time           +
                                     id:exercises:time +
                                     id:exercises:diet +
                                     id:time:diet      +
                                     id:time:exercises:diet), 
                        data=weightloss.long)
summary(weightloss.aov)


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 11   27.1   2.463               

Error: id:diet
          Df Sum Sq Mean Sq F value Pr(>F)  
diet       1  5.111   5.111   6.021  0.032 *
Residuals 11  9.337   0.849                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:exercises
          Df Sum Sq Mean Sq F value   Pr(>F)    
exercises  1  71.16   71.16   58.93 9.65e-06 ***
Residuals 11  13.28    1.21                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:time
          Df Sum Sq Mean Sq F value   Pr(>F)    
time       2 211.39  105.69   110.9 3.22e-12 ***
Residuals 22  20.96    0.95                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:exercises:time
               Df Sum Sq Mean Sq F value   Pr(>F)    
exercises:time  2  67.43   33.72   20.83 8.41e-06 ***
Residuals      22  35.62    1.62                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[... output truncated ...]

which is hideous both to specify and to interpret, given that we now have *8* ANOVA tables to deal with! It would also be very easy to get this wrong by missing a term somewhere. Luckily, there is an easier way to write this using

In [11]:
aov(weight ~ diet*exercises*time + Error(id/(diet*exercises*time)), data=weightloss.long)


Call:
aov(formula = weight ~ diet * exercises * time + Error(id/(diet * 
    exercises * time)), data = weightloss.long)

Grand Mean: 12.68132

Stratum 1: id

Terms:
                Residuals
Sum of Squares   27.09682
Deg. of Freedom        11

Residual standard error: 1.569506

Stratum 2: id:diet

Terms:
                    diet Residuals
Sum of Squares  5.111367  9.337474
Deg. of Freedom        1        11

Residual standard error: 0.9213367
5 out of 6 effects not estimable
Estimated effects are balanced

Stratum 3: id:exercises

Terms:
                exercises Residuals
Sum of Squares   71.16328  13.28392
Deg. of Freedom         1        11

Residual standard error: 1.098922
5 out of 6 effects not estimable
Estimated effects are balanced

Stratum 4: id:time

Terms:
                     time Residuals
Sum of Squares  211.38837  20.95943
Deg. of Freedom         2        22

Residual standard error: 0.9760642
6 out of 8 effects not estimable
Estimated effects may be unbalanced

[... output truncated ...]

where the syntax `id/(diet*exercises*time)` can be read as a request to include the main effect of the term on the *left* of `/` alongside all possible interactions with the terms on the *right* of `/`[^nesting-foot]. Importantly, only the *within-subject* factors appear in the `Error()` syntax because our aim is to further partition $\sigma^{2}_{w}$ and *not* $\sigma^{2}_{b}$.
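
If you want to check what a nested `Error()` specification will expand to *before* fitting anything, one option is to ask `R` for the term labels of the formula. This is a small sketch using the base `terms()` function on the right-hand side only. No data are needed, because the expansion is purely symbolic.

```r
# Expand the nested error formula symbolically (nothing is fitted here)
err.terms <- attr(terms(~ id/(diet*exercises*time)), 'term.labels')

print(err.terms)            # one label per error stratum
n.strata <- length(err.terms)
print(n.strata)
```

Each label corresponds to one `Error:` stratum in the `aov()` output, so this is a quick way to confirm that no term has been missed.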

In general, it is not recommended to use `aov()` like this. Not only is it *difficult* and prone to *mistakes*, but the output becomes unwieldy. A better approach, if you *must* use a repeated measures ANOVA, is the `ezANOVA()` function

In [12]:
library('ez')
weightloss.ez <- ezANOVA(data=weightloss.long, dv=weight, wid=id, within=.(diet,exercises,time))
print(weightloss.ez)

$ANOVA
               Effect DFn DFd          F            p p<.05        ges
2                diet   1  11   6.021440 3.202562e-02     * 0.02774675
3           exercises   1  11  58.928078 9.650954e-06     * 0.28434954
4                time   2  22 110.941583 3.218470e-12     * 0.54133853
5      diet:exercises   1  11  75.356051 2.980284e-06     * 0.15716889
6           diet:time   2  22   0.602562 5.561945e-01       0.01332945
7      exercises:time   2  22  20.825889 8.408790e-06     * 0.27352201
8 diet:exercises:time   2  22  14.246076 1.074451e-04     * 0.14663048

$`Mauchly's Test for Sphericity`
               Effect         W          p p<.05
4                time 0.9833425 0.91944157      
6           diet:time 0.5493166 0.05001654      
7      exercises:time 0.6835227 0.14919857      
8 diet:exercises:time 0.9589434 0.81089547      

$`Sphericity Corrections`
               Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
4                time 0.9836155
[... output truncated ...]

The output will match the result of `aov()`, but in a much nicer format, with the additional advantage of the sphericity corrections. However, this approach does hide the fundamental difficulty with assigning tests to different error terms and thus hides much of the complexity and disadvantages of the repeated measures ANOVA framework. We also have the problem that, by going down this route, we have effectively thrown away the linear model framework we have been working so hard to build. We end up with an ANOVA table and nothing else.
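
One way to peek behind this automation is to reproduce a single test by hand. For a within-subject factor with only two levels, averaging over the other within-subject factors and running a paired $t$-test on the resulting subject means should give $t^{2}$ equal to the $F$ from that factor's own error stratum. The sketch below checks this on simulated, balanced data. The factor names `A` and `B`, the sample size and all variances are hypothetical choices for illustration and are not taken from `weightloss`.

```r
set.seed(3)

n   <- 12   # hypothetical number of subjects
sim <- expand.grid(id=factor(1:n),
                   A=factor(c('no','yes')),        # two-level within-subject factor
                   B=factor(c('t1','t2','t3')))    # second within-subject factor

# made-up outcome: subject effect + subject-by-A effect + effect of A + noise
S     <- rnorm(n, sd=1.5)
SA    <- matrix(rnorm(n*2, sd=0.8), nrow=n)
sim$y <- S[as.integer(sim$id)] +
         SA[cbind(as.integer(sim$id), as.integer(sim$A))] +
         0.5*(sim$A == 'yes') +
         rnorm(nrow(sim), sd=1)

# (1) fully partitioned errors: A is tested against the id:A stratum
fit <- aov(y ~ A*B + Error(id/(A*B)), data=sim)
F.A <- summary(fit)[['Error: id:A']][[1]][['F value']][1]

# (2) average over B for each subject and level of A, then a paired t-test
m   <- tapply(sim$y, list(sim$id, sim$A), mean)    # n x 2 matrix of subject means
t.A <- unname(t.test(m[,'yes'], m[,'no'], paired=TRUE)$statistic)

print(c(F=F.A, t.squared=t.A^2))
```

So each error stratum is really a small model of its own, fitted to data that have been averaged over everything irrelevant to that effect. Software such as `ezANOVA()` does this bookkeeping silently.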

## Why We Should *Not* Use RM ANOVA
Everything we have discussed above has really been an exercise in telling you why you really *do not want to use a repeated measures ANOVA*. All the unnecessary fiddling with error terms and different tests requiring different errors is a complication that we could simply do without. Life is much too short to be concerning ourselves with such things, especially as a better alternative *does* exist. In the past, this was the *only* way to deal with repeated measures, but that is no longer the case. Ultimately, the concept of multiple error terms and multiple sources of error variance is perfectly sound. Indeed, we will see that this is the basis of mixed-effects models. However, getting this to work within the linear models framework, and within the traditional ANOVA framework, is fraught with difficulties. The repeated measures ANOVA is a solution to this, but is really an *ad hoc* workaround for an analysis that is ill-suited for this framework. We can use software to hide much of this complexity, but that does not mean it goes away.

Even if we are OK generating the correct output using `R`, particularly with access to something like `ezANOVA()`, we are still left with a method that has a number of meaningful restrictions. Perhaps most concerning is the assumption of compound symmetry, because this will almost never be true in real-world data. Yes, we can use some sort of correction for this, but this correction only applies to the inferential tests (not the model) and remains a further ad hoc adjustment to make the framework *approximately* correct. This is somewhat unsatisfying, especially as we would much prefer a method where the covariance matrix could just be *estimated* from the data and be allowed to take on any form it likes. This is particularly true when the correlation between repeated measurements is *negative*, because the repeated measures ANOVA simply *does not allow this*. Consider that $\sigma^{2}_{b}$ is treated as the covariance and this is derived from $\sigma^{2} = \sigma^{2}_{b} + \sigma^{2}_{w}$. Thus, the covariance *is* a variance component and variance can *never be negative*. This is a point that very few people understand about the repeated measures ANOVA.

Taking all this together, we can conclude that the repeated measures ANOVA is tricky to understand, tricky to use correctly and largely inflexible. It is perhaps no wonder that statisticians abandoned this method decades ago. Indeed, notice the dates around when the sphericity corrections were published: 1959 and 1976. This gives you a hint about when these approaches were last being developed and studied in mathematical statistics. And yet, this is the method that has persisted in psychology until only relatively recently. Indeed, as alluded to already, you are likely to come across people still using the repeated measures ANOVA to this day. You may even work for some of them who will *insist* that you use a repeated measures ANOVA to analyse their data. In those situations, it is useful to (a) motivate the need for something better and (b) understand how to get the ANOVA results in `R`, if absolutely necessary. So, we do not condone the use of the repeated measures ANOVA, but we understand its place in psychology and also understand that there are times where you may need to see what the repeated measures ANOVA says, even if you do not wish to use it. Over the next few weeks, we will show you how to replace this framework with something much better in the form of *mixed-effects models*.

[^submodel-foot]: An alternative perspective here is that each error term represents a different *sub-model*. So, we can think of specifying *multiple* models, some of which require us to *average-over* certain factors. For instance, if we were to average-over the repeated measurements and then fit a model on the resultant outcome variable, this model would automatically have $\sigma^{2}_{b}$ as its error term. This does make the whole procedure feel a little bit less of a hack, however, it is very impractical to do this, especially when the number of factors and interactions gets larger. You can read more about this approach in [McFarquhar (2019)](https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2019.00352/full).

[^noterr-foot]: Note that this is *not* the errors from the linear model, even though the Greek letter is the same.

[^ems-foot]: There is a more principled way of determining the correct error term via calculation of something called the *expected mean squares* (EMS). However, this is an old topic that is not that relevant given that we are not suggesting you actually use the repeated measures ANOVA. If you are curious, the EMS are discussed at length by [McFarquhar (2019)](https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2019.00352/full) and can be calculated for you automatically by the `R` package `EMSaov`.

[^nesting-foot]: This syntax is used to embody the concept of *nesting*, but that is beyond the scope of this lesson. All you really need to know is that `A/B = A + A:B`. So, specifying `S/A = S + S:A` and `S/(A*B) = S/(A + B + A:B) = S + S:A + S:B + S:A:B`.