# The Repeated Measures ANOVA
In the previous part of this lesson, we examined the paired $t$-test as the most basic method for dealing with repeated measurements. In doing so, we established that taking the *difference* between the pairs was enough to remove correlation and render the data independent, allowing us to specify a simple one-sample test on the mean difference. In order to understand this, we split our usual single error term into two components, one associated with each subject and one associated with any other deviations. We then saw how the term associated with each subject *cancelled* when the pairs were subtracted, implying that this component captured the correlation between the repeated measurements. Furthermore, by viewing these terms in relation to a partition of the error variance, we established that it is the *between-subjects* variance that is removed by subtraction, leaving only the *within-subject* variance. Because this is smaller than the total variance, and typically smaller than the between-subjects variance, this rendered the standard error of the estimated mean difference smaller, thus making the test statistic larger. This not only established *why* the paired $t$-test is different to the independent measures $t$-test, but also why the paired $t$-test is typically *more powerful* and thus desirable, from an inferential perspective.

## Limitations of the Model of *Paired Differences*
Although this *paired differences* framework works well when the data consists of only *two* repeated measurements, we hit a problem if we have *more than two* repeated measurements, *more than one* repeated measures factor or even when we have a mixture of repeated measurements and independent measurements. In many of these cases, we cannot take a simple subtraction and then analyse the resultant differences. So, while the paired $t$-test is fine in simple cases, it does not generalise very easily. 

In order to allow the analysis of more complex experimental situations, we need to abandon the idea of making the data independent through subtraction. Instead, we need to accommodate the correlation *directly* within the model. The general theme of this section of the unit is *mixed-effects models*, which represent the most modern solution to this problem. However, in this part of the lesson we will discuss an older solution in the form of the *repeated measures ANOVA*. This is really a stepping-stone towards mixed-effects models, rather than a recommendation, as the repeated measures ANOVA is rather limited in practice. Nevertheless, this method is still used widely in psychological research and so is still worth understanding.

## The Model of *Partitioned Errors*
As mentioned above, because *subtraction* only works in a number of limited cases, we need to abandon it as our general solution to dealing with dependent data. Instead, we need to work with data that we *know* is correlated. Unfortunately, as we have already established, the linear model framework assumes that the errors of the model are $i.i.d.$, which will not be true under repeated measurements. However, we have already seen a possible solution to this problem, as the removal of the subject terms $S_{i}$ theoretically *removes* the correlation and renders the errors independent. Thus, if we were to *include* the term $S_{i}$ in the model, the errors would meet the $i.i.d.$ criteria. Furthermore, the single error variance assumed by the linear model would be the *within-subject* variance and thus would be suitable for inference on the repeated measurements. Putting all this together gives us the model

$$
\begin{alignat*}{1}
    y_{ij}    &=                      \mu + \alpha_{j} + S_{i} + \eta_{ij} \\
    \eta_{ij} &\overset{i.i.d.}{\sim} \mathcal{N}(0,\sigma^{2}_{w})
\end{alignat*}
$$

which is the basis for the repeated measures ANOVA.

Now, in the specification above, we have treated $S_{i}$ like any other factor in an ANOVA model. However, this is not really correct. As discussed in the previous part of this lesson, $S_{i}$ comes from *splitting* the overall error term $\epsilon_{ij}$. So, in theory, rather than representing population constants (like $\mu$ and $\alpha_{j}$), the $S_{i}$ are a *random variable*. This makes sense because the subjects represent a *random sample*, rather than a fixed quantity. If we were to run the experiment again, $\mu$ and $\alpha_{j}$ would be the *same*, but the $S_{i}$ would be *different*. Rather than $S_{i}$ representing the $i$th level of an $n$-dimensional experimental factor, it is the $i$th *random deviation*. From this perspective, $S_{i}$ is clearly an *additional error term*. Unfortunately, the linear model only has *one* error term. Thus, what we *want* to use is the model

$$
\begin{alignat*}{1}
    y_{ij}    &=    \mu + \alpha_{j} + S_{i} + \eta_{ij} \\
    S_{i}     &\sim \mathcal{N}(0,\sigma^{2}_{b})        \\
    \eta_{ij} &\sim \mathcal{N}(0,\sigma^{2}_{w})
\end{alignat*}
$$

but we cannot, because this would require a method that could flexibly accommodate multiple error terms (which is exactly what *mixed-effects* models do). Instead, the repeated measures ANOVA aims to *replicate* this situation within the confines of a modelling framework that *does not allow it*. As you might imagine, this involves jumping through a number of hoops and places a number of restrictions on what is possible.

### Partitioned Errors in `R`
Before discussing this in more detail, we will demonstrate the general idea in `R` in two ways, one using `lm()` and one using the `aov()` function. We have not covered `aov()` previously, but this is a wrapper for `lm()` that can accommodate partitioned errors.

To begin with, let us specify the model above that contains the subject terms. We will do this using the `mice2` data again, so we can show agreement with the paired $t$-test. However, we need to first convert this data into *long* format.

In [11]:
library('datarium')
library('reshape2')
data('mice2')

# repeats and number of subjects
t <- 2
n <- dim(mice2)[1]

# reshape wide -> long
mice2.long <- melt(mice2,                       # wide data frame
                   id.vars='id',                # what stays fixed?
                   variable.name="time",        # name for the new predictor
                   value.name="weight")         # name for the new outcome

mice2.long <- mice2.long[order(mice2.long$id),] # order by ID
rownames(mice2.long) <- seq(1,n*t)              # fix row names

# make id a factor
mice2.long$id <- as.factor(mice2.long$id)

print(mice2.long)

   id   time weight
1   1 before  187.2
2   1  after  429.5
3   2 before  194.2
4   2  after  404.4
5   3 before  231.7
6   3  after  405.6
7   4 before  200.5
8   4  after  397.2
9   5 before  201.7
10  5  after  377.9
11  6 before  235.0
12  6  after  445.8
13  7 before  208.7
14  7  after  408.4
15  8 before  172.4
16  8  after  337.0
17  9 before  184.6
18  9  after  414.3
19 10 before  189.6
20 10  after  380.3


Now, we can specify the partitioned error model using `lm()` below

In [None]:
rm.mod <- lm(weight ~ time + id, data=mice2.long)
summary(rm.mod)


Call:
lm(formula = weight ~ time + id, data = mice2.long)

Residuals:
    Min      1Q  Median      3Q     Max 
-21.410  -7.155   0.000   7.155  21.410 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  208.610     12.949  16.110 6.06e-08 ***
timeafter    199.480      7.809  25.546 1.04e-09 ***
id2           -9.050     17.460  -0.518   0.6167    
id3           10.300     17.460   0.590   0.5698    
id4           -9.500     17.460  -0.544   0.5996    
id5          -18.550     17.460  -1.062   0.3157    
id6           32.050     17.460   1.836   0.0996 .  
id7            0.200     17.460   0.011   0.9911    
id8          -53.650     17.460  -3.073   0.0133 *  
id9           -8.900     17.460  -0.510   0.6225    
id10         -23.400     17.460  -1.340   0.2130    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.46 on 9 degrees of freedom
Multiple R-squared:  0.987,	Adjusted R-squared:  0.9725 
F-statistic: 68.

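Ignoring the subject effects, if we look at the test on `timeafter`, we can see that this agrees with the paired $t$-test from earlier, as repeated below. The sign of the $t$-statistic is reversed only because `t.test()` computes `before - after`, whereas the `timeafter` coefficient represents `after - before`.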

In [13]:
print(t.test(mice2$before, mice2$after, paired=TRUE))


	Paired t-test

data:  mice2$before and mice2$after
t = -25.546, df = 9, p-value = 1.039e-09
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -217.1442 -181.8158
sample estimates:
mean difference 
        -199.48 



We can tidy this output up a bit by calling `Anova()` on the model

In [None]:
library('car')
print(Anova(rm.mod))

Anova Table (Type II tests)

Response: weight
          Sum Sq Df F value    Pr(>F)    
time      198961  1 652.613 1.039e-09 ***
id          9013  9   3.285   0.04559 *  
Residuals   2744  9                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Where, again, we would ignore the test on `id` and just focus on the effect of `time`. However, the fact that a test on `id` has been produced suggests that this model term is not really being treated correctly. After all, do we want a hypothesis test based on the null that all the subject means are the same? Given that this is a random sample of subjects, how meaningful would this be unless these were the only subjects in the world that we were interested in? As indicated above, the `id` effect is not really an effect of interest in the same way as `time`. In fact, `id` is considered *error*. It would be like specifying a hypothesis test on the model residuals. We are not interested in estimating some universal truth here because what we have are *random deviations* reflective of *error variance*. As such, in much the same way that the residuals are used to estimate the *within-subject* variance, we want the values of `id` to be used to estimate the *between-subjects* variance. 

In order to tell `R` that `id` is an *additional error term*, we can use the `aov()` function. This is a wrapper for `lm()` designed to produce an ANOVA table where some of the model terms represent *error* rather than a traditional ANOVA effect. From an *estimation* perspective, this distinction does not make any difference because ANOVA *mean squares* are already estimates of variance. However, what *does* matter is whether these mean squares form the *numerator* of an $F$-ratio (variance associated with *mean differences*) or the *denominator* of an $F$-ratio (variance associated with *error*). So, what we are really doing is telling `R` where to place the different mean squares in the ANOVA table.
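
To make the role of the mean squares more concrete, the standard expected mean squares for a design with one within-subject factor can be sketched as follows. This assumes the model with random $S_{i}$ given earlier, with $n$ subjects and $t$ repeated measurements per subject. The labels $MS_{\text{time}}$, $MS_{\text{id}}$ and $MS_{\text{within}}$ are notation introduced here for illustration, and $\bar{\alpha}$ denotes the average of the $\alpha_{j}$.

$$
\begin{alignat*}{1}
    E(MS_{\text{within}}) &= \sigma^{2}_{w} \\
    E(MS_{\text{id}})     &= \sigma^{2}_{w} + t\sigma^{2}_{b} \\
    E(MS_{\text{time}})   &= \sigma^{2}_{w} + \frac{n}{t-1}\sum_{j=1}^{t}(\alpha_{j} - \bar{\alpha})^{2} \\
    F_{\text{time}}       &= \frac{MS_{\text{time}}}{MS_{\text{within}}}
\end{alignat*}
$$

Under the null hypothesis that all the $\alpha_{j}$ are equal, the numerator and denominator of $F_{\text{time}}$ both estimate $\sigma^{2}_{w}$, which is why $MS_{\text{within}}$ is the appropriate error term for `time`. Note also that $MS_{\text{id}}$ estimates a *combination* of the two variance components, so an estimate of $\sigma^{2}_{b}$ alone would be $(MS_{\text{id}} - MS_{\text{within}})/t$.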

An example of using `aov()` for the `mice2` data is shown below. Here we use the `Error()` syntax within the model formula to indicate that `id` is an *error term*. This results in *two* ANOVA tables: one where $S_{i}$ forms the error term and one where $\eta_{ij}$ forms the error term. Based on the model structure, `aov()` can work out that `time` should have $\sigma^{2}_{w}$ as its denominator in the $F$-ratio and thus has associated it with the correct ANOVA table. As there are no other terms in the model, the ANOVA table with `id` as the error is empty.

In [None]:
rm.aov.mod <- aov(weight ~ time + Error(id), data=mice2.long)
print(summary(rm.aov.mod))


Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   9013    1002               

Error: Within
          Df Sum Sq Mean Sq F value   Pr(>F)    
time       1 198961  198961   652.6 1.04e-09 ***
Residuals  9   2744     305                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Notice that the test results have remained the same throughout all these examples. So, for this simple situation, we could use `t.test(..., paired=TRUE)`, `lm()`, `Anova(lm())`, or `aov(... + Error(id))`. However, as we add between-subjects factors and additional within-subject factors, things start to get much more complicated.
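
As a quick check of this equivalence, the sketch below rebuilds the `mice2` weights by hand (the values are copied from the long-format printout earlier in this section, so `datarium` is not required) and compares the squared paired $t$-statistic with the $F$-statistic for `time`. The object names (`long`, `t.paired`, `F.time` and so on) are illustrative, and the variance component estimates at the end use a simple method-of-moments calculation rather than anything `aov()` reports directly.

```r
# mice2 weights, copied from the long-format printout shown earlier
before <- c(187.2, 194.2, 231.7, 200.5, 201.7, 235.0, 208.7, 172.4, 184.6, 189.6)
after  <- c(429.5, 404.4, 405.6, 397.2, 377.9, 445.8, 408.4, 337.0, 414.3, 380.3)
n      <- length(before)   # number of subjects
n.rep  <- 2                # number of repeated measurements

# long-format data frame with subject and time as factors
long <- data.frame(
    id     = factor(rep(seq_len(n), times = n.rep)),
    time   = factor(rep(c("before", "after"), each = n),
                    levels = c("before", "after")),
    weight = c(before, after)
)

# paired t-test (after - before, to match the sign of the lm() coefficient)
t.paired <- unname(t.test(after, before, paired = TRUE)$statistic)

# partitioned-error model with subject included as a factor
fit <- lm(weight ~ time + id, data = long)
tab <- anova(fit)

F.time    <- tab["time", "F value"]
ms.id     <- tab["id", "Mean Sq"]
ms.within <- tab["Residuals", "Mean Sq"]

# for an effect with 1 degree of freedom, t squared equals F
print(c(t.squared = t.paired^2, F = F.time))

# method-of-moments estimates of the two variance components
sigma2.w <- ms.within
sigma2.b <- (ms.id - ms.within) / n.rep
print(c(sigma2.w = sigma2.w, sigma2.b = sigma2.b))
```

Because the design is balanced, the sequential sums of squares from `anova()` match the Type II table from `Anova()`, so either could be used for this comparison.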

### The Implied Covariance Structure
Treating $S_{i}$ as a random variable, the model of partitioned errors implies the following distribution for each observation.

$$
\begin{alignat*}{2}
    y_{ij}             &\sim \mathcal{N}(\mu_{j}, \sigma^{2})             &\quad \text{(Population distribution)} \\
    E(y_{ij})          &=    \mu_{j} = \mu + \alpha_{j}                   &\quad \text{(Mean function)}           \\
    \text{Var}(y_{ij}) &=    \sigma^{2} = \sigma^{2}_{b} + \sigma^{2}_{w} &\quad \text{(Variance function)}.      \\
\end{alignat*}
$$

So, we now have a model with a slightly more complex variance function that accommodates the fact that we have two sources of error whenever there are repeated measurements. This also connects directly with the idea that data from the same subject are *correlated*. The *covariance* between two measurements from the same subject is given by

$$
\text{Cov}(y_{i1},y_{i2}) = \text{Cov}(\mu + \alpha_{1} + S_{i} + \eta_{i1}, \mu + \alpha_{2} + S_{i} + \eta_{i2}) 
$$

Because $\mu$, $\alpha_{1}$ and $\alpha_{2}$ are *population constants*, they have 0 variance and thus do not contribute to the covariance, leading to

$$
\text{Cov}(y_{i1},y_{i2}) = \text{Cov}(S_{i} + \eta_{i1}, S_{i} + \eta_{i2}) 
$$

This can be expanded like so

$$
\text{Cov}(y_{i1},y_{i2}) = \text{Cov}(S_{i},S_{i}) + \text{Cov}(S_{i},\eta_{i2}) + \text{Cov}(\eta_{i1},S_{i}) + \text{Cov}(\eta_{i1}, \eta_{i2}). 
$$

The subject effects and the errors are not correlated as these represent independent partitions of the overall error. As such, $\text{Cov}(S_{i},\eta_{i2}) = \text{Cov}(\eta_{i1},S_{i}) = 0$. Similarly, the final errors are uncorrelated because the correlation has been *removed* by partitioning-out the subject effects. So $\text{Cov}(\eta_{i1}, \eta_{i2}) = 0$. This leaves

$$
\text{Cov}(y_{i1},y_{i2}) = \text{Cov}(S_{i},S_{i}).
$$

A key result from the definition of covariance is that the covariance of a random variable with itself is simply its variance, meaning

$$
\text{Cov}(y_{i1},y_{i2}) = \text{Cov}(S_{i},S_{i}) = \text{Var}(S_{i}) = \sigma^{2}_{b}.
$$

All of which is to say that the variance associated with the subject-specific deflections *is* the covariance induced by the repeated measurements.
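
To express this on the correlation scale, the covariance can be divided by the product of the standard deviations. The short derivation below uses only the variance and covariance results already given, and $\text{Cor}(\cdot)$ is notation introduced here for the correlation.

$$
\text{Cor}(y_{i1},y_{i2}) = \frac{\text{Cov}(y_{i1},y_{i2})}{\sqrt{\text{Var}(y_{i1})\text{Var}(y_{i2})}} = \frac{\sigma^{2}_{b}}{\sigma^{2}_{b} + \sigma^{2}_{w}}
$$

So the correlation between repeated measurements is the proportion of the total variance that is attributable to differences between subjects. This quantity is usually known as the *intraclass correlation*.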

...

$$
\begin{alignat*}{1}
    \text{Var}\left(y_{i1} - y_{i2}\right) &= \text{Var}(y_{i1}) + \text{Var}(y_{i2}) - 2\text{Cov}(y_{i1},y_{i2}) \\
                                           &= \left[\sigma^{2}_{b} + \sigma^{2}_{w}\right] + \left[\sigma^{2}_{b} + \sigma^{2}_{w}\right] - 2\sigma^{2}_{b} \\
                                           &= \sigma^{2}_{w} + \sigma^{2}_{w}
\end{alignat*}
$$

So, we can see that the between-subjects variance $\sigma^{2}_{b}$ *cancels out*, which is exactly as expected from our exploration of the *model of paired differences* from earlier.
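
These results can also be checked numerically. The sketch below simulates two repeated measurements per subject from the model with random $S_{i}$, using arbitrary illustrative values ($\sigma^{2}_{b} = 4$, $\sigma^{2}_{w} = 1$, $\mu = 10$, $\alpha = (0, 2)$) that are assumptions for the demonstration rather than estimates from any dataset. With a large number of simulated subjects, the sample variance, covariance and variance of the differences should land close to $\sigma^{2}_{b} + \sigma^{2}_{w}$, $\sigma^{2}_{b}$ and $2\sigma^{2}_{w}$ respectively.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only

n        <- 20000           # many subjects, so the sample moments are stable
sigma2.b <- 4               # between-subjects variance (illustrative value)
sigma2.w <- 1               # within-subject variance (illustrative value)
mu       <- 10              # grand mean (illustrative value)
alpha    <- c(0, 2)         # condition effects (illustrative values)

# one subject deviation per subject, shared across both measurements
S <- rnorm(n, mean = 0, sd = sqrt(sigma2.b))

# independent within-subject errors, one column per measurement
eta <- matrix(rnorm(2 * n, mean = 0, sd = sqrt(sigma2.w)), nrow = n, ncol = 2)

y1 <- mu + alpha[1] + S + eta[, 1]
y2 <- mu + alpha[2] + S + eta[, 2]

# empirical counterparts of the theoretical results
est <- c(var.y1    = var(y1),        # theory: sigma2.b + sigma2.w
         cov.y1.y2 = cov(y1, y2),    # theory: sigma2.b
         var.diff  = var(y1 - y2))   # theory: 2 * sigma2.w
print(round(est, 2))
```

The constants $\mu$ and $\alpha_{j}$ shift the means of `y1` and `y2` but have no influence on any of the three quantities, mirroring the step in the derivation where they drop out of the covariance.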

Now, we will connect what we have done above with the idea of modelling the variance-covariance matrix. Rather than doing this *explicitly*, the method above was an *implicit* modelling of the covariance structure...

$$
\begin{bmatrix}
    y_{11} \\
    y_{12} \\
    y_{21} \\
    y_{22} \\
\end{bmatrix}
\sim\mathcal{N}\left(
\begin{bmatrix}
    \mu + \alpha_{1} \\
    \mu + \alpha_{2} \\
    \mu + \alpha_{1} \\
    \mu + \alpha_{2} \\
\end{bmatrix}, 
\begin{bmatrix}
    \sigma^{2}_{b} + \sigma^{2}_{w} & \sigma^{2}_{b}                  & 0                               & 0                               \\
    \sigma^{2}_{b}                  & \sigma^{2}_{b} + \sigma^{2}_{w} & 0                               & 0                               \\
    0                               & 0                               & \sigma^{2}_{b} + \sigma^{2}_{w} & \sigma^{2}_{b}                  \\
    0                               & 0                               & \sigma^{2}_{b}                  & \sigma^{2}_{b} + \sigma^{2}_{w} \\
\end{bmatrix}
\right)
$$

So this means that a covariance matrix is never actually calculated within a repeated measures ANOVA. Instead, the structure is simply *implied* via the various model assumptions. Because of this, the correct error terms for the various tests are not derived logically from the structure of the covariance matrix and need to be organised manually. This is one of the biggest complexities of this framework and is why modern statistics prefers methods where this is taken care of *automatically* via an explicit covariance structure.
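
Although the repeated measures ANOVA never forms this matrix, it can be helpful to build it explicitly to see the block structure. The sketch below does so for 2 subjects with 2 measurements each, using the same arbitrary illustrative values as before ($\sigma^{2}_{b} = 4$, $\sigma^{2}_{w} = 1$). The names `block` and `Sigma` are chosen here for illustration.

```r
n        <- 2    # number of subjects
n.rep    <- 2    # repeated measurements per subject
sigma2.b <- 4    # between-subjects variance (illustrative value)
sigma2.w <- 1    # within-subject variance (illustrative value)

# covariance block for a single subject:
# sigma2.b everywhere, plus sigma2.w added on the diagonal
block <- sigma2.b * matrix(1, n.rep, n.rep) + sigma2.w * diag(n.rep)

# different subjects are independent, so the full matrix is block-diagonal
Sigma <- kronecker(diag(n), block)
print(Sigma)

# converting to correlations recovers sigma2.b / (sigma2.b + sigma2.w)
print(cov2cor(Sigma))
```

The zeros in the off-diagonal blocks encode the assumption that measurements from *different* subjects are uncorrelated, while the constant off-diagonal value within each block encodes the single covariance $\sigma^{2}_{b}$ shared by all pairs of measurements from the *same* subject.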

### The Sphericity Condition

... These corrections are available through the `ezANOVA()` function which, as we will see, is generally the most straightforward way of generating a repeated measures ANOVA with the correct error partition. For simple models, this will make limited difference. However, as we will see below, more complex models become tricky to specify using `aov()`. This is partly a limitation in the structure of `aov()`, but also points to the general complexity of the repeated measures ANOVA framework.

## Between-subjects Factors

... So what is the correct error here? The obvious, and correct, answer is that it is the *between-subjects* error. So how do we use this in our test statistics? At present, all we have done is *removed* the between-subjects error by including the `Subject` factor in our models. However, we somehow have to use this removed error as the denominator in tests that are based on between-subjects effects.

## Additional Within-subject Factors

### Using the `ezANOVA()` Function
As we can see above, using a partitioned error model with `aov()` is a tricky business and it would be very easy to get this wrong. As an alternative, we can use the `ezANOVA()` function from the `ez` package. As the name implies, this is designed to allow for an RM ANOVA without the usual difficulties associated with the `aov()` or `lm()` functions. Unfortunately, the aim of this package is largely to make the `R` output the same as SPSS. So it does away with the linear model framework. This means no residuals, no parameter estimates, no diagnostic plots or anything else we have made use of so far. If you *have* to use an RM ANOVA, this is the simplest way to get it *right*. However, as we will discuss below, we would dissuade you from ever considering RM ANOVA as an option in the future. About the only utility of this is showing doubtful researchers that our better options of GLS and mixed-effects models are, in fact, giving them the same answer as an RM ANOVA.

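In [3]:
library(ez)

# Assumes the long-format `mice2.long` data frame created earlier in this section
rm.ez.mod <- ezANOVA(data   = mice2.long,  # long-format data frame
                     dv     = weight,      # outcome variable
                     wid    = id,          # subject identifier (a factor)
                     within = time)        # within-subject factor
print(rm.ez.mod)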


## Why We Should *Not* Use RM ANOVA
Everything we have discussed above has really been an exercise in telling you why you really do not want to use RM ANOVA. All the unnecessary fiddling with error terms and different tests requiring different errors is a complication that we could simply do without. Even if we do manage to successfully work out what needs to go where (or get a function like `ezANOVA()` to sort it for us), we are still left with a method that has a number of meaningful restrictions. ... Because of this, the RM ANOVA is tricky to understand, tricky to use correctly and massively inflexible. It is no wonder that statisticians abandoned this method decades ago! And yet, this is the method that has persisted in psychology until relatively recently.

This section has largely been motivational to understand why we want to use something more flexible and more modern, but it is important to recognise that you may well end up working with someone who knows nothing beyond the RM ANOVA. In those situations, it is useful to (a) motivate the need for something better and (b) understand how to get the RM ANOVA results in `R`, in case they require further convincing. So, we do not condone the use of the RM ANOVA, but we understand its place in psychology and also understand that there are times where you may want to see what the RM ANOVA says, even if you do not wish to use it.