# The Paired *t*-test
At this point in the lesson, we have established what repeated measures designs are, why they cause problems, and how we can use the multivariate normal distribution as a general probabilistic framework for accommodating correlation. We will now move on to discussing models that are suitable for repeated measurements. The most basic repeated measures scenario is when we have *two* measurements from each subject. In this situation, it is typical to use a *paired* $t$-test. Given what we have now established, our interest lies in *how* the paired $t$-test is able to accommodate correlation. As we will discuss below, rather than modelling the correlation directly, the paired $t$-test is structured such that the correlation is effectively *removed* from the data. Although this *side-steps* many of the issues around repeated measurements, understanding *why* this works is key for developing more complex and general-purpose approaches.

## Two-sample vs Paired *t*-tests
To begin with, it is useful to examine *how* the results differ between a *two-sample* and *paired* $t$-test. We can do this in `R` by comparing the results of the `t.test()` function with `paired=FALSE` and `paired=TRUE`. To do this, we use the `mice2` data set from the `datarium` package. This contains the weight of a sample of 10 mice both *before* and *after* some treatment. The experimental question concerns whether the treatment affects the weight of the mice. The data are shown below.

In [1]:
library('datarium')
data('mice2')
print(mice2)

   id before after
1   1  187.2 429.5
2   2  194.2 404.4
3   3  231.7 405.6
4   4  200.5 397.2
5   5  201.7 377.9
6   6  235.0 445.8
7   7  208.7 408.4
8   8  172.4 337.0
9   9  184.6 414.3
10 10  189.6 380.3


We can compare the output from a *two-sample* $t$-test and a *paired* $t$-test by changing the `paired=` argument of `t.test()`, as shown below.

In [2]:
print(t.test(mice2$before, mice2$after, var.equal=TRUE, paired=TRUE))  # paired t-test
print(t.test(mice2$before, mice2$after, var.equal=TRUE, paired=FALSE)) # two-sample t-test


	Paired t-test

data:  mice2$before and mice2$after
t = -25.546, df = 9, p-value = 1.039e-09
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -217.1442 -181.8158
sample estimates:
mean difference 
        -199.48 


	Two Sample t-test

data:  mice2$before and mice2$after
t = -17.453, df = 18, p-value = 9.974e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -223.4926 -175.4674
sample estimates:
mean of x mean of y 
   200.56    400.04 



The output differs somewhat between the two methods, so let us spend a little time unpacking this. To begin with, the clearest differences concern the $t$-statistic itself, the degrees of freedom, the $p$-value and the confidence interval. These are summarised in the table below.

| Test       | *t*-statistic | DoF | *p*-value | 95% CI            | 
| ---------- | ------------- | --- | --------- | ----------------- |
| Paired     | -25.546       | 9   | 1.039e-09 | [-217.14, -181.82] |
| Two-sample | -17.453       | 18  | 9.974e-13 | [-223.49, -175.47] |

Although it may seem like *everything* is different, there is actually one element that is *identical*, though it is somewhat hidden. To see it, consider that the structure of a $t$-test is

$$
t = \frac{\mu_{1} - \mu_{2}}{\text{SE}\{\mu_{1} - \mu_{2}\}},
$$

meaning that we think of $t$ as the ratio between the *mean difference* and the *standard error of the mean difference*. The $t$-statistic itself is different between the *two-sample* and the *paired* tests, but this does not necessarily mean that all elements of this ratio are also different. Indeed, if we look at the output above we can see that the *paired* test reports a mean difference of `-199.48` and the *two-sample* test reports the individual means as `200.56` and `400.04`. If we calculate the mean difference from the values reported by the *two-sample* test we get:

In [3]:
print(200.56 - 400.04)

[1] -199.48


So, this is *identical* between the *paired* and *two-sample* tests. This should not be surprising, as we already established that repeated measurements do not affect the mean function. However, this does make it clear that it is not the *numerator* that differs between the paired and two-sample tests. As such, it must be the *denominator* of the $t$-statistic that leads to the differences above. In other words, *the standard error of the difference changes under repeated measurements*.

Given that we know the numerator for both tests, we can recover the denominators and see that this is true.

In [4]:
mean.diff  <- -199.48
paired.t   <- -25.546
twosamp.t  <- -17.453
paired.se  <-  mean.diff / paired.t
twosamp.se <-  mean.diff / twosamp.t

print(c(paired.se, twosamp.se))

[1]  7.808659 11.429554


The *standard error* of the difference is much *smaller* in the *paired* test (`7.81`) than in the *two-sample* test (`11.43`). This should not be a surprise, as we know that the variance of the difference between two random variables gets *smaller* when they are positively correlated. From this, we can conclude that the standard error in the two-sample test is *too large* for this particular dataset, which has led to a $t$-statistic that is *smaller in magnitude* than it should be. Application of the wrong method here has led to a *loss* of statistical power.
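As a rough check on where these two standard errors come from, the sketch below recomputes them by hand. It assumes the `mice2` values printed earlier, typed in directly so that the block does not depend on `datarium`. The object names (`se.paired`, `se.twosamp`, `se.check` and so on) are illustrative and not part of the original analysis.

```r
# mice2 values copied from the printed data above
before <- c(187.2, 194.2, 231.7, 200.5, 201.7, 235.0, 208.7, 172.4, 184.6, 189.6)
after  <- c(429.5, 404.4, 405.6, 397.2, 377.9, 445.8, 408.4, 337.0, 414.3, 380.3)
n      <- length(before)

# Two-sample (pooled) SE: ignores the covariance between before and after
pooled.var <- (var(before) + var(after)) / 2   # simple average, as group sizes are equal
se.twosamp <- sqrt(pooled.var * (1/n + 1/n))

# Paired SE: based on the standard deviation of the differences
se.paired  <- sd(before - after) / sqrt(n)

# The variance of the differences subtracts twice the covariance
var.diff   <- var(before) + var(after) - 2 * cov(before, after)
se.check   <- sqrt(var.diff / n)

print(c(paired = se.paired, twosamp = se.twosamp, check = se.check))
```

Up to rounding, `se.paired` and `se.twosamp` should agree with the values recovered from the $t$-statistics. Because the group sizes are equal, the squared two-sample standard error is `(var(before) + var(after)) / n`, so the only thing separating the two standard errors is the term `2 * cov(before, after) / n`.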

This tracks with everything we have discussed so far. However, the key question remains: *how* is the paired $t$-test able to do this? There are two equivalent ways of conceptualising this, but the simplest is to think of a paired $t$-test as a model of *differences* between the repeated measurements. This is the perspective we will discuss below. The alternative perspective will be presented when we start discussing the *repeated measures ANOVA* as a generalisation of the paired $t$-test.

## The Model of *Paired Differences*
The key idea behind the *paired differences* approach is that a paired $t$-test is identical to a *one-sample* $t$-test on the *differences* between the pairs. This is a really key conceptual step because it introduces the idea that we can correctly model repeated measurements by *removing* something from the data. If we can make the paired test correct via subtraction it means that we are able to *remove* correlation and make the data *independent*. As we will see further below, formalising this idea allows us to conceptualise models of repeated measures in a more general fashion that will be useful going forward. 


### Paired Differences in `R`
As a first step, we can demonstrate that the idea of subtracting pairs *does* work. We can use the `mice2` data again and create a new variable that represents the *difference* between `before` and `after` the treatment.

In [5]:
mice2$treat.diff <- mice2$before - mice2$after
print(mice2)

   id before after treat.diff
1   1  187.2 429.5     -242.3
2   2  194.2 404.4     -210.2
3   3  231.7 405.6     -173.9
4   4  200.5 397.2     -196.7
5   5  201.7 377.9     -176.2
6   6  235.0 445.8     -210.8
7   7  208.7 408.4     -199.7
8   8  172.4 337.0     -164.6
9   9  184.6 414.3     -229.7
10 10  189.6 380.3     -190.7


Now, we can simply perform a one-sample $t$-test on the difference. To do this, we could use the `t.test()` function, but given that our general focus is linear models, we will use `lm()` instead.

In [6]:
onesamp.mod <- lm(treat.diff ~ 1, data=mice2)
summary(onesamp.mod)


Call:
lm(formula = treat.diff ~ 1, data = mice2)

Residuals:
   Min     1Q Median     3Q    Max 
-42.82 -11.17   1.28  19.66  34.88 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -199.480      7.809  -25.55 1.04e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.69 on 9 degrees of freedom


The test on the intercept parameter $(\beta_{0})$ is now *identical* to the paired $t$-test from earlier. Importantly, we have done *nothing* to explicitly model the covariance structure. Indeed, remember that `lm()` assumes that there is *no* correlation in the data. So, we have now managed to analyse repeated measurements using a model that assumes that there are no repeated measurements. In effect, we have *created* independent data via subtraction.
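For completeness, the same equivalence can be checked with `t.test()` directly. This is a minimal sketch that again types in the `mice2` values printed earlier rather than loading `datarium`. The object names `paired.res` and `onesamp.res` are illustrative.

```r
# mice2 values copied from the printed data above
before <- c(187.2, 194.2, 231.7, 200.5, 201.7, 235.0, 208.7, 172.4, 184.6, 189.6)
after  <- c(429.5, 404.4, 405.6, 397.2, 377.9, 445.8, 408.4, 337.0, 414.3, 380.3)

paired.res  <- t.test(before, after, paired = TRUE)  # paired t-test
onesamp.res <- t.test(before - after, mu = 0)        # one-sample t-test on the differences

# Compare the test statistic, degrees of freedom and p-value side by side
comparison <- rbind(
  paired  = c(t = unname(paired.res$statistic),
              df = unname(paired.res$parameter),
              p = paired.res$p.value),
  onesamp = c(t = unname(onesamp.res$statistic),
              df = unname(onesamp.res$parameter),
              p = onesamp.res$p.value)
)
print(comparison)
```

The two rows should match, which is the same result as the `lm()` intercept test expressed through `t.test()`.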

### Removing the Effect of the Subjects
Given the demonstration above, the key insight is that taking $d_{i} = y_{i1} - y_{i2}$ allows us to treat the values of $d_{i}$ as *independent*. This means that this subtraction *must* be removing the correlation from the data. This is fairly intuitive because we have reduced correlated pairs of data down to only a single value per-subject. Given that the subjects are *independent*, the values of $d_{i}$ must also be independent. In effect, there are *no* repeated measurements anymore. However, expressing this formally is a useful stepping-stone to more general approaches.

To see what is happening, let us return to our basic $t$-test model from last semester, where we parameterised each group mean in terms of the grand mean plus a group-specific deflection. The model is given by

$$
y_{ij} = \mu + \alpha_{j} + \epsilon_{ij},
$$

so the means of the two repeated measurements are given by 

$$
\begin{alignat*}{1}
    E(y_{i1}) &= \mu_{1} = \mu + \alpha_{1} \\
    E(y_{i2}) &= \mu_{2} = \mu + \alpha_{2}.
\end{alignat*}
$$

To capture the concept of *dependence* between these repeats, we will now *split* the error into *two* components. This may seem an odd step, but stick with it because this simple manoeuvre is key to understanding why the paired $t$-test works. So, we take our error term and define $\epsilon_{ij} = S_{i} + \eta_{ij}$. This now consists of a *shared component* for each subject, called $S_{i}$, as well as a *unique component* for each observation, called $\eta_{ij}$. For the two repeated measurements, the model is now

$$
\begin{alignat*}{1}
    y_{i1} &= \mu + \alpha_{1} + \overbrace{S_{i} + \eta_{i1}}^{\epsilon_{i1}} \\
    y_{i2} &= \mu + \alpha_{2} + \underbrace{S_{i} + \eta_{i2}}_{\epsilon_{i2}}.
\end{alignat*}
$$

So, the reason why $y_{i1}$ and $y_{i2}$ are correlated is that they *share* the same component $S_{i}$. This captures the idea that these measurements come from the *same subject*. If we then subtract $y_{i2}$ from $y_{i1}$, the term $S_{i}$ will cancel out[^intercept-foot]

$$
\begin{alignat*}{1}
    d_{i} = y_{i1} - y_{i2} &= (\mu_{1} + S_{i} + \eta_{i1}) - (\mu_{2} + S_{i} + \eta_{i2}) \\
                            &= (\mu_{1} + \eta_{i1}) - (\mu_{2} + \eta_{i2}) \\
                            &= (\mu_{1} - \mu_{2}) + (\eta_{i1} - \eta_{i2}).
\end{alignat*}
$$

We end up with a single value of $d_{i}$ per subject and, because the subjects are *independent*, these values must be *independent* as well. Since the subtraction removes $S_{i}$, this tells us that $S_{i}$ *must* be the term that captures the *correlation*. As such, the simple act of splitting the error term into *two parts* explains both *how* and *why* the paired $t$-test works.
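The cancellation of $S_{i}$ can also be seen in a small simulation. The sketch below is illustrative only: the sample size, the means `mu1` and `mu2`, and the standard deviations of the two error components are all hypothetical values chosen for the demonstration, and are not estimates from `mice2`.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n   <- 10000       # hypothetical number of subjects
mu1 <- 200         # hypothetical mean of the first measurement
mu2 <- 400         # hypothetical mean of the second measurement

S    <- rnorm(n, mean = 0, sd = 20)  # shared component, one value per subject
eta1 <- rnorm(n, mean = 0, sd = 10)  # unique component, first measurement
eta2 <- rnorm(n, mean = 0, sd = 10)  # unique component, second measurement

y1 <- mu1 + S + eta1
y2 <- mu2 + S + eta2
d  <- y1 - y2

# The repeated measurements are correlated because both contain S
print(cor(y1, y2))

# The differences agree with the expression in which S has been removed
print(all.equal(d, (mu1 + eta1) - (mu2 + eta2)))

# ... and the differences are essentially unrelated to the subject component
print(cor(d, S))
```

With these hypothetical settings, the correlation between `y1` and `y2` should be clearly positive, whereas the correlation between `d` and `S` should be close to zero, since `S` no longer appears in `d`.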

`````{admonition} A Poetic Explanation
:class: tip
In their book "Analysis of Repeated Measures", Crowder and Hand (1990) refer to the elements of the model

$$
y_{ij} = \mu + \alpha_{j} + S_{i} + \eta_{ij}
$$

in a more poetic way that may help get a sense of what the model terms are capturing. They refer to $\mu_{j} = \mu + \alpha_{j}$ as an "immutable constant of the universe", $S_{i}$ as a "lasting characteristic of the individual" and $\eta_{ij}$ as a "fleeting aberration of the moment". So, $\mu_{j} = \mu + \alpha_{j}$ represents something fundamental and universal about the effect of the different treatments that is true across all measurements of those treatments. $S_{i}$ captures something that is specific and unique to subject $i$ that is true across all their repeated measurements, and $\eta_{ij}$ represents random noise that occurred at the point of measurement that is unrelated to the experimental condition or the individual.
`````

## Inference in the Model of *Paired Differences*
Earlier, we indicated that a key difference between a *paired* $t$-test and a *two-sample* $t$-test was the standard error used for the test statistic. We will now see how this aligns with the model of *paired differences* and the idea of *splitting* the error term into $\epsilon_{ij} = S_{i} + \eta_{ij}$ and then *removing* the $S_{i}$ terms.

### A Model with *Two* Error Terms
To understand the consequences of splitting the error in two, recall that in the normal linear model the errors are a random variable of the form 

$$
\epsilon_{ij} \sim \mathcal{N}(0,\sigma^{2}).
$$

As such, when we split the errors, we get *two* random variables of the form

$$
\begin{alignat*}{1}
    S_{i}     &\sim \mathcal{N}(0,\sigma^{2}_{1}) \\
    \eta_{ij} &\sim \mathcal{N}(0,\sigma^{2}_{2}).
\end{alignat*}
$$

We do not assume these have the same variance and thus we now have *two* variances as well. So, not only have we *partitioned* the errors, we have also *partitioned* the variance into two separate chunks

$$
\text{Var}\left(y_{ij}\right) = \sigma^{2} = \sigma^{2}_{1} + \sigma^{2}_{2}.
$$
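The additivity in this expression relies on an assumption that is easy to miss: the two components are taken to be independent of one another, so their covariance is zero. Under that assumption (stated here explicitly, as it is only implicit in the text), and noting that the constants $\mu$ and $\alpha_{j}$ contribute no variance, the general rule for the variance of a sum gives

$$
\begin{alignat*}{1}
    \text{Var}\left(y_{ij}\right) &= \text{Var}\left(S_{i} + \eta_{ij}\right) \\
                                  &= \text{Var}\left(S_{i}\right) + \text{Var}\left(\eta_{ij}\right) + 2\text{Cov}\left(S_{i},\eta_{ij}\right) \\
                                  &= \sigma^{2}_{1} + \sigma^{2}_{2} + 0.
\end{alignat*}
$$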

This says that the variation in our data is the sum of two separate chunks that come from two different sources. Within the context of a basic repeated measures model, these variance components are often called the *between-subjects* variance and the *within-subject* variance. So we can equivalently write the distribution of the errors as

$$
\begin{alignat*}{1}
    S_{i}     &\sim \mathcal{N}(0,\sigma^{2}_{b}) \\
    \eta_{ij} &\sim \mathcal{N}(0,\sigma^{2}_{w}).
\end{alignat*}
$$

In this scheme, $\sigma^{2}_{b}$ captures variation due to the fact that we have measured *different subjects*, and $\sigma^{2}_{w}$ captures variation of measurement *within each subject*. We therefore think of variation in these data as attributable to two different sources of error. One concerns the fact that we have different people in the data and the other concerns the fact that we have multiple measurements from each person in the data. We can therefore specify this model more generally as

$$
y_{ij} \sim \mathcal{N}(\mu + \alpha_{j}, \sigma^{2}_{b} + \sigma^{2}_{w}).
$$

We will discuss this model specification in more detail in the next part of this lesson, as it is a key result that leads us both to the repeated measures ANOVA and the mixed-effects models that are the focus of this section of the unit. 

### Error Variance in the Model of *Paired Differences*
Now, let us circle back to the paired $t$-test and the idea of subtracting the pairs to remove the correlation. So, sticking with the concept of a model with *two* error variances, what happens when we take the *difference* $d_{i} = y_{i1} - y_{i2}$? Remember from the beginning of this lesson that the variance of the difference between two random variables is given by

$$
\text{Var}(y_{1} - y_{2}) = \text{Var}(y_{1}) + \text{Var}(y_{2}) - 2\text{Cov}(y_{1},y_{2}).
$$

As such, the variance of $d_{i}$ is

$$
\begin{alignat*}{3}
    \text{Var}(d_{i}) &= \text{Var}(y_{i1}) &&+ \text{Var}(y_{i2}) &&- 2\text{Cov}(y_{i1},y_{i2}) \\
                      &= \left(\sigma^{2}_{b} + \sigma^{2}_{w}\right) &&+ \left(\sigma^{2}_{b} + \sigma^{2}_{w}\right) &&- 2\text{Cov}(y_{i1},y_{i2})
\end{alignat*}
$$

For reasons we will give in the next part of the lesson, the covariance between the repeated measurements is

$$
\text{Cov}(y_{i1},y_{i2}) = \sigma^{2}_{b}.
$$

Although this might seem an unintuitive result, stick with it for the moment. We will provide more intuition about this later in the unit. For now, let us see what happens when we put all this together 

$$
\begin{alignat*}{1}
    \text{Var}(d_{i}) &= \left(\sigma^{2}_{b} + \sigma^{2}_{w}\right) + \left(\sigma^{2}_{b} + \sigma^{2}_{w}\right) - 2\sigma^{2}_{b} \\
                      &= 2\sigma^{2}_{w} + 2\sigma^{2}_{b} - 2\sigma^{2}_{b}  \\
                      &= 2\sigma^{2}_{w}
\end{alignat*}
$$

So, when we subtract the repeated measurements, $\sigma^{2}_{b}$ *disappears* and we are only left with $\sigma^{2}_{w}$. This is exactly the same idea as $S_{i}$ disappearing in the subtraction, as well as the idea of the correlation disappearing in the subtraction. The only variance left in $d_{i}$ is $2\sigma^{2}_{w}$, so the standard error can only be based on $\sigma^{2}_{w}$. Whenever $\sigma^{2}_{b} > 0$, this is smaller than the $\sigma^{2}_{w} + \sigma^{2}_{b}$ that the *two-sample* $t$-test uses for each measurement.
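These variance results can be checked numerically. The sketch below simulates data from the two-error model, assuming hypothetical values of $\sigma^{2}_{b} = 400$ and $\sigma^{2}_{w} = 100$ and hypothetical means of 200 and 400. None of these numbers come from `mice2`; they are chosen only to make the pattern easy to see.

```r
set.seed(2)            # arbitrary seed, for reproducibility only
n        <- 100000     # large hypothetical sample so estimates are stable
sigma2.b <- 400        # hypothetical between-subjects variance
sigma2.w <- 100        # hypothetical within-subject variance

S    <- rnorm(n, mean = 0, sd = sqrt(sigma2.b))  # shared subject component
eta1 <- rnorm(n, mean = 0, sd = sqrt(sigma2.w))  # unique error, measurement 1
eta2 <- rnorm(n, mean = 0, sd = sqrt(sigma2.w))  # unique error, measurement 2

y1 <- 200 + S + eta1
y2 <- 400 + S + eta2
d  <- y1 - y2

sim.summary <- c(
  var.y1  = var(y1),      # should be close to sigma2.b + sigma2.w
  cov.y12 = cov(y1, y2),  # should be close to sigma2.b
  var.d   = var(d)        # should be close to 2 * sigma2.w
)
print(round(sim.summary, 1))
```

The simulated covariance should sit near `sigma2.b` and the variance of the differences near `2 * sigma2.w`, in line with the derivation: the between-subjects variance contributes to each measurement and to their covariance, but not to the difference.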

Putting this all together, the paired $t$-test works because subtracting the repeated measurements cancels the term $S_{i}$. This both removes the correlation *and* removes a portion of the overall error variance. What is left represents only the *within-subject* variance. Because this is *smaller* than the total variance whenever the repeated measurements are positively correlated, the standard error of the difference is *also* smaller. This makes the associated $t$-statistic *larger in magnitude* in the paired case, even though the mean difference never changes. In effect, this scheme allows correlation to be incorporated and the standard error adjusted in an entirely mechanistic way, simply by calculating the *difference* between the pairs.


`````{topic} What do you now know?
In this section, we have explored the paired $t$-test as the most basic model of repeated measurements. After reading this section, you should have a good sense of:

- The core differences when we apply a paired $t$-test vs a two-sample $t$-test to the same data in terms of the test statistic, degrees of freedom and confidence interval.
- The concept that the *numerator* of the $t$-statistic does not change under repeated measurements, but the *denominator* (the *standard error of the differences*) does.
- The idea that a paired $t$-test can be conceptualised as a *one-sample* $t$-test on the *differences* between the pairs.
- The idea that subtracting the pairs renders the data *independent* because this *removes the correlation*.
- The idea that we can conceptualise *why* this happens by: 
    - Splitting the model error into two parts, one that captures a *shared component* for each subject and one that captures *independent error*.
    - Seeing how this *shared component* is the element that *cancels* under subtraction and thus *must* be the element that explains the correlation between the repeated measurements.
    - Seeing that the only remaining variance reflects *within-subject* deviations that are *smaller* than the total variance.
    - Seeing that this *within-subject* variance is the only element that can feed into the standard errors, which explains why the standard errors are *smaller* for the paired test compared to the two-sample test.

`````

[^intercept-foot]: So too will the intercept term $\mu$, but this just means that the data will be *mean centred* with 0 representing no difference between the repeated measurements.