# Generalised Least Squares (GLS)
We will start our journey into the world of mixed-effects models by first examining a *related* approach that we have seen before: Generalised Least Squares (GLS). The reason for doing this is twofold. Firstly, GLS actually provides a simpler solution to many of the issues with the repeated measures ANOVA and thus presents a more logical starting point. Secondly, limitations in the way that GLS does this will provide some motivation for mixed-effects models as a more complex, but ultimately more flexible, method of dealing with this problem.

## GLS Theory
We previously came across GLS in the context of allowing different variances for different groups of data in ANOVA-type models. This was motivated as a way of lifting the assumption of *homogeneity of variance*. However, GLS is actually a much more general technique. To see this, note that the probability model for GLS is

$$
\mathbf{y} \sim \mathcal{N}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right),
$$

where $\boldsymbol{\Sigma}$ can take on *any structure*. In other words, GLS has exactly the same probability model as the normal linear model, except that it allows for a flexible specification of the variance-covariance matrix. In our previous examples, we used GLS to populate the variance-covariance matrix with different variances for each group. For instance, if we had two groups with three subjects each, our GLS model would be

$$
\begin{bmatrix}
y_{11} \\
y_{21} \\
y_{31} \\
y_{12} \\
y_{22} \\
y_{32} \\
\end{bmatrix}
\sim\mathcal{N}\left(
\begin{bmatrix}
\mu_{1} \\
\mu_{1} \\
\mu_{1} \\
\mu_{2} \\
\mu_{2} \\
\mu_{2} \\
\end{bmatrix},
\begin{bmatrix}
\sigma^{2}_{1}  & 0              & 0              & 0              & 0              & 0              \\
0               & \sigma^{2}_{1} & 0              & 0              & 0              & 0              \\
0               & 0              & \sigma^{2}_{1} & 0              & 0              & 0              \\
0               & 0              & 0              & \sigma^{2}_{2} & 0              & 0              \\
0               & 0              & 0              & 0              & \sigma^{2}_{2} & 0              \\
0               & 0              & 0              & 0              & 0              & \sigma^{2}_{2} \\
\end{bmatrix}
\right).
$$

This was actually a special case of GLS known as *weighted least squares* (WLS)[^weights-foot], where all the off-diagonal elements of $\boldsymbol{\Sigma}$ are 0. However, the crucial point is that we can use GLS to impose differences in *both* the variances *and* the covariances. So while we did not do this previously, we can include *correlation* in the GLS model. Thus, if our general problem with repeated measures is that the variance-covariance structure is not correctly handled by the normal linear model, GLS provides a direct solution. Furthermore, if a core complaint of the repeated measures ANOVA is that the assumed covariance structure is too restrictive, GLS again provides a direct solution. So, on the face of it, GLS directly solves many of the issues we encountered last week.
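To make the distinction between WLS and general GLS concrete, the following is a minimal sketch in Python (using NumPy) of the two kinds of $\boldsymbol{\Sigma}$ for the two-group example. All numerical values and variable names are hypothetical, and the non-zero covariances are included purely to show the *shape* of a general covariance matrix rather than to model any particular design.

```python
import numpy as np

# Hypothetical values, chosen purely for illustration.
sigma2_1, sigma2_2 = 1.5, 4.0   # variances for group 1 and group 2
n_per_group = 3

# WLS: one variance per group on the diagonal, zeros everywhere else.
Sigma_wls = np.diag(np.repeat([sigma2_1, sigma2_2], n_per_group))

# General GLS: off-diagonal elements may also be non-zero.
# Assumed example: a common covariance within each 3 x 3 block.
cov_1, cov_2 = 0.5, 1.0
block_1 = np.full((3, 3), cov_1) + (sigma2_1 - cov_1) * np.eye(3)
block_2 = np.full((3, 3), cov_2) + (sigma2_2 - cov_2) * np.eye(3)
zeros = np.zeros((3, 3))
Sigma_gls = np.block([[block_1, zeros], [zeros, block_2]])

# Convert covariances to correlations: D^{-1/2} Sigma D^{-1/2}.
d = 1.0 / np.sqrt(np.diag(Sigma_gls))
Corr_gls = Sigma_gls * np.outer(d, d)

print(Sigma_wls)
print(Corr_gls.round(3))
```

The diagonal of both matrices is identical. The only difference is that the second matrix implies non-zero correlation between some pairs of observations, which is exactly what the normal linear model cannot represent.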

### What Does GLS Do?
Technically, the machinery behind GLS is based on assuming we know $\boldsymbol{\Sigma}$ *a priori*. Although this would seem a silly place to start (given that we will almost *never* know this), we can go along with it and see where it gets us. So, *if* we know what the true covariance structure is, GLS provides a way of *removing* it from the data[^white-foot]. Once removed, the errors return to $i.i.d.$ and we are back in the world of the normal linear model. This is a very enticing prospect because all the difficulties associated with correlation effectively *disappear* and we can treat the data as a regular collection of independent values. So, although this *removal* procedure happens behind the scenes, we can conceptualise GLS as effectively a *transformation* that treats the covariance structure as *nuisance* and removes it, allowing all the theory from last semester to still apply.
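The idea of *removing* a known covariance structure can be sketched numerically. The example below assumes one common way of doing this, namely a Cholesky factorisation $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^{\prime}$ followed by multiplication of the data and design matrix by $\mathbf{L}^{-1}$. The covariance values, design matrix and variable names are all hypothetical, and this is an illustration of the principle rather than a description of what any particular software does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed *known* covariance: 2 subjects x 3 repeated measures, with
# variance 2.0 and within-subject covariance 1.0 (illustrative values).
block = np.full((3, 3), 1.0) + np.eye(3)
Sigma = np.kron(np.eye(2), block)

# Design matrix (intercept plus an arbitrary covariate) and simulated data.
X = np.column_stack([np.ones(6), np.arange(6.0)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.multivariate_normal(np.zeros(6), Sigma)

# Whitening: with Sigma = L L', multiplying by L^{-1} removes the covariance.
L = np.linalg.cholesky(Sigma)
y_star = np.linalg.solve(L, y)
X_star = np.linalg.solve(L, X)

# The covariance of the transformed errors is L^{-1} Sigma L^{-T} = I.
whitened_cov = np.linalg.solve(L, np.linalg.solve(L, Sigma).T)

# Ordinary least squares on the transformed data ...
beta_whitened, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# ... matches the textbook GLS estimator (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

print(whitened_cov.round(10))
print(beta_whitened, beta_gls)
```

After the transformation, the covariance matrix is the identity, so the transformed data satisfy the usual independence and constant-variance assumptions and an ordinary least-squares fit gives the GLS estimates.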

Unfortunately, this only really works if we *know* $\boldsymbol{\Sigma}$. In the real world, where we never know $\boldsymbol{\Sigma}$, we cannot technically use GLS. This would seem a bit of a dead-end. However, it is possible to use a method such as REML to *estimate* $\boldsymbol{\Sigma}$ from the data. This is known as *feasible generalised least squares* (FGLS). The question then becomes, how does working with $\hat{\boldsymbol{\Sigma}}$ rather than $\boldsymbol{\Sigma}$ change things? This is actually a much bigger question that extends to *any* method where we allow for a completely arbitrary covariance structure, rather than restricting it to a very specific form. Although our complaint about repeated measures ANOVA was that the covariance assumptions were unrealistic, as it turns out, these assumptions are *essential* to prevent the exact inferential machinery used within the linear model from breaking. Our desire for generality comes at a cost, as we will now discuss.
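The *feasible* idea of plugging an estimate $\hat{\boldsymbol{\Sigma}}$ into the GLS formulas can be sketched as follows. Note the assumptions: this example uses a simple two-step approach (fit by ordinary least squares, then estimate one variance per group from the residuals) rather than REML, it uses simulated data with made-up values, and it only covers the diagonal (WLS) case. It is meant to show the logic of FGLS, not how it would be fitted in practice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: two groups with different true variances (illustrative values).
n_per_group = 50
group = np.repeat([0, 1], n_per_group)
true_mean = np.where(group == 0, 10.0, 12.0)
true_sd = np.where(group == 0, 1.0, 3.0)
y = true_mean + rng.normal(0.0, true_sd)

# Cell-means coding: one column per group.
X = np.column_stack([group == 0, group == 1]).astype(float)

# Step 1: OLS fit, ignoring the unequal variances.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Step 2: estimate one variance per group from the residuals.
sigma2_hat = np.array([resid[group == g].var(ddof=1) for g in (0, 1)])

# Step 3: plug the estimate into the GLS formulas as if it were known.
Sigma_hat_inv = np.diag(1.0 / sigma2_hat[group])
XtSiX = X.T @ Sigma_hat_inv @ X
beta_fgls = np.linalg.solve(XtSiX, X.T @ Sigma_hat_inv @ y)
se_fgls = np.sqrt(np.diag(np.linalg.inv(XtSiX)))

print(sigma2_hat, beta_fgls, se_fgls)
```

The standard errors in the final step are computed as though $\hat{\boldsymbol{\Sigma}}$ were the true $\boldsymbol{\Sigma}$. Nothing in the calculation accounts for the fact that the variances were themselves estimated from the same data, which is precisely the concern raised above.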

## Covariance Constraints
As well as understanding that the very process of estimating $\boldsymbol{\Sigma}$ causes problems, we also need to understand that we cannot have free rein to estimate any old covariance structure we like. One of the most important elements to recognise is that some sort of *constraint* is always needed when estimating a variance-covariance matrix. To see this, note that for a repeated measures experiment there are $nt \times nt$ values in this matrix. The values above and below the diagonal are a mirror image, so the true number of unknown values is $\frac{nt(nt + 1)}{2}$. For instance, if we had $n = 5$ subjects and $t = 3$ repeated measures, there would be $\frac{15 \times 16}{2} = 120$ unique values in the variance-covariance matrix. If we allowed it to be completely unstructured, we would have 120 values to estimate *just* for the covariance structure. This is not really possible unless the amount of data we have *exceeds* the number of parameters, and here we have only $nt = 15$ observations. So, the data itself imposes a *constraint* on how unstructured the covariance matrix can be.
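The arithmetic above is easy to check. The short sketch below uses a hypothetical helper function, `n_unique_cov_params`, to count the unique elements of a completely unstructured covariance matrix and compare that count with the number of observations.

```python
def n_unique_cov_params(n_subjects: int, n_measures: int) -> int:
    """Unique elements of a symmetric (nt x nt) covariance matrix."""
    size = n_subjects * n_measures
    return size * (size + 1) // 2


n, t = 5, 3
n_obs = n * t                          # total number of observations
n_params = n_unique_cov_params(n, t)   # unstructured covariance parameters

print(n_obs, n_params)
```

With 5 subjects and 3 repeated measures, the unstructured matrix has far more unknowns than there are data points, and the gap widens quickly because the parameter count grows with the *square* of $nt$.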

Luckily, for most applications, we not only assume that $\boldsymbol{\Sigma}$ has a block-diagonal structure (so most off-diagonal entries are 0), but that many of the off-diagonal elements are actually *identical*. We saw this previously with the repeated measures ANOVA. Even though $\boldsymbol{\Sigma}$ may have *hundreds* of values we *could* fill in, if we assume compound symmetry only within each subject, there are only *two* covariance parameters to be estimated: $\sigma^{2}_{b}$ and $\sigma^{2}_{w}$. The whole matrix can then be constructed using those two alone. This is an example of *extreme simplification*, but it does highlight that we generally do not estimate the *whole* variance-covariance matrix. We only estimate *small parts* of it. Indeed, making the covariance matrix more general is often a risky move because of the number of additional parameters needed. The more we estimate from the same data, the greater our uncertainty will become because each element of the covariance matrix is supported by *less data*. Complexity always comes at a price.
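As an illustration of how two parameters can generate the whole matrix, the sketch below builds a block-diagonal, compound-symmetric $\boldsymbol{\Sigma}$ for $n = 5$ subjects and $t = 3$ repeated measures. It assumes the usual parameterisation in which each variance is $\sigma^{2}_{b} + \sigma^{2}_{w}$ and each within-subject covariance is $\sigma^{2}_{b}$. The numerical values are hypothetical.

```python
import numpy as np

n, t = 5, 3                      # subjects and repeated measures
sigma2_b, sigma2_w = 2.0, 1.0    # hypothetical between- and within-subject variances

# One subject's t x t block: sigma2_b everywhere, plus sigma2_w on the diagonal.
block = sigma2_b * np.ones((t, t)) + sigma2_w * np.eye(t)

# Repeat the same block along the diagonal, with zeros between subjects.
Sigma = np.kron(np.eye(n), block)

print(Sigma.shape)
print(np.unique(Sigma))   # the distinct values appearing in the matrix
```

Every one of the 225 entries is either 0, $\sigma^{2}_{b}$ or $\sigma^{2}_{b} + \sigma^{2}_{w}$, so only two quantities need to be estimated from the data.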