# Summary
We have spent a lot of time in this lesson discussing GLS. Although you may be wondering *why*, given that this unit is about *mixed-effects*, focusing on GLS to begin with has several advantages:

- Many problems you encounter in practice may be more simply addressed by using GLS, rather than a full mixed-effects approach.
- GLS very directly solves our main issues with the traditional repeated measures ANOVA, and so is the logical *next step* away from those methods.
- GLS makes us focus on the idea of the *covariance structure*, which is helpful for conceptualising how to model repeated measures, as well as understanding mixed-effects.
- In terms of implementation, the syntax used by `gls()` for describing the covariance structure will be repurposed in the context of mixed-effects. So understanding it now is very useful.
- Mixed-effects models can be understood as a *form* of GLS, so conceptually GLS is a good starting point.
- The issues around inference are not *unique* to GLS, as they are also shared by mixed-effects models. So understanding these problems *from the start* will prevent confusion later.

However, as we have seen, there are some *problems* with using GLS. Some of these problems are *practical limitations* based on how GLS is implemented in `R`. These types of problems are potentially *fixable* in software, but the fixes have not presently been implemented. Other problems are *conceptual* and apply irrespective of any specific implementation in software.

In terms of justifying the move to *mixed-effects*, the bigger issues are the *conceptual* ones. Although there may be a *practical* reason for moving to a different framework, this is not wholly convincing when the problem is a *software* one. Instead, we need to consider more fundamental *conceptual* limitations that justify a *more flexible* framework. So, the argument is *not* that mixed-effects models suddenly solve the problem of inference for arbitrary covariance structures (they very much *do not*). The argument is that the whole *framework* of mixed-effects provides us with more sophisticated and flexible models that give us everything GLS does *and more*. 

## Practical Limitations
In terms of the *practical* limitations of the `gls()` function, perhaps the biggest is that the function itself is designed to behave as if we are using GLS, not FGLS. This is a problem because, of all the solutions discussed, this is the *least desirable*. This is further complicated by other functions that are compatible with `gls()` taking a *different* approach. For instance, because `Anova()` refuses to pretend that we are using GLS, it chooses to fall back on *asymptotic* inference. However, because `Anova()` has no other options, we end up forced into using asymptotics elsewhere in order to keep the inferential framework consistent. This is a *choice* on the part of the `car` developers, because it *would* be possible for `Anova()` to implement *effective degrees of freedom* for `gls()` models. But it does not. So, these are not necessarily limitations of the FGLS *framework*. They are limitations based on the *choices* made when different authors implemented different solutions in `R`. Although someone could quite easily implement a different package that provides different GLS functionality, the `gls()` function from `nlme` is so ubiquitous and so widely supported by other packages that any alternative would take time to gain traction and become practically useful. So, for now, it makes most sense to simply treat `gls()` as the only option and thus we must manoeuvre around these choices as best we can.

`````{admonition} A Limitation of R?
:class: info
Disagreements across packages in terms of how certain facilities are implemented are an unfortunate consequence of the way that `R` works. Because there is no central authority declaring that we will "always use asymptotics", or "always use effective degrees of freedom", it is up to the developers to decide which methods they want to support. Some developers are less opinionated and want to provide *many options*, whereas others are *very* opinionated and want to provide only *some* options. Although this can be frustrating, you need to remember that these disagreements are not based on *nothing*. If there were a single solution that had no issues, there would be no controversy. The very fact that there is no consensus actually tells you something *very important*. If we were to just use something like SPSS, we could absolve ourselves of responsibility because we would have no choice. We just did what SPSS told us! However, this provides an illusion of *certainty* that is simply not there. For good or for ill, controversies in statistics turn into disagreements across `R` packages. This can be *frustrating* for users, but the frustration needs to be directed to the right place. It is not the developers' fault that the sampling distribution is *unknown*. So, there is often a *deeper* reason for these problems that can reveal important truths about the methods we are relying on. The naive researcher may cry foul and shout "I just want some $p$-values!". The correct response is not to make the decision for them, it is to ask "which $p$-values do you want and are you sure you can trust them?"
`````

## Conceptual Limitations
Moving on to the more pressing *conceptual* limitations, one of the bigger *drawbacks* to GLS is its *simplicity*. Now, this is not necessarily a bad thing as there are times when we *want* (or even *need*) simplicity. However, simplicity makes it much more difficult, or even impossible, to represent complex situations within the GLS framework. Remember, GLS simply treats the covariance structure as a *nuisance*. It is a separate element of the model that we simply want to get rid of. We just take a single covariance matrix and sweep it away using a big mathematical broom. But this is an *extreme* approach to the problem because it *ignores* the structure that caused the correlation in the first place. In effect, GLS says "we do not care about explaining the dependency structure, we just know that it is there and we want it *gone*". GLS takes a *sledgehammer* to the problem. Sometimes a sledgehammer is the best tool for the job, and sometimes we want something more elegant that can capture more subtle nuances of our data. This requires the model to *know* how the data are structured so that the model knows *where* the correlation actually comes from. This is precisely what mixed-effects models do.

If we go back to the beginning of all this discussion about dependency, you should remember that the correlation structure exists *as a consequence* of how the data are structured. We have datapoints that share a commonality because they come from the same subject. The correlation is therefore a *byproduct* of this structure. As such, if our model somehow *embedded* this structure, the correlations would be accounted for *automatically* without any explicit declaration of the form of the variance-covariance matrix. So rather than reasoning about the precise form of this matrix, we simply state the structure of the data and leave the rest up to the model. This is much more elegant than simply guessing the covariance structure, estimating it and then removing it. Furthermore, the model can use the data structure to its advantage. Information could be pooled to provide much more subtle and nuanced predictions that reflect the hierarchy under which the data were sampled. This is not something that GLS is capable of doing, but is *precisely* what mixed-effects models are able to do. So, far from solving the inferential problems of GLS, mixed-effects models actually provide a more sophisticated alternative that does not require us to focus on the covariance structure as an explicit element of the model. We simply express how the data are structured, and everything else is taken care of.
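
To make the idea of the covariance structure arising as a *byproduct* more concrete, here is a minimal sketch under the simplest possible assumption: each subject contributes a single random shift to all of their measurements. The notation is introduced purely for illustration and is not taken from earlier in the lesson: $y_{ij}$ is measurement $j$ from subject $i$, $\mu_{j}$ is the mean of condition $j$, $b_{i}$ is the subject-specific shift and $\epsilon_{ij}$ is the residual error, with $b_{i}$ and $\epsilon_{ij}$ assumed independent.

```{math}
\begin{aligned}
y_{ij} &= \mu_{j} + b_{i} + \epsilon_{ij}, \qquad b_{i} \sim \mathcal{N}\left(0, \sigma^{2}_{b}\right), \qquad \epsilon_{ij} \sim \mathcal{N}\left(0, \sigma^{2}\right) \\
\text{Var}\left(y_{ij}\right) &= \sigma^{2}_{b} + \sigma^{2} \\
\text{Cov}\left(y_{ij}, y_{ij'}\right) &= \sigma^{2}_{b} \qquad (j \neq j') \\
\text{Cov}\left(y_{ij}, y_{i'j'}\right) &= 0 \qquad (i \neq i')
\end{aligned}
```

Under these assumptions, nothing about the covariance matrix was declared directly. Simply stating that measurements from the same subject share $b_{i}$ implies a constant within-subject correlation of $\sigma^{2}_{b} / (\sigma^{2}_{b} + \sigma^{2})$ and independence between subjects.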

## When Should We Use GLS?
So, taking everything we have discussed in this lesson, our final question is: when should we use GLS? Answering this is somewhat tricky when we have not covered mixed-effects as the alternative. However, even at this stage, we can provide recommendations around when GLS may be the *better* choice. Oftentimes, if you have a repeated measures or longitudinal dataset, you can start by *considering* GLS and then asking whether mixed-effects actually offers any advantages. If it does not, or if the additional complications that arise from a mixed-effects model do not seem worth it, then GLS is the obvious solution. So, we can conceptualise GLS as something of a *fall-back* position. However, this does the method something of a disservice as there are plenty of situations where GLS is the more obvious choice:

1. When we have repeated measurements, but no additional structure to the data. For instance, a single time-series from a single subject measured using approaches such as eye-tracking, EEG or fMRI.
2. When we genuinely only have a single value per-subject and per-repeated measurement. In these cases, mixed-effects models do not have enough variation to work with and will implicitly rely on a very simple covariance structure. If we want an unconstrained covariance structure, we need to use GLS.
3. If we really do not care about *where* the dependency comes from and just want the simplest approach to removing it.
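
As a rough sketch of the second situation, the code below fits an unconstrained covariance structure using `gls()` when there is only one value per-subject and per-repeated measurement. The data are simulated and the names `dat`, `subject`, `time` and `y` are hypothetical, chosen only for illustration. The combination of `corSymm()` (a general correlation matrix) with `varIdent()` (a separate variance per time point) is one way of asking `nlme` for an unconstrained structure, not necessarily the only way.

```r
library(nlme)

# Simulated data (illustrative assumption): 30 subjects, 3 repeated
# measurements, exactly one value per subject and per measurement
set.seed(1)
n_subj <- 30
n_time <- 3
dat <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_time)),
  time    = factor(rep(seq_len(n_time), times = n_subj))
)

# A shared subject-level shift induces within-subject correlation
subj_shift <- rnorm(n_subj)
dat$y <- subj_shift[as.integer(dat$subject)] +
  0.5 * as.integer(dat$time) +
  rnorm(nrow(dat))

# Unconstrained covariance: a general correlation matrix within each
# subject (corSymm) plus a different variance at each time point (varIdent)
fit <- gls(
  y ~ time,
  data        = dat,
  correlation = corSymm(form = ~ 1 | subject),
  weights     = varIdent(form = ~ 1 | time)
)

# Estimated 3 x 3 covariance matrix for the first subject
vc <- getVarCov(fit)
print(vc)
```

The `form = ~ 1 | subject` argument tells `gls()` which rows belong together, using the order of the rows within each subject to define the position in the correlation matrix. This relies on the data being sorted consistently by `time` within each `subject`, as they are in this simulated example.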

In a way, GLS can serve as the most straightforward plug-in replacement for the repeated measures ANOVA. It effectively does what the repeated measures ANOVA tries to do, but in a much more flexible way. So, if you have very basic data from a simple repeated measures experiment, GLS is often *all you need*. However, if you have more sophisticated data with multiple replications per-subject and per-repeated measurement and you want the model to combine all this information in a way that is difficult to conceptualise as a single covariance matrix, mixed-effects are the way to go. 

An example might make this clearer. Imagine a study where you are taking multiple measures from each student across different classes and across different schools. We can imagine that the repeated measures from a single student will be correlated, but so too could measures taken from students *in the same class*. Furthermore, all classes *from the same school* may well share some element that makes them correlated. So the different schools are all *independent*, but classes within the same school, students within the same class and measurements from the same student will all be *dependent*. So how would we express this complex dependency structure as a single covariance matrix? Do you think you could write this down? Do you think you could tell `gls()` how to construct this? Using a mixed-effects approach, we simply embed this hierarchy of measurement in the model and the model will create the covariance structure automatically. We do not even need to think about it. So this is where mixed-effects have a *real advantage*.
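
To give a feel for what the model would be constructing on our behalf, here is a hedged sketch of the covariance implied by this hierarchy under the simplest assumption: a single independent random shift at each level. The notation is hypothetical and introduced only for this illustration: $y_{scpm}$ is measurement $m$ from student $p$ in class $c$ of school $s$, with shifts $a_{s}$ (school), $b_{sc}$ (class) and $d_{scp}$ (student) that have variances $\sigma^{2}_{a}$, $\sigma^{2}_{b}$ and $\sigma^{2}_{d}$, and residual error $\epsilon_{scpm}$ with variance $\sigma^{2}$.

```{math}
\begin{aligned}
y_{scpm} &= \mu + a_{s} + b_{sc} + d_{scp} + \epsilon_{scpm} \\
\text{Var}\left(y_{scpm}\right) &= \sigma^{2}_{a} + \sigma^{2}_{b} + \sigma^{2}_{d} + \sigma^{2} \\
\text{Cov}\left(y_{scpm}, y_{scpm'}\right) &= \sigma^{2}_{a} + \sigma^{2}_{b} + \sigma^{2}_{d} \qquad \text{(same student)} \\
\text{Cov}\left(y_{scpm}, y_{scp'm'}\right) &= \sigma^{2}_{a} + \sigma^{2}_{b} \qquad \text{(same class, different student)} \\
\text{Cov}\left(y_{scpm}, y_{sc'p'm'}\right) &= \sigma^{2}_{a} \qquad \text{(same school, different class)} \\
\text{Cov}\left(y_{scpm}, y_{s'c'p'm'}\right) &= 0 \qquad \text{(different school)}
\end{aligned}
```

Even in this simplified case, the full covariance matrix is block-diagonal by school, with nested blocks for classes and students inside each school. Writing this out by hand for `gls()` would be laborious, whereas here it follows directly from stating which shifts each measurement shares.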