# Introduction to Mixed-effects Models
We will start with a brief high-level discussion about mixed-effects models. Because these types of model can be quite complex, we will just talk very generally about a few key topics, before we turn to the actual theory in the next part of the lesson. 

## Multilevel vs Hierarchical vs Mixed-effects
Perhaps one of the more confusing aspects of mixed-effects models is that they have *multiple names*. Worse than this, those multiple names come with different ways of *writing* and *thinking about* these models, to the extent that naive researchers sometimes do not even realise they are the same thing. So, it is important that we establish this from the start: a mixed-effects model can also be called a *multilevel model* or a *hierarchical linear model*. This should be burned into your brain by the end of this lesson.

Although it is typical to teach these models from only *one* of these perspectives, we are taking the approach of teaching them *together*. Though this sounds challenging, there is good reason for doing so:

- Multilevel and hierarchical refer to the *same* perspective, so there are only *two* ways of looking at these models, not *three*.
- Both perspectives have their merit and it is likely you will find one of them more intuitive than the other. If you know *both*, you can choose.
- Different software packages require different perspectives. If you know both, you can use whichever package you want.

From the perspective of this lesson, there is a preference for *thinking about* these models as multilevel/hierarchical, but then *implementing* them as mixed-effects. This involves building the model conceptually as a multilevel/hierarchical model, but then translating it into mixed-effects form for the purpose of fitting in software. You may disagree, but that is part of the point of presenting *both* perspectives.
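To make this translation a little more concrete, here is a minimal sketch using a simple random-intercept model for a hypothetical subject $j$ measured on occasion $i$, with a single predictor $x_{ij}$. The notation here is our own illustration and may differ from that used in the theory part of the lesson. The multilevel form writes a separate equation at each level. Substituting the level-2 equations into the level-1 equation gives a single mixed-effects equation, in which $\gamma_{00} + \gamma_{10}x_{ij}$ forms the *fixed* part and $u_{0j}$ is the *random* subject effect.

```{math}
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \epsilon_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} \\
\text{Combined:}\quad & y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + u_{0j} + \epsilon_{ij}
\end{aligned}
```

In `nlme`, the combined form would be written as `lme(y ~ x, random = ~ 1 | subject)`, where `y`, `x` and `subject` are hypothetical column names.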

In addition, we will use the term *multilevel* throughout. The term *hierarchical* is entirely equivalent, so if anyone ever talks to you about *hierarchical linear models*, they are talking about *multilevel models*. Our preference for *multilevel* comes down to it being (a) more directly descriptive of the model form and (b) easier to say. 

## Estimation of Mixed-effects Models
Before we even get to understanding mixed-effects models, it is important to realise that these types of models can be *computationally challenging*. In general, the examples we provide will not push the computer very hard and will be very quick to fit. However, in real datasets, things can be more challenging. All the software we will present leverages iterative maximum likelihood methods for finding the model parameters. This is important to understand for several reasons:

- Iterative algorithms can fail if they have not converged on a solution within the default maximum number of iterations.
- We are not finding closed-form solutions here; we are using the computer to search for the "best" solution in an unknown landscape of possibilities.
- Failure can stem from a *structural* problem with the model or from a *computational* problem, caused either by the algorithm itself or by the computer running out of memory.

You should not run into any problems using the examples in these lessons because they are chosen to be *simple*. However, in the real world, using large datasets, these problems can emerge. This is precisely because we have left the world of least-squares certainty and have entered the world of iterative optimisation algorithms.
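As a minimal sketch of what this looks like in practice, the example below fits a small model with `nlme` while explicitly setting the iteration limits through `lmeControl()` and asking the optimiser to print its progress. It assumes only the `nlme` package (shipped with `R`) and its built-in `Orthodont` data; the particular limits of 100 are illustrative, not recommendations.

```r
library(nlme)
data("Orthodont", package = "nlme")

# lmeControl() sets the limits used by the iterative fitting algorithm:
#   maxIter   = maximum iterations of the overall fitting algorithm
#   msMaxIter = maximum iterations of the inner optimisation step
#   msVerbose = print the optimiser trace at each step
ctrl <- lmeControl(maxIter = 100, msMaxIter = 100, msVerbose = TRUE)

# The printed trace shows the algorithm searching for the best
# parameter values, rather than computing them in closed form
fit <- lme(distance ~ age, random = ~ 1 | Subject,
           data = Orthodont, control = ctrl)

summary(fit)
```

If a fit does fail to converge, `lmeControl()` also has a `returnObject = TRUE` option that returns the last fit with a warning instead of stopping with an error. This can help with diagnosis, but the resulting estimates should be treated with caution.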

In addition, most mixed-effects software will have the option of finding parameters using either *maximum likelihood* (ML) or *restricted maximum likelihood* (REML). As we know from last semester, ML is *biased* when estimating variance terms because it treats the estimated mean structure as if it were *known*. REML corrects for this by accounting for the uncertainty in estimating the mean structure. So, should we always use REML? In general, yes. However, there is a catch. Because REML works with the data after the fixed effects have been removed, models with different fixed effects are effectively fitted to different data, so their REML likelihoods cannot be compared. As such, we need to make sure we are using ML when performing model comparisons that involve the fixed effects. We will discuss this in more detail in the associated workshop.
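A minimal sketch of this workflow in `nlme`, assuming its built-in `Orthodont` data: REML (the default for `lme()`) is used for the final variance estimates, while ML fits are used to compare models that differ in their fixed effects. If you try the same comparison with two REML fits, `nlme` warns that REML comparisons of models with different fixed effects are not meaningful.

```r
library(nlme)
data("Orthodont", package = "nlme")

# REML (the default in lme()) for final variance estimates
fit_reml <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "REML")

# ML fits for comparing models that differ in their fixed effects
fit0_ml <- lme(distance ~ age, random = ~ 1 | Subject,
               data = Orthodont, method = "ML")
fit1_ml <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
               data = Orthodont, method = "ML")

# Likelihood-ratio test of the added Sex effect (both models fitted by ML)
lrt <- anova(fit0_ml, fit1_ml)
lrt

# The variance components of the same model differ between REML and ML
VarCorr(fit_reml)
VarCorr(fit0_ml)
```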

## Inference in Mixed-effects Models
Another important aspect of mixed-effects models is that, much like GLS, inference remains *approximate*. This is for all the same reasons we have discussed in previous weeks. Because we are moving to models with more general covariance structures, the inferential framework breaks in exactly the same way. Mixed-effects models do not save us from this. So we are back to the options presented previously:

1. Ignore the problem.
2. Ignore the problem, but only under the assumption that we have so much data that it does not matter.
3. Construct fictitious degrees of freedom as a proxy for our uncertainty.

One advantage of mixed-effects models is that software development has tended to focus on them rather than on methods like GLS, so there can be more options available. For instance, the effective degrees of freedom method developed by [Kenward & Roger (1997)](https://www.jstor.org/stable/2533558) is available in the `R` package `pbkrtest`, but only for mixed-effects models[^lme4-foot]. This is despite the method itself being more general-purpose. Mixed-effects models also embed the structure of the data within the model, because this is *where* the covariance structure comes from. This can allow for better approximations to the degrees of freedom, because certain structural elements, such as the number of subjects, are known parts of the model. This is not true of GLS. GLS has no idea where the covariance structure has come from; it just takes what we give it and dutifully removes it. So knowledge of the data structure is a helpful addition. Of course, if we do not care for effective degrees of freedom, then these advantages are largely moot.
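As a minimal sketch of the Kenward-Roger approach, the example below uses `KRmodcomp()` from `pbkrtest` to test a fixed effect by comparing a full and a reduced model fitted with `lme4`. It assumes both packages are installed from CRAN and uses the `sleepstudy` data shipped with `lme4` (reaction times for 18 subjects measured over 10 days). The printed output reports an F statistic together with the denominator degrees of freedom (`ddf`) approximated by the Kenward-Roger method.

```r
library(lme4)
library(pbkrtest)
data("sleepstudy", package = "lme4")

# Full and reduced models, both fitted by REML (the lmer() default)
full    <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
reduced <- lmer(Reaction ~ 1    + (1 | Subject), data = sleepstudy)

# F-test for the Days effect using Kenward-Roger effective degrees of freedom
kr <- KRmodcomp(full, reduced)
kr
```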

## Mixed-effects Applied to Repeated Measurements
In what follows, we will mainly be looking at mixed-effects models through the lens of *repeated measurements*. This is because repeated measurements are the most common situation in which mixed-effects models are applied in experimental psychology and cognitive neuroscience. However, as we will see next week, the power of mixed-effects models is that they can be applied to a variety of situations where the data has additional structure that we want to accommodate. This includes, but is not limited to, repeated measurement designs. So remember, mixed-effects models are much *broader* than repeated measures. We are just limiting the discussion for the moment to keep things clear.

On the topic of repeated measurements, this is also where terminology can start to become unhelpful. For instance, to a traditionally trained psychologist, a mixed ANOVA is one that contains both *within-subject* and *between-subjects* factors. To a statistician, a mixed ANOVA refers to a mixed-effects model. The problem? A mixed ANOVA *is* a mixed-effects model, but the term "mixed" is being used in two different ways. The psychologist means "mixed" in terms of the form of experimental manipulation, whereas the statistician means "mixed" in terms of the modelled effects. A statistician would refer to this type of experiment as a "split-plot" design, but this is not common terminology in psychology. So you can see how confusion can easily set in.

To get around this, we prefer to use the term "repeated measures" ANOVA *generically* for all situations where repeated measures are involved. We make no distinction between models that contain only within-subject manipulations and models that contain both within-subject and between-subjects manipulations. These are elements of *experimental design* that do not necessarily require further qualification when describing the statistical model. If you feel strongly about keeping the term "mixed", we would suggest you always qualify it to make your meaning clear. For instance, a "mixed-design" ANOVA as opposed to a "mixed-effects" ANOVA.
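To illustrate that a "mixed-design" ANOVA is also a mixed-effects model, here is a minimal sketch using the `Orthodont` data from `nlme` (jaw measurements on 27 children at ages 8, 10, 12 and 14). Age is treated as a categorical within-subject factor and sex as a between-subjects factor, with a random intercept for each child. A random intercept alone implies equal correlation between all pairs of measurements from the same child, which is the compound-symmetry assumption behind the traditional repeated measures ANOVA, so for balanced data like these we would expect the F-tests to be closely related to those from a traditional analysis.

```r
library(nlme)
data("Orthodont", package = "nlme")

# Work with a plain data frame and treat age as a categorical factor
ortho <- as.data.frame(Orthodont)
ortho$age_f <- factor(ortho$age)

# Within-subject factor (age_f), between-subjects factor (Sex) and their
# interaction, with a random intercept for each child (Subject)
fit <- lme(distance ~ age_f * Sex, random = ~ 1 | Subject, data = ortho)

# ANOVA-style table of F-tests for the fixed effects
tab <- anova(fit)
tab
```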

## Packages for Mixed-effects
As a final point in this section, we need to address the fact that there are *two* main packages used in `R` for fitting mixed-effects models. This is unfortunate because it adds a degree of confusion that we could do without. This situation is largely a *historical* one. However, neither package can be considered superior to the other, as each has its own *advantages* and *disadvantages*. Unfortunately, no single *uber package* exists that combines the advantages of both. Until that day, you need to make an informed decision about which to use for a given analysis.

### `nlme`
The older of the two packages is `nlme`. We have used this package already for the `gls()` function, but it also provides the `lme()` function for fitting mixed-effects models. This package is fully described in the associated textbook *Mixed-effects Models in S and S-Plus* by [Pinheiro & Bates (2000)](https://link.springer.com/book/10.1007/b98882). As the title suggests, this package actually *pre-dates* the widespread adoption of `R`, having originally been written for `S` (for *statistics*), the language that preceded `R`, and `S-Plus` (a commercial implementation of `S`). Although older, `nlme` has several advantages over the more recent `lme4`, which will become more apparent as we move through the materials. For now, the biggest advantage is that we can add a `weights=` term to a mixed-effects model in `nlme` to accommodate heterogeneity of variance. So, if we do have between-subjects manipulations, we can model different variances for the different groups. In many situations, this is very important. For instance, in clinical studies our between-subjects manipulation may correspond to *patients* vs *controls*. In that situation, different degrees of variability could be *characteristic* of the condition under study, and ignoring this could bias our inference. The ability to accommodate this makes `nlme` an attractive prospect, despite the age of the package.
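A minimal sketch of the `weights=` approach, assuming the built-in `Orthodont` data from `nlme` with `Sex` standing in for a between-subjects grouping. The `varIdent()` variance function allows a separate residual variance for each level of `Sex`, and the two REML fits can be compared directly because they share identical fixed effects.

```r
library(nlme)
data("Orthodont", package = "nlme")

# Homogeneous model: a single residual variance shared by both groups
fit_hom <- lme(distance ~ age * Sex, random = ~ 1 | Subject,
               data = Orthodont)

# Heterogeneous model: varIdent() estimates a separate residual variance per Sex
fit_het <- lme(distance ~ age * Sex, random = ~ 1 | Subject,
               data = Orthodont,
               weights = varIdent(form = ~ 1 | Sex))

# Estimated residual SD ratio relative to the reference group
coef(fit_het$modelStruct$varStruct, unconstrained = FALSE)

# Same fixed effects in both REML fits, so this comparison is valid
comparison <- anova(fit_hom, fit_het)
comparison
```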

### `lme4`
Unlike `nlme`, the `lme4` package was developed much more recently, directly for `R`. Its focus is on several areas where `nlme` falls down. Firstly, by implementing much of the computation in `C++` and then interfacing with `R`, `lme4` gets a significant speed boost over `nlme`. This can make model fitting more practical and less prone to failure with large and complex datasets. Secondly, `lme4` readily supports *crossed random-effects*, which `nlme` cannot fit in any straightforward way. This is a much more advanced topic, so do not worry about it for now. Just know that there are certain types of model that are impractical to fit with `nlme`. Finally, `lme4` allows mixed-effects models for non-normal outcome variables via an implementation of *generalised linear mixed-effects models*. Again, this is a much more advanced topic, but it is worth remembering because this is something that is *not possible* using `nlme`. However, the main *disadvantage* of `lme4` is that it has no facility for accommodating between-subjects variance differences. So while `lme4` has several modelling and computational advantages, for more basic usage where accommodating heterogeneity of variance is important, `nlme` may be more applicable. This is why we will be largely sticking with `nlme` for all our examples. However, it is worth knowing that the syntax of these two packages is *very similar*, so once you understand one of them, it is not particularly difficult to switch to the other.
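To show how similar the syntax is, here is a minimal sketch fitting the same random-intercept model in both packages, assuming `lme4` is installed and using the `Orthodont` data from `nlme`. The main difference is that `nlme` takes the random part as a separate `random=` argument, whereas `lme4` writes it inside the model formula. Both default to REML, so the fixed-effect estimates should agree closely.

```r
library(nlme)
library(lme4)
data("Orthodont", package = "nlme")
ortho <- as.data.frame(Orthodont)

# nlme: fixed and random parts are given as separate arguments
fit_nlme <- nlme::lme(distance ~ age, random = ~ 1 | Subject, data = ortho)

# lme4: the random part is written inside the model formula
fit_lme4 <- lme4::lmer(distance ~ age + (1 | Subject), data = ortho)

# Compare the fixed-effect estimates from the two packages
nlme::fixef(fit_nlme)
lme4::fixef(fit_lme4)
```

Using the package prefixes (`nlme::`, `lme4::`) avoids any ambiguity when both packages are loaded at the same time.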

## Further Reading

```{figure} images/lme-textbook-front.png
---
scale: 15%
align: right
---
```

Finally, it is worth saying that the topic of mixed-effects models can be quite complicated and difficult to get your head around. We will be trying to build intuition as best we can throughout these materials. However, it is often advantageous to have more examples and explanations available. For that reason, we recommend the book *Linear Mixed Models: A Practical Guide Using Statistical Software* by [West, Welch and Galecki (2022)](https://www.taylorfrancis.com/books/mono/10.1201/9781003181064/linear-mixed-models-brady-west-kathleen-welch-andrzej-galecki), which is now in its 3rd edition. This book is unique because it contains the analysis of several different types of data structure using *all* major software packages. This means each chapter contains examples using `nlme` and `lme4` in `R`, *as well as* `SPSS`, `STATA`, `SAS` and `HLM`. So you can easily see how mixed-effects procedures work across different software platforms. Although written by statisticians, and so quite technical in places, the guided analysis examples can be highly informative, especially as they show you *how a statistician would perform the analysis*. The analysis of these different data structures will also be the basis of the examples next week, so we will be returning to this book then.

`````{topic} What do you now know?
In this section, we have explored ... . After reading this section, you should have a good sense of:

- ...
- ...
- ...

`````

[^lme4-foot]: Specifically, mixed-effects models fit using the `lme4` package, rather than fit using `nlme`.