# Mixed-effects models

Sections:
* Fixed effects vs. random effects
* Random effects vs. nuisance variables

This lecture draws from Chapter 1 in Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version, 1(7), 1-23. We also extend some ideas raised in Yarkoni, T. (2019). The generalizability crisis. PsyArXiv.

---
# 1. Fixed effects vs. random effects

Let's consider the general form of a statistical model.

$$ Y = f(X) + \epsilon $$

In everything we've learned so far, we have treated variance in $Y$ that can't be explained by our factors of interest as noise (i.e., $\epsilon$). This is variance in $Y$ that comes from other sources. Typically we think of this as _random_ noise, but sometimes it's systematic. Systematic sources of variance include things like batch effects, where the way in which you sampled your data has an impact on $Y$.

For example, let's say that you are looking at the effects of distractibility on test performance. You test groups of students in different classrooms where you manipulate the presence of visually distracting stimuli on the walls (e.g., posters). Half the classrooms have bare walls, the other half have a lot of visually distracting images. Each classroom takes a timed math test.

In this example, $Y$ is performance on the test and our effect of interest $X$ is a binary variable indicating whether a classroom has distracting information on the walls. Even though the experimenter randomly assigned classrooms to conditions, it is reasonable to assume that there is natural variability in performance driven by the classroom itself (e.g., variability in the students, teacher performance). In this case we consider classroom to be a _batch_ variable, or more specifically we consider the effect of classroom on test performance to be _random_.

[_Mixed-effects models_](https://en.wikipedia.org/wiki/Mixed_model), or simply mixed models, are variations on the linear regression model that attempt to account for these batch effects. Essentially they break the problem down into two parts.

* **Fixed effects:** Variables whose relationship to the response variable is stationary and that covary with $Y$ in a meaningful way.

* **Random effects:** Categorical variables that are not of primary interest but have a systematic influence on the response variable.

In this case, we write the full model as

$$ Y = X \beta + Z \upsilon + \epsilon $$

Here the fixed effects are described by $X \beta$ and are the same types of effects that we have discussed in the earlier lectures on linear regression. The random effects are described by $Z \upsilon$ where $Z$ is another design matrix like $X$, but indicating the categories of the batch variables that define the random effects. In our example, $Z$ would indicate the different classrooms and $\upsilon$ represents the influence that each classroom has on test scores.
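To make the two design matrices concrete, here is a minimal sketch that builds $X$ and $Z$ for the classroom example and simulates $Y = X \beta + Z \upsilon + \epsilon$. The number of classrooms, the number of students, and every numeric value for $\beta$, $\upsilon$, and $\epsilon$ are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_classrooms = 6   # hypothetical number of classrooms
n_students = 20    # hypothetical number of students per classroom
classroom = np.repeat(np.arange(n_classrooms), n_students)  # classroom label per student

# Fixed-effects design matrix X: an intercept column plus the binary
# "distracting walls" indicator (odd-numbered classrooms have posters).
distracting = (classroom % 2).astype(float)
X = np.column_stack([np.ones(classroom.size), distracting])

# Random-effects design matrix Z: one indicator column per classroom.
Z = np.eye(n_classrooms)[classroom]

beta = np.array([75.0, -5.0])                        # assumed mean score and poster effect
upsilon = rng.normal(0.0, 4.0, size=n_classrooms)    # assumed classroom-level shifts
epsilon = rng.normal(0.0, 8.0, size=classroom.size)  # assumed student-level noise

Y = X @ beta + Z @ upsilon + epsilon

print("X shape:", X.shape, "| Z shape:", Z.shape)
```

Each row of $Z$ contains a single 1 that picks out the student's classroom, so $Z \upsilon$ simply adds that classroom's shift to every student in it.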

The random effect variable ($Z$) can influence $Y$ in two separate ways. 

* First it can cause a _shift in the mean_ of $Y$. For example, regardless of the presence of distracting posters on the walls, the average test performance of certain classrooms will be higher than others.
* The other way that the random effect variable can influence $Y$ is by _moderating the slope of the fixed effects_. This makes more sense in the context where $X$ is a continuous, quantitative variable. As $X$ varies around its mean, the slope of its relationship with $Y$ can be impacted by the random effect variable as well.
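The two cases above can be written out for student $i$ in classroom $j$. The subscripted symbols $\upsilon_{0j}$ and $\upsilon_{1j}$ are notation introduced here for illustration (they stand for the classroom-specific deviations in the intercept and the slope, respectively). A shift in the mean alone is a random intercept:

$$ y_{ij} = (\beta_0 + \upsilon_{0j}) + \beta_1 x_{ij} + \epsilon_{ij} $$

Allowing the classroom to also moderate the slope adds a random slope:

$$ y_{ij} = (\beta_0 + \upsilon_{0j}) + (\beta_1 + \upsilon_{1j}) x_{ij} + \epsilon_{ij} $$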

In many ways, mixed-effects models are similar to ANCOVA models, but are more flexible.




---
# 2. Random effects vs. nuisance variables

Mixed models "control" for known sources of variability that can impact your estimation of the fixed effects (i.e., the effects that you are interested in). In this way they belong to a class of modeling approaches that attempt to account for factors that can confound your analysis. Thus they are often confused with an approach known as _nuisance signal regression_. Let's compare these two approaches to controlling for confounds in order to understand how they differ.

<br>

## Nuisance regression

Typically in nuisance signal regression, or just nuisance regression, we take advantage of the fact that predictors are often correlated (i.e., the assumption of no collinearity is violated) to account for other factors that may impact $Y$ and either produce a false effect of your variable of interest (i.e., a Type-I or false-positive error) or mask a true effect of your variable of interest (i.e., a Type-II or false-negative error).

For the moment let's say that $X_{real}$ is the set of factors that you are interested in and $X_{nuisance}$ are factors that may affect your estimation of the relationship $Y=f(X_{real})$. We say that we _control_ for the effects of $X_{nuisance}$ by simply adding them to the model. Let's consider the case where $X_{real}$ and $X_{nuisance}$ both have just 1 variable (i.e., $p = 1$ for both). Then our full model would be

$$ Y = \beta_0 + \beta_1 X_{real} + \beta_2 X_{nuisance} + \epsilon $$

If $X_{real}$ and $X_{nuisance}$ are correlated, but you **do not** account for $X_{nuisance}$, then your estimate of $\beta_1$ will absorb part of the nuisance effect. Specifically, it will reflect $\beta_1 + \beta_2 \delta$, where $\delta$ is the slope you would get from regressing $X_{nuisance}$ on $X_{real}$ (so it equals $\beta_1 + \beta_2$ only when $\delta = 1$). By including $X_{nuisance}$ in the model you are taking advantage of the additive assumption of linear systems. Thus you get a more accurate estimate of $\beta_1 X_{real}$ because $\beta_2 X_{nuisance}$ accounts for the correlated variance.

$$ \beta_1 X_{real} = Y - \beta_0 - \beta_2 X_{nuisance} - \epsilon $$

This boils down to the following principle: 

* **If $X_{nuisance}$ is correlated with both $X_{real}$ and $Y$, then including $X_{nuisance}$ in the model will give you the partial effect of $X_{real}$ on $Y$ given $X_{nuisance}$**.
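This principle can be checked numerically. The sketch below simulates a nuisance variable that is correlated with $X_{real}$ and compares the estimated slope on $X_{real}$ with and without the nuisance variable in the model. The coefficient values, the sample size, and the helper name `ols` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

x_real = rng.normal(size=n)
# The nuisance variable is constructed to be correlated with x_real:
# regressing x_nuisance on x_real gives a slope of about 0.5.
x_nuisance = 0.5 * x_real + rng.normal(size=n)

beta0, beta1, beta2 = 1.0, 2.0, 3.0  # assumed true coefficients
y = beta0 + beta1 * x_real + beta2 * x_nuisance + rng.normal(size=n)


def ols(columns, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


reduced = ols([x_real], y)           # nuisance variable omitted
full = ols([x_real, x_nuisance], y)  # nuisance variable included

print("x_real slope, nuisance omitted: ", round(reduced[1], 2))
print("x_real slope, nuisance included:", round(full[1], 2))
```

With these assumed values, the slope from the reduced model should land near $\beta_1 + 0.5 \beta_2 = 3.5$, while the slope from the full model should land near the true $\beta_1 = 2$.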

This is typically how researchers "control" for possible confounds. For example, if we take our classroom example above and want to control for the age of each child, we would include _age_ as a nuisance factor. But notice something important here. In this case $\beta_2 X_{nuisance}$ (or $\beta_2 X_{age}$ in our example) is actually a _fixed effect_. We think that the relationship between $X_{nuisance}$ and $Y$ is meaningful, not random.

Let's consider the mixed model more carefully now.

<br>

## Mixed models

Let's return to the form of a mixed model.

$$ Y = X \beta + Z \upsilon + \epsilon $$

By the nature of $Z \upsilon$ being random, we can't assume that for a unit change in $Z$ we will see a simple up or down effect on $Y$. This means that we can't use the same objective function that we used for a fixed effects model. Remember that the objective function for a fixed effects model (i.e., a linear regression model) is

$$ min(||Y-\beta_0-X\beta_1||^2) $$

If we extend this to the nuisance regression case above, it becomes

$$ min(||Y-\beta_0-X_{real}\beta_1 - X_{nuisance} \beta_2 ||^2) $$

However, in the mixed model case, $Z$ is assumed to have its own random structure. This means that the regression coefficients on the random effects ($\upsilon$) have a different variance than the regression coefficients on the fixed effects ($\beta$). Specifically we say that

$$ \upsilon \sim N(0,\Sigma_{\theta}) $$

In this case, the effects (i.e., $Z \upsilon$) are real but randomly determined.

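Thus the _objective function_ that is to be minimized is written in terms of a rescaled random effect $u$, defined by $\upsilon = \Lambda_{\theta} u$, where $\Lambda_{\theta}$ is a scaling matrix (the relative covariance factor) that depends on a set of parameters $\theta$:

$$min(||Y-X\beta-Z\Lambda_{\theta}u||^2+||u||^2)$$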

From our example at the beginning, the mean of Classroom 2 isn't higher or lower than the mean for Classroom 1 because the number 1 is smaller than the number 2. They are just categorical indicators that have their own sources of signal and noise, but for the purposes of your model those are completely determined by random chance.

Now if there is no structure in the random effects at all, then $\Sigma_{\theta}$ would just boil down to the noise variance (in the matrix case $\sigma^2 I$, where $\sigma^2$ is the variance of $\epsilon$ and $I$ is the identity matrix). But if the random effects are indeed real, then we have to estimate a separate matrix, the relative covariance factor $\Lambda_{\theta}$, that explains the random structure in $Z$. Thus we can estimate $\Sigma_{\theta}$ by

$$  \Sigma_{\theta} = \sigma^2 \Lambda_{\theta} \Lambda_{\theta}' $$

Because we have to estimate $\Lambda_{\theta}$ (and by extension $\Sigma_{\theta}$), we can't use the simple closed-form solution to the OLS regression problem in order to find both $\beta$ and $\upsilon$. This is what differentiates mixed-effects models from nuisance regression.
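To see what the penalty term does, here is a minimal sketch that solves the penalized least squares problem for a random-intercept version of the classroom example while holding $\theta$ fixed by hand. Several things here are assumptions: $\Lambda_{\theta} = \theta I$ (a single scalar $\theta$), the rescaled variable $u$ with $\upsilon = \Lambda_{\theta} u$, the simulated data, and the helper name `penalized_least_squares`. A real mixed-model fit would also optimize over $\theta$, which this sketch deliberately skips.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated classroom data (all values are assumptions for illustration).
n_classrooms, n_students = 6, 20
classroom = np.repeat(np.arange(n_classrooms), n_students)
distracting = (classroom % 2).astype(float)
X = np.column_stack([np.ones(classroom.size), distracting])
Z = np.eye(n_classrooms)[classroom]
y = (X @ np.array([75.0, -5.0])
     + Z @ rng.normal(0.0, 4.0, n_classrooms)
     + rng.normal(0.0, 8.0, classroom.size))


def penalized_least_squares(X, Z, y, theta):
    """Minimize ||y - X b - Z (theta * I) u||^2 + ||u||^2 for a fixed theta.

    The penalty is folded into an ordinary least squares problem by
    stacking an identity block under the design matrix.
    """
    n, p = X.shape
    q = Z.shape[1]
    Lam = theta * np.eye(q)
    A = np.block([[X, Z @ Lam],
                  [np.zeros((q, p)), np.eye(q)]])
    rhs = np.concatenate([y, np.zeros(q)])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    beta_hat, u_hat = sol[:p], sol[p:]
    return beta_hat, Lam @ u_hat  # return upsilon = Lambda_theta u


for theta in (0.0, 0.1, 5.0):
    beta_hat, upsilon_hat = penalized_least_squares(X, Z, y, theta)
    print(f"theta={theta}: ||upsilon|| = {np.linalg.norm(upsilon_hat):.3f}")
```

With $\theta = 0$ the classroom effects are forced to zero and $\beta$ matches plain OLS on $X$. As $\theta$ grows the penalty weakens and the estimated classroom effects are allowed to move further from zero, which is the shrinkage behavior that separates a random effect from a set of classroom dummy variables entered as fixed effects.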