# The Mixed-effects Framework
...

## The Hierarchical Perspective

... These are sometimes called *hierarchical* or *multilevel* models.


### Fitting a Model to a Single Subject
To begin with, let us imagine that we only have the data for a *single subject*. What kind of model could we fit?

... So, notice that if we allow the model to freely estimate the effect of `condition`, it will fit the data *perfectly* and there will be no error. This tells us that we cannot let this effect be unique to each subject because the data does not support it. So, in this instance, the data constrains us to the model

...

where $\alpha_{j}$ is a *constant* across all subjects, rather than being unique to subject $i$. So this is our final model, if we *only* had the data for this one subject. 
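
To see why the data cannot support a freely estimated effect of `condition` for one subject, here is a minimal sketch. It assumes a hypothetical subject measured once in each of three conditions; the data values and the names `one_subject` and `saturated` are invented for illustration.

```r
# Hypothetical data: one subject, one observation per condition
one_subject <- data.frame(
  condition = factor(c("A", "B", "C")),
  y         = c(10.2, 12.9, 11.4)
)

# Freely estimating the condition effect uses one parameter per observation
saturated <- lm(y ~ condition, data = one_subject)

max_abs_resid <- max(abs(residuals(saturated)))  # numerically zero
resid_df      <- df.residual(saturated)          # no degrees of freedom left for error

print(max_abs_resid)
print(resid_df)
```

With three observations and three parameters the fitted values reproduce the data, so nothing is left over to estimate an error variance.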

Now, let us imagine that we ran the experiment *again* and collected data from a different subject. What do we think would change? Well, certainly the term $\mu_{i}$. We have different data now, so this subject's overall mean is almost certainly going to change. What else? Well, the errors will also change, as we would expect. What would not change is $\alpha_{j}$, because we have assumed that this is *constant* across subjects. So, we have $\mu_{i}$, which will differ with each new subject, and $\eta_{ij}$, which will differ with each observation within each subject.

What does this mean for both $\mu_{i}$ and $\eta_{ij}$? Well, what do we call a variable that changes every time we observe it? A *random variable*. So, both $\mu_{i}$ and $\eta_{ij}$ are *random variables*. This means that they *both* have some underlying distribution that they are drawn from. As we know, the $\eta_{ij}$ are *errors* and thus reflect *deflections* around the expected value. As such, their distribution is the same as it always was 

$$
\eta_{ij} \sim \mathcal{N}\left(0, \sigma^{2}_{w}\right).
$$

But what about the $\mu_{i}$? Well, as written above, these are *means* for each subject, so their expected value will not be 0. Instead, it will be whatever the *population grand mean* happens to be. Their variance will then represent the variability of the subject means, which we will call $\sigma^{2}_{b}$. As such

$$
\mu_{i} \sim \mathcal{N}\left(\mu, \sigma^{2}_{b}\right).
$$

So our full model is now

$$
\begin{alignat*}{1}
    y_{ij}        &=    \mu_{i} + \alpha_{j} + \eta_{ij}           \\
    \mu_{i}       &\sim \mathcal{N}\left(\mu,\sigma^{2}_{b}\right) \\
    \eta_{ij}     &\sim \mathcal{N}\left(0,\sigma^{2}_{w}\right)   \\
\end{alignat*}
$$

For each subject we observe, we have a subject-specific intercept $\mu_{i}$, a constant effect of condition $j$ (as constrained by the data) and an observation-specific error $\eta_{ij}$.
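
One way to make this model concrete is to generate data from it. The sketch below is only an illustration: the parameter values (`mu`, `sigma_b`, `sigma_w`, `alpha`), the number of subjects and the data frame name `sim` are all assumptions, not values from any real experiment.

```r
set.seed(1)

# Hypothetical values, chosen only for illustration
n_subjects <- 200
mu         <- 50           # population grand mean
sigma_b    <- 10           # between-subject standard deviation
sigma_w    <- 5            # within-subject standard deviation
alpha      <- c(-3, 0, 3)  # condition effects, identical for every subject
n_cond     <- length(alpha)

# One draw of mu_i per subject
mu_i <- rnorm(n_subjects, mean = mu, sd = sigma_b)

# One row per subject-by-condition observation
sim <- expand.grid(condition = seq_len(n_cond), subject = seq_len(n_subjects))

# One draw of eta_ij per observation
sim$eta <- rnorm(nrow(sim), mean = 0, sd = sigma_w)

# y_ij = mu_i + alpha_j + eta_ij
sim$y <- mu_i[sim$subject] + alpha[sim$condition] + sim$eta

head(sim)
```

Note that `mu_i` is drawn once per subject and reused for all of that subject's observations, whereas `eta` is drawn afresh for every row.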


### Fitting Individual Linear Models in `R`
Each time we do this, we will collect the estimates of $\mu_{i}$ and $\eta_{ij}$, just to demonstrate that these are the elements that truly *do* change with each new subject. We will then show the distribution of these at the end to illustrate that these terms are indeed *random variables*.
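
As a rough sketch of this procedure, the loop below simulates one new subject at a time, fits a linear model to that subject alone and stores the estimates. It rests on several assumptions that are not part of the text: the data are simulated with hypothetical parameter values, and $\alpha_{j}$ is treated as known and held fixed through `offset()`, which is just one way of keeping it constant across subjects.

```r
set.seed(2)

# Hypothetical values, chosen only for illustration
n_subjects <- 100
mu         <- 50
sigma_b    <- 10
sigma_w    <- 5
alpha      <- c(-3, 0, 3)  # assumed known and constant across subjects

mu_hat  <- numeric(n_subjects)
eta_hat <- matrix(NA_real_, nrow = n_subjects, ncol = length(alpha))

for (i in seq_len(n_subjects)) {
  # "Collect" a new subject: one observation per condition
  subj       <- data.frame(condition = seq_along(alpha))
  subj$alpha <- alpha[subj$condition]
  subj$y     <- rnorm(1, mu, sigma_b) + subj$alpha + rnorm(nrow(subj), 0, sigma_w)

  # Intercept-only model for this subject, with alpha_j held fixed
  fit <- lm(y ~ 1 + offset(alpha), data = subj)

  mu_hat[i]    <- coef(fit)[["(Intercept)"]]  # estimate of mu_i
  eta_hat[i, ] <- residuals(fit)              # estimates of eta_ij
}

summary(mu_hat)
summary(as.vector(eta_hat))
```

A histogram such as `hist(mu_hat)` should show the intercept estimates scattered around `mu`, while the residuals scatter around 0. The spread of `mu_hat` will be somewhat larger than `sigma_b`, because each estimate also carries sampling error from the within-subject noise.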

### The Hierarchical Model

... Given the model above, we can start to play around with how it is written. This will lead us directly to the *multilevel* way of viewing these models.

Remembering back to how we can specify a linear model in terms of either a distribution on the *outcome variable* or a distribution on the *errors*, we can do exactly the same thing with the $\mu_{i}$ terms. At present, this is expressed as a distribution on the outcome variable (all the different observed values of $\mu_{i}$). But we can separate this out like so

$$
\begin{alignat*}{1}
    y_{ij}    &= \mu_{i} + \alpha_{j} + \eta_{ij}        \\
    \mu_{i}   &= \mu + S_{i}                              \\
    S_{i}     &\sim \mathcal{N}\left(0,\sigma^{2}_{b}\right) \\
    \eta_{ij} &\sim \mathcal{N}\left(0,\sigma^{2}_{w}\right) \\
\end{alignat*}
$$

So we have written $\mu_{i}$ in terms of its *mean function* and a new error term $S_{i}$. This is *exactly* the multilevel/hierarchical perspective. We now have *two levels* of variation that we are modelling. Level 1 expresses the model for an individual subject and Level 2 expresses the model for the *group* of subjects. We could write this like so

$$
\begin{alignat*}{2}
\text{Level 1 (Subject)}\quad&
\begin{cases}
    y_{ij}    = \mu_{i} + \alpha_{j} + \eta_{ij}            \\
    \eta_{ij} \sim \mathcal{N}\left(0,\sigma^{2}_{w}\right)
\end{cases} \\
\text{Level 2 (Group)}\quad&
\begin{cases}
    \mu_{i} = \mu + S_{i}                                 \\
    S_{i}   \sim \mathcal{N}\left(0,\sigma^{2}_{b}\right)
\end{cases}
\end{alignat*}
$$

## From Hierarchy to Mixed-effects
... So, the key here is recognising that we can collapse the two equations

$$
\begin{alignat*}{1}
    y_{ij}    &= \mu_{i} + \alpha_{j} + \eta_{ij}  \\
    \mu_{i}   &= \mu + S_{i}                       \\
\end{alignat*}
$$

After all, we can see exactly what $\mu_{i}$ is equal to. So let us replace $\mu_{i}$ in the first equation with its expression from the second equation. If we do so, we get

$$
\begin{alignat*}{1}
    y_{ij} &= (\mu + S_{i}) + \alpha_{j} + \eta_{ij} \\
           &= \mu + \alpha_{j} + S_{i} + \eta_{ij}
\end{alignat*}
$$

with

$$
\begin{alignat*}{1}
    S_{i}     &\sim \mathcal{N}\left(0,\sigma^{2}_{b}\right) \\
    \eta_{ij} &\sim \mathcal{N}\left(0,\sigma^{2}_{w}\right) \\
\end{alignat*}
$$

This is *exactly the partitioned error model we saw last week*.
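
The substitution can also be checked numerically. The sketch below assumes hypothetical parameter values and two conditions; it builds the outcome once from the two-level form and once from the collapsed form, then looks at how the two random terms show up in the data.

```r
set.seed(3)

# Hypothetical values, chosen only for illustration
n_subjects <- 5000
mu         <- 50
sigma_b    <- 10
sigma_w    <- 5
alpha      <- c(-3, 3)

S         <- rnorm(n_subjects, 0, sigma_b)                    # one per subject
eta       <- matrix(rnorm(n_subjects * 2, 0, sigma_w), ncol = 2)  # one per observation
alpha_mat <- matrix(alpha, nrow = n_subjects, ncol = 2, byrow = TRUE)

# Two-level form: build mu_i first, then y_ij (mu_i[i] is added to row i)
mu_i        <- mu + S
y_two_level <- mu_i + alpha_mat + eta

# Collapsed form: a single equation
y_collapsed <- mu + alpha_mat + S + eta

# The two forms give the same data
same <- isTRUE(all.equal(y_two_level, y_collapsed))

# Both random terms contribute to the spread of y within a condition
total_var <- var(y_collapsed[, 1])                       # near sigma_b^2 + sigma_w^2

# S_i is shared by both observations from the same subject
shared_cov <- cov(y_collapsed[, 1], y_collapsed[, 2])    # near sigma_b^2

print(same)
print(total_var)
print(shared_cov)
```

Because $S_{i}$ appears in every observation from subject $i$, it is the only source of covariance between a subject's two measurements, while $\eta_{ij}$ adds only to the variance of each individual observation.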