# The Multilevel Framework
We will start our journey into the world of mixed-effects models from the perspective of a *multilevel* model. This is primarily because we can build the pieces of a mixed-effects model very slowly from first principles. In addition, the multilevel framework is often the most *intuitive* way to think about these models, even though they are rarely implemented in this fashion. Remember as well that we will use the term *multilevel* throughout, but the term *hierarchical* is entirely equivalent. So, if anyone ever says "hierarchical linear model" to you, you just need to mentally substitute the term "multilevel linear model". They mean exactly the same thing.

## Fitting a Model to One Subject
To begin with, let us imagine that we only have the data for a *single subject*. What kind of model could we fit?

In [1]:
library('datarium')
library('reshape2')

data('selfesteem')

# repeats and number of subjects
t <- 3
n <- dim(selfesteem)[1]

# reshape wide -> long
selfesteem.long <- melt(selfesteem,            # wide data frame
                        id.vars='id',          # what stays fixed?
                        variable.name="time",  # name for the new predictor
                        value.name="score")    # name for the new outcome

selfesteem.long           <- selfesteem.long[order(selfesteem.long$id),] # order by ID
rownames(selfesteem.long) <- seq(1,n*t)                                  # fix row names
selfesteem.long$id        <- as.factor(selfesteem.long$id)               # convert ID to factor

In [2]:
sub.1 <- selfesteem.long[selfesteem.long$id == '1',]
print(sub.1)

  id time    score
1  1   t1 4.005027
2  1   t2 5.182286
3  1   t3 7.107831


So, our key question here is: what model is possible? An obvious first attempt is to regress `score` on `time`.

In [3]:
lm.sub.1 <- lm(score ~ time, data=sub.1)
summary(lm.sub.1)


Call:
lm(formula = score ~ time, data = sub.1)

Residuals:
ALL 3 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.005        NaN     NaN      NaN
timet2         1.177        NaN     NaN      NaN
timet3         3.103        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1,	Adjusted R-squared:    NaN 
F-statistic:   NaN on 2 and 0 DF,  p-value: NA


So, notice that if we allow the model to freely estimate the effect of `time`, it will fit the data *perfectly* and there will be no error. With three observations and three coefficients, there are no residual degrees of freedom left. This tells us that we cannot let this effect be unique to each subject because the data do not support it. So, in this instance, the data constrain us to the model

$$
y_{1j} = \mu_{1} + \eta_{1j}.
$$
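The loss of residual degrees of freedom can be checked directly. The sketch below re-enters subject 1's three scores from the printed output above and compares the `time` model with a model containing only a subject mean. The object names `full.fit` and `mean.fit` are ours, introduced purely for illustration.

```r
# Subject 1's three scores, copied from the printed data above
sub.1 <- data.frame(time  = factor(c("t1", "t2", "t3")),
                    score = c(4.005027, 5.182286, 7.107831))

full.fit <- lm(score ~ time, data = sub.1)  # one coefficient per observation
mean.fit <- lm(score ~ 1,    data = sub.1)  # a single subject-specific mean

df.residual(full.fit)  # residual degrees of freedom with the time effect
df.residual(mean.fit)  # residual degrees of freedom with the mean only
coef(mean.fit)         # the estimated mean for subject 1
```

Only the second model leaves any information over for estimating error, which is why a single subject cannot support their own effect of `time`.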


## Extending the Model to Multiple Subjects
But of course, we do not *only* have subject 1. So let us now do the same thing with subject 2, whose data on their own constrain us to

$$
y_{2j} = \mu_{2} + \eta_{2j}.
$$

So, we now have these two models separately, but notice that we *can* now estimate the effect of `time` because we have replications of `time` *across* the subjects. So now, we can think of these two models as

$$
\begin{alignat*}{1}
    y_{1j} &= \mu_{1} + \alpha_{j} + \eta_{1j} \\
    y_{2j} &= \mu_{2} + \alpha_{j} + \eta_{2j} \\
\end{alignat*}
$$

which, across all subjects, just gives us

$$
y_{ij} = \mu_{i} + \alpha_{j} + \eta_{ij}.
$$

So, we now have *subject-specific* means ($\mu_{1}, \mu_{2}, \dots, \mu_{n}$) and an effect of `time` ($\alpha_{j}$). But notice that there is no subject index on this effect. So it is *the same* irrespective of the specific subject. This is important because it indicates that $\alpha_{j}$ captures something *universal* from across all subjects, which is precisely what we want in terms of understanding the effect of `time` on the self-esteem `score`.

In order to understand where we go from here, imagine that we ran the experiment *again* and collected a new subject. What do we think would change? Well, certainly the term $\mu_{i}$. We have a different person now and so their subject-specific mean is almost certainly going to differ. What else? Well, the errors will also change, as we would expect. What would not change would be $\alpha_{j}$, because we have assumed that this is *constant* across subjects. So, we have $\mu_{i}$ that will differ with each new subject and $\eta_{ij}$ that will differ with each observation within each subject.

What does this mean for both $\mu_{i}$ and $\eta_{ij}$? Well, what do we call a variable that changes every time we observe it? A *random variable*. So, both $\mu_{i}$ and $\eta_{ij}$ are *random variables*. This means that they *both* have some underlying distribution that they are drawn from. As we know, the $\eta_{ij}$ are *errors* and thus reflect *deflections* around the expected value. As such, their distribution is the same as it always was 

$$
\eta_{ij} \sim \mathcal{N}\left(0, \sigma^{2}_{w}\right).
$$

But what about the $\mu_{i}$? Well, as written above, these are *means* for each subject, so their expected value will not be 0. Instead, it will be whatever the *population grand mean* happens to be. Their variance will then represent the variability of the subject means, which we will call $\sigma^{2}_{b}$. As such

$$
\mu_{i} \sim \mathcal{N}\left(\mu, \sigma^{2}_{b}\right).
$$

As we know from our discussions last semester, we can always write a linear model as an equation for the mean function with the probabilistic behaviour of the random variable attributable to an error term. So we can write the above as

$$
\mu_{i} = \mu + S_{i}
$$

with

$$
S_{i} \sim \mathcal{N}(0,\sigma^{2}_{b})
$$
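As a quick check that the two ways of writing the subject means agree, we can take the expectation and variance of $\mu_{i} = \mu + S_{i}$, treating $\mu$ as a fixed constant

$$
\begin{alignat*}{1}
    E(\mu_{i})          &= \mu + E(S_{i}) = \mu \\
    \text{Var}(\mu_{i}) &= \text{Var}(S_{i}) = \sigma^{2}_{b} \\
\end{alignat*}
$$

Adding a constant to a normal random variable leaves it normal, so this recovers $\mu_{i} \sim \mathcal{N}\left(\mu, \sigma^{2}_{b}\right)$.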


## The Complete Multilevel Model
So, putting all these pieces together, our full model is now

$$
\begin{alignat*}{1}
    y_{ij}        &= \mu_{i} + \alpha_{j} + \eta_{ij}  \\
    \mu_{i}       &= \mu + S_{i} \\
\end{alignat*}
$$

with

$$
\begin{alignat*}{1}
     S_{i}    &\sim \mathcal{N}\left(0,\sigma^{2}_{b}\right)   \\
    \eta_{ij} &\sim \mathcal{N}\left(0,\sigma^{2}_{w}\right) \\
\end{alignat*}
$$

So, these are in fact *two* models that are *linked together*. We have a *hierarchy* of models, or a single model with *multiple levels*. Indeed, from the multilevel perspective, it is typical to label these models like so

$$
\begin{alignat*}{2}
    y_{ij}        &= \mu_{i} + \alpha_{j} + \eta_{ij} &\quad\text{(Level 1)} \\
    \mu_{i}       &= \mu + S_{i}                      &\quad\text{(Level 2)} \\
\end{alignat*}
$$
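One way to see how the two levels link together is to substitute Level 2 into Level 1. The variance results below additionally assume that $S_{i}$ and the $\eta_{ij}$ are all mutually independent, which is the usual assumption but is not stated explicitly above

$$
\begin{alignat*}{2}
    y_{ij}                     &= \mu + \alpha_{j} + S_{i} + \eta_{ij} & \\
    \text{Var}(y_{ij})         &= \sigma^{2}_{b} + \sigma^{2}_{w}      & \\
    \text{Cov}(y_{ij}, y_{ij'}) &= \sigma^{2}_{b}                       &\quad (j \neq j') \\
\end{alignat*}
$$

Under that assumption, two measurements from the same subject are correlated only because they share the same $S_{i}$.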

### The Data-generating Process
In order to fully conceptualise what the multilevel model is saying, we need to think of it as an explanation of *where* our data come from.

So, for a single subject, their mean for a given value of `time` is *unique* to them. But we can decompose this into a *subject-specific* mean and a constant effect of `time`. So each measurement from each subject is composed of three parts

1. $\mu_{i}$ - something unique and specific to individual $i$ that is true across all measurements taken from them
2. $\alpha_{j}$ - something universal about the effect of time-point $j$ that is true irrespective of the individual
3. $\eta_{ij}$ - a random perturbation of measurement $ij$ that captures all the reasons why this is not exactly $\mu_{i} + \alpha_{j}$

So, we can think of the expected value *conditional* on a specific subject. For instance, when $i = 1$ we expect

$$
E(y_{1j}) = \mu_{1} + \alpha_{j}.
$$

So each subject has their own specific mean. However, whenever we consider *all* subjects we have

$$
E(y_{ij}) = E(\mu_{i}) + \alpha_{j} = \mu + \alpha_{j},
$$

which consists of only those effects that are *universal* across all subjects. Again, this captures the idea that we have many *little* models for each individual subject as well as one *big* model for all subjects. 
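The data-generating story can be made concrete by simulating from it. This is a minimal sketch using base `R` only. Every numeric value below (`mu`, `alpha`, `sigma.b`, `sigma.w`) is a hypothetical choice for illustration and is not estimated from the `selfesteem` data.

```r
set.seed(1)  # make the simulated draws reproducible

# Hypothetical population values, chosen only for illustration
n       <- 10                         # number of subjects
mu      <- 3                          # population grand mean
alpha   <- c(t1 = 0, t2 = 2, t3 = 4)  # universal effect of each time-point
sigma.b <- 1                          # SD of the subject means (between)
sigma.w <- 0.5                        # SD of the errors (within)

# Level 2: draw a mean for each subject
S    <- rnorm(n, mean = 0, sd = sigma.b)
mu.i <- mu + S

# Level 1: build each measurement from its three parts
sim       <- expand.grid(time = names(alpha), id = seq_len(n))
sim$eta   <- rnorm(nrow(sim), mean = 0, sd = sigma.w)
sim$score <- mu.i[sim$id] + unname(alpha[as.character(sim$time)]) + sim$eta

head(sim)
```

Re-running this with a different seed changes `mu.i` and `eta` but leaves `alpha` untouched, which mirrors the distinction between the random and the universal parts of the model.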

### Fitting Individual Linear Models in `R`
We can fit a separate mean for every subject, alongside a common effect of `time`, using a single call to `lm()`. As we do so, we will collect the estimates of $\mu_{i}$ and $\eta_{ij}$, just to demonstrate that these are the elements that truly *do* change with each new subject. We will then show the distribution of these at the end to illustrate that these terms are indeed *random variables*.

In [4]:
level.1 <- lm(score ~ 0 + id + time, data=selfesteem.long)

Notice that this bears a striking similarity to how we specified the repeated measures ANOVA using `lm()` a couple of weeks ago. We will make this connection more explicit a little later in this lesson.

In [5]:
summary(level.1)


Call:
lm(formula = score ~ 0 + id + time, data = selfesteem.long)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.3509 -0.5233 -0.0888  0.5304  1.9560 

Coefficients:
       Estimate Std. Error t value Pr(>|t|)    
id1      3.3350     0.6078   5.487 3.28e-05 ***
id2      3.1631     0.6078   5.204 5.98e-05 ***
id3      3.7253     0.6078   6.129 8.66e-06 ***
id4      3.3961     0.6078   5.588 2.65e-05 ***
id5      2.3156     0.6078   3.810 0.001283 ** 
id6      2.5832     0.6078   4.250 0.000482 ***
id7      3.2189     0.6078   5.296 4.91e-05 ***
id8      3.0261     0.6078   4.979 9.72e-05 ***
id9      3.3630     0.6078   5.533 2.97e-05 ***
id10     3.2747     0.6078   5.388 4.04e-05 ***
timet2   1.7938     0.4298   4.174 0.000570 ***
timet3   4.4962     0.4298  10.462 4.44e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.961 on 18 degrees of freedom
Multiple R-squared:  0.9824,	Adjusted R-squared:  0.9707 
F-statistic: 83.89

So, we now have individual effects unique to each subject, as well as effects of `time` that exist *across* subjects.
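The formula `score ~ 0 + id + time` is what produces this layout of coefficients. A small sketch with a hypothetical two-subject design (the data frame `toy` is invented for illustration) shows the columns that `lm()` builds from it.

```r
# Hypothetical design: 2 subjects, each measured at 3 time-points
toy <- expand.grid(time = factor(c("t1", "t2", "t3")),
                   id   = factor(1:2))

# Removing the intercept (0 +) gives every subject their own indicator column;
# time is then coded as differences from its reference level, t1
X <- model.matrix(~ 0 + id + time, data = toy)
print(X)
```

Each `id` coefficient is therefore that subject's expected score at `t1`, and the `time` coefficients are shifts that are shared by all subjects.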

In [6]:
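mu.i    <- coef(level.1)[1:10]  # estimated subject-specific means (the 10 id coefficients)
level.2 <- lm(mu.i ~ 1)         # intercept-only model for the subject means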

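The intercept of this Level 2 model is our estimate of $\mu$. Because `t1` is the reference level of `time`, each $\mu_{i}$ is a subject's expected score at `t1`, so this intercept is the *group mean* at `t1`.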
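The two-stage procedure can be tried on simulated data where the true values are known. This is a sketch under stated assumptions. All population values and the names `mu.true` and `mu.hat` are hypothetical, and a larger number of subjects than in `selfesteem` is used so that the estimates are more stable.

```r
set.seed(42)

# Hypothetical true values, chosen only for illustration
n <- 200; mu <- 3; alpha <- c(0, 2, 4); sigma.b <- 1; sigma.w <- 0.5

# Simulate from the two levels: one mean per subject, then three scores each
mu.true   <- mu + rnorm(n, 0, sigma.b)
sim       <- expand.grid(time = factor(c("t1", "t2", "t3")),
                         id   = factor(seq_len(n)))
sim$score <- mu.true[as.integer(sim$id)] + alpha[as.integer(sim$time)] +
             rnorm(nrow(sim), 0, sigma.w)

# Level 1: subject-specific means plus a common effect of time
level.1 <- lm(score ~ 0 + id + time, data = sim)
mu.hat  <- coef(level.1)[seq_len(n)]

# Level 2: intercept-only model on the estimated subject means
level.2 <- lm(mu.hat ~ 1)

coef(level.2)           # estimate of mu (the mean at t1, the reference level)
summary(level.2)$sigma  # spread of the estimated subject means
hist(mu.hat)            # distribution of the subject means
hist(resid(level.1))    # distribution of the within-subject errors
```

The spread reported by the second stage reflects $\sigma^{2}_{b} + \sigma^{2}_{w}/3$ rather than $\sigma^{2}_{b}$ alone, because each estimated subject mean still carries some measurement noise from its three observations. The two-stage fit is therefore best read as an illustration of the idea, not as an efficient estimator.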