# Introducing Mixed-effects Models
... However, it is a little unsatisfying to have a method that only works *asymptotically* and does not take the true structure of the data into account. Although this may be fine for your use case, it would be better to have an approach that works *irrespective* of the sample size and can flexibly accommodate the actual structure of the data. This is the domain of *linear mixed-effects* (LME) models.

... In addition, it is important to recognise that there are many instances where a GLS model can achieve everything an LME model can, and far more easily. However, there are many aspects of real-world data that the GLS framework is unable to accommodate. These include:

- Missing values in some subjects
- Data with *multiple repeats* per subject and per condition
- Non-normally distributed data
- A hierarchical structure
- Different amounts of data per subject
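As a rough illustration of the kind of data meant here, the sketch below simulates a hypothetical unbalanced design (20 subjects, a random number of repeats in each subject/condition cell, and about 10% of observations deleted at random) and fits a random-intercept model with `nlme::lme`, the same function used later in this section. All sizes, effect values and variable names (`n.sub`, `sub.eff`, `dat`, `mod.unbal`) are invented for the example. The point is only that, once the missing rows are dropped, `lme` accepts the data as they are, with no requirement that every subject contributes the same number of observations.

```r
library(nlme)
set.seed(1)

# Hypothetical design: 20 subjects, two conditions, 1 to 4 repeats per cell
n.sub   <- 20
sub.eff <- rnorm(n.sub, mean = 0, sd = 1)   # subject-level intercept shifts

rows <- list()
for (s in seq_len(n.sub)) {
  for (cnd in c("A", "B")) {
    k  <- sample(1:4, 1)                        # number of repeats in this cell
    mu <- 1 + 0.25 * (cnd == "B") + sub.eff[s]  # condition effect plus subject shift
    rows[[length(rows) + 1]] <- data.frame(
      subject = s, cond = cnd, y = mu + rnorm(k, sd = 0.5)
    )
  }
}
dat <- do.call(rbind, rows)

# Drop roughly 10% of observations at random to mimic missing values
dat <- dat[runif(nrow(dat)) > 0.1, ]
dat$subject <- factor(dat$subject)  # factor() after subsetting keeps only observed subjects

table(dat$subject)   # observations per subject now differ

# Random intercept per subject, fitted to the unbalanced, incomplete data
mod.unbal <- lme(y ~ cond, random = ~ 1 | subject, data = dat)
summary(mod.unbal)
```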

`````{admonition} Variable Terminology
:class: tip
Note that mixed-effects models are also known as *random-effects models*, *linear mixed models* and *hierarchical linear models*. These all refer to the same class of model, so don't be confused if you come across these different terms.
`````

## Motivational Example
... A little later we will introduce the `lme4` package as an alternative to `nlme`, but for now we will stick with the format that we know.

## The `lme4` Package

We will learn much more about the syntax and theory behind these methods as this unit progresses. For now, just notice what happens when we specify the same paired $t$-test from earlier, but with `subject` specifically treated as a random effect.
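For reference, one common way to write the random-intercept model that `lme(y.long ~ cond, random=(~1|subject))` fits is given below (the notation is ours and may differ from later lessons). Here $i = 1, \dots, 50$ indexes subjects, $j \in \{A, B\}$ indexes conditions, $x_{ij}$ is 1 for condition B and 0 for A, and $u_i$ and $\epsilon_{ij}$ are assumed independent:

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \epsilon_{ij}, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

$$
\operatorname{Var}(y_{ij}) = \sigma_u^2 + \sigma^2, \qquad \operatorname{Cov}(y_{iA}, y_{iB}) = \sigma_u^2, \qquad \operatorname{Corr}(y_{iA}, y_{iB}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}
$$

The shared $u_i$ is what induces the within-subject correlation, and it cancels from the within-subject difference, $\operatorname{Var}(y_{iB} - y_{iA}) = 2\sigma^2$. That cancellation is the intuition for why the `condB` test behaves like a paired $t$-test in this balanced design. Plugging the `StdDev` values from the summary output below into the correlation formula gives roughly $1.034^2 / (1.034^2 + 0.414^2) \approx 0.86$, in the same region as the simulated $\rho = 0.8$.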

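```r
library('matrixcalc')  # for vec()
library('MASS')        # for mvrnorm()
set.seed(666)

# Two conditions with unit variance and correlation rho within each subject
var1  <- 1
var2  <- 1
rho   <- 0.8
covar <- rho*sqrt(var1)*sqrt(var2)  # covariance = correlation x SD1 x SD2

Sigma <- matrix(data=c(var1,covar,covar,var2), nrow=2, ncol=2)
y     <- mvrnorm(n=50, mu=c(1,1.25), Sigma=Sigma)  # 50 subjects x 2 conditions

# Reshape to long format: A then B for subject 1, then subject 2, ...
y.long  <- vec(t(y))          # Turn y into a column
cond    <- rep(c("A","B"),50) # Create a predictor for the two conditions
subject <- matrix(data=c(seq(1,50),seq(1,50)), nrow=50, ncol=2)
subject <- vec(t(subject))    # Subject IDs in the same order as y.long
subject <- as.factor(subject)
```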
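The reshaping step is easy to get wrong, so a quick sanity check can help. The sketch below regenerates the same simulated data and confirms that the long format interleaves the conditions within subjects (A then B for subject 1, then subject 2, and so on). It uses base R's `as.vector(t(y))`, which should give the same ordering as `vec(t(y))` from `matrixcalc` without the extra dependency; the data-frame name `dat` is our own choice.

```r
library(MASS)
set.seed(666)

# Same simulation as above: 50 subjects, 2 conditions with correlation 0.8
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
y     <- mvrnorm(n = 50, mu = c(1, 1.25), Sigma = Sigma)

# Long format: rows run A, B for subject 1, then A, B for subject 2, ...
dat <- data.frame(
  y       = as.vector(t(y)),
  cond    = rep(c("A", "B"), 50),
  subject = factor(rep(1:50, each = 2))
)

head(dat, 4)

# Each condition's rows should reproduce the matching column of y, in subject order
stopifnot(all(dat$y[dat$cond == "A"] == y[, 1]))
stopifnot(all(dat$y[dat$cond == "B"] == y[, 2]))

# Sample correlation between conditions should be in the region of rho = 0.8
cor(y[, 1], y[, 2])
```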

```r
library('nlme')

# Mixed Model
mix.mod <- lme(y.long ~ cond, random=(~1|subject))
summary(mix.mod)
```

```
Linear mixed-effects model fit by REML
  Data: NULL 
       AIC      BIC    logLik
  248.5643 258.9042 -120.2822

Random effects:
 Formula: ~1 | subject
        (Intercept)  Residual
StdDev:    1.034103 0.4141004

Fixed effects:  y.long ~ cond 
                Value  Std.Error DF  t-value p-value
(Intercept) 0.9601951 0.15753399 49 6.095162   0.000
condB       0.2066553 0.08282009 49 2.495231   0.016
 Correlation: 
      (Intr)
condB -0.263

Standardized Within-Group Residuals:
          Min            Q1           Med            Q3           Max 
-2.4841539956 -0.5269443992 -0.0008964121  0.4360563683  1.9063134479 

Number of Observations: 100
Number of Groups: 50 
```

Notice that this now agrees *precisely* with the paired $t$-test, even in a small sample and without appealing to asymptotic statistics. Like GLS, this has been achieved without messily including the subjects as a fixed effect in the model, yet the model still uses the subjects to capture the structure of the data and to accommodate the correlation. So, this appears to fix all our problems. The main issue going forward is actually *understanding* what the model is doing. Unfortunately, this is a harder task than with GLS and will be the focus of the next few lessons in this unit.
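To check the agreement with the paired $t$-test yourself, the following sketch regenerates the same simulated data (using base R's `as.vector(t(y))` in place of `matrixcalc::vec`, which should give the same ordering), runs `t.test(..., paired = TRUE)` on the B minus A differences, and lines its estimate, $t$ statistic, degrees of freedom and $p$-value up against the `condB` row of the `lme` fit. It also computes the within-subject correlation implied by the random intercept, $\sigma_u^2/(\sigma_u^2+\sigma^2)$. The names `dat`, `pt`, `tt`, `comparison` and `icc` are our own. The exact agreement relies on the design being balanced (every subject observed once in each condition) and on the subject variance estimate being positive; with unbalanced data the two approaches would generally no longer coincide.

```r
library(MASS)
library(nlme)
set.seed(666)

# Regenerate the simulated data used above
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
y     <- mvrnorm(n = 50, mu = c(1, 1.25), Sigma = Sigma)
dat <- data.frame(
  y       = as.vector(t(y)),
  cond    = rep(c("A", "B"), 50),
  subject = factor(rep(1:50, each = 2))
)

# Paired t-test on the B - A differences
pt <- t.test(y[, 2], y[, 1], paired = TRUE)

# Random-intercept model, fitted by REML (the lme default)
mix.mod <- lme(y ~ cond, random = ~ 1 | subject, data = dat)
tt <- summary(mix.mod)$tTable  # columns: Value, Std.Error, DF, t-value, p-value

comparison <- rbind(
  paired.t = c(estimate = unname(pt$estimate), t = unname(pt$statistic),
               df = unname(pt$parameter), p = pt$p.value),
  lme      = c(estimate = tt["condB", "Value"], t = tt["condB", "t-value"],
               df = tt["condB", "DF"], p = tt["condB", "p-value"])
)
print(comparison, digits = 6)

# Within-subject correlation implied by the random intercept:
# sigma_u^2 / (sigma_u^2 + sigma^2), from the (Intercept) and Residual variances
vc  <- as.numeric(VarCorr(mix.mod)[, "Variance"])
icc <- vc[1] / sum(vc)
icc
```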