# Generalised Least Squares
We will start our journey into the world of mixed-effects models by first examining a *related* approach that we have already encountered: Generalised Least Squares (GLS). The reason for doing this is twofold. Firstly, GLS actually provides a simpler solution to many of the issues with the repeated measures ANOVA and thus presents a more logical starting point. Secondly, the limitations in the way that GLS does this will provide some motivation for mixed-effects models as a more complex, but ultimately more flexible, method of dealing with this problem.  

... In the previous part of this lesson, we indicated that one of the biggest issues with the repeated measures ANOVA is that its covariance structure is not only very simple, it is also only *implicit* in the model. The practical result of this is that the error terms for the tests have to be organised manually, which causes a lot of headaches in complex designs. A better approach would be a method in which the covariance structure is *estimated explicitly* and then used to derive the correct standard errors for the tests. This would generate the correct denominators *automatically*, which is a much nicer situation to be in. Mixed-effects models do this, as we will see as this section of the unit progresses. However, before we get there, we will examine a method that we have already covered that can also do this: *generalised least squares*.

... As we will see below, GLS is really a stepping-stone to mixed-effects models. Although GLS can accommodate a variety of correlation structures, it has no concept of how the data are *organised* (e.g. into subjects and groups). So, while GLS solves one major issue of the repeated measures ANOVA, it is an incomplete solution. This will provide the main justification for moving towards full mixed-effects models.

## How Does GLS Work?
...At its most basic, GLS uses the residuals of an initial model fit to estimate the correlation/variance structure. This structure is then *removed* from the data by transforming both the outcome and the predictors. The transformed data are then, in theory, *uncorrelated* with *equal variance*, so we can use ordinary least squares (OLS) to estimate the parameters. This can all be achieved within a single estimation framework using *restricted maximum likelihood* (REML), which iteratively estimates the variance structure from the residuals and then estimates the parameters after removing that structure. This continues until convergence (i.e. until the estimates stop changing from one iteration to the next). So we can think of GLS as a mechanism for *removing* a complex correlation or variance structure from the data, and then fitting a standard regression model to what remains.
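The "remove the structure, then use OLS" idea can be checked directly. The sketch below is only an illustration, not how `gls()` is implemented internally: it takes the correlation estimated by `gls()`, builds each subject's 2×2 correlation matrix, pre-multiplies every subject's outcome and design matrix by the inverse of its lower Cholesky factor (so that the transformed errors are uncorrelated with equal variance), and then fits a plain `lm()`. The simulated data, the variable names (`dat`, `R.sub`, `L.inv`, `y.w`, `X.w`) and the choice of a Cholesky factor as the whitening matrix are assumptions made for this example.

```r
library(MASS)   # mvrnorm()
library(nlme)   # gls(), corCompSymm()

# Hypothetical simulated data: 30 subjects, two conditions, true correlation 0.8
set.seed(1)
n.sub <- 30
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
y     <- mvrnorm(n = n.sub, mu = c(1, 1.25), Sigma = Sigma)
dat   <- data.frame(y.long  = as.vector(t(y)),
                    cond    = factor(rep(c("A", "B"), n.sub)),
                    subject = factor(rep(seq_len(n.sub), each = 2)))

# Step 1: let gls() estimate the compound-symmetric correlation by REML
gls.mod <- gls(y.long ~ cond, data = dat,
               correlation = corCompSymm(form = ~ 1 | subject))
rho.hat <- coef(gls.mod$modelStruct$corStruct, unconstrained = FALSE)

# Step 2: estimated within-subject correlation matrix (same for every subject)
R.sub <- matrix(c(1, rho.hat, rho.hat, 1), nrow = 2)

# Step 3: whiten each subject's block with the inverse lower Cholesky factor,
# so the transformed errors are uncorrelated with equal variance
L.inv <- solve(t(chol(R.sub)))
X     <- model.matrix(~ cond, data = dat)
y.w   <- numeric(nrow(dat))
X.w   <- matrix(0, nrow = nrow(X), ncol = ncol(X))
for (s in levels(dat$subject)) {
  idx        <- which(dat$subject == s)
  y.w[idx]   <- L.inv %*% dat$y.long[idx]
  X.w[idx, ] <- L.inv %*% X[idx, ]
}

# Step 4: ordinary least squares on the whitened data. The "- 1" drops lm()'s
# own intercept because the transformed intercept column is already in X.w.
ols.mod <- lm(y.w ~ X.w - 1)

cbind(gls.est = coef(gls.mod),
      ols.est = coef(ols.mod),
      gls.se  = sqrt(diag(vcov(gls.mod))),
      ols.se  = summary(ols.mod)$coefficients[, "Std. Error"])
```

The estimates and standard errors from the two routes should agree. In this balanced two-condition design the point estimates happen to coincide with plain OLS on the raw data as well; what the whitening changes is the standard errors, which now reflect the within-subject correlation.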

## The Paired $t$-test Using GLS

```r
library(MASS)   # mvrnorm()
library(nlme)   # gls(), corCompSymm()
library(car)    # Anova()

set.seed(666)

var1  <- var2 <- 1
rho   <- 0.8
covar <- rho*sqrt(var1)*sqrt(var2)

Sigma <- matrix(data=c(var1,covar,covar,var2), nrow=2, ncol=2)
y     <- mvrnorm(n=50, mu=c(1,1.25), Sigma=Sigma)

subject <- rep(seq(from=1,to=50), each=2)
subject <- as.factor(subject)

y.long <- as.vector(t(y))    # Turn y into a column
cond   <- rep(c("A","B"),50) # Create a predictor for the two conditions
cond   <- as.factor(cond)

t.test(y[,1], y[,2], paired=TRUE) # var.equal has no effect on a paired test

gls.mod <- gls(y.long ~ cond, correlation=corCompSymm(form=~1|subject))
print(summary(gls.mod))

print(Anova(gls.mod))

#lme.mod <- lme(y.long ~ cond, random=(~1|subject))
#summary(lme.mod)
```


```text
Generalized least squares fit by REML
  Model: y.long ~ cond 
  Data: NULL 
       AIC      BIC    logLik
  248.5643 258.9042 -120.2822

Correlation Structure: Compound symmetry
 Formula: ~1 | subject 
 Parameter estimate(s):
      Rho 
0.8618049 

Coefficients:
                Value  Std.Error  t-value p-value
(Intercept) 0.9601951 0.15753402 6.095161  0.0000
condB       0.2066553 0.08282007 2.495232  0.0143

 Correlation: 
      (Intr)
condB -0.263

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-2.26305985 -0.76968978  0.01612246  0.63434195  2.17187084 

Residual standard error: 1.113934 
Degrees of freedom: 100 total; 98 residual
Analysis of Deviance Table (Type II tests)

Response: y.long
     Df  Chisq Pr(>Chisq)  
cond  1 6.2262    0.01259 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```


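Notice, however, that GLS is ignoring the subject structure of the data. The `correlation` argument tells it which observations are correlated, but it has no notion of subjects as the experimental unit. It therefore treats the 100 observations as providing 98 residual degrees of freedom, whereas the paired $t$-test, which knows that there are only 50 subjects, uses 49.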
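To see what this difference in degrees of freedom does, we can refer the same statistic to the three reference distributions involved. This sketch assumes that, in this balanced two-condition design, the paired $t$-statistic coincides with the GLS $t$-value for `condB` (compare the two in the large-sample example), so only the reference distribution differs. The value `2.495232` is copied from the `gls()` output above.

```r
# t-value for condB reported by gls() in the example above
t.obs <- 2.495232

p.paired <- 2 * pt(-abs(t.obs), df = 49)  # paired t-test: 50 subjects - 1
p.gls    <- 2 * pt(-abs(t.obs), df = 98)  # gls(): 100 observations - 2 coefficients
p.chisq  <- pchisq(t.obs^2, df = 1, lower.tail = FALSE)  # Wald chi-square, as in car::Anova()

round(c(paired = p.paired, gls = p.gls, chisq = p.chisq), 5)
```

The `gls` and `chisq` values reproduce the $p$-values in the output above, and both are smaller than the value obtained with the paired test's 49 df.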

## Large Sample GLS

```r
library(MASS)   # mvrnorm()
library(nlme)   # gls(), corCompSymm()
library(car)    # Anova()

set.seed(666)

n.sub <- 600
n.cond <- 2
var1  <- var2 <- 1
rho   <- 0.8
covar <- rho*sqrt(var1)*sqrt(var2)

Sigma <- matrix(data=c(var1,covar,covar,var2), nrow=2, ncol=2)
y     <- mvrnorm(n=n.sub, mu=c(1,1.01), Sigma=Sigma)

subject <- rep(seq(from=1,to=n.sub), each=n.cond)
subject <- as.factor(subject)

y.long <- as.vector(t(y))    # Turn y into a column
cond   <- rep(c("A","B"),n.sub) # Create a predictor for the two conditions
cond   <- as.factor(cond)

t.test(y[,1], y[,2], paired=TRUE) # var.equal has no effect on a paired test

gls.mod <- gls(y.long ~ cond, correlation=corCompSymm(form=~1|subject))
summary(gls.mod)

print(Anova(gls.mod))

#lme.mod <- lme(y.long ~ cond, random=(~1|subject))
#summary(lme.mod)
```



```text
	Paired t-test

data:  y[, 1] and y[, 2]
t = -0.14922, df = 599, p-value = 0.8814
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.05131205  0.04406548
sample estimates:
mean difference 
    -0.00362329 


Generalized least squares fit by REML
  Model: y.long ~ cond 
  Data: NULL 
       AIC      BIC    logLik
  2747.006 2767.359 -1369.503

Correlation Structure: Compound symmetry
 Formula: ~1 | subject 
 Parameter estimate(s):
      Rho 
0.8242698 

Coefficients:
                Value  Std.Error   t-value p-value
(Intercept) 0.9710817 0.04095920 23.708516  0.0000
condB       0.0036233 0.02428229  0.149215  0.8814

 Correlation: 
      (Intr)
condB -0.296

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.22888223 -0.68109163 -0.01037886  0.71963247  2.85231883 

Residual standard error: 1.003291 
Degrees of freedom: 1200 total; 1198 residual

Analysis of Deviance Table (Type II tests)

Response: y.long
     Df  Chisq Pr(>Chisq)
cond  1 0.0223     0.8814
```


Notice now that the paired $t$-test (which knows that the 1200 observations come from 600 subjects, and so uses 599 df), the GLS $t$-test (which has only partial knowledge of the structure, and so uses 1198 residual df) and the asymptotic $\chi^2$ test from `Anova()` (which ignores the structure and uses no df at all) all give the same $p$-value, $p = 0.8814$. This is because the structure of the data only enters these tests through the degrees of freedom, and the degrees of freedom only make a noticeable difference when the sample size is *small*. Once the sample size is large enough, the $t$-distribution is practically indistinguishable from the normal distribution and the degrees of freedom no longer matter. So this suggests that GLS will agree closely with the paired $t$-test whenever the sample size is large. Otherwise, its tests must be taken as *asymptotic approximations* that will be liberal (i.e. give $p$-values that are too small) in small samples.

In brief, we can conclude that 

- GLS is an asymptotically valid generalisation of RM ANOVA.
- The only reason it fails to reproduce the ANOVA tests exactly is that the ANOVA df are small-sample, design-based df.
- In large samples, GLS and RM ANOVA inference become effectively identical.
- In small samples, $p$-values that are close to the threshold (e.g. $p = 0.048$ or $p = 0.035$) should be treated *very cautiously*, as we know these will be liberal.
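How liberal is "liberal"? A minimal sketch, assuming a balanced design with $n$ subjects measured in two conditions and assuming that the GLS $t$-statistic equals the paired $t$-statistic (as in the examples above): if the paired test's $t_{n-1}$ distribution is the correct reference, then rejecting at the GLS critical value ($2n-2$ df) or at the $\chi^2_1$ critical value gives an actual type I error rate above the nominal 5%, shrinking towards 5% as $n$ grows. The values of `n` are arbitrary.

```r
n <- c(5, 10, 20, 50, 200)   # number of subjects (arbitrary choices)

crit.gls   <- qt(0.975, df = 2 * n - 2)  # critical |t| used by gls(): 2n - 2 residual df
crit.chisq <- qnorm(0.975)               # equivalent to sqrt(qchisq(0.95, df = 1))

# Actual type I error rates if t with n - 1 df is the correct null distribution
alpha.gls   <- 2 * pt(-crit.gls,   df = n - 1)
alpha.chisq <- 2 * pt(-crit.chisq, df = n - 1)

round(data.frame(n, alpha.gls, alpha.chisq), 4)
```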

## GLS for ANOVA Models
In the above example, we saw the use of GLS as an alternative to the paired $t$-test. A more common application is as an alternative to the repeated measures ANOVA, as a way of side-stepping the messy specification of different error terms for different tests.

### ANOVA Models with More Flexible Variance Structures
Perhaps the most useful feature of GLS is that we can use much more flexible variance structures than the simple compound-symmetric structure assumed by RM ANOVA. For instance, we can ask for completely unstructured within-subject correlations and a different variance for each between-subject group by specifying `corSymm(form=~1|subject)` together with `varIdent(form=~1|group)`. This allows every pair of repeated measurements to have its own correlation and every group to have its own variance (although the same correlation matrix is still shared across groups). However, we have to be careful, because there may simply not be enough information in our data to estimate all of these parameters, even if we want them.
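As a sketch of how this looks in practice, the example below simulates hypothetical data (40 controls and 40 patients, three within-subject conditions; the means, correlations and variances are arbitrary) and compares a compound-symmetric model with the more flexible `corSymm()` plus `varIdent()` model. Because both models have the same fixed effects, their REML fits can be compared with a likelihood-ratio test via `anova()`. Note that `corSymm(form = ~ 1 | subject)` indexes correlations by the *order* of rows within each subject, so the data must be sorted consistently.

```r
library(MASS)   # mvrnorm()
library(nlme)   # gls(), corCompSymm(), corSymm(), varIdent()

# Hypothetical data: correlations differ across condition pairs and
# patients are twice as variable as controls
set.seed(42)
n.per.group <- 40
Sigma <- matrix(c(1.0, 0.7, 0.4,
                  0.7, 1.0, 0.7,
                  0.4, 0.7, 1.0), nrow = 3)
y.ctrl <- mvrnorm(n.per.group, mu = c(0, 0.2, 0.4), Sigma = Sigma)
y.pat  <- mvrnorm(n.per.group, mu = c(0, 0.5, 1.0), Sigma = 2 * Sigma)
y.wide <- rbind(y.ctrl, y.pat)

# Long format: one row per subject x condition, same condition order within
# every subject (corSymm(form = ~ 1 | subject) relies on this order)
dat <- data.frame(
  y       = as.vector(t(y.wide)),
  cond    = factor(rep(c("C1", "C2", "C3"), times = 2 * n.per.group)),
  group   = factor(rep(c("control", "patient"), each = 3 * n.per.group)),
  subject = factor(rep(seq_len(2 * n.per.group), each = 3))
)

# RM ANOVA-like assumption: compound symmetry and one common variance
cs.mod <- gls(y ~ group * cond, data = dat,
              correlation = corCompSymm(form = ~ 1 | subject))

# Unstructured within-subject correlations plus a separate variance per group
un.mod <- gls(y ~ group * cond, data = dat,
              correlation = corSymm(form = ~ 1 | subject),
              weights     = varIdent(form = ~ 1 | group))

# Same fixed effects, so the REML fits can be compared by likelihood ratio
cmp <- anova(cs.mod, un.mod)
cmp
```

With few subjects or many conditions, the unstructured model may fail to converge, which is exactly the "not enough information" problem described above.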

## What Does GLS *Not* Do?
... The problem is that GLS does not know anything about the *structure* of the data. It has no sense of *subjects* as the experimental unit, nor of the idea that the outcome variable is made up of clusters of values taken from different subjects, who might themselves form clusters within larger groups (e.g. patients vs controls). All that GLS knows is that there is a correlation structure that we want to remove. Unfortunately, this lack of appreciation for the structure of the data means that GLS cannot use that structure to its advantage. It cannot, for instance, separate the information that comes from comparing observations within subjects from the information that comes from comparing subjects within groups. In effect, GLS is a very *crude* solution to a bigger problem with repeated measurements: namely, that there is a larger *hierarchical* structure at play that the model should be able to take advantage of. We have seen this in a very general way through the small-sample degrees of freedom, but this is really only a *symptom* of the larger problem. As we will come to learn, mixed-effects models are advantageous precisely *because* they embed this structure in the model. This has a number of consequences, not least that the correlation between measurements from the same experimental unit is *automatically* built into the model. This is not because we tell the model to include a correlation; rather, it is a *natural consequence* of the structure of the data. As such, features such as correlation arise naturally in mixed-effects models, precisely because they take the structure into account in a way that GLS simply cannot.
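A small preview of this point, sketched under the assumption of the same simulated data as the first example in this section: a mixed-effects model with a random intercept per subject (`lme()` from `nlme`, as in the commented-out lines of the earlier code) never mentions a correlation, yet it implies one, $\tau^2 / (\tau^2 + \sigma^2)$, where $\tau^2$ is the between-subject variance and $\sigma^2$ the residual variance. In a balanced design with a positive within-subject correlation, this implied value should match the `Rho` estimated by `gls()`. Because `lme()` knows that subjects are the experimental unit, it also uses 49 rather than 98 denominator df for the condition effect.

```r
library(MASS)   # mvrnorm()
library(nlme)   # gls(), lme(), VarCorr()

# Same simulation as the first example in this section
set.seed(666)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
y     <- mvrnorm(n = 50, mu = c(1, 1.25), Sigma = Sigma)
dat   <- data.frame(y.long  = as.vector(t(y)),
                    cond    = factor(rep(c("A", "B"), 50)),
                    subject = factor(rep(1:50, each = 2)))

gls.mod <- gls(y.long ~ cond, data = dat,
               correlation = corCompSymm(form = ~ 1 | subject))
lme.mod <- lme(y.long ~ cond, random = ~ 1 | subject, data = dat)

# Correlation implied by the random intercept: tau^2 / (tau^2 + sigma^2)
vc      <- VarCorr(lme.mod)
tau2    <- as.numeric(vc["(Intercept)", "Variance"])
sigma2  <- as.numeric(vc["Residual", "Variance"])
rho.lme <- tau2 / (tau2 + sigma2)
rho.gls <- coef(gls.mod$modelStruct$corStruct, unconstrained = FALSE)

c(gls = unname(rho.gls), lme.implied = rho.lme)

# Condition effect: gls() uses 98 residual df, lme() uses 49 denominator df
summary(gls.mod)$tTable["condB", ]
summary(lme.mod)$tTable["condB", ]
```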

## When Can We Use GLS?
In reality, a GLS model is useful if you do not care about the hypothesis tests and just want estimates that accommodate a given correlation structure, or if you are using non-repeated measurement data and want a more flexible between-subjects variance structure. Alternatively, if you are taking many measurements from a *single* subject, GLS can be useful for modelling just their individual data. Examples include time-series data from one subject measured using EEG, eye-tracking, or continuous monitoring of hand movement. In these cases you can end up with thousands of data points, and GLS can be used to model them with some suitable correlation structure (e.g. `corAR1(form=~time)`). The estimates could then be used as summary statistics to analyse multiple subjects[^summarystat-foot]. However, as we have seen above, GLS is not necessarily suitable for multiple subjects with repeated measurements, because it does not accommodate the blocked structure of the data and thus fails to consider how this changes the number of independent pieces of information. This can be side-stepped by using asymptotic statistics that do not depend upon the concept of degrees of freedom. In large samples this issue disappears, so if you have a large sample, or are willing to treat the $p$-values cautiously in small samples, GLS is a perfectly legitimate solution to the problem.
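A minimal sketch of the single-subject route followed by a summary-statistics analysis, on entirely hypothetical data: each of 8 simulated "subjects" contributes a 500-point time series made of a small linear trend plus AR(1) noise with $\phi = 0.6$ (all values are arbitrary). At the first level, `gls()` with `corAR1(form = ~ time)` is fitted separately to each subject; at the second level, the per-subject slopes are treated as single observations in a one-sample $t$-test. The helper `fit.one.subject()` is a hypothetical name introduced for this illustration.

```r
library(nlme)   # gls(), corAR1()

set.seed(2024)
n.subj <- 8
n.time <- 500

# Hypothetical helper: simulate one subject's series and fit a first-level GLS
fit.one.subject <- function(slope) {
  d   <- data.frame(time = seq_len(n.time))
  d$y <- 0.5 + slope * d$time +
         as.numeric(arima.sim(model = list(ar = 0.6), n = n.time))
  # AR(1) correlation along time, within this single subject
  mod <- gls(y ~ time, data = d, correlation = corAR1(form = ~ time))
  c(slope = unname(coef(mod)["time"]),
    phi   = unname(coef(mod$modelStruct$corStruct, unconstrained = FALSE)))
}

true.slopes <- rnorm(n.subj, mean = 0.002, sd = 0.001)
per.subject <- t(sapply(true.slopes, fit.one.subject))  # one row per subject
per.subject

# Second level: one summary statistic (the slope) per subject
t.test(per.subject[, "slope"], mu = 0)
```

Passing an explicit `data` argument to `gls()` matters here: relying on `gls()` to find variables in the calling environment is fragile when it is called from inside a function.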

[^summarystat-foot]: This approach is, unsurprisingly, known as a *summary statistics* approach and is typical of how data analysis is handled for fMRI and M/EEG data.