/
GLMMadaptive.Rmd
265 lines (227 loc) · 11.4 KB
/
GLMMadaptive.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
---
title: "GLMMadaptive Basics"
author: "Dimitris Rizopoulos"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{GLMMadaptive Basics}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library("GLMMadaptive")
```
# Generalized Linear Mixed Models - Theory
## Model Specification
Package **GLMMadaptive** provides a suit of functions for fitting and post-processing
mixed effects models for grouped/clustered outcomes which have a distribution other than a
normal distribution. In particular, let $y_i$ denote a vector of grouped/clustered outcome for
the $i$-th sample unit ($i = 1, \ldots, n$). The conditional distribution of $y_i$ given a
vector of random effects $b_i$ is assumed to be a member of the extended exponential
family, with linear predictor given by $$g\{E(y_i \mid b_i)\} = X_i \beta + Z_i b_i,$$
where $g(\cdot)$ denotes a monotonic link function, $X_i$ a design matrix for the fixed
effects coefficients $\beta$, and $Z_i$ a design matrix for the random effects
coefficients $b_i$. Typically, matrix $Z_i$ is assumed to be a subset of $X_i$. The random
effects are assumed to follow a normal distribution with mean 0 and variance-covariance
matrix $D$. In addition, the distribution $[y_i \mid b_i]$ may potentially have
extra dispersion/shape parameters $\phi$.
## Estimation
The package focuses in settings in which the distribution $[y_i \mid b_i]$ is not normal
and/or the link function $g(\cdot)$ is not the identity. In these settings, the estimation
of the model is complicated by the fact that the marginal log-likelihood function of the
observed $y_i$ cannot be derived analytically. In particular, the log-likelihood function
has the form:
$$\begin{eqnarray*}
\ell(\theta) & = & \sum_{i = 1}^n \log p(y_i; \theta)\\
& = & \sum_{i = 1}^n \log \int p(y_i \mid b_i; \theta) \, p(b_i; \theta) \, db_i,
\end{eqnarray*}$$
where $\theta$ denotes the full parameter vector including the fixed effects, the extra
potential dispersion/shape parameters $\phi$ and the unique element of the covariance
matrix $D$, and $p(\cdot)$ denotes a probability density or probability mass function. The
integral in the definition of $\ell(\theta)$ does not have a closed-form solution, and
numerical approximations are required to obtain the maximum likelihood estimates.
In the literature several approaches have been proposed to approximate such integrals, and
a nice overview is given in
[Pinheiro and Chao (2006)](https://doi.org/10.1198/106186006X96962). A typical approach to
approximate these integrals is the Laplace approximation. However, the general consensus
has been that in the standard but difficult cases of binary/dichotomous data and count data
with small counts and few repeated measurements, the accuracy of this approximation is
rather low. Due to this fact, the general consensus is that the gold standard numerical
approximation method is the adaptive Gaussian quadrature rule (*note: we focus here on
maximum likelihood estimation; under the Bayesian paradigm, approaches, such as, MCMC and
Hamiltonian Monte Carlo also provide accurate evaluation of the integrals*). This is more
computationally intensive but also more accurate. This package provides an efficient
implementation of the adaptive Gaussian quadrature rule, allowing for multiple correlated
random effects (e.g., random intercepts, linear and quadratic random slopes) but currently
a single grouping factor (i.e., no nested or crossed random effects designs).
A hybrid optimization procedure is implemented starting with an EM algorithm, treating the
random effects as 'missing data', followed by a direct optimization procedure with a
quasi-Newton algorithm. Fine control of this procedure is allowed with a series of control
arguments.
# Generalized Linear Mixed Models - Practice
## Mixed Effects Logistic Regression
We illustrate the use of the package in the standard case of a mixed effects logistic
regression. That is, the distribution of $[y_i \mid b_i]$ is binomial, and the distribution
of $[b_i]$ multivariate normal.
We start by simulating some data for a binary longitudinal outcome:
```{r}
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- drop(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
```
We fit a mixed effects logistic regression for `y`, assuming random
intercepts for the random-effects part. The basic model-fitting function in
**GLMMadaptive** is called `mixed_model()`, and has four required arguments, namely
`fixed` a formula for the fixed effects, `random` a formula for the random effects,
`family` a family object specifying the type of response variable, and `data` a data frame
containing the variables in the previously mentioned formulas. Hence, the call to fit the
random intercepts logistic regression is:
```{r}
fm1 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = binomial())
```
The summary method gives a detailed output of the model:
```{r}
summary(fm1)
```
We continue by checking the impact of the chosen number of quadrature points to the
parameters estimates and the log-likelihood value at convergence. First, we refit the
model with an increasing number of quadrature points. The default when the number of
random effects is smaller or equal to two is 11 points. We fit then with 15, and 21 points:
```{r}
fm1_q11 <- fm1
fm1_q15 <- update(fm1_q11, nAGQ = 15)
fm1_q21 <- update(fm1_q11, nAGQ = 21)
models <- list("nAGQ=11" = fm1_q11, "nAGQ=15" = fm1_q15, "nAGQ=21" = fm1_q21)
```
We now extract from the model the estimated parameter for the fixed effects (using
function `fixef()`), for the random effects, and the log-likelihood (using function
`logLik()`):
```{r}
extract <- function (obj) {
c(fixef(obj), "var_(Intercept)" = obj$D[1, 1], "logLik" = logLik(obj))
}
sapply(models, extract)
```
We observe a rather stable model with virtually no differences between the different choices
of quadrature points.
We first compare the model with a simple logistic regression that does not include any
random effects:
```{r}
km <- glm(y ~ sex * time, data = DF, family = binomial())
anova(fm1, km)
```
We obtain a highly significant p-value suggesting that there are correlations in the data
that cannot be ignored. *Note:* the `anova()` method that performs the likelihood ratio
test calculates the p-value using the standard $\chi^2$ distribution, here with one
degree of freedom. However, because the null hypothesis for testing variance parameters is
on the boundary of the corresponding parameter space, it would be more appropriate to use
a mixture of $\chi^2$ distributions.
We extend model `fm1` by also including a random slopes term; however, we assume that the
covariance between the random intercepts and random slopes is zero. This is achieved by
using the `||` symbol in the specification of the `random` argument, i.e.,
```{r}
fm2 <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
family = binomial())
```
The likelihood ratio test between the two models is computed with function `anova()`. When
two `"MixMod"` objects are provided, the function assumes that the first object represents
the model under the null hypothesis, and the second object the model under the
alternative, i.e.,
```{r}
anova(fm1, fm2)
```
The results suggest that we need the random slopes term. We continue by testing whether
the covariance between the random effects terms is zero. The model under the alternative
hypothesis is (results not shown):
```{r, eval = TRUE}
fm3 <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
```
And again the likelihood ratio test is performed by a call to `anova()` (results not shown):
```{r, eval = TRUE}
anova(fm2, fm3)
```
The results now suggest that indeed the covariance between the two random effects terms
is not statistically different from zero.
## Penalized Mixed Effects Poisson Regression
We continue our illustration with a Poisson mixed effects model. We start again by
simulating some data for a Poisson longitudinal outcome:
```{r}
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
betas <- c(2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
# we simulate random effects
b <- rnorm(n, sd = sqrt(D11))
# linear predictor
eta_y <- drop(X %*% betas + b[DF$id])
# we simulate Poisson longitudinal data
DF$y <- rpois(n * K, exp(eta_y))
```
We fit the mixed effects Poisson regression for `y` assuming random intercepts for the
random-effects part.
```{r, eval = TRUE}
gm1 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = poisson())
```
As previously, the summary method gives a detailed output of the model:
```{r, eval = TRUE}
summary(gm1)
```
In settings with few subjects, repeated measurements of separation effects, the package
also allows to place a penalty in the fixed effects regression coefficients $\beta$. The
penalty/prior is in the form of a Student's t distribution with mean 0, scale parameter 1,
and 3 degrees of freedom, and it is placed in all $\beta$ coefficients except from the
intercept. The penalized model can be fitted by setting argument `penalized` to `TRUE`,
i.e.,
```{r, eval = TRUE}
gm2 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = poisson(), penalized = TRUE)
```
```{r, eval = TRUE}
cbind('unpenalized' = fixef(gm1), 'penalized' = fixef(gm2))
```
In this example we observe small differences between the penalized and unpenalized models.
The users have the option to alter the specification of the Student's t penalty by
directly specifying the mean, scale and degrees of freedom arguments of the distribution.
For example, a ridge penalty could be placed by setting the degrees of freedom to a high
value. The call in this case should be:
```{r, eval = FALSE}
gm3 <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = poisson(),
penalized = list(pen_mu = 0, pen_sigma = 1, pen_df = 200))
```