---
title: "Marginal Effects for Mixed Effects Models"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Marginal Effects for Mixed Effects Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```r
library(knitr)
library(data.table)
#> data.table 1.14.2 using 24 threads (see ?getDTthreads). Latest news: r-datatable.com
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.17.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#>
#> ar
library(brmsmargins)
```
This vignette provides a brief overview of how to calculate
marginal effects for Bayesian regression models with
mixed effects (i.e., both fixed and random effects) fit using
the `brms` package.
## Integrating out Random Effects
A random intercept logistic regression model where a binary (0/1) outcome, $Y$
is observed at the $i^{th}$ assessment for the $j^{th}$ person and there are
$p$ variables included in the regression model can be written as:
$$
\hat{\pi}_{ij} = g \left(P \left( Y_{ij} = 1 \Big| X_{ij} = x_{ij}, u_j \right) \right) = \beta_0 + \sum_{k = 1}^p x_{ij,k} \beta_k + u_j
$$
where $g(\cdot)$ indicates the link function, here the logit
$$
\mu = g(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right)
$$
and $g^{-1}(\cdot)$ is the inverse link function:
$$
\pi = g^{-1}(\mu) = \frac{1}{1 + \exp(-\mu)}
$$
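For concreteness, these two functions are available in base `R`: `qlogis()` implements the logit link and `plogis()` its inverse.

```r
## base R implementations of the logit link and its inverse
qlogis(0.25)         # g(0.25) = log(0.25 / 0.75)
plogis(qlogis(0.25)) # g^{-1}(g(0.25)) recovers 0.25
```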
A predicted probability conditional on the random effect can be calculated as:
$$
\hat{\pi}_{ij}(u_j = 0) =
P\left(Y_{ij} = 1 \Big| X_{ij} = x_{ij}, u_j = 0 \right) =
g^{-1} \left( \beta_0 + \sum_{k = 1}^p x_{ij,k} \beta_k + 0 \right)
$$
However, to correctly calculate a prediction that is marginal to the random
effects, the random effects must be integrated out, not set at a specific
value or at their mean (0).
$$
\hat{\pi}_{ij} =
P\left(Y_{ij} = 1 \Big| X_{ij} = x_{ij} \right) =
\int_{-\infty}^{\infty} g^{-1} \left( \beta_0 + \sum_{k = 1}^p x_{ij,k} \beta_k + u \right)f(u)du
$$
Integrating out the random effects analytically can quickly become difficult,
for example when there are multiple random effects (such as more than one
grouping or clustering variable) or when different distributions are assumed.
Monte Carlo integration is a convenient numerical approach that uses random
samples to approximate the integral. Continuing the simple example of a
logistic regression model where the only random effect is a random intercept,
$u_j$, and where we assume $u_j \sim \mathcal{N}(0, \sigma^{2}_u)$,
we can draw $Q$ random samples (say 100) from $\mathcal{N}(0, \sigma^{2}_u)$,
call these $RE_a$, and then the Monte Carlo integration is:
$$
\hat{\pi}_{ij} =
P\left(Y_{ij} = 1 \Big| X_{ij} = x_{ij} \right) =
\frac{\displaystyle \sum_{a = 1}^{Q} g^{-1} \left( \beta_0 + \sum_{k = 1}^p x_{ij,k} \beta_k + RE_a \right)}{Q}
$$
This approach works for most generalized linear mixed models, although the
result would not necessarily be a probability, but whatever the inverse link
function returns.
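To make this concrete, here is a minimal sketch of the Monte Carlo integration for a single set of parameter values; `beta0`, `beta1`, and `sigma_u` are hypothetical numbers chosen for illustration, not estimates from any fitted model.

```r
## Monte Carlo integration of a random intercept:
## average the inverse link over Q draws from N(0, sigma_u^2)
marginal_prob <- function(x, beta0, beta1, sigma_u, Q = 100L) {
  RE <- rnorm(Q, mean = 0, sd = sigma_u)  # random effect samples
  mean(plogis(beta0 + beta1 * x + RE))    # marginal predicted probability
}

set.seed(1234)
marginal_prob(x = 0, beta0 = -2.5, beta1 = 1, sigma_u = 1.2)
```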
In a Bayesian framework, this approach is repeated for each posterior draw, as both
the regression coefficients and $RE_a$ differ across draws. Because it is repeated
across each posterior draw, a very large number of random draws, $Q$, for the
Monte Carlo integration is probably not needed. Although a modest number,
say $Q = 100$, carries a relatively large amount of simulation error, it is
random error, and when averaged across the typically thousands of posterior
draws, its impact is likely diminished.
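Continuing the sketch above, the same calculation is simply repeated for each posterior draw. Here we mock up a few draws rather than extract them from a fitted model (in practice they would come from the posterior, e.g., via `as_draws_df()`).

```r
## repeat the Monte Carlo integration across (mock) posterior draws
set.seed(1234)
draws <- data.frame(
  beta0   = rnorm(10, mean = -2.5, sd = 0.2),
  beta1   = rnorm(10, mean =  1.0, sd = 0.2),
  sigma_u = rnorm(10, mean =  1.2, sd = 0.1))
p <- mapply(marginal_prob,
            beta0 = draws$beta0, beta1 = draws$beta1, sigma_u = draws$sigma_u,
            MoreArgs = list(x = 0))
quantile(p, probs = c(.025, .50, .975)) # posterior summary of the marginal probability
```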
Once we have these marginal predictions, we can calculate marginal effects
using numerical derivatives as:
$$
\frac{P\left(Y_{ij} = 1 \Big| X_{ij} = x_{ij} + h \right) - P\left(Y_{ij} = 1 \Big| X_{ij} = x_{ij} \right)}{h}
$$
which, for a continuous variable, approximates the derivative,
often quite well as long as $h$ is sufficiently small.
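Reusing `marginal_prob()` from the sketch above, the numerical derivative looks like the following. Note that the same seed is set for both evaluations so the Monte Carlo error cancels, a point revisited below.

```r
## numerical derivative of the marginal probability with respect to x
h <- .001
set.seed(1234); p_lo <- marginal_prob(x = 0,     beta0 = -2.5, beta1 = 1, sigma_u = 1.2)
set.seed(1234); p_hi <- marginal_prob(x = 0 + h, beta0 = -2.5, beta1 = 1, sigma_u = 1.2)
(p_hi - p_lo) / h # approximate marginal effect of x at x = 0
```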
## Using `brmsmargins()`
A simpler introduction, with a very brief overview and motivation,
is available in the fixed-effects-only vignette.
When there are fixed and random effects, calculating
average marginal effects (AMEs) is more complicated.
Generally, predictions are **conditional** on the random effects.
To deal with this, we need to integrate out the random effects
for every prediction. Please note that this is quite computationally
demanding, at least as currently implemented.
For every predicted value and each posterior draw,
random samples from the model-estimated random effects distribution are drawn,
added, back-transformed, and averaged.
Thus, if you wanted AMEs across a dataset of 1,000 people,
with 2,000 posterior draws, and you wanted to use 100 points for the
numerical integration, a total of 200 million (1,000 × 2,000 × 100)
values are calculated. The Monte Carlo integration is implemented in C++
code to help speed up the process, but it is not "quick"
and may also be memory intensive.
Because of the complexity involved, only limited types of mixed effects
models are supported.
## Mixed Effects Logistic Regression
We will simulate some multilevel binary data for our
mixed effects logistic regression model with individual differences
in both the intercept and slope.
```r
d <- withr::with_seed(
seed = 12345, code = {
nGroups <- 100
nObs <- 20
theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
theta.location[, 1] <- theta.location[, 1] - 2.5
theta.location[, 2] <- theta.location[, 2] + 1
d <- data.table(
x = rep(rep(0:1, each = nObs / 2), times = nGroups))
d[, ID := rep(seq_len(nGroups), each = nObs)]
for (i in seq_len(nGroups)) {
d[ID == i, y := rbinom(
n = nObs,
size = 1,
prob = plogis(theta.location[i, 1] + theta.location[i, 2] * x))
]
}
copy(d)
})
mlogit <- brms::brm(
y ~ 1 + x + (1 + x | ID), family = "bernoulli",
data = d, seed = 1234,
silent = 2, refresh = 0,
chains = 4L, cores = 4L, backend = "cmdstanr")
#> Compiling Stan program...
#> Warning: 25 of 4000 (1.0%) transitions ended with a divergence.
#> See https://mc-stan.org/misc/warnings for details.
```
```r
summary(mlogit)
#> Warning: There were 25 divergent transitions after warmup. Increasing
#> adapt_delta above may help. See http://mc-stan.org/misc/warnings.html#divergent-
#> transitions-after-warmup
#> Family: bernoulli
#> Links: mu = logit
#> Formula: y ~ 1 + x + (1 + x | ID)
#> Data: d (Number of observations: 2000)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Group-Level Effects:
#> ~ID (Number of levels: 100)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept) 1.17 0.19 0.85 1.58 1.01 1038 1913
#> sd(x) 0.53 0.28 0.03 1.11 1.03 223 409
#> cor(Intercept,x) -0.16 0.42 -0.81 0.82 1.01 1403 1047
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept -2.47 0.20 -2.89 -2.10 1.01 1116 1292
#> x 0.95 0.18 0.59 1.34 1.00 2152 2024
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```
### AMEs
Now we can use `brmsmargins()`. By default, it will
only use the fixed effects. To integrate out random effects,
we specify `effects = "integrateoutRE"`. The number of
values used for numerical integration is set via the `k` argument;
here `k = 100L`, the default.
More details are in `?brmsmargins:::.predict`.
```r
h <- .001
ame1 <- brmsmargins(
mlogit,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "integrateoutRE", k = 100L, seed = 1234)
knitr::kable(ame1$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|-----:|-----:|-----:|-----:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0.112| 0.112| 0.063| 0.169| NA| NA| 0.99|HDI |NA |NA |AME x |
We can follow a similar process to get discrete predictions with
`x` held at 0 or 1. In this instance, the summary of the
predictions is more interesting as well, since they are at meaningfully
different values of `x`. They also agree quite closely with the
average probabilities at the different `x` values calculated from the data.
```r
ame2 <- brmsmargins(
mlogit,
at = data.frame(x = c(0, 1)),
contrasts = cbind("AME x" = c(-1, 1)),
effects = "integrateoutRE", k = 100L, seed = 1234)
```
Here is a summary of the predictions.
```r
knitr::kable(ame2$Summary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |
|-----:|-----:|-----:|-----:|-----------:|----------:|----:|:------|:----|:---|
| 0.120| 0.118| 0.071| 0.174| NA| NA| 0.99|HDI |NA |NA |
| 0.231| 0.231| 0.157| 0.302| NA| NA| 0.99|HDI |NA |NA |
```r
knitr::kable(ame2$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|-----:|-----:|----:|-----:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0.111| 0.111| 0.06| 0.169| NA| NA| 0.99|HDI |NA |NA |AME x |
```r
knitr::kable(d[, .(M = mean(y)), by = .(ID, x)][, .(M = mean(M)), by = x])
```
| x| M|
|--:|-----:|
| 0| 0.117|
| 1| 0.228|
Note that when integrating out random effects, the random seed is
quite important. If the `seed` argument is not specified,
`brmsmargins()` will randomly select one. This would not matter
when generating predictions only from fixed effects, but when
using random samples to integrate out random effects, if different
random seeds are used for different predictions, you would expect
some (small) differences even for the same input data.
This may not be an issue for predictions on their own. However,
when numerically approximating a derivative by a very small difference
in predictions, such as with `h = .001`, tiny differences are magnified.
To see the impact, consider this example where we explicitly set multiple
random seeds, one for each row of the data used for predictions.
Both rows use exactly `x = 0`, so the difference is due to
Monte Carlo variation only, but with `k = 10L` the small error,
when divided by `h = .001`, becomes very large, impossibly so.
```r
h <- .001
ame.error <- brmsmargins(
mlogit,
add = data.frame(x = c(0, 0)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "integrateoutRE", k = 10L, seed = c(1234, 54321))
knitr::kable(ame.error$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|------:|-----:|--------:|-------:|-----------:|----------:|----:|:------|:----|:---|:-----|
| -0.703| -0.98| -162.906| 172.759| NA| NA| 0.99|HDI |NA |NA |AME x |
This disappears when we use the same seed for each row of the data
used for predictions. Here we get all zeros for the difference, as
we would expect. Note that you do not need to specify a seed for each
row of the data; you can specify one seed (or rely on the `brmsmargins()`
default), which will then be used for all rows.
```r
h <- .001
ame.noerror <- brmsmargins(
mlogit,
add = data.frame(x = c(0, 0)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "integrateoutRE", k = 10L, seed = c(1234, 1234))
knitr::kable(ame.noerror$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|--:|---:|--:|--:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0| 0| 0| 0| NA| NA| 0.99|HDI |NA |NA |AME x |
### Marginal Coefficients
The fixed effects coefficients are conditional on the random effects.
To aid interpretation, we can also calculate marginal, or
population-averaged, coefficients. The function to do this is
`marginalcoef()`, which uses the method described by Hedeker and colleagues
(2018). Here is an example and a comparison with results from a single-level
logistic regression that ignores the clustering in the data.
```r
## calculate marginal coefficients
mc.logit <- marginalcoef(mlogit, CI = 0.95)
## calculate single level logistic regression
glm.logit <- glm(y ~ 1 + x, family = "binomial", data = d)
glm.logit <- as.data.table(cbind(Est = coef(glm.logit), confint(glm.logit)))
#> Waiting for profiling to be done...
```
Now we can view and compare the results.
```r
knitr::kable(cbind(
mc.logit$Summary[, .(
MargCoef = sprintf("%0.3f", round(M, 3)),
MargCoefCI = sprintf("[%0.3f, %0.3f]", round(LL, 3), round(UL, 3)))],
glm.logit[, .(
GLMCoef = sprintf("%0.3f", round(Est, 3)),
GLMCI = sprintf("[%0.3f, %0.3f]", round(`2.5 %`, 3), round(`97.5 %`, 3)))]))
```
|MargCoef |MargCoefCI |GLMCoef |GLMCI |
|:--------|:----------------|:-------|:----------------|
|-2.010 |[-2.386, -1.666] |-2.021 |[-2.219, -1.833] |
|0.800 |[0.520, 1.083] |0.802 |[0.561, 1.047] |
## Mixed Effects Poisson Regression
We will simulate some multilevel Poisson data for our
mixed effects Poisson regression model with individual differences
in both the intercept and slope.
```r
dpoisson <- withr::with_seed(
seed = 12345, code = {
nGroups <- 100
nObs <- 20
theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
theta.location[, 1] <- theta.location[, 1] - 2.5
theta.location[, 2] <- theta.location[, 2] + 1
d <- data.table(
x = rep(rep(0:1, each = nObs / 2), times = nGroups))
d[, ID := rep(seq_len(nGroups), each = nObs)]
for (i in seq_len(nGroups)) {
d[ID == i, y := rpois(
n = nObs,
lambda = exp(theta.location[i, 1] + theta.location[i, 2] * x))
]
}
copy(d)
})
mpois <- brms::brm(
y ~ 1 + x + (1 + x | ID), family = "poisson",
data = dpoisson, seed = 1234,
chains = 4L, cores = 4L, backend = "cmdstanr",
silent = 2, refresh = 0, adapt_delta = 0.99)
#> Compiling Stan program...
```
```r
summary(mpois)
#> Family: poisson
#> Links: mu = log
#> Formula: y ~ 1 + x + (1 + x | ID)
#> Data: dpoisson (Number of observations: 2000)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Group-Level Effects:
#> ~ID (Number of levels: 100)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept) 1.18 0.15 0.91 1.52 1.00 1353 2327
#> sd(x) 0.36 0.20 0.02 0.75 1.01 308 949
#> cor(Intercept,x) -0.06 0.41 -0.75 0.85 1.00 1680 1536
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept -2.47 0.17 -2.82 -2.14 1.00 1777 2484
#> x 0.97 0.15 0.68 1.27 1.00 3307 2893
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```
### AMEs
We use `brmsmargins()` in the same way as for the mixed effects logistic regression.
Here is an example with a numerical derivative treating `x` as continuous.
```r
h <- .001
ame1.pois <- brmsmargins(
mpois,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "integrateoutRE", k = 100L, seed = 1234)
knitr::kable(ame1.pois$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|-----:|-----:|----:|-----:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0.332| 0.312| 0.14| 0.731| NA| NA| 0.99|HDI |NA |NA |AME x |
Here is an example treating `x` as discrete.
```r
ame2.pois <- brmsmargins(
mpois,
at = data.frame(x = c(0, 1)),
contrasts = cbind("AME x" = c(-1, 1)),
effects = "integrateoutRE", k = 100L, seed = 1234)
knitr::kable(ame2.pois$ContrastSummary)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|--------:|---------:|---------:|---------:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0.294537| 0.2780058| 0.1076343| 0.5973478| NA| NA| 0.99|HDI |NA |NA |AME x |
### Marginal Coefficients
Just as for mixed effects logistic regression,
we can calculate marginal or population-averaged coefficients for
mixed effects Poisson regression using the same process described
by Hedeker and colleagues (2018).
Here is an example and a comparison with results from a single-level
Poisson regression that ignores the clustering in the data.
```r
## calculate marginal coefficients
mc.pois <- marginalcoef(mpois, CI = 0.95)
## calculate single-level poisson regression
glm.pois <- glm(y ~ 1 + x, family = "poisson", data = dpoisson)
glm.pois <- as.data.table(cbind(Est = coef(glm.pois), confint(glm.pois)))
#> Waiting for profiling to be done...
```
Now we can view and compare the results.
```r
knitr::kable(cbind(
mc.pois$Summary[, .(
MargCoef = sprintf("%0.3f", round(M, 3)),
MargCoefCI = sprintf("[%0.3f, %0.3f]", round(LL, 3), round(UL, 3)))],
glm.pois[, .(
GLMCoef = sprintf("%0.3f", round(Est, 3)),
GLMCI = sprintf("[%0.3f, %0.3f]", round(`2.5 %`, 3), round(`97.5 %`, 3)))]))
```
|MargCoef |MargCoefCI |GLMCoef |GLMCI |
|:--------|:----------------|:-------|:----------------|
|-1.765 |[-2.218, -1.239] |-1.858 |[-2.019, -1.705] |
|0.987 |[0.697, 1.279] |0.983 |[0.802, 1.170] |
## Mixed Effects Negative Binomial Regression
Negative binomial models work the same way as Poisson models.
We use the same dataset, just for demonstration.
```r
mnb <- brms::brm(
y ~ 1 + x + (1 + x | ID), family = "negbinomial",
data = dpoisson, seed = 1234,
chains = 4L, cores = 4L, backend = "cmdstanr",
silent = 2, refresh = 0, adapt_delta = 0.99)
#> Compiling Stan program...
```
```r
summary(mnb)
#> Family: negbinomial
#> Links: mu = log; shape = identity
#> Formula: y ~ 1 + x + (1 + x | ID)
#> Data: dpoisson (Number of observations: 2000)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 4000
#>
#> Group-Level Effects:
#> ~ID (Number of levels: 100)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept) 1.19 0.15 0.93 1.51 1.01 975 1816
#> sd(x) 0.35 0.20 0.01 0.74 1.01 252 821
#> cor(Intercept,x) -0.07 0.41 -0.77 0.84 1.00 1432 1277
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept -2.46 0.18 -2.83 -2.14 1.00 1057 1993
#> x 0.98 0.15 0.70 1.28 1.00 2711 2153
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> shape 61.06 63.22 8.15 243.02 1.00 5885 2745
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```
### AMEs
We use `brmsmargins()` in the same way as for the mixed effects Poisson regression.
Here is an example with a numerical derivative treating `x` as continuous.
```r
h <- .001
ame1.nb <- brmsmargins(
mnb,
add = data.frame(x = c(0, h)),
contrasts = cbind("AME x" = c(-1 / h, 1 / h)),
effects = "integrateoutRE", k = 100L, seed = 1234)
knitr::kable(ame1.nb$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|-----:|-----:|-----:|-----:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0.335| 0.313| 0.141| 0.777| NA| NA| 0.99|HDI |NA |NA |AME x |
Here is an example treating `x` as discrete.
```r
ame2.nb <- brmsmargins(
mnb,
at = data.frame(x = c(0, 1)),
contrasts = cbind("AME x" = c(-1, 1)),
effects = "integrateoutRE", k = 100L, seed = 1234)
knitr::kable(ame2.nb$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|-----:|----:|-----:|-----:|-----------:|----------:|----:|:------|:----|:---|:-----|
| 0.298| 0.28| 0.141| 0.662| NA| NA| 0.99|HDI |NA |NA |AME x |
### Marginal Coefficients
Negative binomial models cannot be fit by the `glm()` function in base `R`
(though, for example, `MASS::glm.nb()` could be used), so here we just show
the population-averaged values from `brms`.
```r
## calculate marginal coefficients
mc.nb <- marginalcoef(mnb, CI = 0.95)
```
View the results.
```r
knitr::kable(
mc.nb$Summary[, .(
MargCoef = sprintf("%0.3f", round(M, 3)),
MargCoefCI = sprintf("[%0.3f, %0.3f]", round(LL, 3), round(UL, 3)))])
```
|MargCoef |MargCoefCI |
|:--------|:----------------|
|-1.759 |[-2.225, -1.234] |
|0.988 |[0.725, 1.283] |
## Centered Categorical Predictors
In mixed effects models, it is common to center continuous predictors.
Consider a longitudinal study where sleep was collected nightly for a week
in multiple people. Hours of sleep duration will have variation that
exists between people (different people have different typical or average
sleep durations) and within people (the same person will sleep different
amounts on different nights).
Entering observed hours of sleep duration as a predictor in the model will
conflate associations between sleep duration and the outcome that exist
at the between-person and within-person levels.
Here is a small example for two people with observed sleep ("Sleep"),
the between-person component (the average sleep per person, "BSleep"), and
the within-person component (the person-centered sleep duration, "WSleep").
| ID | Sleep | BSleep | WSleep |
|:--:|:-----:|:------:|:------:|
| 1 | 6 | 7 | -1 |
| 1 | 7 | 7 | 0 |
| 1 | 8 | 7 | +1 |
| 2 | 4 | 5 | -1 |
| 2 | 5 | 5 | 0 |
| 2 | 6 | 5 | +1 |
These values are obtained by averaging the observed sleep values by ID,
then taking the difference between the observed sleep durations
and the between-person mean sleep values, thus:
$$
WSleep = Sleep - BSleep
$$
Both `BSleep` and `WSleep` could be included as predictor variables in a model,
or, if the interest is solely in the within-person association, only `WSleep`
included.
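As a small sketch using the two-person data above, this centering takes two lines with `data.table`.

```r
## person-mean centering, reproducing the example table above
sleep <- data.table(ID = rep(1:2, each = 3), Sleep = c(6, 7, 8, 4, 5, 6))
sleep[, BSleep := mean(Sleep), by = ID] # between-person mean
sleep[, WSleep := Sleep - BSleep]       # within-person deviation
sleep
```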
This type of centering is relatively common in mixed effects or multilevel models,
at least for continuous predictors. The presence or absence of such centering
for continuous variables has relatively little bearing on calculating
average marginal effects (AMEs), as generally both variables remain continuous
and a derivative makes sense.
In contrast to continuous predictors, where it is common, it is relatively
*un*common to center categorical predictors. However, research
(Yaremych, Preacher, & Hedeker, 2021) highlights how the same rationale
regarding the benefits of centering that applies to continuous predictors
applies equally to categorical predictors.
One area where particular attention has been paid to the importance
of centering, whether for continuous or categorical predictors,
is individual participant data meta-analysis (IPDMA) with one-stage analyses.
In IPDMA where there are treatment × covariate interactions to evaluate
effect modifiers, centering the predictor / covariate is critical for
avoiding 'ecological bias' (Hua et al., 2017).
Yaremych et al. (2021) showed that an algebraic approach equivalent to that for
creating centered continuous variables can be applied to categorical variables
after coding (e.g., using dummy coding). In the same study as the
continuous example, suppose that whether or not people had a good night's
sleep was recorded, dummy coded as yes = 1 and no = 0.
This table shows how it might be centered.
| ID | GoodSleep | BSleep | WSleep |
|:--:|:---------:|:------:|:------:|
| 1 | 0 | 0.50 | -0.50 |
| 1 | 0 | 0.50 | -0.50 |
| 1 | 1 | 0.50 | +0.50 |
| 1 | 1 | 0.50 | +0.50 |
| 2 | 0 | 0.75 | -0.75 |
| 2 | 1 | 0.75 | +0.25 |
| 2 | 1 | 0.75 | +0.25 |
| 2 | 1 | 0.75 | +0.25 |
| 3 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 |
We can include these new variables as predictors in a model just like any other
predictors. However, in contrast to the continuous case, where for AMEs we
tend to calculate a derivative, the categorical case is more challenging.
Typically, for categorical variables, AMEs are calculated as the
average difference in predictions moving from one category to another.
This can easily be accomplished in `brmsmargins` using the `at` argument.
However, when a categorical variable is centered by ID, the values
for one category are not constant by ID.
To address this, `brmsmargins` implements an additional argument, `wat`,
for within-level `at`. This argument gives the values to be used
at each ID level. The `wat` argument is used in conjunction with the `at`
argument: `at` continues to handle the general setup, with `wat` supplying
the ID-specific values.
This is more easily shown than said. Here we simulate some multilevel data
with a binary predictor, `x`, and a binary outcome, `y`.
```r
d <- withr::with_seed(
seed = 12345, code = {
nGroups <- 100
nObs <- 20
theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
theta.location[, 1] <- theta.location[, 1] - 2.5
theta.location[, 2] <- theta.location[, 2] + 1
d <- data.table(ID = seq_len(nGroups))
d <- d[, .(x = rbinom(n = nObs, size = 1, prob = runif(1, 0, 1))), by = ID]
for (i in seq_len(nGroups)) {
d[ID == i, y := rbinom(
n = nObs,
size = 1,
prob = plogis(theta.location[i, 1] + theta.location[i, 2] * x))
]
}
copy(d)
})
```
Now we can create between- and within-person versions of `x`,
which we will call `xb` and `xw`. We also fit the `brms` model using
`xw`. In this instance we are not interested in any between-person effects,
so `xb` is omitted; any variation it would explain will go into
the random intercept anyway.
```r
## within person centering binary predictor
d[, xb := mean(x), by = ID]
d[, xw := x - xb]
mlogitcent <- brms::brm(
y ~ 1 + xw + (1 + xw | ID), family = "bernoulli",
data = d, seed = 1234,
silent = 2, refresh = 0,
chains = 4L, cores = 4L, backend = "cmdstanr")
#> Compiling Stan program...
```
Now comes the new part. Based on how we coded our predictor, `x`,
we will create the two values by ID. In this case, `x` was coded
as 0 and 1, so we start with 0 as the low value and 1 as the high value.
Then we take the difference between each of 0 and 1 and the mean of `x`
by ID. We (arbitrarily) label these `a` and `b`.
We could have chosen any labels, `min` and `max` or `low` and `high` or
whatever we wanted; they just need to be distinct and something that
`brmsmargins()` can match between the `at` and `wat` arguments.
The resulting data get turned into a single long `data.table`,
called `xwid`. We use this to create a list, saved as `usewat`.
The list must have two elements at a minimum:
1. A named element, named "ID" that contains a character string with
the name of the ID variable.
2. An element named after the centered variable, with a `data.frame`
or `data.table` containing the values to use for each ID and
for each label we chose, here `a` and `b`.
The table shows a few examples of what the dataset looks like.
```r
xwid <- melt(
d[, .(a = 0 - na.omit(xb)[1],
b = 1 - na.omit(xb)[1]),
by = ID], id.vars = "ID")
usewat <- list(ID = "ID", xw = xwid)
knitr::kable(xwid[ID %in% c(1, 2, 4, 100)][order(ID)], digits = 2)
```
| ID|variable | value|
|---:|:--------|-----:|
| 1|a | 0.00|
| 1|b | 1.00|
| 2|a | -0.30|
| 2|b | 0.70|
| 4|a | -1.00|
| 4|b | 0.00|
| 100|a | -0.15|
| 100|b | 0.85|
This new dataset shows, for each ID, what values to use when moving from
one category of `x` to the other.
Note that sometimes these are the same. In the case
of ID 1, all `x` values were identical and thus all `xw` values
will be 0. This is appropriate because, for that ID, there was no
variation on `x`, so we have no idea what the "true" change
would be for ID 1 if moving from one category to another.
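As a quick check on the simulated data, we can list the IDs whose `x` never varies; their within-person contrast is necessarily zero.

```r
## IDs with no within-person variation on x
d[, .(x_varies = uniqueN(x) > 1), by = ID][x_varies == FALSE]
```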
Now we can use `brmsmargins()` to calculate the AME.
We must still specify the `at` argument. Here we use
our arbitrary labels, `a` and `b`, for the variable
`xw`. We also pass our list to the `wat` argument.
Internally, `brmsmargins()` will substitute in all the values
for `a` by ID from our list, and similarly for `b`.
The reason for this separation is that the `at` argument might also
contain specific values of other variables, had there been other
predictors in the model.
The resulting summary gives the average marginal predictions
for `xw = a` and `xw = b`, which in our case correspond to
`x` at 0 **for that ID** and `x` at 1 **for that ID**.
```r
ame.cent <- brmsmargins(
object = mlogitcent,
at = expand.grid(xw = c("a", "b")),
wat = usewat,
contrasts = cbind("AME xw" = c(-1, 1)),
effects = "integrateoutRE", k = 20L)
knitr::kable(ame.cent$Summary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |
|-----:|-----:|-----:|-----:|-----------:|----------:|----:|:------|:----|:---|
| 0.122| 0.119| 0.051| 0.206| NA| NA| 0.99|HDI |NA |NA |
| 0.231| 0.228| 0.116| 0.345| NA| NA| 0.99|HDI |NA |NA |
Just as in other examples, we can get a summary of the
contrast of these two values, our AME.
```r
knitr::kable(ame.cent$ContrastSummary, digits = 3)
```
| M| Mdn| LL| UL| PercentROPE| PercentMID| CI|CIType |ROPE |MID |Label |
|-----:|-----:|-----:|-----:|-----------:|----------:|----:|:------|:----|:---|:------|
| 0.109| 0.109| 0.041| 0.184| NA| NA| 0.99|HDI |NA |NA |AME xw |
## Interactions and Marginal Effects
As a final example, we will look at a model with an interaction.
Specifically, we have a binary outcome, `y`, measured repeatedly;
a binary predictor, `x`, measured repeatedly within units; and
group membership, `Group`, a binary variable that is constant within
units and so only varies between units.
To map these to a concrete potential research question, suppose that
100 people were randomized 1:1 to either a waitlist control group
(Group = 0) or an emotion regulation intervention designed to help
manage stress (Group = 1). After completing the intervention,
participants completed 20 days of daily diaries. Sleep was recorded
each day and categorized as a good night's sleep
(`y` = 1) or a poor night's sleep (`y` = 0).
Each day, participants also reported whether they experienced
no stressor (`x` = 0) or any stressor (`x` = 1).
The goal is to examine whether the intervention group moderates
participants' stress reactivity, defined as the association between
experiencing a stressor or not on a given day and whether they have
good or poor sleep that night. Based on clinical judgement and the
intervention cost, it is estimated that any probability difference
within plus or minus 2% is practically equivalent to no difference, so the
region of practical equivalence (ROPE) is set at -0.02 to +0.02.
A probability difference of 5% or more is considered the minimally
important difference (MID), the smallest difference that is clinically
meaningful. Thus the MID is set at less than or equal to -0.05 or
greater than or equal to +0.05.
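These thresholds can be written as length-two intervals. Assuming `brmsmargins()` accepts them via `ROPE` and `MID` arguments (the `PercentROPE` and `PercentMID` columns in the earlier summaries suggest this; check `?brmsmargins` to confirm), a call might look like the sketch below.

```r
## hypothetical sketch; the ROPE and MID argument names are assumed,
## see ?brmsmargins to confirm
rope <- c(-0.02, 0.02) # region of practical equivalence
mid  <- c(-0.05, 0.05) # minimally important difference
## brmsmargins(fit, at = ..., contrasts = ...,
##             ROPE = rope, MID = mid,
##             effects = "integrateoutRE", seed = 1234)
```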
To show this example, we first simulate a dataset, using a similar
process as in previous examples.
```r
d <- withr::with_seed(
seed = 12345, code = {
nGroups <- 100
nObs <- 20
theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
theta.location[, 1] <- theta.location[, 1] - 2.5
theta.location[, 2] <- theta.location[, 2] + rep(c(-1, 1), each = nGroups / 2)
d <- data.table(ID = seq_len(nGroups), group = rep(0:1, each = nGroups / 2))
d <- d[, .(x = rbinom(n = nObs, size = 1, prob = runif(1, 0, 1))), by = .(ID, group)]
for (i in seq_len(nGroups)) {
d[ID == i, y := rbinom(
n = nObs,
size = 1,
prob = plogis(theta.location[i, 1] + theta.location[i, 2] * x))
]
}
copy(d)
})
```
Following the earlier example, we center our binary predictor, `x`
(whether participants experienced any stressor that day).
This separates the effects of experiencing a stressor within a person
from the fact that some people may experience stressors most days
while others rarely experience any. We call the between-person variable,
the average proportion of days experiencing a stressor, `xb`,
and the within-person variable `xw`. As in the example on
centering within-person categorical variables, we also create a
dataset with the within-person hypothetical values for each person,
`usewat`.
```r
## within person centering binary predictor
d[, xb := mean(x), by = ID]
d[, xw := x - xb]
xwid <- melt(
d[, .(a = 0 - na.omit(xb)[1],
b = 1 - na.omit(xb)[1]),
by = ID], id.vars = "ID")
usewat <- list(ID = "ID", xw = xwid)
```