/
model-selection-GLMs-AIC-what-to-report.Rmd
185 lines (142 loc) · 7.7 KB
/
model-selection-GLMs-AIC-what-to-report.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
---
title: So, you did some GLMs & compared with AIC. Congrats!
author: Jaime Ashander
date: 2015-10-26
tags: R, statistics, model selection, GLM, AIC, deviance
output:
md_document:
fig_caption: true
variant: markdown_strict
dev: 'svg'
encoding: 'UTF-8'
---
```{r knit-opts, echo=FALSE}
## important to let caching work between local render and building w/ pelican
knitr::opts_chunk$set(cache = 1)
options(digits=2)
```
Here's what you need to report in a paper about the model comparison:
- residual deviance
- residual df
- delta AIC
- AIC weight
You should also report the null deviance and degrees of freedom,
maybe in a table caption.
Thanks to [Emilio Bruna](http://brunalab.org/) for prompting this post and
suggesting its title.
Below I'll do a worked example, explain why you should include at least these
numbers, and mention some situations where it's better to use something other
than AIC.
### Example: UC Berkeley admissions and gender
Let's look at the built-in `UCBAdmissions` data. As R
will tell you, these data are often used to illustrate Simpson's paradox (see
`?UCBAdmissions` and Bickel _et al._ 1975 or my PPS below).
```{r, admit-data}
d <- as.data.frame(UCBAdmissions)
d <- tidyr::spread(d, Admit, Freq) # use Hadley's excellent tidyr to reshape
d[order(d$Dept), ]
```
Using logistic regression, encode several models for the effect of applicant
gender, department identity, or both on admission.
```{r, glm-example}
m1 <- glm(cbind(Admitted, Rejected) ~ Gender, d, family='binomial')
m2 <- glm(cbind(Admitted, Rejected) ~ Dept, d, family = 'binomial')
m3 <- glm(cbind(Admitted, Rejected) ~ Dept + Gender, d, family = 'binomial')
```
(Note: although we might like to allow for an interaction between gender and
department, the data here are insufficient to fit such a model.)
Running `summary` on any one of the fits yields a bunch of stats: AIC, Residual
and null deviance, as well as coefficients, their standard errors, and
significance.
### What to report
We could type by hand the AIC and other stats. Instead, we'll use [David Robinson](http://varianceexplained.org/)'s `broom` which gives tidy summaries of model objects.
```{r, broom-it-up}
summary.table <- do.call(rbind, lapply(list(m1, m2, m3), broom::glance))
summary.table[["model"]] <- 1:3
```
If we take a look at `summary.table`, we'll see it has all the ingredients we might like to report from model selection, whether via AIC, BIC, or just the deviance. These are, in order, `r names(summary.table)`.
Creating a table with our own desired columns in an appropriate order is easy.
```{r, reordered-subset}
table.cols <- c("model", "df.residual", "deviance", "AIC")
reported.table <- summary.table[table.cols]
names(reported.table) <- c("Model", "Resid. Df", "Resid. Dev", "AIC")
```
For model selection, a model's AIC is only meaningful relative to that of other
models, so Akaike and others recommend reporting differences in AIC from the
best model, $\Delta$_AIC_, and AIC weights _wAIC_. The latter can be viewed as
an estimate of the probability a model gives the best predictions on new data
(conditional on the models considered; Burnham and Anderson 2002, McElreath
2015).
```{r, delta-and-weights}
reported.table[['dAIC']] <- with(reported.table, AIC - min(AIC))
reported.table[['wAIC']] <- with(reported.table, exp(- 0.5 * dAIC) / sum(exp(- 0.5 * dAIC)))
reported.table$AIC <- NULL
```
With this advice in mind, here's *a minimal table for reporting model selection on GLMs*:
Caption: Model selection for the effect gender (model 1), department (model 2),
and both gender and department (model 3) on admission probability with `r unique(summary.table$df.null)`
null degrees of freedom and `r unique(summary.table$null.deviance)` null deviance.
```{r,model-selection-table}
reported.table
```
Even if using an information criterion, include the residual deviance and
degrees of freedom for each model. These provide a (rough) goodness of fit
measure for each model.
Of course, model selection is just the beginning of reporting your results. See
the PPS below for some thoughts on reporting results of the best model(s).
### Is AIC the right criterion?
* For small data, you should use AICc -- the small sample correction which
provides greater penalty for each parameter but approaches AIC as $n$ becomes
large. If it makes a difference, you should use it. (I probably should have
used it above.)
* For large data, consider BIC, which is asymptotically consistent while AIC is
not (see Hastie _et al._ 2009, which is available online). AIC typically
favors overly-complex models with large $n$ relative to BIC.
* For Bayesian models, consider [WAIC or
LOO](http://andrewgelman.com/2015/07/16/new-papers-on-loowaic-and-stan/)
(instead of DIC, which has issues with non-Gaussian posteriors McElreath
2015).
* Don't use information criteria for model selection between GLMs with
different link functions
### Further Reading & References
* [Ben Bolker on reporting results in the text](http://ms.mcmaster.ca/~bolker/misc/GLMM_results_report.pdf)
* Burnham, Kenneth P., and David R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York, NY, USA.
* [Hastie, Tibshirani, and Friedman. 2009](http://statweb.stanford.edu/~tibs/ElemStatLearn/) Elements of Statistical Learning, Second Edition. Springer, New York, NY, USA.
* [McElreath. 2015.](https://www.crcpress.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-Stan/McElreath/9781482253443) Statistical Rethinking. CRC Press.
* [Brian O'Meara's AIC tutorial](http://brianomeara.info/tutorials/aic/)
### PS: Nested models
Reporting the residual deviance and degrees of freedom as above is relatively
similar to R's output for conducting an ANOVA on a GLM (where you can
optionally add a statistical test). For nested models, you may as well just
do this and report the table:
```{r,best-model-anova-table}
m3.anova <- anova(m3, test="Chisq")
round(m3.anova, digits = 4)
```
### PPS: Evaluating the best model
For the best model(s), you should then go on to report model results in the
text, as
[Ben Bolker demonstrates](http://ms.mcmaster.ca/~bolker/misc/GLMM_results_report.pdf),
and visualize the fit relative to the data.
To visualize the admissions data and mean fits from models 2 and 3 (which have
approximately equal AIC weight), we can use `ggplot2` and `augment` from broom
(which adds model predictions and statistics to the data):
```{r, model-viz, fig.cap="Admissions (open circles, size indicates total applicants) versus deparatment and predictions from model 2 (department only, diamonds) and model 3 (department and gender, dots and triangles)"}
m3.pred <- broom::augment(m3)
m2.pred <- broom::augment(m2)
library(ggplot2)
ggplot(d) +
geom_point(aes(Dept, Admitted / (Admitted + Rejected), color=Gender, size=Admitted + Rejected)) +
geom_point(aes(Dept, plogis(.fitted)), data =m2.pred, shape =5, alpha=0.4) + geom_point(aes(Dept, plogis(.fitted), shape=Gender), alpha = 0.4, data = m3.pred) +
theme_minimal() + scale_color_manual(values=c("blue", "orange")) + guides(shape=FALSE)
```
Admissions (colored dots, size indicates total applicants) versus department
and predictions from model 2 (department only, diamonds) and model 3
(department and gender, dots and triangles)
#### Simpson's paradox
Comparing model 3 with model 1 illustrates Simpson's paradox. Without
accounting for department, the apparent effect of female gender on admission is
negative (female odds relative to male `r exp(coef(m1)['GenderFemale'])`, $p
\approx 0$), whereas after accounting for department, the effect is positive
(but weak: female odds relative to male `r exp(coef(m3)['GenderFemale'])`, $p
\approx$ `r round(broom::tidy(m3)[7, "p.value"], 2)`).