Browse files

models selection with AIC and GLM example

  • Loading branch information...
ashander committed Oct 27, 2015
1 parent eb50b3d commit b25af179befbfdcdf013f7b484092296a0324bf6
Showing with 185 additions and 0 deletions.
  1. +185 −0 content/notes/model-selection-GLMs-AIC-what-to-report.Rmd
@@ -0,0 +1,185 @@
title: So, you did some GLMs & compared with AIC. Congrats!
author: Jaime Ashander
date: 2015-10-26
tags: R, statistics, model selection, GLM, AIC, deviance
fig_caption: true
variant: markdown_strict
dev: 'svg'
encoding: 'UTF-8'

```{r knit-opts, echo=FALSE}
## important to let caching work between local render and building w/ pelican
knitr::opts_chunk$set(cache = 1)

Here's what you need to report in a paper about the model comparison:

- residual deviance
- residual df
- delta AIC
- AIC weight

You should also report the null deviance and degrees of freedom,
maybe in a table caption.

Thanks to [Emilio Bruna]( for prompting this post and
suggesting its title.
Below I'll do a worked example, explain why you should include at least these
numbers, and mention some situations where it's better to use something other
than AIC.

### Example: UC Berkeley admissions and gender

Let's look at the built-in `UCBAdmissions` data. As R
will tell you, these data are often used to illustrate Simpson's paradox (see
`?UCBAdmissions` and Bickel _et al._ 1975 or my PPS below).

```{r, admit-data}
d <-
d <- tidyr::spread(d, Admit, Freq) # use Hadley's excellent tidyr to reshape
d[order(d$Dept), ]

Using logistic regression, encode several models for the effect of applicant
gender, department identity, or both on admission.

```{r, glm-example}
m1 <- glm(cbind(Admitted, Rejected) ~ Gender, d, family='binomial')
m2 <- glm(cbind(Admitted, Rejected) ~ Dept, d, family = 'binomial')
m3 <- glm(cbind(Admitted, Rejected) ~ Dept + Gender, d, family = 'binomial')

(Note: although we might like to allow for an interaction between gender and
department, the data here are insufficient to fit such a model.)

Running `summary` on any one of the fits yields a bunch of stats: AIC, Residual
and null deviance, as well as coefficients, their standard errors, and

### What to report

We could type by hand the AIC and other stats. Instead, we'll use [David Robinson]('s `broom` which gives tidy summaries of model objects.

```{r, broom-it-up}
summary.table <-, lapply(list(m1, m2, m3), broom::glance))
summary.table[["model"]] <- 1:3

If we take a look at `summary.table`, we'll see it has all the ingredients we might like to report from model selection, whether via AIC, BIC, or just the deviance. These are, in order, `r names(summary.table)`.

Creating a table with our own desired columns in an appropriate order is easy.

```{r, reordered-subset}
table.cols <- c("model", "df.residual", "deviance", "AIC")
reported.table <- summary.table[table.cols]
names(reported.table) <- c("Model", "Resid. Df", "Resid. Dev", "AIC")

For model selection, a model's AIC is only meaningful relative to that of other
models, so Akaike and others recommend reporting differences in AIC from the
best model, $\Delta$_AIC_, and AIC weights _wAIC_. The latter can be viewed as
an estimate of the probability a model gives the best predictions on new data
(conditional on the models considered; Burnham and Anderson 2002, McElreath

```{r, delta-and-weights}
reported.table[['dAIC']] <- with(reported.table, AIC - min(AIC))
reported.table[['wAIC']] <- with(reported.table, exp(- 0.5 * dAIC) / sum(exp(- 0.5 * dAIC)))
reported.table$AIC <- NULL

With this advice in mind, here's *a minimal table for reporting model selection on GLMs*:

Caption: Model selection for the effect gender (model 1), department (model 2),
and both gender and department (model 3) on admission probability with `r unique(summary.table$df.null)`
null degrees of freedom and `r unique(summary.table$null.deviance)` null deviance.

Even if using an information criterion, include the residual deviance and
degrees of freedom for each model. These provide a (rough) goodness of fit
measure for each model.

Of course, model selection is just the beginning of reporting your results. See
the PPS below for some thoughts on reporting results of the best model(s).

### Is AIC the right criterion?

* For small data, you should use AICc -- the small sample correction which
provides greater penalty for each parameter but approaches AIC as $n$ becomes
large. If it makes a difference, you should use it. (I probably should have
used it above.)
* For large data, consider BIC, which is asymptotically consistent while AIC is
not (see Hastie _et al._ 2009, which is available online). AIC typically
favors overly-complex models with large $n$ relative to BIC.
* For Bayesian models, consider [WAIC or
(instead of DIC, which has issues with non-Gaussian posteriors McElreath
* Don't use information criteria for model selection between GLMs with
different link functions

### Further Reading & References

* [Ben Bolker on reporting results in the text](
* Burnham, Kenneth P., and David R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York, NY, USA.
* [Hastie, Tibshirani, and Friedman. 2009]( Elements of Statistical Learning, Second Edition. Springer, New York, NY, USA.
* [McElreath. 2015.]( Statistical Rethinking. CRC Press.
* [Brian O'Meara's AIC tutorial](

### PS: Nested models

Reporting the residual deviance and degrees of freedom as above is relatively
similar to R's output for conducting an ANOVA on a GLM (where you can
optionally add a statistical test). For nested models, you may as well just
do this and report the table:

m3.anova <- anova(m3, test="Chisq")
round(m3.anova, digits = 4)

### PPS: Evaluating the best model

For the best model(s), you should then go on to report model results in the
text, as
[Ben Bolker demonstrates](,
and visualize the fit relative to the data.

To visualize the admissions data and mean fits from models 2 and 3 (which have
approximately equal AIC weight), we can use `ggplot2` and `augment` from broom
(which adds model predictions and statistics to the data):

```{r, model-viz, fig.cap="Admissions (open circles, size indicates total applicants) versus deparatment and predictions from model 2 (department only, diamonds) and model 3 (department and gender, dots and triangles)"}
m3.pred <- broom::augment(m3)
m2.pred <- broom::augment(m2)
ggplot(d) +
geom_point(aes(Dept, Admitted / (Admitted + Rejected), color=Gender, size=Admitted + Rejected)) +
geom_point(aes(Dept, plogis(.fitted)), data =m2.pred, shape =5, alpha=0.4) + geom_point(aes(Dept, plogis(.fitted), shape=Gender), alpha = 0.4, data = m3.pred) +
theme_minimal() + scale_color_manual(values=c("blue", "orange")) + guides(shape=FALSE)

Admissions (colored dots, size indicates total applicants) versus department
and predictions from model 2 (department only, diamonds) and model 3
(department and gender, dots and triangles)

#### Simpson's paradox

Comparing model 3 with model 1 illustrates Simpson's paradox. Without
accounting for department, the apparent effect of female gender on admission is
negative (female odds relative to male `r exp(coef(m1)['GenderFemale'])`, $p
\approx 0$), whereas after accounting for department, the effect is positive
(but weak: female odds relative to male `r exp(coef(m3)['GenderFemale'])`, $p
\approx$ `r round(broom::tidy(m3)[7, "p.value"], 2)`).

4 comments on commit b25af17


This comment has been minimized.

Copy link

bbolker replied Oct 27, 2015

I like this - I definitely like encouraging people to report relative AICs! - but (surprise!) have a couple of comments.

  • it's nice to remind/advise people about appropriate degrees of precision (anything on the log-likelihood scale such as AICs or deviances should be reported at most to the nearest 0.1; degrees of freedom are integers). Unfortunately it's a bit hard in R to format different columns in a table to different precisions -- you have to turn each column into a character format and print the whole table without quotes. emdbook::AICtab tries to do some of this.
  • Please advise people to be very very very cautious about reporting unconditional results of the best model; the recommendations I give in my book chapter are not considering a post-model-selection context. In particular, any confidence intervals derived from the best model without considering among-model variation will be overly narrow (and of course the p-values will be too small).

This comment has been minimized.

Copy link

ashander replied Oct 27, 2015

Thanks Ben! Especially for the reminder on AICtab (in bbmle, as it turns out) -- definitely the easiest way to get a best-practices table to which I could just add deviances. I'll update this.

I'll also revise my mention of your book chapter recommendations. Is it fair to say that your chapter approaches inference more from a standpoint of "model checking and improvement (as opposed to testing multiple working hypotheses through model selection)"?


This comment has been minimized.

Copy link

bbolker replied Oct 27, 2015


This comment has been minimized.

Copy link

ashander replied Oct 27, 2015

+1 for flagging incautious use of "testing" there

Thanks again for your feedback,

Please sign in to comment.