content/notes/model-selection-GLMs-AIC-what-to-report.Rmd

---
title: So, you did some GLMs & compared with AIC. Congrats!
author: Jaime Ashander
date: 2015-10-26
tags: R, statistics, model selection, GLM, AIC, deviance
output:
    md_document:
        fig_caption: true
        variant: markdown_strict
        dev: 'svg'
    encoding: 'UTF-8'
---

```{r knit-opts, echo=FALSE}
## important to let caching work between local render and building w/ pelican
knitr::opts_chunk$set(cache = 1)
options(digits=2)
```


Here's what you need to report in a paper about the model comparison:

- residual deviance
- residual df
- delta AIC
- AIC weight

You should also report the null deviance and degrees of freedom,
maybe in a table caption.

Thanks to [Emilio Bruna](http://brunalab.org/) for prompting this post and
suggesting its title.
Below I'll do a worked example, explain why you should include at least these
numbers, and mention some situations where it's better to use something other
than AIC.


### Example: UC Berkeley admissions and gender

Let's look at the built-in `UCBAdmissions` data. As R
will tell you, these data are often used to illustrate Simpson's paradox (see
`?UCBAdmissions` and Bickel _et al._ 1975 or my PPS below).

```{r, admit-data}
d <- as.data.frame(UCBAdmissions)
d <- tidyr::spread(d, Admit, Freq) # use Hadley's excellent tidyr to reshape
d[order(d$Dept), ]
```

Using logistic regression, encode several models for the effect of applicant
gender, department identity, or both on admission.

```{r, glm-example}
m1 <- glm(cbind(Admitted, Rejected) ~ Gender, d, family='binomial')
m2 <- glm(cbind(Admitted, Rejected) ~ Dept, d, family = 'binomial')
m3 <- glm(cbind(Admitted, Rejected) ~ Dept + Gender, d, family = 'binomial')
```

(Note: although we might like to allow for an interaction between gender and
department, the data here are insufficient to fit such a model.)

Running `summary` on any one of the fits yields a bunch of stats: AIC, Residual
and null deviance, as well as coefficients, their standard errors, and
significance.

### What to report

We could type by hand the AIC and other stats. Instead, we'll use [David Robinson](http://varianceexplained.org/)'s `broom` which gives tidy summaries of model objects.

```{r, broom-it-up}
summary.table <- do.call(rbind, lapply(list(m1, m2, m3), broom::glance))
summary.table[["model"]] <- 1:3
```

If we take a look at `summary.table`, we'll see it has all the ingredients we might like to report from model selection, whether via AIC, BIC, or just the deviance. These are, in order, `r names(summary.table)`.

Creating a table with our own desired columns in an appropriate order is easy.

```{r, reordered-subset}
table.cols <- c("model", "df.residual", "deviance", "AIC")
reported.table <- summary.table[table.cols]
names(reported.table) <- c("Model", "Resid. Df", "Resid. Dev", "AIC")
```

For model selection, a model's AIC is only meaningful relative to that of other
models, so Akaike and others recommend reporting differences in AIC from the
best model, $\Delta$_AIC_, and AIC weights _wAIC_. The latter can be viewed as
an estimate of the probability a model gives the best predictions on new data
(conditional on the models considered; Burnham and Anderson 2002, McElreath
2015).

```{r, delta-and-weights}
reported.table[['dAIC']] <-  with(reported.table, AIC - min(AIC))
reported.table[['wAIC']] <- with(reported.table, exp(- 0.5 * dAIC) / sum(exp(- 0.5 * dAIC)))
reported.table$AIC <- NULL
```


With this advice in mind, here's *a minimal table for reporting model selection on GLMs*:

Caption: Model selection for the effect gender (model 1), department (model 2),
and both gender and department (model 3) on admission probability with `r unique(summary.table$df.null)`
null degrees of freedom and `r unique(summary.table$null.deviance)` null deviance.
```{r,model-selection-table}
reported.table
```

Even if using an information criterion, include the residual deviance and
degrees of freedom for each model. These provide a (rough) goodness of fit
measure for each model.

Of course, model selection is just the beginning of reporting your results. See
the PPS below for some thoughts on reporting results of the best model(s).


### Is AIC the right criterion?

* For small data, you should use AICc -- the small sample correction which
  provides greater penalty for each parameter but approaches AIC as $n$ becomes
  large. If it makes a difference, you should use it. (I probably should have
  used it above.)
* For large data, consider BIC, which is asymptotically consistent while AIC is
  not (see Hastie _et al._ 2009, which is available online). AIC typically
  favors overly-complex models with large $n$ relative to BIC.
* For Bayesian models, consider [WAIC or
  LOO](http://andrewgelman.com/2015/07/16/new-papers-on-loowaic-and-stan/)
  (instead of DIC, which has issues with non-Gaussian posteriors McElreath
  2015).
* Don't use information criteria for model selection between GLMs with
  different link functions

### Further Reading & References

* [Ben Bolker on reporting results in the text](http://ms.mcmaster.ca/~bolker/misc/GLMM_results_report.pdf)
* Burnham, Kenneth P., and David R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York, NY, USA.
* [Hastie, Tibshirani, and Friedman. 2009](http://statweb.stanford.edu/~tibs/ElemStatLearn/) Elements of Statistical Learning, Second Edition. Springer, New York, NY, USA.
* [McElreath. 2015.](https://www.crcpress.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-Stan/McElreath/9781482253443) Statistical Rethinking. CRC Press.
* [Brian O'Meara's AIC tutorial](http://brianomeara.info/tutorials/aic/)

### PS: Nested models

Reporting the residual deviance and degrees of freedom as above is relatively
similar to R's output for conducting an ANOVA on a GLM (where you can
optionally add a statistical test). For nested models, you may as well just
do this and report the table:

```{r,best-model-anova-table}
m3.anova <- anova(m3, test="Chisq")
round(m3.anova, digits = 4)
```


### PPS: Evaluating the best model

For the best model(s), you should then go on to report model results in the
text, as
[Ben Bolker demonstrates](http://ms.mcmaster.ca/~bolker/misc/GLMM_results_report.pdf),
and visualize the fit relative to the data.

To visualize the admissions data and mean fits from models 2 and 3 (which have
approximately equal AIC weight), we can use `ggplot2` and `augment` from broom
(which adds model predictions and statistics to the data):

```{r, model-viz, fig.cap="Admissions (open circles, size indicates total applicants) versus deparatment and predictions from model 2 (department only, diamonds) and model 3 (department and gender, dots and triangles)"}
m3.pred <- broom::augment(m3)
m2.pred <- broom::augment(m2)
library(ggplot2)
ggplot(d) +
    geom_point(aes(Dept, Admitted / (Admitted + Rejected), color=Gender, size=Admitted + Rejected)) +
    geom_point(aes(Dept, plogis(.fitted)), data =m2.pred, shape =5, alpha=0.4) + geom_point(aes(Dept, plogis(.fitted), shape=Gender), alpha = 0.4, data = m3.pred) +
    theme_minimal() + scale_color_manual(values=c("blue", "orange")) + guides(shape=FALSE)
```

Admissions (colored dots, size indicates total applicants) versus department
and predictions from model 2 (department only, diamonds) and model 3
(department and gender, dots and triangles)

#### Simpson's paradox

Comparing model 3 with model 1 illustrates Simpson's paradox. Without
accounting for department, the apparent effect of female gender on admission is
negative (female odds relative to male `r exp(coef(m1)['GenderFemale'])`, $p
\approx 0$), whereas after accounting for department, the effect is positive
(but weak: female odds relative to male `r exp(coef(m3)['GenderFemale'])`, $p
\approx$ `r round(broom::tidy(m3)[7, "p.value"], 2)`).