Skip to content
Switch branches/tags
Go to file
Cannot retrieve contributors at this time
title: Using DHARMa for residual checks of unsupported models
author: Ariel Muldoon
date: '2017-12-21'
slug: using-dharma-for-residual-checks-of-unsupported-models
categories: [r, statistics]
tags: [glmm, dharma, simulation]
description: "Checking for model fit from generalized linear mixed models (GLMM) can be challenging. The DHARMa package helps with this by giving simulated residuals but doesn't work with all model types. I show how to use tools in DHARMa to extend it for use with unsupported models fit with glmmTMB() and zeroinfl()."
```{r setup, echo = FALSE, message = FALSE, purl = FALSE}
knitr::opts_chunk$set(comment = "#")
filename = 'render_toc.R')
One of the difficult things about working with generalized linear models (GLM) and generalized linear mixed models (GLMM) is figuring out how to interpret residual plots. We don't really expect residual plots from a GLMM to look like one from a linear model, sure, but how do we tell when something looks "bad"?
This is the situation I was in several years ago, working on an analysis involving counts from a fairly complicated study design. I was using a negative binomial generalized linear mixed models, and the residual vs fitted values plot looked, well, "funny". But I wasn't sure if something was wrong or if this was just how the residuals from a complicated model like this looked sometimes. At this point I dipped my toe into using simulations for model checking for the first time.
## Table of Contents
```{r toc, echo = FALSE, purl = FALSE}
# Why use simulations for model checking?
Here is an example of a plot I wasn't feeling too good about but also wasn't certain if what I was seeing indicated a lack of fit.
```{r badplot, echo = FALSE, purl = FALSE}
To try to figure out if what I was seeing was a problem, I fit models to response data simulated from my model. The beauty of such simulations is that I know that the model definitely *does* fit the simulated response data; the model is what created the data! I compared residuals plots from simulated data models to my real plot to help decide if what I was seeing was unusual. I looked at a fair number of simulated residual plots and decided that, yes, something was wrong with my model. I ended up moving on to a different model that worked better.
Here is an example of one of the simulated residual plots. There was definitely variation in plots from models fit to the simulated data, but this is a good example of what they generally looked like.
```{r goodplot, echo = FALSE, purl = FALSE}
# The DHARMa package
I found my "brute force" simulation approach useful, but I spent a lot of time visually comparing the simulated plots to my real plot. I didn't have a metric to help me decide if my actual residual plot seemed unusual compared to residual plots from my "true" models. This left me unable to recommend this as a general approach to folks I consult with.
Since then, the author of the [**DHARMa** package]( has come up with a clever way to use a simulation-based approach for residuals checks of GLMM's. If you are interested in trying the package out, it has [a very nice vignette]( to get you started.
These days I'm been happily recommending the **DHARMa** packages to students I work with for residual checks of GLMM's. However, students aren't always working with models that **DHARMa** currently supports. Luckily, **DHARMa** can simulate residuals for any model as long as the user can provide simulated values to the `createDHARMa()` function. Below I show how to do this for a couple of different situations.
# How to use createDHARMa()
In order to use `createDHARMa()`, the user needs to provide three pieces of information.
+ Simulated response vectors
+ Observed data
+ The predicted response from the model
I thinking making the simulated response vectors is the biggest bottleneck for folks I've worked with, and that is what I'm focusing on today.
# Simulations for models that have a simulate() function
When a function has a `simulate()` function, getting the simulations needed to use `createDHARMa()` can be comparatively straightforward.
## Example using glmmTMB()
The `glmmTMB()` function from package **glmmTMB** is one of those models that **DHARMa** doesn't currently support. (*2018-04-05 update: the development version of DHARMA [now supports glmmTMB objects for glmmTMB 0.2.1]( I believe the example below is still useful for showing how to work with DHARMa-unsupported model types that have a simulate() function.*)
There is a `simulate()` function for glmmTMB objects.
```{r, warning = FALSE, message = FALSE}
library(DHARMa) # version 0.1.5
library(glmmTMB) # version 0.1.1
I'm going to use one of the models from the **glmmTMB** documentation to demonstrate how to make the simulations and then use them in `createDHARMa()`.
Below I fit zero-inflated negative binomial model with `glmmTMB()`.
fit_nbz = glmmTMB(count ~ spp + mined + (1|site),
zi = ~spp + mined,
family = nbinom2, data = Salamanders)
If you try to calculate the scaled residuals via **DHARMa** functions for an unsupported model, you will get a warning and then an error. **DHARMa** attempts to make predictions from the model to simulate with, but it will then fail.
This is an indication that you'd to use `createDHARMa()` to make the residuals instead.
I can simulate from my model via the `simulate()` function (see the documentation for `?simulate.glmmTMB` for details). I usually do at least 1000 simulations if it isn't too time-consuming. I'll do only 10 here as an example.
sim_nbz = simulate(fit_nbz, nsim = 10)
The result is a list of simulated response vectors.
I need these in a matrix, not a list, where each column contains a simulated response vector and each row is an observation. I'll collapse the list into a matrix by using `cbind()` in ``.
sim_nbz =, sim_nbz)
Now I can pass these to `createDHARMa()` along with observed values and model predictions. I set `integerResponse` to `TRUE`, as well, as I'm working with counts.
sim_res_nbz = createDHARMa(simulatedResponse = sim_nbz,
observedResponse = Salamanders$count,
fittedPredictedResponse = predict(fit_nbz),
integerResponse = TRUE)
This function creates the scaled residuals that can be used to make residual plots via **DHARMa**'s `plotSimulatedResiduals()`. (In current versions of **DHARMa** just use `plot()` instead of `plotSimulatedResiduals()`.)
Remember I've only done 10 simulations here, so this particular set of plots don't look very nice.
```{r, message = FALSE, warning = FALSE}
# Simulations for models without a simulate() function
The `simulate()` function did most of the heavy lifting for me in the `glmmTMB` example. Not all models have a `simulate()` function, though. This doesn't mean I can't use **DHARMa**, but it does mean I have to put more effort in up front.
I will do the following simulations"by hand" in R, more-or-less following the method shown [in this answer on CrossValidated](
## Example using zeroinfl()
The `zeroinfl()` function from package **pscl** is an example of a model that doesn't have a `simulate()` function and is unsupported by **DHARMa**. I will use this in my next example.
I will also load package **VGAM**, which has a function for making random draws from a zero-inflated negative binomial distribution.
```{r, message = FALSE}
library(pscl) # version 1.5.2
library(VGAM) # version 1.0-4
I will use a `zeroinfl()` documentation example in this section. The zero-inflated negative binomial model is below.
fit_zinb = zeroinfl(art ~ . | 1,
data = bioChemists,
dist = "negbin")
In order to make my own simulations, I'll need both the model-predicted count and the model-predicted probability of a 0 for each observation in the dataset. I'll also need an estimate of the negative binomial dispersion parameter, $\theta$.
The `predict()` function for `zeroinfl` models lets the user define the kind of prediction desired. I use `predict()` twice, once to extra the predicted counts and once to extract the predicted probability of 0.
# Predicted probabilities
p = predict(fit_zinb, type = "zero")
# Predicted counts
mus = predict(fit_zinb, type = "count")
I can pull $\theta$ directly out of the model output.
Now that I have these, I can make random draws from a zero-inflated negative distribution using `rzinegbin` from package **VGAM**.
It took me awhile to figure out which arguments I needed to use in `rzinegbin`. I need to provide the predicted counts to the `"munb"` argument and the predicted probabilities to the `"pstr0"` argument. The `"size"` argument is the estimate of the dispersion parameter. And the `"n"` arguments indicates the number of simulated values needed, in this case the same number as the rows in the original dataset.
sim1 = rzinegbin(n = nrow(bioChemists),
size = fit_zinb$theta,
pstr0 = p,
munb = mus)
I use `replicate()` to draw more than one simulated vector. The first argument of `replicate()`, `"n"`, indicates the number of times to evaluate the given expression. Here I make 10 simulated response vectors.
sim_zinb = replicate(10, rzinegbin(n = nrow(bioChemists),
size = fit_zinb$theta,
pstr0 = p,
munb = mus) )
The output of `replicate()` in this case is a matrix, with one simulated response vector in every column and an observation in every row. This is ready for use in `createDHARMa()`.
Making the simulations was the hard part. Now that I have them, `createDHARMa()` works exactly the same way as in the `glmmTMB` example.
sim_res_zinb = createDHARMa(simulatedResponse = sim_zinb,
observedResponse = bioChemists$art,
fittedPredictedResponse = predict(fit_zinb, type = "response"),
integerResponse = TRUE)
```{r, warning = FALSE, message = FALSE}
# Just the code, please
```{r getlabels, echo = FALSE, purl = FALSE}
labs = knitr::all_labels()
labs = labs[!labs %in% c("setup", "toc", "getlabels", "allcode", "makescript", "badplot", "goodplot")]
```{r makescript, include = FALSE, purl = FALSE}
options(knitr.duplicate.label = "allow") # Needed to purl like this
input = knitr::current_input() # filename of input document
output = here::here("static", "script", paste(tools::file_path_sans_ext(input), "R", sep = ".") )
knitr::purl(input, output, documentation = 0, quiet = TRUE)
Here's the code without all the discussion. Copy and paste the code below or you can download an R script of uncommented code [from here](/script/2017-12-21-using-dharma-for-simulated-residual-checks-of-unsupported-models.R).
```{r allcode, ref.label = labs, eval = FALSE, purl = FALSE}