-
Notifications
You must be signed in to change notification settings - Fork 0
/
calibration.Rmd
177 lines (140 loc) · 7.2 KB
/
calibration.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
---
title: "Calibrating Compartmental Models to Data"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{Calibrating Compartmental Models to Data}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
[![status](https://img.shields.io/badge/status-working%20draft-red)](https://canmod.github.io/macpan2/articles/vignette-status#working-draft)
```{r opts, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
fig.width = 6,
fig.height = 4,
comment = "#>"
)
```
```{r setup, message = FALSE, warning = FALSE}
library(macpan2)
library(ggplot2)
library(dplyr)
library(broom.mixed)
options(macpan2_verbose = FALSE)
```
Before reading this article on calibrating models to data, please first look at the [quickstart guide](https://canmod.github.io/macpan2/articles/quickstart) and the article on the [model library](https://canmod.github.io/macpan2/articles/example_models).
## Hello World
We'll do the first thing you should always do when trying out a new fitting procedure: simulate clean, nice data from the model and see if you can recover something close to the true parameters.
### Step 0: set up simulator and generate 'data'
We will be using several different versions of the SIR model, all of which can be derived from the SIR specification in the model library.
```{r sir_spec}
sir_spec = mp_tmb_library("starter_models"
, "sir"
, package = "macpan2"
)
print(sir_spec)
```
From this specification we derive our first version of the model, which we use to generate synthetic data to see if optimization can recover the parameters that we use when simulating.
```{r sir_setup}
sir_simulator = mp_simulator(sir_spec
, time_steps = 100
, outputs = c("S", "I", "R")
, default = list(N = 300, R = 100, beta = 0.25, gamma = 0.1)
)
sir_results = mp_trajectory(sir_simulator)
(sir_results
|> ggplot(aes(time, value, colour = matrix))
+ geom_line()
)
```
Note that we changed the default values so that we can try to recover them
using optimization below.
To make things a little more challenging we add some Poisson noise to the prevalence (`I`) value:
```{r sir_noise}
set.seed(101)
sir_prevalence = (sir_results
|> dplyr::select(-c(row, col))
|> filter(matrix == "I")
|> rename(true_value = value)
|> mutate(value = rpois(n(), true_value))
)
plot_truth <- ggplot(sir_prevalence, aes(time)) +
geom_point(aes(y = value)) +
geom_line(aes(y = true_value))
print(plot_truth)
```
### Step 1: add calibration information
The next step is to produce an object that can be calibrated through optimization. To make this model we need to specify what trajectory we will fit to (`I` in this case). We also need to specify what parameters we will fit. Any value in the `default` list of a model spec can be selected for fitting. Note that here we only change the default value of `N`, and leave the other parameters where they were in the model spec. It is this difference between the defaults in the simulator versus the calibrator that will will hope to recover using optimization.
```{r calibrator}
sir_calibrator = mp_tmb_calibrator(sir_spec
, data = sir_prevalence
, traj = "I"
, par = c("beta", "R")
, default = list(N = 300)
)
print(sir_calibrator)
```
Note that the calibrator has a few new expressions that deal with comparisons with data. In particular it is the objective function that we will optimize. But before that we can do a sanity check to make sure that the default values give a reasonable-looking trajectory.
```{r plot1}
(sir_calibrator
|> mp_trajectory()
|> ggplot(aes(time, value))
+ geom_line()
)
```
### Step 2: do the fit
Doing the fit is straightforward.
```{r sir_fit, results = "hide"}
mp_optimize(sir_calibrator)
```
Note that the `mp_optimize` function has modified the `sir_calibrator` object, which now contains the new fitted parameter values and the results of the optimization.
### Step 3: check the fit
We can print the results of the optimizer (`nlminb` in this case) using the `mp_optimizer_output` function. **Always check the value of the convergence code** (if it's not 0, then something *may* have gone wrong ...).
```{r check_fit}
mp_optimizer_output(sir_calibrator)
mp_optimize(sir_calibrator)
mp_optimizer_output(sir_calibrator, what="all")
```
As mentioned above, the best-fit parameters are stored internally, and we can get information about
them using the `mp_tmb_coef` function. (Note that if you get a message about the `broom.mixed` package, please install it. `mp_tmb_coef` is a wrapper for this function).
```{r params}
sir_estimates = mp_tmb_coef(sir_calibrator, conf.int = TRUE)
print(sir_estimates)
```
These correspond pretty well to the known true values of the simulation model.
```{r default_reminder}
mp_default(sir_simulator) |> filter(matrix %in% sir_estimates$mat)
```
And the known simulated true value of the trajectory (black line) does in
fact fall within the 95% confidence region (red ribbon).
```{r plot_results}
sim_vals <- (sir_calibrator
|> mp_trajectory_sd(conf.int = TRUE)
|> filter(matrix == "I")
)
(plot_truth
+ geom_line(data = sim_vals
, aes(y = value)
, colour = "red"
)
+ geom_ribbon(data = sim_vals
, aes(ymin = conf.low, ymax = conf.high)
, fill = "red"
, alpha = 0.2
)
)
```
## Statistical Model
Above we were not specific about the statistical model used to fit the data. Here we describe it.
Let the observed and simulated trajectories be vectors $I_\textrm{obs}$ and $I_\textrm{sim}$. The $I$ symbol is chosen because we fitted to prevalence above, but it could be any trajectory in the model. For example, `traj = "infection"` would have fitted to incidence, because the `infection` variable in the model is the number of new cases at every time step.
The simulated trajectories are actually a function of the vector, $\mathbf b$, of default values that we chose to make statistical parameters. Therefore, we write the simulated trajectory as a function, $I_\textrm{sim}(\mathbf b)$.
We assume that the observed trajectory is Poisson distributed with mean given by the simulated trajectory.
$$
I_{\textrm{obs}} \sim \textrm{Poisson}(I_\textrm{sim}(\mathbf b))
$$
Given these assumptions we choose $\mathbf b$ to maximize the resulting likelihood function, and use functionality from the `TMB` package (and sometimes the `tmbstan`/`rstan` packages) to do statistical inference on the fitted parameters and trajectories.
We recognize that this statistical model will often be overly restrictive. The `macpan2` package has a developer interface that is much more flexible, allowing for more detailed control over `TMB`, `tmbstan`, and `rstan`. This interface allows for arbitrary likelihood functions, prior distributions, parameter transformations, flexible parameter time-variation models, random effects and more. See [here](https://canmod.github.io/macpan2/articles/calibration_advanced.html) and [here](https://canmod.github.io/macpan2/articles/time_varying_parameters.html) for more information, although because these guides describe a developer interface the instructions may be unclear to some or many readers. Our plan is to continue adding interface layers, such as the interface described in this vignette, so that more of `macpan2` can be exposed to users.