-
Notifications
You must be signed in to change notification settings - Fork 2
/
index.qmd
249 lines (186 loc) · 10.3 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
---
toc-depth: 2
nocite: |
@*
---
# Preface {.unnumbered}
```{r}
#| echo: false
source("_common.R")
```
::: {.callout-caution}
This book is still under construction and may not yet be fully self-contained or reproducible. But it hopefully will be!
:::
This book describes some of the functionality of the
`{epiprocess}` and `{epipredict}` R packages, with an eye toward creating various types of signal processing and forecast creation for epidemiological data. The goal is to be able to load, inspect, process, and forecast
--- using simple baselines to more elaborate customizations.
## Installation {#sec-installation}
The following commands install the latest versions of the packages we use in this book:
```{r, eval = FALSE}
# install.packages("pak")
# Install our packages from GitHub:
pak::pkg_install("cmu-delphi/epidatr")
pak::pkg_install("cmu-delphi/epiprocess")
pak::pkg_install("cmu-delphi/epipredict")
pak::pkg_install("cmu-delphi/epidatasets")
# Other model-fitting packages we use in this book (via epipredict):
pak::pkg_install("poissonreg")
pak::pkg_install("ranger")
pak::pkg_install("xgboost")
# Other data processing, model evaluation, example data, and other packages we
# use in this book:
pak::pkg_install("RcppRoll")
pak::pkg_install("tidyverse")
pak::pkg_install("tidymodels")
pak::pkg_install("broom")
pak::pkg_install("performance")
pak::pkg_install("modeldata")
pak::pkg_install("see")
pak::pkg_install("sessioninfo")
```
Much of the data used for illustration can be loaded directly from [Delphi's Epidata API](https://cmu-delphi.github.io/delphi-epidata/) which is built and maintained by the Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). We have tried to provide most of the data used in these examples in a separate package, `{epidatasets}`, but it can also be accessed using `{epidatr}`, an R interface to the API and the successor to [`{covidcast}`](https://cmu-delphi.github.io/covidcast/covidcastR/). These are also available from GitHub:
```{r}
#| eval: false
pak::pkg_install("cmu-delphi/epidatasets")
pak::pkg_install("cmu-delphi/epidatr")
```
<details> <summary> Encountering installation issues? Click here to show some potential solutions. </summary>
### Linux installation issues: compilation errors or slowness
If you are using Linux and encounter any compilation errors above, or if
compilation is taking very long, you might try using the RStudio (now called
Posit) Package Manager to install binaries. You can try running this command
```{r, eval = FALSE}
options(
repos = c(
# contains binaries for Linux:
RSPM = "https://packagemanager.rstudio.com/all/latest",
# backup CRAN mirror of your choice:
CRAN = "https://cran.rstudio.com/"
)
)
```
### Reproducibility
The above commands will give you the current versions of the packages used in
this book. If you're having trouble reproducing some of the results, it may be
due to package updates that took place after the book was last updated. To match
the versions we used to generate this book, you can use the steps below.
#### First: set up and store a GitHub PAT
If you don't already have a GitHub PAT, you can use the following helper functions to create one:
```{r}
# Run this once:
install.packages("usethis")
usethis::create_github_token(
scopes = "public_repo",
description = "For public repo access"
)
```
This will open a web browser window allowing you to describe and customize
settings of the PAT. Scroll to the bottom and click "Generate
token". You'll see a screen that has `ghp_<lots of letters and numbers>` with a green background; you can click the two-squares ("copy") icon to copy this `ghp_......` string to the clipboard.
#### Either A: Download and use the `renv.lock`
```{r, eval = FALSE}
# Run this once:
install.packages(c("renv", "gitcreds"))
download.file("https://raw.githubusercontent.com/cmu-delphi/delphi-tooling-book/main/renv.lock", "delphi-tooling-book.renv.lock")
# Run this in a fresh session each time you'd like to use this set of versions.
# Warning: don't save your GitHub PAT in a file you might share with others;
# look into `gitcreds::gitcreds_set()` or `usethis::edit_r_environ()` instead.
Sys.setenv("GITHUB_PAT" = "ghp_............")
renv::use(lockfile = "delphi-tooling-book.renv.lock")
# If you get 401 errors, you may need to regenerate your GitHub PAT or check if
# `gitcreds::gitcreds_get()` is detecting an old PAT you have saved somewhere.
```
#### Or B: Download the book and use its `.Rprofile`
1. Download the book [here](https://github.com/cmu-delphi/delphi-tooling-book/archive/refs/heads/main.zip) and unzip it.
2. One-time setup: launch R inside the delphi-tooling-book directory (to use its
`.Rprofile` file) and run
```{r, eval = FALSE}
# Warning: don't save your GitHub PAT in a file you might share with others;
# look into `gitcreds::gitcreds_set()` or `usethis::edit_r_environ()` instead.
Sys.setenv("GITHUB_PAT" = "ghp_............")
renv::restore() # downloads the appropriate package versions
```
3. To use this set of versions: launch R inside the delphi-tooling-book directory.
### Other issues
Please let us know! You can file an issue with the book [here](https://github.com/cmu-delphi/delphi-tooling-book/issues), or with one of the individual packages at their own issue pages: [epidatr](https://github.com/cmu-delphi/epidatr/issues), [epiprocess](https://github.com/cmu-delphi/epiprocess/issues), [epipredict](https://github.com/cmu-delphi/epipredict/issues).
</details>
## Documentation
You can view the complete documentation for these packages at
* <https://cmu-delphi.github.io/epipredict>,
* <https://cmu-delphi.github.io/epiprocess>,
* <https://cmu-delphi.github.io/epidatasets>,
* <https://cmu-delphi.github.io/epidatr>.
## Attribution
This document contains a number of datasets that are a modified part of the [COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19) as [republished in the COVIDcast Epidata API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html). These data are licensed under the terms of the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/) by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering. Copyright Johns Hopkins University 2020.
[From the COVIDcast Epidata API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html):
These signals are taken directly from the JHU CSSE [COVID-19 GitHub repository](https://github.com/CSSEGISandData/COVID-19) without changes.
## Quick-start example
These packages come with some built-in historical data for illustration, but
up-to-date versions could be downloaded with the
[`{epidatr}`](https://cmu-delphi.github.io/epidatr) or
[`{covidcast}`](https://cmu-delphi.github.io/covidcast/covidcastR/index.html)
packages and processed using
[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/).[^index1]
[^index1]: COVIDcast data and other epidemiological signals for non-Covid related illnesses are available with [`{epidatr}`](https://cmu-delphi.github.io/epidatr), which interfaces directly to Delphi's [Epidata API](https://cmu-delphi.github.io/delphi-epidata/).
```{r epidf, message=FALSE}
library(epipredict)
jhu <- case_death_rate_subset
jhu
```
To create and train a simple auto-regressive forecaster to predict the death rate two weeks into the future using past (lagged) deaths and cases, we could use the following function.
```{r make-forecasts, warning=FALSE}
two_week_ahead <- arx_forecaster(
jhu,
outcome = "death_rate",
predictors = c("case_rate", "death_rate"),
args_list = arx_args_list(
lags = list(case_rate = c(0, 1, 2, 3, 7, 14), death_rate = c(0, 7, 14)),
ahead = 14
)
)
```
In this case, we have used a number of different lags for the case rate, while only using 3 weekly lags for the death rate (as predictors). The result is both a fitted model object which could be used any time in the future to create different forecasts, as well as a set of predicted values (and prediction intervals) for each location 14 days after the last available time value in the data.
```{r print-model}
two_week_ahead$epi_workflow
```
The fitted model here involved preprocessing the data to appropriately generate lagged predictors, estimating a linear model with `stats::lm()` and then postprocessing the results to be meaningful for epidemiological tasks. We can also examine the predictions.
```{r show-preds}
two_week_ahead$predictions
```
The results above show a distributional forecast produced using data through the end of 2021 for the 14th of January 2022. A prediction for the death rate per 100K inhabitants is available for every state (`geo_value`) along with a 90% predictive interval. The figure below
displays the forecast for a small handful of states. The vertical black line is the forecast date. The forecast doesn't appear to be particularly good, but our choices above were intended to be illustrative of the functionality rather than optimized for accuracy.
```{r}
#| code-fold: true
samp_geos <- c("ca", "co", "ny", "pa")
hist <- jhu %>%
filter(geo_value %in% samp_geos,
time_value >= max(time_value) - 90L)
preds <- two_week_ahead$predictions %>%
filter(geo_value %in% samp_geos) %>%
mutate(q = nested_quantiles(.pred_distn)) %>%
unnest(q) %>%
pivot_wider(names_from = tau, values_from = q)
ggplot(hist, aes(color = geo_value)) +
geom_line(aes(time_value, death_rate)) +
theme_bw() +
geom_errorbar(data = preds, aes(x = target_date, ymin = `0.05`, ymax = `0.95`)) +
geom_point(data = preds, aes(target_date, .pred)) +
geom_vline(data = preds, aes(xintercept = forecast_date)) +
scale_colour_viridis_d(name = "") +
scale_x_date(date_labels = "%b %Y") +
theme(legend.position = "bottom") +
labs(x = "", y = "Incident deaths per 100K\n inhabitants")
```
## Contents
The remainder of this book examines this software in more detail, illustrating some of the flexibility that is available.
---
<details> <summary> Session Information. </summary>
See also @sec-installation.
```{r}
sessioninfo::session_info()
```
</details>
```{r include=FALSE}
# automatically create a bib database for R packages
knitr::write_bib(c(.packages()), 'packages.bib')
```