/
2020-09-20_2020-10-25.Rmd
278 lines (240 loc) · 15.3 KB
/
2020-09-20_2020-10-25.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
---
title: "Orange County, CA COVID Situation Report Sep 20, 2020 - Oct 25, 2020"
site: workflowr::wflow_site
output:
workflowr::wflow_html:
toc: false
editor_options:
chunk_output_type: console
---
```{r setup, include=FALSE}
results_folder <- "code/results/2020-09-20_2020-10-25"
knitr::opts_chunk$set(echo = F, fig.align = "center", fig.width=16, message=F)
knitr::opts_chunk$set(autodep = TRUE)
library(lubridate)
library(fs)
library(tidyverse)
library(tidybayes)
library(scales)
library(glue)
library(stemr)
library(cowplot)
library(coda)
theme_set(theme_bw(base_size = 22))
source("code/stemr_functions.R")
first_day <- ymd(str_sub(results_folder, start = -21, end = -12))
last_day <- ymd(str_sub(results_folder, start = -10))
multi_chain_stem_fit <- read_rds(path(results_folder, "original", ext = "rds"))
popsize <- multi_chain_stem_fit$stem_fit_list[[1]]$dynamics$popsize
forecast_obj <- read_rds(path(results_folder, "forecast", ext = "rds"))
last_forecast_day <- tail(forecast_obj$data$end_date, 1)
oc_data <- read_csv("data/oc_data.csv")
ci_width <- c(0.5, 0.8, 0.95)
ci_width_string <- str_c(percent(ci_width), collapse = ", ")
prev_models <-
tibble(path = dir_ls("code/results")) %>%
mutate(folder = path_file(path)) %>%
separate(folder, c("start_date", "end_date"), sep = "_") %>%
mutate(across(ends_with("date"), ymd)) %>%
filter(end_date <= last_day) %>%
mutate(multi_chain_stem_fit = map(path(path, "original", ext = "rds"), read_rds))
```
## Orange County, CA COVID-19 Situation Report, `r format(last_day + 15, "%B %e, %Y")`
#### Report period: `r format(first_day, "%b %d")` - `r format(last_day, "%b %d")` (we don't use the most recent data due to reporting delays)
The goal of this report is to inform interested parties about dynamics of SARS-CoV-2 spread in Orange County, CA and to predict epidemic trajectories.
Methodological details are provided below and in the accompanying [manuscript](https://arxiv.org/abs/2009.02654). We are also contributing to [COVID Trends by UC Irvine](https://www.stat.uci.edu/covid19/index.html) project that provides data visualizations of California County trends across time and space.
```{r D_I_P plot}
d_i_p_paths <-
forecast_obj$forecast_results %>%
select(starts_with("."), natural_paths) %>%
mutate(natural_paths = map(natural_paths, as_tibble)) %>%
unnest(natural_paths) %>%
select(-starts_with(".")) %>%
mutate(incidence = popsize - S,
prevalence = E + Ie + Ip) %>%
select(time, D, incidence, prevalence) %>%
pivot_longer(-time) %>%
group_by(time, name) %>%
median_qi(.width = ci_width) %>%
left_join(select(forecast_obj$data, time, date = end_date)) %>%
ungroup() %>%
select(date, everything(), -time)
d_i_p_data <-
oc_data %>%
mutate(D = cumsum(deaths),
incidence = cumsum(cases)) %>%
select(date, D, incidence) %>%
filter(date %in% unique(d_i_p_paths$date)) %>%
pivot_longer(-date)
D_adjustment <- filter(d_i_p_data, name == "D", date == min(date))$value - filter(d_i_p_paths, .width == 0.8, name == "D", date == min(date))$.lower
ggplot(mapping = aes(date, value)) +
facet_wrap(. ~ name, scales = "free_y", strip.position = "left", labeller = labeller(name = c("D" = "Cumulative Deaths", "incidence" = "Cumulative Incidence", "prevalence" = "Prevalence"))) +
geom_lineribbon(
data = d_i_p_paths %>%
mutate(value = value + (name == "D") * D_adjustment,
.lower = .lower + (name == "D") * D_adjustment,
.upper = .upper + (name == "D") * D_adjustment) %>%
filter(name %in% c("prevalence", "incidence")),
mapping = aes(ymin = .lower, ymax = .upper), size = 1.5,
color = "steelblue4"
) +
geom_col(
data = filter(d_i_p_data, name %in% c("prevalence", "incidence")),
fill = "black"
) +
ggtitle(glue("Latent & observed trajectories, posterior median & {ci_width_string} credible intervals")) +
scale_fill_brewer(name = "Credibility level", labels = str_c(percent(rev(ci_width))), guide = guide_legend(title.position = "top", direction = "horizontal")) +
scale_y_continuous(name = NULL, labels = comma) +
scale_x_date(name = "Date", breaks = c("2 week"), date_labels = "%b\n%e") +
theme(
strip.background = element_blank(),
strip.placement = "outside",
strip.text = element_text(size = 22),
legend.position = c(0.1, 0.9), legend.title = element_text(size = 15.5), legend.text = element_text(size = 15.5), legend.background = element_rect(fill = "transparent")
)
```
```{r}
ifr_R0_samples <-
prev_models %>%
mutate(model_number = rev(row_number())) %>%
group_by(month = month(start_date)) %>%
mutate(model_within_month = row_number()) %>%
ungroup() %>%
filter((model_number %in% 1:3) | model_within_month == 1) %>% # Show last 3 models + 1 from each prior month
mutate(parameter_samples_human = map(multi_chain_stem_fit, ~extract_stem_parameter_posterior(., transform = to_human_scale))) %>%
mutate(parameter_samples_human = map(parameter_samples_human, tidy_draws)) %>%
select(ends_with("date"), parameter_samples_human) %>%
unnest(parameter_samples_human) %>%
select(ends_with("date"), starts_with("."), R0, ifr) %>%
mutate(label = fct_inorder(glue("{format(start_date, '%b %e')} - {format(end_date, '%b %e')}")))
# ifr_R0_samples %>%
# ggplot(aes(label, ifr)) +
# stat_eye(.width = ci_width, fill="lightskyblue1", slab_color ="lightskyblue4", color="dodgerblue4", slab_size = 0.5) +
# ggtitle("Historical Estimates of Infection-to-Fatality Ratio (IFR)") +
# scale_y_continuous(name = "IFR", label = percent) +
# xlab("Model Fit Period")
#
# ifr_R0_samples %>%
# ggplot(aes(label, R0)) +
# stat_eye(.width = ci_width) +
# stat_eye(.width = ci_width, fill="lightskyblue1", slab_color ="lightskyblue4", color="dodgerblue4", slab_size = 0.5) +
# geom_hline(yintercept = 1, linetype = "dashed") +
# ggtitle(expression(paste('Historical estimates of the basic reproductive number (', R[0], ')'))) +
# scale_y_continuous(name = expression(R[0])) +
# xlab("Model Fit Period")
ifr_R0_intervals <-
ifr_R0_samples %>%
select(date = end_date, R0, ifr) %>%
pivot_longer(-date) %>%
group_by(date, name) %>%
median_qi(.width = ci_width)
ifr_R0_intervals %>%
filter(name == "ifr") %>%
ggplot(aes(date, value, ymin = .lower, ymax = .upper)) +
geom_lineribbon(color = "steelblue4") +
ggtitle("Historical Estimates of Infection-to-Fatality Ratio (IFR)") +
scale_fill_brewer(name = "Credibility level", labels = str_c(percent(rev(ci_width))), guide = guide_legend(title.position = "top", direction = "horizontal")) +
scale_y_continuous(name = "IFR", label = percent) +
scale_x_date(name = "Date", date_breaks = "1 month", date_labels = "%b") +
theme(legend.position = c(0.8, 0.8), legend.title = element_text(size = 15.5), legend.text = element_text(size = 15.5), legend.background = element_rect(fill = "transparent"))
ifr_R0_intervals %>%
filter(name == "R0") %>%
ggplot(aes(date, value, ymin = .lower, ymax = .upper)) +
geom_lineribbon(color = "steelblue4") +
ggtitle(expression(paste('Historical estimates of the basic reproductive number (', R[0], ')'))) +
scale_fill_brewer(name = "Credibility level", labels = str_c(percent(rev(ci_width))), guide = guide_legend(title.position = "top", direction = "horizontal")) +
scale_y_continuous(name = expression(R[0])) +
scale_x_date(name = "Date", date_breaks = "1 month", date_labels = "%b") +
theme(legend.position = c(0.2, 0.8), legend.title = element_text(size = 15.5), legend.text = element_text(size = 15.5), legend.background = element_rect(fill = "transparent"))
```
```{r}
Reff_intervals <-
prev_models %>%
mutate(epi_curves = map(multi_chain_stem_fit, extract_epi_curves)) %>%
select(start_date, end_date, epi_curves) %>%
unnest(epi_curves) %>%
mutate(prop_S = S / popsize) %>%
select(ends_with("date"), starts_with("."), time, prop_S) %>%
left_join(select(ifr_R0_samples, -ifr, -label)) %>%
mutate(Reff = R0 * prop_S) %>%
select(ends_with("date"), time, Reff) %>%
group_by(start_date, end_date, time) %>%
median_qi(.width = ci_width) %>%
ungroup() %>%
left_join(prev_models %>%
mutate(time_date_conversion = multi_chain_stem_fit %>% map("data") %>% map(~select(., time, date = end_date))) %>%
select(contains("date")) %>%
unnest(time_date_conversion)) %>%
select(date, Reff, starts_with(".")) %>%
drop_na()
Reff_intervals %>%
drop_na() %>%
ggplot(aes(date, Reff, ymin = .lower, ymax = .upper)) +
geom_lineribbon(color = "steelblue4") +
geom_hline(yintercept = 1, linetype = "dashed") +
scale_fill_brewer(name = "Credibility level", labels = str_c(percent(rev(ci_width))), guide = guide_legend(title.position = "top", direction = "horizontal")) +
ggtitle(expression(paste('Historical estimates of the effective reproductive number (', R[e], ')'))) +
ylab(expression(R[e])) +
scale_x_date(name = "Date", date_breaks = "1 month", date_labels = "%b") +
theme(legend.position = c(0.2, 0.8), legend.title = element_text(size = 15.5), legend.text = element_text(size = 15.5), legend.background = element_rect(fill = "transparent"))
```
```{r summary prep}
reported_cases <- filter(d_i_p_data, name == "incidence", date == max(date))$value
reported_cases_lower_est <- filter(d_i_p_paths, name == "incidence", date == max(d_i_p_data$date), .width == 0.95)$.lower
reported_cases_upper_est <- filter(d_i_p_paths, name == "incidence", date == max(d_i_p_data$date), .width == 0.95)$.upper
future_cases_lower_est <- filter(d_i_p_paths, name == "incidence", date == max(date), .width == 0.95)$.lower
future_cases_upper_est <- filter(d_i_p_paths, name == "incidence", date == max(date), .width == 0.95)$.upper
prevalence_direction <- ifelse(filter(d_i_p_paths, name == "prevalence", date == min(date))$value[1] > filter(d_i_p_paths, name == "prevalence", date == max(date))$value[1], "decreasing", "increasing")
future_prevalence_lower_est <- filter(d_i_p_paths, name == "prevalence", date == max(date), .width == 0.95)$.lower
future_prevalence_upper_est <- filter(d_i_p_paths, name == "prevalence", date == max(date), .width == 0.95)$.upper
ifr_est <- ifr_R0_samples %>% filter(end_date == max(end_date)) %>% pull(ifr) %>% quantile(c(0.025, 0.975))
R0_est <- ifr_R0_samples %>% filter(end_date == max(end_date)) %>% pull(R0) %>% quantile(c(0.025, 0.975))
Reff_est <- Reff_intervals %>% drop_na() %>% filter(date == max(date), .width == 0.95) %>% select(".lower", ".upper") %>% pivot_longer(everything()) %>% deframe()
```
## Summary (statements are made assuming 95% credibility levels)
- The number of reported cases (`r comma(reported_cases)`, shown as black bars in the top-middle plot above) underestimates the actual number of infections by a factor that ranges between `r round(reported_cases_lower_est / reported_cases, 1)` and `r round(reported_cases_upper_est / reported_cases, 1)`.
This means that we estimate that the total number of infections which occurred by `r format(last_day,'%B %d, %Y')` is between `r comma(round(reported_cases_lower_est))` and `r comma(round(reported_cases_upper_est))`.
We estimate that the total number of infections will be between `r comma(round(future_cases_lower_est))` and `r comma(round(future_cases_upper_est))` on `r format(last_forecast_day,'%B %d, %Y')`.
- Prevalence (number of infectious individuals at any time point) is `r prevalence_direction` and projected to be between `r comma(round(future_prevalence_lower_est))` and `r comma(round(future_prevalence_upper_est))` on `r format(last_forecast_day,'%B %d, %Y')`.
- Somewhere between `r percent(ifr_est[["2.5%"]], accuracy = 0.01)` and `r percent(ifr_est[["97.5%"]], accuracy = 0.01)` of all infections (not cases!) result in death.
- Basic reproductive number ($R_0$), defined as the average number of secondary infections one infectious individual produces in a completely susceptible population, is estimated to be between `r round(R0_est[["2.5%"]], 1)` and `r round(R0_est[["97.5%"]], 1)`.
- Effective reproductive number ($R_e$), defined as its basic counterpart above, but allowing for some fraction of the population to be removed (recovered or deceased), is estimated to be between `r round(Reff_est[[".lower"]], 1)` and `r round(Reff_est[[".upper"]], 1)` on `r format(last_day,'%B %d, %Y')`. **We want to keep $R_e < 1$ in order to control virus transmission**.
<hr style="height:3px;border-width:0;color:gray;background-color:gray">
## Abbreviated technical details (optional)
Our approach is based on fitting a mechanistic model of SARS-CoV-2 spread to multiple sources of surveillance data.
A more fleshed out method description is in the [manuscript](https://arxiv.org/abs/2009.02654).
### Model inputs
Our method takes three time series as input: daily new tests, case counts, and deaths. However, we find daily resolution to be too noisy due to delay in testing reports, weekend effect, etc. So we aggregated/binned the three types of counts in 3 day intervals. These aggregated time series are shown below.
```{r model-inputs, cache=TRUE}
multi_chain_stem_fit$data %>%
dplyr::select(-start_date, -time) %>%
pivot_longer(-end_date) %>%
mutate(name = str_to_title(name)) %>%
mutate(name = fct_relevel(name, c("Tests", "Cases", "Deaths"))) %>%
ggplot(aes(end_date, value)) +
geom_line(size=1) +
geom_point(size=3) +
facet_wrap(. ~ name, scales = "free_y") +
xlab("Date") +
ylab("Count") +
scale_y_continuous(labels = comma) +
ggtitle("Orange County, CA data", subtitle = "Counts binned into 3 day periods") +
scale_x_date(breaks=c("10 day"), date_labels = "%b %d", ) +
theme(strip.text = element_text(size = 20))
```
### Model structure
We assume that all individuals in Orange County, CA can be split into 6 compartments: S = susceptible individuals, E = infected, but not yet infectious individuals, $\text{I}_\text{e}$ = individuals at early stages of infection, $\text{I}_\text{p}$ = individuals at progressed stages of infection (assumed 20% less infectious than individuals at the early infection stage), R = recovered individuals, D = individuals who died due to COVID-19. Possible progressions of an individual through the above compartments are depicted in the diagram below.
```{r Model Structure, out.width = "60%"}
knitr::include_graphics("assets/model_figure.svg", error=FALSE)
```
Mathematically, we assume that dynamics of the proportions of individuals in each compartment follow a set of ordinary differential equations corresponding to the above diagram. These equations are controlled by the following parameters:
- Basic reproductive number ($R_0$)
- mean duration of the latent period
- mean duration of the early infection period
- mean duration of the progressed infection period
- probability of transitioning from progressed infection to death, rather than to recovery (i.e., IFR)
We fit this model to data by assuming that case counts are noisy realizations of the actual number of individuals progressing from $\text{I}_\text{e}$ compartment to $\text{I}_\text{p}$ compartment.
Similarly we assume that observed deaths are noisy realizations of the actual number of individuals progressing from $\text{I}_\text{p}$ compartment to $\text{D}$ compartment.
*A priori*, we assume that death counts are significantly less noisy than case counts.
We use a Bayesian estimation framework, which means that all estimated quantities receive credible intervals (e.g., 80% or 95% credible intervals).
Width of these credible intervals encode the amount of uncertainty that we have in the estimated quantities.