30 changes: 25 additions & 5 deletions config.toml
@@ -38,29 +38,49 @@ relativeURLs = false
weight = 2
[[menu.main]]
parent = "covidcast"
-name = "Map"
+name = "Map Overview"
url = "/covidcast"
weight = 1
+[[menu.main]]
+parent = "covidcast"
+name = "Timelapse"
+url = "/covidcast/timelapse"
+weight = 2
+[[menu.main]]
+parent = "covidcast"
+name = "Top 10"
+url = "/covidcast/top10"
+weight = 3
+[[menu.main]]
+parent = "covidcast"
+name = "Single Region"
+url = "/covidcast/single"
+weight = 4
[[menu.main]]
parent = "covidcast"
name = "Surveys"
url = "/covidcast/surveys"
-weight = 2
+weight = 5
[[menu.main]]
parent = "covidcast"
name = "Survey Results"
url = "/covidcast/survey-results"
-weight = 3
+weight = 6
+[[menu.main]]
+parent = "covidcast"
+name = "Export Data"
+url = "/covidcast/export"
+weight = 7
[[menu.main]]
parent = "covidcast"
name = "Release Log"
url = "/covidcast/release-log"
-weight = 4
+weight = 8
[[menu.main]]
parent = "covidcast"
name = "Terms Of Use"
url = "/covidcast/terms-of-use"
-weight = 5
+weight = 9
[[menu.main]]
identifier = "flu"
name = "Flu and Other Diseases"
39 changes: 18 additions & 21 deletions content/blog/2020-09-21-forecast-demo.Rmd
@@ -111,20 +111,16 @@ We evaluate the following four models:

$$
\begin{aligned}
-&\text{Cases:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&\text{Cases + Facebook:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&\text{Cases + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&\text{Cases + Facebook + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
\sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
\end{aligned}
@@ -134,14 +130,15 @@ Here $d=7$ or $d=14$, depending on the target value
(number of days we predict ahead),
and $h$ is a transformation to be specified later.

-Informally, the first model bases its predictions of future case rates
-on the following three features:
+Informally, the first model, which we'll call the "Cases" model,
+bases its predictions of future case rates on the following three features:
current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, "Cases + Facebook", additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, "Cases + Google", is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, "Cases + Facebook + Google",
+uses both Facebook and Google signals.
For each model, in order to make a forecast at time $t_0$
(to predict case rates at time $t_0+d$),
we fit a linear model using least absolute deviations (LAD) regression,
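
A minimal sketch of one such fit, using the quantile_lasso() function from the
quantgen package that appears in the forecasting code below (with tau = 0.5 and
lambda = 0, the fit reduces to plain LAD regression); here x_tr, y_tr, and x_te
are stand-ins for the lagged training features, the d-days-ahead training
response, and the test features, all on the transformed scale:

# Sketch: fit the "Cases" model by LAD regression, then forecast on a test set
library(quantgen)  # https://github.com/ryantibs/quantgen
ok = complete.cases(x_tr, y_tr)  # drop rows with missing lags or leads
obj = quantile_lasso(as.matrix(x_tr[ok, ]), y_tr[ok], tau = 0.5, lambda = 0)
y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))  # point forecasts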
@@ -293,8 +290,8 @@ is much bigger but still below 0.01.
test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
(for paired data, as we have here) is more popular,
because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
(in our case, the difference in scaled errors),
whereas the sign test does not, and thus we show results from the latter.
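
As a small illustration (not code from the post), both tests can be run in
base R; diffs below is a hypothetical vector of paired differences in scaled
errors between two forecasters:

# Sign test: among nonzero differences, test whether the share of positive
# ones departs from 1/2, via an exact binomial test
n_pos = sum(diffs > 0)
n_nonzero = sum(diffs != 0)
binom.test(n_pos, n_nonzero, p = 0.5)

# Wilcoxon signed-rank test: typically more powerful, but assumes the
# distribution of the differences is symmetric around zero
wilcox.test(diffs, mu = 0)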

123 changes: 60 additions & 63 deletions content/blog/2020-09-21-forecast-demo.html
@@ -14,14 +14,14 @@
summary: |
Building on our previous two posts (on our COVID-19 symptom surveys through
Facebook and Google)
this post offers a deeper dive into empirical analysis, examining whether the
% CLI-in-community indicators from our two surveys can be used to improve
the accuracy of short-term forecasts of county-level COVID-19 case rates.
acknowledgements: |
Delphi's forecasting effort involves many people from our
modeling team, from forecaster design, to implementation, to evaluation. The
broader insights on forecasting shared in this post certainly cannot be
attributable to Ryan's work alone, and are a reflection of the work carried out
by all these team members.
related:
- 2020-09-18-google-survey
@@ -120,35 +120,32 @@ <h2>Problem Setup</h2>
We evaluate the following four models:</p>
<p><span class="math display">\[
\begin{aligned}
-&amp;\text{Cases:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&amp;\text{Cases + Facebook:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&amp;\text{Cases + Google:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&amp;\text{Cases + Facebook + Google:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
\sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
\end{aligned}
\]</span></p>
<p>Here <span class="math inline">\(d=7\)</span> or <span class="math inline">\(d=14\)</span>, depending on the target value
(number of days we predict ahead),
and <span class="math inline">\(h\)</span> is a transformation to be specified later.</p>
-<p>Informally, the first model bases its predictions of future case rates
-on the following three features:
+<p>Informally, the first model, which we’ll call the “Cases” model,
+bases its predictions of future case rates on the following three features:
current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, “Cases + Facebook”, additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, “Cases + Google”, is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, “Cases + Facebook + Google”,
+uses both Facebook and Google signals.
For each model, in order to make a forecast at time <span class="math inline">\(t_0\)</span>
(to predict case rates at time <span class="math inline">\(t_0+d\)</span>),
we fit a linear model using least absolute deviations (LAD) regression,
@@ -217,17 +214,17 @@ <h2>Forecasting Code</h2>
as.Date(max(time_value)),
by = &quot;day&quot;)) %&gt;% ungroup()
df = full_join(df, df_all, by = c(&quot;geo_value&quot;, &quot;time_value&quot;))

# Group by geo value, sort rows by increasing time
df = df %&gt;% group_by(geo_value) %&gt;% arrange(time_value)

# Loop over shifts, and add lag value or lead value
for (shift in shifts) {
fun = ifelse(shift &lt; 0, lag, lead)
varname = sprintf(&quot;value%+d&quot;, shift)
df = mutate(df, !!varname := fun(value, n = abs(shift)))
}

# Ungroup and return
return(ungroup(df))
}
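
A quick usage sketch of this function (toy data assumed; the signature
append_shifts(df, shifts) is inferred from the calls further below): negative
shifts append lagged copies of value, positive shifts append leads, with
column names generated by sprintf("value%+d", shift), e.g. "value-7" and
"value+7".

# Toy example: one county, two weeks of daily values
library(dplyr)
toy = data.frame(geo_value = "42003",
                 time_value = as.Date("2020-05-01") + 0:13,
                 value = 1:14)
toy = append_shifts(toy, shifts = c(-7, 7))
head(toy)  # gains a "value-7" lag column and a "value+7" lead column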
@@ -261,40 +258,40 @@ <h2>Forecasting Code</h2>
case_num = 200
geo_values = covidcast_signal(&quot;jhu-csse&quot;, &quot;confirmed_cumulative_num&quot;,
&quot;2020-05-14&quot;, &quot;2020-05-14&quot;) %&gt;%
filter(value &gt;= case_num) %&gt;% pull(geo_value)

# Fetch county-level Google and Facebook % CLI-in-community signals, and JHU
# confirmed case incidence proportion
start_day = &quot;2020-04-11&quot;
end_day = &quot;2020-09-01&quot;
g = covidcast_signal(&quot;google-survey&quot;, &quot;smoothed_cli&quot;) %&gt;%
filter(geo_value %in% geo_values) %&gt;%
select(geo_value, time_value, value)
f = covidcast_signal(&quot;fb-survey&quot;, &quot;smoothed_hh_cmnty_cli&quot;,
start_day, end_day) %&gt;%
filter(geo_value %in% geo_values) %&gt;%
select(geo_value, time_value, value)
c = covidcast_signal(&quot;jhu-csse&quot;, &quot;confirmed_7dav_incidence_prop&quot;,
start_day, end_day) %&gt;%
filter(geo_value %in% geo_values) %&gt;%
select(geo_value, time_value, value)

# Find &quot;complete&quot; counties, present in all three data signals at all times
geo_values_complete = intersect(intersect(g$geo_value, f$geo_value),
c$geo_value)

# Filter to complete counties, transform the signals, append 1-2 week lags to
# all three, and also 1-2 week leads to case rates
lags = 1:2 * -7
leads = 1:2 * 7
g = g %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
mutate(value = trans(value * rescale_g)) %&gt;%
append_shifts(shifts = lags)
f = f %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
mutate(value = trans(value * rescale_f)) %&gt;%
append_shifts(shifts = lags)
c = c %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
mutate(value = trans(value * rescale_c)) %&gt;%
append_shifts(shifts = c(lags, leads))

# Rename columns
@@ -310,55 +307,55 @@ <h2>Forecasting Code</h2>

# Use quantgen for LAD regression (this package supports quantile regression and
# more; you can find it on GitHub here: https://github.com/ryantibs/quantgen)
library(quantgen)

res_list = vector(&quot;list&quot;, length = length(leads))

# Loop over lead, forecast dates, build models and record errors (warning: this
# computation takes a while)
for (i in 1:length(leads)) {
lead = leads[i]; if (verbose) cat(&quot;***&quot;, lead, &quot;***\n&quot;)

# Create a data frame to store our forecast results. Code below populates its
# rows in a way that breaks from typical dplyr operations, done for efficiency
res_list[[i]] = z %&gt;%
filter(between(time_value, as.Date(start_day) - min(lags) + lead,
as.Date(end_day) - lead)) %&gt;%
select(geo_value, time_value) %&gt;%
mutate(err0 = as.double(NA), err1 = as.double(NA), err2 = as.double(NA),
err3 = as.double(NA), err4 = as.double(NA), lead = lead)
valid_dates = unique(res_list[[i]]$time_value)

for (k in 1:length(valid_dates)) {
date = valid_dates[k]; if (verbose) cat(format(date), &quot;... &quot;)

# Filter down to training set and test set
z_tr = z %&gt;% filter(between(time_value, date - lead - n, date - lead))
z_te = z %&gt;% filter(time_value == date)
inds = which(res_list[[i]]$time_value == date)

# Create training and test responses
y_tr = z_tr %&gt;% pull(paste0(&quot;case+&quot;, lead))
y_te = z_te %&gt;% pull(paste0(&quot;case+&quot;, lead))

# Strawman model
if (verbose) cat(&quot;0&quot;)
y_hat = z_te %&gt;% pull(case)
res_list[[i]][inds,]$err0 = abs(inv_trans(y_hat) - inv_trans(y_te))

# Cases only model
if (verbose) cat(&quot;1&quot;)
x_tr_case = z_tr %&gt;% select(starts_with(&quot;case&quot;) &amp; !contains(&quot;+&quot;))
x_te_case = z_te %&gt;% select(starts_with(&quot;case&quot;) &amp; !contains(&quot;+&quot;))
x_tr = x_tr_case; x_te = x_te_case # For symmetry wrt what follows
ok = complete.cases(x_tr, y_tr)
if (sum(ok) &gt; 0) {
obj = quantile_lasso(as.matrix(x_tr[ok,]), y_tr[ok], tau = 0.5,
lambda = 0, lp_solver = lp_solver)
y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))
res_list[[i]][inds,]$err1 = abs(inv_trans(y_hat) - inv_trans(y_te))
}

# Cases and Facebook model
if (verbose) cat(&quot;2&quot;)
x_tr_fb = z_tr %&gt;% select(starts_with(&quot;fb&quot;))
@@ -386,7 +383,7 @@ <h2>Forecasting Code</h2>
y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))
res_list[[i]][inds,]$err3 = abs(inv_trans(y_hat) - inv_trans(y_te))
}

# Cases, Facebook, and Google model
if (verbose) cat(&quot;4\n&quot;)
x_tr = cbind(x_tr_case, x_tr_fb, x_tr_goog)
@@ -401,7 +398,7 @@
}
}

# Bind results over different leads into one big data frame, and save
res = do.call(rbind, res_list)
save(list = ls(), file = &quot;demo.rda&quot;)</code></pre>
</div>
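
As a sketch of how the saved results might then be summarized (an illustration
only, not code from the post: it assumes the res data frame built by the loop
above, with the strawman's absolute errors in err0 and the four models' errors
in err1 through err4):

# Hypothetical post-processing: median of each model's absolute error scaled
# by the strawman's, broken down by forecast horizon (7 or 14 days ahead)
load("demo.rda")  # restores res, among other objects
library(dplyr)
res %>% group_by(lead) %>%
  summarize(cases         = median(err1 / err0, na.rm = TRUE),
            cases_fb      = median(err2 / err0, na.rm = TRUE),
            cases_goog    = median(err3 / err0, na.rm = TRUE),
            cases_fb_goog = median(err4 / err0, na.rm = TRUE))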
@@ -1036,8 +1033,8 @@ <h2>Wrap-Up</h2>
test</a>
(for paired data, as we have here) is more popular,
because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
(in our case, the difference in scaled errors),
whereas the sign test does not, and thus we show results from the latter.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
<li id="fn3"><p>Delphi’s “production” forecasters are still based on relatively simple
6 changes: 5 additions & 1 deletion content/covidcast/_index.md
@@ -1,6 +1,10 @@
---
title: COVIDcast
-layout: covidcast_app
description: COVIDcast tracks and forecasts the spread of COVID-19. By Carnegie Mellon's Delphi Research Group.
+layout: covidcast_app
+app_mode: overview
+order: 1
+modeTitle: Map Overview
+icon: solid/map
heroImage: /images/landing-page/hero-images/covidcast_withfill.jpg
---
3 changes: 3 additions & 0 deletions content/covidcast/export.md
@@ -3,5 +3,8 @@ title: COVIDCast Export Data
linkTitle: Export Data
description: Use COVIDcast data in your own analysis
layout: covidcast_app
+app_mode: export
+order: 6
+icon: solid/download
heroImage: /images/landing-page/hero-images/covidcast_withfill.jpg
---