30 changes: 25 additions & 5 deletions config.toml
@@ -38,29 +38,49 @@ relativeURLs = false
weight = 2
[[menu.main]]
parent = "covidcast"
-name = "Map"
+name = "Map Overview"
url = "/covidcast"
weight = 1
+[[menu.main]]
+parent = "covidcast"
+name = "Timelapse"
+url = "/covidcast/timelapse"
+weight = 2
+[[menu.main]]
+parent = "covidcast"
+name = "Top 10"
+url = "/covidcast/top10"
+weight = 3
+[[menu.main]]
+parent = "covidcast"
+name = "Single Region"
+url = "/covidcast/single"
+weight = 4
[[menu.main]]
parent = "covidcast"
name = "Surveys"
url = "/covidcast/surveys"
-weight = 2
+weight = 5
[[menu.main]]
parent = "covidcast"
name = "Survey Results"
url = "/covidcast/survey-results"
-weight = 3
+weight = 6
+[[menu.main]]
+parent = "covidcast"
+name = "Export Data"
+url = "/covidcast/export"
+weight = 7
[[menu.main]]
parent = "covidcast"
name = "Release Log"
url = "/covidcast/release-log"
-weight = 4
+weight = 8
[[menu.main]]
parent = "covidcast"
name = "Terms Of Use"
url = "/covidcast/terms-of-use"
-weight = 5
+weight = 9
[[menu.main]]
identifier = "flu"
name = "Flu and Other Diseases"
39 changes: 18 additions & 21 deletions content/blog/2020-09-21-forecast-demo.Rmd
@@ -111,20 +111,16 @@ We evaluate the following four models:

$$
\begin{aligned}
-&\text{Cases:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&\text{Cases + Facebook:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&\text{Cases + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&\text{Cases + Facebook + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
\sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
\end{aligned}
@@ -134,14 +130,15 @@ Here $d=7$ or $d=14$, depending on the target value
(number of days we predict ahead),
and $h$ is a transformation to be specified later.

-Informally, the first model bases its predictions of future case rates
-on the following three features:
+Informally, the first model, which we'll call the "Cases" model,
+bases its predictions of future case rates on the following three features:
current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, "Cases + Facebook", additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, "Cases + Google", is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, "Cases + Facebook + Google",
+uses both Facebook and Google signals.
For each model, in order to make a forecast at time $t_0$
(to predict case rates at time $t_0+d$),
we fit a linear model using least absolute deviations (LAD) regression,
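
A minimal sketch of one such fit, using the quantile_lasso() function from the
quantgen package that appears in the forecasting code below (with tau = 0.5 and
lambda = 0, the fit reduces to plain LAD regression); here x_tr, y_tr, and x_te
are stand-ins for the lagged training features, the d-days-ahead training
response, and the test features, all on the transformed scale:

# Sketch: fit the "Cases" model by LAD regression, then forecast on a test set
library(quantgen)  # https://github.com/ryantibs/quantgen
ok = complete.cases(x_tr, y_tr)  # drop rows with missing lags or leads
obj = quantile_lasso(as.matrix(x_tr[ok, ]), y_tr[ok], tau = 0.5, lambda = 0)
y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))  # point forecasts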
@@ -293,8 +290,8 @@ is much bigger but still below 0.01.
test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
(for paired data, as we have here) is more popular,
because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
(in our case, the difference in scaled errors),
whereas the sign test does not, and thus we show results from the latter.
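
As a small illustration (not code from the post), both tests can be run in
base R; diffs below is a hypothetical vector of paired differences in scaled
errors between two forecasters:

# Sign test: among nonzero differences, test whether the share of positive
# ones departs from 1/2, via an exact binomial test
n_pos = sum(diffs > 0)
n_nonzero = sum(diffs != 0)
binom.test(n_pos, n_nonzero, p = 0.5)

# Wilcoxon signed-rank test: typically more powerful, but assumes the
# distribution of the differences is symmetric around zero
wilcox.test(diffs, mu = 0)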

123 changes: 60 additions & 63 deletions content/blog/2020-09-21-forecast-demo.html
@@ -14,14 +14,14 @@
summary: |
Building on our previous two posts (on our COVID-19 symptom surveys through
Facebook and Google)
this post offers a deeper dive into empirical analysis, examining whether the
% CLI-in-community indicators from our two surveys can be used to improve
the accuracy of short-term forecasts of county-level COVID-19 case rates.
acknowledgements: |
Delphi's forecasting effort involves many people from our
modeling team, from forecaster design, to implementation, to evaluation. The
broader insights on forecasting shared in this post certainly cannot be
attributable to Ryan's work alone, and are a reflection of the work carried out
by all these team members.
related:
- 2020-09-18-google-survey
@@ -120,35 +120,32 @@ <h2>Problem Setup</h2>
We evaluate the following four models:</p>
<p><span class="math display">\[
\begin{aligned}
-&amp;\text{Cases:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&amp;\text{Cases + Facebook:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&amp;\text{Cases + Google:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&amp;\text{Cases + Facebook + Google:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
\sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
\sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
\end{aligned}
\]</span></p>
<p>Here <span class="math inline">\(d=7\)</span> or <span class="math inline">\(d=14\)</span>, depending on the target value
(number of days we predict ahead),
and <span class="math inline">\(h\)</span> is a transformation to be specified later.</p>
-<p>Informally, the first model bases its predictions of future case rates
-on the following three features:
+<p>Informally, the first model, which we’ll call the “Cases” model,
+bases its predictions of future case rates on the following three features:
current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, “Cases + Facebook”, additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, “Cases + Google”, is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, “Cases + Facebook + Google”,
+uses both Facebook and Google signals.
For each model, in order to make a forecast at time <span class="math inline">\(t_0\)</span>
(to predict case rates at time <span class="math inline">\(t_0+d\)</span>),
we fit a linear model using least absolute deviations (LAD) regression,
@@ -217,17 +214,17 @@ <h2>Forecasting Code</h2>
as.Date(max(time_value)),
by = &quot;day&quot;)) %&gt;% ungroup()
df = full_join(df, df_all, by = c(&quot;geo_value&quot;, &quot;time_value&quot;))

# Group by geo value, sort rows by increasing time
df = df %&gt;% group_by(geo_value) %&gt;% arrange(time_value)

# Loop over shifts, and add lag value or lead value
for (shift in shifts) {
fun = ifelse(shift &lt; 0, lag, lead)
varname = sprintf(&quot;value%+d&quot;, shift)
df = mutate(df, !!varname := fun(value, n = abs(shift)))
}

# Ungroup and return
return(ungroup(df))
}
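
A quick usage sketch of this function (toy data assumed; the signature
append_shifts(df, shifts) is inferred from the calls further below): negative
shifts append lagged copies of value, positive shifts append leads, with
column names generated by sprintf("value%+d", shift), e.g. "value-7" and
"value+7".

# Toy example: one county, two weeks of daily values
library(dplyr)
toy = data.frame(geo_value = "42003",
                 time_value = as.Date("2020-05-01") + 0:13,
                 value = 1:14)
toy = append_shifts(toy, shifts = c(-7, 7))
head(toy)  # gains a "value-7" lag column and a "value+7" lead column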
@@ -261,40 +258,40 @@ <h2>Forecasting Code</h2>
case_num = 200
geo_values = covidcast_signal(&quot;jhu-csse&quot;, &quot;confirmed_cumulative_num&quot;,
&quot;2020-05-14&quot;, &quot;2020-05-14&quot;) %&gt;%
filter(value &gt;= case_num) %&gt;% pull(geo_value)

# Fetch county-level Google and Facebook % CLI-in-community signals, and JHU
# confirmed case incidence proportion
start_day = &quot;2020-04-11&quot;
end_day = &quot;2020-09-01&quot;
g = covidcast_signal(&quot;google-survey&quot;, &quot;smoothed_cli&quot;) %&gt;%
filter(geo_value %in% geo_values) %&gt;%
select(geo_value, time_value, value)
f = covidcast_signal(&quot;fb-survey&quot;, &quot;smoothed_hh_cmnty_cli&quot;,
start_day, end_day) %&gt;%
filter(geo_value %in% geo_values) %&gt;%
select(geo_value, time_value, value)
c = covidcast_signal(&quot;jhu-csse&quot;, &quot;confirmed_7dav_incidence_prop&quot;,
start_day, end_day) %&gt;%
filter(geo_value %in% geo_values) %&gt;%
select(geo_value, time_value, value)

# Find &quot;complete&quot; counties, present in all three data signals at all times
geo_values_complete = intersect(intersect(g$geo_value, f$geo_value),
c$geo_value)

# Filter to complete counties, transform the signals, append 1-2 week lags to
# all three, and also 1-2 week leads to case rates
lags = 1:2 * -7
leads = 1:2 * 7
g = g %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
mutate(value = trans(value * rescale_g)) %&gt;%
append_shifts(shifts = lags)
f = f %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
mutate(value = trans(value * rescale_f)) %&gt;%
append_shifts(shifts = lags)
c = c %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
mutate(value = trans(value * rescale_c)) %&gt;%
append_shifts(shifts = c(lags, leads))

# Rename columns
@@ -310,55 +307,55 @@ <h2>Forecasting Code</h2>

# Use quantgen for LAD regression (this package supports quantile regression and
# more; you can find it on GitHub here: https://github.com/ryantibs/quantgen)
library(quantgen)

res_list = vector(&quot;list&quot;, length = length(leads))

# Loop over lead, forecast dates, build models and record errors (warning: this
# computation takes a while)
for (i in 1:length(leads)) {
lead = leads[i]; if (verbose) cat(&quot;***&quot;, lead, &quot;***\n&quot;)

# Create a data frame to store our forecast results. Code below populates its
# rows in a way that breaks from typical dplyr operations, done for efficiency
res_list[[i]] = z %&gt;%
filter(between(time_value, as.Date(start_day) - min(lags) + lead,
as.Date(end_day) - lead)) %&gt;%
select(geo_value, time_value) %&gt;%
mutate(err0 = as.double(NA), err1 = as.double(NA), err2 = as.double(NA),
err3 = as.double(NA), err4 = as.double(NA), lead = lead)
valid_dates = unique(res_list[[i]]$time_value)

for (k in 1:length(valid_dates)) {
date = valid_dates[k]; if (verbose) cat(format(date), &quot;... &quot;)

# Filter down to training set and test set
z_tr = z %&gt;% filter(between(time_value, date - lead - n, date - lead))
z_te = z %&gt;% filter(time_value == date)
inds = which(res_list[[i]]$time_value == date)

# Create training and test responses
y_tr = z_tr %&gt;% pull(paste0(&quot;case+&quot;, lead))
y_te = z_te %&gt;% pull(paste0(&quot;case+&quot;, lead))

# Strawman model
if (verbose) cat(&quot;0&quot;)
y_hat = z_te %&gt;% pull(case)
res_list[[i]][inds,]$err0 = abs(inv_trans(y_hat) - inv_trans(y_te))

# Cases only model
if (verbose) cat(&quot;1&quot;)
x_tr_case = z_tr %&gt;% select(starts_with(&quot;case&quot;) &amp; !contains(&quot;+&quot;))
x_te_case = z_te %&gt;% select(starts_with(&quot;case&quot;) &amp; !contains(&quot;+&quot;))
x_tr = x_tr_case; x_te = x_te_case # For symmetry wrt what follows
ok = complete.cases(x_tr, y_tr)
if (sum(ok) &gt; 0) {
obj = quantile_lasso(as.matrix(x_tr[ok,]), y_tr[ok], tau = 0.5,
lambda = 0, lp_solver = lp_solver)
y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))
res_list[[i]][inds,]$err1 = abs(inv_trans(y_hat) - inv_trans(y_te))
}

# Cases and Facebook model
if (verbose) cat(&quot;2&quot;)
x_tr_fb = z_tr %&gt;% select(starts_with(&quot;fb&quot;))
@@ -386,7 +383,7 @@ <h2>Forecasting Code</h2>
y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))
res_list[[i]][inds,]$err3 = abs(inv_trans(y_hat) - inv_trans(y_te))
}

# Cases, Facebook, and Google model
if (verbose) cat(&quot;4\n&quot;)
x_tr = cbind(x_tr_case, x_tr_fb, x_tr_goog)
@@ -401,7 +398,7 @@
}
}

# Bind results over different leads into one big data frame, and save
res = do.call(rbind, res_list)
save(list = ls(), file = &quot;demo.rda&quot;)</code></pre>
</div>
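
As a sketch of how the saved results might then be summarized (an illustration
only, not code from the post: it assumes the res data frame built by the loop
above, with the strawman's absolute errors in err0 and the four models' errors
in err1 through err4):

# Hypothetical post-processing: median of each model's absolute error scaled
# by the strawman's, broken down by forecast horizon (7 or 14 days ahead)
load("demo.rda")  # restores res, among other objects
library(dplyr)
res %>% group_by(lead) %>%
  summarize(cases         = median(err1 / err0, na.rm = TRUE),
            cases_fb      = median(err2 / err0, na.rm = TRUE),
            cases_goog    = median(err3 / err0, na.rm = TRUE),
            cases_fb_goog = median(err4 / err0, na.rm = TRUE))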
@@ -1036,8 +1033,8 @@ <h2>Wrap-Up</h2>
test</a>
(for paired data, as we have here) is more popular,
because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
(in our case, the difference in scaled errors),
whereas the sign test does not, and thus we show results from the latter.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
<li id="fn3"><p>Delphi’s “production” forecasters are still based on relatively simple
6 changes: 5 additions & 1 deletion content/covidcast/_index.md
@@ -1,6 +1,10 @@
---
title: COVIDcast
-layout: covidcast_app
description: COVIDcast tracks and forecasts the spread of COVID-19. By Carnegie Mellon's Delphi Research Group.
+layout: covidcast_app
+app_mode: overview
+order: 1
+modeTitle: Map Overview
+icon: solid/map
heroImage: /images/landing-page/hero-images/covidcast_withfill.jpg
---
3 changes: 3 additions & 0 deletions content/covidcast/export.md
@@ -3,5 +3,8 @@ title: COVIDCast Export Data
linkTitle: Export Data
description: Use COVIDcast data in your own analysis
layout: covidcast_app
+app_mode: export
+order: 6
+icon: solid/download
heroImage: /images/landing-page/hero-images/covidcast_withfill.jpg
---