Created post processing steps for adding the forecast and target dates #55
Merged: rachlobay merged 43 commits into `frosting` from `38-post_add_forecast_date-_add_target_date` on Jul 19, 2022
Commits
59ca31a
Created layer_add_forecast_date and ..._target_date as well as tests …
cf9aaa9
recipes::step
6b82626
recipes:: to Rd file for added layers
bcfda7f
parsnip::fit
7c7e6e4
recipes::all_predictors()
85e5232
epipredict:::frosting()
26aa627
epipredict::add_frosting()
8780a36
Removed epipredict::: from doc
a64a549
Removed id from user facing
fcdddaf
Setting up for changes to make once able to access the preprocessor
0ae474a
Trying to see if recipe is accessible
ab795c1
testing forecast date layer
5e79dc3
Re-added forecast date
0c21a6d
testing
b8211fa
Updating this branch to reflect previous updates to frosting
c3491db
Put id back to where it was before
882a493
Add id
12351d3
Updated documentation & fixed fun
606fd89
removed object
af49a7e
Updated doc and ex
b05cff5
added import
33b8018
Got ahead from recipe
235cba8
some updates to forecast_date script
b608c2e
Fixed layer_add_forecast_date to remove parameter for newdata
ddf3d94
Added more details
8726dd9
Changed around spacing of doc.
2ebf283
Updates as per comments left
2792bc4
Enabled user to specify a target date
1037c79
Minor rewording
14d8344
Made suggested changes
0a053d3
Removed white space
0b2f048
Better way to access ahead
77ae185
Reformatted
3010355
is.null()
0084c5f
remove test
77e7e86
Merge branch 'frosting' of https://github.com/cmu-delphi/epipredict i…
78ecd9d
Had to call ahead another way after update from frosting
6997017
Took out test
9a304ed
Simplify code a little
09a7e94
take out test
b2ba576
To force update this branch to what is on frosting (used git pull ori…
cb1c753
Update to match frosting branch
2745927
extract_argument() to get ahead
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
#' Postprocessing step to add the forecast date | ||
#' | ||
#' @param frosting a `frosting` postprocessor | ||
#' @param forecast_date The forecast date to add as a column to the `epi_df`. | ||
#' For most cases, this should be specified in the form "yyyy-mm-dd". Note that | ||
#' when the forecast date is left unspecified, it is set to the maximum time | ||
#' value in the test data after any processing (e.g. leads and lags) has been | ||
#' applied. | ||
#' @param id a random id string | ||
#' | ||
#' @return an updated `frosting` postprocessor | ||
#' | ||
#' @details To use this function, either specify a forecast date or leave the | ||
#' forecast date unspecified here. In the latter case, the forecast date will | ||
#' be set as the maximum time value in the processed test data. In any case, | ||
#' when the forecast date is less than the most recent update date of the data | ||
#' (i.e. the `as_of` value), an appropriate warning will be thrown. | ||
#' | ||
#' @export | ||
#' @examples | ||
#' jhu <- case_death_rate_subset %>% | ||
#' dplyr::filter(time_value > "2021-11-01", geo_value %in% c("ak", "ca", "ny")) | ||
#' r <- epi_recipe(jhu) %>% | ||
#' step_epi_lag(death_rate, lag = c(0, 7, 14)) %>% | ||
#' step_epi_ahead(death_rate, ahead = 7) %>% | ||
#' recipes::step_naomit(recipes::all_predictors()) %>% | ||
#' recipes::step_naomit(recipes::all_outcomes(), skip = TRUE) | ||
#' wf <- epi_workflow(r, parsnip::linear_reg()) %>% parsnip::fit(jhu) | ||
#' latest <- jhu %>% | ||
#' dplyr::filter(time_value >= max(time_value) - 14) | ||
#' | ||
#' # Specify a `forecast_date` that is greater than or equal to `as_of` date | ||
#' f <- frosting() %>% layer_predict() %>% | ||
#' layer_add_forecast_date(forecast_date = "2022-05-31") %>% | ||
#' layer_naomit(.pred) | ||
#' wf1 <- wf %>% add_frosting(f) | ||
#' | ||
#' p1 <- predict(wf1, latest) | ||
#' p1 | ||
#' | ||
#' # Specify a `forecast_date` that is less than `as_of` date | ||
#' f2 <- frosting() %>% | ||
#' layer_predict() %>% | ||
#' layer_add_forecast_date(forecast_date = "2021-12-31") %>% | ||
#' layer_naomit(.pred) | ||
#' wf2 <- wf %>% add_frosting(f2) | ||
#' | ||
#' p2 <- predict(wf2, latest) | ||
#' p2 | ||
#' | ||
#' # Do not specify a forecast_date | ||
#' f3 <- frosting() %>% | ||
#' layer_predict() %>% | ||
#' layer_add_forecast_date() %>% | ||
#' layer_naomit(.pred) | ||
#' wf3 <- wf %>% add_frosting(f3) | ||
#' | ||
#' p3 <- predict(wf3, latest) | ||
#' p3 | ||
layer_add_forecast_date <- | ||
function(frosting, forecast_date = NULL, id = rand_id("add_forecast_date")) { | ||
add_layer( | ||
frosting, | ||
layer_add_forecast_date_new( | ||
forecast_date = forecast_date, | ||
id = id | ||
) | ||
) | ||
} | ||
|
||
layer_add_forecast_date_new <- function(forecast_date, id) { | ||
layer("add_forecast_date", forecast_date = forecast_date, id = id) | ||
} | ||
|
||
#' @export | ||
slather.layer_add_forecast_date <- function(object, components, the_fit, the_recipe, ...) { | ||
|
||
if (is.null(object$forecast_date)) { | ||
max_time_value <- max(components$keys$time_value) | ||
object$forecast_date <- max_time_value | ||
} | ||
|
||
as_of_date <- as.Date(attributes(components$keys)$metadata$as_of) | ||
|
||
if (object$forecast_date < as_of_date) { | ||
warning("forecast_date is less than the most recent update date of the data.") | ||
} | ||
|
||
components$predictions <- dplyr::bind_cols(components$predictions, | ||
forecast_date = as.Date(object$forecast_date)) | ||
components | ||
} |
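The default-and-warn behaviour of `slather.layer_add_forecast_date` can be sketched in base R without the epipredict machinery. This is only an illustration under stated assumptions: `resolve_forecast_date()` is a hypothetical helper that is not part of the package, and plain vectors stand in for `components$keys$time_value` and the `as_of` metadata.

```r
# Hypothetical helper, not part of epipredict: mirrors the logic of the
# slather method on plain vectors instead of an `epi_df`.
resolve_forecast_date <- function(time_values, as_of, forecast_date = NULL) {
  if (is.null(forecast_date)) {
    # Default: the latest time value in the (processed) test data
    forecast_date <- max(time_values)
  }
  forecast_date <- as.Date(forecast_date)
  if (forecast_date < as.Date(as_of)) {
    warning("forecast_date is less than the most recent update date of the data.")
  }
  forecast_date
}

# Made-up dates for illustration only
time_values <- as.Date("2021-12-17") + 0:14
as_of <- as.Date("2022-05-31")

# An explicit date on or after `as_of` is returned without a warning
explicit <- resolve_forecast_date(time_values, as_of, "2022-05-31")

# Leaving it unspecified falls back to max(time_values), which precedes
# `as_of` here and therefore triggers the warning (suppressed for the demo)
default <- suppressWarnings(resolve_forecast_date(time_values, as_of))
```

Converting to `Date` before the comparison keeps the check independent of whether the user passed a string or a date.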
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
#' Postprocessing step to add the target date | ||
#' | ||
#' @param frosting a `frosting` postprocessor | ||
#' @param target_date The target date to add as a column to the `epi_df`. | ||
#' By default, this is the maximum `time_value` from the processed test | ||
#' data plus `ahead`, where `ahead` has been specified in preprocessing | ||
#' (most likely in `step_epi_ahead`). The user may override this with a | ||
#' date of their own (that will usually be in the form "yyyy-mm-dd"). | ||
#' @param id a random id string | ||
#' | ||
#' @return an updated `frosting` postprocessor | ||
#' | ||
#' @details By default, this function assumes that a value for `ahead` | ||
#' has been specified in a preprocessing step (most likely in | ||
#' `step_epi_ahead`). Then, `ahead` is added to the maximum `time_value` | ||
#' in the test data to get the target date. | ||
#' | ||
#' @export | ||
#' @examples | ||
#' jhu <- case_death_rate_subset %>% | ||
#' dplyr::filter(time_value > "2021-11-01", geo_value %in% c("ak", "ca", "ny")) | ||
#' r <- epi_recipe(jhu) %>% | ||
#' step_epi_lag(death_rate, lag = c(0, 7, 14)) %>% | ||
#' step_epi_ahead(death_rate, ahead = 7) %>% | ||
#' recipes::step_naomit(recipes::all_predictors()) %>% | ||
#' recipes::step_naomit(recipes::all_outcomes(), skip = TRUE) | ||
#' wf <- epi_workflow(r, parsnip::linear_reg()) %>% parsnip::fit(jhu) | ||
#' latest <- jhu %>% | ||
#' dplyr::filter(time_value >= max(time_value) - 14) | ||
#' | ||
#' # Use ahead from preprocessing | ||
#' f <- frosting() %>% layer_predict() %>% | ||
#' layer_add_target_date() %>% layer_naomit(.pred) | ||
#' wf1 <- wf %>% add_frosting(f) | ||
#' | ||
#' p <- predict(wf1, latest) | ||
#' p | ||
#' | ||
#' # Override default behaviour by specifying own target date | ||
#' f2 <- frosting() %>% layer_predict() %>% | ||
#' layer_add_target_date(target_date = "2022-01-08") %>% layer_naomit(.pred) | ||
#' wf2 <- wf %>% add_frosting(f2) | ||
#' | ||
#' p2 <- predict(wf2, latest) | ||
#' p2 | ||
layer_add_target_date <- | ||
function(frosting, target_date = NULL, id = rand_id("add_target_date")) { | ||
add_layer( | ||
frosting, | ||
|
||
layer_add_target_date_new( | ||
target_date = target_date, | ||
id = id | ||
) | ||
) | ||
} | ||
|
||
layer_add_target_date_new <- function(target_date, id) { | ||
layer("add_target_date", target_date = target_date, id = id) | ||
} | ||
|
||
#' @export | ||
slather.layer_add_target_date <- function(object, components, the_fit, the_recipe, ...) { | ||
|
||
if (is.null(object$target_date)) { | ||
max_time_value <- max(components$keys$time_value) | ||
ahead <- extract_argument(the_recipe, "step_epi_ahead", "ahead") | ||
|
||
if (is.null(ahead)) { | ||
stop("`ahead` must be specified in preprocessing.") | ||
} | ||
target_date <- max_time_value + ahead | ||
} else { | ||
target_date <- as.Date(object$target_date) | ||
} | ||
|
||
components$predictions <- dplyr::bind_cols(components$predictions, | ||
target_date = target_date) | ||
components | ||
} |
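The target-date logic above (use the supplied date if there is one, otherwise add `ahead` to the latest time value) can be sketched on plain vectors. This is a minimal illustration, not the package's code: `resolve_target_date()` is a hypothetical helper, and the `ahead` argument stands in for the value that the slather method reads from the recipe.

```r
# Hypothetical helper, not part of epipredict: plain vectors stand in for
# `components$keys$time_value` and the `ahead` taken from the recipe.
resolve_target_date <- function(time_values, ahead = NULL, target_date = NULL) {
  if (!is.null(target_date)) {
    # A user-supplied date overrides the default
    return(as.Date(target_date))
  }
  if (is.null(ahead)) {
    stop("`ahead` must be specified in preprocessing.")
  }
  max(time_values) + ahead
}

time_values <- as.Date("2021-12-17") + 0:14  # made-up dates

# Default: latest time value plus `ahead`
by_default <- resolve_target_date(time_values, ahead = 7)

# Override: the supplied date is used as-is
overridden <- resolve_target_date(time_values, target_date = "2022-01-08")
```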
Oops. Sorry, I missed this. To my mind the `forecast_date` is the date on which the forecast is made. So, by default, it should be `max(time_value)` from the training data. The `target_date` should be "the date the forecast is for". So that one should be `max(time_value) + ahead` by default.

It looks like they're both the same currently, right?
Almost… Current defaults: `forecast_date = max_time_value + ahead` (where `max_time_value` is for test data), `target_date = time_value + ahead` (based on simple.forecasts.Rmd). But I will change to what you specified. A couple of questions about that...

For `forecast_date`, to get the max time value from the training data, is that using mold from components? Then, I check if `forecast_date < as_of_date` of the test data and throw a warning there (that says "forecast_date is less than the most recent update date of the data."), yes?

For `target_date`, by default, that is the max time value in the test data + ahead, or no?
The difficulty here is that leading/lagging increments the `time_value`. So this could all be a bit dangerous. In an ideal setting: `target_date` would be the max time value in the test data + ahead, as you said. `forecast_date` would be the max time value in the test data. (So seemingly, the date of the most recent data available to you.) Acting like we have data up to (and including) today, and we produce a forecast today, then this should work.

However, lots of weird crud can happen that would screw these up. If data isn't available for today but only for yesterday, then that would throw things off. If we accidentally lead the test data, then it'll produce "future" time_values.
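
The two failure modes described in this comment can be made concrete with plain dates. This is a minimal sketch, assuming a 7-day `ahead` and made-up dates; none of the variable names come from the package, and shifting a date vector is only a stand-in for what leading the test data does to `time_value`.

```r
ahead <- 7
today <- as.Date("2022-07-19")  # assumed "today", for illustration only

# Ideal case: the data run through today
time_values <- today - 14:0
forecast_date <- max(time_values)        # equals today
target_date <- max(time_values) + ahead  # one week from today

# Pitfall 1: the latest observation is from yesterday, so the derived
# forecast date silently slips back by one day
stale_values <- time_values - 1
stale_forecast_date <- max(stale_values)

# Pitfall 2: the test data were accidentally led by `ahead`, so the
# maximum time value already lies in the future
led_values <- time_values + ahead
led_forecast_date <- max(led_values)
led_target_date <- max(led_values) + ahead
```

In both pitfalls the code runs without complaint, which is why a check against `as_of` or an explicit user-supplied date is a useful safeguard.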