Making this thread to address the broader question opened in #3 about the function signature of a forecaster object. cc @dajmcdon @brookslogan @ryantibs
Current proposals:
- A combination of features, response variables, a few index args exposed, and the rest bundled into a control arg
prob_arx <- function(x, y, geo_value, time_value, args = prob_arx_args())
- A plain data interface and the rest of the arguments bundled into a control arg (see Logan's full comment here)
function(df, args = prob_arx_args())
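To make the "control arg" idea concrete, here is a minimal sketch of what a constructor like prob_arx_args() could look like. The parameter names (lags, ahead, levels) and their defaults are assumptions for illustration only, not a settled interface; the pattern itself is the same one stats::glm uses with glm.control().

```r
# Hypothetical sketch: parameter names and defaults are illustrative only.
prob_arx_args <- function(lags = c(0, 7, 14), ahead = 7,
                          levels = c(0.05, 0.5, 0.95)) {
  # Validate once, at construction time, so the forecaster can trust its args.
  stopifnot(is.numeric(lags), all(lags >= 0))
  stopifnot(is.numeric(ahead), length(ahead) == 1, ahead > 0)
  stopifnot(is.numeric(levels), all(levels > 0 & levels < 1))
  structure(
    list(lags = lags, ahead = ahead, levels = levels),
    class = "prob_arx_args"
  )
}

# Defaults for everything, or override a single tuning parameter.
default_args <- prob_arx_args()
custom_args <- prob_arx_args(ahead = 14)
```

Bundling things this way keeps the forecaster signature short, at the cost of hiding the tuning parameters one level down from the call site.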
Two dimensions of variation here:
- do forecasters need to be aware of the index variables, if they are already being given axis-stripped training/testing matrices?
- does everything except data go into the control arg? what columns does the forecaster expect from df?
One possibility (setting aside the question of whether indexing variables need to be arguments) is that the forecaster with the first signature could be a function inside the forecaster with the second signature. The second would do the additional parsing of the data frame df, produce training/testing matrices x and y, and hand them off to something that looks like the first.
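A rough sketch of that nesting, in base R only. Everything here is an assumption made for illustration: the names forecaster_xy and forecaster_df, the single value column, the convention that rows with a missing response are the ones to predict, and the plain least-squares fit standing in for a real model.

```r
# Inner forecaster, matrix-style signature: x, y, and the index vectors.
# Assumed convention: rows where y is NA are the rows to predict.
forecaster_xy <- function(x, y, geo_value, time_value, args = list()) {
  train <- !is.na(y)
  # Least squares with an intercept stands in for the real model.
  fit <- lm.fit(cbind(1, x[train, , drop = FALSE]), y[train])
  test <- !train
  pred <- cbind(1, x[test, , drop = FALSE]) %*% fit$coefficients
  # time_value is the last observed time; the target is `ahead` steps later.
  data.frame(geo_value = geo_value[test], time_value = time_value[test],
             prediction = as.numeric(pred))
}

# Outer forecaster, data-frame signature: parses df, then delegates.
forecaster_df <- function(df, args = list(ahead = 1)) {
  stopifnot(all(c("geo_value", "time_value", "value") %in% names(df)))
  df <- df[order(df$geo_value, df$time_value), ]
  # Response: the value `ahead` steps later within each location (NA at the end).
  df$target <- ave(df$value, df$geo_value,
                   FUN = function(v) c(tail(v, -args$ahead), rep(NA, args$ahead)))
  forecaster_xy(x = as.matrix(df["value"]), y = df$target,
                geo_value = df$geo_value, time_value = df$time_value,
                args = args)
}

# Toy data in which each value is the previous one plus 1.
toy <- data.frame(
  geo_value = rep(c("ca", "fl"), each = 5),
  time_value = rep(as.Date("2021-11-01") + 0:4, times = 2),
  value = c(1:5, 11:15)
)
out <- forecaster_df(toy)
```

Note that in this sketch the inner function only uses geo_value and time_value to label its output, which is one way of phrasing the first question above: the modelling step itself never looks at them.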
A question to pump our design intuition: what do we want a forecaster call pattern to look like? For example, following along with the vignettes in epiprocess, we can imagine having a call that looks something like this:
library(covidcast)
library(epiprocess)
library(data.table)
library(dplyr)
forecast_date <- "2021-12-01"
dv <- covidcast_signal("doctor-visits",
                       "smoothed_adj_cli",
                       start_day = "2020-06-01",
                       end_day = "2021-12-01",
                       issues = c("2020-06-01", "2021-12-01"),
                       geo_type = "state",
                       geo_values = c("ca", "fl")) %>%
  select(geo_value, time_value, version = issue, percent_cli = value) %>%
  as_epi_df()
forecasts <- dv %>%
  group_split(geo_value) %>%
  purrr::map(some_forecaster, forecast_date = forecast_date, args = prob_arx_args())
This assumes some_forecaster has a signature similar to the one we use with evalcast, namely forecaster(df_list, forecast_date). (Piping the result of group_by straight into map would not do what we want, since map iterates over the columns of a data frame; group_split should give one data frame per geo_value instead.)
That is one possibility. Another possibility is to have an evalcast::get_predictions-like function:
forecasts <- get_predictions(dv, some_forecaster, forecast_dates = forecast_date, args = prob_arx_args())
This function would incorporate a lot of the grouping and indexing logic, do argument validation, etc. There is logic here that we likely won't want a first-time user to do on their own. Ideally, once they've put their data in the epi_df or epi_archive format, we can be confident the data has the columns and features we need to do model training and forecasting.
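For discussion, a stripped-down sketch of such a driver in base R. This is not the evalcast API: the name get_predictions_sketch, the required columns, and the toy last_value_forecaster are all assumptions, and the sketch ignores the version column entirely (it only filters on time_value), which a real implementation would need to handle.

```r
# Hypothetical driver: validation, date filtering, and grouping live here,
# so the forecaster only ever sees one clean data frame per location.
get_predictions_sketch <- function(df, forecaster, forecast_dates,
                                   args = list()) {
  required <- c("geo_value", "time_value", "value")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  forecast_dates <- as.Date(forecast_dates)

  pieces <- list()
  for (i in seq_along(forecast_dates)) {
    fd <- forecast_dates[i]
    # Only rows observed on or before the forecast date may be used.
    available <- df[df$time_value <= fd, ]
    for (g in split(available, available$geo_value)) {
      if (nrow(g) == 0) next
      pieces[[length(pieces) + 1]] <- data.frame(
        geo_value = g$geo_value[1],
        forecast_date = fd,
        prediction = forecaster(g, forecast_date = fd, args = args)
      )
    }
  }
  do.call(rbind, pieces)
}

# Toy forecaster: carry the most recent observation forward.
last_value_forecaster <- function(df, forecast_date, args = list()) {
  df$value[which.max(as.numeric(df$time_value))]
}

toy <- data.frame(
  geo_value = rep(c("ca", "fl"), each = 3),
  time_value = rep(as.Date("2021-11-29") + 0:2, times = 2),
  value = c(1, 2, 3, 10, 20, 30)
)
forecasts <- get_predictions_sketch(toy, last_value_forecaster,
                                    forecast_dates = c("2021-11-30", "2021-12-01"))
```

With this split, the forecaster body is a single line and all of the bookkeeping sits in the driver, which is the trade-off the next question is asking about.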
An implicit question here is: how much complexity do we place in the forecaster object and how much in the get_predictions routine? Does the forecaster assume a clean, indexed time series, so it can focus on just doing the math/statistics, or does it also need to handle validating data, etc.? Do we like the structure we used in evalcast::get_predictions, or does that bake too much information into one spot and assume a one-size-fits-all model for forecasters?