What signature should a forecaster object have? #8

@dshemetov

Making this thread to address the broader question opened in #3 about the function signature of a forecaster object. cc @dajmcdon @brookslogan @ryantibs

Current proposals:

  1. A combination of features, response variables, a few index args exposed, and the rest bundled into a control arg:
prob_arx <- function(x, y, geo_value, time_value, args = prob_arx_args())
  2. A plain data interface, with the rest of the arguments bundled into a control arg (see Logan's full comment here):
function(df, args = prob_arx_args())
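
For concreteness, the control-argument constructor used in both proposals could be a plain function returning a named list of defaults. A minimal sketch, where the specific arguments (lags, ahead, and so on) are illustrative assumptions rather than the actual prob_arx interface:

# Hypothetical control-argument constructor; argument names and defaults
# are assumptions for illustration only, not the real prob_arx interface.
prob_arx_args <- function(lags = c(0, 7, 14),
                          ahead = 7,
                          min_train_window = 20,
                          levels = c(0.1, 0.5, 0.9)) {
  list(lags = lags,
       ahead = ahead,
       min_train_window = min_train_window,
       levels = levels)
}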

Two dimensions of variation here:

  • do forecasters need to be aware of the index variables, if they are already being given axis-stripped training/testing matrices?
  • does everything except data go into the control arg? what columns does the forecaster expect from df?

One possibility (setting aside the question of whether indexing variables need to be arguments) is that the forecaster in signature (1) could be a function nested inside the forecaster in signature (2): (2) would do the additional parsing of the data frame df, produce training/testing matrices x and y, and hand them off to something that looks like (1).
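
As a rough illustration of that layering (the function name and the column names are assumptions; the columns follow the dv example below), a signature-(2) forecaster could peel off the index columns and delegate to the signature-(1) prob_arx from proposal 1:

# Sketch: a data-frame-facing forecaster (signature 2) that builds the
# response/feature objects and hands off to a matrix-facing forecaster
# (signature 1). Column names assume an epi_df like dv below.
prob_arx_df <- function(df, args = prob_arx_args()) {
  y <- df$percent_cli                      # response column (assumed name)
  x <- NULL                                # no extra features in this sketch
  prob_arx(x, y,
           geo_value = df$geo_value,
           time_value = df$time_value,
           args = args)
}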

A question to pump our design intuition: what do we want a forecaster call pattern to look like? For example, following along with the vignettes in epiprocess, we can imagine having a call that looks something like this:

library(covidcast)
library(epiprocess)
library(data.table)
library(dplyr)
library(purrr)

forecast_date <- "2021-12-01"
dv <- covidcast_signal("doctor-visits",
                       "smoothed_adj_cli",
                       start_day = "2020-06-01",
                       end_day = "2021-12-01",
                       issues = c("2020-06-01", "2021-12-01"),
                       geo_type = "state",
                       geo_values = c("ca", "fl")) %>% 
           select(geo_value, time_value, version = issue, percent_cli = value) %>%
           as_epi_df()

forecasts <- dv %>%
  group_by(geo_value) %>%
  group_split() %>%
  map(some_forecaster, forecast_date = forecast_date, args = prob_arx_args())

This assumes some_forecaster has a signature similar to the one we use with evalcast, namely forecaster(df_list, forecast_date) (group_split() followed by map() above is one way to spell the per-group application; I'm not wedded to that exact idiom, but let's suppose it works).
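
To make that call pattern concrete, here is a stub with the assumed signature, where the first argument is a single group's data frame rather than evalcast's df_list; the computation and output columns are placeholders (reusing the hypothetical prob_arx_args() sketch above), not a committed output format:

# Stub forecaster with the signature assumed above. It restricts to data
# available by forecast_date and returns a placeholder prediction; the
# output columns are illustrative only.
some_forecaster <- function(df, forecast_date, args = prob_arx_args()) {
  forecast_date <- as.Date(forecast_date)
  train <- df %>% filter(time_value <= forecast_date)
  tibble(forecast_date = forecast_date,
         target_date = forecast_date + args$ahead,
         point = mean(tail(train$percent_cli, args$min_train_window),
                      na.rm = TRUE))
}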

That is one possibility. Another is to have an evalcast::get_predictions-like function:

forecasts <- get_predictions(dv, some_forecaster, forecast_dates = forecast_dates, args = prob_arx_args())

This function would incorporate a lot of the grouping and indexing logic, do argument validation, and so on. There is logic here that we likely won't want a first-time user to write on their own. Ideally, once they've put their data in the epi_df or epi_archive format, we can be confident the data has the columns and features we need for model training and forecasting.
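
A rough sketch of what such a driver could look like (this is a hypothetical, not evalcast::get_predictions; the validation checks and the looping details are assumptions):

# Hypothetical driver: validate the epi_df, then apply the forecaster to
# each geo group for each forecast date. Details are illustrative only.
get_predictions <- function(x, forecaster, forecast_dates, args) {
  stopifnot(inherits(x, "epi_df"),
            all(c("geo_value", "time_value") %in% names(x)))
  forecast_dates %>%
    map_dfr(function(fd) {
      x %>%
        filter(time_value <= fd) %>%        # only data available by fd
        group_by(geo_value) %>%
        group_modify(~ forecaster(.x, forecast_date = fd, args = args)) %>%
        ungroup()
    })
}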

An implicit question here is: how much complexity do we place in the forecaster object versus the get_predictions routine? Does the forecaster assume a clean, indexed time series, so it can focus on just doing the math/statistics, or does it also need to handle validating data and the like? Do we like the structure we used in evalcast::get_predictions, or does that bake too much into one spot and assume a one-size-fits-all model for forecasters?
