What signature should a forecaster object have?

Making this thread to address the broader question opened in #3 about the function signature of a forecaster object. cc @dajmcdon @brookslogan @ryantibs 

Current proposals:
1. A combination of features, response variables, a few index args exposed, and the rest bundled into a control arg
```r
prob_arx <- function(x, y, geo_value, time_value, args = prob_arx_args())
```

2. A plain data interface and the rest of the arguments bundled into a control arg (see Logan's full comment [here](https://github.com/cmu-delphi/epipredict/issues/3#issuecomment-1061193088))
```r
function(df, args = prob_arx_args())
```

There are two dimensions of variation here:
- Do forecasters need to be aware of the index variables if they are already being given axis-stripped training/testing matrices?
- Does everything except the data go into the control arg, and which columns does the forecaster expect from `df`?
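To make the control-arg idea concrete, here is a minimal sketch of what a constructor like `prob_arx_args()` might look like. The name `prob_arx_args_sketch` and the fields `lags`, `ahead`, and `levels` are assumptions for illustration only, not the actual arguments of any existing function.

```r
# Hypothetical control-argument constructor. The fields (lags, ahead, levels)
# are illustrative assumptions, not an existing API.
prob_arx_args_sketch <- function(lags = c(0, 7, 14),
                                 ahead = 7,
                                 levels = c(0.05, 0.5, 0.95)) {
  # Validate once, up front, so the forecaster body can trust its inputs.
  stopifnot(
    is.numeric(lags), all(lags >= 0),
    is.numeric(ahead), length(ahead) == 1, ahead > 0,
    is.numeric(levels), all(levels > 0 & levels < 1)
  )
  structure(
    list(lags = lags, ahead = ahead, levels = sort(levels)),
    class = "prob_arx_args_sketch"
  )
}

# Override one setting and keep the other defaults.
args <- prob_arx_args_sketch(ahead = 14)
```

Bundling the tuning parameters this way keeps the forecaster signature short, at the cost of hiding them one level down from the caller.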

One possibility (setting aside the question of whether indexing variables need to be arguments) is that a function with signature (1) could be called inside a forecaster with signature (2). The forecaster in (2) would do the additional parsing of the dataframe `df`, produce training/testing matrices `x`, `y`, and hand them off to something that looks like (1).
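The nesting described above can be sketched in base R as follows. Everything here is hypothetical: the function names, the single `value` column, and the `ahead`/`levels` fields are assumptions, and an ordinary least-squares fit with empirical residual quantiles stands in for whatever model a real forecaster would use.

```r
# Inner function in the style of signature (1): works on plain matrices.
# Rows with an NA response are treated as "not yet observed" and excluded from fitting.
inner_forecaster_sketch <- function(x, y, geo_value, time_value, args) {
  train <- !is.na(y)
  fit <- lm.fit(cbind(1, x[train, , drop = FALSE]), y[train])
  # Predict from the most recent row of features.
  latest <- which.max(as.numeric(time_value))
  point <- sum(c(1, x[latest, ]) * fit$coefficients)
  data.frame(
    geo_value = geo_value[latest],
    time_value = time_value[latest],
    level = args$levels,
    value = point + quantile(fit$residuals, args$levels, names = FALSE)
  )
}

# Outer function in the style of signature (2): parses `df`, builds x and y,
# then hands off to the inner function. Assumes a single geo_value in `df`.
outer_forecaster_sketch <- function(df, args = list(ahead = 1,
                                                    levels = c(0.05, 0.5, 0.95))) {
  stopifnot(all(c("geo_value", "time_value", "value") %in% names(df)))
  stopifnot(nrow(df) > args$ahead + 2)
  df <- df[order(df$time_value), ]
  # Response: the value `ahead` steps in the future (NA where not yet observed).
  y <- c(df$value[-seq_len(args$ahead)], rep(NA_real_, args$ahead))
  x <- matrix(df$value, ncol = 1)
  inner_forecaster_sketch(x, y, df$geo_value, df$time_value, args)
}

# Toy data: a perfectly linear series for one location.
toy <- data.frame(geo_value = "ca", time_value = 1:20, value = 2 * (1:20) + 1)
out <- outer_forecaster_sketch(toy)
```

In this arrangement the inner function never sees column names, so it can be reused with any outer function that produces matrices of the right shape.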

A question to pump our design intuition: what do we want a forecaster call pattern to look like? For example, following along with the [vignettes](https://cmu-delphi.github.io/epiprocess/articles/archive.html) in epiprocess, we can imagine having a call that looks something like this:

```r
library(covidcast)
library(epiprocess)
library(data.table)
library(dplyr)

forecast_date <- "2021-12-01"
dv <- covidcast_signal("doctor-visits",
                       "smoothed_adj_cli",
                       start_day = "2020-06-01",
                       end_day = "2021-12-01",
                       issues = c("2020-06-01", "2021-12-01"),
                       geo_type = "state",
                       geo_values = c("ca", "fl")) %>% 
           select(geo_value, time_value, version = issue, percent_cli = value) %>%
           as_epi_df()

forecasts <- dv %>%
  group_split(geo_value) %>%
  purrr::map(some_forecaster, forecast_date = forecast_date, args = prob_arx_args())
```

This assumes `some_forecaster` has a signature similar to the one we use with `evalcast`, namely `forecaster(df_list, forecast_date)`. (Piping the result of `group_by()` straight into `map()` would iterate over the columns of the data frame rather than over its groups, so the sketch splits by `geo_value` with `group_split()` first.)

That is one possibility. Another is to have an `evalcast::get_predictions`-like function:
```r
forecasts <- get_predictions(dv, some_forecaster, forecast_dates = forecast_date, args = prob_arx_args())
```
This function would incorporate a lot of the grouping and indexing logic, do argument validation, etc. There is logic here that we likely won't want a first-time user to write on their own. Ideally, once they've put their data in the `epi_df` or `epi_archive` format, we can be confident the data has the columns and features we need to do model training and forecasting.
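As a rough illustration of that division of labour, here is a base-R sketch of such a wrapper. It is not `evalcast::get_predictions`: the name `get_predictions_sketch`, the required columns, and the toy `last_value_forecaster` are all assumptions, and the sketch filters only on `time_value`, ignoring data versions entirely.

```r
# Hypothetical wrapper: validates inputs, handles grouping and date filtering,
# and leaves the modelling to `forecaster`.
get_predictions_sketch <- function(df, forecaster, forecast_dates, args = list()) {
  missing_cols <- setdiff(c("geo_value", "time_value"), names(df))
  if (length(missing_cols) > 0) {
    stop("`df` is missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(is.function(forecaster))
  forecast_dates <- as.Date(forecast_dates)

  results <- list()
  for (i in seq_along(forecast_dates)) {
    fd <- forecast_dates[i]
    # Only rows dated on or before the forecast date are visible to the forecaster.
    available <- df[df$time_value <= fd, , drop = FALSE]
    for (geo_df in split(available, available$geo_value)) {
      pred <- forecaster(geo_df, args = args)
      results[[length(results) + 1]] <- data.frame(forecast_date = fd, pred)
    }
  }
  do.call(rbind, results)
}

# Toy forecaster that only does the "math": carry the last observation forward.
last_value_forecaster <- function(df, args = list()) {
  latest <- df[which.max(as.numeric(df$time_value)), ]
  data.frame(geo_value = latest$geo_value, value = latest$value)
}

toy <- data.frame(
  geo_value = rep(c("ca", "fl"), each = 3),
  time_value = rep(as.Date("2021-11-29") + 0:2, times = 2),
  value = c(1, 2, 3, 10, 20, 30)
)
preds <- get_predictions_sketch(toy, last_value_forecaster,
                                forecast_dates = c("2021-11-30", "2021-12-01"))
```

With this split, the forecaster receives one location's data up to the forecast date and returns a small data frame; everything about dates, grouping, and column checks lives in the wrapper.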

An implicit question here is: how much complexity do we place in the forecaster object, and how much in the `get_predictions` routine? Does the forecaster assume a clean, indexed time series, so it can focus on just doing the math/statistics, or does it also need to handle data validation, etc.? Do we like the structure we used in `evalcast::get_predictions`, or does that bake too much information into one spot and assume a one-size-fits-all model for forecasters?
