-
Notifications
You must be signed in to change notification settings - Fork 11
Description
Context
@dsweber2 was just noting that
epi_df %>%
epi_slide(
~ .x %>%
group_by(geo_value) %>%
arx_forecaster(.....)
)doesn't fit per-geo models; it actually just ignores the grouping altogether.
We suggested "transposing" the operations, but @rnayebi21 found that
epi_df %>%
group_by(geo_value) %>%
epi_slide(~ arx_forecaster(.x, .....), .....)doesn't work either; .x doesn't have the geo_value column thus lacks epi_dfness. I believe these problems also apply to when you are trying to do version-faithful backtesting with epix_slide().
Workarounds seem a little bit of a pain, either
- fixing up the first approach by doing something like
- split + map + bind_rows, or
- group_by + group_split + map + bind_rows, or
- group_by + group_modify(.keep = TRUE), or
- group_by + reframe (using the deprecated-but-not-replaced cur_data_all()...)
- fixing up the second approach by reconstructing an
epi_dfinside the slide computation using.x,.group_key, and.ref_time_value, or - [
mutate(geo_value2 = geo_value)and group by that instead ofgeo_value. Or justgroup_by(geo_value2 = geo_value).]
The first workaround seems more modular (you can have a list of forecasters that can all rely on ungrouped slides, rather than having to do a different type of slide call for each one).
Proposal
- Make
arx_forecaster()check specifically if there's a missinggeo_valueand hint that if they were doing a groupedepix_slide()orepi_slide()withgeo_valuein the group variables, that won't work, and to do <some workaround / feature> instead. - Check if input to
arx_forecaster()etc. is grouped; if so, either- warn
- abort
- fit & forecast one model per group
- [Also check for groupedness and warn/abort in the epi workflow internals.]
Musings
We can also probably make things easier epiprocess-side, by adding a .keep parameter if we're not already able to forward to group_modify() via dots. But I'm not sure we actually want to... this makes it easier to use epi_slide() for forecasting when it shouldn't actually be (epix_slide() should be favored and maybe renamed to make this clear).
[@dshemetov points out we should document this geo-grouped epi_slide gotcha in epiprocess. And actually fixing what's going wrong is part of a much larger project, epiprocess#223.]