Review and consider applying `?dplyr_extending` for `epi_df` #223

brookslogan · 2022-08-31T16:10:46Z

I believe that we implemented epi_df without the use of ?dplyr_extending. We may have:

Missing S3 implementations
Unnecessary S3 implementations
Noncompliant behavior

We should review ?dplyr_extending to see if it hints at bugs, enhancements, code simplifications, etc. (E.g., the mention of [ might have hinted that we would want a [ implementation before we discovered it on our own.)

The text was updated successfully, but these errors were encountered:

brookslogan · 2022-10-29T22:06:24Z

Might have ran into one of these cases in #64 where grouped epi_dfs being bound together lost their epi_dfness. Implementing a dplyr_reconstruct.epi_df based on [.epi_df seems to partially resolve. But note this exacerbates #242 (which underlies #213).

May also related to #117, cmu-delphi/epipredict#51.

* Fix failing grouped `epix_slide` test, where `group_modify` dropped the `epi_df` class when binding results together, by implementing `dplyr_reconstruct.epi_df`. * Implement the rest of `?dplyr_extending` and remove some now-unnecessary S3 methods for dplyr verbs, addressing #195, #223, and changing `epi[x]_slide` to leave grouping intact. * Update tests for grouped slides to reflect new behavior. * Update NEWS.md.

* Half-fix failing grouped `epix_slide` test, where `group_modify` dropped the `epi_df` class when binding results together, by implementing `dplyr_reconstruct.epi_df`. Somehow a session documenting and testing immediately before this commit and then after it will have the tests pass, but the tests won't pass off of this branch directly. The class vector involved in the failure does appear to change. * Explicate `epix_slide` `count` derivation. * Implement the rest of `?dplyr_extending` and remove some now-unnecessary S3 methods for dplyr verbs, addressing #195, #223, and changing `epi[x]_slide` to leave grouping intact. * Update tests for grouped slides to reflect new behavior. * Update NEWS.md.

…modify` * Implement `?dplyr_extending` and remove some now-unnecessary S3 methods for dplyr verbs, addressing #195, #223, and failing `epix_slide` test. * Don't ungroup `epix_slide` result. Update corresponding test. * Update NEWS.md. * Explicate `epix_slide` `count` derivation in comments and variable names. * Fix some desynced duplicated code in `epix_slide` and use `group_modify` again instead of `summarize` in order to keep slide computation input available as an `epi_df`.

brookslogan · 2022-11-16T01:17:47Z

A potential issue with ?dplyr_extending guidance: arrange on grouped epi_dfs drops epi_dfness. We need to implement dplyr_row_slice as well. (We also need to implement other methods interacting with grouping: group_by, ungroup, group_modify [or maybe not; an appropriate group_data + any other required extensions of grouped_df methods (see below) might get this "for free"/"properly"]; as well as unnest; we may be missing some others.)

brookslogan · 2023-04-13T22:30:52Z

There are a few other issues with grouped epi dfs in the current dev / lcb/make_epix_slide_more_like_reframe branches; see #113, #298. Note:

Part of this may be due to missing a group_data method, alluded to in ?dplyr_extending. We'd want either/both (a) "epi_df" before "grouped_df" in the class vector to be able to dispatch to group_data.epi_df instead of group_data.grouped_df, or (b) to mess with the class of the "groups" attr generated by group_by (doesn't sound great.)
Note in comments of Make print.epi_df header mirror style of print.tsibble and print.tbl_df #113 that we want "grouped_df" before "epi_df" due to some pillar methods for grouped_df expecting it to be before more base-like classes in the class vector.

1(a) and 2 (and maybe more points) seem to conflict. One potential resolution:

Make grouped epi_dfs use the class c("grouped_edf", "grouped_df", "epi_df", "tbl_df", "tbl", "data.frame"). That's also suggested in this older thread.

brookslogan · 2023-04-24T02:22:54Z

While applying the "potential resolution" above seems to help in some aspects, it also loses epi_df-ness in a lot of other situations. The problem appears to be that grouped_df's methods don't really respect dplyr_extending for grouped_* classes in front of it. So we'd need to look at methods(,"grouped_df") and/or methods(,"grouped_{somethingelse}") and make sure to override/mirror the things we want. [2024-09-24: perhaps this wasn't/isn't the case; it seems you can get these ops to work as long as you make sure not to lose your head class after any NextMethods in dplyr_extending impls, including e.g. names<-, and this may be sufficient. Also, looking at grouped_df S3 methods might not have even worked, as there might be some non-S3 functions manually branching on grouped_df-ness.] Related: #145. [We'll need to look in some packages beyond dplyr for these, e.g., for vctrs::restore.]

[Same approach would need to be applied for rowwise edfs. Plus, in any approach, we will want to review tidyr for more methods that may need wrapped like, e.g., unnest.]

brookslogan · 2023-04-24T03:49:42Z

More recent reference: tidyverse/dplyr#5480

Also, I'm not sure if the approaches with a separate grouped_{something} class approach were all limited to tidyverts and not tidyverse. It seems like tidyverse things like dtplyr and sf might use either (a) manual delegation rather than via NextMethod (and have custom print with as_tibble, format, [-1] to remove header line(s)) or (b) the single-class-in-front-of-grouped_df approach with some additional methods implemented (and just has the same "# A tibble: [...]" line sticking around).

brookslogan · 2023-04-27T03:21:32Z

Progress is in lcb/round-out-grouped-epi_dfs. However, it already seems to have some problems with summarize drop_last seeing time_value as an additional grouping variable when we are only grouping by geo_value, and thus grouping results by geo_value by default when outputting 1 row per computation [within epi_cor. Seems like we can't just use reclass or dplyr_reconstruct everywhere without thinking, and/or there's an issue with the current dplyr_reconstruct.grouped_edf; dplyr_reconstruct.grouped_df tries to take grouping from template (intersecting with available columns in data)]. Also needs further, more focused, tests that it actually fixes what we wanted to fix. Some other existing test(s) are broken from not dropping data.table attributes that are now being kept around by group_modify result assembly in epix_slide (and kept inside the computations by as_tibble).

brookslogan · 2023-04-27T17:38:01Z

The questions raised by how to reconstruct a grouped summarize might tie into questions about when epix_slide should output an epi_df, when geo_value might either come from "outside", when the the slide is grouped by geo_value, or the "inside", when the slide isn't grouped but the user computations output epi_dfs (with geo_values). A useful concept might be an epi_df_selection/epi_df_slice/sub_epi_df or an epi_df_group, which would be a special class we'd use, e.g., as the first or both args in group_modify computations, but only decaying if essential columns don't appear in either arg, rather than decaying when they're gone from the first arg. First, this special class would solve us having to use .keep = TRUE to keep a relevant class passed in the first arg of group_modify, and second, it might help in other summarization situations to avoid having to attempt reconstructing things against a template that might not be very similar (since summarization can alter so much). [Alternatively, there could be an epi_df_groups class output from group_data and hopefully retained until it's time to unnest, etc.]

brookslogan · 2023-05-04T17:11:14Z

Another point where things seem unclear: should we drop metadata in an as_tibble.epi_df method? This seems like it would correspond to grouped_df, rowwise_df, and other classes' behavior of dropping groupings (though the former is viewed in vctrs as a more general concept than ordinary tibbles, while epi_dfs are a narrower concept). And we have the metadata dropped when as_tibble-ing grouped epi_dfs currently anyway. I'll likely implement metadata-dropping in #311.

brookslogan · 2024-07-30T20:29:34Z

Having an epi_df_slice/epi_df_selection/sub_epi_df / epi_df_group / epi_df_view class would also allow us to disable accidental use of mean/sum on .x in slide computations when .x$value was meant instead. By simply providing an S3 method that complains. Currently we lose epi-dfness when doing slides grouped by geos, so doing the same for epi_dfs isn't going to catch much. [Though alternatively, we may be able to use output column de-duplication on unnesting (rather than errors) + do keep = TRUE for epi_slides as we do epix_slides. Then epi_df impls should catch it.]

brookslogan · 2024-09-24T09:16:37Z

Some notes:

Our current application of ?dplyr_extending is awkward, routing many/all core ops through a dplyr_reconstruct with all validation checks. Experimenting with a "real" application in brookslogan/dplyr.extending.test
We avoid some issues by using .keep = TRUE in group_map ops both epix_slide and epi_slide now/soon. But if a user directly uses group_modify with the default .keep = FALSE, then they could run into annoyances, so we should at least maintain a group_modify method to fix up there. But there's less need for a slice/group/view class given these adjustments.

brookslogan · 2024-09-24T19:44:01Z

Requiring key uniqueness makes geo/time aggregation awkward; it'd be nice if we could edf %>% group_by(geo_value = state_of(geo_value), time_value = week_of(time_value)) %>% summarize(value = sum(value)) and get an epi_df out

brookslogan added the P2 low priority label Aug 31, 2022

brookslogan mentioned this issue Sep 1, 2022

Create a rename() function that is more specialized for an epi_df #195

Open

brookslogan added the cleanup improvements to developing&building experience&quality, not directly to built pkg&docs label Sep 20, 2022

brookslogan self-assigned this Nov 1, 2022

brancengregory mentioned this issue Apr 17, 2023

🤖 Towards an ojo_tbl class openjusticeok/ojodb#110

Closed

brookslogan mentioned this issue Apr 21, 2023

Probably don't want to drop the grouping in [.epi_df when as_tibble() is used #145

Open

This was referenced Apr 27, 2023

unnest.epi_df doesn't update other_keys #306

Open

Consider implementing appropriate vctrs "methods" #309

Open

Round out dplyr&tidyr functionality for grouped epi_dfs #310

Draft

brookslogan mentioned this issue May 5, 2023

Add a complete method for epi_dfs #250

Closed

brookslogan mentioned this issue Jun 28, 2023

Gap filling and other S3 issues #138

Closed

brookslogan mentioned this issue Dec 13, 2023

select causes type conversion #388

Closed

brookslogan mentioned this issue Jan 13, 2024

Patches over dplyr_extending for grouped epi_dfs don't account for column renaming #395

Open

nmdefries added this to the Epiprocess Issue Triage milestone Feb 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review and consider applying `?dplyr_extending` for `epi_df` #223

Review and consider applying `?dplyr_extending` for `epi_df` #223

brookslogan commented Aug 31, 2022

brookslogan commented Oct 29, 2022 •

edited

Loading

brookslogan commented Nov 16, 2022 •

edited

Loading

brookslogan commented Apr 13, 2023

brookslogan commented Apr 24, 2023 •

edited

Loading

brookslogan commented Apr 24, 2023 •

edited

Loading

brookslogan commented Apr 27, 2023 •

edited

Loading

brookslogan commented Apr 27, 2023 •

edited

Loading

brookslogan commented May 4, 2023

brookslogan commented Jul 30, 2024 •

edited

Loading

brookslogan commented Sep 24, 2024

brookslogan commented Sep 24, 2024

Review and consider applying ?dplyr_extending for epi_df #223

Review and consider applying ?dplyr_extending for epi_df #223

Comments

brookslogan commented Aug 31, 2022

brookslogan commented Oct 29, 2022 • edited Loading

brookslogan commented Nov 16, 2022 • edited Loading

brookslogan commented Apr 13, 2023

brookslogan commented Apr 24, 2023 • edited Loading

brookslogan commented Apr 24, 2023 • edited Loading

brookslogan commented Apr 27, 2023 • edited Loading

brookslogan commented Apr 27, 2023 • edited Loading

brookslogan commented May 4, 2023

brookslogan commented Jul 30, 2024 • edited Loading

brookslogan commented Sep 24, 2024

brookslogan commented Sep 24, 2024

Review and consider applying `?dplyr_extending` for `epi_df` #223

Review and consider applying `?dplyr_extending` for `epi_df` #223

brookslogan commented Oct 29, 2022 •

edited

Loading

brookslogan commented Nov 16, 2022 •

edited

Loading

brookslogan commented Apr 24, 2023 •

edited

Loading

brookslogan commented Apr 24, 2023 •

edited

Loading

brookslogan commented Apr 27, 2023 •

edited

Loading

brookslogan commented Apr 27, 2023 •

edited

Loading

brookslogan commented Jul 30, 2024 •

edited

Loading