wip feat: replace R6 `epi_archive` with S3 implementation #431

dshemetov · 2024-03-20T01:13:36Z

Checklist

Please:

Make sure this PR is against "dev", not "main" (unless this is a release
PR).
Request a review from one of the current main reviewers:
brookslogan, nmdefries.
Make sure to bump the version number in DESCRIPTION. Always increment
the patch version number (the third number), unless you are making a
release PR from dev to main, in which case increment the minor version
number (the second number).
Describe changes made in NEWS.md, making sure breaking changes
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
1.7.2, then write your changes under the 1.8 heading).
See DEVELOPMENT.md for more information on the development
process.

Change explanations for reviewer

Attention conservation notice: probably don't review this yet; still waiting on downstream A/B tests
Prior comparison work in feat+wip: refactor epi_archive to use S3 #430
The diffs should be easier to read here

A/B tests TODO (A = epiprocess dev, B = this branch):

make sure vignettes are identical
make sure the output from a subset of the exploration-tooling forecasters is identical

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Resolves Remove R6 interface for epi_archives #340

brookslogan

Some minor todos & larger suggestions from a partial first pass; thought I'd just post them now since some probably will be slightly annoying to implement & finishing reviewing's going to take a while.

For future review passes:

I need to learn the new_ etc. pattern --- I think Daniel opened an issue about us using this for epi_dfs & Hadley has a chapter about it & related functions.
Check that we're achieving the intended non-mutation interface.
Finish looking over files; rest of archive.R, all of grouped_epi_archive.R, ...
Check archive_cases_dv_subset.R [and vignettes].
[Assess vignette snaps disk size, testing completeness.]

tests/testthat/test-epix_fill_through_version.R

R/archive.R

dshemetov · 2024-04-18T18:39:05Z

Force pushed some changes, just a heads up @brookslogan

brookslogan · 2024-04-29T15:54:53Z

R/archive.R

+    other_keys = NULL,
+    additional_metadata = NULL,
+    compactify = NULL,
+    clobberable_versions_start = NA,


suggest: either (A) eliminate new_epi_archive, keep it as as_epi_archive, or (B) introduce a new_ and validate_ that are more minimal, and have as_epi_archive be the "helper" as described here [and use new_epi_archive internally where we can] [and use the same approach to default handling in new_ and as_]. [epi_archive() as the helper seems like it might be confusing; people are used to data.frame(), tibble(), list(), etc., which allow you to construct things in a different way.]

suggest: either (a) make more defaults non-NULL, or (b) make clobberable_versions_start default also NULL (& replace with NA) if it's possible. I'm not sure about (a) vs. (b). (a) gives more type info, but also could hide other arg names in limited-width autocomplete windows. With (a) approach, there can sometimes also be some issues with confusing defaults --- imagine we had geo_type = guess_geo_type(x) default and rebound x before geo_type was evaluated --- but it doesn't look like we rebind before assigning any of the defaults, so I don't think we're in such a situation.

Alright, I added a validate_epi_archive and moved the bulk of the validation from new_epi_archive into it. Now as_epi_archive calls validate and then new. new_epi_archive is only used there, so this change should be fine for now, but in the future we can replace some safe internal calls to as_epi_archive with calls to new_epi_archive. There is still a fair amount of validation and construction of the data.table object in new_epi_archive, so that's another thing that could be fixed.

I also made clobberable_versions_start default to NULL and then I set it to NA in that case. Just went with the consistent option for now, though we can easily switch it later.
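For readers unfamiliar with the constructor / validator / helper split discussed above, here is a minimal base-R sketch. The class `toy_archive` and the functions `new_toy_archive`, `validate_toy_archive`, and `as_toy_archive` are hypothetical stand-ins, not the epiprocess implementation; the sketch only assumes the ordering described in this thread (the helper resolves NULL defaults, validates, then constructs).

```r
# Hypothetical toy class; names are illustrative, not the epiprocess API.

# Low-level constructor: cheap, assumes its inputs are already valid.
new_toy_archive <- function(DT, clobberable_versions_start) {
  structure(
    list(DT = DT, clobberable_versions_start = clobberable_versions_start),
    class = "toy_archive"
  )
}

# Validator: keeps the checks in one place so the constructor stays minimal.
validate_toy_archive <- function(DT, clobberable_versions_start) {
  if (!is.data.frame(DT)) {
    stop("`DT` must be a data frame.")
  }
  if (!"version" %in% names(DT)) {
    stop("`DT` must have a `version` column.")
  }
  if (length(clobberable_versions_start) != 1L) {
    stop("`clobberable_versions_start` must have length 1.")
  }
  invisible(TRUE)
}

# User-facing helper: resolves NULL defaults, validates, then constructs.
as_toy_archive <- function(DT, clobberable_versions_start = NULL) {
  if (is.null(clobberable_versions_start)) {
    clobberable_versions_start <- NA
  }
  validate_toy_archive(DT, clobberable_versions_start)
  new_toy_archive(DT, clobberable_versions_start)
}

x <- as_toy_archive(data.frame(version = 1:3, value = c(10, 11, 12)))
```

With this split, internal code that already holds validated pieces could call the constructor directly and skip the checks, which is the refactor mentioned above as future work.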

brookslogan · 2024-04-29T18:17:17Z

Core refactor changes look pretty good! Please:

todo: Review & if good, merge changes from Edit R6 refactor #443. Some things of note:
- grouped_epi_archives aren't epi_archives (e.g., we don't want to try to access x$DT on a grouped_epi_archive, since that will be NULL. And we don't want to try to mirror the epi_archive structure with DT at the top level, because that's asking for bugs, which I have encountered/made due to grouped_dfs also being tibbles; I don't think we want to mirror this.)
- clone() wasn't needed in a lot of places, and we want to avoid deep copies when we can. [I think I oversimplified this in the commit msg as "we don't need to worry about (idiomatic or even fairly rare) shallow mutation of the S3 list"; the actual point is that we don't have to worry about aliased mutation, thanks to R's copy-on-write-like semantics.]
Look at above/below github code comments.
Perform vignette snapshot tests
then purge the vignette snapshots from history?
- purged
- unnecessary
Address failing checks? At least one is in the growth rate vignette... not sure this would be affecting it... [Oh, it's from %||% refactor --- typo in handling k.]
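The point about aliasing in the checklist above can be illustrated with a small base-R sketch. The object here is a hypothetical toy list, not a real epi_archive; the environment is included only as a contrast, since R6 objects are built on environments.

```r
# Hypothetical toy archive stored as a plain S3 list.
x <- structure(list(versions_end = 3L), class = "toy_archive")

# Aliasing: binding a second name does not copy anything yet.
y <- x

# Modifying through `y` triggers copy-on-modify, so `x` is left untouched.
y$versions_end <- 5L

# Contrast: environments have reference semantics, so a change made
# through one alias is visible through the other.
e1 <- new.env()
e1$versions_end <- 3L
e2 <- e1
e2$versions_end <- 5L

x$versions_end   # still the original value
e1$versions_end  # reflects the change made via e2
```

This only covers the list structure itself. A data.table stored inside the list can still be modified by reference through data.table's own in-place operations, which is presumably why a deep-copying clone() is kept for the cases that need it.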

brookslogan

Please see separate checklist.

brookslogan · 2024-04-29T16:14:23Z

R/methods-epi_archive.R

-#' operator. Currently, the only situation where there is potentially aliasing
-#' is of the `DT` in edge cases with `all_versions = TRUE`, but this may change
-#' in the future.
-#'
 #' @examples
 #' # warning message of data latency shown


note: while trying to understand what this old comment was, I ran into this possible mistake:
archive_cases_dv_subset$version --- which partial-matches (potentially silently by default?) to $versions_end. I'm not sure if there's a class we can swap in to prevent this (vctrs::list_of requires exact names, but also is designed for homogeneous lists with too complex of an entry type for an atomic vector). Since we expect to encapsulate DT sometime, maybe this will become less of an issue (users should expect less to be able to get version very easily), but would probably be nice if there's an easy pre-baked solution. Guess we might be able to just implement $ for the epi_archive or roll our own intermediate class...

I'm updating some docs in this region in a side branch.

note: also finding that x$clobberable_versions_start <- <value> doesn't validate, which is probably suboptimal

Hm, interesting observations. Partial matching and S3 classes not having private attributes seem like very leaky things to try to guard against. I'm leaning towards just telling users that it is unsafe to directly modify epi_archives outside of epiprocess functions. We can then add safe modification functions as we get feature requests in individual instances.

That makes sense. Though I might soft-request forbidding partial matching in $, and at least name validation in $<- and [[<-, also to catch errors we make in development. Not part of this PR though.
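The partial-matching pitfall and one possible guard can be sketched in base R as follows. The `toy_archive` class and its `$` method are hypothetical and only illustrate the "implement $ for the class" idea floated above; this is not what epiprocess does.

```r
# Hypothetical toy object shaped loosely like the archive discussed above.
toy <- structure(
  list(DT = data.frame(version = 1:3), versions_end = 3L),
  class = "toy_archive"
)

# `$` on a plain list matches partially: "version" is a unique prefix of
# "versions_end", so this returns that field (no warning with default options).
partial <- unclass(toy)$version

# `[[` matches names exactly by default, so the same lookup gives NULL.
exact <- unclass(toy)[["version"]]

# One possible guard: a `$` method that only accepts exact field names.
`$.toy_archive` <- function(x, name) {
  if (!name %in% names(x)) {
    stop("No field named `", name, "` in this object.")
  }
  .subset2(x, name)
}

strict_ok <- toy$versions_end
strict_err <- tryCatch(toy$version, error = function(e) conditionMessage(e))
```

A similar check could be placed in `$<-` and `[[<-` methods to reject unknown names on assignment; that is left out here to keep the sketch short.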

R/methods-epi_archive.R

brookslogan · 2024-04-29T18:38:51Z

R/methods-epi_archive.R

@@ -171,8 +256,6 @@ epix_fill_through_version <- function(x, fill_versions_end,
 #'   as_epi_archive(compactify = TRUE)
 #' # merge results stored in a third object:
 #' xy <- epix_merge(x, y)
-#' # vs. mutating x to hold the merge result:
-#' x$merge(y)
 #'
 #' @importFrom data.table key set setkeyv
 #' @export


question: would we want this as an S3 generic as well, or as an implementation of the merge generic? Then people using methods(, "epi_archive") might get better results.

Not sure. I've never used methods(, "<class>") to explore documentation. Let's just see how epix_* works out for now?

Sure. Side note: methods(, "<class>") or sloop::{s3,s4}_methods_class() (better than the former when s4 is involved in ways that confuse methods() or its users) is useful for exploring undocumented stuff --- often S3 implementations won't actually be exported, documented, etc. Of course it's not comprehensive if there are regular functions that can also operate on the class. I guess that was one benefit of R6, potentially having a sort of full list of functionality at your fingertips.
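As a small illustration of the exploration technique mentioned above, here is a sketch using a hypothetical `toy_archive` class with two methods defined at top level. It only relies on the base `methods()` function.

```r
# Hypothetical toy class with two S3 methods defined at top level.
print.toy_archive <- function(x, ...) {
  cat("<toy_archive>\n")
  invisible(x)
}
summary.toy_archive <- function(object, ...) {
  length(unclass(object))
}

# List the S3 methods R can find for the class. `methods(, "toy_archive")`
# is the positional form of the same call.
found <- methods(class = "toy_archive")
print(found)
```

As noted in the thread, this only surfaces S3 methods; ordinary functions such as an `epix_*`-style function that happens to accept the class would not appear in the listing.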

R/growth_rate.R

dshemetov · 2024-04-30T00:47:43Z

I manually checked the diffs in the vignettes and found that there were no unexpected changes. I've purged the vignette snapshots from this branch by rebasing and force pushing, FYI @brookslogan

Here is the code used to do vignette snapshots, in case we need it later:

# test-snapshots.R
vignettes <- paste0(here::here("vignettes/"), c(
  "advanced.Rmd",
  "aggregation.Rmd",
  "archive.Rmd",
  "epiprocess.Rmd",
  "slide.Rmd"
))
for (input_file in vignettes) {
  test_that(paste0("snapshot vignette ", basename(input_file)), {
    # skip("Skipping snapshot tests by default, as they are slow.")
    output_file <- sub("\\.Rmd$", ".html", input_file)
    withr::with_file(output_file, {
      devtools::build_rmd(input_file)
      expect_snapshot_file(output_file)
    })
  })
}

Instructions:

add this file to your tests
run devtools::test(filter="snapshots") on the dev branch
run devtools::test(filter="snapshots") on the feature branch
if the snapshots from the two branches do not match, run testthat::snapshot_review("snapshots/") to view the diffs

- S3 class vectors are ordered, so use `identical`
- Improve class vector formatting
- Tweak other `class` and `typeof` message text
- Improve duplicate colnames message
- Improve vector interpolation formatting
- Fix typo in GCD error messaging

Print to stdout and without using messages for all the output. Prevents Rmds from splitting print output into multiple chunks. Allows `capture.output` by default to capture all expected output, and the same for logging utilities expecting regular output to come from stdout.
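The difference this makes for `capture.output` can be sketched in base R. The print method and the `print_with_message` variant below are hypothetical toys, not the epiprocess print method.

```r
# Hypothetical toy print method that writes everything with cat(), i.e. to stdout.
print.toy_archive <- function(x, ...) {
  cat("A toy archive object\n")
  cat("* versions_end =", format(x$versions_end), "\n")
  invisible(x)
}

# Variant that sends its header through message(), which writes to stderr.
print_with_message <- function(x) {
  message("A toy archive object")
  cat("* versions_end =", format(x$versions_end), "\n")
  invisible(x)
}

x <- structure(list(versions_end = 3L), class = "toy_archive")

# capture.output() collects stdout by default, so it sees both lines here...
all_lines <- capture.output(print(x))

# ...but misses the header that went through message().
# suppressMessages() just keeps the console quiet for this demo.
some_lines <- suppressMessages(capture.output(print_with_message(x)))

length(all_lines)
length(some_lines)
```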

- Update `epix_as_of` docs further based on `clobberable_versions_start` now defaulting to `NA`.
- Don't include `max_version =` in example `epix_as_of` calls as it seems atypical and a strange name if extracting a snapshot rather than an archive.

* remove is_epi_archive and delete in epix_slide
* simplify group_by_drop_default
* prune library calls in tests
* remove here and waldo from Suggests
* pull most validation work from new_epi_archive into validate_epi_archive
* call validate_epi_archive in as_epi_archive

dshemetov · 2024-05-01T02:32:10Z

I think this PR is ready. @nmdefries (thank you!) reproduced my run of exploration-tooling with this branch and found that the forecaster outputs did not change.

I found a few things that need to change in https://github.com/cmu-delphi/delphi-tooling-book, so I'm working on a PR there. We can probably merge this and I can have that one ready tomorrow.

dshemetov force-pushed the ds/r6-clean branch 5 times, most recently from 6f1891d to d479882 Compare March 21, 2024 23:04

dshemetov mentioned this pull request Mar 25, 2024

testing: ab testing branch for epiprocess r6 refactor cmu-delphi/exploration-tooling#112

Merged

nmdefries mentioned this pull request Apr 4, 2024

Move some documentation to templates for easier updating #434

Merged

4 tasks

brookslogan requested changes Apr 12, 2024

brookslogan reviewed Apr 14, 2024

R/archive.R (outdated, resolved)

dshemetov force-pushed the ds/r6-clean branch from e09fd7f to 586d5fa Compare April 18, 2024 18:38

dshemetov force-pushed the ds/r6-clean branch from 586d5fa to 179a8fb Compare April 18, 2024 18:40

dshemetov mentioned this pull request Apr 18, 2024

Namespace cleanup #438

Open

dshemetov requested a review from brookslogan April 18, 2024 22:45

dshemetov force-pushed the ds/r6-clean branch 2 times, most recently from 5ff4e03 to a67a324 Compare April 19, 2024 18:09

dshemetov marked this pull request as ready for review April 19, 2024 18:10

brookslogan reviewed Apr 29, 2024

brookslogan self-requested a review April 29, 2024 18:17

brookslogan approved these changes Apr 29, 2024

brookslogan reviewed Apr 29, 2024

R/methods-epi_archive.R (outdated, resolved)

brookslogan reviewed Apr 29, 2024

R/growth_rate.R (outdated, resolved)

dshemetov mentioned this pull request Apr 30, 2024

[DOC] Centralize and clarify documentation on mutation, compactification, and other footguns #444

Open

dshemetov force-pushed the ds/r6-clean branch from 07aabbd to 8edd5a5 Compare April 30, 2024 00:45

dshemetov and others added 3 commits April 29, 2024 17:50

wip+doc: add S3 implementation of epi_archive

01f2262

* remove comment #417
* bump version to 0.7.6 and add NEWS line

feat: replace epi_archive with S3 implementation

74aa831

new_archive validation tightening, streamlining, type stability

183d0f1

- Forbid `NA` `compactify`
- Remove `missing` checks when `is.null` suffices
- Remove redundant default code
- Make local `other_keys` have more consistent typing across branches

brookslogan added 4 commits April 29, 2024 17:52

Fix version_bound_arg validation issues

7bc4735

- Validate length.
- Tweak message regarding type since typeof is length 1.
- Actually raise error if NA when NA not allowed.
- Make tests check the source of the error, since not being specific + R configuration masked some of these issues.

Don't use cli_ with dynamic format strings

5392641

See https://rlang.r-lib.org/reference/topic-condition-formatting.html#transitioning-from-abort-to-cli-abort-

Note reassignment in R6 migration for mutating functions, + details

9491797

dshemetov force-pushed the ds/r6-clean branch from 8edd5a5 to 09ad0dc Compare April 30, 2024 00:52

dshemetov and others added 10 commits April 29, 2024 21:22

lint: use rlang %||% idiom

cd7f83c

Use an actual existence-checking [[ instead of pluck

9641bde

Improve print.epi_archive on empty archives

7bf29d8

Eliminate single-use, unneeded local var

6f37e3e

Remove outdated doc comment

861cdd3

This applied for a different default `clobberable_versions_start`.

fix: grouped_epi_archives are not epi_archives

1cead30

We don't want to try to use an `epi_archive` method implementation on a `grouped_epi_archive`, or have `is_epi_archive` succeed on them even with `grouped_okay = FALSE`, to prevent attempted extraction of nonexistent fields.

Clean up clone() usage

38c3322

- Use new `%>% clone()` when we want a deep copy
- Use aliasing instead of shallow copies, since with S3 lists we should not have the threat of mutation of the shallow list structure

Remove remaining reference to R6 method

5ea168e

dshemetov force-pushed the ds/r6-clean branch 2 times, most recently from f45f30e to d5c89b7 Compare April 30, 2024 04:28

dshemetov mentioned this pull request May 1, 2024

[PERF] Make new_epi_archive more unsafe and more efficient and use it in safe places #445

Open

dshemetov force-pushed the ds/r6-clean branch from d5c89b7 to e61e11a Compare May 1, 2024 02:07

dshemetov mentioned this pull request May 1, 2024

feat: update for epiprocess R6 refactor cmu-delphi/delphi-tooling-book#15

Merged

brookslogan merged commit eec777f into dev May 3, 2024
5 checks passed

brookslogan deleted the ds/r6-clean branch May 3, 2024 23:55

dshemetov mentioned this pull request Jun 22, 2024

Djm/remove fabletools #315

Open
