Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip feat: replace R6 epi_archive with S3 implementation #431

Merged
merged 18 commits into from
May 3, 2024
Merged

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Mar 20, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

A/B tests TODO (A = epiprocess dev, B = this branch):

  • make sure vignettes are identical
  • make sure the output from a subset of the exploration-tooling forecasters is identical

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor todos & larger suggestions from a partial first pass; thought I'd just post them now since some probably will be slightly annoying to implement & finishing reviewing's going to take a while.

For future review passes:

  • I need to learn the new_ etc. pattern --- I think Daniel opened an issue about us using this for epi_dfs & Hadley has a chapter about it & related functions.
  • Check that we're achieving the intended non-mutation interface.
  • Finish looking over files; rest of archive.R, all of grouped_epi_archive.R, ...
  • Check archive_cases_dv_subset.R [and vignettes].
  • [Assess vignette snaps disk size, testing completeness.]

tests/testthat/test-epix_fill_through_version.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
@dshemetov
Copy link
Contributor Author

Force pushed some changes, just a heads up @brookslogan

@dshemetov dshemetov mentioned this pull request Apr 18, 2024
@dshemetov dshemetov force-pushed the ds/r6-clean branch 2 times, most recently from 5ff4e03 to a67a324 Compare April 19, 2024 18:09
@dshemetov dshemetov marked this pull request as ready for review April 19, 2024 18:10
R/archive.R Outdated
other_keys = NULL,
additional_metadata = NULL,
compactify = NULL,
clobberable_versions_start = NA,
Copy link
Contributor

@brookslogan brookslogan Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: either (A) eliminate new_epi_archive, keep it as as_epi_archive, or (B) introduce a new_ and validate_ that are more minimal, and have as_epi_archive be the "helper" as described here [and use new_epi_archive internally where we can] [and use the same approach to default handling in new_ and as_]. [epi_archive() as the helper seems like it might be confusing; people are used to data.frame(), tibble(), list(), etc., which allow you to construct things in a different way.]

suggest: either (a) make more defaults non-NULL, or (b) make clobberable_versions_start default also NULL (& replace with NA) if it's possible. I'm not sure about (a) vs. (b). (a) gives more type info, but also could hide other arg names in limited-width autocomplete windows. With (a) approach, there can sometimes also be some issues with confusing defaults --- imagine we had geo_type = guess_geo_type(x) default and rebound x before geo_type was evaluated --- but it doesn't look like we rebind before assigning any of the defaults, so I don't think we're in such a situation .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I added a validate_epi_archive and moved a bulk of the validation from new_epi_archive in there. Now as_epi_archive calls validate and then new. new_epi_archive is only used there, so this change should be fine for now, but in the future we can refactor some safe internal calls to as_epi_archive with new_epi_archive. There is still a bunch of validation and construction of the data.table object in new_epi_archive, so that's another thing that could be fixed.

I also made clobberable_versions_start default to NULL and then I set it to NA in that case. Just went with the consistent option for now, though we can easily switch it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #445

@brookslogan
Copy link
Contributor

brookslogan commented Apr 29, 2024

Core refactor changes look pretty good! Please:

  • todo: Review & if good, merge changes from Edit R6 refactor #443. Some things of note:
    • grouped_epi_archives aren't epi_archives (e.g., we don't want to try to x$DT on a grouped_epi_archive since that will be NULL. And we don't want to try to mirror the epi_archive structure with DT at the top level because that's asking for bugs --- (which I have encountered/made due to grouped_dfs also being tibbles; I don't think we want to mirror this).)
    • clone() wasn't needed in a lot of places, and we want to avoid deep copies when we can [I think I oversimplified this in commit msg as that we don't need to worry about (idiomatic or even fairly rare) shallow mutation of S3 list, but it is that we don't have to worry about aliased mutation; R's smart CoWish stuff]
  • Look at above/below github code comments.
  • Perform vignette snapshot tests
  • then purge the vignette snapshots from history?
    • purged
    • unnecessary
  • Address failing checks? At least one is in the growth rate vignette... not sure this would be affecting it... [Oh, it's from %||% refactor --- typo in handling k.]

@brookslogan brookslogan self-requested a review April 29, 2024 18:17
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see separate checklist.

#' operator. Currently, the only situation where there is potentially aliasing
#' is of the `DT` in edge cases with `all_versions = TRUE`, but this may change
#' in the future.
#'
#' @examples
#' # warning message of data latency shown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: while trying to understand what this old comment was, I ran into this possible mistake:
archive_cases_dv_subset$version --- which partial-matches (potentially silently by default?) to $versions_end. I'm not sure if there's a class we can swap in to prevent this (vctrs::list_of requires exact names, but also is designed for homogeneous lists with too complex of an entry type for an atomic vector). Since we expect to encapsulate DT sometime, maybe this will become less of an issue (users should expect less to be able to get version very easily), but would probably be nice if there's an easy pre-baked solution. Guess we might be able to just implement $ for the epi_archive or roll our own intermediate class...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm updating some docs in this region in a side branch.

note: also finding that x$clobberable_versions_start <- <value> doesn't validate, which is probably suboptimal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, interesting observations. Partial matching and S3 classes not having private attributes seem like a very leaky things to try to guard. I'm leaning towards just telling users that it is unsafe to directly modify epi_archives, outside epiprocess functions. We can then add safe modify functions as we get feature requests in individual instances.

Copy link
Contributor

@brookslogan brookslogan Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Though I might sort of soft-request the [forbiddence of] partial matching thing in $ and at least name validation in $<- and [[<- also to catch errors we make in development. Not part of this PR though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> #449

@@ -171,8 +256,6 @@ epix_fill_through_version <- function(x, fill_versions_end,
#' as_epi_archive(compactify = TRUE)
#' # merge results stored in a third object:
#' xy <- epix_merge(x, y)
#' # vs. mutating x to hold the merge result:
#' x$merge(y)
#'
#' @importFrom data.table key set setkeyv
#' @export
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: would we want this as an S3 generic as well, or as an implementation of the merge generic? Then people using methods(, "epi_archive") might get better results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I've never used methods(, "<class>") to explore documentation. Let's just see how epix_* works out for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Side note: methods(, "<class>") or sloop::{s3,s4}_methods_class() (better than the former when s4 is involved in ways that confuse methods() or its users) is useful for exploring undocumented stuff --- often S3 implementations won't actually be exported, documented, etc. Of course it's not comprehensive if there are regular functions that can also operate on the class. I guess that was one benefit of R6, potentially having a sort of full list of functionality at your fingertips.

R/growth_rate.R Outdated Show resolved Hide resolved
@dshemetov
Copy link
Contributor Author

dshemetov commented Apr 30, 2024

I manually checked the diffs in the vignettes and found that there were no unexpected changes. I've purged the vignette snapshots from this branch by rebasing and force pushing, FYI @brookslogan


Here is the code used to do vignette snapshots, in case we need it later:

# test-snapshots.R
vignettes <- paste0(here::here("vignettes/"), c(
  "advanced.Rmd",
  "aggregation.Rmd",
  "archive.Rmd",
  "epiprocess.Rmd",
  "slide.Rmd"
))
for (input_file in vignettes) {
  test_that(paste0("snapshot vignette ", basename(input_file)), {
    # skip("Skipping snapshot tests by default, as they are slow.")
    output_file <- sub("\\.Rmd$", ".html", input_file)
    withr::with_file(output_file, {
      devtools::build_rmd(input_file)
      expect_snapshot_file(output_file)
    })
  })
}

Instructions:

  • add this file to your tests
  • run devtools::test(filter="snapshots") on the dev branch
  • run devtools::test(filter="snapshots") on the feature branch
  • either both match or run testthat::snapshot_review("snapshots/") to view diffs

dshemetov and others added 3 commits April 29, 2024 17:50
* remove comment #417
* bump version to 0.7.6 and add NEWS line
- Forbid `NA` `compactify`
- Remove `missing` checks when `is.null` suffices
- Remove redundant default code
- Make local `other_keys` have more consistent typing across branches
- Validate length.
- Tweak message regarding type since typeof is length 1.
- Actually raise error if NA when NA not allowed.
- Make tests check the source of the error, since not being specific + R
  configuration masked some of these issues.
- S3 class vectors are ordered, so use `identical`
  - Improve class vector formatting
  - Tweak other `class` and `typeof` message text
- Improve duplicate colnames message
  - Improve vector interpolation formatting
- Fix typo in GCD error messaging
dshemetov and others added 10 commits April 29, 2024 21:22
Print to stdout and without using messages for all the output.  Prevents Rmds
from splitting print output into multiple chunks. Allows `capture.output` by
default to capture all expected output, and the same for logging utilities
expecting regular output to come from stdout.
This applied for a different default `clobberable_versions_start`.
- Update `epix_as_of` docs further based on `clobberable_versions_start` now
  defaulting to `NA`.
- Don't include `max_version =` in example `epix_as_of` calls as it seems
  atypical and a strange name if extracting a snapshot rather than an archive.
We don't want to try to use an `epi_archive` method implementation on a
`grouped_epi_archive`, or have `is_epi_archive` succeed on them even with
`grouped_okay = FALSE`, to prevent attempted extraction of nonexistent fields.
- Use new `%>% clone()` when we want a deep copy
- Use aliasing instead of shallow copies, since with S3 lists we should not have
  the threat of mutation of the shallow list structure
* remove is_epi_archive and delete in epix_slide
* simplify group_by_drop_default
* prune library calls in tests
* remove here and waldo from Suggests
* pull most validation work from new_epi_archive
  into validate_epi_archive
* call validate_epi_archive in as_epi_archive
@dshemetov
Copy link
Contributor Author

I think this PR is ready. @nmdefries (thank you!) reproduced my run of exploration-tooling with this branch and found that the forecaster outputs did not change.

I found a few things that need to change in https://github.com/cmu-delphi/delphi-tooling-book, so I'm working on a PR there. We can probably merge this and I can have that one ready tomorrow.

@brookslogan brookslogan merged commit eec777f into dev May 3, 2024
5 checks passed
@brookslogan brookslogan deleted the ds/r6-clean branch May 3, 2024 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove R6 interface for epi_archives
2 participants