Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(epi_slide, epix_slide): time types refactor #472

Merged
merged 2 commits into from
Jul 20, 2024

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Jun 21, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

  • guess time_type of "day", "week", "yearmonth", "integer" from time column and warn if not compatible at epi_df and epi_archive construction time
  • restrict before and after to types compatible with time column
  • deprecate geo_type and time_type in epi_df and epi_archive constructor arguments, infer only
  • improve documentation on geo and time types
  • enforce time_value and version being same type in epi_archive
  • update vignettes
  • move arg validation from new_epi_df to as_epi_df.tbl_df to match epi_archive
  • allow before = Inf in epi_slide and epix_slide and make it default in epix_slide
  • remove time_step argument in slides

New time type behaviors

  • we discover "day" and "weekly" types if the time value column is Date and based on the cadence; if the cadence is slower than a month, then we give an error and suggest converting to yearmonth or year
  • we discover "yearmonth" and "integer" based on type
  • otherwise time_type is "custom" and it gives a warning that unexpected behavior may result
  • time value and version columns are expected to have the same type in epi_archives
  • time_type and geo_type arguments to as_epi_df and as_epi_archive are now deprecated and they say so in documentation and on use
  • before/after arguments allow integers, day-difftimes for "day", allow week-difftimes for "week", allow integer for "yearmonth" and "integer"
  • slide functions (including opt functions) error with a clear error when they're given an epi_df with a "custom" time type
    previously allowed types that are now labeled custom: POSIXct, yearquarter, yearweek

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments; very minor stuff + a couple potential tasks that could be spun off into separate issues. If you tackle the latter in this PR, might be worth a re-review from me or Nat.

R/grouped_epi_archive.R Outdated Show resolved Hide resolved
NAMESPACE Outdated Show resolved Hide resolved
R/epiprocess.R Outdated Show resolved Hide resolved
@dshemetov dshemetov changed the title refactor(epix_slide): before defaults to Inf, remove time_step refactor(epi_slide, epix_slide): remove time_step, remove time_type, unify time handling, require difftimes in before and after Jul 2, 2024
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach seems good. Design-wise I had some thoughts on what column we base before validation on in epix_slide(). Plus maybe found a bug.

R/epi_df.R Outdated Show resolved Hide resolved
R/grouped_epi_archive.R Outdated Show resolved Hide resolved
R/utils.R Outdated Show resolved Hide resolved
R/utils.R Outdated Show resolved Hide resolved
R/slide.R Outdated Show resolved Hide resolved
R/utils.R Show resolved Hide resolved
R/slide.R Outdated Show resolved Hide resolved
R/utils.R Show resolved Hide resolved
@brookslogan
Copy link
Contributor

issue: local tests fail, with various complaints like

Error in `validate_version_bound(fill_versions_end, x$DT, na_ok = FALSE)`: fill_versions_end must have the same `class` vector as x$version, which has a `class` of "Date"

as well as

Error in `epix_merge(as_epi_archive(tibble::tibble(geo_value = "pa", time_value = d, 
    version = d + 1L, x_value = 1L)), as_epi_archive(tibble::tibble(geo_value = "pa", 
    time_value = tsibble::make_yearweek(2020, 1), version = d + 
        5L, y_value = 2L)))`: `x` and `y` were not equally up to date version-wise: `x$versions_end` was not identical to `y$versions_end`; either ensure that `x` and `y` are equally up to date before merging, or specify how to deal with this using `sync`

... and it looks like we're not running checks in CI somehow??

@brookslogan
Copy link
Contributor

@nmdefries I would also appreciate if you could take a pass. I feel like I must have glossed over something given the check results, and was pausing until tomorrow but accidentally hit approve instead of comment.

@brookslogan
Copy link
Contributor

brookslogan commented Jul 17, 2024

Sorry, I believe I said I'd take a look at the failing checks while @nmdefries finishes looking over the rest.

  • There's a number of failing tests regarding epi_df grouping; this is a breaking change due to an upstream package update; rebase onto dev and you will get test grouping only before tsibble 1.1.5 #480 to skip these tests for now.
  • Some vignettes are failing to build with errors like
   Both `before` and `after` are required scalar values.

   Expected before to be a difftime with units in days.

   `...` must be empty.
   ✖ Problematic argument:
   • time_type = "day"
  • Some of these might be addressed by loosening the restrictions as noted in some of the discussion above; others might require vignette updates to conform with the new interface.

In particular, along the loosening-restrictions route, I think we discussed the following but maybe only verbally; sorry if these are repeats:

  • suggest: allow integerish time windows when time_type is "day".
  • suggest: in guess_time_type, don't infer "day" if length(unique_time_gaps) != 0L and it doesn't contain any values that are equal to 1. The intent is to generate errors on monthly and yearly data. You could also maybe require something like 10% rounded up of unique_time_gaps to be equal to 1 or something along these lines.

@dshemetov dshemetov force-pushed the ds/before-default branch 2 times, most recently from 20fe4ff to 0d61a98 Compare July 17, 2024 20:46
Copy link
Contributor

@nmdefries nmdefries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epi_slide_opt and related look good aside from the factor issue Logan brought up. I did not look at other changes.

praise: This is quite the complex PR! Thanks for robustifying this, and all the associated research into time-handling.

tests/testthat/test-epix_fill_through_version.R Outdated Show resolved Hide resolved
R/archive.R Outdated Show resolved Hide resolved
R/slide.R Outdated Show resolved Hide resolved
R/slide.R Show resolved Hide resolved
R/slide.R Outdated Show resolved Hide resolved
@dshemetov
Copy link
Contributor Author

suggest: in guess_time_type, don't infer "day" if length(unique_time_gaps) != 0L and it doesn't contain any values that are equal to 1. The intent is to generate errors on monthly and yearly data. You could also maybe require something like 10% rounded up of unique_time_gaps to be equal to 1 or something along these lines.

@lcbrooks the >10% equal to 1 thing seems too fuzzy to turn into an error. I think checking if all(unique_time_gaps) >= 28 and then giving an error is better.

@brookslogan
Copy link
Contributor

I think checking if all(unique_time_gaps) >= 28 and then giving an error is better.

Sounds good, but check parens here (all(unique_time_gaps >= 28)).

(In "normal" data I'd expect this to be similar to a length(unique_time_gaps) == 0L || any(unique_time_gaps == 1) approach. For non-"normal" data.... not sure, not a priority.)

@dshemetov dshemetov force-pushed the ds/before-default branch 9 times, most recently from df4fd05 to c095c1f Compare July 19, 2024 22:21
* guess time_type of "day", "week", "yearmonth", "integer" from time column and warn if not compatible
* restrict `before` and `after` to types compatible with time column
* deprecate geo_type and time_type constructor arguments, infer only
* improve documentation on geo and time types
* enforce time_value and version being same type in epi_archive
* update vignettes
* move arg validation from new_epi_df to as_epi_df.tbl_df to match epi_archive

a
@dshemetov dshemetov merged commit 4f44223 into dev Jul 20, 2024
5 checks passed
@dshemetov dshemetov deleted the ds/before-default branch July 20, 2024 00:43
@dshemetov dshemetov changed the title refactor(epi_slide, epix_slide): remove time_step, remove time_type, unify time handling, require difftimes in before and after refactor(epi_slide, epix_slide): time types refactor Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants