Add `epi_archive` #22

brookslogan · 2021-11-09T17:16:44Z

This is a work in progress. Opening the PR just for any feedback and discussion.

Add `epi_tibble_archive` as an `R6` class with just - `$new` (requires the user to perform any compression of snapshots/updates into minimal diffs if desired, and to reconstruct upon updating the archive itself) - `$epi_tibble_as_of` The next priority is to implement `epi_slide` for `epi_tibble_archive`s.

These need testing. The `epi_slide` implementation needs testing and documentation and examples. The issues vignette still needs to be completed with this implementation.

- Renamed `epi_df_archive` to `epi_archive` - Started updating documentation for `epi_slide()` - Rebuilt documentation

- This allows us to create an `epi_df` object directly from columns, just like `tibble()` or `data.frame()` - It also gives us a better place to point the documentation to, for describing the structure of an `epi_df` object (previously we were pointing to `as_epi_df()`, which was a bit awkward) - Rename `as.epi_df()` to `as_epi_df()` to be consistent with tibble and tsibble functions - Rebuild doc and vignettes

- Trying to make `epi_archive` "look" more like `epi_df` in terms of both the way the user sees it, and the way we maintain it - Rename "issue" to be "version" - Enforce "geo_value", "time_value" columns (no custom names) - Added `epi_archive()` and `as_epi_archive()` functions - Ensure that `epi_archive$as_of()` returns a tsibble, and make sure that this can be pushed through `as_epi_df()` without us losing the `tbl_ts` class - Fiddle with various small tweaks to `epi_df` stuff

- Check out vignettes/weird.Rmd. Something weird is going on. When we have an `epi_df` that is `tbl_df`, then the `epi_df` class is kept when we do things like `dplyr::select()`. But when we have an `epi_df` that is `tbl_ts`, then the `epi_df` class is DROPPED when we do things like `dplyr::select()` - I have no idea why this is going on. Also, I'm not sure what behavior we actually want. Seems like being brittle could be a good thing across many dplyr verbs (e.g., `dplyr::select()` on an `epi_df` could drop the required columns of an `epi_df, so the result SHOULDN'T be an `epi_df`)

brookslogan · 2022-02-03T02:44:51Z

Weird thing is probably because [relocate calls [ and:]

tibble:::`[.tbl_df`

looks like it copies the class and other attributes using tibble:::vectbl_restore (calling tibble_restore_impl),
but

tsibble:::`[.tbl_ts`

does not look like it does the same [it rebuilds the result as a tibble or tsibble rather than reassigning attributes].

- Refactor the way key variables work within an `epi_archive` so we have "other key variables" (apart from geo, time, version) which is back to the way Logan had it originally - Get rid of `as_of()` returning a tsibble; now it gives a tibble but it puts the key variables into the metadata - Add `as_tibble.epi_df()` S3 method so that it knows how to set the key variables sensibly from "geo_value" and the metadata, if keys exist in the metadata; and it also takes the index to be "time_value". This means there will be no loss of information if we do something like epi_archive -> epi_df -> tsibble

R/slide.R

- Continuing to edit/refine the way we implement and document this - Now we have the following symmetry: `epi_df` and `epi_archive` each have their own documentation files (and hopefully this is picked up on with auto-linking when the vignettes are rebuilt), which explain the formats in a kind of symmetric way - We also have `as_epi_df()` and `as_epi_archive()` as fronts for converting data to those formats, which also proceed somewhat symmetrically - Got rid of explicit constructor for `epi_df`; but the explicit constructor for `epi_archive` still lives in R6's `new()` method (which is confusingly overwritten by `initialize()`, yet still accessible to users via `new()`). I think this lack of symmetry is fine, because these are genuinely different structures - Cleaned up the `as_of()` method of for `epi_archive`, and also introduced a `min_time_value` argument which should make sliding much more efficient - Have yet to look at sliding, or revisit vignettes in any way

- Also fix assorted little things here and there - Still need to work through slide and rebuild vignettes

- Clean up a lot more on `epi_archive` (and `epi_df` by symmetry) - Simplify `epi_slide()`: now it does not require the user to set the column type explicitly, only specify whether they want it to be a list column or not - Re-implement sliding for archive objects. Make it a method of the `epi_archive` class rather than a special case of `epi_slide()`. Just simpler that way. Also, I think Logan's understanding or conception of sliding here was different than mine: mine is now clearly explained in the documentation. Basically it is just the version-aware analog of `epi_slide()` for `epi_df` objects - Start writing epi archive vignette. Got decently far but got stuck before I could test out the `slide()` method. I got stuck on this: I think we need a `join()` method to `epi_archive` so it knows how to join on another data table onto its own one, and perform LOCF appropriately, update its metadata, etc. I want to do this at the data table level (rather than data frame or tibble level) because it should be faster. But something isn't quite working for me - Committing for now; still need to figure out joins, and test out sliding, then finish and rebuild vignettes

- I forgot to include it in the last commit

- Now we have two grouping variables, both customizable: `shift_by` and `cor_by`. The former specifies which variables we want to grp on before shifting, and the latter before computing correlations. They can be different in general - Update correlation vignette and test out it works as expected

- Move epi_archive stuff into a wip folder (along with derivative vignette, which is no longer something to highlight) - Rebuld documentation and vignettes

ryantibs · 2022-02-07T17:15:42Z

@lcbrooks I'm going to go ahead and merge this PR. I ended up doing a lot of work on this branch that I think will be valuable beyond archive that I just want to get this in the main branch, and take a break from the archive work while I pursue other things. Summary outside of the epi_archive work:

Cleaned up a lot of epi_df and epi_slide() behavior so that they're simpler and more complete
Generalized epi_cor() so that it takes two "by" arguments: shift_by and cor_by
Lots of documentation improvements and updates

This archive work is now in a wip folder, and currently removed from the main package (so no users try to user it at this point). We should revisit it in another PR to finish it off. It is close to being finished. Summary of what's left:

I think we need a join() method for epi_archive so it knows how to join on another data table onto its own one, and perform LOCF appropriately, update its metadata, etc. I want to do this at the data table level (rather than data frame or tibble level) because it should be faster
Still need to test out the slide() method for epi_archive (which I reimplemented)
Still need to finish off the archive vignette, probably the best idea just to repeat the case forecasting example from the slide vignette, but with doctor's visits as a covariate

brookslogan marked this pull request as draft November 9, 2021 17:16

Add metainfo fns, update_DT_as_of, epi_slide for epi_tibble_archive

e19958c

These need testing. The `epi_slide` implementation needs testing and documentation and examples. The issues vignette still needs to be completed with this implementation.

brookslogan mentioned this pull request Nov 9, 2021

Add epi_archive class #23

Closed

ryantibs added 7 commits January 25, 2022 13:19

epi_tibble -> epi_df

2296916

Merge branch 'main' into introduce-epi-archive

8a91de6

Assorted cleanup

22c61a3

- Renamed `epi_df_archive` to `epi_archive` - Started updating documentation for `epi_slide()` - Rebuilt documentation

Little import fixes

e6acb17

ryantibs changed the title ~~Add epi_tibble_archive~~ Add epi_archive Feb 3, 2022

ryantibs added 2 commits February 2, 2022 22:17

Small bug fixes

a19cdc4

brookslogan commented Feb 3, 2022

View reviewed changes

R/slide.R Outdated Show resolved Hide resolved

ryantibs added 8 commits February 3, 2022 12:47

Fix bugs in

d848c39

Update index.md description of main data structures

e9d8e4b

- Also fix assorted little things here and there - Still need to work through slide and rebuild vignettes

Update archive vignette

4755419

- I forgot to include it in the last commit

Misc doc updates

f21091a

Rebuild vignettes and reorganize

d6d6896

- Move epi_archive stuff into a wip folder (along with derivative vignette, which is no longer something to highlight) - Rebuld documentation and vignettes

ryantibs marked this pull request as ready for review February 7, 2022 17:08

ryantibs merged commit 23c829f into main Feb 7, 2022

ryantibs deleted the introduce-epi-archive branch February 20, 2022 02:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `epi_archive` #22

Add `epi_archive` #22

Uh oh!

brookslogan commented Nov 9, 2021

Uh oh!

brookslogan commented Feb 3, 2022 •

edited

Loading

Uh oh!

Uh oh!

ryantibs commented Feb 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add epi_archive #22

Add epi_archive #22

Uh oh!

Conversation

brookslogan commented Nov 9, 2021

Uh oh!

brookslogan commented Feb 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ryantibs commented Feb 7, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add `epi_archive` #22

Add `epi_archive` #22

brookslogan commented Feb 3, 2022 •

edited

Loading