-
Notifications
You must be signed in to change notification settings - Fork 8
Add epi_archive
#22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add epi_archive
#22
Conversation
Add `epi_tibble_archive` as an `R6` class with just - `$new` (requires the user to perform any compression of snapshots/updates into minimal diffs if desired, and to reconstruct upon updating the archive itself) - `$epi_tibble_as_of` The next priority is to implement `epi_slide` for `epi_tibble_archive`s.
These need testing. The `epi_slide` implementation needs testing and documentation and examples. The issues vignette still needs to be completed with this implementation.
- Renamed `epi_df_archive` to `epi_archive` - Started updating documentation for `epi_slide()` - Rebuilt documentation
- This allows us to create an `epi_df` object directly from columns, just like `tibble()` or `data.frame()` - It also gives us a better place to point the documentation to, for describing the structure of an `epi_df` object (previously we were pointing to `as_epi_df()`, which was a bit awkward) - Rename `as.epi_df()` to `as_epi_df()` to be consistent with tibble and tsibble functions - Rebuild doc and vignettes
- Trying to make `epi_archive` "look" more like `epi_df` in terms of both the way the user sees it, and the way we maintain it - Rename "issue" to be "version" - Enforce "geo_value", "time_value" columns (no custom names) - Added `epi_archive()` and `as_epi_archive()` functions - Ensure that `epi_archive$as_of()` returns a tsibble, and make sure that this can be pushed through `as_epi_df()` without us losing the `tbl_ts` class - Fiddle with various small tweaks to `epi_df` stuff
- Check out vignettes/weird.Rmd. Something weird is going on. When we have an `epi_df` that is `tbl_df`, then the `epi_df` class is kept when we do things like `dplyr::select()`. But when we have an `epi_df` that is `tbl_ts`, then the `epi_df` class is DROPPED when we do things like `dplyr::select()` - I have no idea why this is going on. Also, I'm not sure what behavior we actually want. Seems like being brittle could be a good thing across many dplyr verbs (e.g., `dplyr::select()` on an `epi_df` could drop the required columns of an `epi_df, so the result SHOULDN'T be an `epi_df`)
Weird thing is probably because [
looks like it copies the class and other attributes using
does not look like it does the same [it rebuilds the result as a tibble or tsibble rather than reassigning attributes]. |
- Refactor the way key variables work within an `epi_archive` so we have "other key variables" (apart from geo, time, version) which is back to the way Logan had it originally - Get rid of `as_of()` returning a tsibble; now it gives a tibble but it puts the key variables into the metadata - Add `as_tibble.epi_df()` S3 method so that it knows how to set the key variables sensibly from "geo_value" and the metadata, if keys exist in the metadata; and it also takes the index to be "time_value". This means there will be no loss of information if we do something like epi_archive -> epi_df -> tsibble
- Continuing to edit/refine the way we implement and document this - Now we have the following symmetry: `epi_df` and `epi_archive` each have their own documentation files (and hopefully this is picked up on with auto-linking when the vignettes are rebuilt), which explain the formats in a kind of symmetric way - We also have `as_epi_df()` and `as_epi_archive()` as fronts for converting data to those formats, which also proceed somewhat symmetrically - Got rid of explicit constructor for `epi_df`; but the explicit constructor for `epi_archive` still lives in R6's `new()` method (which is confusingly overwritten by `initialize()`, yet still accessible to users via `new()`). I think this lack of symmetry is fine, because these are genuinely different structures - Cleaned up the `as_of()` method of for `epi_archive`, and also introduced a `min_time_value` argument which should make sliding much more efficient - Have yet to look at sliding, or revisit vignettes in any way
- Also fix assorted little things here and there - Still need to work through slide and rebuild vignettes
- Clean up a lot more on `epi_archive` (and `epi_df` by symmetry) - Simplify `epi_slide()`: now it does not require the user to set the column type explicitly, only specify whether they want it to be a list column or not - Re-implement sliding for archive objects. Make it a method of the `epi_archive` class rather than a special case of `epi_slide()`. Just simpler that way. Also, I think Logan's understanding or conception of sliding here was different than mine: mine is now clearly explained in the documentation. Basically it is just the version-aware analog of `epi_slide()` for `epi_df` objects - Start writing epi archive vignette. Got decently far but got stuck before I could test out the `slide()` method. I got stuck on this: I think we need a `join()` method to `epi_archive` so it knows how to join on another data table onto its own one, and perform LOCF appropriately, update its metadata, etc. I want to do this at the data table level (rather than data frame or tibble level) because it should be faster. But something isn't quite working for me - Committing for now; still need to figure out joins, and test out sliding, then finish and rebuild vignettes
- I forgot to include it in the last commit
- Now we have two grouping variables, both customizable: `shift_by` and `cor_by`. The former specifies which variables we want to grp on before shifting, and the latter before computing correlations. They can be different in general - Update correlation vignette and test out it works as expected
- Move epi_archive stuff into a wip folder (along with derivative vignette, which is no longer something to highlight) - Rebuld documentation and vignettes
@lcbrooks I'm going to go ahead and merge this PR. I ended up doing a lot of work on this branch that I think will be valuable beyond archive that I just want to get this in the main branch, and take a break from the archive work while I pursue other things. Summary outside of the
This archive work is now in a
|
This is a work in progress. Opening the PR just for any feedback and discussion.