Add option to compactify issue data in `epi_archive` #62

`epi_archive`s can be built from a mix of full snapshots, issue data with duplicate re-reporting, and/or minimal patch-like issues.  Some space (and possibly time, though that is untested) can be saved by removing rows that last observation carried forward (LOCF) from previous issues would reproduce anyway.  Space can be essential when attempting in-memory analysis.

Proposal: introduce a constructor argument `compactify`:
- `TRUE`: remove rows that are redundant under LOCF, so that LOCF results are unchanged.  Make sure to maintain the same `max_issue` value as the original data.
- `FALSE`: leave the data as-is.
- default value (say, `NULL`): same as `TRUE`, except message the user if this actually changed the data, and tell them how to silence the message.
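
The three settings above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper name `compactify_issues()`, the column names `geo_value`, `time_value`, `issue`, and `value`, and the returned list are all hypothetical. It also assumes that a key missing from a later issue means "unchanged" rather than "deleted".

```r
# Hypothetical helper (not part of any package): drop rows that repeat the
# previous issue's values for the same key, so LOCF results stay the same.
compactify_issues <- function(x, compactify = NULL,
                              key_cols = c("geo_value", "time_value"),
                              issue_col = "issue") {
  # Record the latest issue up front so it survives any row removal.
  max_issue <- max(x[[issue_col]])

  # Sort by key, then issue, so each row's predecessor is the previous issue.
  ord <- do.call(order, unname(as.list(x[c(key_cols, issue_col)])))
  x <- x[ord, , drop = FALSE]
  rownames(x) <- NULL

  if (!isFALSE(compactify)) {
    # TRUE where an element equals the one before it (NA matches NA).
    same_as_prev <- function(v) {
      n <- length(v)
      if (n < 2L) return(rep(FALSE, n))
      cur <- v[-1L]
      prv <- v[-n]
      eq <- (is.na(cur) & is.na(prv)) |
        (!is.na(cur) & !is.na(prv) & cur == prv)
      c(FALSE, eq)
    }
    value_cols <- setdiff(names(x), c(key_cols, issue_col))
    # Redundant: same key and same values as the row just before it.
    redundant <- Reduce(`&`, lapply(x[c(key_cols, value_cols)], same_as_prev))
    if (any(redundant)) {
      x <- x[!redundant, , drop = FALSE]
      rownames(x) <- NULL
      if (is.null(compactify)) {
        message(sum(redundant), " LOCF-redundant row(s) removed; ",
                "pass compactify = TRUE or FALSE to silence this message.")
      }
    }
  }
  list(data = x, max_issue = max_issue)
}

issues <- data.frame(
  geo_value  = c("ca", "ca", "ca", "ca"),
  time_value = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02")),
  issue      = as.Date(c("2021-01-02", "2021-01-03", "2021-01-04", "2021-01-03")),
  value      = c(10, 10, 12, 5)
)
result <- compactify_issues(issues, compactify = TRUE)
```

Here the 2021-01-03 issue of the 2021-01-01 observation repeats the value from the 2021-01-02 issue, so it is the only row dropped. `max_issue` is computed before any removal and returned separately, which is one way to keep it equal to the original data's latest issue even if the rows carrying that issue are removed.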

Use cases:
- User inputs full snapshot data; compactifying prevents using space quadratic in the number of snapshots.  (A further enhancement would be to work directly off of a directory of snapshot files or something similar.)
- We are working off of a covidcast data source that historically did not use diff-based issues and/or has many full re-issues.  (E.g., repeating the analysis [here](https://github.com/cmu-delphi/covidcast-indicators/issues/1362) shows that covidcast jhu-csse state-level case issue data is 79% duplicates, despite the shift to routine issues being diff-based.  This only reduces what `object.size` reports from 40--50MB to ~10MB, but at the county level it _might_ matter more.)
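
To make the snapshot use case concrete, here is a small base R sketch on synthetic data. It assumes one new observation per snapshot and no revisions, and the column names are hypothetical. It shows how stacked full snapshots grow quadratically and how the share of LOCF-redundant rows and the `object.size` savings could be measured.

```r
# Synthetic example: snapshot k re-reports all k observations seen so far.
n_snapshots <- 50
snapshots <- do.call(rbind, lapply(seq_len(n_snapshots), function(k) {
  data.frame(
    geo_value  = "ca",
    time_value = seq_len(k),
    issue      = k,
    value      = seq_len(k) * 10  # values are never revised in this example
  )
}))

# Sort by key, then issue, and flag rows identical to the previous issue.
s <- snapshots[order(snapshots$geo_value, snapshots$time_value, snapshots$issue), ]
n <- nrow(s)
redundant <- c(
  FALSE,
  s$geo_value[-1] == s$geo_value[-n] &
    s$time_value[-1] == s$time_value[-n] &
    s$value[-1] == s$value[-n]
)
compact <- s[!redundant, ]

nrow(snapshots)  # n_snapshots * (n_snapshots + 1) / 2 rows before
nrow(compact)    # n_snapshots rows after
mean(redundant)  # fraction of rows that LOCF would reproduce

# Rough in-memory footprint before and after.
print(object.size(snapshots), units = "Kb")
print(object.size(compact), units = "Kb")
```

Real issue data will contain revisions, so the redundant fraction would typically be lower than in this idealized case.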
