Commit
Deletes: adding purge option for consolidation.
This adds the ability to purge deleted cells when running consolidation with deletes. When purging is enabled, deleted cells are fully removed from the consolidated fragment, unless they were added again after the deletion. The delete metadata columns are also not written for this fragment, since its cells have no delete times.

The harder problem in this PR was the no-duplicates array: when a cell gets deleted, deduplication must remove only the cells that were added before that cell was deleted. For fragments with timestamps, we still want to write every cell with its appropriate timestamp, so a fragment can have more than one cell with the same coordinate to process. The solution is to add all cells with the same coordinate to the sorting tile queue, and to add the timestamp as a sorting dimension (with the greater timestamp coming first). That way we can merge cells until a deleted cell is hit, at which point we stop and discard the cells that came in before the delete.

This also fixes a few tests that did not actually run consolidation, and fixes consolidating a fragment that was itself consolidated with deletes, where the delete condition index tiles were not getting loaded properly.

---
TYPE: IMPROVEMENT
DESC: Deletes: adding purge option for consolidation.
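
For illustration, the ordering described above (same-coordinate cells sorted with the greater timestamp first, merging until a deleted cell is hit) can be sketched roughly as follows. This is a simplified Python model, not the code in this change: the `Cell` type, its `delete_time` field, and `purge_consolidate` are hypothetical names, and the real implementation operates on tiles in the sorting tile queue rather than on a flat list of cells.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one cell of a fragment with timestamps."""
    coord: Tuple[int, ...]
    timestamp: int
    value: str
    delete_time: Optional[int] = None  # None means the cell was never deleted


def purge_consolidate(cells: List[Cell]) -> List[Cell]:
    """Keep, per coordinate, only the cells newer than the latest deleted cell."""
    # Sort by coordinate, then by timestamp with the greater timestamp first.
    ordered = sorted(cells, key=lambda c: (c.coord, -c.timestamp))
    kept: List[Cell] = []
    for _, group in groupby(ordered, key=lambda c: c.coord):
        for cell in group:
            if cell.delete_time is not None:
                # A deleted cell was hit: drop it and every older cell
                # that shares this coordinate.
                break
            kept.append(cell)
    return kept
```

With this model, a cell written at timestamp 1 and deleted at timestamp 2 is purged, while a cell written to the same coordinate at timestamp 3 survives, which matches the "added again after the deletion" case.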