Deletes consolidation: switch from marker hashes to condition indexes. #3451

Merged
merged 2 commits into from Aug 16, 2022

Conversation

KiterLuc
Contributor

@KiterLuc KiterLuc commented Aug 13, 2022

std::hash is not guaranteed to give consistent results across
platforms, so it cannot be used for data stored on disk. It was used to
hash the delete condition marker in order to record which delete
condition deleted a specific cell. Fortunately, since the processed
conditions are already stored in the fragment metadata, we can store an
index into the processed conditions instead, which allows retrieving
the same information. The only complexity arises when a fragment that
was previously consolidated with deletes is itself processed by
consolidation: each index into the original fragment's processed
conditions must be converted to an index into the new fragment's
processed conditions. This can be done by building a hash table keyed
by condition marker, with the index into the new processed conditions
array as the value, which gives constant-time conversion.
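
The index conversion described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `build_marker_index` and `remap_condition_indexes` are hypothetical, and condition markers are assumed to be plain strings (paths relative to the array URI).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each condition marker to its position in the
// new fragment's processed conditions array.
std::unordered_map<std::string, uint64_t> build_marker_index(
    const std::vector<std::string>& new_conditions) {
  std::unordered_map<std::string, uint64_t> index;
  index.reserve(new_conditions.size());
  for (uint64_t i = 0; i < new_conditions.size(); ++i) {
    index.emplace(new_conditions[i], i);
  }
  return index;
}

// Hypothetical helper: converts per-cell indexes into the old fragment's
// processed conditions to indexes into the new fragment's processed
// conditions, with one average constant-time lookup per cell.
std::vector<uint64_t> remap_condition_indexes(
    const std::vector<uint64_t>& old_cell_indexes,
    const std::vector<std::string>& old_conditions,
    const std::unordered_map<std::string, uint64_t>& marker_index) {
  std::vector<uint64_t> remapped;
  remapped.reserve(old_cell_indexes.size());
  for (uint64_t old_idx : old_cell_indexes) {
    // at() throws std::out_of_range if the old index is invalid or the
    // marker is missing from the new processed conditions.
    const std::string& marker = old_conditions.at(old_idx);
    remapped.push_back(marker_index.at(marker));
  }
  return remapped;
}
```

The hash table here exists only in memory for the duration of the conversion, so the platform dependence of std::hash does not matter: only the resulting indexes would be written to disk.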


TYPE: IMPROVEMENT
DESC: Deletes consolidation: switch from marker hashes to condition indexes.

@KiterLuc KiterLuc requested a review from ihnorton August 13, 2022 21:55
@shortcut-integration

This pull request has been linked to Shortcut Story #19590: Deletes: consolidation support.

@KiterLuc KiterLuc changed the title Deletes condolidation: switch from marker hashes to condition indexes. Deletes consolidation: switch from marker hashes to condition indexes. Aug 13, 2022
@KiterLuc KiterLuc force-pushed the lr/delete-condolidation-marker-hashes-to-index/ch19590 branch from dc66a4f to 894c8f5 Compare August 13, 2022 21:56
@@ -43,7 +43,7 @@ There can be any number of fragments in an array. The fragment folder contains:
* The names of the data files are not dependent on the names of the attributes/dimensions. The file names are determined by the order of the attributes and dimensions in the array schema.
* The timestamp fixed attribute (`t.tdb`) is, for fragments consolidated with timestamps, the time at which a cell was added.
* The delete timestamp fixed attribute (`dt.tdb`) is, for fragments consolidated with delete conditions, the time at which a cell was deleted.
-* The delete condition marker hash fixed attribute (`dcmh.tdb`) is, for fragments consolidated with delete conditions, the hash of the delete condition marker that deleted the cell. The delete condition marker is the file path of the delete condition relative to the array URI.
+* The delete condition index fixed attribute (`dci.tdb`) is, for fragments consolidated with delete conditions, the index of the delete condition (inside of [Tile Processed Conditions](#tile-processed-conditions)) that deleted the cell.
Member

Please cross-reference format_spec/delete_commit_file.md for clarity here.

@KiterLuc KiterLuc force-pushed the lr/delete-condolidation-marker-hashes-to-index/ch19590 branch from 894c8f5 to 8b53f52 Compare August 16, 2022 13:35
@KiterLuc KiterLuc merged commit 73e9905 into dev Aug 16, 2022
@KiterLuc KiterLuc deleted the lr/delete-condolidation-marker-hashes-to-index/ch19590 branch August 16, 2022 16:53