Deletes consolidation: switch from marker hashes to condition indexes. #3451
Merged: KiterLuc merged 2 commits into dev from lr/delete-condolidation-marker-hashes-to-index/ch19590 on Aug 16, 2022.
This pull request has been linked to Shortcut Story #19590: Deletes: consolidation support.
KiterLuc changed the title from "Deletes condolidation: switch from marker hashes to condition indexes." to "Deletes consolidation: switch from marker hashes to condition indexes." on Aug 13, 2022.
KiterLuc force-pushed the lr/delete-condolidation-marker-hashes-to-index/ch19590 branch from dc66a4f to 894c8f5 on August 13, 2022 21:56.
ihnorton reviewed and approved these changes on Aug 16, 2022.
Review comment on format_spec/fragment.md (outdated):
@@ -43,7 +43,7 @@ There can be any number of fragments in an array. The fragment folder contains:
 * The names of the data files are not dependent on the names of the attributes/dimensions. The file names are determined by the order of the attributes and dimensions in the array schema.
 * The timestamp fixed attribute (`t.tdb`) is, for fragments consolidated with timestamps, the time at which a cell was added.
 * The delete timestamp fixed attribute (`dt.tdb`) is, for fragments consolidated with delete conditions, the time at which a cell was deleted.
-* The delete condition marker hash fixed attribute (`dcmh.tdb`) is, for fragments consolidated with delete conditions, the hash of the delete condition marker that deleted the cell. The delete condition marker is the file path of the delete condition relative to the array URI.
+* The delete condition index fixed attribute (`dci.tdb`) is, for fragments consolidated with delete conditions, the index of the delete condition (inside of [Tile Processed Conditions](#tile-processed-conditions)) that deleted the cell.
Please cross-reference format_spec/delete_commit_file.md for clarity here.
std::hash is not guaranteed to give consistent results on different platforms (the standard only requires it to return the same result for the same input within a single execution of a program), so it cannot be used for data stored on disk. It was used to hash the delete condition marker in order to record which delete condition deleted a specific cell. Fortunately, because the processed conditions are already stored in the fragment metadata, we can instead store an index into the processed conditions, which lets us retrieve the same information. The only complication arises when a fragment that was previously consolidated with deletes is consolidated again: each index into the original fragment's processed conditions must be converted into an index into the new fragment's processed conditions. This can be done by building a hash table keyed by condition marker, whose value is the index into the new processed conditions array, so that each conversion takes (average) constant time.

---
TYPE: IMPROVEMENT
DESC: Deletes consolidation: switch from marker hashes to condition indexes.
KiterLuc force-pushed the lr/delete-condolidation-marker-hashes-to-index/ch19590 branch from 894c8f5 to 8b53f52 on August 16, 2022 13:35.
KiterLuc deleted the lr/delete-condolidation-marker-hashes-to-index/ch19590 branch on August 16, 2022 16:53.