change log ids in data keys and current table #207
Closed
ekg wants to merge 3 commits into dat-ecosystem:master from ekg:master
By adding a reference to the change index id to data table keys, we can quickly revert the current table of the repository to a particular checkpoint. These changes only enable the storage of the change index keys. This is not a stable commit. A majority of tests now pass, but there are still significant issues.
Adding change log ids to the data keys (after the versions) will allow us to quickly extract the state of the data at a particular point in the change log. This can be accomplished via a linear scan of the keys in the data table, keeping only objects whose change id is <= the target point in the log.

If we did not store the change ids alongside the data, we would be forced to reconstruct the dataset from the change log. This would complicate rolling back particular subsets of the data to predetermined points in the history. Additionally, it wouldn't be possible to quickly determine the relative age of two objects, which has a number of possible applications in reproducibility and logging.

The level-dat backend will support these change ids as of 4.5.0. No functionality based on the change ids is tested yet, but the next step should be to implement a commit/checkout or checkpoint/rollback model on top of them. With this update we now pass 616/616 tests.
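As a rough illustration of the linear scan described above, here is a minimal in-memory sketch. The key layout (`<row key> 0xFF <version> 0xFF <change id>`), the zero-padding, and the names `encodeKey`, `decodeKey`, and `stateAt` are assumptions made for this example only. They are not the actual level-dat encoding or API, and deletions are not modeled.

```javascript
// Hypothetical key layout: <row key> 0xFF <version> 0xFF <change id>.
// The real level-dat encoding may differ; this only illustrates the scan.
// Assumes row keys never contain the separator character.
const SEP = '\xff';
const pad = (n) => String(n).padStart(8, '0');

function encodeKey(rowKey, version, change) {
  return [rowKey, pad(version), pad(change)].join(SEP);
}

function decodeKey(dataKey) {
  const [rowKey, version, change] = dataKey.split(SEP);
  return { rowKey, version: Number(version), change: Number(change) };
}

// Linear scan over the data-table entries: for each row, keep the
// highest version whose change id is <= the target checkpoint.
function stateAt(entries, targetChange) {
  const current = new Map();
  for (const [dataKey, value] of entries) {
    const { rowKey, version, change } = decodeKey(dataKey);
    if (change > targetChange) continue; // written after the checkpoint
    const seen = current.get(rowKey);
    if (!seen || version > seen.version) {
      current.set(rowKey, { version, change, value });
    }
  }
  return current;
}

// Toy data table: two rows, two versions each, change ids 1..4.
const table = [
  [encodeKey('a', 1, 1), { n: 1 }],
  [encodeKey('a', 2, 3), { n: 2 }],
  [encodeKey('b', 1, 2), { n: 10 }],
  [encodeKey('b', 2, 4), { n: 20 }],
];

const snapshot = stateAt(table, 3);
console.log(snapshot.get('a').version, snapshot.get('b').version); // 2 1
```

Because each entry carries its change id, comparing the relative age of two objects in this sketch is a comparison of two integers, with no replay of the change log.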
Contributor
Author
This depends on mafintosh/level-dat#1
Contributor
Author
It looks like the build error results from the requirement that we update the level-dat version.
Collaborator
Just commenting here for posterity: we have discussed this pull request and are still investigating whether it is the right approach. Keeping it open for now.
Collaborator
gonna close for now, but we will definitely make sure this gets in after @mafintosh refactors the storage stuff