-
Notifications
You must be signed in to change notification settings - Fork 9
Implement snapshots for union tables #608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@recursion-ninja in particular, this means you now have a |
| -- This will set the correct nominal credits, but it will not do any more | ||
| -- merging work because fromSnapMergingRunState already supplies all the | ||
| -- merging credits already. | ||
| supplyCreditsIncomingRun conf ln ir nominalCredits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand it correctly:
it will not do any more merging work is based on the invariant that the snapshotted merging run's physical/merge credits were at least as high as dictated by the incoming run's nominal credit. If that assumption wasn't true for some reason, we would fix that here by doing some merging work.
Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct.
These functions and data-type definitions should work well for de-duplicating the code. I am having minor trouble with the |
Yeah, I was talking to @mheinzel about that yesterday. I think it's ok to fake it for now and we can think about tidying it up. It's just a proxy for an order of magnitude size. |
9afc1fa to
55b7225
Compare
|
I've rebased and accepted the changes to the golden files for snapshot format. So hopefully this will now pass CI. |
b7ad924 to
9b8cf74
Compare
@recursion-ninja I have now updated this again to avoid the need for any This required a bit of additional refactoring, so there's an extra patch in here now to introduce the |
9b8cf74 to
8c3cab1
Compare
mheinzel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, that goes in the right direction. Makes sense to store credits this way and good that the level number is not needed for the merging trees.
| SnapCompletedMerge !r | ||
| | SnapOngoingMerge !(V.Vector r) !t | ||
| SnapCompletedMerge !NumRuns !MergeDebt !r | ||
| | SnapOngoingMerge !RunParams !MergeCredits !(V.Vector r) !t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long-term it might make sense to re-construct as much as possible of the run params from the config when opening a snapshot, so users can change (part of) the config, re-open the DB and have the restored tables use the new config. But in the meantime I think it's totally sensible to just take the params from the snapshot. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but I think if possible a nicer design is to load a snapshot and then when the table is online, change the parameters. That's more general since it allows changing online too.
| bracketOnError | ||
| (do uniq <- incrUniqCounter uc | ||
| let runPaths = runPath dir (uniqueToRunNumber uniq) | ||
| MR.new hfs hbio resolve runParams mergeType runPaths rs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long-term it might make sense to re-construct as much as possible of the run params from the config when opening a snapshot, so users can change (part of) the config, re-open the DB and have the restored tables use the new config. But in the meantime I think it's totally sensible to just take the params from the snapshot. 👍
1655f32 to
1527fcf
Compare
|
I think this is in a pretty good state, however I still suspect the ref counting for the snapshots for merging trees is wrong. There isn't yet a test that demonstrates that. I'm trying to (partially) enable the StateMachine tests to show it. I think a simple unit test for a table with unions and then saving and restoring a snapshot would demonstrate the ref counting errors. But I'd also like to make sure the state machine tests would give coverage of it too. |
|
In fact the state machine tests do find a nice simple reproducer, so that's ok: So this is clearly a ref counting mistake. I'll try and fix that now... |
|
Ok, pushed a patch to fix all the referencing counting bugs I can find from the state machine tests. |
90e0481 to
5261940
Compare
|
Ok, the only thing remaining here is to restore the ability to override the config with We'll do this selectively so that it's overridden for runs and merges in levels but left as was for tree merges (where the policy is currently ill defined). |
|
Ok, I think the best thing to do is to merge this PR with its current feature set. This includes making the Something like: |
These params are (almost) always used together, so this helps to reduce verbosity and the number of separate args that have to be passed around. We also slightly simplify when args are provided. Now all these params are provided to the RunBuilder, and none extra are needed when the RunBuilder is turned into the Run. Previously, two were passed to the RunBuilder and one more passed to the Run (though actually finalising the RunBuidler also needed the third param anyway). However, the real motivation for all this is to improve restoring snapshots of merging runs. At the moment, deserialising a merging run requires us to supply all three of these params from the surrounding context. This is ok in the context of levels, but becomes awkward for merging runs embedded in a merging tree. The obvious solution is to serialise this set of parameters with the merging run, and then they're directly available to use when reconstructing the merge during snapshot restore. Changes to serialisation are not in this patch, but the next.
The goal is to make the representation of the snapshot of the merging run closer to being sufficient information to reconstruct the merging run upon snapshot restore. Previously we only embed the merging run within an incoming run and we use information from that context to reconstruct the merging run. In future we want to reuse the representation of snapshots of merging runs within merging trees, and those do not contain the same context information as an incoming run does. So to enable reuse, it makes sense to make the merging run snapshot representation more standalone. In particular: * SnapIncomingRun no longer contains NumRuns or MergeDebt, and these get moved down to the SnapMergingRunState. * SnapIncomingRun now stores the NominalDebt * SnapMergingRunState also gains the RunParams and MergeCredits
Again, the goal is to make the representation more self-contained so that less information needs to be supplied from the context during snapshot restore. And again the intnded use case is merging trees, and runs within them. In the case of a run it is the disk caching policy, and the index type. So overall for a run we store the run number (to find the disk files), the index type (so we know how to restore the index) and the caching policy. Arguably the index type should not need to be known in advance and the index file format should simply say which kind of index it is.
It is just a simple `traverse` to lift it to Levels. This makes it simpler to reuse with merging trees of runs, which can also just use `traverse`.
Put hfs hbio first since that's more global and what we use elsewhere. Roughly in order of lest frequently varying to most frequently varying, or equivalently most global to most local.
The union level contains an optional MergingTree. This is now included in the snapshot representation. Update codecs and tests. This changes the snapshot file format. The golden test files are updated in the next patch. Co-authored-by: Duncan Coutts <duncan@well-typed.com> Co-authored-by: Joris Dral <joris@well-typed.com>
following the change of snapshot representation types.
It should duplicate its input reference for consistency, since it retains it.
Also tweak the MergingTree constructor utilities to make this more straightforward.
Use utils for maybe, list and vector. And use a single-item encoding for Maybe, rather than it being a pair of things. The pair style means one has to remember to count it as two when setting the tuple lengths.
Also update the credit docs for nominal credits.
The shrink tests were timing out.
6ef8b50 to
2080353
Compare
Feedback addressed. Can do any further review post-merge.
jorisdral
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Since #608, using a `TableConfigOverride` when opening a snapshot has no effect. This commit restores the feature that we can change the disk cache policy on snapshot open. Internally, instead of threading through a modified config to open the snapshot, we modify the whole `SnapshotMetaData` (replacing the disk cache policy everywhere) and open the snapshot from that modified metadata. A first idea we had was to modify a table config while its table is live, but there are some challenges. Changing a table config when the table is offline seem to be more straightforward, so that's the approach we took (for now). An explanation of the challenges can be found in the `*.Config.Override` module.
Since #608, using a `TableConfigOverride` when opening a snapshot has no effect. This commit restores the feature that we can change the disk cache policy on snapshot open. Internally, instead of threading through a modified config to open the snapshot, we modify the whole `SnapshotMetaData` (replacing the disk cache policy everywhere) and open the snapshot from that modified metadata. A first idea we had was to modify a table config while its table is live, but there are some challenges. Changing a table config when the table is offline seem to be more straightforward, so that's the approach we took (for now). An explanation of the challenges can be found in the `*.Config.Override` module.
Since #608, using a `TableConfigOverride` when opening a snapshot has no effect. This commit restores the feature that we can change the disk cache policy on snapshot open. Internally, instead of threading through a modified config to open the snapshot, we modify the whole `SnapshotMetaData` (replacing the disk cache policy everywhere) and open the snapshot from that modified metadata. A first idea we had was to modify a table config while its table is live, but there are some challenges. Changing a table config when the table is offline seem to be more straightforward, so that's the approach we took (for now). An explanation of the challenges can be found in the `*.Config.Override` module.
Since #608, using a `TableConfigOverride` when opening a snapshot has no effect. This commit restores the feature that we can change the disk cache policy on snapshot open. Internally, instead of threading through a modified config to open the snapshot, we modify the whole `SnapshotMetaData` (replacing the disk cache policy everywhere) and open the snapshot from that modified metadata. A first idea we had was to modify a table config while its table is live, but there are some challenges. Changing a table config when the table is offline seem to be more straightforward, so that's the approach we took (for now). An explanation of the challenges can be found in the `*.Config.Override` module.
Since #608, using a `TableConfigOverride` when opening a snapshot has no effect. This commit restores the feature that we can change the disk cache policy on snapshot open. Internally, instead of threading through a modified config to open the snapshot, we modify the whole `SnapshotMetaData` (replacing the disk cache policy everywhere) and open the snapshot from that modified metadata. A first idea we had was to modify a table config while its table is live, but there are some challenges. Changing a table config when the table is offline seem to be more straightforward, so that's the approach we took (for now). An explanation of the challenges can be found in the `*.Config.Override` module.
This includes quite a bit of preparatory refactoring, but the main feature is including merging trees (resulting from table unions) into the snapshots, and corresponding tests.