Skip to content

Conversation

@recursion-ninja
Copy link
Collaborator

Adding snapshotting functionality for MergingTree

Summary

This PR contains the snapshotting functionality for the MergingTree data-type. In order to create a snapshot of a MergingTree, additional prerequisite functionality is also included:

  • Serialization via Encode.
  • Deserialization via DecodeVersioned.

As with creating the "metadata" snapshot, Intermediate "Snap-" types are created to media the conversion between a disk-representable format and the in-memory representation.

This PR includes test cases for:

  • The new Snap- data-types's Arbitrary instances.
  • Golden tests for the SnapshotMergingTree data to detect breakage of the format's encoding.
  • Property tests for "round-tripping" SnapshotMergingTree in memory
  • Property tests for "round-tripping" SnapshotMergingTree to and from disk via fs-api.

Potential issues:

  1. While the SnapshotMergingTree data successfully round-trips to/from disk, I am not certain that conversion from a SnapshotMergingTree back to the in-memory representation of MergingTreeis correct. The Database.LSMTree.Internal.Snapshot.fromSnapMergingTree function defines this functionality, but I'm not sure I understand the usage of Ref and Refcounter well enough to ensure that the conversion is done correctly. Careful review of this function and/or suggestions for testing this conversion.

  2. The functionality needs to be integrated into the Database.LSMTree.Internal.createSnapshot function after the questions below is resolved.

  3. Currently, the MergingTree data snapshots are separate from the SnapshotMetadata, with each data-type having their own duplicate functions:
    - writeFileSnapshotMergingTree
    - readFileSnapshotMergingTree
    - encodeSnapshotMergingTree
    - decodeSnapshotMergingTree
    The functionality for creating snapshots of the MergingTree data-type was created separately for ease of debugging and testing. However, now that the functionality is stable, this separation may be undesirable. It appears that createSnapshot is an "atomic" operation for the whole Table and any MergingTree within the Table should be serialized together within the SnapshotMetadata.

Questions

  1. Should we have the SnapshotMergingTree be an entry within the SnapshotMetadata record, effectively combining the two types?

  2. Does every use case for creating a SnapshotMetadata snapshot necessitate snapshots of any MergingTree(s)?

  3. Does there exist a use case to snapshot a MergingTree without also creating a SnapshotMetadata` snapshot?

Copy link
Collaborator

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good.

I think we should rebase this after #588 since there's some related churn to the run/merging run/incoming run types and code.

We need to review the reference handling pattern. It should be callee duplicates when it retains references not caller. We should avoid functions that consume a reference / take ownership of a reference.

And I think we should extend this within the same PR to include the integration into the top level src/Database/LSMTree/Internal.hs, since otherwise we're not able to integrate it into the SnapshotMetaData type as we expect to do.

@dcoutts dcoutts force-pushed the recursion-ninja/merge-tree-serialize branch from e5acb06 to 3acc5da Compare March 3, 2025 16:40
@dcoutts
Copy link
Collaborator

dcoutts commented Mar 3, 2025

Rebased on #588.

@dcoutts dcoutts changed the base branch from main to dcoutts/merge-credits-scaling-impl March 3, 2025 16:47
Base automatically changed from dcoutts/merge-credits-scaling-impl to main March 3, 2025 17:32
-> SnapMergingRunState t (Ref (Run m h))
-> m (Ref (MR.MergingRun t m h))
getSnapMergingRunState nr md mc smrs = do
let ln = LevelNo 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This is fishy. We should figure out proper level numbers. Perhaps that means we need to include them in the snapshot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this problem has "moved" from the old call site, to the definitions of

  • toSnapMergingTreeState
  • toSnapPreExistingRun

The thing is, I don't know where to get the "correct" LevelNo from. The two call sites supplying LevelNo 0 to the getSnapMergingRunState function involve converting the MergingTreeState and PreExistingRun constructors:

  • SnapOngoingTreeMerge
  • SnapPreExistingMergingRun

I can't find any reasonable place form within the MergingTree data-type to extract a LevelNo from. Should the supplied LevelNo for the MergingTree be one grater than the number of Levels in the table?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after rebasing on main the problem site has moved again. The problem now manifests in fromSnapMergingRunState and is annotated with a TODO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after rebasing on main the problem site has moved again. The problem now manifests in fromSnapMergingRunState and is annotated with a TODO.

Is this still a live issue? I don't see the TODO on fromSnapMergingRunState. And in the latest #608 there is no need for a LevelNo in fromSnapMergingRunState. So I hope this is now resolved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't clear enough about this. So in fromSnapMergingRunState there is this definition:

        --TODO: the threshold should be stored with the MergingRun
        -- here we want to supply the credits now, so we can use a threshold of 1
        let thresh = MR.CreditThreshold (MR.UnspentCredits 1)
        _ <- MR.supplyCreditsAbsolute mr thresh mergeCredits
        return mr

The "correct" way to create a CreditThreshhold, from what I can intuit from the exposed APIs, is to call Database.LSMTree.Internal.MergeSchedule.creditThresholdForLevel, and that function requires a LevelNo. That is what I meant by saying that the problem call site has just moved to the fromSnapMergingRunState definition. Is there another, better way to set the CreditThreshhold?

@recursion-ninja recursion-ninja force-pushed the recursion-ninja/merge-tree-serialize branch 2 times, most recently from 600b813 to 94d9211 Compare March 6, 2025 22:00
@recursion-ninja
Copy link
Collaborator Author

I believe that, at long last, this is ready to merge.

@dcoutts dcoutts force-pushed the recursion-ninja/merge-tree-serialize branch from 94d9211 to 837e891 Compare March 6, 2025 23:55
@dcoutts
Copy link
Collaborator

dcoutts commented Mar 6, 2025

Looks like a minor rebase mistake (one of my patches appears twice, the second with minimal content). Removing this second one still works fine so I'll just force push it.

@recursion-ninja
Copy link
Collaborator Author

recursion-ninja commented Mar 7, 2025

Removing this second one still works fine so I'll just force push it.

Thanks for doing that!

@recursion-ninja recursion-ninja force-pushed the recursion-ninja/merge-tree-serialize branch from 22e9d5a to 52899f1 Compare March 7, 2025 16:51
@recursion-ninja
Copy link
Collaborator Author

Phew, I finally got that rebase with main completed to resolve the merge conflicts.

@recursion-ninja recursion-ninja disabled auto-merge March 7, 2025 16:57
@dcoutts dcoutts force-pushed the recursion-ninja/merge-tree-serialize branch from 52899f1 to ae3d37f Compare March 7, 2025 18:44
@dcoutts dcoutts changed the base branch from main to dcoutts/snapshot-refactoring March 7, 2025 18:45
dcoutts and others added 3 commits March 7, 2025 21:48
Use utils for maybe, list and vector. And use a single-item encoding for
Maybe, rather than it being a pair of things. The pair style means one
has to remember to count it as two when setting the tuple lengths.
@dcoutts dcoutts force-pushed the recursion-ninja/merge-tree-serialize branch from e1fbf2a to ca1a591 Compare March 7, 2025 21:48
@dcoutts dcoutts merged commit 1655f32 into dcoutts/snapshot-refactoring Mar 7, 2025
@dcoutts dcoutts deleted the recursion-ninja/merge-tree-serialize branch March 7, 2025 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants