Skip to content

Conversation

@wenkokke
Copy link
Collaborator

This PR:

  • Moves ChecksumHandle and its core functions to their own module (Database.LSMTree.Internal.ChecksumHandle).
  • Generalises the helper functions in RunBuilder (writeRawPage, writeRawOverflowPages, writeBlob, copyBlob, writeFilter, writeIndexHeader, writeIndexChunk, writeIndexFinal) to only take the arguments they need, rather than the entire RunBuilder.
  • Moves those helper functions to the new ChecksumHandle module. This enables them to be reused by the WriteBufferWriter. (Technically, the latter four are not needed, but it seemed better to generalise and move them for consistency.)

Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general! I have some suggestions

Comment on lines 54 to 58
{- | $checksum-handles
A handle ('ChecksumHandle') that maintains a running CRC32 checksum.
-}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{- | $checksum-handles
A handle ('ChecksumHandle') that maintains a running CRC32 checksum.
-}
{- $checksum-handles
A handle ('ChecksumHandle') that maintains a running CRC32 checksum.
-}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm merely following the syntax of the block right above this one, so I think either zero or two changes are warranted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which block do you mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one:

{- | $checksum-files
We use @.checksum@ files to help verify the integrity of on disk snapshots.
Each .checksum file lists the CRC-32C (Castagnoli) of other files. For further
details see @doc/format-directory.md@.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which does indeed render weird:
Screenshot 2024-11-15 at 13 17 40

writePrimVar checksum crc'

{-------------------------------------------------------------------------------
Specialised Writers for ChecksumHandle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where functions like writeRawPage would previously select the correct handle from the RunBuilder such as forRunKOps runBuilderHandles, the new newRawPage could in theory be passed any ChecksumHandle. It's probably not a problem in practice, as the tests would catch incorrect usage. Still, if you think it's a good idea we could introduce some type safety by adding some newtypes, such as newtype KOpsHandle s h = KOpsHandle (ChecksumHandle s h)

Thinking along those same lines, writeBlob and copyBlob are currently passed a variable holding the offset into the file: this could also be a bogus variable with no connection to the blob file. Maybe for the blob handle, we can introduce data BlobHandle s h = BlobHandle !(PrimVar s Word64) !(ChecksumHandle s h) instead of a newtype around just the ChecksumHandle part

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be useful. I took them out of the wrapper because otherwise I'd need additional handles for two nonexistent files. Wrapping them up individually would indeed solve that. Would you like me to implement that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can implement it, or leave a TODO, what you prefer

@jorisdral
Copy link
Collaborator

BTW, the suggestion to change FS.Handle h to h and PrimState IO to RealWorld should be applied to all specialisations, not only the ones I put a comment on

@wenkokke wenkokke force-pushed the wenkokke/checksum-handle branch 2 times, most recently from eafc509 to f50d254 Compare November 15, 2024 13:17
Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's squash the first 3 commits together, and the last 2 commits together

@wenkokke wenkokke force-pushed the wenkokke/checksum-handle branch from 7664982 to 588171b Compare November 15, 2024 17:23
@wenkokke wenkokke enabled auto-merge November 15, 2024 17:23
@wenkokke wenkokke added this pull request to the merge queue Nov 15, 2024
Merged via the queue into main with commit 9cf968f Nov 15, 2024
24 checks passed
@wenkokke wenkokke deleted the wenkokke/checksum-handle branch November 15, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants