Swap the location of tags in the BFLATN encoding #1063
BFLATN never leaks (yet) out of the db to consumers either in modules, queries, or subscriptions, so it's not public and thus this change isn't breaking.
The swap itself looks sound and correctly implemented. Here are some nits, requests for additional test cases, and an idea on how to implement the optimization painlessly in this PR.
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
Signed-off-by: james gilles <jameshgilles@gmail.com>
Okay, the optimization is implemented.
Do we not consider stuff written to the filesystem as ABI? If this was merged after snapshots it would result in a change of our on-disk format, so that should be ABI, right? (Whether or not it actually gets merged after snapshots is a different question.)
Well, given that you're implementing this now, and no one has started working on snapshots yet, I think we're good.
Okay, I think this is ready.
Per discussion on the snapshotting proposal, this PR changes the type of `Page.row_data` to `[u8; _]`, where previously it was `[MaybeUninit<u8>; _]`. This turns out to be shockingly easy, as our serialization codepaths never write padding bytes into a page. The only place pages ever became `poison` was the initial allocation; changing this to `alloc_zeroed` causes the `row_data` to always be valid at `[u8; _]`. The majority of this diff is replacing `MaybeUninit`-specific operators with their initialized equivalents, and updating comments and documentation to reflect the new requirements. This change also revealed a bug in the benchmarks introduced when we swapped the order of sum tags and payloads ( #1063 ), where benchmarks used a hardcoded offset for the tag which had not been updated.
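To make the `alloc_zeroed` point concrete, here is a minimal sketch of allocating a page whose bytes are initialized from the start. It is not the actual `Page` type from this PR: the `PAGE_DATA_SIZE` constant, the single-field struct, and the `new_zeroed_page` and `byte_sum` helpers are hypothetical names chosen for illustration.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

// Hypothetical page size; the real value is not stated in this thread.
const PAGE_DATA_SIZE: usize = 4096;

// Toy stand-in for a page: the row data is plain `u8`, not `MaybeUninit<u8>`.
struct Page {
    row_data: [u8; PAGE_DATA_SIZE],
}

// Allocate a page directly on the heap with every byte set to zero.
fn new_zeroed_page() -> Box<Page> {
    let layout = Layout::new::<Page>();
    // SAFETY: `layout` has a non-zero size because `PAGE_DATA_SIZE > 0`.
    let ptr = unsafe { alloc_zeroed(layout) } as *mut Page;
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` came from the global allocator with the layout of `Page`,
    // and the all-zero bit pattern is a valid `Page` (its only field is `[u8; N]`).
    unsafe { Box::from_raw(ptr) }
}

// Because every byte is initialized, the whole buffer can be read in safe code,
// for example to hash or copy the page.
fn byte_sum(page: &Page) -> u64 {
    page.row_data.iter().map(|&b| u64::from(b)).sum()
}
```

With `MaybeUninit<u8>` the equivalent of `byte_sum` would need `unsafe` and a proof that each byte read had been written; with a zeroed allocation that obligation disappears, as long as no later write stores uninitialized bytes into the page.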
* Make `Page` always fully init
* Update blake3
  Blake3 only supports running under Miri as of 1.15.1, the latest version. Prior versions hard-depended on SIMD intrinsics which Miri doesn't support.
* Address Mazdak's review.
  Still pending his agreeing with me that `poison` is a better name than `uninit`.
* "Poison" -> "uninit"
  Against my best wishes, for consistency with the broader Rust community's poor choices.
* Remove unnecessary `unsafe` blocks
* More unnecessary `unsafe`; remove forgotten SAFETY comments
Description of Changes
Partially implements #1006. I haven't ensured we're actually doing the relevant memcpy optimizations yet.
API and ABI breaking changes
Is BFLATN public ABI yet? It will be once we have snapshots I suppose.
So, a breaking change on the BFLATN ABI.
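As a rough illustration of why moving the tag changes the stored bytes, the sketch below encodes the same sum value under two tag positions. This is a toy model, not the real BFLATN layout code: `TagPosition`, `SumLayout`, `round_up`, and `encode` are hypothetical names, the tag is assumed to be one byte, padding is assumed to be zeroed, and nothing here says which of the two orders this PR moves from or to.

```rust
// Where the one-byte tag sits relative to the payload (toy model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TagPosition {
    BeforePayload,
    AfterPayload,
}

// Hypothetical layout descriptor for a sum value with a fixed-size payload.
// Assumes `payload_size` is a multiple of `payload_align`.
struct SumLayout {
    payload_size: usize,
    payload_align: usize,
    tag_position: TagPosition,
}

// Round `n` up to the next multiple of `align`.
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

impl SumLayout {
    fn tag_offset(&self) -> usize {
        match self.tag_position {
            TagPosition::BeforePayload => 0,
            TagPosition::AfterPayload => self.payload_size,
        }
    }

    fn payload_offset(&self) -> usize {
        match self.tag_position {
            // Skip the tag byte, then pad up to the payload's alignment.
            TagPosition::BeforePayload => round_up(1, self.payload_align),
            TagPosition::AfterPayload => 0,
        }
    }

    fn size(&self) -> usize {
        let end = std::cmp::max(self.tag_offset() + 1, self.payload_offset() + self.payload_size);
        round_up(end, self.payload_align)
    }
}

// Write the tag and payload into a zero-filled buffer at the layout's offsets.
fn encode(layout: &SumLayout, tag: u8, payload: &[u8]) -> Vec<u8> {
    assert_eq!(payload.len(), layout.payload_size);
    let mut bytes = vec![0u8; layout.size()];
    bytes[layout.tag_offset()] = tag;
    let start = layout.payload_offset();
    bytes[start..start + payload.len()].copy_from_slice(payload);
    bytes
}
```

In this model the two orders produce buffers of the same size but with different contents, so bytes written under one order cannot be read under the other. A reader that asks the layout for `tag_offset()` keeps working across such a change, whereas one that hardcodes the offset silently reads payload or padding bytes as the tag.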
Expected complexity level and risk
3
Testing
I want to run this under miri but haven't done so yet.