Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swap the location of tags in the BFLATN encoding #1063

Merged
merged 5 commits into from
Apr 11, 2024
Merged

Conversation

kazimuth
Copy link
Contributor

@kazimuth kazimuth commented Apr 9, 2024

Description of Changes

Partially implements #1006. I haven't ensured we're actually doing the relevant memcpy optimizations yet.

API and ABI breaking changes

Is BFLATN public ABI yet? It will be once we have snapshots I suppose.
So, a breaking change on the BFLATN ABI.

Expected complexity level and risk

3

Testing

I want to run this under miri but haven't done so yet.

@kazimuth kazimuth requested review from Centril and gefjon April 9, 2024 19:19
@Centril
Copy link
Contributor

Centril commented Apr 10, 2024

Is BFLATN public ABI yet? It will be once we have snapshots I suppose. So, a breaking change on the BFLATN ABI.

BFLATN never leaks (yet) out of the db to consumers either in modules, queries, or subscriptions, so it's not public and thus this change isn't breaking.

Copy link
Contributor

@Centril Centril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The swap itself looks sound and correctly implemented. Here are some nits, requests for additional test cases, and an idea on how to implement the optimization painlessly in this PR.

crates/table/src/layout.rs Outdated Show resolved Hide resolved
crates/table/src/layout.rs Outdated Show resolved Hide resolved
crates/table/src/row_type_visitor.rs Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Outdated Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Outdated Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Outdated Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Outdated Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Show resolved Hide resolved
crates/table/src/bflatn_to_bsatn_fast_path.rs Show resolved Hide resolved
kazimuth and others added 2 commits April 10, 2024 11:50
Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
Co-authored-by: Phoebe Goldman <phoebe@clockworklabs.io>
Signed-off-by: james gilles <jameshgilles@gmail.com>
@kazimuth
Copy link
Contributor Author

Okay, the optimization is implemented.

Is BFLATN public ABI yet? It will be once we have snapshots I suppose. So, a breaking change on the BFLATN ABI.

BFLATN never leaks (yet) out of the db to consumers either in modules, queries, or subscriptions, so it's not public and thus this change isn't breaking.

Do we not consider stuff written to the filesystem as ABI? If this was merged after snapshots it would result in a change of our on-disk format, so that should be ABI right? (Whether or not it actually gets merged after snapshots is a different question.)

@gefjon
Copy link
Contributor

gefjon commented Apr 10, 2024

Do we not consider stuff written to the filesystem as ABI? If this was merged after snapshots it would result in a change of our on-disk format, so that should be ABI right? (Whether or not it actually gets merged after snapshots is a different question.)

Well, given that you're implementing this now, and no one has started working on snapshots yet, I think we're good.

@kazimuth
Copy link
Contributor Author

Okay I think this is ready

@kazimuth kazimuth added this pull request to the merge queue Apr 11, 2024
Merged via the queue into master with commit b9cee3d Apr 11, 2024
6 checks passed
@Centril Centril deleted the jgilles/swaptags branch April 11, 2024 21:50
gefjon added a commit that referenced this pull request May 2, 2024
Per discussion on the snapshotting proposal,
this PR changes the type of `Page.row_data` to `[u8; _]`,
where previously it was `[MaybeUninit<u8>; _]`.

This turns out to be shockingly easy,
as our serialization codepaths never write padding bytes into a page.
The only place pages ever became `poison` was the initial allocation;
changing this to `alloc_zeroed` causes the `row_data` to always be valid at `[u8; _]`.

The majority of this diff is replacing `MaybeUninit`-specific operators
with their initialized equivalents,
and updating comments and documentation to reflect the new requirements.

This change also revealed a bug in the benchmarks
introduced when we swapped the order of sum tags and payloads
( #1063 ),
where benchmarks used a hardcoded offset for the tag which had not been updated.
@gefjon gefjon mentioned this pull request May 2, 2024
2 tasks
github-merge-queue bot pushed a commit that referenced this pull request May 2, 2024
* Make `Page` always fully init

Per discussion on the snapshotting proposal,
this PR changes the type of `Page.row_data` to `[u8; _]`,
where previously it was `[MaybeUninit<u8>; _]`.

This turns out to be shockingly easy,
as our serialization codepaths never write padding bytes into a page.
The only place pages ever became `poison` was the initial allocation;
changing this to `alloc_zeroed` causes the `row_data` to always be valid at `[u8; _]`.

The majority of this diff is replacing `MaybeUninit`-specific operators
with their initialized equivalents,
and updating comments and documentation to reflect the new requirements.

This change also revealed a bug in the benchmarks
introduced when we swapped the order of sum tags and payloads
( #1063 ),
where benchmarks used a hardcoded offset for the tag which had not been updated.

* Update blake3

Blake3 only supports running under Miri as of 1.15.1, the latest version.
Prior versions hard-depended on SIMD intrinsics which Miri doesn't support.

* Address Mazdak's review.

Still pending his agreeing with me that `poison` is a better name than `uninit`.

* "Poison" -> "uninit"

Against my best wishes, for consistency with the broader Rust community's poor choices.

* Remove unnecessary `unsafe` blocks

* More unnecessary `unsafe`; remove forgotten SAFETY comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants