Make Page always fully init #1193

Merged
merged 6 commits into from
May 2, 2024

Conversation

@gefjon gefjon commented May 2, 2024

Description of Changes

Per discussion on the snapshotting proposal, this PR changes the type of `Page.row_data` to `[u8; _]`, where previously it was `[MaybeUninit<u8>; _]`.

This turns out to be shockingly easy, as our serialization codepaths never write padding bytes into a page. The only place pages ever became `poison` was the initial allocation; changing this to `alloc_zeroed` causes the `row_data` to always be valid at `[u8; _]`.
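
As a rough illustration of the allocation change described above (a minimal sketch, not the code in this PR), a zero-initialized byte buffer can be obtained directly from the global allocator. `PAGE_SIZE` and `zeroed_page` are hypothetical names chosen for this example only.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical page size, chosen only for this sketch.
const PAGE_SIZE: usize = 64 * 1024;

/// Allocate a heap buffer whose bytes are all initialized to zero,
/// so it can soundly be typed as `[u8; PAGE_SIZE]` from the start.
fn zeroed_page() -> Box<[u8; PAGE_SIZE]> {
    let layout = Layout::new::<[u8; PAGE_SIZE]>();
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` came from the global allocator with the layout of
    // `[u8; PAGE_SIZE]`, and all-zero bytes are a valid `[u8; PAGE_SIZE]`.
    unsafe { Box::from_raw(ptr.cast::<[u8; PAGE_SIZE]>()) }
}
```

Calling `zeroed_page()` yields a buffer in which every byte reads as `0`, so no later read can observe uninitialized memory.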

The majority of this diff is replacing `MaybeUninit`-specific operators with their initialized equivalents, and updating comments and documentation to reflect the new requirements.
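
To give a feel for the kind of replacement involved, here is a hedged sketch contrasting a read through `MaybeUninit<u8>` with its initialized equivalent. Both functions are hypothetical and are not taken from the diff; the little-endian `u32` encoding is an assumption made for illustration.

```rust
use std::mem::MaybeUninit;

/// Before: every read needs an `unsafe` promise that the range was written.
///
/// # Safety
/// `bytes[offset..offset + 4]` must be fully initialized.
unsafe fn read_u32_maybe_uninit(bytes: &[MaybeUninit<u8>], offset: usize) -> u32 {
    let mut buf = [0u8; 4];
    for (dst, src) in buf.iter_mut().zip(&bytes[offset..offset + 4]) {
        // SAFETY: initialization of this range is guaranteed by the caller.
        *dst = unsafe { (*src).assume_init() };
    }
    u32::from_le_bytes(buf)
}

/// After: the bytes are plain `u8`, so the same read is entirely safe code.
fn read_u32(bytes: &[u8], offset: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[offset..offset + 4]);
    u32::from_le_bytes(buf)
}
```

The second form drops the safety precondition altogether, which is the sense in which the initialized type makes safety arguments simpler.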

This change also revealed a bug in the benchmarks introduced when we swapped the order of sum tags and payloads (#1063), where benchmarks used a hardcoded offset for the tag which had not been updated.
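
The class of bug described above can be sketched as follows. Everything here is hypothetical: `SumLayout`, the two orderings, and the 8-byte payload are illustrative assumptions, not the actual layout types or offsets used by the table crate.

```rust
/// Hypothetical description of where a sum value's tag and payload live.
#[derive(Clone, Copy)]
struct SumLayout {
    tag_offset: usize,
    payload_offset: usize,
}

/// Two illustrative orderings; which one matches the real code is not assumed.
const TAG_FIRST: SumLayout = SumLayout { tag_offset: 0, payload_offset: 8 };
const PAYLOAD_FIRST: SumLayout = SumLayout { tag_offset: 8, payload_offset: 0 };

/// Write a tag and a little-endian `u64` payload according to `layout`.
fn write_sum(buf: &mut [u8], layout: SumLayout, tag: u8, payload: u64) {
    buf[layout.tag_offset] = tag;
    buf[layout.payload_offset..layout.payload_offset + 8]
        .copy_from_slice(&payload.to_le_bytes());
}

/// Robust: asks the layout where the tag is.
fn read_tag(buf: &[u8], layout: SumLayout) -> u8 {
    buf[layout.tag_offset]
}

/// Fragile: silently reads payload bytes once the ordering changes.
fn read_tag_hardcoded(buf: &[u8]) -> u8 {
    buf[0]
}
```

With a 16-byte buffer written under `PAYLOAD_FIRST`, `read_tag` returns the tag while `read_tag_hardcoded` returns the first payload byte. With `[MaybeUninit<u8>; _]` such a stale offset could read a byte nobody had written, which is one plausible way a change like this one brings the mistake to light.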

API and ABI breaking changes

N/A.

Expected complexity level and risk

3

  • Minimal changes to behavior.
  • Interacts deeply with Rust/LLVM's safety model.
  • Could use more eyes on the docs to see if I missed any references to things being uninit.

Testing

  • Tests, including benchmarks run as tests, pass in normal execution.
    • cargo test --benches in the table directory.
  • Tests pass under Miri.

@gefjon gefjon requested review from kazimuth and Centril May 2, 2024 16:06

gefjon commented May 2, 2024

benchmarks please

github-actions bot commented May 2, 2024

Criterion benchmark results

Criterion benchmark report

YOU SHOULD PROBABLY IGNORE THESE RESULTS.

Criterion is a wall time based benchmarking system that is extremely noisy when run on CI. We collect these results for longitudinal analysis, but they are not reliable for comparing individual PRs.

Go look at the callgrind report instead.

empty

db on disk new latency old latency new throughput old throughput
sqlite 💿 - 418.0±1.70ns - -
sqlite 🧠 - 415.1±1.67ns - -
stdb_raw 💿 711.8±1.01ns 721.5±2.43ns - -
stdb_raw 🧠 683.9±0.92ns 692.5±0.68ns - -

insert_1

db on disk schema indices preload new latency old latency new throughput old throughput

insert_bulk

db on disk schema indices preload count new latency old latency new throughput old throughput
sqlite 💿 u32_u64_str btree_each_column 2048 256 - 521.3±42.57µs - 1918 tx/sec
sqlite 💿 u32_u64_str unique_0 2048 256 - 133.1±0.54µs - 7.3 Ktx/sec
sqlite 💿 u32_u64_u64 btree_each_column 2048 256 - 417.5±0.59µs - 2.3 Ktx/sec
sqlite 💿 u32_u64_u64 unique_0 2048 256 - 120.9±0.71µs - 8.1 Ktx/sec
sqlite 🧠 u32_u64_str btree_each_column 2048 256 - 442.6±0.30µs - 2.2 Ktx/sec
sqlite 🧠 u32_u64_str unique_0 2048 256 - 119.1±0.44µs - 8.2 Ktx/sec
sqlite 🧠 u32_u64_u64 btree_each_column 2048 256 - 363.1±0.75µs - 2.7 Ktx/sec
sqlite 🧠 u32_u64_u64 unique_0 2048 256 - 104.1±0.51µs - 9.4 Ktx/sec
stdb_raw 💿 u32_u64_str btree_each_column 2048 256 640.2±30.97µs 591.1±20.25µs 1562 tx/sec 1691 tx/sec
stdb_raw 💿 u32_u64_str unique_0 2048 256 491.0±42.06µs 495.1±33.10µs 2036 tx/sec 2019 tx/sec
stdb_raw 💿 u32_u64_u64 btree_each_column 2048 256 398.9±9.06µs 392.4±8.57µs 2.4 Ktx/sec 2.5 Ktx/sec
stdb_raw 💿 u32_u64_u64 unique_0 2048 256 372.5±8.77µs 315.5±17.77µs 2.6 Ktx/sec 3.1 Ktx/sec
stdb_raw 🧠 u32_u64_str btree_each_column 2048 256 341.5±0.81µs 324.9±0.22µs 2.9 Ktx/sec 3.0 Ktx/sec
stdb_raw 🧠 u32_u64_str unique_0 2048 256 266.6±0.42µs 252.9±0.21µs 3.7 Ktx/sec 3.9 Ktx/sec
stdb_raw 🧠 u32_u64_u64 btree_each_column 2048 256 269.3±0.11µs 263.3±0.19µs 3.6 Ktx/sec 3.7 Ktx/sec
stdb_raw 🧠 u32_u64_u64 unique_0 2048 256 246.6±0.16µs 231.4±0.28µs 4.0 Ktx/sec 4.2 Ktx/sec

iterate

db on disk schema indices new latency old latency new throughput old throughput
sqlite 💿 u32_u64_str unique_0 - 20.3±0.07µs - 48.0 Ktx/sec
sqlite 💿 u32_u64_u64 unique_0 - 19.2±0.20µs - 50.8 Ktx/sec
sqlite 🧠 u32_u64_str unique_0 - 19.6±0.36µs - 49.9 Ktx/sec
sqlite 🧠 u32_u64_u64 unique_0 - 18.1±0.24µs - 53.8 Ktx/sec
stdb_raw 💿 u32_u64_str unique_0 4.7±0.00µs 4.7±0.00µs 209.3 Ktx/sec 208.1 Ktx/sec
stdb_raw 💿 u32_u64_u64 unique_0 4.6±0.00µs 4.6±0.00µs 214.4 Ktx/sec 213.8 Ktx/sec
stdb_raw 🧠 u32_u64_str unique_0 4.6±0.00µs 4.7±0.00µs 210.2 Ktx/sec 209.1 Ktx/sec
stdb_raw 🧠 u32_u64_u64 unique_0 4.5±0.00µs 4.5±0.00µs 215.6 Ktx/sec 214.8 Ktx/sec

find_unique

db on disk key type preload new latency old latency new throughput old throughput

filter

db on disk key type index strategy load count new latency old latency new throughput old throughput
sqlite 💿 string index 2048 256 - 70.2±0.18µs - 13.9 Ktx/sec
sqlite 💿 u64 index 2048 256 - 66.5±0.27µs - 14.7 Ktx/sec
sqlite 🧠 string index 2048 256 - 68.9±0.16µs - 14.2 Ktx/sec
sqlite 🧠 u64 index 2048 256 - 63.7±0.44µs - 15.3 Ktx/sec
stdb_raw 💿 string index 2048 256 5.1±0.00µs 5.1±0.00µs 192.8 Ktx/sec 192.0 Ktx/sec
stdb_raw 💿 u64 index 2048 256 5.0±0.00µs 5.1±0.00µs 195.7 Ktx/sec 190.0 Ktx/sec
stdb_raw 🧠 string index 2048 256 5.0±0.00µs 5.1±0.00µs 194.3 Ktx/sec 193.3 Ktx/sec
stdb_raw 🧠 u64 index 2048 256 5.0±0.00µs 5.1±0.00µs 197.1 Ktx/sec 191.2 Ktx/sec

serialize

schema format count new latency old latency new throughput old throughput
u32_u64_str bflatn_to_bsatn_fast_path 100 3.7±0.00µs 3.4±0.00µs 25.6 Mtx/sec 28.1 Mtx/sec
u32_u64_str bflatn_to_bsatn_slow_path 100 3.5±0.01µs 3.6±0.00µs 27.5 Mtx/sec 26.8 Mtx/sec
u32_u64_str bsatn 100 2.4±0.03µs 2.5±0.00µs 38.9 Mtx/sec 38.8 Mtx/sec
u32_u64_str json 100 4.9±0.02µs 5.1±0.02µs 19.6 Mtx/sec 18.8 Mtx/sec
u32_u64_str product_value 100 1014.3±0.74ns 1015.6±4.91ns 94.0 Mtx/sec 93.9 Mtx/sec
u32_u64_u64 bflatn_to_bsatn_fast_path 100 1299.5±8.58ns 1387.4±2.19ns 73.4 Mtx/sec 68.7 Mtx/sec
u32_u64_u64 bflatn_to_bsatn_slow_path 100 2.9±0.00µs 2.8±0.01µs 33.2 Mtx/sec 34.2 Mtx/sec
u32_u64_u64 bsatn 100 1709.0±33.97ns 1753.7±35.85ns 55.8 Mtx/sec 54.4 Mtx/sec
u32_u64_u64 json 100 3.2±0.07µs 3.5±0.09µs 29.9 Mtx/sec 27.6 Mtx/sec
u32_u64_u64 product_value 100 1008.9±0.63ns 1011.5±0.52ns 94.5 Mtx/sec 94.3 Mtx/sec
u64_u64_u32 bflatn_to_bsatn_fast_path 100 1089.3±2.66ns 1165.2±1.60ns 87.6 Mtx/sec 81.8 Mtx/sec
u64_u64_u32 bflatn_to_bsatn_slow_path 100 2.9±0.03µs 2.8±0.01µs 33.3 Mtx/sec 34.4 Mtx/sec
u64_u64_u32 bsatn 100 1613.4±29.69ns 1743.2±32.15ns 59.1 Mtx/sec 54.7 Mtx/sec
u64_u64_u32 json 100 3.2±0.04µs 3.5±0.02µs 29.4 Mtx/sec 27.1 Mtx/sec
u64_u64_u32 product_value 100 1010.0±0.58ns 1011.0±0.91ns 94.4 Mtx/sec 94.3 Mtx/sec

stdb_module_large_arguments

arg size new latency old latency new throughput old throughput
64KiB 85.2±4.54µs 99.7±10.57µs - -

stdb_module_print_bulk

line count new latency old latency new throughput old throughput
1 41.7±2.87µs 35.7±3.40µs - -
100 341.6±2.96µs 347.6±5.22µs - -
1000 2.8±0.26ms 2.9±0.30ms - -

remaining

name new latency old latency new throughput old throughput
sqlite/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 - 45.3±0.09µs - 21.6 Ktx/sec
sqlite/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 - 42.5±9.20µs - 23.0 Ktx/sec
sqlite/🧠/update_bulk/u32_u64_str/unique_0/load=2048/count=256 - 39.1±0.29µs - 25.0 Ktx/sec
sqlite/🧠/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 - 34.9±0.22µs - 28.0 Ktx/sec
stdb_module/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 1406.7±8.80µs 1386.9±10.01µs 710 tx/sec 721 tx/sec
stdb_module/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 1092.1±25.63µs 1062.1±19.48µs 915 tx/sec 941 tx/sec
stdb_raw/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 698.9±19.03µs 678.6±16.36µs 1430 tx/sec 1473 tx/sec
stdb_raw/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 534.8±16.61µs 528.7±9.18µs 1869 tx/sec 1891 tx/sec
stdb_raw/🧠/update_bulk/u32_u64_str/unique_0/load=2048/count=256 436.4±0.28µs 421.5±0.38µs 2.2 Ktx/sec 2.3 Ktx/sec
stdb_raw/🧠/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 396.2±0.57µs 385.6±0.74µs 2.5 Ktx/sec 2.5 Ktx/sec

github-actions bot commented May 2, 2024

Callgrind benchmark results

Callgrind Benchmark Report

These benchmarks were run using callgrind,
an instruction-level profiler. They allow comparisons between sqlite (sqlite), SpacetimeDB running through a module (stdb_module), and the underlying SpacetimeDB data storage engine (stdb_raw). Callgrind emulates a CPU to collect the below estimates.

Measurement changes larger than five percent are in bold.

In-memory benchmarks

callgrind: empty transaction

db total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw 5995 6174 -2.90% 6789 7066 -3.92%
sqlite 5676 5564 2.01% 6102 6078 0.39%

callgrind: filter

db schema indices count preload _column data_type total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str no_index 64 128 2 string 120693 119591 0.92% 121443 120305 0.95%
stdb_raw u32_u64_str no_index 64 128 1 u64 78442 77263 1.53% 78902 77757 1.47%
stdb_raw u32_u64_str btree_each_column 64 128 2 string 25149 25311 -0.64% 25667 25777 -0.43%
stdb_raw u32_u64_str btree_each_column 64 128 1 u64 24110 24269 -0.66% 24514 24583 -0.28%
sqlite u32_u64_str no_index 64 128 2 string 143664 143685 -0.01% 145162 145345 -0.13%
sqlite u32_u64_str no_index 64 128 1 u64 123020 123026 -0.00% 124308 124314 -0.00%
sqlite u32_u64_str btree_each_column 64 128 1 u64 130322 130343 -0.02% 131720 131793 -0.06%
sqlite u32_u64_str btree_each_column 64 128 2 string 133527 133548 -0.02% 135131 135306 -0.13%

callgrind: insert bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 64 128 947160 798790 **18.57%** 966402 823744 **17.32%**
stdb_raw u32_u64_str btree_each_column 64 128 1081673 930689 **16.22%** 1110891 979475 **13.42%**
sqlite u32_u64_str unique_0 64 128 396307 396133 0.04% 413437 412365 0.26%
sqlite u32_u64_str btree_each_column 64 128 969380 969206 0.02% 1004990 1003628 0.14%

callgrind: iterate

db schema indices count total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 147927 148069 -0.10% 148031 148135 -0.07%
stdb_raw u32_u64_str unique_0 64 15750 15892 -0.89% 15850 15958 -0.68%
sqlite u32_u64_str unique_0 1024 1046895 1044679 0.21% 1050069 1048083 0.19%
sqlite u32_u64_str unique_0 64 75041 74745 0.40% 76015 75945 0.09%

callgrind: serialize_product_value

count format total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
64 bsatn 25717 26691 -3.65% 27995 28969 -3.36%
64 json 47438 48688 -2.57% 50022 51204 -2.31%
16 bsatn 8118 8373 -3.05% 9444 9733 -2.97%
16 json 12142 12434 -2.35% 13978 14202 -1.58%

callgrind: update bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 1024 22465085 21788673 3.10% 22908983 22386437 2.33%
stdb_raw u32_u64_str unique_0 64 128 1424636 1270619 **12.12%** 1489166 1315283 **13.22%**
sqlite u32_u64_str unique_0 1024 1024 1802084 1801858 0.01% 1811140 1811072 0.00%
sqlite u32_u64_str unique_0 64 128 128620 128394 0.18% 131266 131270 -0.00%

On-disk benchmarks

callgrind: empty transaction

db total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw 6361 6568 -3.15% 7167 7484 -4.24%
sqlite 5728 5606 2.18% 6218 6138 1.30%

callgrind: filter

db schema indices count preload _column data_type total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str no_index 64 128 2 string 121059 119985 0.90% 121797 120823 0.81%
stdb_raw u32_u64_str no_index 64 128 1 u64 78793 77657 1.46% 79333 78243 1.39%
stdb_raw u32_u64_str btree_each_column 64 128 1 u64 24476 24663 -0.76% 24908 25105 -0.78%
stdb_raw u32_u64_str btree_each_column 64 128 2 string 25715 25705 0.04% 26349 26255 0.36%
sqlite u32_u64_str no_index 64 128 2 string 145585 145606 -0.01% 147391 147478 -0.06%
sqlite u32_u64_str no_index 64 128 1 u64 124926 124947 -0.02% 126590 126535 0.04%
sqlite u32_u64_str btree_each_column 64 128 2 string 135577 135598 -0.02% 137631 137734 -0.07%
sqlite u32_u64_str btree_each_column 64 128 1 u64 132418 132439 -0.02% 134214 134379 -0.12%

callgrind: insert bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 64 128 898491 748123 **20.10%** 948213 775059 **22.34%**
stdb_raw u32_u64_str btree_each_column 64 128 1026568 878349 **16.87%** 1084646 928843 **16.77%**
sqlite u32_u64_str unique_0 64 128 413855 413681 0.04% 430517 429529 0.23%
sqlite u32_u64_str btree_each_column 64 128 1019955 1019781 0.02% 1054553 1052973 0.15%

callgrind: iterate

db schema indices count total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 148291 148463 -0.12% 148387 148533 -0.10%
stdb_raw u32_u64_str unique_0 64 16107 16286 -1.10% 16203 16356 -0.94%
sqlite u32_u64_str unique_0 1024 1049963 1047747 0.21% 1053645 1051511 0.20%
sqlite u32_u64_str unique_0 64 76813 76517 0.39% 78051 77957 0.12%

callgrind: serialize_product_value

count format total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
64 bsatn 25717 26691 -3.65% 27995 28969 -3.36%
64 json 47438 48688 -2.57% 50022 51204 -2.31%
16 bsatn 8118 8373 -3.05% 9444 9733 -2.97%
16 json 12142 12434 -2.35% 13978 14202 -1.58%

callgrind: update bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 1024 21420238 20692507 3.52% 21942784 21398083 2.55%
stdb_raw u32_u64_str unique_0 64 128 1380488 1223133 **12.86%** 1443318 1274151 **13.28%**
sqlite u32_u64_str unique_0 1024 1024 1809880 1809648 0.01% 1818552 1818276 0.02%
sqlite u32_u64_str unique_0 64 128 132768 132536 0.18% 135630 135582 0.04%

@kazimuth kazimuth left a comment

I'm happy with this: it makes safety arguments easier and gets rid of a potential source of nondeterminism if non-zeroed data ever did end up visible somehow. `alloc_zeroed` ought to be a tiny perf hit anyway.

Hmm, I guess it's a bigger perf hit than I thought looking at the benches, but I think this is still manageable. Inserts would be where it shows up.

gefjon added 5 commits May 2, 2024 14:14
Blake3 only supports running under Miri as of 1.15.1, the latest version.
Prior versions hard-depended on SIMD intrinsics which Miri doesn't support.
Still pending his agreeing with me that `poison` is a better name than `uninit`.
Against my best wishes, for consistency with the broader Rust community's poor choices.
@gefjon gefjon added this pull request to the merge queue May 2, 2024
Merged via the queue into master with commit 484ba82 May 2, 2024
7 checks passed
@Centril Centril deleted the phoebe/page-no-poison branch May 27, 2024 12:38