Use a custom `FixedBitSet` + optimize `Page::iter_fixed_len` #1160

Centril · 2024-04-26T00:12:55Z

Description of Changes

Shrinks present_rows to only take up 10 bytes. Fixes Reduce footprint of present_rows to 10 bytes #1152.
Drops the bit-vec dependency in favor of FixedBitSet so we can achieve the above.
Changes the algorithm of Page::iter_fixed_len to use a trailing_zeros approach.
This improves performance of the iter_read_fixed_len for all cases significantly (see results below).
Removes some dead code and addresses lints to make the nightly compiler happier (to pacify miri).

Benchmarks as compared to HEAD~1 (branched master) on i7-7700K, 64GB RAM.

iter_read_fixed_len 8_byte/iter_read_fixed_len 8_byte/0.1
                        time:   [1.2453 µs 1.2476 µs 1.2504 µs]
                        thrpt:  [4.9933 GiB/s 5.0044 GiB/s 5.0138 GiB/s]
                 change:
                        time:   [-90.640% -90.594% -90.557%] (p = 0.00 < 0.05)
                        thrpt:  [+958.94% +963.11% +968.42%]
                        Performance has improved.

iter_read_fixed_len 8_byte/iter_read_fixed_len 8_byte/0.25
                        time:   [3.1358 µs 3.1572 µs 3.1847 µs]
                        thrpt:  [4.8825 GiB/s 4.9251 GiB/s 4.9586 GiB/s]
                 change:
                        time:   [-84.648% -84.560% -84.426%] (p = 0.00 < 0.05)
                        thrpt:  [+542.11% +547.68% +551.37%]
                        Performance has improved.

iter_read_fixed_len 8_byte/iter_read_fixed_len 8_byte/0.5
                        time:   [5.4504 µs 5.4618 µs 5.4775 µs]
                        thrpt:  [5.6068 GiB/s 5.6230 GiB/s 5.6347 GiB/s]
                 change:
                        time:   [-84.835% -84.755% -84.652%] (p = 0.00 < 0.05)
                        thrpt:  [+551.56% +555.96% +559.43%]
                        Performance has improved.

iter_read_fixed_len 8_byte/iter_read_fixed_len 8_byte/0.75
                        time:   [7.6559 µs 7.6630 µs 7.6772 µs]
                        thrpt:  [5.9500 GiB/s 5.9610 GiB/s 5.9666 GiB/s]
                 change:
                        time:   [-71.593% -71.571% -71.540%] (p = 0.00 < 0.05)
                        thrpt:  [+251.37% +251.75% +252.03%]
                        Performance has improved.

iter_read_fixed_len 8_byte/iter_read_fixed_len 8_byte/0.9
                        time:   [9.0964 µs 9.1057 µs 9.1182 µs]
                        thrpt:  [6.0172 GiB/s 6.0255 GiB/s 6.0316 GiB/s]
                 change:
                        time:   [-61.702% -61.475% -61.299%] (p = 0.00 < 0.05)
                        thrpt:  [+158.39% +159.57% +161.11%]
                        Performance has improved.

iter_read_fixed_len 8_byte/iter_read_fixed_len 8_byte/1
                        time:   [10.002 µs 10.007 µs 10.012 µs]
                        thrpt:  [6.0903 GiB/s 6.0932 GiB/s 6.0960 GiB/s]
                 change:
                        time:   [-54.012% -53.966% -53.916%] (p = 0.00 < 0.05)
                        thrpt:  [+116.99% +117.23% +117.45%]
                        Performance has improved.

iter_read_fixed_len 32_byte/iter_read_fixed_len 32_byte/0.1
                        time:   [261.06 ns 261.42 ns 261.80 ns]
                        thrpt:  [24.816 GiB/s 24.853 GiB/s 24.887 GiB/s]
                 change:
                        time:   [-90.917% -90.870% -90.838%] (p = 0.00 < 0.05)
                        thrpt:  [+991.51% +995.34% +1001.0%]
                        Performance has improved.

iter_read_fixed_len 32_byte/iter_read_fixed_len 32_byte/0.25
                        time:   [757.95 ns 758.98 ns 760.35 ns]
                        thrpt:  [20.813 GiB/s 20.850 GiB/s 20.879 GiB/s]
                 change:
                        time:   [-74.311% -74.261% -74.208%] (p = 0.00 < 0.05)
                        thrpt:  [+287.72% +288.51% +289.27%]
                        Performance has improved.

iter_read_fixed_len 32_byte/iter_read_fixed_len 32_byte/0.5
                        time:   [1.3360 µs 1.3364 µs 1.3368 µs]
                        thrpt:  [22.628 GiB/s 22.636 GiB/s 22.642 GiB/s]
                 change:
                        time:   [-61.588% -61.510% -61.409%] (p = 0.00 < 0.05)
                        thrpt:  [+159.13% +159.81% +160.33%]
                        Performance has improved.
                        
iter_read_fixed_len 32_byte/iter_read_fixed_len 32_byte/0.75
                        time:   [1.9166 µs 1.9174 µs 1.9183 µs]
                        thrpt:  [23.800 GiB/s 23.812 GiB/s 23.822 GiB/s]
                 change:
                        time:   [-50.428% -50.281% -50.131%] (p = 0.00 < 0.05)
                        thrpt:  [+100.53% +101.13% +101.73%]
                        Performance has improved.
                        
iter_read_fixed_len 32_byte/iter_read_fixed_len 32_byte/0.9
                        time:   [2.2676 µs 2.2698 µs 2.2735 µs]
                        thrpt:  [24.133 GiB/s 24.173 GiB/s 24.196 GiB/s]
                 change:
                        time:   [-46.345% -46.272% -46.199%] (p = 0.00 < 0.05)
                        thrpt:  [+85.870% +86.123% +86.375%]
                        Performance has improved.
                        
iter_read_fixed_len 32_byte/iter_read_fixed_len 32_byte/1
                        time:   [2.4979 µs 2.4998 µs 2.5028 µs]
                        thrpt:  [24.363 GiB/s 24.392 GiB/s 24.410 GiB/s]
                 change:
                        time:   [-39.172% -39.109% -39.052%] (p = 0.00 < 0.05)
                        thrpt:  [+64.074% +64.227% +64.397%]
                        Performance has improved.
                        

iter_read_fixed_len 256_byte/iter_read_fixed_len 256_byte/0.1
                        time:   [31.214 ns 31.237 ns 31.264 ns]
                        thrpt:  [175.40 GiB/s 175.55 GiB/s 175.68 GiB/s]
                 change:
                        time:   [-89.765% -89.756% -89.747%] (p = 0.00 < 0.05)
                        thrpt:  [+875.28% +876.16% +877.00%]
                        Performance has improved.
                        
iter_read_fixed_len 256_byte/iter_read_fixed_len 256_byte/0.25
                        time:   [67.296 ns 67.340 ns 67.424 ns]
                        thrpt:  [194.49 GiB/s 194.73 GiB/s 194.85 GiB/s]
                 change:
                        time:   [-79.814% -79.798% -79.777%] (p = 0.00 < 0.05)
                        thrpt:  [+394.50% +395.00% +395.38%]
                        Performance has improved.
                        
iter_read_fixed_len 256_byte/iter_read_fixed_len 256_byte/0.5
                        time:   [163.99 ns 164.26 ns 164.56 ns]
                        thrpt:  [188.35 GiB/s 188.69 GiB/s 189.00 GiB/s]
                 change:
                        time:   [-58.921% -58.858% -58.791%] (p = 0.00 < 0.05)
                        thrpt:  [+142.66% +143.06% +143.44%]
                        Performance has improved.
iter_read_fixed_len 256_byte/iter_read_fixed_len 256_byte/0.75
                        time:   [247.31 ns 247.47 ns 247.78 ns]
                        thrpt:  [189.56 GiB/s 189.80 GiB/s 189.92 GiB/s]
                 change:
                        time:   [-46.170% -46.134% -46.080%] (p = 0.00 < 0.05)
                        thrpt:  [+85.460% +85.645% +85.770%]
                        Performance has improved.
                        
iter_read_fixed_len 256_byte/iter_read_fixed_len 256_byte/0.9
                        time:   [293.82 ns 294.04 ns 294.30 ns]
                        thrpt:  [187.95 GiB/s 188.11 GiB/s 188.25 GiB/s]
                 change:
                        time:   [-40.039% -39.889% -39.644%] (p = 0.00 < 0.05)
                        thrpt:  [+65.685% +66.358% +66.776%]
                        Performance has improved.
                        
iter_read_fixed_len 256_byte/iter_read_fixed_len 256_byte/1
                        time:   [318.79 ns 319.17 ns 319.92 ns]
                        thrpt:  [190.04 GiB/s 190.49 GiB/s 190.71 GiB/s]
                 change:
                        time:   [-38.455% -38.353% -38.268%] (p = 0.00 < 0.05)
                        thrpt:  [+61.991% +62.213% +62.482%]
                        Performance has improved.

API and ABI breaking changes

None

Expected complexity level and risk

2, contained changes, but some unsafe included.

Testing

Proptests are included and they have been run in miri

gefjon

Neat! Minor comment-ey suggestions.

Why did you choose to make FixedBitSet generic over B: BitBlock, rather than just using u64 directly? Did you run any experiments with different block sizes? Do you foresee us wanting to choose a different block size?

crates/table/src/fixed_bit_set.rs

Centril · 2024-04-30T21:27:47Z

Why did you choose to make FixedBitSet generic over B: BitBlock, rather than just using u64 directly?

Initially, I started off with a type alias type BitBlock = u32;, but eventually I felt that B being generic helped enforce "unit correctness" and I also heard that James might want a custom bitset for other places.

Did you run any experiments with different block sizes?

Initially, I tried u32 just because that was the previous block size for BitVec. However, changing to u64 did give some slight improvements in benchmarks. They were not huge, but they were notable. I don't recall how significant they were however.

Do you foresee us wanting to choose a different block size?

Might be that James's use case wants u32 to be more dense, but u64 might also be all we want.

crates/table/src/page.rs

Shubham8287 · 2024-05-02T12:24:22Z

Just went through code, It was fun to read FixedBitSet code. I really appreciate how well the code is written. Thanks! 🙌

Centril requested review from kazimuth and gefjon April 26, 2024 00:13

Centril changed the title ~~Use a custom FixedBitSet + optimize Page::iter_fixed_len~~ Use a custom FixedBitSet + optimize Page::iter_fixed_len Apr 26, 2024

bfops added the release-any To be landed in any release window label Apr 29, 2024

gefjon approved these changes Apr 30, 2024

View reviewed changes

crates/table/src/fixed_bit_set.rs Show resolved Hide resolved

crates/table/src/fixed_bit_set.rs Show resolved Hide resolved

crates/table/src/fixed_bit_set.rs Show resolved Hide resolved

crates/table/src/fixed_bit_set.rs Show resolved Hide resolved

Centril commented Apr 30, 2024

View reviewed changes

crates/table/src/page.rs Show resolved Hide resolved

use a custom FixedBitSet + optimize Page::iter_fixed_len

2482c62

Centril force-pushed the centril/faster-smaller-present_rows branch from e6a7719 to 2482c62 Compare April 30, 2024 21:40

Centril enabled auto-merge April 30, 2024 21:43

Centril added this pull request to the merge queue Apr 30, 2024

Merged via the queue into master with commit b55121c Apr 30, 2024
6 checks passed

Centril deleted the centril/faster-smaller-present_rows branch April 30, 2024 22:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a custom `FixedBitSet` + optimize `Page::iter_fixed_len` #1160

Use a custom `FixedBitSet` + optimize `Page::iter_fixed_len` #1160

Centril commented Apr 26, 2024

gefjon left a comment

Centril commented Apr 30, 2024

Shubham8287 commented May 2, 2024 •

edited

Loading

Use a custom FixedBitSet + optimize Page::iter_fixed_len #1160

Use a custom FixedBitSet + optimize Page::iter_fixed_len #1160

Conversation

Centril commented Apr 26, 2024

Description of Changes

API and ABI breaking changes

Expected complexity level and risk

Testing

gefjon left a comment

Choose a reason for hiding this comment

Centril commented Apr 30, 2024

Shubham8287 commented May 2, 2024 • edited Loading

Use a custom `FixedBitSet` + optimize `Page::iter_fixed_len` #1160

Use a custom `FixedBitSet` + optimize `Page::iter_fixed_len` #1160

Shubham8287 commented May 2, 2024 •

edited

Loading