Reduce per-byte overhead in VLQ integer decoding #9584

Open
Dandandan wants to merge 1 commit into apache:main from Dandandan:pr/vlq-decoding

@Dandandan
Contributor

Which issue does this PR close?

Closes #9580

Rationale

The current VLQ decoder calls get_aligned for each byte, which involves repeated offset calculations and bounds checks in the hot loop.

What changes are included in this PR?

Align to the byte boundary once, then iterate directly over the buffer slice, avoiding per-byte overhead from get_aligned.
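For readers who want to see the shape of the change, here is a minimal sketch of the idea, not the actual diff. `SketchBitReader`, its fields, and the 10-byte cap are hypothetical stand-ins chosen for illustration; the real reader in the parquet crate may differ in layout, return type, and error handling.

```rust
/// Hypothetical stand-in for a bit reader (assumption: not the crate's real type).
struct SketchBitReader<'a> {
    buffer: &'a [u8],
    byte_offset: usize,
    bit_offset: usize,
}

impl<'a> SketchBitReader<'a> {
    fn new(buffer: &'a [u8]) -> Self {
        Self { buffer, byte_offset: 0, bit_offset: 0 }
    }

    /// Decode one unsigned VLQ integer: 7 payload bits per byte,
    /// least-significant group first, high bit set on every byte but the last.
    fn get_vlq_int(&mut self) -> Option<u64> {
        // Align to the next byte boundary once, up front.
        if self.bit_offset != 0 {
            self.byte_offset += 1;
            self.bit_offset = 0;
        }
        // Take the remaining bytes as a slice; a single range check here
        // replaces the per-byte offset math in the loop below.
        let bytes = self.buffer.get(self.byte_offset..)?;
        let mut value: u64 = 0;
        // A u64 needs at most 10 groups of 7 bits (assumed cap for this sketch).
        for (i, &byte) in bytes.iter().enumerate().take(10) {
            value |= u64::from(byte & 0x7F) << (7 * i);
            if byte & 0x80 == 0 {
                // Last byte of this integer: advance past what was consumed.
                self.byte_offset += i + 1;
                return Some(value);
            }
        }
        // Input ended (or exceeded the cap) before a terminating byte.
        None
    }
}
```

With the bytes `[0xAC, 0x02, 0x05]`, two successive calls yield 300 and then 5, and a third call returns `None` because the input is exhausted.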

Are there any user-facing changes?

No.

🤖 Generated with Claude Code

Read directly from the buffer slice instead of calling get_aligned for
each byte, avoiding repeated offset calculations and bounds checks in
the hot loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the "parquet" label (Changes to the parquet crate) on Mar 19, 2026
@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4093148017-469-769vg 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/vlq-decoding (9098f72) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   pr_vlq-decoding
-----                                             ----                                   ---------------
arrow_reader_clickbench/async/Q1                  1.00   1086.3±5.25µs        ? ?/sec    1.00   1084.3±5.54µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.8±0.21ms        ? ?/sec    1.00      6.8±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.8±0.19ms        ? ?/sec    1.00      7.8±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.02     14.9±0.20ms        ? ?/sec    1.00     14.6±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.03     17.6±0.35ms        ? ?/sec    1.00     17.1±0.28ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.01     16.1±0.29ms        ? ?/sec    1.00     15.9±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.2±0.07ms        ? ?/sec    1.00      3.1±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     73.7±0.56ms        ? ?/sec    1.21    88.9±13.41ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.02     82.1±0.63ms        ? ?/sec    1.00     80.4±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    115.6±4.56ms        ? ?/sec    1.16   134.2±10.67ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.04    250.7±3.83ms        ? ?/sec    1.00    240.5±2.93ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.01     19.7±0.35ms        ? ?/sec    1.00     19.5±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.03     59.2±0.49ms        ? ?/sec    1.00     57.2±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.04     59.4±0.67ms        ? ?/sec    1.00     57.0±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.7±0.18ms        ? ?/sec    1.00     18.5±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.04     15.7±0.37ms        ? ?/sec    1.00     15.2±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.5±0.07ms        ? ?/sec    1.00      5.5±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.03     13.6±0.37ms        ? ?/sec    1.00     13.2±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.05     25.2±0.57ms        ? ?/sec    1.00     24.0±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.03      5.9±0.11ms        ? ?/sec    1.00      5.7±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.02      5.1±0.05ms        ? ?/sec    1.00      5.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.01      3.6±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1067.8±4.36µs        ? ?/sec    1.01   1079.9±8.70µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.7±0.16ms        ? ?/sec    1.00      6.6±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.7±0.18ms        ? ?/sec    1.00      7.7±0.11ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.03     14.8±0.20ms        ? ?/sec    1.00     14.4±0.21ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.03     17.3±0.36ms        ? ?/sec    1.00     16.8±0.27ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.01     16.1±0.31ms        ? ?/sec    1.00     16.0±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.03      3.0±0.05ms        ? ?/sec    1.00      2.9±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.02     72.0±0.66ms        ? ?/sec    1.00     70.7±0.36ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     81.3±0.73ms        ? ?/sec    1.00     79.2±0.30ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     99.4±0.84ms        ? ?/sec    1.00     95.8±0.49ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.01    228.1±1.62ms        ? ?/sec    1.00    226.3±2.77ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.4±0.34ms        ? ?/sec    1.00     19.4±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.6±0.70ms        ? ?/sec    1.00     55.9±0.37ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.04     57.9±0.71ms        ? ?/sec    1.00     55.7±0.50ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.3±0.22ms        ? ?/sec    1.00     18.3±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.04     14.9±0.43ms        ? ?/sec    1.00     14.4±0.55ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.4±0.07ms        ? ?/sec    1.00      5.4±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.9±0.40ms        ? ?/sec    1.00     12.9±0.35ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.06     24.2±0.67ms        ? ?/sec    1.00     22.9±0.43ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.03      5.7±0.13ms        ? ?/sec    1.00      5.5±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.01      4.9±0.07ms        ? ?/sec    1.00      4.8±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.04ms        ? ?/sec    1.00      3.5±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    866.4±2.49µs        ? ?/sec    1.01    872.6±2.22µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.01      5.2±0.07ms        ? ?/sec    1.00      5.2±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.02      6.2±0.07ms        ? ?/sec    1.00      6.1±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.03     22.2±0.65ms        ? ?/sec    1.00     21.5±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.17     28.8±1.08ms        ? ?/sec    1.00     24.6±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.01     23.2±0.28ms        ? ?/sec    1.00     23.1±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.03      2.8±0.04ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.04    124.8±0.78ms        ? ?/sec    1.00    120.3±0.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.03     99.0±1.03ms        ? ?/sec    1.00     96.1±0.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.04    147.3±1.23ms        ? ?/sec    1.00    141.9±1.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.02    278.5±8.61ms        ? ?/sec    1.00   272.7±14.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.01     27.5±0.45ms        ? ?/sec    1.00     27.2±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.03    109.7±0.89ms        ? ?/sec    1.00    106.1±0.64ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.05    107.8±0.87ms        ? ?/sec    1.00    103.0±0.49ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.03     19.2±0.16ms        ? ?/sec    1.00     18.7±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.02     22.6±0.40ms        ? ?/sec    1.00     22.3±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.04ms        ? ?/sec    1.01      7.0±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.01     11.5±0.19ms        ? ?/sec    1.00     11.3±0.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.02     21.0±0.34ms        ? ?/sec    1.00     20.7±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.02      5.4±0.10ms        ? ?/sec    1.00      5.3±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.05ms        ? ?/sec    1.00      5.7±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.4±0.04ms        ? ?/sec    1.00      4.4±0.03ms        ? ?/sec

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      783.1s
Peak memory    3.1 GiB
Avg memory     3.0 GiB
CPU user       705.5s
CPU sys        77.6s
Disk read      12.0 KiB
Disk write     1.3 GiB

branch

Metric         Value
------         -----
Wall time      789.6s
Peak memory    3.2 GiB
Avg memory     3.1 GiB
CPU user       722.8s
CPU sys        66.8s
Disk read      0 B
Disk write     171.4 MiB

Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Reduce per-byte overhead in VLQ integer decoding

2 participants