[Parquet] Improve dictionary decoder by unrolling loops#9662

Merged
Dandandan merged 9 commits into apache:main from
Dandandan:avoid-copy-bitreader-get-value
Apr 7, 2026
@Dandandan
Contributor

@Dandandan Dandandan commented Apr 4, 2026

Which issue does this PR close?

#9670

Rationale

Improve dictionary decoding in two ways: unroll the dictionary loads and the out-of-bounds (OOB) check (the largest win, about -15% for i32), and replace the try_from_le_slice round-trip with a direct conversion from u64 (a small win, mostly for readability).

  ┌──────────────────────────────────┬───────────────┬─────────┬─────────────┐
  │            Benchmark             │ upstream/main │ branch  │ Improvement │
  ├──────────────────────────────────┼───────────────┼─────────┼─────────────┤
  │ dictionary, mandatory, no NULLs  │ 58.5 µs       │ 49.5 µs │ -15.4%      │
  ├──────────────────────────────────┼───────────────┼─────────┼─────────────┤
  │ dictionary, optional, no NULLs   │ 61.0 µs       │ 52.2 µs │ -14.4%      │
  ├──────────────────────────────────┼───────────────┼─────────┼─────────────┤
  │ dictionary, optional, half NULLs │ 97.1 µs       │ 93.0 µs │ -4.2%       │
  └──────────────────────────────────┴───────────────┴─────────┴─────────────┘

What changes are included in this PR?

Are these changes tested?

Existing tests cover this path thoroughly (21 bit_util tests pass).

🤖 Generated with Claude Code

Add `from_u64` method to `FromBytes` trait to convert directly from the
u64 bit buffer, eliminating the intermediate `as_bytes()` → `try_from_le_slice()`
round-trip that copied through a byte slice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
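
To illustrate the round-trip this commit message describes, here is a minimal sketch for i32 only. Both function names are hypothetical stand-ins, not the crate's actual code: the first rebuilds the value from the little-endian bytes of the u64 buffer, the second uses a plain truncating cast.

```rust
/// Hypothetical stand-in for the old path: take the little-endian bytes
/// of the u64 buffer and rebuild an i32 from the first four of them.
fn i32_via_le_bytes(v: u64) -> i32 {
    let bytes = v.to_le_bytes();
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[..4]);
    i32::from_le_bytes(buf)
}

/// Hypothetical stand-in for the new path: keep the low 32 bits
/// with a truncating cast, with no intermediate byte slice.
fn i32_from_u64(v: u64) -> i32 {
    v as i32
}
```

Both functions keep the low 32 bits of the input, so they should agree for every u64. Whether the compiler already collapses the first form into the second is the open question raised in the review discussion.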
@github-actions github-actions bot added the parquet label (Changes to the parquet crate) Apr 4, 2026
@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4186723531-770-2mx7w 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing avoid-copy-bitreader-get-value (0c1ec22) to ec771cc (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                             avoid-copy-bitreader-get-value         main
-----                                             ------------------------------         ----
arrow_reader_clickbench/async/Q1                  1.00   1098.9±3.79µs        ? ?/sec    1.00   1093.6±5.37µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.4±0.03ms        ? ?/sec    1.01      6.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.4±0.04ms        ? ?/sec    1.00      7.5±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.2±0.05ms        ? ?/sec    1.01     14.3±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     16.8±0.06ms        ? ?/sec    1.01     16.9±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.7±0.14ms        ? ?/sec    1.01     15.8±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.1±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     90.1±4.18ms        ? ?/sec    1.05     94.8±5.06ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00    105.6±2.31ms        ? ?/sec    1.00   105.3±10.78ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.10    131.3±7.79ms        ? ?/sec    1.00    119.4±5.48ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.00    240.7±1.03ms        ? ?/sec    1.03    247.6±1.05ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.1±0.09ms        ? ?/sec    1.01     19.3±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.00     57.2±0.22ms        ? ?/sec    1.01     58.0±0.38ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     57.0±0.17ms        ? ?/sec    1.02     57.9±0.41ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.00     18.3±0.05ms        ? ?/sec    1.02     18.6±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     14.7±0.16ms        ? ?/sec    1.03     15.1±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.3±0.03ms        ? ?/sec    1.01      5.4±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.0±0.15ms        ? ?/sec    1.00     13.0±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.01     24.2±0.28ms        ? ?/sec    1.00     24.1±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.7±0.03ms        ? ?/sec    1.03      5.8±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      4.9±0.01ms        ? ?/sec    1.02      5.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.02ms        ? ?/sec    1.01      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1061.2±4.87µs        ? ?/sec    1.01   1070.9±8.57µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.2±0.03ms        ? ?/sec    1.02      6.3±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.2±0.04ms        ? ?/sec    1.01      7.3±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.1±0.03ms        ? ?/sec    1.02     14.3±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.00     16.6±0.05ms        ? ?/sec    1.01     16.8±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.6±0.04ms        ? ?/sec    1.01     15.8±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      3.0±0.03ms        ? ?/sec    1.01      3.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.00     71.0±0.25ms        ? ?/sec    1.02     72.3±0.45ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     79.9±0.25ms        ? ?/sec    1.01     81.0±0.43ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.00     97.0±0.31ms        ? ?/sec    1.02     99.0±0.64ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.03    229.5±0.45ms        ? ?/sec    1.00    221.8±3.76ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     18.8±0.06ms        ? ?/sec    1.01     19.0±0.15ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.00     56.4±0.28ms        ? ?/sec    1.02     57.3±0.45ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.4±0.21ms        ? ?/sec    1.01     57.2±0.32ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.0±0.06ms        ? ?/sec    1.01     18.2±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.00     14.3±0.16ms        ? ?/sec    1.02     14.6±0.15ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.3±0.02ms        ? ?/sec    1.00      5.3±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.5±0.16ms        ? ?/sec    1.03     12.9±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     22.7±0.18ms        ? ?/sec    1.04     23.6±0.26ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.5±0.07ms        ? ?/sec    1.02      5.6±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.02ms        ? ?/sec    1.01      4.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.01ms        ? ?/sec    1.02      3.5±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    870.2±3.32µs        ? ?/sec    1.00    871.3±2.43µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.1±0.03ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.0±0.02ms        ? ?/sec    1.00      6.0±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.00     21.5±0.05ms        ? ?/sec    1.01     21.6±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     29.1±1.13ms        ? ?/sec    1.03     30.0±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.01     23.0±0.05ms        ? ?/sec    1.00     22.8±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.7±0.02ms        ? ?/sec    1.01      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.00    123.4±0.30ms        ? ?/sec    1.01    124.4±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.00     92.1±0.21ms        ? ?/sec    1.02     93.7±0.32ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.01    140.4±3.72ms        ? ?/sec    1.00    139.5±0.27ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   279.4±15.31ms        ? ?/sec    1.02   285.2±16.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     26.8±0.08ms        ? ?/sec    1.01     27.0±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.00    106.6±0.25ms        ? ?/sec    1.02    108.4±0.33ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    104.9±0.51ms        ? ?/sec    1.01    105.6±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.00     18.7±0.04ms        ? ?/sec    1.00     18.7±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     22.1±0.05ms        ? ?/sec    1.00     22.1±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.8±0.04ms        ? ?/sec    1.01      6.9±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.3±0.03ms        ? ?/sec    1.00     11.3±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.00     20.7±0.07ms        ? ?/sec    1.00     20.7±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.02ms        ? ?/sec    1.02      5.3±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.03ms        ? ?/sec    1.01      5.7±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.3±0.02ms        ? ?/sec    1.02      4.4±0.02ms        ? ?/sec

Resource Usage

Metric        base (merge-base)   branch
Wall time     784.3s              780.8s
Peak memory   3.1 GiB             3.2 GiB
Avg memory    3.0 GiB             3.1 GiB
CPU user      707.3s              710.2s
CPU sys       76.8s               70.7s
Disk read     0 B                 0 B
Disk write    825.3 MiB           171.4 MiB

@Dandandan
Contributor Author

run benchmark arrow_reader

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4186793160-776-9ctgw 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing avoid-copy-bitreader-get-value (0c1ec22) to ec771cc (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete


Contributor

Copilot AI left a comment

Pull request overview

This PR aims to speed up bit-unpacking by avoiding intermediate byte-slice/array conversions in BitReader::get_value, introducing a direct conversion path from the internal u64 bit-buffer into the requested output type.

Changes:

  • Add FromBytes::from_u64(u64) -> Self to convert directly from the u64 bit buffer value.
  • Implement from_u64 for integer, float, and bool types (floats via from_bits).
  • Update BitReader::get_value to use Some(T::from_u64(v)) instead of converting via byte slices.

@jhorstmann
Contributor

I would have expected the compiler to optimize these copies away.

Still might be a good idea, since it also simplifies the logic a bit. I don't quite like how the method is intentionally unsupported for some of the FromBytes implementations. What do you think of introducing a separate trait like FromBitpacked (similar to FromPrimitive, but also implemented for bool)?

@Dandandan
Contributor Author

Dandandan commented Apr 4, 2026

> I would have expected the compiler to optimize these copies away.
>
> Still might be a good idea, since it also simplifies the logic a bit.

These methods are quite hot in DataFusion profiles, so I think the copies are likely not optimized away (I'll run some benchmarks and/or check the generated code).

> I don't quite like how the method is intentionally unsupported for some of the FromBytes implementations.
> What do you think of introducing a separate trait like FromBitpacked (similar to FromPrimitive, but also implemented for bool)?

Agreed. I'll check!

Replace unreachable!() stubs for from_u64 on Int96/ByteArray/FixedLenByteArray
with a separate FromBitpacked trait that is only implemented for types that
can actually be converted from u64 (primitives, floats, bool). Also convert
assert! to debug_assert! for num_bits bounds checks in get_value/get_batch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
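
A minimal sketch of what such a trait could look like, assuming a single `from_u64` method. The trait name comes from the discussion above; the macro, the set of impls, and the `unpack` helper are illustrative assumptions rather than the merged code. Types like Int96 or ByteArray simply do not implement the trait, so no `unreachable!()` stubs are needed.

```rust
/// Sketch of a conversion trait for values unpacked from a u64 bit buffer.
/// Only types that fit in a u64 implement it (assumed shape, not the real API).
trait FromBitpacked {
    fn from_u64(v: u64) -> Self;
}

// Integers: a truncating cast keeps the low bits of the buffer.
macro_rules! impl_from_bitpacked_int {
    ($($t:ty),*) => {
        $(impl FromBitpacked for $t {
            #[inline]
            fn from_u64(v: u64) -> Self { v as $t }
        })*
    };
}
impl_from_bitpacked_int!(u8, u16, u32, u64, i8, i16, i32, i64);

// bool: any non-zero value counts as true.
impl FromBitpacked for bool {
    #[inline]
    fn from_u64(v: u64) -> Self { v != 0 }
}

// Floats: reinterpret the low bits via from_bits.
impl FromBitpacked for f32 {
    #[inline]
    fn from_u64(v: u64) -> Self { f32::from_bits(v as u32) }
}
impl FromBitpacked for f64 {
    #[inline]
    fn from_u64(v: u64) -> Self { f64::from_bits(v) }
}

/// Hypothetical generic caller, standing in for a bit reader that returns T.
fn unpack<T: FromBitpacked>(v: u64) -> T {
    T::from_u64(v)
}
```

With this shape, calling `unpack::<T>` for an unsupported type fails at compile time instead of panicking at run time.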
@Dandandan
Contributor Author

Currently doesn't look like a performance change.
But it simplifies the code a bit I think :)

Use chunks_exact(8) to process dictionary index lookups in groups of 8,
allowing the compiler to unroll the inner loop and pipeline dependent
memory accesses. This gives ~12% improvement on dictionary-encoded reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
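
The chunked lookup described in this commit message can be sketched roughly as follows. The function name, the i32 dictionary, and the u32 index type are assumptions for illustration; the real decoder is generic and writes into its own output buffers.

```rust
/// Gather dictionary values for a slice of indices, eight at a time.
/// Uses ordinary indexing, so an out-of-range index still panics.
fn gather_chunked(dict: &[i32], indices: &[u32], out: &mut Vec<i32>) {
    out.reserve(indices.len());
    let chunks = indices.chunks_exact(8);
    // Leftover indices (fewer than 8) that do not fill a whole chunk.
    let tail = chunks.remainder();
    for chunk in chunks {
        // Every chunk has exactly 8 elements, so this inner loop has a
        // fixed trip count that the compiler may unroll.
        for &i in chunk {
            out.push(dict[i as usize]);
        }
    }
    for &i in tail {
        out.push(dict[i as usize]);
    }
}
```

The output is the same as a plain element-by-element loop; only the loop shape changes, which is what gives the optimizer room to overlap the loads.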
@Dandandan Dandandan changed the title from "Avoid byte slice copy in BitReader::get_value" to "RLE decoder improvements" Apr 4, 2026

Instead of bounds-checking each dictionary index individually, compute
the max index across the 8-element chunk and check once. This allows
using get_unchecked in the inner loop, improving dictionary-encoded
reads by ~17% (up from ~12% with per-element checks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
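
A sketch of the per-chunk check, under the same assumptions as before (hypothetical function name, i32 dictionary, u32 indices). The key point is that one assertion on the chunk maximum justifies the unchecked reads that follow it.

```rust
/// Gather dictionary values with one bounds check per 8-element chunk.
/// Panics if any index in a chunk is out of range.
fn gather_max_checked(dict: &[i32], indices: &[u32], out: &mut Vec<i32>) {
    out.reserve(indices.len());
    let chunks = indices.chunks_exact(8);
    let tail = chunks.remainder();
    for chunk in chunks {
        // A chunk is never empty, so max() always yields a value here.
        let max = chunk.iter().copied().max().unwrap_or(0) as usize;
        assert!(max < dict.len(), "dictionary index out of bounds");
        for &i in chunk {
            // SAFETY: i <= max and max < dict.len() was asserted above.
            out.push(unsafe { *dict.get_unchecked(i as usize) });
        }
    }
    // The tail keeps ordinary checked indexing.
    for &i in tail {
        out.push(dict[i as usize]);
    }
}
```

The assertion must stay a real `assert!` rather than a `debug_assert!`, since the unsafe reads depend on it in release builds too.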
@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4187077648-788-xrxsg 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing avoid-copy-bitreader-get-value (ac31357) to ec771cc (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete


Dandandan and others added 2 commits April 4, 2026 15:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace max()+assert with all(|i| i < len)+assert for the per-chunk
dictionary bounds check. This avoids the reduction and allows the
compiler to vectorize 8 independent comparisons, improving dictionary
reads by ~19% over baseline (vs ~17% with max).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
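
The two forms of the per-chunk check can be compared side by side. This is a sketch with hypothetical function names and a fixed `[u32; 8]` chunk type; it shows only the predicate, not the surrounding decode loop.

```rust
/// Earlier form: reduce the chunk to its maximum, then compare once.
fn in_bounds_max(chunk: &[u32; 8], len: usize) -> bool {
    chunk.iter().copied().max().map_or(true, |m| (m as usize) < len)
}

/// Later form: eight independent comparisons, with no dependency
/// chain between elements.
fn in_bounds_all(chunk: &[u32; 8], len: usize) -> bool {
    chunk.iter().all(|&i| (i as usize) < len)
}
```

Both predicates return the same answer for any chunk and length. The difference is in the generated code: the max form carries a reduction across elements, while the comparisons in the all form are independent of each other, which is the property the commit message credits for the extra speedup.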
@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                             avoid-copy-bitreader-get-value         main
-----                                             ------------------------------         ----
arrow_reader_clickbench/async/Q1                  1.00   1092.2±5.91µs        ? ?/sec    1.00   1091.2±5.33µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.5±0.05ms        ? ?/sec    1.03      6.7±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.5±0.04ms        ? ?/sec    1.02      7.6±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.3±0.09ms        ? ?/sec    1.03     14.7±0.12ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     16.8±0.07ms        ? ?/sec    1.04     17.5±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.7±0.06ms        ? ?/sec    1.03     16.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.0±0.02ms        ? ?/sec    1.04      3.1±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     91.0±1.33ms        ? ?/sec    1.06     96.3±3.21ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00    98.7±11.04ms        ? ?/sec    1.11    109.4±0.85ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    133.5±7.82ms        ? ?/sec    1.03    137.3±8.92ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.00    242.0±0.75ms        ? ?/sec    1.03    249.2±3.26ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.3±0.11ms        ? ?/sec    1.02     19.7±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.00     57.1±0.17ms        ? ?/sec    1.04     59.4±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     57.1±0.14ms        ? ?/sec    1.04     59.5±0.54ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.00     18.4±0.07ms        ? ?/sec    1.01     18.7±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     15.2±0.13ms        ? ?/sec    1.04     15.8±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.02ms        ? ?/sec    1.01      5.4±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.5±0.12ms        ? ?/sec    1.03     13.9±0.28ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     24.0±0.17ms        ? ?/sec    1.07     25.8±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.7±0.04ms        ? ?/sec    1.05      6.0±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      5.0±0.03ms        ? ?/sec    1.01      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.02ms        ? ?/sec    1.01      3.6±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.01   1073.7±5.82µs        ? ?/sec    1.00   1059.7±4.38µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.4±0.05ms        ? ?/sec    1.01      6.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.4±0.04ms        ? ?/sec    1.02      7.5±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.4±0.08ms        ? ?/sec    1.01     14.5±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.00     16.9±0.08ms        ? ?/sec    1.01     17.0±0.13ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.8±0.13ms        ? ?/sec    1.01     15.9±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      2.9±0.03ms        ? ?/sec    1.04      3.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.00     71.1±0.18ms        ? ?/sec    1.03     73.2±0.58ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     79.6±0.17ms        ? ?/sec    1.03     81.7±0.66ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.00     96.7±0.23ms        ? ?/sec    1.04    100.3±0.75ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    230.5±0.66ms        ? ?/sec    1.03    238.1±1.53ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.3±0.09ms        ? ?/sec    1.02     19.6±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.00     57.1±0.12ms        ? ?/sec    1.02     58.2±0.67ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     57.2±0.12ms        ? ?/sec    1.02     58.1±0.74ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.2±0.08ms        ? ?/sec    1.01     18.5±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.00     14.9±0.11ms        ? ?/sec    1.01     15.1±0.30ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.3±0.02ms        ? ?/sec    1.01      5.3±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     13.2±0.08ms        ? ?/sec    1.01     13.4±0.25ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     23.3±0.12ms        ? ?/sec    1.05     24.5±0.62ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.4±0.03ms        ? ?/sec    1.04      5.6±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.03ms        ? ?/sec    1.03      4.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.03ms        ? ?/sec    1.01      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    874.4±2.73µs        ? ?/sec    1.01    879.5±4.06µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.01      5.1±0.05ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.0±0.02ms        ? ?/sec    1.01      6.1±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.00     21.7±0.06ms        ? ?/sec    1.00     21.8±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     29.6±1.26ms        ? ?/sec    1.04     30.8±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     23.2±0.08ms        ? ?/sec    1.01     23.4±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.6±0.02ms        ? ?/sec    1.04      2.8±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.00    123.2±0.23ms        ? ?/sec    1.03    126.4±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.00     92.0±0.11ms        ? ?/sec    1.04     95.4±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.00    140.5±3.74ms        ? ?/sec    1.00    140.6±0.88ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   279.7±15.22ms        ? ?/sec    1.03   286.7±16.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     27.3±0.08ms        ? ?/sec    1.01     27.6±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.00    107.1±0.28ms        ? ?/sec    1.04    111.3±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    104.3±0.11ms        ? ?/sec    1.04    109.0±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.00     18.7±0.05ms        ? ?/sec    1.02     19.2±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     22.6±0.09ms        ? ?/sec    1.00     22.5±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.02ms        ? ?/sec    1.00      6.8±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.01     11.6±0.03ms        ? ?/sec    1.00     11.5±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.00     21.1±0.03ms        ? ?/sec    1.01     21.3±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.03ms        ? ?/sec    1.03      5.3±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.7±0.02ms        ? ?/sec    1.00      5.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.3±0.02ms        ? ?/sec    1.02      4.4±0.04ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 792.0s
Peak memory 3.1 GiB
Avg memory 3.0 GiB
CPU user 703.2s
CPU sys 88.6s
Disk read 0 B
Disk write 2.0 GiB

branch

Metric Value
Wall time 789.1s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 715.6s
CPU sys 73.6s
Disk read 0 B
Disk write 171.3 MiB

File an issue against this benchmark runner

@Dandandan
Contributor Author

Dandandan commented Apr 4, 2026

With this change, i32 dictionary decoding now takes 20% less time on my machine.
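For readers unfamiliar with the technique named in the PR title, the following is a minimal sketch of a chunked dictionary gather with one bounds check per chunk. It is not the code from this PR: `gather_chunked`, the chunk size of 8, and the `Option` return type are all assumptions made for illustration.

```rust
/// Hypothetical helper (not from the parquet crate): copy `dict[indices[i]]`
/// into `out`, validating the indices once per chunk of 8.
/// Returns `None` if any index is out of range for `dict`.
fn gather_chunked(dict: &[i32], indices: &[u32], out: &mut Vec<i32>) -> Option<()> {
    const CHUNK: usize = 8; // assumed chunk size, chosen for illustration
    let mut chunks = indices.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // One range check per chunk: compare the largest index to dict.len().
        let max = chunk.iter().copied().max()? as usize;
        if max >= dict.len() {
            return None;
        }
        // Fixed-length inner loop. The indexing is still checked in safe Rust;
        // hoisting the check only gives the optimizer a chance to simplify it.
        for &i in chunk {
            out.push(dict[i as usize]);
        }
    }
    // Tail of fewer than CHUNK elements: check each index individually.
    for &i in chunks.remainder() {
        out.push(*dict.get(i as usize)?);
    }
    Some(())
}

fn main() {
    let dict = [10, 20, 30, 40];
    let indices = [0u32, 1, 2, 3, 3, 2, 1, 0, 1, 2];
    let mut out = Vec::new();
    assert!(gather_chunked(&dict, &indices, &mut out).is_some());
    assert_eq!(out, vec![10, 20, 30, 40, 40, 30, 20, 10, 20, 30]);

    // An index equal to dict.len() is rejected instead of panicking.
    let mut rejected = Vec::new();
    assert!(gather_chunked(&dict, &[4], &mut rejected).is_none());
}
```

Whether a hoisted check like this actually speeds anything up depends on the compiler and the target, which is why the PR relies on benchmarks rather than on the shape of the source.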

@Dandandan Dandandan changed the title RLE decoder improvements Dictionary decoder improvements Apr 4, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@alamb alamb left a comment


I went over this carefully and it makes sense to me. Thank you @Dandandan and @jhorstmann

@alamb alamb changed the title Dictionary decoder improvements [Parquet] Improve dictionary decoder by unrolling loops Apr 6, 2026
@alamb
Contributor

alamb commented Apr 6, 2026

run benchmark arrow_reader_clickbench

@alamb
Contributor

alamb commented Apr 6, 2026

run benchmark arrow_reader

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4194750170-910-5vs6f 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing avoid-copy-bitreader-get-value (bdbb3a6) to ec771cc (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4194750518-911-phvk8 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing avoid-copy-bitreader-get-value (bdbb3a6) to ec771cc (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                             avoid-copy-bitreader-get-value         main
-----                                             ------------------------------         ----
arrow_reader_clickbench/async/Q1                  1.00  1087.9±10.89µs        ? ?/sec    1.01   1094.4±3.98µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.5±0.06ms        ? ?/sec    1.00      6.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.6±0.07ms        ? ?/sec    1.01      7.7±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.4±0.06ms        ? ?/sec    1.00     14.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     16.9±0.09ms        ? ?/sec    1.01     17.1±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.8±0.11ms        ? ?/sec    1.03     16.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.0±0.02ms        ? ?/sec    1.05      3.1±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.03     96.4±1.42ms        ? ?/sec    1.00     93.7±1.18ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.01   103.2±11.93ms        ? ?/sec    1.00   102.1±11.34ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.02    134.1±7.08ms        ? ?/sec    1.00   131.0±10.29ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.00    241.6±1.30ms        ? ?/sec    1.01    245.1±0.98ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.01     19.5±0.08ms        ? ?/sec    1.00     19.4±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.00     57.2±0.17ms        ? ?/sec    1.01     57.7±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     57.3±0.13ms        ? ?/sec    1.01     57.6±0.37ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.00     18.3±0.05ms        ? ?/sec    1.01     18.5±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     15.0±0.12ms        ? ?/sec    1.06     16.0±0.33ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.08ms        ? ?/sec    1.00      5.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.5±0.10ms        ? ?/sec    1.02     13.8±0.30ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     24.0±0.20ms        ? ?/sec    1.05     25.2±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.6±0.05ms        ? ?/sec    1.03      5.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      4.9±0.03ms        ? ?/sec    1.02      5.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.03ms        ? ?/sec    1.02      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1060.2±4.35µs        ? ?/sec    1.01   1067.6±7.59µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.3±0.05ms        ? ?/sec    1.01      6.4±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.3±0.04ms        ? ?/sec    1.01      7.3±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.2±0.06ms        ? ?/sec    1.00     14.2±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.00     16.9±0.12ms        ? ?/sec    1.00     16.9±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.7±0.10ms        ? ?/sec    1.00     15.7±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      2.9±0.03ms        ? ?/sec    1.06      3.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.00     70.9±0.15ms        ? ?/sec    1.02     72.3±0.50ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     79.6±0.18ms        ? ?/sec    1.01     80.5±0.37ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.02     99.3±3.72ms        ? ?/sec    1.00     97.4±0.33ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    218.8±0.83ms        ? ?/sec    1.02    222.5±7.00ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.0±0.08ms        ? ?/sec    1.01     19.1±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.00     56.0±0.15ms        ? ?/sec    1.01     56.6±0.40ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.5±0.23ms        ? ?/sec    1.02     57.4±0.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.1±0.06ms        ? ?/sec    1.01     18.2±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.00     14.7±0.15ms        ? ?/sec    1.03     15.1±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.2±0.04ms        ? ?/sec    1.01      5.3±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.8±0.12ms        ? ?/sec    1.06     13.6±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     23.2±0.22ms        ? ?/sec    1.07     24.8±0.62ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.4±0.09ms        ? ?/sec    1.02      5.5±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.7±0.03ms        ? ?/sec    1.03      4.8±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.01ms        ? ?/sec    1.03      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    865.3±3.08µs        ? ?/sec    1.01    876.7±2.54µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.1±0.03ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.0±0.02ms        ? ?/sec    1.00      6.0±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.01     21.7±0.07ms        ? ?/sec    1.00     21.6±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     29.3±1.16ms        ? ?/sec    1.03     30.1±0.20ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.01     23.1±0.09ms        ? ?/sec    1.00     22.9±0.09ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.6±0.02ms        ? ?/sec    1.06      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.00    122.1±0.34ms        ? ?/sec    1.00    122.5±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.00     91.6±0.13ms        ? ?/sec    1.01     92.7±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.06    145.2±1.51ms        ? ?/sec    1.00    136.6±1.73ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   277.6±15.23ms        ? ?/sec    1.00   278.1±14.90ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     26.9±0.06ms        ? ?/sec    1.00     27.0±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.00    105.9±0.12ms        ? ?/sec    1.02    108.0±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    104.4±0.36ms        ? ?/sec    1.01    105.2±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.00     18.5±0.04ms        ? ?/sec    1.02     18.9±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     22.3±0.12ms        ? ?/sec    1.00     22.3±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.01      6.9±0.02ms        ? ?/sec    1.00      6.8±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.5±0.02ms        ? ?/sec    1.01     11.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.00     20.9±0.06ms        ? ?/sec    1.01     21.2±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.1±0.02ms        ? ?/sec    1.02      5.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.02ms        ? ?/sec    1.01      5.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.3±0.02ms        ? ?/sec    1.02      4.4±0.04ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 780.1s
Peak memory 3.1 GiB
Avg memory 3.0 GiB
CPU user 703.4s
CPU sys 76.5s
Peak spill 0 B

branch

Metric Value
Wall time 786.2s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 711.8s
CPU sys 74.3s
Peak spill 0 B


@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4198097179-922-7p4rl 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing avoid-copy-bitreader-get-value (995041f) to ec771cc (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete



Change assert! to debug_assert! in read_num_bytes, RleEncoder::new_from_buf,
and RleDecoder::get_batch_with_dict where the checks are either redundant
(subsequent indexing already panics) or in cold constructor code. Verified
via cargo-asm that this removes instructions from hot paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
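The "redundant check" case in the commit message above can be sketched as follows. This is an illustration only: `read_le_prefix` is a hypothetical stand-in and not the crate's actual `read_num_bytes`, and the 8-byte buffer is an assumption.

```rust
/// Hypothetical helper: read the first `size` bytes of `src` as a
/// little-endian u64, zero-extending the upper bytes.
fn read_le_prefix(size: usize, src: &[u8]) -> u64 {
    // Evaluated only when debug assertions are enabled; compiled out of
    // ordinary release builds.
    debug_assert!(size <= src.len());
    let mut buf = [0u8; 8];
    // Both slice expressions are bounds-checked in every build profile, so an
    // out-of-range `size` still panics here without the explicit assertion.
    buf[..size].copy_from_slice(&src[..size]);
    u64::from_le_bytes(buf)
}

fn main() {
    let bytes = [0x01, 0x02, 0x03, 0x04];
    assert_eq!(read_le_prefix(2, &bytes), 0x0201);
    assert_eq!(read_le_prefix(4, &bytes), 0x0403_0201);
}
```

Because the slicing already enforces the same condition, dropping the release-mode `assert!` does not change whether bad input panics in this sketch; it only changes where the panic originates and what the message says.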
@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                             avoid-copy-bitreader-get-value         main
-----                                             ------------------------------         ----
arrow_reader_clickbench/async/Q1                  1.00   1092.0±9.07µs        ? ?/sec    1.00   1092.4±5.24µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.03      6.6±0.13ms        ? ?/sec    1.00      6.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.04      7.8±0.12ms        ? ?/sec    1.00      7.5±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.3±0.15ms        ? ?/sec    1.00     14.3±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.01     17.1±0.20ms        ? ?/sec    1.00     16.9±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.6±0.08ms        ? ?/sec    1.01     15.8±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.0±0.03ms        ? ?/sec    1.05      3.1±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.03     96.3±1.15ms        ? ?/sec    1.00     93.1±1.73ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.02    109.6±6.58ms        ? ?/sec    1.00    107.3±1.51ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    136.4±1.50ms        ? ?/sec    1.01    138.2±5.72ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.00    243.2±1.43ms        ? ?/sec    1.03    249.2±1.47ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.2±0.25ms        ? ?/sec    1.02     19.7±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.00     57.3±0.30ms        ? ?/sec    1.02     58.7±0.58ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     57.1±0.26ms        ? ?/sec    1.02     58.3±0.65ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.00     18.3±0.10ms        ? ?/sec    1.02     18.5±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     15.5±0.33ms        ? ?/sec    1.00     15.5±0.63ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.3±0.27ms        ? ?/sec    1.01     13.5±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     24.2±0.35ms        ? ?/sec    1.04     25.1±0.43ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.6±0.03ms        ? ?/sec    1.03      5.8±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      5.0±0.04ms        ? ?/sec    1.02      5.1±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.01   1080.1±5.19µs        ? ?/sec    1.00   1067.6±6.87µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.4±0.03ms        ? ?/sec    1.04      6.6±0.16ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.4±0.09ms        ? ?/sec    1.01      7.5±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.2±0.18ms        ? ?/sec    1.01     14.3±0.21ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.00     16.5±0.04ms        ? ?/sec    1.03     16.9±0.26ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.4±0.05ms        ? ?/sec    1.01     15.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.00      2.9±0.02ms        ? ?/sec    1.05      3.0±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.00     71.2±0.40ms        ? ?/sec    1.02     72.5±0.52ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.00     80.1±0.40ms        ? ?/sec    1.02     81.5±0.65ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.00     96.7±0.50ms        ? ?/sec    1.03    100.0±0.48ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    212.9±1.43ms        ? ?/sec    1.07    227.1±0.58ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     18.9±0.11ms        ? ?/sec    1.00     18.8±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.00     56.8±0.42ms        ? ?/sec    1.01     57.3±0.62ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.4±0.47ms        ? ?/sec    1.02     57.6±0.67ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     17.8±0.08ms        ? ?/sec    1.03     18.3±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.02     14.9±0.38ms        ? ?/sec    1.00     14.6±0.36ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.3±0.04ms        ? ?/sec    1.00      5.3±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     13.0±0.43ms        ? ?/sec    1.03     13.4±0.35ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     23.3±0.53ms        ? ?/sec    1.02     23.8±0.65ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.4±0.09ms        ? ?/sec    1.02      5.5±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.03ms        ? ?/sec    1.01      4.8±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.02ms        ? ?/sec    1.01      3.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    871.2±2.30µs        ? ?/sec    1.00    873.3±6.56µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.1±0.03ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.0±0.04ms        ? ?/sec    1.00      6.0±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.00     21.5±0.16ms        ? ?/sec    1.01     21.8±0.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     29.6±1.19ms        ? ?/sec    1.02     30.1±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     22.9±0.14ms        ? ?/sec    1.01     23.1±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.6±0.02ms        ? ?/sec    1.06      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.00    122.0±0.36ms        ? ?/sec    1.03    125.1±0.64ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.00     92.6±0.24ms        ? ?/sec    1.01     93.7±0.41ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.02    143.3±1.42ms        ? ?/sec    1.00    140.5±3.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   281.3±15.12ms        ? ?/sec    1.00   281.7±14.78ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     26.9±0.29ms        ? ?/sec    1.00     26.9±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.00    107.0±0.56ms        ? ?/sec    1.02    109.0±1.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    104.2±0.65ms        ? ?/sec    1.01    105.4±0.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.00     18.5±0.18ms        ? ?/sec    1.00     18.5±0.09ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.03     22.7±0.33ms        ? ?/sec    1.00     22.1±0.20ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.02ms        ? ?/sec    1.00      6.9±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.5±0.20ms        ? ?/sec    1.00     11.5±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.02     21.0±0.32ms        ? ?/sec    1.00     20.5±0.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.1±0.02ms        ? ?/sec    1.02      5.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.02ms        ? ?/sec    1.00      5.6±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.3±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
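
For reading the table above: the number in front of each timing appears to be that side's mean time divided by the faster of the two means, so the faster side shows 1.00 and the other shows how many times slower it was. This is an assumption based on the usual critcmp-style layout, not something the benchmark output states. The helper name `relative_ratios` is hypothetical. A minimal sketch that checks this against the `async_object_store/Q23` row:

```rust
/// Hypothetical helper: given the two mean times of one row, return each
/// time divided by the faster one (assumed meaning of the ratio columns).
fn relative_ratios(left_ms: f64, right_ms: f64) -> (f64, f64) {
    let fastest = left_ms.min(right_ms);
    (left_ms / fastest, right_ms / fastest)
}

fn main() {
    // Means taken from the async_object_store/Q23 row: 212.9 ms and 227.1 ms.
    let (left, right) = relative_ratios(212.9, 227.1);
    // Two decimals, matching the precision used in the table.
    println!("{:.2} {:.2}", left, right);
}
```

Because the printed means are themselves rounded, recomputing a ratio from them can differ from the table in the last digit on rows with small values (for example the 2.9 ms vs 3.0 ms rows).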

Resource Usage

base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 786.1s |
| Peak memory | 3.1 GiB |
| Avg memory | 3.0 GiB |
| CPU user | 704.6s |
| CPU sys | 81.5s |
| Peak spill | 0 B |

branch

| Metric | Value |
|---|---|
| Wall time | 785.8s |
| Peak memory | 3.3 GiB |
| Avg memory | 3.1 GiB |
| CPU user | 711.0s |
| CPU sys | 75.2s |
| Peak spill | 0 B |

@Dandandan Dandandan merged commit 6f02fcf into apache:main Apr 7, 2026
19 checks passed
@Dandandan
Contributor Author

Thanks @alamb improving Parquet decoding perf piece by piece 🚀

@alamb
Contributor

alamb commented Apr 7, 2026

> Thanks @alamb improving Parquet decoding perf piece by piece 🚀

Indeed -- step by step we'll have this thing screaming

Labels

parquet Changes to the parquet crate performance
