perf(parquet): LevelInfoBuilder batch write when no repetition childs#10037

Open
mapleFU wants to merge 3 commits into apache:main from mapleFU:minor-optimize-batching

Member

@mapleFU mapleFU commented May 29, 2026

Which issue does this PR close?

Rationale for this change

The Parquet writer writes list elements one by one, which is extremely slow. This patch batches the writes.

What changes are included in this PR?

Batches the writes when writing a list at the maximum repetition level, that is, a list whose child has no nested repetition.

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No

github-actions bot added the parquet (Changes to the parquet crate) label on May 29, 2026
/// Returns `true` if the child contains no nested repetition levels, meaning
/// each child element produces exactly one rep_level entry in the leaf.
/// This is true for `Primitive` children and `Struct` trees with no list descendants.
fn child_has_no_nested_rep(&self) -> bool {
Member Author

This could be stored as a member of the List element, which would avoid querying the nested children on every call.
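
A minimal sketch of that suggestion, assuming a simplified tree. `Node`, `ListNode`, and `has_no_nested_rep` are hypothetical stand-ins for illustration, not the parquet crate's actual `LevelInfoBuilder` types. The check walks the child tree once at construction and the list keeps the result as a field:

```rust
/// Hypothetical, simplified stand-in for the writer's level-builder tree.
enum Node {
    Primitive,
    List(Box<Node>),
    Struct(Vec<Node>),
}

impl Node {
    /// True when no list appears at or below this node, so each element
    /// contributes exactly one rep_level entry per leaf.
    fn has_no_nested_rep(&self) -> bool {
        match self {
            Node::Primitive => true,
            Node::List(_) => false,
            Node::Struct(children) => children.iter().all(|c| c.has_no_nested_rep()),
        }
    }
}

/// A list node that caches the answer instead of re-walking the tree.
struct ListNode {
    #[allow(dead_code)] // kept only to show where the child would live
    child: Node,
    child_has_no_nested_rep: bool,
}

impl ListNode {
    fn new(child: Node) -> Self {
        // Computed once here; later writes only read the cached flag.
        let child_has_no_nested_rep = child.has_no_nested_rep();
        Self { child, child_has_no_nested_rep }
    }
}
```

With this shape, a `List<Struct<a: List<i32>, b: i32>>` child yields `false`, while `List<i32>` and `List<Struct<a: i32, b: i32>>` yield `true`.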

// contiguous non-empty list slots into a single child.write() call, then
// fix up the rep_levels at list-slot boundaries using offsets directly.
if child.child_has_no_nested_rep() {
Self::write_list_last_level(child, ctx, offsets, nulls, range);
Member Author

This doesn't work well for a case like List<Struct<a: List<i32>, b: i32, ...>>: once any child has a repetition level, the writes cannot be batched and performance suffers.


// Classify each slot then detect run boundaries. On each transition
// (or end of iteration), flush the completed run.
#[derive(Clone, Copy, PartialEq)]
Member Author

I think the code here is much cleaner...

Member

group                                              main                                   minor-optimize-batching
-----                                              ----                                   -----------------------
list_primitive_sparse_99pct_null/bloom_filter      1.00     10.9±0.05ms     3.4 GB/sec    1.19     13.0±0.04ms     2.8 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.7±0.11ms  1645.6 MB/sec    1.09     24.7±0.07ms  1510.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.6±0.08ms     3.5 GB/sec    1.21     12.8±0.65ms     2.9 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.6±0.12ms     3.4 GB/sec    1.19     12.7±0.02ms     2.9 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.5±0.11ms     2.9 GB/sec    1.16     14.5±0.03ms     2.5 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.8±0.08ms     3.4 GB/sec    1.18     12.8±0.03ms     2.8 GB/sec

I also reproduced this regression in my local environment. When I changed this back to the previous plain if-else chain, the regression seemed to go away. I wonder if the three-value enum breaks a tight if(false) loop that could otherwise be optimized or executed better 🥲
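
For readers following the thread, the run detection under discussion can be sketched roughly as below. This is an illustration only, not the PR's code: `SlotKind`, `classify`, and `runs` are hypothetical names, and validity is modeled as a plain `&[bool]` rather than an Arrow null buffer. It shows the shape of "classify each slot, flush on each transition" and says nothing about which variant the compiler optimizes better.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SlotKind {
    Null,
    Empty,
    NonEmpty,
}

/// Classifies list slot `i` from its offsets and optional validity.
fn classify(offsets: &[i32], validity: Option<&[bool]>, i: usize) -> SlotKind {
    if validity.map_or(false, |v| !v[i]) {
        SlotKind::Null
    } else if offsets[i] == offsets[i + 1] {
        SlotKind::Empty
    } else {
        SlotKind::NonEmpty
    }
}

/// Groups consecutive slots of the same kind into `(kind, start, end)` runs,
/// with `end` exclusive.
fn runs(offsets: &[i32], validity: Option<&[bool]>) -> Vec<(SlotKind, usize, usize)> {
    let n = offsets.len().saturating_sub(1);
    let mut out = Vec::new();
    let mut start = 0;
    for i in 1..=n {
        // Flush on a kind transition or at the end of iteration.
        // The `i == n` check short-circuits, so slot `n` is never classified.
        if i == n || classify(offsets, validity, i) != classify(offsets, validity, start) {
            out.push((classify(offsets, validity, start), start, i));
            start = i;
        }
    }
    out
}
```

In a 99%-null column nearly every slot is `Null`, so this loop's per-slot classification cost dominates, which is consistent with the sparse benchmark being the one that regressed.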

Member Author

oops, so this path can be optimized

Contributor

Copilot AI left a comment

Pull request overview

Adds a fast path to LevelInfoBuilder::write_list that batches contiguous non-empty list slots into a single child write when the list child has no nested repetition levels (i.e., primitive children or struct trees of primitives). Previously, non-empty list slots were written one at a time with a reverse scan to mark list boundaries; the new path writes all leaf values in one call and stamps list-start markers at positions computed directly from offsets. This also benefits Map arrays (which internally dispatch to write_list) when their entries are primitive/struct.

Changes:

  • New helper child_has_no_nested_rep() to detect whether the list's child path contains list-like nesting.
  • New write_list_last_level() that classifies slots into Null/Empty/NonEmpty runs and emits each kind in a batch.
  • write_list dispatches to the new fast path when applicable.
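
The "write once, then stamp list starts from offsets" idea in the overview can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: it handles only a run of non-null, non-empty slots over a single primitive leaf, ignores definition levels, and assumes the list-start marker is `list_rep - 1`. The function names and the `list_rep` parameter are hypothetical.

```rust
/// Slot-at-a-time: emit repetition levels for each list separately.
fn rep_levels_per_slot(offsets: &[usize], list_rep: i16) -> Vec<i16> {
    let mut out = Vec::new();
    for w in offsets.windows(2) {
        for j in w[0]..w[1] {
            // The first element starts a new list; the rest continue it.
            out.push(if j == w[0] { list_rep - 1 } else { list_rep });
        }
    }
    out
}

/// Batched: emit the whole run at the continuation level in one go, then
/// stamp the list-start marker at each position given by the offsets.
/// Precondition: `offsets` has at least one entry and every slot is non-empty.
fn rep_levels_batched(offsets: &[usize], list_rep: i16) -> Vec<i16> {
    assert!(offsets.windows(2).all(|w| w[0] < w[1]), "slots must be non-empty");
    let base = offsets[0];
    let total = offsets[offsets.len() - 1] - base;
    let mut out = vec![list_rep; total];
    for &start in &offsets[..offsets.len() - 1] {
        // Offsets may not start at zero (sliced arrays), so rebase them.
        out[start - base] = list_rep - 1;
    }
    out
}
```

Both functions produce the same levels; the batched form replaces one inner write per slot with a single fill plus one store per slot, which is the kind of saving the `list_primitive` benchmarks would reflect.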

Contributor

etseidl commented May 29, 2026

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4576955840-369-qsnrm 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing minor-optimize-batching (dd6d44e) to 1377761 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   minor-optimize-batching
-----                                              ----                                   -----------------------
bool/bloom_filter                                  1.00     13.0±0.03ms    19.2 MB/sec    1.01     13.2±0.05ms    19.0 MB/sec
bool/cdc                                           1.00     15.7±0.05ms    15.9 MB/sec    1.02     16.1±0.05ms    15.6 MB/sec
bool/default                                       1.00     10.9±0.02ms    22.9 MB/sec    1.01     11.0±0.05ms    22.7 MB/sec
bool/parquet_2                                     1.00     14.7±0.04ms    17.0 MB/sec    1.01     14.8±0.04ms    16.9 MB/sec
bool/zstd                                          1.00     11.4±0.03ms    21.9 MB/sec    1.01     11.5±0.05ms    21.8 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.03ms    16.6 MB/sec    1.01     15.2±0.06ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.02ms    17.9 MB/sec    1.00      7.0±0.02ms    17.9 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.05ms    18.5 MB/sec    1.00      6.8±0.03ms    18.5 MB/sec
bool_non_null/default                              1.00      4.2±0.02ms    29.6 MB/sec    1.00      4.2±0.02ms    29.6 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    14.0 MB/sec    1.01      9.1±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.3 MB/sec    1.00      4.6±0.02ms    27.3 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.4±0.02ms    13.4 MB/sec    1.01      9.5±0.04ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     92.8±0.24ms   150.8 MB/sec    1.00     93.0±0.24ms   150.5 MB/sec
float_with_nans/cdc                                1.00     81.5±0.16ms   171.9 MB/sec    1.00     81.7±0.17ms   171.4 MB/sec
float_with_nans/default                            1.00     74.2±0.15ms   188.8 MB/sec    1.00     74.1±0.14ms   188.9 MB/sec
float_with_nans/parquet_2                          1.00     94.9±0.39ms   147.6 MB/sec    1.00     94.5±0.29ms   148.2 MB/sec
float_with_nans/zstd                               1.00    112.0±0.31ms   125.0 MB/sec    1.00    112.0±0.17ms   125.0 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.5±0.23ms   106.4 MB/sec    1.00    131.7±0.24ms   106.3 MB/sec
large_string_non_null/bloom_filter                 1.01     83.6±0.15ms     3.0 GB/sec    1.00     82.7±0.15ms     3.0 GB/sec
large_string_non_null/cdc                          1.00    244.1±1.16ms  1048.8 MB/sec    1.00    243.8±1.42ms  1049.9 MB/sec
large_string_non_null/default                      1.01     63.9±0.14ms     3.9 GB/sec    1.00     63.2±0.15ms     4.0 GB/sec
large_string_non_null/parquet_2                    1.01     63.9±0.16ms     3.9 GB/sec    1.00     63.2±0.14ms     4.0 GB/sec
large_string_non_null/zstd                         1.01     64.0±0.15ms     3.9 GB/sec    1.00     63.2±0.14ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2               1.01     63.9±0.13ms     3.9 GB/sec    1.00     63.3±0.17ms     4.0 GB/sec
list_primitive/bloom_filter                        1.09    354.3±1.16ms  1539.3 MB/sec    1.00    326.3±1.15ms  1671.4 MB/sec
list_primitive/cdc                                 1.11    372.2±9.28ms  1465.2 MB/sec    1.00    334.8±9.19ms  1629.1 MB/sec
list_primitive/default                             1.14    275.6±3.64ms  1978.7 MB/sec    1.00   242.8±10.96ms     2.2 GB/sec
list_primitive/parquet_2                           1.15    296.4±0.46ms  1839.7 MB/sec    1.00    258.0±6.18ms     2.1 GB/sec
list_primitive/zstd                                1.06    514.8±2.45ms  1059.5 MB/sec    1.00    486.1±2.77ms  1122.0 MB/sec
list_primitive/zstd_parquet_2                      1.07    499.6±0.47ms  1091.6 MB/sec    1.00    468.5±5.80ms  1164.1 MB/sec
list_primitive_non_null/bloom_filter               1.05   442.2±10.12ms  1230.7 MB/sec    1.00    421.6±8.76ms  1290.8 MB/sec
list_primitive_non_null/cdc                        1.06   435.1±10.09ms  1250.9 MB/sec    1.00    409.8±7.44ms  1328.1 MB/sec
list_primitive_non_null/default                    1.08   308.4±13.91ms  1764.7 MB/sec    1.00   284.4±13.46ms  1913.9 MB/sec
list_primitive_non_null/parquet_2                  1.23    357.1±4.16ms  1524.1 MB/sec    1.00   290.7±34.74ms  1872.1 MB/sec
list_primitive_non_null/zstd                       1.01   697.5±13.36ms   780.3 MB/sec    1.00   690.0±18.66ms   788.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.07   710.4±18.01ms   766.1 MB/sec    1.00    664.5±1.92ms   819.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     10.9±0.05ms     3.4 GB/sec    1.19     13.0±0.04ms     2.8 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.7±0.11ms  1645.6 MB/sec    1.09     24.7±0.07ms  1510.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.6±0.08ms     3.5 GB/sec    1.21     12.8±0.65ms     2.9 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.6±0.12ms     3.4 GB/sec    1.19     12.7±0.02ms     2.9 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.5±0.11ms     2.9 GB/sec    1.16     14.5±0.03ms     2.5 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.8±0.08ms     3.4 GB/sec    1.18     12.8±0.03ms     2.8 GB/sec
primitive/bloom_filter                             1.00    150.5±0.33ms   298.2 MB/sec    1.00    149.9±0.58ms   299.3 MB/sec
primitive/cdc                                      1.00    158.7±0.51ms   282.8 MB/sec    1.00    158.6±0.65ms   282.9 MB/sec
primitive/default                                  1.00    118.1±0.26ms   380.1 MB/sec    1.00    118.1±0.43ms   380.0 MB/sec
primitive/parquet_2                                1.00    132.9±0.19ms   337.6 MB/sec    1.00    133.1±0.28ms   337.1 MB/sec
primitive/zstd                                     1.00    147.6±0.25ms   304.0 MB/sec    1.00    147.7±0.41ms   303.9 MB/sec
primitive/zstd_parquet_2                           1.00    166.3±0.20ms   269.8 MB/sec    1.00    166.4±0.30ms   269.7 MB/sec
primitive_all_null/bloom_filter                    1.00    893.1±1.91µs    49.1 GB/sec    1.00    889.3±2.40µs    49.3 GB/sec
primitive_all_null/cdc                             1.00     19.1±0.38ms     2.3 GB/sec    1.00     19.2±0.26ms     2.3 GB/sec
primitive_all_null/default                         1.00    273.0±0.74µs   160.6 GB/sec    1.00    272.5±0.77µs   160.8 GB/sec
primitive_all_null/parquet_2                       1.01    280.7±0.91µs   156.1 GB/sec    1.00    279.1±1.15µs   157.0 GB/sec
primitive_all_null/zstd                            1.01    390.4±0.72µs   112.3 GB/sec    1.00    385.8±0.72µs   113.6 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    357.2±0.85µs   122.7 GB/sec    1.00    356.7±1.07µs   122.9 GB/sec
primitive_non_null/bloom_filter                    1.02    109.1±0.47ms   403.2 MB/sec    1.00    106.8±0.40ms   412.1 MB/sec
primitive_non_null/cdc                             1.00     90.1±0.26ms   488.3 MB/sec    1.00     90.0±0.36ms   489.0 MB/sec
primitive_non_null/default                         1.01     67.3±0.18ms   653.4 MB/sec    1.00     66.8±0.22ms   658.7 MB/sec
primitive_non_null/parquet_2                       1.00     88.9±0.15ms   494.8 MB/sec    1.00     88.6±0.28ms   496.4 MB/sec
primitive_non_null/zstd                            1.00     98.1±0.15ms   448.7 MB/sec    1.00     97.7±0.25ms   450.3 MB/sec
primitive_non_null/zstd_parquet_2                  1.00    122.9±0.15ms   358.1 MB/sec    1.00    122.4±0.31ms   359.5 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     11.8±0.06ms     3.7 GB/sec    1.00     11.9±0.06ms     3.7 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     29.2±0.32ms  1538.5 MB/sec    1.00     29.3±0.29ms  1531.4 MB/sec
primitive_sparse_99pct_null/default                1.00     10.5±0.03ms     4.2 GB/sec    1.00     10.5±0.04ms     4.2 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     10.5±0.03ms     4.2 GB/sec    1.01     10.6±0.10ms     4.1 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     13.8±0.04ms     3.2 GB/sec    1.00     13.8±0.04ms     3.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     12.4±0.04ms     3.5 GB/sec    1.00     12.4±0.04ms     3.5 GB/sec
short_string_non_null/bloom_filter                 1.02     28.3±0.07ms   423.9 MB/sec    1.00     27.7±0.05ms   433.1 MB/sec
short_string_non_null/cdc                          1.00     19.9±0.06ms   604.2 MB/sec    1.00     19.8±0.07ms   606.1 MB/sec
short_string_non_null/default                      1.01     15.7±0.04ms   764.0 MB/sec    1.00     15.6±0.04ms   768.8 MB/sec
short_string_non_null/parquet_2                    1.00     25.6±0.07ms   468.5 MB/sec    1.00     25.6±0.04ms   468.7 MB/sec
short_string_non_null/zstd                         1.00     35.4±0.10ms   339.5 MB/sec    1.00     35.2±0.09ms   340.6 MB/sec
short_string_non_null/zstd_parquet_2               1.01     28.4±0.05ms   422.2 MB/sec    1.00     28.3±0.04ms   424.5 MB/sec
string/bloom_filter                                1.04   223.1±22.32ms     2.3 GB/sec    1.00    214.2±9.41ms     2.4 GB/sec
string/cdc                                         1.00    215.8±6.79ms     2.4 GB/sec    1.01    218.0±9.41ms     2.3 GB/sec
string/default                                     1.04   122.7±19.05ms     4.2 GB/sec    1.00   117.5±12.23ms     4.4 GB/sec
string/parquet_2                                   1.00    108.8±5.33ms     4.7 GB/sec    1.14    123.7±2.20ms     4.1 GB/sec
string/zstd                                        1.00    418.0±1.56ms  1254.1 MB/sec    1.07   447.0±18.64ms  1172.7 MB/sec
string/zstd_parquet_2                              1.00   411.2±11.80ms  1275.0 MB/sec    1.00   412.7±15.69ms  1270.3 MB/sec
string_and_binary_view/bloom_filter                1.00     65.1±0.50ms   495.4 MB/sec    1.01     66.0±0.40ms   488.7 MB/sec
string_and_binary_view/cdc                         1.00     58.5±0.13ms   551.2 MB/sec    1.01     59.2±0.15ms   544.9 MB/sec
string_and_binary_view/default                     1.00     48.6±0.10ms   663.9 MB/sec    1.00     48.4±0.11ms   665.7 MB/sec
string_and_binary_view/parquet_2                   1.00     59.2±0.14ms   544.6 MB/sec    1.01     59.9±0.18ms   538.0 MB/sec
string_and_binary_view/zstd                        1.00     84.5±0.18ms   381.6 MB/sec    1.01     85.0±0.13ms   379.3 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.8±0.14ms   442.9 MB/sec    1.01     73.7±0.15ms   437.6 MB/sec
string_dictionary/bloom_filter                     1.00     89.9±0.65ms     2.9 GB/sec    1.03     92.3±0.51ms     2.8 GB/sec
string_dictionary/cdc                              1.00     53.2±0.50ms     4.8 GB/sec    1.01     53.8±1.26ms     4.8 GB/sec
string_dictionary/default                          1.00     48.9±0.40ms     5.3 GB/sec    1.02     49.8±0.43ms     5.2 GB/sec
string_dictionary/parquet_2                        1.00     54.1±0.19ms     4.8 GB/sec    1.01     54.7±0.31ms     4.7 GB/sec
string_dictionary/zstd                             1.00    209.0±0.85ms  1263.6 MB/sec    1.00    209.7±0.38ms  1259.5 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.2±0.11ms  1332.5 MB/sec    1.01    199.5±0.19ms  1323.9 MB/sec
string_non_null/bloom_filter                       1.00   250.0±14.83ms     2.0 GB/sec    1.02   253.8±16.58ms     2.0 GB/sec
string_non_null/cdc                                1.01   273.5±12.93ms  1915.6 MB/sec    1.00   270.7±11.05ms  1935.7 MB/sec
string_non_null/default                            1.00   134.0±13.52ms     3.8 GB/sec    1.00   133.4±13.85ms     3.8 GB/sec
string_non_null/parquet_2                          1.05    138.2±5.35ms     3.7 GB/sec    1.00    132.0±4.49ms     3.9 GB/sec
string_non_null/zstd                               1.00   565.3±13.06ms   926.9 MB/sec    1.00   564.7±14.28ms   927.9 MB/sec
string_non_null/zstd_parquet_2                     1.00    504.7±4.41ms  1038.3 MB/sec    1.02    515.8±5.89ms  1015.8 MB/sec
struct_all_null/bloom_filter                       1.01    374.8±1.10µs    42.0 GB/sec    1.00    372.5±1.36µs    42.3 GB/sec
struct_all_null/cdc                                1.00      7.8±0.11ms     2.0 GB/sec    1.01      7.9±0.18ms  2034.3 MB/sec
struct_all_null/default                            1.00    118.8±0.31µs   132.5 GB/sec    1.00    118.4±0.24µs   133.0 GB/sec
struct_all_null/parquet_2                          1.00    120.4±0.52µs   130.8 GB/sec    1.00    120.2±0.32µs   131.1 GB/sec
struct_all_null/zstd                               1.00    166.4±0.91µs    94.7 GB/sec    1.00    166.5±0.61µs    94.6 GB/sec
struct_all_null/zstd_parquet_2                     1.00    153.3±0.54µs   102.8 GB/sec    1.00    153.6±0.65µs   102.6 GB/sec
struct_non_null/bloom_filter                       1.02     46.7±0.14ms   342.3 MB/sec    1.00     45.8±0.17ms   349.3 MB/sec
struct_non_null/cdc                                1.01     45.5±0.17ms   351.5 MB/sec    1.00     44.9±0.18ms   356.0 MB/sec
struct_non_null/default                            1.02     32.1±0.13ms   498.6 MB/sec    1.00     31.4±0.15ms   509.0 MB/sec
struct_non_null/parquet_2                          1.02     40.9±0.11ms   391.4 MB/sec    1.00     40.2±0.15ms   398.2 MB/sec
struct_non_null/zstd                               1.01     40.7±0.11ms   393.3 MB/sec    1.00     40.4±0.15ms   395.8 MB/sec
struct_non_null/zstd_parquet_2                     1.01     54.9±0.11ms   291.5 MB/sec    1.00     54.5±0.17ms   293.5 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      6.4±0.05ms     2.5 GB/sec    1.00      6.4±0.02ms     2.5 GB/sec
struct_sparse_99pct_null/cdc                       1.00     13.2±0.10ms  1222.0 MB/sec    1.01     13.3±0.08ms  1208.9 MB/sec
struct_sparse_99pct_null/default                   1.00      5.9±0.03ms     2.7 GB/sec    1.00      5.9±0.01ms     2.7 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      5.9±0.03ms     2.7 GB/sec    1.00      5.9±0.01ms     2.7 GB/sec
struct_sparse_99pct_null/zstd                      1.00      7.2±0.04ms     2.2 GB/sec    1.00      7.2±0.02ms     2.2 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      6.6±0.03ms     2.4 GB/sec    1.00      6.6±0.01ms     2.4 GB/sec

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      2090.5s
Peak memory    6.8 GiB
Avg memory     6.6 GiB
CPU user       2018.6s
CPU sys        71.2s
Peak spill     0 B

branch

Metric         Value
------         -----
Wall time      2050.4s
Peak memory    6.8 GiB
Avg memory     6.6 GiB
CPU user       1971.4s
CPU sys        74.3s
Peak spill     0 B

@alamb
Contributor

alamb commented May 29, 2026

#10037 (comment)

Those look like nice improvements for list writing 🚀

group                                              main                                   minor-optimize-batching
-----                                              ----                                   -----------------------
...
list_primitive/bloom_filter                        1.09    354.3±1.16ms  1539.3 MB/sec    1.00    326.3±1.15ms  1671.4 MB/sec
list_primitive/cdc                                 1.11    372.2±9.28ms  1465.2 MB/sec    1.00    334.8±9.19ms  1629.1 MB/sec
list_primitive/default                             1.14    275.6±3.64ms  1978.7 MB/sec    1.00   242.8±10.96ms     2.2 GB/sec
list_primitive/parquet_2                           1.15    296.4±0.46ms  1839.7 MB/sec    1.00    258.0±6.18ms     2.1 GB/sec
list_primitive/zstd                                1.06    514.8±2.45ms  1059.5 MB/sec    1.00    486.1±2.77ms  1122.0 MB/sec
list_primitive/zstd_parquet_2                      1.07    499.6±0.47ms  1091.6 MB/sec    1.00    468.5±5.80ms  1164.1 MB/sec
list_primitive_non_null/bloom_filter               1.05   442.2±10.12ms  1230.7 MB/sec    1.00    421.6±8.76ms  1290.8 MB/sec
list_primitive_non_null/cdc                        1.06   435.1±10.09ms  1250.9 MB/sec    1.00    409.8±7.44ms  1328.1 MB/sec
list_primitive_non_null/default                    1.08   308.4±13.91ms  1764.7 MB/sec    1.00   284.4±13.46ms  1913.9 MB/sec
list_primitive_non_null/parquet_2                  1.23    357.1±4.16ms  1524.1 MB/sec    1.00   290.7±34.74ms  1872.1 MB/sec
list_primitive_non_null/zstd                       1.01   697.5±13.36ms   780.3 MB/sec    1.00   690.0±18.66ms   788.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.07   710.4±18.01ms   766.1 MB/sec    1.00    664.5±1.92ms   819.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     10.9±0.05ms     3.4 GB/sec    1.19     13.0±0.04ms     2.8 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.7±0.11ms  1645.6 MB/sec    1.09     24.7±0.07ms  1510.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.6±0.08ms     3.5 GB/sec    1.21     12.8±0.65ms     2.9 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.6±0.12ms     3.4 GB/sec    1.19     12.7±0.02ms     2.9 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.5±0.11ms     2.9 GB/sec    1.16     14.5±0.03ms     2.5 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.8±0.08ms     3.4 GB/sec    1.18     12.8±0.03ms     2.8 GB/sec
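
As a reading aid for the table above: the ratio columns appear to be each side's mean time divided by the faster of the two, so `1.00` marks the faster side. The sketch below reproduces that arithmetic from the displayed means. `relative` and `round2` are hypothetical helpers, and because the displayed means are already rounded, a recomputed ratio could differ from the reported one in the last digit.

```rust
/// Returns (ratio for `main`, ratio for the branch), each relative to the
/// faster of the two mean times.
fn relative(main_ms: f64, branch_ms: f64) -> (f64, f64) {
    let best = main_ms.min(branch_ms);
    (main_ms / best, branch_ms / best)
}

/// Rounds to two decimal places, matching the table's display precision.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For `list_primitive/default`, `relative(275.6, 242.8)` gives about 1.135 for `main`, shown as 1.14, so the branch is faster. For `list_primitive_sparse_99pct_null/default`, `relative(10.6, 12.8)` gives about 1.208 for the branch, shown as 1.21, which is the regression discussed earlier in the review.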

Labels

parquet (Changes to the parquet crate), performance

Development

Successfully merging this pull request may close these issues.

perf: parquet LevelInfoBuilder::write_list can be optimized?

6 participants