fix(parquet): bound data page byte size for large variable-width values#9972

Open
adriangb wants to merge 8 commits into
apache:mainfrom
pydantic:parquet-page-size-mid-batch

@adriangb
Contributor

@adriangb adriangb commented May 14, 2026

We write large values into our Parquet files (e.g. a 5 MB LLM prompt). With default writer settings, a naive write produces massive pages (we've seen up to 2 GB). The main knob for controlling this is write_batch_size, which defaults to 1024 rows; at 5 MB per row, a single batch buffers about 5 GB before any page-size check runs. On the other hand, setting it to something small like 32 kills write performance and is unnecessary for fixed-width columns.

The writer even documents this (parquet/src/column/writer/mod.rs):

We check for DataPage limits only after we have inserted the values. If a user writes a large number of values, the DataPage size can be well above the limit.

This PR makes the mini-batch size byte-budget aware:

  • For each chunk, compute bytes_per_value from the values about to be written and pick sub_batch_size = page_byte_limit / bytes_per_value (clamped ≥ 1).
  • For typical small values — numeric columns, short strings — sub_batch_size ≥ chunk size, so we stay on the existing batched fast path with zero behavior change.
  • Only when individual values are large enough that a full chunk would blow the page does the sub-batch shrink — to one row per mini-batch in the limit, matching the format minimum of one record per page.
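
As a rough illustration of the bullets above, the sub-batch computation can be sketched as follows. This is a minimal sketch, not the PR's code: the function name `sub_batch_size`, the use of an average over the chunk, and the 1 MiB limit in the usage note are assumptions made for the example.

```rust
/// Hypothetical helper (not the PR's actual code): how many values to put in
/// one mini-batch so that a mini-batch stays near `page_byte_limit`.
fn sub_batch_size(value_lens: &[usize], page_byte_limit: usize) -> usize {
    if value_lens.is_empty() {
        return 1;
    }
    // Average bytes per value over the chunk that is about to be written.
    let total: usize = value_lens.iter().sum();
    let bytes_per_value = (total / value_lens.len()).max(1);
    // Clamp to at least one value: a page must hold at least one record.
    (page_byte_limit / bytes_per_value).max(1)
}
```

With an assumed 1 MiB page limit, 1024 values of 8 bytes give a sub-batch size far above the chunk length, so the chunk is written in one go; 1024 values of 5 MiB give a sub-batch size of 1.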

Implementation notes

Skip the byte-size check while parquet dictionary encoding is active: estimated_value_bytes returns plain-encoded size but a dict-encoded data page only stores small RLE indices, so the estimate would spuriously shrink pages. Dict fallback bounds dict-encoded pages independently.
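
To make the mismatch concrete, here is a back-of-the-envelope sketch comparing the plain-encoded estimate with the approximate size of dictionary indices. All function names are hypothetical, and the index size is only an approximation that ignores RLE run headers and any run-length savings.

```rust
/// Plain-encoded BYTE_ARRAY size: each value carries a 4-byte length prefix.
fn plain_bytes(lens: &[usize]) -> usize {
    lens.iter().map(|len| len + 4).sum()
}

/// Bits needed to represent the largest index into a dictionary.
fn bit_width(num_dict_entries: usize) -> u32 {
    if num_dict_entries <= 1 {
        0
    } else {
        usize::BITS - (num_dict_entries - 1).leading_zeros()
    }
}

/// Approximate bytes of bit-packed indices (assumption: no RLE runs, no headers).
fn approx_index_bytes(num_values: usize, num_dict_entries: usize) -> usize {
    (num_values * bit_width(num_dict_entries) as usize + 7) / 8
}
```

For 1024 values of 64 KiB drawn from 16 distinct entries, the plain estimate is about 64 MiB while the indices take roughly 512 bytes, which is why a plain-size budget check would cut dictionary-encoded pages far too early.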

For repeated/nested columns the sub-batch steps record-by-record (rep == 0 boundaries) so a record never spans data pages, matching the parquet format rule.
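
The record-by-record stepping can be pictured with a small sketch over repetition levels. The helper name `record_ranges` is hypothetical, and the sketch assumes the chunk starts at a record boundary.

```rust
/// Returns half-open level ranges `[start, end)`, one per record, given
/// repetition levels in which a level of 0 marks the start of a new record.
/// Assumes the slice itself begins at a record boundary.
fn record_ranges(rep_levels: &[i16]) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for (i, &rep) in rep_levels.iter().enumerate() {
        // A rep level of 0 after the first position closes the previous record.
        if rep == 0 && i > start {
            ranges.push((start, i));
            start = i;
        }
    }
    if start < rep_levels.len() {
        ranges.push((start, rep_levels.len()));
    }
    ranges
}
```

Writing one mini-batch per returned range means a page-size check can only fire between records, never inside one.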

Regression test

test_column_writer_caps_page_size_for_large_byte_array_values writes 64 × 64 KiB BYTE_ARRAY values with a 16 KiB page byte limit. Before this fix that produced a single ~4 MiB page; after, it's one page per value (~64 pages, all within ~2× the value size).
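
The before/after page counts can be reproduced with a toy model of the writer loop. This is a simplified simulation, not the actual column writer: `simulate_pages` is a hypothetical name, it only checks the limit after each mini-batch, and it ignores the 4-byte length prefix and level data.

```rust
/// Toy model: buffer one mini-batch at a time and consult the page byte limit
/// only after the whole mini-batch has been buffered. Returns page sizes.
fn simulate_pages(value_lens: &[usize], mini_batch: usize, page_byte_limit: usize) -> Vec<usize> {
    let mut pages = Vec::new();
    let mut buffered = 0usize;
    for batch in value_lens.chunks(mini_batch.max(1)) {
        buffered += batch.iter().sum::<usize>();
        // The limit is only checked here, after the batch is already buffered.
        if buffered >= page_byte_limit {
            pages.push(buffered);
            buffered = 0;
        }
    }
    if buffered > 0 {
        pages.push(buffered);
    }
    pages
}
```

With 64 values of 64 KiB and a 16 KiB limit, a mini-batch of 1024 rows yields a single 4 MiB page in this model, while a mini-batch of one row yields 64 pages of 64 KiB each.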

Bench results

5-run medians, criterion arrow_writer bench, default writer properties, on a noisy laptop (run-to-run variance ~±1.6%):

bench                                              Δ vs main
-----                                              ---------
primitive/default (i32 25% null)                   −1.0%
primitive_non_null/default                         −0.0%
bool_non_null/default                              −1.2%
string/default                                     +0.6%
short_string_non_null/default (new, 1M × 8 B)      +0.2%
large_string_non_null/default (new, 1024 × 256 KiB) +1.2%
string_non_null/default                            −2.1%
string_dictionary/default                          +0.4%
list_primitive/default                             +0.5%
list_primitive_non_null/default                    +0.1%

🤖 Generated with Claude Code

`short_string_non_null` writes 1M 8-byte strings — exercises the
BYTE_ARRAY write path where per-value bookkeeping cost is largest.
`large_string_non_null` writes 1024 rows of 256 KiB strings — the case
where individual values exceed the default data-page byte limit, so a
default `write_batch_size`-row chunk would otherwise buffer hundreds of
MiB before any page-size check fires.

Both fill gaps in the existing arrow_writer benches, which only cover
random-length strings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot added the parquet Changes to the parquet crate label May 14, 2026
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangb adriangb force-pushed the parquet-page-size-mid-batch branch from 393ead0 to 4823429 Compare May 14, 2026 04:21
@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4447473325-58-pzct6 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (4823429) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.1±0.03ms    19.2 MB/sec    1.01     13.2±0.03ms    18.9 MB/sec
bool/cdc                                           1.00     15.7±0.05ms    16.0 MB/sec    1.03     16.1±0.07ms    15.5 MB/sec
bool/default                                       1.00     11.0±0.02ms    22.8 MB/sec    1.01     11.1±0.03ms    22.5 MB/sec
bool/parquet_2                                     1.00     14.7±0.04ms    17.0 MB/sec    1.01     14.8±0.03ms    16.9 MB/sec
bool/zstd                                          1.00     11.5±0.03ms    21.8 MB/sec    1.01     11.6±0.03ms    21.5 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.04ms    16.6 MB/sec    1.01     15.2±0.05ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.03ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.03ms    18.4 MB/sec    1.01      6.9±0.03ms    18.2 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.4 MB/sec    1.02      4.3±0.02ms    28.8 MB/sec
bool_non_null/parquet_2                            1.01      9.1±0.04ms    13.8 MB/sec    1.00      9.0±0.03ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.02      4.7±0.02ms    26.6 MB/sec
bool_non_null/zstd_parquet_2                       1.01      9.5±0.03ms    13.2 MB/sec    1.00      9.4±0.03ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.00     91.9±0.45ms   152.3 MB/sec    1.03     95.0±0.49ms   147.4 MB/sec
float_with_nans/cdc                                1.00     81.2±0.33ms   172.4 MB/sec    1.02     82.7±0.17ms   169.4 MB/sec
float_with_nans/default                            1.00     74.0±0.32ms   189.3 MB/sec    1.03     76.3±0.28ms   183.4 MB/sec
float_with_nans/parquet_2                          1.00     93.7±0.44ms   149.4 MB/sec    1.01     94.8±0.26ms   147.7 MB/sec
float_with_nans/zstd                               1.00    111.5±0.25ms   125.5 MB/sec    1.02    114.2±0.26ms   122.6 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.7±0.83ms   106.3 MB/sec    1.00    131.8±0.19ms   106.2 MB/sec
large_string_non_null/bloom_filter                                                        1.00     78.3±0.17ms     3.2 GB/sec
large_string_non_null/cdc                                                                 1.00    241.5±1.40ms  1059.9 MB/sec
large_string_non_null/default                                                             1.00     59.9±0.14ms     4.2 GB/sec
large_string_non_null/parquet_2                                                           1.00     59.9±0.17ms     4.2 GB/sec
large_string_non_null/zstd                                                                1.00     60.2±0.60ms     4.2 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     60.0±0.29ms     4.2 GB/sec
list_primitive/bloom_filter                        1.00    321.4±1.04ms  1696.6 MB/sec    1.01    325.1±0.77ms  1677.4 MB/sec
list_primitive/cdc                                 1.01    362.7±4.79ms  1503.8 MB/sec    1.00    360.4±0.58ms  1513.3 MB/sec
list_primitive/default                             1.00    245.4±0.60ms     2.2 GB/sec    1.01    248.7±0.79ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    267.1±0.44ms  2041.6 MB/sec    1.01    270.4±1.01ms  2016.9 MB/sec
list_primitive/zstd                                1.00    495.4±0.86ms  1100.9 MB/sec    1.00    496.4±2.54ms  1098.7 MB/sec
list_primitive/zstd_parquet_2                      1.00    490.1±0.48ms  1112.9 MB/sec    1.01    494.1±0.92ms  1103.8 MB/sec
list_primitive_non_null/bloom_filter               1.00    426.6±3.62ms  1275.7 MB/sec    1.00    427.6±3.63ms  1272.8 MB/sec
list_primitive_non_null/cdc                        1.01    440.0±7.70ms  1236.8 MB/sec    1.00    434.8±8.76ms  1251.6 MB/sec
list_primitive_non_null/default                    1.00    287.9±2.90ms  1890.4 MB/sec    1.01    291.1±3.72ms  1869.5 MB/sec
list_primitive_non_null/parquet_2                  1.00   308.6±12.82ms  1763.4 MB/sec    1.05    323.0±9.12ms  1684.9 MB/sec
list_primitive_non_null/zstd                       1.00    714.5±3.78ms   761.7 MB/sec    1.00    712.8±5.58ms   763.6 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    683.0±0.52ms   796.8 MB/sec    1.00    686.0±0.81ms   793.4 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.1±0.22ms     3.3 GB/sec    1.02     11.3±0.02ms     3.2 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.6±0.18ms  1651.1 MB/sec    1.00     22.7±0.06ms  1648.7 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.8±0.06ms     3.4 GB/sec    1.02     11.0±0.09ms     3.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.8±0.07ms     3.4 GB/sec    1.02     11.0±0.02ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.6±0.09ms     2.9 GB/sec    1.01     12.8±0.02ms     2.9 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.03ms     3.4 GB/sec    1.02     11.1±0.07ms     3.3 GB/sec
primitive/bloom_filter                             1.00    147.4±0.75ms   304.6 MB/sec    1.03    151.3±0.39ms   296.6 MB/sec
primitive/cdc                                      1.00    158.2±0.59ms   283.7 MB/sec    1.02    160.7±0.64ms   279.2 MB/sec
primitive/default                                  1.00    117.5±0.77ms   382.0 MB/sec    1.02    119.7±0.47ms   375.0 MB/sec
primitive/parquet_2                                1.00    131.9±0.39ms   340.1 MB/sec    1.02    134.4±0.21ms   334.0 MB/sec
primitive/zstd                                     1.00    146.2±0.26ms   306.9 MB/sec    1.02    149.2±0.34ms   300.8 MB/sec
primitive/zstd_parquet_2                           1.00    165.0±0.33ms   271.9 MB/sec    1.02    167.9±0.38ms   267.2 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.15ms     3.8 GB/sec    1.00     11.5±0.17ms     3.8 GB/sec
primitive_all_null/cdc                             1.05     30.5±0.34ms  1469.9 MB/sec    1.00     29.2±0.33ms  1537.6 MB/sec
primitive_all_null/default                         1.00     10.9±0.10ms     4.0 GB/sec    1.01     10.9±0.11ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.9±0.14ms     4.0 GB/sec    1.00     10.9±0.11ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.15ms     4.0 GB/sec    1.00     11.0±0.12ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.23ms     4.0 GB/sec    1.00     11.1±0.24ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.04    110.6±1.27ms   397.8 MB/sec    1.00    106.2±0.53ms   414.5 MB/sec
primitive_non_null/cdc                             1.00     89.3±0.47ms   492.9 MB/sec    1.02     91.3±0.47ms   482.0 MB/sec
primitive_non_null/default                         1.00     67.1±0.31ms   655.8 MB/sec    1.02     68.6±0.50ms   641.6 MB/sec
primitive_non_null/parquet_2                       1.00     88.7±0.30ms   495.9 MB/sec    1.01     89.8±0.33ms   489.9 MB/sec
primitive_non_null/zstd                            1.04    103.9±0.21ms   423.7 MB/sec    1.00     99.6±0.49ms   441.6 MB/sec
primitive_non_null/zstd_parquet_2                  1.04    128.8±1.61ms   341.5 MB/sec    1.00    123.6±0.32ms   356.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.01     18.1±0.21ms     2.4 GB/sec    1.00     18.0±0.06ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.03     37.0±0.31ms  1214.3 MB/sec    1.00     35.8±0.35ms  1253.1 MB/sec
primitive_sparse_99pct_null/default                1.00     16.7±0.06ms     2.6 GB/sec    1.00     16.7±0.03ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.8±0.07ms     2.6 GB/sec    1.00     16.7±0.03ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.0±0.11ms     2.2 GB/sec    1.00     20.0±0.10ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     18.7±0.13ms     2.3 GB/sec    1.00     18.6±0.04ms     2.4 GB/sec
short_string_non_null/bloom_filter                                                        1.00     27.9±0.10ms   429.7 MB/sec
short_string_non_null/cdc                                                                 1.00     19.9±0.09ms   602.3 MB/sec
short_string_non_null/default                                                             1.00     15.7±0.09ms   764.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.4±0.06ms   472.0 MB/sec
short_string_non_null/zstd                                                                1.00     35.3±0.09ms   339.9 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.3±0.07ms   424.6 MB/sec
string/bloom_filter                                1.06   226.9±24.81ms     2.3 GB/sec    1.00   214.3±22.17ms     2.4 GB/sec
string/cdc                                         1.00    220.1±5.61ms     2.3 GB/sec    1.00    219.4±7.14ms     2.3 GB/sec
string/default                                     1.20   140.7±24.25ms     3.6 GB/sec    1.00   116.9±11.73ms     4.4 GB/sec
string/parquet_2                                   1.00    124.5±0.65ms     4.1 GB/sec    1.01    125.6±0.79ms     4.1 GB/sec
string/zstd                                        1.00    423.4±2.87ms  1238.1 MB/sec    1.04   440.9±19.31ms  1189.0 MB/sec
string/zstd_parquet_2                              1.00    394.3±0.42ms  1329.7 MB/sec    1.03   406.0±10.72ms  1291.4 MB/sec
string_and_binary_view/bloom_filter                1.00     62.8±0.33ms   513.4 MB/sec    1.05     65.7±0.35ms   491.1 MB/sec
string_and_binary_view/cdc                         1.00     58.2±0.13ms   553.9 MB/sec    1.05     61.0±0.41ms   528.6 MB/sec
string_and_binary_view/default                     1.00     47.7±0.18ms   675.9 MB/sec    1.05     50.0±0.31ms   645.3 MB/sec
string_and_binary_view/parquet_2                   1.00     58.5±0.18ms   551.2 MB/sec    1.04     61.1±0.35ms   527.8 MB/sec
string_and_binary_view/zstd                        1.00     84.1±0.18ms   383.3 MB/sec    1.03     86.6±0.31ms   372.4 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.4±0.35ms   445.3 MB/sec    1.04     75.1±0.30ms   429.6 MB/sec
string_dictionary/bloom_filter                     1.00     88.7±0.91ms     2.9 GB/sec    1.02     90.6±0.58ms     2.8 GB/sec
string_dictionary/cdc                              1.61     83.8±0.82ms     3.1 GB/sec    1.00     52.2±1.18ms     4.9 GB/sec
string_dictionary/default                          1.00     48.0±0.33ms     5.4 GB/sec    1.03     49.2±0.94ms     5.2 GB/sec
string_dictionary/parquet_2                        1.00     53.7±0.14ms     4.8 GB/sec    1.02     55.0±0.21ms     4.7 GB/sec
string_dictionary/zstd                             1.00    208.4±1.00ms  1267.2 MB/sec    1.01    209.5±0.64ms  1260.8 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.5±0.49ms  1330.4 MB/sec    1.00    199.4±0.17ms  1324.8 MB/sec
string_non_null/bloom_filter                       1.05   250.1±14.82ms     2.0 GB/sec    1.00    238.4±4.31ms     2.1 GB/sec
string_non_null/cdc                                1.01    266.1±8.91ms  1969.6 MB/sec    1.00    263.5±3.32ms  1988.9 MB/sec
string_non_null/default                            1.00   125.4±12.46ms     4.1 GB/sec    1.02    128.4±9.65ms     4.0 GB/sec
string_non_null/parquet_2                          1.05   139.7±11.29ms     3.7 GB/sec    1.00    132.7±0.46ms     3.9 GB/sec
string_non_null/zstd                               1.00    528.6±1.85ms   991.3 MB/sec    1.01    533.2±1.33ms   982.8 MB/sec
string_non_null/zstd_parquet_2                     1.00    504.8±2.15ms  1038.0 MB/sec    1.00    503.0±0.48ms  1041.7 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.01ms     6.2 GB/sec    1.00      2.5±0.00ms     6.3 GB/sec
struct_all_null/cdc                                1.00      9.9±0.12ms  1633.5 MB/sec    1.01     10.0±0.11ms  1614.5 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.01ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     45.9±0.18ms   348.8 MB/sec    1.05     48.3±0.23ms   331.4 MB/sec
struct_non_null/cdc                                1.00     45.1±0.17ms   354.6 MB/sec    1.02     45.9±0.27ms   348.3 MB/sec
struct_non_null/default                            1.00     31.8±0.17ms   503.2 MB/sec    1.03     32.7±0.14ms   488.8 MB/sec
struct_non_null/parquet_2                          1.00     40.6±0.49ms   394.5 MB/sec    1.03     41.6±0.11ms   384.7 MB/sec
struct_non_null/zstd                               1.00     40.6±0.15ms   394.2 MB/sec    1.02     41.5±0.15ms   385.6 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.5±0.13ms   293.7 MB/sec    1.02     55.3±0.17ms   289.1 MB/sec
struct_sparse_99pct_null/bloom_filter              1.01      7.4±0.02ms     2.1 GB/sec    1.00      7.4±0.05ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.07     15.3±0.08ms  1053.1 MB/sec    1.00     14.4±0.07ms  1121.7 MB/sec
struct_sparse_99pct_null/default                   1.01      6.9±0.05ms     2.3 GB/sec    1.00      6.9±0.06ms     2.3 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.9±0.03ms     2.3 GB/sec    1.00      6.9±0.04ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.3±0.01ms  1954.5 MB/sec    1.00      8.2±0.02ms  1963.5 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.01      7.7±0.03ms     2.0 GB/sec    1.00      7.6±0.02ms     2.1 GB/sec

Resource Usage

base (merge-base)

Metric         Value
Wall time      1935.4s
Peak memory    6.6 GiB
Avg memory     6.4 GiB
CPU user       1876.9s
CPU sys        54.7s
Peak spill     0 B

branch

Metric         Value
Wall time      2075.5s
Peak memory    6.8 GiB
Avg memory     6.7 GiB
CPU user       2028.4s
CPU sys        44.8s
Peak spill     0 B


@adriangb
Contributor Author

run benchmark arrow_writer

@adriangb adriangb force-pushed the parquet-page-size-mid-batch branch from d0e3d97 to 0fd6dcb Compare May 14, 2026 05:38
The parquet column writer only checks the data page byte limit AFTER
each mini-batch finishes writing, and mini-batches are sized by row
count (`write_batch_size`, default 1024). For BYTE_ARRAY columns with
large values — e.g. a 5 MiB image blob per row — a single mini-batch
can buffer multiple GiB into one data page before the configured byte
limit is even consulted. Pages can exceed the limit by orders of
magnitude.

Make the mini-batch size byte-budget aware:

- For each chunk, ask the encoder how many of the next values fit in
  one page byte budget. If everything fits, stay on the existing
  batched fast path (zero behavior change for small values).
- If not, sub-batch — for flat columns, one mini-batch per `k` values
  where `k` is the fit count; for repeated columns, one mini-batch
  per record (since a record cannot span data pages).

Skip the check while dictionary encoding is active: the byte estimate
is plain-encoded size, but a dict-encoded data page only stores small
RLE indices, so the estimate would spuriously shrink pages. Dictionary
fallback bounds dict-encoded pages independently.

The encoder hook is `count_values_within_byte_budget(values, offset,
len, byte_budget) -> Option<usize>` plus a `_gather` variant for the
arrow path, mirroring the existing `write`/`write_gather` split.
Returning `None` means "no cheap estimate available; stay batched."
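
One way to read the `Option<usize>` contract is sketched below. The `ChunkPlan` enum and `plan_chunk` function are hypothetical names introduced for illustration; the actual writer may structure this decision differently.

```rust
/// Hypothetical outcome of consulting the encoder for one chunk.
#[derive(Debug, PartialEq)]
enum ChunkPlan {
    /// Write the whole chunk as one mini-batch (the existing fast path).
    Batched,
    /// Split the chunk into mini-batches of at most this many values.
    SubBatch(usize),
}

/// `fit` is the encoder's answer: `None` means no cheap estimate is available,
/// `Some(k)` means `k` of the next values fit in one page byte budget.
fn plan_chunk(fit: Option<usize>, chunk_len: usize) -> ChunkPlan {
    match fit {
        None => ChunkPlan::Batched,
        Some(k) if k >= chunk_len => ChunkPlan::Batched,
        // Never plan an empty mini-batch: a page holds at least one value.
        Some(k) => ChunkPlan::SubBatch(k.max(1)),
    }
}
```

Treating `None` the same as "everything fits" keeps encoders without a cheap estimate on the batched path, at the cost of leaving their page sizes unbounded as before.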

Implementation details:

- `ParquetValueType::byte_size(&self)` returns the per-value plain-
  encoded byte size. Defaults to `size_of::<Self>()`; overridden for
  `ByteArray` (`len + 4`) and `FixedLenByteArray` (`len`).
- Standard `ColumnValueEncoderImpl<T>::count_values_within_byte_budget`
  short-circuits to `(byte_budget / size_of::<T::T>()).max(1).min(n)`
  for fixed-size physical types — one division, no walk. For BYTE_ARRAY
  and FIXED_LEN_BYTE_ARRAY it scans values cumulatively and exits at
  the first one to push the sum past the budget, which also catches
  skewed distributions (a single oversized value among many small ones
  is detected wherever it lands).
- Arrow `ByteArrayEncoder::count_values_within_byte_budget_gather`
  uses a two-stage walk on `GenericByteArray<O>` types: stage 1
  computes the total in O(1) via one subtraction on the offsets buffer
  when indices are contiguous (the case for every non-null column),
  returning immediately if the chunk fits. Stage 2 walks per-index
  lengths from the offsets buffer (still no slice/UTF-8 construction)
  when stage 1 doesn't conclude. View/dict/fixed-size-binary arrays
  fall through to a per-value walk via `ArrayAccessor::value`.
- `LevelDataRef::value_count(total, max_def)` reports how many levels
  in the chunk correspond to actual non-null values. Used to bridge
  the encoder's value-count answer back into level-count subdivision
  for nullable columns.
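
The counting strategies in the list above can be sketched on plain slices. This is an illustrative sketch only: the function names are hypothetical, the offsets slice stands in for an Arrow offsets buffer of a non-null `i32`-offset array, and returning at least 1 when the first value alone exceeds the budget is an assumption.

```rust
/// Fixed-width types: one division, no walk over the values.
fn count_fixed(value_size: usize, n: usize, byte_budget: usize) -> usize {
    (byte_budget / value_size).max(1).min(n)
}

/// Variable-width: cumulative scan that stops at the first value pushing the
/// running sum past the budget. Each value costs `len + 4` bytes plain-encoded.
fn count_by_scan(lens: &[usize], byte_budget: usize) -> usize {
    let mut sum = 0usize;
    for (i, &len) in lens.iter().enumerate() {
        sum += len + 4;
        if sum > byte_budget {
            // Assumption: always admit at least one value.
            return i.max(1);
        }
    }
    lens.len()
}

/// Two-stage walk over an offsets buffer with `n + 1` entries for `n` values.
fn count_by_offsets(offsets: &[i32], byte_budget: usize) -> usize {
    let n = offsets.len().saturating_sub(1);
    if n == 0 {
        return 0;
    }
    // Stage 1: total payload from a single subtraction, plus length prefixes.
    let total = (offsets[n] - offsets[0]) as usize + 4 * n;
    if total <= byte_budget {
        return n;
    }
    // Stage 2: per-value lengths from adjacent offsets, no slices constructed.
    let mut sum = 0usize;
    for i in 0..n {
        sum += (offsets[i + 1] - offsets[i]) as usize + 4;
        if sum > byte_budget {
            return i.max(1);
        }
    }
    n
}
```

Because the scan stops at the offending value rather than relying on an average, a single oversized value in the middle of many small ones ends the count right before it.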

Tests in `column::writer::tests`:

- `test_column_writer_caps_page_size_for_large_byte_array_values` —
  flat regression: 64 × 64 KiB BYTE_ARRAY values vs a 16 KiB page
  limit produces one page per value rather than a single ~4 MiB page.
- `test_column_writer_caps_page_size_for_large_values_in_list` —
  Materialized-rep branch of `write_granular_chunk`: list of 3 large
  blobs × 3 records, asserts one page per record (no record splits).
- `test_column_writer_caps_page_size_with_nullable_large_values` —
  `LevelDataRef::value_count` on Materialized def levels with mixed
  nulls.
- `test_column_writer_dict_enabled_large_values_post_spill` —
  `has_dictionary()` short-circuit while dict is active, then byte-
  budget sub-batching after dict spill.
- `test_column_writer_caps_page_size_for_fixed_len_byte_array` —
  `FixedLenByteArray::byte_size` override.
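
For the nullable case, the bridging between value counts and level counts can be pictured as below. This is a sketch on a materialized slice of definition levels with hypothetical free functions; it relies only on the Parquet rule that a slot holds a value when its definition level equals the column's maximum definition level.

```rust
/// Number of non-null values among the given definition levels.
fn value_count(def_levels: &[i16], max_def: i16) -> usize {
    def_levels.iter().filter(|&&d| d == max_def).count()
}

/// How many leading levels cover exactly `k` values (plus any nulls that
/// follow them), so a value-count answer maps back to a level-count split.
fn levels_for_values(def_levels: &[i16], max_def: i16, k: usize) -> usize {
    let mut seen = 0;
    for (i, &d) in def_levels.iter().enumerate() {
        if d == max_def {
            if seen == k {
                // The (k + 1)-th value starts here; stop just before it.
                return i;
            }
            seen += 1;
        }
    }
    def_levels.len()
}
```

For levels `[1, 0, 1, 1, 0]` with `max_def = 1`, there are three values, and a budget of two values corresponds to the first three levels.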

Tests in `arrow::arrow_writer::tests`:

- `test_arrow_writer_caps_page_size_for_large_strings` — end-to-end
  through `ArrowWriter` exercising the offsets-buffer fast path.
- `test_arrow_writer_caps_page_size_for_large_string_view` —
  view-array fallback (Utf8View has no contiguous offsets buffer).
- `test_arrow_writer_all_null_string_column` — `value_count` Uniform
  branch under arrow's level optimization; asserts null_count and
  page coverage rather than just non-empty output.
- `test_arrow_writer_granular_mode_roundtrip` — value-fidelity round-
  trip: mix small + large strings so the byte-budget cutoff lands
  mid-chunk, write through `ArrowWriter`, read back with
  `ParquetRecordBatchReader`, assert each string matches.

Bench results vs `main` (5-run medians on a noisy laptop, run-to-run
variance ~±2%):

- `primitive/default` (i32 25% null): −0.4% to +1.3%
- `primitive_non_null/default`: −2.3% to +0.4%
- `bool_non_null/default`: +1.8% to +15.9% (highly noisy on this
  machine)
- `string/default`: +3.3% to +4.7%
- `short_string_non_null/default` (new, 1M × 8 B): +1.0% to +6.4%
- `large_string_non_null/default` (new, 1024 × 256 KiB): +0.5% to
  +2.7% — the case the fix targets
- `string_dictionary/default`: +3.3% to +6.4%
- `string_non_null/default`: −1.6% to +2.3%

All within laptop variance for the fast-path (small-value) cases.
The fix's intended case — large variable-width values — now correctly
bounds page sizes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@adriangb adriangb force-pushed the parquet-page-size-mid-batch branch from 0fd6dcb to 24b83c7 Compare May 14, 2026 06:15
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4448145440-71-5z4xr 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing parquet-page-size-mid-batch (24b83c7) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.02     13.2±0.11ms    18.9 MB/sec    1.00     13.0±0.08ms    19.2 MB/sec
bool/cdc                                           1.01     16.0±0.10ms    15.6 MB/sec    1.00     15.8±0.15ms    15.8 MB/sec
bool/default                                       1.02     11.2±0.09ms    22.4 MB/sec    1.00     10.9±0.04ms    22.9 MB/sec
bool/parquet_2                                     1.01     14.9±0.11ms    16.8 MB/sec    1.00     14.7±0.05ms    17.0 MB/sec
bool/zstd                                          1.02     11.6±0.07ms    21.5 MB/sec    1.00     11.5±0.06ms    21.8 MB/sec
bool/zstd_parquet_2                                1.02     15.3±0.10ms    16.4 MB/sec    1.00     15.1±0.07ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.04ms    17.7 MB/sec    1.00      7.0±0.03ms    17.8 MB/sec
bool_non_null/cdc                                  1.01      7.0±0.08ms    18.0 MB/sec    1.00      6.9±0.10ms    18.1 MB/sec
bool_non_null/default                              1.00      4.3±0.03ms    29.1 MB/sec    1.00      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.04ms    13.7 MB/sec    1.00      9.1±0.04ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    26.9 MB/sec    1.00      4.7±0.03ms    26.8 MB/sec
bool_non_null/zstd_parquet_2                       1.01      9.5±0.04ms    13.1 MB/sec    1.00      9.5±0.05ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     94.5±1.08ms   148.1 MB/sec    1.01     95.7±2.49ms   146.3 MB/sec
float_with_nans/cdc                                1.00     83.0±0.82ms   168.8 MB/sec    1.01     83.7±1.57ms   167.2 MB/sec
float_with_nans/default                            1.00     75.5±1.07ms   185.5 MB/sec    1.00     75.3±1.01ms   185.9 MB/sec
float_with_nans/parquet_2                          1.00     97.2±0.72ms   144.1 MB/sec    1.00     97.4±1.82ms   143.7 MB/sec
float_with_nans/zstd                               1.00    113.2±1.37ms   123.7 MB/sec    1.01    114.0±1.22ms   122.8 MB/sec
float_with_nans/zstd_parquet_2                     1.00    133.3±1.99ms   105.0 MB/sec    1.02    135.3±1.48ms   103.4 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.8±1.96ms     2.9 GB/sec
large_string_non_null/cdc                                                                 1.00    243.8±1.68ms  1050.1 MB/sec
large_string_non_null/default                                                             1.00     64.6±1.09ms     3.9 GB/sec
large_string_non_null/parquet_2                                                           1.00     65.2±2.66ms     3.8 GB/sec
large_string_non_null/zstd                                                                1.00     63.7±2.72ms     3.9 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     63.6±2.71ms     3.9 GB/sec
list_primitive/bloom_filter                        1.22   417.0±14.54ms  1308.0 MB/sec    1.00   340.4±11.36ms  1602.2 MB/sec
list_primitive/cdc                                 1.03   376.9±13.32ms  1447.0 MB/sec    1.00    364.3±4.07ms  1496.8 MB/sec
list_primitive/default                             1.28    324.3±7.20ms  1681.8 MB/sec    1.00    252.9±4.29ms     2.1 GB/sec
list_primitive/parquet_2                           1.24    336.6±2.67ms  1620.3 MB/sec    1.00    271.6±1.48ms  2008.1 MB/sec
list_primitive/zstd                                1.11    566.9±7.39ms   962.0 MB/sec    1.00    508.8±5.43ms  1072.0 MB/sec
list_primitive/zstd_parquet_2                      1.00    494.3±2.36ms  1103.3 MB/sec    1.01    497.0±2.60ms  1097.3 MB/sec
list_primitive_non_null/bloom_filter               1.11   496.8±24.03ms  1095.5 MB/sec    1.00   447.3±18.74ms  1216.7 MB/sec
list_primitive_non_null/cdc                        1.00    444.9±8.81ms  1223.3 MB/sec    1.00   445.4±13.12ms  1221.9 MB/sec
list_primitive_non_null/default                    1.15   344.7±21.96ms  1579.1 MB/sec    1.00    300.1±6.76ms  1813.4 MB/sec
list_primitive_non_null/parquet_2                  1.09    352.8±3.69ms  1542.5 MB/sec    1.00   322.6±22.73ms  1687.1 MB/sec
list_primitive_non_null/zstd                       1.05   765.1±24.38ms   711.3 MB/sec    1.00   730.8±20.34ms   744.7 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    695.9±3.50ms   782.1 MB/sec    1.01    703.1±7.88ms   774.1 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.6±0.13ms     3.2 GB/sec    1.05     12.2±0.27ms     3.0 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.8±0.34ms  1637.4 MB/sec    1.03     23.6±0.18ms  1584.4 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.8±0.21ms     3.4 GB/sec    1.09     11.9±0.06ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.0±0.19ms     3.3 GB/sec    1.06     11.6±0.21ms     3.1 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.1±0.05ms     2.8 GB/sec    1.02     13.3±0.25ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.3±0.26ms     3.2 GB/sec    1.05     11.9±0.16ms     3.1 GB/sec
primitive/bloom_filter                             1.00    153.9±2.09ms   291.5 MB/sec    1.00    154.2±1.49ms   291.1 MB/sec
primitive/cdc                                      1.00    161.3±1.88ms   278.2 MB/sec    1.00    161.2±1.30ms   278.4 MB/sec
primitive/default                                  1.01    120.2±0.79ms   373.4 MB/sec    1.00    118.9±1.69ms   377.3 MB/sec
primitive/parquet_2                                1.00    134.5±1.65ms   333.7 MB/sec    1.01    135.2±1.74ms   332.0 MB/sec
primitive/zstd                                     1.01    150.0±0.77ms   299.2 MB/sec    1.00    149.1±1.93ms   301.0 MB/sec
primitive/zstd_parquet_2                           1.01    169.4±1.51ms   264.9 MB/sec    1.00    167.5±1.20ms   267.9 MB/sec
primitive_all_null/bloom_filter                    1.01     11.8±0.11ms     3.7 GB/sec    1.00     11.7±0.22ms     3.7 GB/sec
primitive_all_null/cdc                             1.01     30.8±0.48ms  1458.4 MB/sec    1.00     30.6±0.35ms  1468.6 MB/sec
primitive_all_null/default                         1.00     11.0±0.18ms     4.0 GB/sec    1.00     11.0±0.14ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.01     11.0±0.18ms     4.0 GB/sec    1.00     10.9±0.10ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.16ms     3.9 GB/sec    1.00     11.1±0.15ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.20ms     4.0 GB/sec    1.01     11.1±0.19ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.06    115.9±2.51ms   379.8 MB/sec    1.00    109.3±2.72ms   402.4 MB/sec
primitive_non_null/cdc                             1.02     92.5±1.13ms   475.8 MB/sec    1.00     91.0±1.65ms   483.7 MB/sec
primitive_non_null/default                         1.00     69.2±0.52ms   635.6 MB/sec    1.00     69.2±1.37ms   636.1 MB/sec
primitive_non_null/parquet_2                       1.00     91.0±1.37ms   483.3 MB/sec    1.00     90.7±0.56ms   484.9 MB/sec
primitive_non_null/zstd                            1.07    106.1±1.85ms   414.7 MB/sec    1.00     99.5±1.06ms   442.1 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    132.2±2.10ms   332.8 MB/sec    1.00    124.4±1.51ms   353.8 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.6±0.64ms     2.4 GB/sec    1.05     19.6±0.16ms     2.2 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.9±0.58ms  1185.4 MB/sec    1.00     37.8±0.30ms  1187.6 MB/sec
primitive_sparse_99pct_null/default                1.00     17.1±0.27ms     2.6 GB/sec    1.00     17.2±0.25ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.3±0.07ms     2.5 GB/sec    1.00     17.3±0.31ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.26ms     2.2 GB/sec    1.00     20.4±0.28ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     19.3±0.10ms     2.3 GB/sec    1.00     19.2±0.32ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.2±0.27ms   425.8 MB/sec
short_string_non_null/cdc                                                                 1.00     20.3±0.24ms   590.7 MB/sec
short_string_non_null/default                                                             1.00     16.1±0.06ms   743.5 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.7±0.05ms   466.2 MB/sec
short_string_non_null/zstd                                                                1.00     35.6±0.13ms   336.7 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.5±0.09ms   421.1 MB/sec
string/bloom_filter                                1.05   241.4±29.36ms     2.1 GB/sec    1.00   229.5±20.33ms     2.2 GB/sec
string/cdc                                         1.01    226.3±8.62ms     2.3 GB/sec    1.00    224.0±7.48ms     2.3 GB/sec
string/default                                     1.22   149.0±25.96ms     3.4 GB/sec    1.00   121.8±12.81ms     4.2 GB/sec
string/parquet_2                                   1.05    128.5±2.18ms     4.0 GB/sec    1.00    121.8±2.90ms     4.2 GB/sec
string/zstd                                        1.01    430.1±4.75ms  1219.0 MB/sec    1.00    425.7±4.35ms  1231.6 MB/sec
string/zstd_parquet_2                              1.00    395.1±1.99ms  1326.7 MB/sec    1.01    399.0±1.66ms  1313.8 MB/sec
string_and_binary_view/bloom_filter                1.00     66.1±2.70ms   488.2 MB/sec    1.04     68.8±2.99ms   469.0 MB/sec
string_and_binary_view/cdc                         1.00     60.1±1.44ms   537.0 MB/sec    1.05     62.9±1.53ms   513.0 MB/sec
string_and_binary_view/default                     1.00     49.8±1.52ms   646.9 MB/sec    1.04     51.8±1.41ms   622.2 MB/sec
string_and_binary_view/parquet_2                   1.00     59.7±1.11ms   540.2 MB/sec    1.04     62.1±1.55ms   519.5 MB/sec
string_and_binary_view/zstd                        1.00     85.2±0.56ms   378.7 MB/sec    1.03     88.0±0.54ms   366.6 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     74.0±0.88ms   436.0 MB/sec    1.04     76.9±1.08ms   419.5 MB/sec
string_dictionary/bloom_filter                     1.00    110.0±7.61ms     2.3 GB/sec    1.28    141.3±7.91ms  1868.8 MB/sec
string_dictionary/cdc                              1.00     78.8±2.29ms     3.3 GB/sec    1.30    102.3±3.80ms     2.5 GB/sec
string_dictionary/default                          1.00     65.6±3.25ms     3.9 GB/sec    1.39     90.9±3.71ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     55.2±0.59ms     4.7 GB/sec    1.85    102.1±1.87ms     2.5 GB/sec
string_dictionary/zstd                             1.00    222.4±3.69ms  1187.8 MB/sec    1.01   224.7±15.20ms  1175.5 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.5±1.33ms  1330.5 MB/sec    1.19    236.0±1.85ms  1119.4 MB/sec
string_non_null/bloom_filter                       1.00   254.5±21.18ms     2.0 GB/sec    1.00   255.3±15.69ms     2.0 GB/sec
string_non_null/cdc                                1.00   277.0±13.67ms  1891.6 MB/sec    1.02   281.6±12.38ms  1861.0 MB/sec
string_non_null/default                            1.09   149.0±12.15ms     3.4 GB/sec    1.00   136.7±14.33ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    124.0±2.41ms     4.1 GB/sec    1.25    154.5±2.23ms     3.3 GB/sec
string_non_null/zstd                               1.00    565.8±9.78ms   926.1 MB/sec    1.04   591.1±33.18ms   886.5 MB/sec
string_non_null/zstd_parquet_2                     1.00    505.3±2.87ms  1037.0 MB/sec    1.04   523.1±10.81ms  1001.6 MB/sec
struct_all_null/bloom_filter                       1.01      2.6±0.02ms     6.2 GB/sec    1.00      2.5±0.03ms     6.2 GB/sec
struct_all_null/cdc                                1.01      9.8±0.13ms  1640.3 MB/sec    1.00      9.8±0.10ms  1648.9 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     47.6±1.08ms   336.1 MB/sec    1.02     48.7±1.46ms   328.6 MB/sec
struct_non_null/cdc                                1.00     46.0±0.28ms   348.1 MB/sec    1.01     46.3±0.59ms   345.9 MB/sec
struct_non_null/default                            1.00     32.5±0.27ms   491.8 MB/sec    1.01     33.0±0.36ms   485.5 MB/sec
struct_non_null/parquet_2                          1.02     41.7±0.54ms   384.1 MB/sec    1.00     41.0±0.61ms   390.3 MB/sec
struct_non_null/zstd                               1.00     41.2±0.74ms   387.9 MB/sec    1.00     41.2±0.24ms   388.6 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.5±0.60ms   288.5 MB/sec    1.00     55.3±0.96ms   289.4 MB/sec
struct_sparse_99pct_null/bloom_filter              1.04      7.9±0.28ms  2047.0 MB/sec    1.00      7.6±0.29ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.00     15.9±0.12ms  1014.2 MB/sec    1.00     15.8±0.27ms  1018.4 MB/sec
struct_sparse_99pct_null/default                   1.03      7.3±0.09ms     2.2 GB/sec    1.00      7.1±0.17ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.2±0.20ms     2.2 GB/sec    1.01      7.2±0.16ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.4±0.18ms  1924.8 MB/sec    1.03      8.6±0.06ms  1868.8 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      8.0±0.14ms  2015.1 MB/sec    1.00      8.0±0.18ms  2013.8 MB/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 2015.4s
Peak memory 6.2 GiB
Avg memory 6.0 GiB
CPU user 1897.3s
CPU sys 113.7s
Peak spill 0 B

branch

Metric Value
Wall time 2115.5s
Peak memory 6.5 GiB
Avg memory 6.3 GiB
CPU user 2032.5s
CPU sys 80.9s
Peak spill 0 B


@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4451856049-90-g2tzh 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing parquet-page-size-mid-batch (24b83c7) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



…w arrays

The byte-budget check on `Utf8View` / `BinaryView` columns previously
fell through to a per-value walk via `ArrayAccessor::value`, which
constructs a `&str`/`&[u8]` slice for each index — chasing the buffer
pointer through the view's u128 word, then slicing `data_buffers[i]`.
At ~1 µs per chunk over ~1000 chunks on the 1 M-row `string_and_binary_view`
bench, that was a consistent ~+3–5 % regression vs `main` in both
GKE benchmark runs.

View arrays store each value's length in the low 32 bits of its u128
view, so we can scan lengths with no data-buffer dereferences:

```
let len = (views[idx] as u32) as usize;
```

Add a dedicated fast path for `Utf8View` and `BinaryView` that walks
the views buffer directly. Falls through to the per-value walk only
for `FixedSizeBinary` and `Dictionary` — the latter still needs the
dictionary-keys indirection.
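
As a rough illustration of the fast path described above (not the code in this PR), the length scan can be sketched over a plain `&[u128]` slice standing in for the views buffer. The names `estimated_view_bytes` and `make_view` are hypothetical, and the sketch ignores the validity bitmap, so it assumes null slots either carry a zero length or are filtered out by the caller:

```rust
/// Sum the byte lengths of `len` views starting at `offset`.
/// Only the views slice is read; no data buffer is dereferenced.
/// Hypothetical helper, assuming the length lives in the low 32 bits of each view.
fn estimated_view_bytes(views: &[u128], offset: usize, len: usize) -> usize {
    views[offset..offset + len]
        .iter()
        // Truncating to u32 keeps the low 32 bits, i.e. the value length.
        .map(|view| (*view as u32) as usize)
        .sum()
}

/// Build a fake view for illustration: `len` in the low 32 bits and
/// arbitrary payload (`rest`) in the upper 96 bits. Hypothetical helper.
fn make_view(len: u32, rest: u128) -> u128 {
    (rest << 32) | u128::from(len)
}
```

For example, for `[make_view(3, 0xABC), make_view(5_000_000, 7), make_view(0, 0)]` the scan over all three views yields 5,000,003 bytes, regardless of what the upper 96 bits contain.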

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4451897215-91-nvnwl 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing parquet-page-size-mid-batch (70dc497) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.02     13.3±0.09ms    18.8 MB/sec    1.00     13.1±0.05ms    19.1 MB/sec
bool/cdc                                           1.01     15.9±0.08ms    15.7 MB/sec    1.00     15.9±0.05ms    15.8 MB/sec
bool/default                                       1.02     11.1±0.07ms    22.4 MB/sec    1.00     10.9±0.04ms    22.9 MB/sec
bool/parquet_2                                     1.01     14.9±0.10ms    16.8 MB/sec    1.00     14.7±0.05ms    17.0 MB/sec
bool/zstd                                          1.02     11.7±0.07ms    21.5 MB/sec    1.00     11.4±0.04ms    21.9 MB/sec
bool/zstd_parquet_2                                1.01     15.3±0.07ms    16.4 MB/sec    1.00     15.1±0.05ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.02ms    17.7 MB/sec    1.00      7.1±0.02ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.9±0.03ms    18.1 MB/sec    1.01      7.0±0.02ms    18.0 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.01      4.3±0.02ms    29.1 MB/sec
bool_non_null/parquet_2                            1.01      9.1±0.04ms    13.7 MB/sec    1.00      9.0±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.0 MB/sec    1.02      4.7±0.08ms    26.5 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.02ms    13.1 MB/sec    1.00      9.5±0.04ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     95.0±0.41ms   147.3 MB/sec    1.01     95.8±0.62ms   146.2 MB/sec
float_with_nans/cdc                                1.02     84.7±2.06ms   165.4 MB/sec    1.00     83.3±0.31ms   168.2 MB/sec
float_with_nans/default                            1.00     74.8±0.24ms   187.2 MB/sec    1.01     75.3±0.28ms   185.9 MB/sec
float_with_nans/parquet_2                          1.00     96.5±0.41ms   145.1 MB/sec    1.00     96.9±0.54ms   144.5 MB/sec
float_with_nans/zstd                               1.00    113.0±0.29ms   123.9 MB/sec    1.00    113.3±0.32ms   123.5 MB/sec
float_with_nans/zstd_parquet_2                     1.00    133.4±0.51ms   105.0 MB/sec    1.01    134.2±0.49ms   104.3 MB/sec
large_string_non_null/bloom_filter                                                        1.00     80.9±0.27ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    244.3±2.51ms  1048.1 MB/sec
large_string_non_null/default                                                             1.00     62.0±0.20ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     61.8±0.21ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     61.9±0.20ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     61.8±0.22ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    331.7±2.11ms  1644.3 MB/sec    1.03    341.7±2.31ms  1596.0 MB/sec
list_primitive/cdc                                 1.00    360.2±1.78ms  1514.3 MB/sec    1.00    361.7±1.29ms  1507.9 MB/sec
list_primitive/default                             1.00    251.7±1.44ms     2.1 GB/sec    1.01    255.2±1.27ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    269.9±0.95ms  2020.5 MB/sec    1.01    271.9±0.47ms  2005.8 MB/sec
list_primitive/zstd                                1.00    503.3±2.86ms  1083.6 MB/sec    1.01    509.6±6.03ms  1070.3 MB/sec
list_primitive/zstd_parquet_2                      1.00    493.2±0.54ms  1105.8 MB/sec    1.00    495.2±0.60ms  1101.4 MB/sec
list_primitive_non_null/bloom_filter               1.00   435.7±11.23ms  1249.1 MB/sec    1.01    439.1±9.47ms  1239.6 MB/sec
list_primitive_non_null/cdc                        1.01   443.8±10.63ms  1226.3 MB/sec    1.00   441.2±10.64ms  1233.5 MB/sec
list_primitive_non_null/default                    1.00    287.6±3.39ms  1892.4 MB/sec    1.05    300.7±4.21ms  1810.1 MB/sec
list_primitive_non_null/parquet_2                  1.00    318.2±0.89ms  1710.6 MB/sec    1.00   318.5±21.42ms  1708.6 MB/sec
list_primitive_non_null/zstd                       1.00    713.5±7.79ms   762.8 MB/sec    1.02   725.9±18.34ms   749.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    679.9±0.90ms   800.4 MB/sec    1.03    700.5±9.97ms   777.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.6±0.05ms     3.2 GB/sec    1.05     12.1±0.07ms     3.0 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     23.1±0.05ms  1616.8 MB/sec    1.02     23.7±0.07ms  1578.7 MB/sec
list_primitive_sparse_99pct_null/default           1.00     11.3±0.29ms     3.2 GB/sec    1.04     11.8±0.03ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.1±0.04ms     3.3 GB/sec    1.06     11.8±0.04ms     3.1 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.0±0.03ms     2.8 GB/sec    1.05     13.7±0.04ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.3±0.04ms     3.2 GB/sec    1.06     12.0±0.04ms     3.0 GB/sec
primitive/bloom_filter                             1.01    155.5±5.71ms   288.5 MB/sec    1.00    153.7±0.79ms   292.0 MB/sec
primitive/cdc                                      1.00    161.6±0.56ms   277.6 MB/sec    1.00    161.2±0.64ms   278.4 MB/sec
primitive/default                                  1.01    121.2±1.14ms   370.3 MB/sec    1.00    119.8±0.50ms   374.7 MB/sec
primitive/parquet_2                                1.00    135.0±0.50ms   332.5 MB/sec    1.00    135.2±0.48ms   331.8 MB/sec
primitive/zstd                                     1.01    149.6±0.53ms   299.9 MB/sec    1.00    148.7±0.41ms   301.7 MB/sec
primitive/zstd_parquet_2                           1.01    168.6±0.51ms   266.2 MB/sec    1.00    167.7±0.39ms   267.5 MB/sec
primitive_all_null/bloom_filter                    1.00     11.8±0.07ms     3.7 GB/sec    1.01     11.9±0.26ms     3.7 GB/sec
primitive_all_null/cdc                             1.00     30.7±0.42ms  1461.3 MB/sec    1.00     30.8±0.41ms  1458.2 MB/sec
primitive_all_null/default                         1.00     10.9±0.10ms     4.0 GB/sec    1.00     11.0±0.07ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.17ms     4.0 GB/sec    1.00     11.0±0.17ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.19ms     3.9 GB/sec    1.00     11.1±0.14ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.10ms     4.0 GB/sec    1.01     11.1±0.20ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.04    114.2±1.46ms   385.2 MB/sec    1.00    109.4±0.48ms   402.4 MB/sec
primitive_non_null/cdc                             1.00     91.1±0.54ms   482.9 MB/sec    1.01     91.6±0.24ms   480.3 MB/sec
primitive_non_null/default                         1.00     68.5±0.26ms   642.5 MB/sec    1.01     69.3±0.33ms   634.8 MB/sec
primitive_non_null/parquet_2                       1.00     91.3±0.51ms   482.0 MB/sec    1.00     91.0±0.33ms   483.3 MB/sec
primitive_non_null/zstd                            1.07    106.7±0.26ms   412.4 MB/sec    1.00     99.9±0.28ms   440.5 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    131.5±1.88ms   334.5 MB/sec    1.00    124.3±0.38ms   353.9 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     19.1±0.08ms     2.3 GB/sec    1.01     19.3±0.07ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.7±0.28ms  1189.3 MB/sec    1.01     37.9±0.31ms  1183.2 MB/sec
primitive_sparse_99pct_null/default                1.00     17.2±0.04ms     2.5 GB/sec    1.00     17.3±0.04ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.2±0.04ms     2.5 GB/sec    1.00     17.3±0.03ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.5±0.06ms     2.1 GB/sec    1.01     20.6±0.04ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     19.1±0.04ms     2.3 GB/sec    1.01     19.2±0.05ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.2±0.09ms   425.5 MB/sec
short_string_non_null/cdc                                                                 1.00     20.4±0.05ms   587.2 MB/sec
short_string_non_null/default                                                             1.00     16.1±0.05ms   744.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.8±0.09ms   464.8 MB/sec
short_string_non_null/zstd                                                                1.00     35.8±0.10ms   335.6 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.7±0.05ms   418.3 MB/sec
string/bloom_filter                                1.05   238.5±27.20ms     2.1 GB/sec    1.00   226.3±17.74ms     2.3 GB/sec
string/cdc                                         1.00    223.2±8.00ms     2.3 GB/sec    1.00    223.1±5.65ms     2.3 GB/sec
string/default                                     1.20   147.1±29.69ms     3.5 GB/sec    1.00   122.2±12.93ms     4.2 GB/sec
string/parquet_2                                   1.05    127.3±0.36ms     4.0 GB/sec    1.00    121.8±0.70ms     4.2 GB/sec
string/zstd                                        1.01    428.5±3.10ms  1223.6 MB/sec    1.00    423.7±1.44ms  1237.3 MB/sec
string/zstd_parquet_2                              1.00    396.2±1.24ms  1323.1 MB/sec    1.01    399.1±0.90ms  1313.7 MB/sec
string_and_binary_view/bloom_filter                1.00     65.9±0.55ms   489.5 MB/sec    1.07     70.4±0.76ms   458.4 MB/sec
string_and_binary_view/cdc                         1.00     59.2±0.33ms   544.4 MB/sec    1.05     62.4±0.26ms   516.8 MB/sec
string_and_binary_view/default                     1.00     48.6±0.22ms   664.2 MB/sec    1.07     52.0±0.23ms   620.7 MB/sec
string_and_binary_view/parquet_2                   1.00     59.3±0.27ms   543.4 MB/sec    1.06     62.8±0.34ms   513.3 MB/sec
string_and_binary_view/zstd                        1.00     85.0±0.28ms   379.2 MB/sec    1.04     88.3±0.22ms   365.4 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.3±0.29ms   440.0 MB/sec    1.05     76.8±0.23ms   419.8 MB/sec
string_dictionary/bloom_filter                     1.00    109.2±1.30ms     2.4 GB/sec    1.33    145.2±2.49ms  1818.7 MB/sec
string_dictionary/cdc                              1.00     81.6±5.02ms     3.2 GB/sec    1.27    103.3±2.71ms     2.5 GB/sec
string_dictionary/default                          1.00     64.1±1.61ms     4.0 GB/sec    1.48     94.7±1.24ms     2.7 GB/sec
string_dictionary/parquet_2                        1.00     67.1±0.25ms     3.8 GB/sec    1.56    104.9±0.45ms     2.5 GB/sec
string_dictionary/zstd                             1.00    217.7±2.62ms  1213.5 MB/sec    1.04   225.3±15.26ms  1172.3 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.2±0.26ms  1326.0 MB/sec    1.20    238.4±0.39ms  1107.9 MB/sec
string_non_null/bloom_filter                       1.00   256.9±15.35ms  2039.7 MB/sec    1.05   269.0±13.29ms  1948.2 MB/sec
string_non_null/cdc                                1.00    258.5±1.05ms  2026.8 MB/sec    1.10   284.5±12.16ms  1841.6 MB/sec
string_non_null/default                            1.01   138.3±15.37ms     3.7 GB/sec    1.00   137.4±12.80ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    151.5±0.40ms     3.4 GB/sec    1.04    157.7±0.75ms     3.2 GB/sec
string_non_null/zstd                               1.01   602.1±23.07ms   870.2 MB/sec    1.00   598.9±35.12ms   874.9 MB/sec
string_non_null/zstd_parquet_2                     1.01   528.8±12.56ms   990.9 MB/sec    1.00   525.6±11.06ms   996.9 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.2 GB/sec    1.00      2.5±0.01ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.8±0.13ms  1639.3 MB/sec    1.00      9.8±0.12ms  1641.0 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     46.3±0.19ms   345.3 MB/sec    1.07     49.5±1.67ms   323.1 MB/sec
struct_non_null/cdc                                1.00     45.7±0.18ms   349.9 MB/sec    1.01     46.2±0.18ms   346.5 MB/sec
struct_non_null/default                            1.00     32.2±0.11ms   496.3 MB/sec    1.02     32.8±0.12ms   487.7 MB/sec
struct_non_null/parquet_2                          1.00     41.1±0.18ms   389.3 MB/sec    1.01     41.4±0.11ms   386.1 MB/sec
struct_non_null/zstd                               1.00     41.2±0.14ms   388.4 MB/sec    1.01     41.4±0.09ms   386.0 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.1±0.23ms   290.5 MB/sec    1.00     55.3±0.15ms   289.6 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.7±0.07ms     2.0 GB/sec    1.04      8.0±0.08ms  2019.0 MB/sec
struct_sparse_99pct_null/cdc                       1.00     15.9±0.13ms  1014.7 MB/sec    1.00     15.9±0.10ms  1014.9 MB/sec
struct_sparse_99pct_null/default                   1.00      7.2±0.03ms     2.2 GB/sec    1.00      7.3±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.3±0.04ms     2.2 GB/sec    1.01      7.3±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.6±0.05ms  1869.8 MB/sec    1.00      8.6±0.05ms  1877.6 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.01      8.1±0.68ms  1988.4 MB/sec    1.00      8.0±0.03ms  2004.7 MB/sec

Resource Usage

base (merge-base)

Metric        Value
Wall time     1965.4s
Peak memory   6.6 GiB
Avg memory    6.4 GiB
CPU user      1892.3s
CPU sys       70.4s
Peak spill    0 B

branch

Metric        Value
Wall time     2115.5s
Peak memory   6.8 GiB
Avg memory    6.6 GiB
CPU user      2029.6s
CPU sys       83.4s
Peak spill    0 B

File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.2±0.10ms    19.0 MB/sec    1.00     13.1±0.12ms    19.1 MB/sec
bool/cdc                                           1.00     15.9±0.11ms    15.8 MB/sec    1.00     15.8±0.11ms    15.8 MB/sec
bool/default                                       1.00     11.0±0.08ms    22.7 MB/sec    1.00     11.0±0.11ms    22.7 MB/sec
bool/parquet_2                                     1.00     14.9±0.08ms    16.8 MB/sec    1.00     14.9±0.11ms    16.8 MB/sec
bool/zstd                                          1.00     11.5±0.11ms    21.7 MB/sec    1.00     11.5±0.11ms    21.7 MB/sec
bool/zstd_parquet_2                                1.00     15.2±0.10ms    16.4 MB/sec    1.00     15.2±0.12ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.01      7.1±0.05ms    17.7 MB/sec    1.00      7.0±0.03ms    17.8 MB/sec
bool_non_null/cdc                                  1.01      6.9±0.05ms    18.2 MB/sec    1.00      6.8±0.04ms    18.4 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.2 MB/sec    1.00      4.3±0.02ms    29.4 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.8 MB/sec    1.01      9.1±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.01      4.6±0.04ms    27.0 MB/sec    1.00      4.6±0.02ms    27.1 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.05ms    13.2 MB/sec    1.00      9.5±0.04ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.00     93.6±0.60ms   149.5 MB/sec    1.00     93.7±0.37ms   149.4 MB/sec
float_with_nans/cdc                                1.00     81.9±0.47ms   171.0 MB/sec    1.00     81.8±0.18ms   171.1 MB/sec
float_with_nans/default                            1.00     74.6±0.31ms   187.8 MB/sec    1.00     74.4±0.19ms   188.1 MB/sec
float_with_nans/parquet_2                          1.00     95.5±0.77ms   146.6 MB/sec    1.00     95.2±0.27ms   147.1 MB/sec
float_with_nans/zstd                               1.00    112.2±0.36ms   124.7 MB/sec    1.00    112.2±0.20ms   124.8 MB/sec
float_with_nans/zstd_parquet_2                     1.00    132.2±0.73ms   105.9 MB/sec    1.00    132.6±0.42ms   105.6 MB/sec
large_string_non_null/bloom_filter                                                        1.00     81.4±0.25ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    242.9±1.01ms  1053.9 MB/sec
large_string_non_null/default                                                             1.00     62.7±0.16ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     62.6±0.20ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     62.6±0.25ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     62.7±0.23ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    326.1±2.39ms  1672.5 MB/sec    1.01    330.5±0.96ms  1650.2 MB/sec
list_primitive/cdc                                 1.00    358.5±1.37ms  1521.0 MB/sec    1.00    358.3±0.87ms  1522.2 MB/sec
list_primitive/default                             1.00    247.5±1.16ms     2.2 GB/sec    1.01    250.2±2.22ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    267.9±0.57ms  2036.0 MB/sec    1.01    270.1±0.72ms  2019.5 MB/sec
list_primitive/zstd                                1.01    499.4±1.63ms  1092.1 MB/sec    1.00    494.7±1.32ms  1102.5 MB/sec
list_primitive/zstd_parquet_2                      1.00    491.4±0.65ms  1109.9 MB/sec    1.00    493.1±0.33ms  1106.0 MB/sec
list_primitive_non_null/bloom_filter               1.00    433.6±5.40ms  1255.1 MB/sec    1.00    434.0±8.63ms  1254.0 MB/sec
list_primitive_non_null/cdc                        1.00    441.9±8.75ms  1231.5 MB/sec    1.00   442.5±19.30ms  1229.8 MB/sec
list_primitive_non_null/default                    1.00    293.6±4.39ms  1853.7 MB/sec    1.04    304.1±7.62ms  1789.9 MB/sec
list_primitive_non_null/parquet_2                  1.01   311.0±13.65ms  1750.2 MB/sec    1.00   307.5±25.04ms  1769.6 MB/sec
list_primitive_non_null/zstd                       1.00    717.4±8.92ms   758.6 MB/sec    1.00   719.1±27.93ms   756.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    671.4±0.87ms   810.6 MB/sec    1.03    690.7±1.03ms   787.9 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.2±0.15ms     3.3 GB/sec    1.09     12.3±0.10ms     3.0 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.8±0.15ms  1638.9 MB/sec    1.03     23.5±0.10ms  1592.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.9±0.12ms     3.3 GB/sec    1.08     11.8±0.12ms     3.1 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.9±0.12ms     3.3 GB/sec    1.09     11.9±0.07ms     3.1 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.9±0.10ms     2.8 GB/sec    1.05     13.6±0.07ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.0±0.13ms     3.3 GB/sec    1.08     11.9±0.09ms     3.1 GB/sec
primitive/bloom_filter                             1.01    152.5±0.81ms   294.2 MB/sec    1.00    151.5±1.06ms   296.3 MB/sec
primitive/cdc                                      1.01    161.3±0.76ms   278.2 MB/sec    1.00    160.4±0.80ms   279.8 MB/sec
primitive/default                                  1.00    119.6±0.69ms   375.3 MB/sec    1.00    119.0±0.83ms   377.2 MB/sec
primitive/parquet_2                                1.00    134.3±0.66ms   334.0 MB/sec    1.00    134.8±0.72ms   333.0 MB/sec
primitive/zstd                                     1.00    149.1±0.93ms   300.9 MB/sec    1.00    148.6±0.60ms   301.9 MB/sec
primitive/zstd_parquet_2                           1.00    167.7±0.76ms   267.6 MB/sec    1.00    167.5±0.65ms   267.9 MB/sec
primitive_all_null/bloom_filter                    1.00     11.6±0.18ms     3.8 GB/sec    1.00     11.6±0.14ms     3.8 GB/sec
primitive_all_null/cdc                             1.01     30.8±0.42ms  1458.6 MB/sec    1.00     30.6±0.40ms  1466.5 MB/sec
primitive_all_null/default                         1.00     10.9±0.13ms     4.0 GB/sec    1.01     11.0±0.17ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.9±0.20ms     4.0 GB/sec    1.00     11.0±0.21ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.16ms     4.0 GB/sec    1.01     11.1±0.19ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.1±0.23ms     4.0 GB/sec    1.00     11.1±0.23ms     3.9 GB/sec
primitive_non_null/bloom_filter                    1.08    116.5±1.55ms   377.6 MB/sec    1.00    107.9±0.66ms   408.0 MB/sec
primitive_non_null/cdc                             1.00     90.9±0.67ms   484.0 MB/sec    1.00     90.7±0.51ms   485.0 MB/sec
primitive_non_null/default                         1.00     68.2±0.23ms   645.1 MB/sec    1.00     68.3±0.29ms   644.2 MB/sec
primitive_non_null/parquet_2                       1.00     90.0±0.38ms   488.7 MB/sec    1.00     89.6±0.36ms   491.1 MB/sec
primitive_non_null/zstd                            1.07    105.8±0.53ms   415.7 MB/sec    1.00     98.9±0.33ms   445.1 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    130.8±1.86ms   336.3 MB/sec    1.00    123.3±0.36ms   357.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.9±0.22ms     2.3 GB/sec    1.00     18.9±0.17ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.6±0.32ms  1194.9 MB/sec    1.00     37.6±0.23ms  1194.8 MB/sec
primitive_sparse_99pct_null/default                1.00     16.9±0.07ms     2.6 GB/sec    1.01     17.1±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.0±0.07ms     2.6 GB/sec    1.00     17.0±0.09ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.2±0.07ms     2.2 GB/sec    1.01     20.3±0.10ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     18.9±0.07ms     2.3 GB/sec    1.01     19.0±0.08ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.2±0.08ms   425.5 MB/sec
short_string_non_null/cdc                                                                 1.00     20.2±0.07ms   594.8 MB/sec
short_string_non_null/default                                                             1.00     15.9±0.10ms   752.7 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.6±0.06ms   469.3 MB/sec
short_string_non_null/zstd                                                                1.00     36.0±0.10ms   333.5 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.5±0.07ms   421.1 MB/sec
string/bloom_filter                                1.08   231.7±25.73ms     2.2 GB/sec    1.00   213.8±15.26ms     2.4 GB/sec
string/cdc                                         1.00    221.6±6.09ms     2.3 GB/sec    1.00    221.2±5.30ms     2.3 GB/sec
string/default                                     1.20   143.8±25.23ms     3.6 GB/sec    1.00   119.3±12.07ms     4.3 GB/sec
string/parquet_2                                   1.05    125.4±1.02ms     4.1 GB/sec    1.00    119.2±1.11ms     4.3 GB/sec
string/zstd                                        1.01    426.4±2.98ms  1229.6 MB/sec    1.00    420.4±1.72ms  1247.1 MB/sec
string/zstd_parquet_2                              1.00    394.0±0.73ms  1330.7 MB/sec    1.01    397.8±0.72ms  1317.9 MB/sec
string_and_binary_view/bloom_filter                1.00     66.2±0.68ms   486.9 MB/sec    1.02     67.8±0.21ms   475.8 MB/sec
string_and_binary_view/cdc                         1.00     59.0±0.28ms   546.6 MB/sec    1.05     61.7±0.13ms   522.6 MB/sec
string_and_binary_view/default                     1.00     48.6±0.34ms   664.1 MB/sec    1.05     50.9±0.14ms   633.0 MB/sec
string_and_binary_view/parquet_2                   1.00     59.3±0.43ms   544.3 MB/sec    1.04     61.7±0.13ms   522.3 MB/sec
string_and_binary_view/zstd                        1.00     85.1±0.44ms   378.8 MB/sec    1.03     87.6±0.11ms   368.1 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.2±0.27ms   440.9 MB/sec    1.03     75.6±0.14ms   426.3 MB/sec
string_dictionary/bloom_filter                     1.00     91.1±1.88ms     2.8 GB/sec    1.51    137.4±1.18ms  1922.0 MB/sec
string_dictionary/cdc                              1.00     86.0±1.23ms     3.0 GB/sec    1.16     99.8±2.45ms     2.6 GB/sec
string_dictionary/default                          1.00     49.3±0.58ms     5.2 GB/sec    1.86     91.8±1.06ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     54.3±0.47ms     4.7 GB/sec    1.89    102.6±0.37ms     2.5 GB/sec
string_dictionary/zstd                             1.00    211.1±1.33ms  1251.3 MB/sec    1.06   222.9±14.78ms  1185.0 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.4±0.36ms  1331.0 MB/sec    1.19    236.2±0.36ms  1118.3 MB/sec
string_non_null/bloom_filter                       1.01   256.4±16.13ms  2043.6 MB/sec    1.00   254.1±12.29ms     2.0 GB/sec
string_non_null/cdc                                1.00    269.9±9.26ms  1941.5 MB/sec    1.03   279.0±11.17ms  1878.1 MB/sec
string_non_null/default                            1.00   128.9±13.26ms     4.0 GB/sec    1.03   133.0±12.02ms     3.8 GB/sec
string_non_null/parquet_2                          1.00   141.2±11.53ms     3.6 GB/sec    1.09    154.6±0.46ms     3.3 GB/sec
string_non_null/zstd                               1.00    533.5±2.20ms   982.2 MB/sec    1.10   586.3±34.00ms   893.7 MB/sec
string_non_null/zstd_parquet_2                     1.00    506.4±2.28ms  1034.7 MB/sec    1.03   520.9±10.85ms  1005.9 MB/sec
struct_all_null/bloom_filter                       1.01      2.5±0.01ms     6.2 GB/sec    1.00      2.5±0.00ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.8±0.13ms  1640.5 MB/sec    1.00      9.8±0.16ms  1638.0 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.02     48.6±0.29ms   329.1 MB/sec    1.00     47.6±0.16ms   336.2 MB/sec
struct_non_null/cdc                                1.00     46.1±0.21ms   347.3 MB/sec    1.00     46.0±0.16ms   347.6 MB/sec
struct_non_null/default                            1.01     32.6±0.19ms   491.5 MB/sec    1.00     32.4±0.12ms   494.2 MB/sec
struct_non_null/parquet_2                          1.01     41.4±0.18ms   386.5 MB/sec    1.00     41.1±0.11ms   389.2 MB/sec
struct_non_null/zstd                               1.01     41.3±0.18ms   387.5 MB/sec    1.00     41.1±0.10ms   389.7 MB/sec
struct_non_null/zstd_parquet_2                     1.01     55.4±0.14ms   288.6 MB/sec    1.00     55.1±0.14ms   290.3 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.6±0.15ms     2.1 GB/sec    1.00      7.6±0.10ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.00     15.7±0.17ms  1028.6 MB/sec    1.00     15.7±0.18ms  1026.7 MB/sec
struct_sparse_99pct_null/default                   1.00      7.0±0.10ms     2.2 GB/sec    1.00      7.0±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.0±0.10ms     2.2 GB/sec    1.00      7.0±0.04ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.4±0.10ms  1909.6 MB/sec    1.00      8.4±0.04ms  1918.0 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.8±0.12ms     2.0 GB/sec    1.00      7.8±0.04ms     2.0 GB/sec

Resource Usage

base (merge-base)

Metric        Value
Wall time     1945.4s
Peak memory   6.6 GiB
Avg memory    6.4 GiB
CPU user      1889.0s
CPU sys       55.7s
Peak spill    0 B

branch

Metric        Value
Wall time     2105.5s
Peak memory   6.8 GiB
Avg memory    6.6 GiB
CPU user      2019.6s
CPU sys       81.0s
Peak spill    0 B


Two targeted regressions surfaced in the GKE benchmark sweep:

1. `string_dictionary/*` regressed +30-89 % vs `main` after writer-dict
   spill. Arrow Dictionary input falls through to the per-value
   walk via `ArrayAccessor::value`, which dereferences the dictionary
   (keys[idx] → values[key] → slice construction) for every index in
   every chunk. The byte-budget check exists to bound pages of large
   BYTE_ARRAY values, but a column that is already Dictionary-encoded
   at the arrow layer implies values small enough that dedup is
   worthwhile, which is the opposite shape. Treat Dictionary input as
   "everything fits" and skip the check.

2. `list_primitive_sparse_99pct_null` regressed ~+8 % across props.
   The cost was `LevelDataRef::value_count`'s O(N) def-level scan on
   the 20 000-row compact-levels chunks the list path uses. The
   arrow path already has the answer cheaper: `value_indices` is the
   sorted list of non-null positions in the batch, so the count of
   indices falling in the current chunk's level range is a binary
   search (one `partition_point`). Use that when `value_indices` is
   `Some` and fall back to the def-level scan only on the non-arrow
   path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
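
As a rough illustration of the second fix, here is a minimal sketch of counting values per chunk with `partition_point` instead of scanning definition levels. It is not the PR's code: `values_in_chunk`, `values_by_def_scan`, and `chunk_value_counts` are hypothetical names, and it assumes the sorted non-null positions and the chunk boundaries are expressed in the same index space.

```rust
/// Count how many entries of `remaining` fall before `chunk_end`.
///
/// Assumes `remaining` is sorted ascending and every entry is at or past
/// the current chunk's start, so a single binary search is enough.
fn values_in_chunk(remaining: &[usize], chunk_end: usize) -> usize {
    remaining.partition_point(|&pos| pos < chunk_end)
}

/// O(N) fallback when no index list is available: a slot holds a value
/// only if its definition level equals the column's maximum.
fn values_by_def_scan(def_levels: &[i16], max_def_level: i16) -> usize {
    def_levels.iter().filter(|&&d| d == max_def_level).count()
}

/// Walk a batch of `num_slots` positions in fixed-size chunks and return
/// the number of values in each chunk (hypothetical driver for illustration).
fn chunk_value_counts(value_indices: &[usize], num_slots: usize, chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut remaining = value_indices;
    let mut counts = Vec::new();
    let mut start = 0;
    while start < num_slots {
        let end = (start + chunk_size).min(num_slots);
        let n = values_in_chunk(remaining, end);
        counts.push(n);
        remaining = &remaining[n..]; // drop indices consumed by this chunk
        start = end;
    }
    counts
}
```

For non-null positions `[1, 4, 5, 9, 11]` in a 12-slot batch with chunks of 4, `chunk_value_counts` returns `[1, 2, 2]`. Each step costs one binary search over the remaining indices rather than a pass over the chunk's definition levels.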
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4452557769-94-2nbcr 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (bbe2b7e) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4453035593-96-5mrpz 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing parquet-page-size-mid-batch (bbe2b7e) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@etseidl
Contributor

etseidl commented May 14, 2026

Have you considered making the batch size configurable per column?

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.0±0.03ms    19.2 MB/sec    1.00     13.0±0.09ms    19.2 MB/sec
bool/cdc                                           1.00     15.6±0.04ms    16.0 MB/sec    1.00     15.7±0.12ms    16.0 MB/sec
bool/default                                       1.00     10.9±0.03ms    22.9 MB/sec    1.01     11.0±0.12ms    22.8 MB/sec
bool/parquet_2                                     1.00     14.7±0.04ms    17.0 MB/sec    1.00     14.8±0.12ms    16.9 MB/sec
bool/zstd                                          1.00     11.4±0.03ms    21.9 MB/sec    1.00     11.5±0.10ms    21.8 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.04ms    16.6 MB/sec    1.00     15.1±0.12ms    16.5 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.02ms    17.8 MB/sec    1.00      7.0±0.02ms    17.8 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.04ms    18.4 MB/sec    1.00      6.8±0.03ms    18.4 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.00      4.3±0.02ms    29.2 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.8 MB/sec    1.00      9.1±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.00      4.6±0.02ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.04ms    13.2 MB/sec    1.00      9.5±0.04ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     92.4±0.36ms   151.5 MB/sec    1.04     96.2±0.72ms   145.5 MB/sec
float_with_nans/cdc                                1.00     81.2±0.20ms   172.4 MB/sec    1.03     83.7±1.01ms   167.3 MB/sec
float_with_nans/default                            1.00     74.1±0.24ms   188.9 MB/sec    1.02     75.9±0.43ms   184.4 MB/sec
float_with_nans/parquet_2                          1.00     94.1±0.39ms   148.8 MB/sec    1.03     97.3±0.55ms   143.8 MB/sec
float_with_nans/zstd                               1.00    111.6±0.21ms   125.4 MB/sec    1.02    114.2±1.07ms   122.6 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.1±0.39ms   106.8 MB/sec    1.03    134.9±0.57ms   103.8 MB/sec
large_string_non_null/bloom_filter                                                        1.00     82.2±0.16ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    242.6±1.01ms  1055.1 MB/sec
large_string_non_null/default                                                             1.00     63.1±0.22ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     63.3±0.20ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     63.4±0.22ms     3.9 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     63.4±0.23ms     3.9 GB/sec
list_primitive/bloom_filter                        1.00    323.2±0.77ms  1687.2 MB/sec    1.05    340.5±3.10ms  1601.6 MB/sec
list_primitive/cdc                                 1.00    357.4±1.04ms  1526.1 MB/sec    1.02    364.5±2.63ms  1496.3 MB/sec
list_primitive/default                             1.00   255.4±46.59ms     2.1 GB/sec    1.00    255.8±1.89ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    268.4±0.72ms  2031.7 MB/sec    1.02    272.7±1.05ms  1999.7 MB/sec
list_primitive/zstd                                1.00    500.1±1.93ms  1090.5 MB/sec    1.02    511.1±1.79ms  1067.0 MB/sec
list_primitive/zstd_parquet_2                      1.00    490.5±0.49ms  1111.9 MB/sec    1.02    500.3±2.01ms  1090.1 MB/sec
list_primitive_non_null/bloom_filter               1.00    418.7±6.33ms  1300.0 MB/sec    1.26   526.0±24.06ms  1034.8 MB/sec
list_primitive_non_null/cdc                        1.00    440.4±7.41ms  1235.7 MB/sec    1.01    442.8±6.00ms  1229.0 MB/sec
list_primitive_non_null/default                    1.00    285.9±4.54ms  1903.7 MB/sec    1.32   376.1±18.32ms  1446.9 MB/sec
list_primitive_non_null/parquet_2                  1.00   306.8±12.97ms  1773.9 MB/sec    1.30    397.9±5.13ms  1367.7 MB/sec
list_primitive_non_null/zstd                       1.00    719.5±6.21ms   756.4 MB/sec    1.09   782.7±16.12ms   695.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    686.4±1.31ms   792.9 MB/sec    1.09    749.3±3.12ms   726.3 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.1±0.22ms     3.3 GB/sec    1.05     11.7±0.11ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.5±0.09ms  1663.4 MB/sec    1.03     23.1±0.11ms  1615.5 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.7±0.04ms     3.4 GB/sec    1.05     11.2±0.11ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.7±0.04ms     3.4 GB/sec    1.04     11.2±0.05ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.6±0.05ms     2.9 GB/sec    1.04     13.1±0.05ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.04ms     3.4 GB/sec    1.04     11.4±0.08ms     3.2 GB/sec
primitive/bloom_filter                             1.00    148.3±0.47ms   302.7 MB/sec    1.00    148.8±0.66ms   301.6 MB/sec
primitive/cdc                                      1.00    158.7±0.53ms   282.8 MB/sec    1.00    158.8±0.86ms   282.6 MB/sec
primitive/default                                  1.00    117.3±0.24ms   382.6 MB/sec    1.01    117.9±0.68ms   380.5 MB/sec
primitive/parquet_2                                1.00    132.2±0.27ms   339.4 MB/sec    1.00    132.7±0.66ms   338.1 MB/sec
primitive/zstd                                     1.00    146.8±0.28ms   305.8 MB/sec    1.01    147.7±0.71ms   303.7 MB/sec
primitive/zstd_parquet_2                           1.00    165.5±0.33ms   271.1 MB/sec    1.00    166.1±0.59ms   270.1 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.11ms     3.8 GB/sec    1.02     11.8±0.17ms     3.7 GB/sec
primitive_all_null/cdc                             1.03     30.6±0.39ms  1465.9 MB/sec    1.00     29.8±0.46ms  1503.4 MB/sec
primitive_all_null/default                         1.00     10.9±0.21ms     4.0 GB/sec    1.00     10.9±0.11ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.01     11.0±0.20ms     4.0 GB/sec    1.00     10.9±0.15ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.14ms     4.0 GB/sec    1.02     11.2±0.21ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.16ms     4.0 GB/sec    1.01     11.1±0.22ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.08    113.3±1.61ms   388.4 MB/sec    1.00    105.2±0.20ms   418.4 MB/sec
primitive_non_null/cdc                             1.01     90.0±0.50ms   489.1 MB/sec    1.00     89.3±0.29ms   492.9 MB/sec
primitive_non_null/default                         1.01     67.4±0.22ms   653.2 MB/sec    1.00     67.0±0.18ms   656.5 MB/sec
primitive_non_null/parquet_2                       1.00     89.1±0.20ms   493.9 MB/sec    1.00     88.8±0.14ms   495.7 MB/sec
primitive_non_null/zstd                            1.08    105.6±0.45ms   416.8 MB/sec    1.00     97.6±0.12ms   450.8 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    130.1±1.86ms   338.1 MB/sec    1.00    122.2±0.18ms   360.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.3±0.16ms     2.4 GB/sec    1.06     19.3±0.26ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.00     37.2±0.38ms  1207.0 MB/sec    1.00     37.2±0.37ms  1205.8 MB/sec
primitive_sparse_99pct_null/default                1.00     16.8±0.06ms     2.6 GB/sec    1.03     17.3±0.08ms     2.5 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.7±0.06ms     2.6 GB/sec    1.03     17.3±0.10ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.0±0.10ms     2.2 GB/sec    1.03     20.6±0.12ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     18.7±0.09ms     2.3 GB/sec    1.03     19.3±0.12ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     29.7±0.13ms   403.9 MB/sec
short_string_non_null/cdc                                                                 1.00     20.2±0.06ms   594.2 MB/sec
short_string_non_null/default                                                             1.00     16.4±0.13ms   733.4 MB/sec
short_string_non_null/parquet_2                                                           1.00     26.0±0.14ms   461.6 MB/sec
short_string_non_null/zstd                                                                1.00     38.2±6.26ms   314.3 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     29.1±0.18ms   412.8 MB/sec
string/bloom_filter                                1.05   228.5±26.93ms     2.2 GB/sec    1.00   217.9±17.20ms     2.3 GB/sec
string/cdc                                         1.00    220.9±5.71ms     2.3 GB/sec    1.00    220.6±5.33ms     2.3 GB/sec
string/default                                     1.15   140.0±24.70ms     3.7 GB/sec    1.00   122.1±13.31ms     4.2 GB/sec
string/parquet_2                                   1.01    126.4±1.85ms     4.1 GB/sec    1.00    124.8±1.03ms     4.1 GB/sec
string/zstd                                        1.01    424.6±2.74ms  1234.6 MB/sec    1.00    421.6±1.32ms  1243.6 MB/sec
string/zstd_parquet_2                              1.00    394.7±1.28ms  1328.2 MB/sec    1.02    401.3±0.35ms  1306.2 MB/sec
string_and_binary_view/bloom_filter                1.00     63.4±0.24ms   508.7 MB/sec    1.09     69.2±0.17ms   465.9 MB/sec
string_and_binary_view/cdc                         1.00     58.3±0.12ms   553.1 MB/sec    1.05     61.4±0.15ms   525.6 MB/sec
string_and_binary_view/default                     1.00     47.7±0.10ms   676.3 MB/sec    1.10     52.5±0.18ms   614.2 MB/sec
string_and_binary_view/parquet_2                   1.00     58.5±0.12ms   551.0 MB/sec    1.08     63.2±0.32ms   510.4 MB/sec
string_and_binary_view/zstd                        1.00     84.2±0.15ms   382.9 MB/sec    1.06     89.3±0.37ms   360.9 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.4±0.11ms   445.6 MB/sec    1.07     77.1±0.16ms   418.0 MB/sec
string_dictionary/bloom_filter                     1.00     88.8±0.66ms     2.9 GB/sec    1.54    136.8±2.03ms  1930.6 MB/sec
string_dictionary/cdc                              1.00     86.6±2.67ms     3.0 GB/sec    1.14     98.8±3.49ms     2.6 GB/sec
string_dictionary/default                          1.00     48.2±0.34ms     5.3 GB/sec    1.91     91.9±2.19ms     2.8 GB/sec
string_dictionary/parquet_2                        1.00     53.8±0.16ms     4.8 GB/sec    1.94    104.2±2.36ms     2.5 GB/sec
string_dictionary/zstd                             1.00    208.7±0.68ms  1265.8 MB/sec    1.07   223.1±14.95ms  1184.1 MB/sec
string_dictionary/zstd_parquet_2                   1.00    197.8±0.12ms  1335.0 MB/sec    1.20    236.6±1.77ms  1116.4 MB/sec
string_non_null/bloom_filter                       1.00   252.4±15.95ms     2.0 GB/sec    1.01   254.2±12.25ms     2.0 GB/sec
string_non_null/cdc                                1.00    267.6±9.15ms  1958.5 MB/sec    1.06   283.6±12.31ms  1847.7 MB/sec
string_non_null/default                            1.00   126.2±12.63ms     4.1 GB/sec    1.07   135.2±12.42ms     3.8 GB/sec
string_non_null/parquet_2                          1.00   141.3±12.34ms     3.6 GB/sec    1.11    157.1±1.99ms     3.3 GB/sec
string_non_null/zstd                               1.00    531.4±2.22ms   986.0 MB/sec    1.11   589.4±34.73ms   889.0 MB/sec
string_non_null/zstd_parquet_2                     1.00    505.7±2.52ms  1036.1 MB/sec    1.04   527.6±11.72ms   993.2 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.2 GB/sec    1.01      2.6±0.02ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.9±0.12ms  1634.9 MB/sec    1.01     10.0±0.12ms  1611.5 MB/sec
struct_all_null/default                            1.00      2.2±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     47.1±0.20ms   339.8 MB/sec    1.02     47.9±0.83ms   334.2 MB/sec
struct_non_null/cdc                                1.00     45.7±0.22ms   350.3 MB/sec    1.01     46.1±0.46ms   346.9 MB/sec
struct_non_null/default                            1.00     32.1±0.15ms   499.0 MB/sec    1.01     32.4±0.44ms   493.3 MB/sec
struct_non_null/parquet_2                          1.00     40.8±0.51ms   392.0 MB/sec    1.01     41.2±0.49ms   388.4 MB/sec
struct_non_null/zstd                               1.00     40.8±0.11ms   392.1 MB/sec    1.01     41.2±0.54ms   387.9 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.9±0.17ms   291.5 MB/sec    1.03     56.3±2.02ms   284.1 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.5±0.05ms     2.1 GB/sec    1.07      8.0±0.11ms  2019.0 MB/sec
struct_sparse_99pct_null/cdc                       1.00     15.3±0.08ms  1051.6 MB/sec    1.01     15.5±0.09ms  1040.0 MB/sec
struct_sparse_99pct_null/default                   1.00      7.0±0.04ms     2.3 GB/sec    1.04      7.3±0.07ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.9±0.03ms     2.3 GB/sec    1.05      7.3±0.07ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.3±0.02ms  1949.3 MB/sec    1.05      8.7±0.07ms  1864.2 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.7±0.02ms     2.0 GB/sec    1.05      8.1±0.04ms  1998.4 MB/sec

Resource Usage

base (merge-base)

Metric       Value
------       -----
Wall time    1940.4s
Peak memory  6.6 GiB
Avg memory   6.4 GiB
CPU user     1879.7s
CPU sys      57.5s
Peak spill   0 B

branch

Metric       Value
------       -----
Wall time    2155.5s
Peak memory  6.8 GiB
Avg memory   6.7 GiB
CPU user     2078.6s
CPU sys      76.3s
Peak spill   0 B


@adriangb
Contributor Author

> Have you considered making the batch size configurable per column?

Yes, that may be a simpler approach. But I'm hoping we can get to a place where users don't have to think about / configure this. Given they gave us a page size limit it'd be nice if we can always adhere to that...
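
For illustration only, the byte-budget rule from the PR description (sub_batch_size = page_byte_limit / bytes_per_value, clamped to at least 1) could be sketched roughly as below. The function name `sub_batch_size`, its parameters, and the 1 MiB limit used in `main` are assumptions made for this example; this is not the writer's actual code.

```rust
/// Hypothetical helper (not the parquet writer's API): how many values to
/// write per mini-batch so that one mini-batch stays near `page_byte_limit`.
fn sub_batch_size(page_byte_limit: u64, chunk_bytes: u64, chunk_len: usize) -> usize {
    if chunk_len == 0 {
        return 1;
    }
    // Average plain-encoded bytes per value in this chunk; at least 1 so the
    // division below is always defined.
    let bytes_per_value = (chunk_bytes / chunk_len as u64).max(1);
    // Number of values that fit in the byte budget, capped at the chunk length.
    let fit = (page_byte_limit / bytes_per_value).min(chunk_len as u64);
    // Clamp to at least one value per mini-batch.
    (fit as usize).max(1)
}

fn main() {
    let page_limit: u64 = 1024 * 1024; // assumed 1 MiB page byte limit

    // 1024 eight-byte values: the whole chunk fits, so batching is unchanged.
    println!("{}", sub_batch_size(page_limit, 1024 * 8, 1024)); // 1024

    // 1024 values of 64 KiB each: 16 values fit in the budget.
    println!("{}", sub_batch_size(page_limit, 1024 * 64 * 1024, 1024)); // 16

    // 1024 values of 5 MiB each: shrinks to one value per mini-batch.
    println!("{}", sub_batch_size(page_limit, 1024 * 5 * 1024 * 1024, 1024)); // 1
}
```

With small values the result equals the chunk length, so the existing batched path is untouched; only large values shrink the mini-batch, which is why no per-column setting would be needed under this approach.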

Comment thread parquet/src/data_type.rs Outdated
/// push a page far past the configured page byte limit before the
/// post-write size check fires.
#[inline]
fn byte_size(&self) -> usize {
Contributor

This seems to duplicate dict_encoding_size. Also, #9700 wants to rename dict_encoding_size and instead implement it pretty much the same way as here.

@etseidl
Contributor

etseidl commented May 14, 2026

Another thought... maybe add another chunker like the one the CDC work added (see fn write_with_chunker). If we compute batches up front, when we know the shape of the data, that might be faster 🤷
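
A minimal sketch of what computing batches up front might look like, assuming the byte size of every value is known before writing. The name `plan_batches` and its parameters are hypothetical and unrelated to the existing CDC chunker; the sketch also ignores repetition levels, so for nested columns the boundaries would additionally need to fall on record starts.

```rust
use std::ops::Range;

/// Hypothetical up-front planner: split values into contiguous index ranges
/// whose total byte size stays within `byte_limit`, with at most `max_values`
/// per range. A single value larger than the limit gets a range of its own.
fn plan_batches(value_sizes: &[usize], byte_limit: usize, max_values: usize) -> Vec<Range<usize>> {
    let mut batches = Vec::new();
    let mut start = 0; // index of the first value in the current batch
    let mut bytes = 0usize; // bytes accumulated in the current batch

    for (i, &size) in value_sizes.iter().enumerate() {
        let count = i - start;
        // Close the current batch if adding this value would exceed either
        // the byte budget or the value-count budget.
        if count > 0 && (bytes.saturating_add(size) > byte_limit || count >= max_values) {
            batches.push(start..i);
            start = i;
            bytes = 0;
        }
        bytes = bytes.saturating_add(size);
    }
    // Flush the trailing batch, if any.
    if start < value_sizes.len() {
        batches.push(start..value_sizes.len());
    }
    batches
}

fn main() {
    // Mostly small values with one oversized value in the middle.
    let sizes = [10, 10, 10, 500, 10, 10];
    let batches = plan_batches(&sizes, 100, 1024);
    println!("{:?}", batches); // [0..3, 3..4, 4..6]
}
```

Because the plan is built in one pass before any value is encoded, the write loop itself would only iterate over precomputed ranges, which is the potential speedup this suggestion points at.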

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.01     13.2±0.08ms    18.9 MB/sec    1.00     13.1±0.06ms    19.1 MB/sec
bool/cdc                                           1.01     16.0±0.08ms    15.6 MB/sec    1.00     15.8±0.26ms    15.8 MB/sec
bool/default                                       1.01     11.0±0.07ms    22.6 MB/sec    1.00     10.9±0.09ms    22.9 MB/sec
bool/parquet_2                                     1.00     14.7±0.05ms    17.0 MB/sec    1.00     14.7±0.07ms    17.0 MB/sec
bool/zstd                                          1.02     11.6±0.06ms    21.6 MB/sec    1.00     11.4±0.06ms    22.0 MB/sec
bool/zstd_parquet_2                                1.01     15.2±0.08ms    16.5 MB/sec    1.00     15.1±0.27ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.1±0.04ms    17.6 MB/sec    1.00      7.1±0.15ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      7.0±0.05ms    17.9 MB/sec    1.00      7.0±0.13ms    17.8 MB/sec
bool_non_null/default                              1.00      4.3±0.03ms    29.1 MB/sec    1.00      4.3±0.10ms    29.0 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.05ms    13.7 MB/sec    1.00      9.1±0.24ms    13.7 MB/sec
bool_non_null/zstd                                 1.00      4.7±0.02ms    26.9 MB/sec    1.00      4.6±0.03ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.05ms    13.1 MB/sec    1.00      9.5±0.24ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.00     95.7±2.15ms   146.3 MB/sec    1.01     97.0±2.24ms   144.3 MB/sec
float_with_nans/cdc                                1.00     82.9±1.46ms   168.8 MB/sec    1.03     85.0±1.07ms   164.7 MB/sec
float_with_nans/default                            1.00     76.1±2.46ms   184.0 MB/sec    1.01     76.9±1.74ms   182.0 MB/sec
float_with_nans/parquet_2                          1.00     97.5±2.31ms   143.7 MB/sec    1.00     97.7±2.52ms   143.3 MB/sec
float_with_nans/zstd                               1.01    114.5±2.02ms   122.2 MB/sec    1.00    113.0±1.76ms   123.9 MB/sec
float_with_nans/zstd_parquet_2                     1.00    134.1±2.59ms   104.4 MB/sec    1.01    135.2±2.50ms   103.6 MB/sec
large_string_non_null/bloom_filter                                                        1.00     84.5±3.68ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    244.0±2.19ms  1049.1 MB/sec
large_string_non_null/default                                                             1.00     64.3±2.55ms     3.9 GB/sec
large_string_non_null/parquet_2                                                           1.00     63.7±3.92ms     3.9 GB/sec
large_string_non_null/zstd                                                                1.00     60.7±0.26ms     4.1 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     62.2±1.88ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    340.8±6.92ms  1600.0 MB/sec    1.08   369.1±15.47ms  1477.7 MB/sec
list_primitive/cdc                                 1.00    364.0±3.08ms  1498.2 MB/sec    1.02    371.0±8.13ms  1469.8 MB/sec
list_primitive/default                             1.00    255.5±6.60ms     2.1 GB/sec    1.07    274.6±8.33ms  1986.2 MB/sec
list_primitive/parquet_2                           1.00    271.3±3.24ms  2010.0 MB/sec    1.06    288.7±2.44ms  1888.7 MB/sec
list_primitive/zstd                                1.00    506.5±6.71ms  1076.6 MB/sec    1.03    520.8±7.25ms  1047.1 MB/sec
list_primitive/zstd_parquet_2                      1.00    496.0±3.27ms  1099.5 MB/sec    1.00    496.6±4.31ms  1098.2 MB/sec
list_primitive_non_null/bloom_filter               1.00   447.4±21.93ms  1216.4 MB/sec    1.16   517.1±33.75ms  1052.5 MB/sec
list_primitive_non_null/cdc                        1.01   450.7±12.72ms  1207.4 MB/sec    1.00   447.8±11.71ms  1215.3 MB/sec
list_primitive_non_null/default                    1.00   303.8±11.10ms  1791.3 MB/sec    1.25   379.8±19.28ms  1432.9 MB/sec
list_primitive_non_null/parquet_2                  1.00   318.8±16.26ms  1707.3 MB/sec    1.33   422.5±16.94ms  1288.2 MB/sec
list_primitive_non_null/zstd                       1.00   735.5±14.23ms   740.0 MB/sec    1.07   783.7±19.04ms   694.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    702.1±6.10ms   775.1 MB/sec    1.06    747.6±4.61ms   728.0 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.3±0.24ms     3.2 GB/sec    1.04     11.7±0.06ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.02     23.0±0.23ms  1625.1 MB/sec    1.00     22.6±0.10ms  1652.1 MB/sec
list_primitive_sparse_99pct_null/default           1.03     11.2±0.05ms     3.3 GB/sec    1.00     10.9±0.04ms     3.4 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.2±0.14ms     3.3 GB/sec    1.02     11.3±0.07ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     13.1±0.29ms     2.8 GB/sec    1.01     13.2±0.29ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.05ms     3.3 GB/sec    1.03     11.3±0.31ms     3.2 GB/sec
primitive/bloom_filter                             1.02    156.1±3.23ms   287.4 MB/sec    1.00    153.2±2.93ms   293.0 MB/sec
primitive/cdc                                      1.00    161.7±3.06ms   277.5 MB/sec    1.01    162.6±3.18ms   276.1 MB/sec
primitive/default                                  1.02    120.6±0.82ms   372.0 MB/sec    1.00    118.3±1.10ms   379.3 MB/sec
primitive/parquet_2                                1.00    134.6±2.25ms   333.5 MB/sec    1.00    134.9±1.97ms   332.7 MB/sec
primitive/zstd                                     1.01    149.1±1.65ms   300.9 MB/sec    1.00    147.7±0.76ms   303.8 MB/sec
primitive/zstd_parquet_2                           1.00    168.8±1.53ms   265.9 MB/sec    1.00    168.8±1.52ms   265.8 MB/sec
primitive_all_null/bloom_filter                    1.00     11.9±0.18ms     3.7 GB/sec    1.00     11.8±0.23ms     3.7 GB/sec
primitive_all_null/cdc                             1.03     30.9±0.49ms  1450.1 MB/sec    1.00     30.1±0.51ms  1492.4 MB/sec
primitive_all_null/default                         1.00     10.9±0.11ms     4.0 GB/sec    1.00     11.0±0.16ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.18ms     4.0 GB/sec    1.00     11.0±0.23ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.13ms     4.0 GB/sec    1.00     11.0±0.17ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.01     11.0±0.15ms     4.0 GB/sec    1.00     10.9±0.10ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.09    117.1±2.94ms   375.6 MB/sec    1.00    107.7±1.65ms   408.5 MB/sec
primitive_non_null/cdc                             1.00     92.0±1.42ms   478.3 MB/sec    1.00     91.8±1.31ms   479.5 MB/sec
primitive_non_null/default                         1.03     69.7±1.09ms   631.7 MB/sec    1.00     67.7±0.30ms   649.9 MB/sec
primitive_non_null/parquet_2                       1.01     91.7±1.91ms   479.9 MB/sec    1.00     91.2±1.32ms   482.7 MB/sec
primitive_non_null/zstd                            1.07    107.1±2.41ms   410.8 MB/sec    1.00     99.8±1.92ms   441.0 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    131.4±1.99ms   334.8 MB/sec    1.00    124.6±1.30ms   353.2 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.01     19.0±0.38ms     2.3 GB/sec    1.00     18.9±0.49ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.01     37.5±0.40ms  1198.2 MB/sec    1.00     37.1±0.54ms  1208.8 MB/sec
primitive_sparse_99pct_null/default                1.02     17.3±0.16ms     2.5 GB/sec    1.00     16.9±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.4±0.09ms     2.5 GB/sec    1.00     17.5±0.08ms     2.5 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.24ms     2.1 GB/sec    1.02     20.7±0.28ms     2.1 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     19.2±0.38ms     2.3 GB/sec    1.01     19.4±0.40ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     29.7±0.21ms   404.3 MB/sec
short_string_non_null/cdc                                                                 1.00     20.6±0.17ms   581.8 MB/sec
short_string_non_null/default                                                             1.00     16.9±0.20ms   710.8 MB/sec
short_string_non_null/parquet_2                                                           1.00     26.5±0.24ms   452.8 MB/sec
short_string_non_null/zstd                                                                1.00     36.5±0.12ms   328.8 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     29.2±0.17ms   410.8 MB/sec
string/bloom_filter                                1.09   231.5±23.89ms     2.2 GB/sec    1.00   211.7±14.75ms     2.4 GB/sec
string/cdc                                         1.01    226.1±9.11ms     2.3 GB/sec    1.00   224.7±11.94ms     2.3 GB/sec
string/default                                     1.12   146.7±24.93ms     3.5 GB/sec    1.00   131.6±11.00ms     3.9 GB/sec
string/parquet_2                                   1.07    129.2±2.88ms     4.0 GB/sec    1.00    120.6±6.92ms     4.2 GB/sec
string/zstd                                        1.00    430.5±8.49ms  1217.7 MB/sec    1.03    441.5±9.87ms  1187.4 MB/sec
string/zstd_parquet_2                              1.00    397.8±4.26ms  1318.0 MB/sec    1.02    406.3±4.56ms  1290.2 MB/sec
string_and_binary_view/bloom_filter                1.00     67.2±3.49ms   479.6 MB/sec    1.01     68.2±0.39ms   472.6 MB/sec
string_and_binary_view/cdc                         1.00     59.4±1.21ms   543.2 MB/sec    1.06     62.8±2.45ms   513.8 MB/sec
string_and_binary_view/default                     1.00     48.7±0.95ms   662.0 MB/sec    1.11     54.0±2.59ms   597.0 MB/sec
string_and_binary_view/parquet_2                   1.00     58.9±0.58ms   547.7 MB/sec    1.11     65.2±0.86ms   494.4 MB/sec
string_and_binary_view/zstd                        1.00     85.5±0.95ms   377.2 MB/sec    1.06     90.8±1.48ms   355.0 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     74.2±0.66ms   434.4 MB/sec    1.07     79.3±2.37ms   406.8 MB/sec
string_dictionary/bloom_filter                     1.03    104.5±0.90ms     2.5 GB/sec    1.00    101.1±5.73ms     2.6 GB/sec
string_dictionary/cdc                              1.38     78.4±2.15ms     3.3 GB/sec    1.00     56.7±3.11ms     4.6 GB/sec
string_dictionary/default                          1.29     65.3±3.55ms     3.9 GB/sec    1.00     50.6±0.56ms     5.1 GB/sec
string_dictionary/parquet_2                        1.19     67.8±0.44ms     3.8 GB/sec    1.00     56.8±1.57ms     4.5 GB/sec
string_dictionary/zstd                             1.02    218.7±3.32ms  1207.8 MB/sec    1.00    215.4±4.94ms  1226.0 MB/sec
string_dictionary/zstd_parquet_2                   1.00    200.1±2.51ms  1319.8 MB/sec    1.01    201.9±1.54ms  1308.5 MB/sec
string_non_null/bloom_filter                       1.03   270.4±28.27ms  1937.7 MB/sec    1.00   263.2±19.14ms  1990.8 MB/sec
string_non_null/cdc                                1.01   274.5±13.60ms  1909.1 MB/sec    1.00   272.9±11.94ms  1919.9 MB/sec
string_non_null/default                            1.00   148.1±14.93ms     3.5 GB/sec    1.03   152.1±16.41ms     3.4 GB/sec
string_non_null/parquet_2                          1.00    145.3±9.80ms     3.5 GB/sec    1.08    156.8±3.28ms     3.3 GB/sec
string_non_null/zstd                               1.00   573.6±17.33ms   913.5 MB/sec    1.02   584.3±20.28ms   896.8 MB/sec
string_non_null/zstd_parquet_2                     1.00   527.8±11.88ms   992.8 MB/sec    1.00   529.4±13.85ms   989.8 MB/sec
struct_all_null/bloom_filter                       1.00      2.6±0.04ms     6.1 GB/sec    1.01      2.6±0.04ms     6.1 GB/sec
struct_all_null/cdc                                1.00      9.9±0.17ms  1637.0 MB/sec    1.02     10.0±0.13ms  1612.6 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     47.1±0.89ms   339.4 MB/sec    1.03     48.4±1.17ms   330.9 MB/sec
struct_non_null/cdc                                1.00     46.4±0.36ms   344.8 MB/sec    1.00     46.3±0.38ms   345.9 MB/sec
struct_non_null/default                            1.01     32.8±0.62ms   487.6 MB/sec    1.00     32.6±0.23ms   490.3 MB/sec
struct_non_null/parquet_2                          1.00     40.9±0.33ms   391.4 MB/sec    1.01     41.4±0.79ms   386.5 MB/sec
struct_non_null/zstd                               1.02     41.3±0.53ms   387.0 MB/sec    1.00     40.7±0.23ms   392.8 MB/sec
struct_non_null/zstd_parquet_2                     1.01     55.3±0.45ms   289.4 MB/sec    1.00     54.9±0.14ms   291.6 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.9±0.36ms  2033.3 MB/sec    1.02      8.1±0.33ms  1993.7 MB/sec
struct_sparse_99pct_null/cdc                       1.04     16.0±0.15ms  1005.3 MB/sec    1.00     15.5±0.39ms  1041.4 MB/sec
struct_sparse_99pct_null/default                   1.00      7.2±0.18ms     2.2 GB/sec    1.03      7.4±0.08ms     2.1 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      7.0±0.12ms     2.2 GB/sec    1.03      7.3±0.22ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.05      8.9±0.27ms  1820.8 MB/sec    1.00      8.5±0.09ms  1903.6 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.8±0.10ms     2.0 GB/sec    1.03      8.1±0.22ms  1992.9 MB/sec

Resource Usage

base (merge-base)

Metric       Value
Wall time    1970.4s
Peak memory  6.6 GiB
Avg memory   6.4 GiB
CPU user     1897.3s
CPU sys      71.4s
Peak spill   0 B

branch

Metric       Value
Wall time    2145.5s
Peak memory  6.8 GiB
Avg memory   6.6 GiB
CPU user     2083.9s
CPU sys      60.2s
Peak spill   0 B

File an issue against this benchmark runner

…dgetChunker

Two small structural cleanups in response to PR review:

- Remove `ParquetValueType::byte_size`. It overlapped with
  `dict_encoding_size`, which @etseidl pointed out is being renamed
  and generalized in apache#9700. Instead, compute the per-value plain-
  encoded byte cost inline in
  `ColumnValueEncoderImpl::count_values_within_byte_budget` from
  `dict_encoding_size`'s components, dispatched on the physical type
  (same dispatch shape as `DictEncoder::push` in
  `encodings/encoding/dict_encoder.rs:52`). No new trait method.

- Lift the byte-budget mini-batch sizing decision out of
  `write_batch_internal` into a new `ByteBudgetChunker` struct
  (`column/writer/byte_budget_chunker.rs`). The chunker captures the
  column-open-time facts (page byte limit, static-fits flag,
  max_def_level) once and exposes one `pick_sub_batch_size` method.
  `write_batch_internal`'s inner loop is now ~25 lines shorter and
  reads as: compute chunk boundary → ask chunker for sub_batch_size
  → write_mini_batch or write_granular_chunk.

  This is the lightweight version of the "make it a chunker like CDC"
  suggestion. A full CDC-style pre-compute would emit all chunk
  boundaries upfront, but the byte budget decision depends on the
  encoder's live `has_dictionary()` state, which changes mid-batch
  when the writer's dictionary spills. Querying that per chunk (as
  this refactor does) preserves the existing dict-active short-
  circuit; a precomputed plan would force a choice between losing
  that short-circuit or losing correctness when dict spills mid-batch
  on large-value columns.
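
The sizing decision described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the struct and method names follow the commit message, but the signature, the `dict_active` flag (standing in for the encoder's live `has_dictionary()` state), and the `value_sizes` slice are assumptions made for this example. Null handling via `max_def_level` is left out.

```rust
/// Hypothetical stand-in for the `ByteBudgetChunker` named in the commit
/// message. Only the names come from the source; the bodies are assumptions.
#[derive(Debug, Clone, Copy)]
struct ByteBudgetChunker {
    /// Target data page size in bytes, captured when the column is opened.
    page_byte_limit: usize,
    /// True when a full chunk of this column always fits in one page
    /// (for example fixed-width types), so no per-chunk sizing is needed.
    static_fits: bool,
}

impl ByteBudgetChunker {
    fn new(page_byte_limit: usize, static_fits: bool) -> Self {
        Self { page_byte_limit, static_fits }
    }

    /// Returns how many values of the upcoming chunk to write as one
    /// mini-batch. `value_sizes` holds the assumed plain-encoded size of
    /// each value in the chunk.
    fn pick_sub_batch_size(&self, dict_active: bool, value_sizes: &[usize]) -> usize {
        let chunk_len = value_sizes.len();
        // Fast path: nothing to size, the type always fits, or dictionary
        // encoding is active and data pages only hold small indices.
        if chunk_len == 0 || self.static_fits || dict_active {
            return chunk_len;
        }
        // Average plain-encoded bytes per value in this chunk (at least 1).
        let total: usize = value_sizes.iter().sum();
        let bytes_per_value = (total / chunk_len).max(1);
        // sub_batch_size = page_byte_limit / bytes_per_value, clamped to at
        // least one value and at most the whole chunk.
        (self.page_byte_limit / bytes_per_value).clamp(1, chunk_len)
    }
}
```

With a 1 MiB page limit, a chunk of 1024 values of 16 bytes each stays on the batched path (1024 values per mini-batch), while a chunk of 5 MiB values shrinks to one value per mini-batch. Because `dict_active` is passed in per chunk, a dictionary spill in the middle of a batch changes the decision from the next chunk onward.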

No behavior change. Tests still pass and `cargo bench` shows the same
deltas as before the refactor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4453799534-97-tqn4k 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (145ea5d) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.1±0.04ms    19.1 MB/sec    1.00     13.1±0.06ms    19.1 MB/sec
bool/cdc                                           1.00     15.7±0.05ms    15.9 MB/sec    1.00     15.7±0.07ms    16.0 MB/sec
bool/default                                       1.01     11.0±0.03ms    22.8 MB/sec    1.00     10.9±0.05ms    23.0 MB/sec
bool/parquet_2                                     1.00     14.7±0.04ms    17.0 MB/sec    1.00     14.7±0.06ms    17.0 MB/sec
bool/zstd                                          1.01     11.5±0.03ms    21.8 MB/sec    1.00     11.4±0.05ms    22.0 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.05ms    16.6 MB/sec    1.00     15.1±0.06ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.02ms    17.8 MB/sec    1.00      7.0±0.04ms    17.8 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.03ms    18.3 MB/sec    1.00      6.8±0.02ms    18.4 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.01      4.3±0.03ms    29.0 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.04ms    13.8 MB/sec    1.00      9.1±0.03ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.03ms    27.1 MB/sec    1.00      4.6±0.02ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.03ms    13.2 MB/sec    1.00      9.5±0.03ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     92.9±0.44ms   150.7 MB/sec    1.01     93.8±0.42ms   149.3 MB/sec
float_with_nans/cdc                                1.00     81.7±0.31ms   171.4 MB/sec    1.00     81.7±0.16ms   171.4 MB/sec
float_with_nans/default                            1.00     74.2±0.24ms   188.8 MB/sec    1.00     74.3±0.22ms   188.4 MB/sec
float_with_nans/parquet_2                          1.00     94.4±0.36ms   148.2 MB/sec    1.00     94.7±0.21ms   147.8 MB/sec
float_with_nans/zstd                               1.00    111.9±0.18ms   125.1 MB/sec    1.00    112.1±0.27ms   124.9 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.7±0.42ms   106.3 MB/sec    1.00    132.0±0.23ms   106.1 MB/sec
large_string_non_null/bloom_filter                                                        1.00     82.1±0.48ms     3.0 GB/sec
large_string_non_null/cdc                                                                 1.00    242.9±1.24ms  1053.9 MB/sec
large_string_non_null/default                                                             1.00     62.4±0.23ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     62.5±0.27ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     63.0±0.64ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     62.7±0.16ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    325.3±0.40ms  1676.6 MB/sec    1.01    328.9±0.91ms  1658.3 MB/sec
list_primitive/cdc                                 1.00    357.8±0.71ms  1524.2 MB/sec    1.00    358.9±0.82ms  1519.5 MB/sec
list_primitive/default                             1.00    247.0±0.43ms     2.2 GB/sec    1.01    249.0±1.25ms     2.1 GB/sec
list_primitive/parquet_2                           1.00    268.1±0.38ms  2034.3 MB/sec    1.01    270.2±0.88ms  2018.7 MB/sec
list_primitive/zstd                                1.00    498.2±3.24ms  1094.7 MB/sec    1.00    496.2±2.08ms  1099.0 MB/sec
list_primitive/zstd_parquet_2                      1.00    491.3±0.40ms  1110.0 MB/sec    1.01    494.8±0.41ms  1102.1 MB/sec
list_primitive_non_null/bloom_filter               1.00    420.2±4.43ms  1295.2 MB/sec    1.19   499.9±20.74ms  1088.6 MB/sec
list_primitive_non_null/cdc                        1.00    438.7±7.00ms  1240.6 MB/sec    1.00    437.6±8.86ms  1243.8 MB/sec
list_primitive_non_null/default                    1.00    285.3±2.86ms  1907.8 MB/sec    1.29   368.4±13.93ms  1477.1 MB/sec
list_primitive_non_null/parquet_2                  1.00   309.4±12.90ms  1759.2 MB/sec    1.30    403.4±1.10ms  1349.3 MB/sec
list_primitive_non_null/zstd                       1.00    713.7±3.94ms   762.6 MB/sec    1.08   770.2±14.80ms   706.6 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    683.7±2.16ms   796.0 MB/sec    1.13    771.0±0.90ms   705.9 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.0±0.06ms     3.3 GB/sec    1.07     11.8±0.05ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.4±0.08ms  1669.1 MB/sec    1.03     23.1±0.07ms  1619.4 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.7±0.06ms     3.4 GB/sec    1.07     11.4±0.04ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.7±0.05ms     3.4 GB/sec    1.06     11.4±0.03ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.5±0.06ms     2.9 GB/sec    1.06     13.2±0.04ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.9±0.07ms     3.3 GB/sec    1.06     11.6±0.03ms     3.2 GB/sec
primitive/bloom_filter                             1.00    148.6±0.64ms   302.0 MB/sec    1.01    149.7±0.43ms   299.8 MB/sec
primitive/cdc                                      1.00    158.3±0.51ms   283.5 MB/sec    1.01    159.6±0.90ms   281.1 MB/sec
primitive/default                                  1.00    117.0±0.21ms   383.5 MB/sec    1.00    117.5±0.35ms   381.9 MB/sec
primitive/parquet_2                                1.00    132.0±0.24ms   340.0 MB/sec    1.01    132.7±0.42ms   338.2 MB/sec
primitive/zstd                                     1.00    146.5±0.24ms   306.4 MB/sec    1.00    146.7±0.31ms   305.9 MB/sec
primitive/zstd_parquet_2                           1.00    165.3±0.47ms   271.4 MB/sec    1.00    165.6±0.32ms   271.1 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.13ms     3.8 GB/sec    1.01     11.6±0.21ms     3.8 GB/sec
primitive_all_null/cdc                             1.00     30.6±0.34ms  1468.7 MB/sec    1.00     30.7±0.40ms  1463.2 MB/sec
primitive_all_null/default                         1.00     11.0±0.18ms     4.0 GB/sec    1.00     11.0±0.22ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.26ms     4.0 GB/sec    1.00     11.0±0.30ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.0±0.14ms     4.0 GB/sec    1.00     11.1±0.18ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.16ms     4.0 GB/sec    1.00     11.0±0.10ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.03    110.2±1.22ms   399.2 MB/sec    1.00    107.1±0.56ms   410.7 MB/sec
primitive_non_null/cdc                             1.00     89.5±0.49ms   491.8 MB/sec    1.01     90.2±0.51ms   487.8 MB/sec
primitive_non_null/default                         1.00     67.1±0.19ms   656.1 MB/sec    1.01     67.6±0.28ms   650.9 MB/sec
primitive_non_null/parquet_2                       1.00     88.7±0.26ms   495.9 MB/sec    1.00     88.8±0.19ms   495.6 MB/sec
primitive_non_null/zstd                            1.06    104.0±0.24ms   422.9 MB/sec    1.00     97.9±0.18ms   449.2 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    129.0±1.66ms   341.1 MB/sec    1.00    122.5±0.23ms   359.2 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.0±0.10ms     2.4 GB/sec    1.01     18.3±0.15ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.04     38.7±0.76ms  1160.2 MB/sec    1.00     37.1±0.32ms  1210.2 MB/sec
primitive_sparse_99pct_null/default                1.00     16.7±0.05ms     2.6 GB/sec    1.01     16.9±0.15ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.7±0.05ms     2.6 GB/sec    1.01     16.8±0.05ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     19.9±0.06ms     2.2 GB/sec    1.01     20.1±0.07ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     18.7±0.12ms     2.3 GB/sec    1.00     18.7±0.07ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     29.2±0.14ms   410.7 MB/sec
short_string_non_null/cdc                                                                 1.00     20.1±0.07ms   595.9 MB/sec
short_string_non_null/default                                                             1.00     16.2±0.09ms   740.3 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.9±0.10ms   463.6 MB/sec
short_string_non_null/zstd                                                                1.00     36.1±0.17ms   332.6 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.8±0.12ms   417.3 MB/sec
string/bloom_filter                                1.04   223.2±21.32ms     2.3 GB/sec    1.00   213.7±15.31ms     2.4 GB/sec
string/cdc                                         1.00    220.8±5.52ms     2.3 GB/sec    1.00    220.3±5.34ms     2.3 GB/sec
string/default                                     1.14   137.4±22.13ms     3.7 GB/sec    1.00   120.7±11.67ms     4.2 GB/sec
string/parquet_2                                   1.03    124.2±0.35ms     4.1 GB/sec    1.00    121.0±1.02ms     4.2 GB/sec
string/zstd                                        1.00    423.9±3.46ms  1236.7 MB/sec    1.00    422.3±3.77ms  1241.5 MB/sec
string/zstd_parquet_2                              1.00    394.5±0.33ms  1329.0 MB/sec    1.02    400.5±0.53ms  1309.0 MB/sec
string_and_binary_view/bloom_filter                1.00     63.6±0.24ms   507.0 MB/sec    1.08     68.9±0.40ms   467.8 MB/sec
string_and_binary_view/cdc                         1.00     58.5±0.11ms   551.0 MB/sec    1.06     62.1±0.33ms   519.6 MB/sec
string_and_binary_view/default                     1.00     47.9±0.12ms   673.7 MB/sec    1.08     51.8±0.35ms   623.2 MB/sec
string_and_binary_view/parquet_2                   1.00     58.8±0.11ms   548.0 MB/sec    1.07     63.1±0.43ms   511.4 MB/sec
string_and_binary_view/zstd                        1.00     84.4±0.14ms   382.1 MB/sec    1.05     88.9±0.43ms   362.7 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.6±0.13ms   444.2 MB/sec    1.07     77.5±0.42ms   416.2 MB/sec
string_dictionary/bloom_filter                     1.00     89.8±0.64ms     2.9 GB/sec    1.50    134.7±0.90ms  1960.4 MB/sec
string_dictionary/cdc                              1.00     83.7±0.66ms     3.1 GB/sec    1.15     96.6±2.33ms     2.7 GB/sec
string_dictionary/default                          1.00     48.5±0.29ms     5.3 GB/sec    1.86     90.2±0.88ms     2.9 GB/sec
string_dictionary/parquet_2                        1.00     54.0±0.12ms     4.8 GB/sec    1.86    100.7±0.40ms     2.6 GB/sec
string_dictionary/zstd                             1.00    208.7±0.64ms  1265.4 MB/sec    1.07   223.0±14.59ms  1184.2 MB/sec
string_dictionary/zstd_parquet_2                   1.00    199.3±0.67ms  1325.5 MB/sec    1.18    234.6±0.38ms  1126.0 MB/sec
string_non_null/bloom_filter                       1.00   251.7±14.03ms     2.0 GB/sec    1.01   253.4±11.77ms     2.0 GB/sec
string_non_null/cdc                                1.00    268.1±8.64ms  1954.7 MB/sec    1.04   278.3±10.72ms  1882.9 MB/sec
string_non_null/default                            1.00   125.5±11.69ms     4.1 GB/sec    1.07   134.1±11.39ms     3.8 GB/sec
string_non_null/parquet_2                          1.00   140.1±10.96ms     3.7 GB/sec    1.12    156.3±1.02ms     3.3 GB/sec
string_non_null/zstd                               1.00    530.7±1.49ms   987.4 MB/sec    1.10   585.0±32.92ms   895.7 MB/sec
string_non_null/zstd_parquet_2                     1.00    506.8±2.24ms  1034.0 MB/sec    1.04   524.6±10.14ms   998.9 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.2 GB/sec    1.00      2.5±0.00ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.9±0.09ms  1634.1 MB/sec    1.00      9.9±0.16ms  1629.1 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.8 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     46.5±0.36ms   344.3 MB/sec    1.00     46.5±0.15ms   343.8 MB/sec
struct_non_null/cdc                                1.00     45.1±0.17ms   354.7 MB/sec    1.02     45.9±0.17ms   348.5 MB/sec
struct_non_null/default                            1.00     31.7±0.11ms   504.2 MB/sec    1.02     32.5±0.13ms   492.9 MB/sec
struct_non_null/parquet_2                          1.00     40.8±0.09ms   392.3 MB/sec    1.01     41.1±0.10ms   389.2 MB/sec
struct_non_null/zstd                               1.00     40.6±0.11ms   394.3 MB/sec    1.01     41.1±0.17ms   389.3 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.7±0.12ms   292.5 MB/sec    1.01     55.2±0.14ms   290.0 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.4±0.03ms     2.1 GB/sec    1.00      7.4±0.03ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.00     15.4±0.10ms  1049.8 MB/sec    1.00     15.4±0.09ms  1050.4 MB/sec
struct_sparse_99pct_null/default                   1.00      6.9±0.01ms     2.3 GB/sec    1.00      6.9±0.02ms     2.3 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.9±0.02ms     2.3 GB/sec    1.00      6.9±0.02ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.2±0.02ms  1958.1 MB/sec    1.00      8.3±0.02ms  1953.3 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.04      8.0±0.02ms  2024.1 MB/sec    1.00      7.7±0.02ms     2.0 GB/sec

Resource Usage

base (merge-base)

Metric       Value
Wall time    1935.4s
Peak memory  6.6 GiB
Avg memory   6.4 GiB
CPU user     1881.6s
CPU sys      52.9s
Peak spill   0 B

branch

Metric       Value
Wall time    2145.5s
Peak memory  6.8 GiB
Avg memory   6.6 GiB
CPU user     2067.2s
CPU sys      77.4s
Peak spill   0 B

File an issue against this benchmark runner

GKE bench shows string_dictionary regressing by up to ~86% on the branch
(1.86x on `default` and `parquet_2`) even though `pick_sub_batch_size`
should short-circuit immediately while the encoder's dictionary is still
active (a single struct-field load plus a virtual call into
`has_dictionary()`). Local laptop benches don't reproduce the
regression, which suggests an architecture-specific
inlining/code-layout effect on the GKE aarch64 runner.

Marking `new` and `pick_sub_batch_size` `#[inline]` to give the compiler
a clear hint that these should fold into `write_batch_internal`'s hot
loop. Local laptop bench is unchanged (~+3% on string_dictionary, ~+5%
on string_and_binary_view, both within noise); pushing to see whether
GKE moves.
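
As a rough illustration of the early-exit path this commit is about, the sketch below marks both methods `#[inline]` and defers the byte estimate behind a closure so it only runs when neither exit applies. The `ValueEncoder` trait, `StubEncoder`, the `Chunker` struct, and the `estimate_total_bytes` parameter are hypothetical names invented for this example; only `has_dictionary()`, `new`, and `pick_sub_batch_size` come from the text above, and the real signatures may differ.

```rust
/// Hypothetical minimal encoder interface; only `has_dictionary` is named
/// in the commit message.
trait ValueEncoder {
    fn has_dictionary(&self) -> bool;
}

/// Test stub whose dictionary state is set directly.
struct StubEncoder {
    dict_active: bool,
}

impl ValueEncoder for StubEncoder {
    fn has_dictionary(&self) -> bool {
        self.dict_active
    }
}

/// Hypothetical chunker holding the column-open-time facts.
struct Chunker {
    page_byte_limit: usize,
    static_fits: bool,
}

impl Chunker {
    // `#[inline]` is only a hint; the compiler may still decline to inline.
    #[inline]
    fn new(page_byte_limit: usize, static_fits: bool) -> Self {
        Self { page_byte_limit, static_fits }
    }

    /// Cheap exits first: one field load, then one call into the encoder.
    /// `estimate_total_bytes` is invoked only when neither exit applies.
    #[inline]
    fn pick_sub_batch_size(
        &self,
        encoder: &dyn ValueEncoder,
        chunk_len: usize,
        estimate_total_bytes: impl FnOnce() -> usize,
    ) -> usize {
        if chunk_len == 0 || self.static_fits {
            return chunk_len;
        }
        if encoder.has_dictionary() {
            return chunk_len;
        }
        let bytes_per_value = (estimate_total_bytes() / chunk_len).max(1);
        (self.page_byte_limit / bytes_per_value).clamp(1, chunk_len)
    }
}
```

Passing the estimate as a closure makes the intended cost model easy to check in a test: while the dictionary is active, the closure is never called.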

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4454549120-109-5k645 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-size-mid-batch (403af94) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.01     13.0±0.03ms    19.2 MB/sec    1.00     12.9±0.03ms    19.3 MB/sec
bool/cdc                                           1.00     15.7±0.05ms    15.9 MB/sec    1.01     15.9±0.05ms    15.7 MB/sec
bool/default                                       1.00     10.9±0.03ms    22.9 MB/sec    1.00     10.8±0.04ms    23.1 MB/sec
bool/parquet_2                                     1.01     14.7±0.05ms    17.0 MB/sec    1.00     14.6±0.04ms    17.1 MB/sec
bool/zstd                                          1.01     11.4±0.04ms    21.9 MB/sec    1.00     11.4±0.03ms    22.0 MB/sec
bool/zstd_parquet_2                                1.01     15.1±0.04ms    16.5 MB/sec    1.00     15.0±0.05ms    16.6 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.00      7.0±0.03ms    17.8 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.04ms    18.3 MB/sec    1.00      6.8±0.03ms    18.4 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.00      4.3±0.02ms    29.2 MB/sec
bool_non_null/parquet_2                            1.01      9.1±0.03ms    13.8 MB/sec    1.00      9.0±0.03ms    13.9 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.00      4.6±0.02ms    27.0 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.03ms    13.2 MB/sec    1.00      9.4±0.03ms    13.3 MB/sec
float_with_nans/bloom_filter                       1.00     92.9±0.42ms   150.6 MB/sec    1.01     93.9±0.50ms   149.0 MB/sec
float_with_nans/cdc                                1.00     81.6±0.21ms   171.6 MB/sec    1.01     82.6±0.87ms   169.5 MB/sec
float_with_nans/default                            1.00     74.3±0.26ms   188.4 MB/sec    1.01     75.2±0.33ms   186.3 MB/sec
float_with_nans/parquet_2                          1.00     94.5±0.44ms   148.2 MB/sec    1.01     95.2±0.36ms   147.0 MB/sec
float_with_nans/zstd                               1.00    111.9±0.29ms   125.1 MB/sec    1.01    112.6±0.31ms   124.3 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.7±0.36ms   106.3 MB/sec    1.01    132.7±0.37ms   105.5 MB/sec
large_string_non_null/bloom_filter                                                        1.00     81.9±0.17ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    242.1±1.22ms  1057.4 MB/sec
large_string_non_null/default                                                             1.00     62.1±0.16ms     4.0 GB/sec
large_string_non_null/parquet_2                                                           1.00     62.1±0.17ms     4.0 GB/sec
large_string_non_null/zstd                                                                1.00     62.2±0.16ms     4.0 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     62.2±0.15ms     4.0 GB/sec
list_primitive/bloom_filter                        1.00    325.1±0.49ms  1677.5 MB/sec    1.10    358.8±2.20ms  1520.0 MB/sec
list_primitive/cdc                                 1.00    359.1±0.90ms  1518.8 MB/sec    1.03    368.1±7.21ms  1481.6 MB/sec
list_primitive/default                             1.00    246.9±0.42ms     2.2 GB/sec    1.12    275.8±1.79ms  1977.5 MB/sec
list_primitive/parquet_2                           1.00    268.0±0.39ms  2035.1 MB/sec    1.06    285.1±3.32ms  1913.2 MB/sec
list_primitive/zstd                                1.00    498.2±1.76ms  1094.7 MB/sec    1.03    514.5±4.57ms  1060.0 MB/sec
list_primitive/zstd_parquet_2                      1.00    491.6±0.55ms  1109.3 MB/sec    1.01    495.8±1.57ms  1100.0 MB/sec
list_primitive_non_null/bloom_filter               1.00    423.7±4.26ms  1284.5 MB/sec    1.15   487.9±12.33ms  1115.4 MB/sec
list_primitive_non_null/cdc                        1.02    442.1±9.25ms  1231.0 MB/sec    1.00    435.0±6.86ms  1251.0 MB/sec
list_primitive_non_null/default                    1.00    294.2±3.07ms  1849.9 MB/sec    1.26   371.9±16.37ms  1463.5 MB/sec
list_primitive_non_null/parquet_2                  1.00   308.7±13.40ms  1762.7 MB/sec    1.27    392.3±4.23ms  1387.3 MB/sec
list_primitive_non_null/zstd                       1.00    715.2±7.81ms   761.0 MB/sec    1.07    767.5±8.20ms   709.1 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    670.8±0.36ms   811.4 MB/sec    1.12    752.3±1.83ms   723.5 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.0±0.03ms     3.3 GB/sec    1.01     11.1±0.05ms     3.3 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.4±0.07ms  1668.9 MB/sec    1.04     23.2±0.09ms  1608.4 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.8±0.03ms     3.4 GB/sec    1.01     10.8±0.04ms     3.4 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.7±0.02ms     3.4 GB/sec    1.02     10.9±0.05ms     3.4 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.5±0.04ms     2.9 GB/sec    1.02     12.7±0.06ms     2.9 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     10.8±0.03ms     3.4 GB/sec    1.02     11.0±0.11ms     3.3 GB/sec
primitive/bloom_filter                             1.00    147.9±0.53ms   303.4 MB/sec    1.00    148.0±1.28ms   303.3 MB/sec
primitive/cdc                                      1.00    158.5±0.49ms   283.0 MB/sec    1.00    158.5±0.56ms   283.1 MB/sec
primitive/default                                  1.00    117.1±0.21ms   383.1 MB/sec    1.00    116.9±0.22ms   383.9 MB/sec
primitive/parquet_2                                1.00    132.1±0.21ms   339.7 MB/sec    1.00    132.4±0.49ms   339.1 MB/sec
primitive/zstd                                     1.00    146.6±0.21ms   306.1 MB/sec    1.01    147.9±0.77ms   303.4 MB/sec
primitive/zstd_parquet_2                           1.00    165.4±0.37ms   271.3 MB/sec    1.00    165.9±0.42ms   270.5 MB/sec
primitive_all_null/bloom_filter                    1.00     11.5±0.17ms     3.8 GB/sec    1.02     11.8±0.21ms     3.7 GB/sec
primitive_all_null/cdc                             1.05     30.6±0.37ms  1466.2 MB/sec    1.00     29.1±0.30ms  1539.5 MB/sec
primitive_all_null/default                         1.00     10.9±0.14ms     4.0 GB/sec    1.01     11.0±0.16ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     10.9±0.22ms     4.0 GB/sec    1.01     11.0±0.21ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.18ms     3.9 GB/sec    1.00     11.1±0.12ms     4.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.0±0.26ms     4.0 GB/sec    1.00     11.1±0.15ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.05    111.1±1.31ms   396.1 MB/sec    1.00    105.5±0.46ms   417.1 MB/sec
primitive_non_null/cdc                             1.01     90.9±0.59ms   484.3 MB/sec    1.00     89.8±0.26ms   489.8 MB/sec
primitive_non_null/default                         1.00     67.1±0.20ms   655.6 MB/sec    1.01     67.5±0.30ms   651.9 MB/sec
primitive_non_null/parquet_2                       1.00     89.1±0.38ms   494.1 MB/sec    1.00     88.7±0.23ms   495.9 MB/sec
primitive_non_null/zstd                            1.08    105.3±1.64ms   417.9 MB/sec    1.00     97.7±0.23ms   450.5 MB/sec
primitive_non_null/zstd_parquet_2                  1.06    130.2±1.80ms   338.0 MB/sec    1.00    122.3±0.24ms   359.7 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.01     18.3±0.12ms     2.4 GB/sec    1.00     18.2±0.11ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.04     37.2±0.33ms  1206.6 MB/sec    1.00     35.7±0.29ms  1255.6 MB/sec
primitive_sparse_99pct_null/default                1.00     16.8±0.05ms     2.6 GB/sec    1.00     16.8±0.05ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     16.8±0.07ms     2.6 GB/sec    1.00     16.8±0.04ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.0±0.06ms     2.2 GB/sec    1.00     20.1±0.05ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.00     18.7±0.06ms     2.3 GB/sec    1.01     18.8±0.17ms     2.3 GB/sec
short_string_non_null/bloom_filter                                                        1.00     29.1±0.18ms   412.1 MB/sec
short_string_non_null/cdc                                                                 1.00     20.1±0.06ms   597.6 MB/sec
short_string_non_null/default                                                             1.00     16.3±0.14ms   735.5 MB/sec
short_string_non_null/parquet_2                                                           1.00     26.0±0.18ms   462.1 MB/sec
short_string_non_null/zstd                                                                1.00     36.1±0.18ms   332.1 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.9±0.21ms   415.2 MB/sec
string/bloom_filter                                1.06   227.5±24.81ms     2.3 GB/sec    1.00   215.0±22.79ms     2.4 GB/sec
string/cdc                                         1.00    221.1±5.72ms     2.3 GB/sec    1.00    220.2±9.61ms     2.3 GB/sec
string/default                                     1.17   140.6±24.01ms     3.6 GB/sec    1.00   119.6±11.96ms     4.3 GB/sec
string/parquet_2                                   1.00    125.2±0.37ms     4.1 GB/sec    1.03    129.4±1.07ms     4.0 GB/sec
string/zstd                                        1.00    425.9±2.66ms  1231.0 MB/sec    1.05   446.6±18.59ms  1173.9 MB/sec
string/zstd_parquet_2                              1.00    394.7±1.31ms  1328.1 MB/sec    1.01    400.0±0.55ms  1310.7 MB/sec
string_and_binary_view/bloom_filter                1.00     64.0±0.36ms   503.7 MB/sec    1.05     67.3±0.37ms   479.3 MB/sec
string_and_binary_view/cdc                         1.00     58.5±0.15ms   550.9 MB/sec    1.03     60.6±0.16ms   532.6 MB/sec
string_and_binary_view/default                     1.00     47.8±0.11ms   674.8 MB/sec    1.06     50.9±0.28ms   634.1 MB/sec
string_and_binary_view/parquet_2                   1.00     58.6±0.14ms   550.3 MB/sec    1.05     61.8±0.25ms   522.0 MB/sec
string_and_binary_view/zstd                        1.00     84.5±0.16ms   381.9 MB/sec    1.04     88.0±0.48ms   366.6 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.6±0.13ms   444.2 MB/sec    1.05     76.0±0.45ms   424.6 MB/sec
string_dictionary/bloom_filter                     1.00     89.0±0.65ms     2.9 GB/sec    1.03     91.4±0.43ms     2.8 GB/sec
string_dictionary/cdc                              1.60     84.9±0.64ms     3.0 GB/sec    1.00     53.0±0.47ms     4.9 GB/sec
string_dictionary/default                          1.00     48.4±0.30ms     5.3 GB/sec    1.03     50.0±0.78ms     5.2 GB/sec
string_dictionary/parquet_2                        1.00     53.8±0.21ms     4.8 GB/sec    1.04     55.9±0.13ms     4.6 GB/sec
string_dictionary/zstd                             1.00    209.4±0.94ms  1261.2 MB/sec    1.01    210.5±0.77ms  1254.6 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.1±0.19ms  1333.6 MB/sec    1.01    200.2±0.22ms  1319.2 MB/sec
string_non_null/bloom_filter                       1.00   250.8±14.97ms     2.0 GB/sec    1.03   259.5±13.75ms  2019.3 MB/sec
string_non_null/cdc                                1.00    269.3±9.38ms  1945.8 MB/sec    1.03   278.0±10.27ms  1885.2 MB/sec
string_non_null/default                            1.00   126.1±12.19ms     4.1 GB/sec    1.17   147.8±13.68ms     3.5 GB/sec
string_non_null/parquet_2                          1.00   141.2±11.51ms     3.6 GB/sec    1.12    158.6±0.34ms     3.2 GB/sec
string_non_null/zstd                               1.00    531.3±2.90ms   986.3 MB/sec    1.07   568.1±11.39ms   922.4 MB/sec
string_non_null/zstd_parquet_2                     1.00    507.1±2.20ms  1033.2 MB/sec    1.03    522.6±7.32ms  1002.8 MB/sec
struct_all_null/bloom_filter                       1.00      2.5±0.00ms     6.2 GB/sec    1.00      2.5±0.00ms     6.2 GB/sec
struct_all_null/cdc                                1.00      9.8±0.11ms  1645.7 MB/sec    1.01      9.9±0.05ms  1624.5 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     46.8±0.15ms   341.7 MB/sec    1.01     47.2±0.83ms   339.1 MB/sec
struct_non_null/cdc                                1.00     45.7±0.18ms   350.3 MB/sec    1.01     46.0±0.13ms   347.5 MB/sec
struct_non_null/default                            1.00     32.3±0.11ms   495.8 MB/sec    1.02     32.8±0.35ms   488.2 MB/sec
struct_non_null/parquet_2                          1.00     40.9±0.13ms   391.5 MB/sec    1.00     41.0±0.18ms   390.4 MB/sec
struct_non_null/zstd                               1.00     41.0±0.10ms   390.1 MB/sec    1.00     41.0±0.38ms   390.2 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.0±0.12ms   291.1 MB/sec    1.01     55.4±0.13ms   288.9 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      7.5±0.03ms     2.1 GB/sec    1.07      8.0±0.04ms  2014.4 MB/sec
struct_sparse_99pct_null/cdc                       1.04     15.4±0.09ms  1048.1 MB/sec    1.00     14.8±0.10ms  1087.6 MB/sec
struct_sparse_99pct_null/default                   1.00      6.9±0.02ms     2.3 GB/sec    1.05      7.3±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.9±0.02ms     2.3 GB/sec    1.05      7.3±0.03ms     2.2 GB/sec
struct_sparse_99pct_null/zstd                      1.00      8.3±0.02ms  1946.8 MB/sec    1.03      8.6±0.05ms  1881.4 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      7.7±0.02ms     2.0 GB/sec    1.02      7.9±0.07ms  2044.8 MB/sec

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 1935.4s |
| Peak memory | 6.6 GiB |
| Avg memory | 6.4 GiB |
| CPU user | 1879.4s |
| CPU sys | 54.5s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 2130.5s |
| Peak memory | 6.8 GiB |
| Avg memory | 6.6 GiB |
| CPU user | 2064.4s |
| CPU sys | 63.3s |
| Peak spill | 0 B |


…es are non-null

The chunker's per-chunk `partition_point` (arrow path) or
`LevelDataRef::value_count` (non-arrow path) returns `chunk_size` by
construction whenever the column has no nulls, so the lookup is wasted
work in that case. The GKE bench showed regressions of roughly 12–27%
on `list_primitive_non_null/*` and `string_non_null/*`, consistent with
that per-chunk lookup dominating: ~50 K chunks, each doing a binary
search through a 50 M-entry `non_null_indices` buffer, means cold cache
reads on every chunk.

Compute a `ValueCountStrategy` once at `write_batch_internal` entry:

- `AllPresent` — set when the arrow caller passed
  `non_null_indices.len() == num_levels`, or when the column has
  `max_def_level == 0`. The chunker uses `chunk_size` directly with no
  per-chunk work.
- `Sorted(&[usize])` — arrow nullable path; binary-search the indices.
- `DefLevelScan(max_def)` — non-arrow nullable path; def-level scan.

For the bench's `list_primitive_non_null` (all-non-null lists with a
50 M-entry leaf), this drops the per-chunk binary search entirely;
expected to bring those rows back near noise.
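
The three strategies above can be sketched roughly as follows. This is a
minimal illustration, not the code in this PR: the variant names come from
the commit message, but `choose`, `value_count`, their signatures, and the
simplification that `non_null_indices` holds level positions are all
assumptions made for the example.

```rust
/// Hypothetical sketch: variant names follow the commit message,
/// everything else (methods, signatures) is assumed for illustration.
#[derive(Debug, Clone, Copy)]
enum ValueCountStrategy<'a> {
    /// Every level carries a value, so the count is just the chunk size.
    AllPresent,
    /// Sorted positions of the non-null values (arrow nullable path).
    Sorted(&'a [usize]),
    /// Max definition level; a level holds a value iff def == max_def
    /// (non-arrow nullable path).
    DefLevelScan(i16),
}

impl<'a> ValueCountStrategy<'a> {
    /// Chosen once per write call rather than once per chunk.
    fn choose(
        non_null_indices: Option<&'a [usize]>,
        num_levels: usize,
        max_def_level: i16,
    ) -> Self {
        match non_null_indices {
            // A column with max_def_level == 0 cannot contain nulls.
            _ if max_def_level == 0 => Self::AllPresent,
            // As many non-null indices as levels: nothing is null.
            Some(idx) if idx.len() == num_levels => Self::AllPresent,
            Some(idx) => Self::Sorted(idx),
            None => Self::DefLevelScan(max_def_level),
        }
    }

    /// Number of values in the level range [start, start + chunk_size).
    /// `def_levels` is only read by the `DefLevelScan` variant.
    fn value_count(&self, start: usize, chunk_size: usize, def_levels: &[i16]) -> usize {
        match *self {
            // No per-chunk work at all on the all-present path.
            Self::AllPresent => chunk_size,
            // Two binary searches bound the indices that fall in the chunk.
            Self::Sorted(idx) => {
                let lo = idx.partition_point(|&i| i < start);
                let hi = idx.partition_point(|&i| i < start + chunk_size);
                hi - lo
            }
            // Linear scan over this chunk's definition levels.
            Self::DefLevelScan(max_def) => def_levels[start..start + chunk_size]
                .iter()
                .filter(|&&d| d == max_def)
                .count(),
        }
    }
}
```

Under these assumptions, an all-non-null column resolves to `AllPresent`
once and each chunk's count is returned without touching the index buffer.
That is the cost the commit message expects to remove.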

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4455066800-110-8wzwk 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing parquet-page-size-mid-batch (bb19d3e) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                              main                                   parquet-page-size-mid-batch
-----                                              ----                                   ---------------------------
bool/bloom_filter                                  1.00     13.1±0.14ms    19.0 MB/sec    1.00     13.1±0.13ms    19.1 MB/sec
bool/cdc                                           1.00     15.6±0.04ms    16.0 MB/sec    1.01     15.7±0.14ms    15.9 MB/sec
bool/default                                       1.00     10.9±0.08ms    22.9 MB/sec    1.00     11.0±0.14ms    22.8 MB/sec
bool/parquet_2                                     1.00     14.8±0.11ms    16.9 MB/sec    1.00     14.9±0.19ms    16.8 MB/sec
bool/zstd                                          1.00     11.4±0.03ms    21.9 MB/sec    1.00     11.4±0.12ms    21.8 MB/sec
bool/zstd_parquet_2                                1.00     15.1±0.04ms    16.6 MB/sec    1.02     15.3±0.19ms    16.3 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.04ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.03ms    18.5 MB/sec    1.03      7.0±0.04ms    18.0 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.3 MB/sec    1.01      4.3±0.03ms    29.1 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.03ms    13.8 MB/sec    1.01      9.1±0.06ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.02ms    27.1 MB/sec    1.01      4.7±0.03ms    26.8 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.03ms    13.2 MB/sec    1.00      9.5±0.05ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.00     92.9±0.38ms   150.8 MB/sec    1.00     92.9±0.36ms   150.6 MB/sec
float_with_nans/cdc                                1.00     81.2±0.18ms   172.4 MB/sec    1.00     81.4±0.21ms   172.0 MB/sec
float_with_nans/default                            1.00     74.0±0.23ms   189.1 MB/sec    1.00     74.2±0.18ms   188.7 MB/sec
float_with_nans/parquet_2                          1.00     94.2±0.36ms   148.6 MB/sec    1.00     94.5±0.34ms   148.2 MB/sec
float_with_nans/zstd                               1.00    111.6±0.20ms   125.4 MB/sec    1.00    111.9±0.26ms   125.1 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.3±0.36ms   106.7 MB/sec    1.00    131.4±0.36ms   106.5 MB/sec
large_string_non_null/bloom_filter                                                        1.00     81.5±0.68ms     3.1 GB/sec
large_string_non_null/cdc                                                                 1.00    242.0±0.97ms  1057.8 MB/sec
large_string_non_null/default                                                             1.00     61.6±0.57ms     4.1 GB/sec
large_string_non_null/parquet_2                                                           1.00     61.6±0.48ms     4.1 GB/sec
large_string_non_null/zstd                                                                1.00     61.2±0.25ms     4.1 GB/sec
large_string_non_null/zstd_parquet_2                                                      1.00     61.0±0.15ms     4.1 GB/sec
list_primitive/bloom_filter                        1.01    326.1±1.20ms  1672.3 MB/sec    1.00    323.9±0.51ms  1683.7 MB/sec
list_primitive/cdc                                 1.00    357.7±0.73ms  1524.8 MB/sec    1.00    357.1±0.71ms  1527.2 MB/sec
list_primitive/default                             1.00    246.2±0.38ms     2.2 GB/sec    1.01    247.5±0.46ms     2.2 GB/sec
list_primitive/parquet_2                           1.00    268.0±0.87ms  2034.7 MB/sec    1.00    269.0±0.55ms  2027.6 MB/sec
list_primitive/zstd                                1.00    496.2±0.43ms  1099.0 MB/sec    1.00    497.7±1.26ms  1095.8 MB/sec
list_primitive/zstd_parquet_2                      1.00    490.9±0.37ms  1111.0 MB/sec    1.01    493.4±0.82ms  1105.4 MB/sec
list_primitive_non_null/bloom_filter               1.00    426.4±3.79ms  1276.3 MB/sec    1.10    466.9±5.03ms  1165.6 MB/sec
list_primitive_non_null/cdc                        1.01    442.5±7.99ms  1230.0 MB/sec    1.00    438.6±6.55ms  1240.8 MB/sec
list_primitive_non_null/default                    1.00    288.0±2.75ms  1889.6 MB/sec    1.15    330.3±2.68ms  1647.5 MB/sec
list_primitive_non_null/parquet_2                  1.00   310.0±13.38ms  1755.7 MB/sec    1.13    349.2±1.57ms  1558.4 MB/sec
list_primitive_non_null/zstd                       1.00    714.7±5.27ms   761.5 MB/sec    1.05    749.0±0.81ms   726.7 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    686.1±1.61ms   793.3 MB/sec    1.08    744.2±1.88ms   731.4 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.4±0.25ms     3.2 GB/sec    1.00     11.3±0.03ms     3.2 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.7±0.13ms  1642.7 MB/sec    1.00     22.7±0.05ms  1642.9 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.9±0.06ms     3.3 GB/sec    1.01     11.0±0.01ms     3.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     10.9±0.07ms     3.3 GB/sec    1.01     11.1±0.02ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.7±0.07ms     2.9 GB/sec    1.01     12.9±0.03ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.1±0.11ms     3.3 GB/sec    1.01     11.2±0.01ms     3.3 GB/sec
primitive/bloom_filter                             1.00    149.6±0.77ms   300.0 MB/sec    1.03    153.9±1.59ms   291.6 MB/sec
primitive/cdc                                      1.00    159.5±1.28ms   281.4 MB/sec    1.01    161.4±1.40ms   278.0 MB/sec
primitive/default                                  1.00    117.6±0.20ms   381.6 MB/sec    1.02    120.3±1.09ms   373.0 MB/sec
primitive/parquet_2                                1.00    132.1±0.15ms   339.7 MB/sec    1.02    134.2±0.94ms   334.5 MB/sec
primitive/zstd                                     1.00    146.6±0.15ms   306.1 MB/sec    1.02    149.1±0.87ms   301.0 MB/sec
primitive/zstd_parquet_2                           1.00    165.3±0.20ms   271.4 MB/sec    1.02    167.9±1.20ms   267.3 MB/sec
primitive_all_null/bloom_filter                    1.01     11.6±0.21ms     3.8 GB/sec    1.00     11.5±0.13ms     3.8 GB/sec
primitive_all_null/cdc                             1.00     30.6±0.39ms  1467.5 MB/sec    1.01     30.9±0.45ms  1451.7 MB/sec
primitive_all_null/default                         1.01     11.0±0.18ms     4.0 GB/sec    1.00     10.9±0.11ms     4.0 GB/sec
primitive_all_null/parquet_2                       1.00     11.0±0.26ms     4.0 GB/sec    1.00     10.9±0.22ms     4.0 GB/sec
primitive_all_null/zstd                            1.00     11.1±0.17ms     4.0 GB/sec    1.00     11.1±0.23ms     3.9 GB/sec
primitive_all_null/zstd_parquet_2                  1.00     11.1±0.25ms     4.0 GB/sec    1.00     11.0±0.19ms     4.0 GB/sec
primitive_non_null/bloom_filter                    1.06    114.8±1.69ms   383.4 MB/sec    1.00    108.5±0.65ms   405.4 MB/sec
primitive_non_null/cdc                             1.00     90.0±0.58ms   488.7 MB/sec    1.01     91.2±0.60ms   482.6 MB/sec
primitive_non_null/default                         1.00     67.9±0.32ms   648.5 MB/sec    1.01     68.7±0.43ms   640.8 MB/sec
primitive_non_null/parquet_2                       1.00     90.0±0.45ms   489.1 MB/sec    1.00     90.3±0.38ms   487.3 MB/sec
primitive_non_null/zstd                            1.07    105.9±0.43ms   415.4 MB/sec    1.00     99.4±0.58ms   442.7 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    130.5±1.84ms   337.0 MB/sec    1.00    124.1±0.40ms   354.4 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.03     18.9±0.24ms     2.3 GB/sec    1.00     18.3±0.32ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.02     37.6±0.32ms  1193.4 MB/sec    1.00     36.8±0.24ms  1220.4 MB/sec
primitive_sparse_99pct_null/default                1.02     17.1±0.05ms     2.6 GB/sec    1.00     16.7±0.03ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.02     17.2±0.04ms     2.6 GB/sec    1.00     16.8±0.08ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.02     20.5±0.06ms     2.1 GB/sec    1.00     20.0±0.04ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.02     19.0±0.15ms     2.3 GB/sec    1.00     18.6±0.04ms     2.4 GB/sec
short_string_non_null/bloom_filter                                                        1.00     28.9±0.09ms   414.9 MB/sec
short_string_non_null/cdc                                                                 1.00     20.1±0.13ms   597.7 MB/sec
short_string_non_null/default                                                             1.00     16.0±0.12ms   751.7 MB/sec
short_string_non_null/parquet_2                                                           1.00     25.6±0.09ms   468.2 MB/sec
short_string_non_null/zstd                                                                1.00     35.6±0.11ms   337.5 MB/sec
short_string_non_null/zstd_parquet_2                                                      1.00     28.3±0.08ms   424.4 MB/sec
string/bloom_filter                                1.01   229.5±25.29ms     2.2 GB/sec    1.00   226.2±16.16ms     2.3 GB/sec
string/cdc                                         1.00    220.5±5.84ms     2.3 GB/sec    1.01    222.3±5.79ms     2.3 GB/sec
string/default                                     1.13   142.6±25.26ms     3.6 GB/sec    1.00   126.4±12.51ms     4.1 GB/sec
string/parquet_2                                   1.00    126.2±0.23ms     4.1 GB/sec    1.00    126.4±1.63ms     4.1 GB/sec
string/zstd                                        1.00    424.3±2.75ms  1235.7 MB/sec    1.00    425.3±2.65ms  1232.7 MB/sec
string/zstd_parquet_2                              1.00    393.9±0.25ms  1331.0 MB/sec    1.02    401.7±0.94ms  1305.0 MB/sec
string_and_binary_view/bloom_filter                1.00     64.4±0.21ms   500.9 MB/sec    1.04     66.7±0.29ms   483.4 MB/sec
string_and_binary_view/cdc                         1.00     58.6±0.17ms   550.1 MB/sec    1.03     60.3±0.11ms   534.9 MB/sec
string_and_binary_view/default                     1.00     48.2±0.09ms   668.5 MB/sec    1.06     51.1±0.35ms   630.7 MB/sec
string_and_binary_view/parquet_2                   1.00     59.0±0.11ms   546.7 MB/sec    1.04     61.1±0.16ms   527.4 MB/sec
string_and_binary_view/zstd                        1.00     84.8±0.13ms   380.2 MB/sec    1.03     87.3±0.20ms   369.5 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.8±0.14ms   442.7 MB/sec    1.03     75.2±0.21ms   428.7 MB/sec
string_dictionary/bloom_filter                     1.00     90.1±0.63ms     2.9 GB/sec    1.46    131.6±0.72ms  2007.7 MB/sec
string_dictionary/cdc                              1.00     85.0±0.64ms     3.0 GB/sec    1.13     95.8±2.13ms     2.7 GB/sec
string_dictionary/default                          1.00     48.5±0.29ms     5.3 GB/sec    1.81     88.0±0.72ms     2.9 GB/sec
string_dictionary/parquet_2                        1.00     53.8±0.11ms     4.8 GB/sec    1.84     99.1±0.25ms     2.6 GB/sec
string_dictionary/zstd                             1.00    208.8±0.58ms  1265.3 MB/sec    1.05   219.5±13.99ms  1203.1 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.2±0.14ms  1332.7 MB/sec    1.18    233.2±0.29ms  1132.8 MB/sec
string_non_null/bloom_filter                       1.02   254.9±15.30ms     2.0 GB/sec    1.00   249.0±11.44ms     2.1 GB/sec
string_non_null/cdc                                1.04    269.0±9.43ms  1947.7 MB/sec    1.00    259.2±3.52ms  2021.6 MB/sec
string_non_null/default                            1.00   127.3±12.85ms     4.0 GB/sec    1.08   137.4±13.12ms     3.7 GB/sec
string_non_null/parquet_2                          1.13   141.7±11.77ms     3.6 GB/sec    1.00    125.7±7.22ms     4.1 GB/sec
string_non_null/zstd                               1.00    529.7±1.54ms   989.2 MB/sec    1.00    531.6±1.72ms   985.8 MB/sec
string_non_null/zstd_parquet_2                     1.01    507.4±2.14ms  1032.6 MB/sec    1.00    503.0±1.42ms  1041.7 MB/sec
struct_all_null/bloom_filter                       1.01      2.5±0.01ms     6.2 GB/sec    1.00      2.5±0.00ms     6.2 GB/sec
struct_all_null/cdc                                1.01      9.9±0.15ms  1633.0 MB/sec    1.00      9.8±0.15ms  1642.9 MB/sec
struct_all_null/default                            1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/parquet_2                          1.00      2.3±0.00ms     7.0 GB/sec    1.00      2.3±0.00ms     7.0 GB/sec
struct_all_null/zstd                               1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.8 GB/sec
struct_all_null/zstd_parquet_2                     1.00      2.3±0.00ms     6.9 GB/sec    1.00      2.3±0.00ms     6.9 GB/sec
struct_non_null/bloom_filter                       1.00     46.9±0.20ms   341.2 MB/sec    1.01     47.4±0.19ms   337.8 MB/sec
struct_non_null/cdc                                1.00     45.5±0.19ms   351.9 MB/sec    1.00     45.7±0.17ms   350.3 MB/sec
struct_non_null/default                            1.00     32.1±0.17ms   497.7 MB/sec    1.00     32.3±0.14ms   495.7 MB/sec
struct_non_null/parquet_2                          1.00     40.9±0.48ms   391.3 MB/sec    1.01     41.1±0.14ms   389.2 MB/sec
struct_non_null/zstd                               1.00     40.9±0.11ms   391.3 MB/sec    1.00     41.0±0.11ms   390.4 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.9±0.14ms   291.6 MB/sec    1.00     55.0±0.13ms   291.0 MB/sec
struct_sparse_99pct_null/bloom_filter              1.03      7.6±0.03ms     2.1 GB/sec    1.00      7.4±0.01ms     2.1 GB/sec
struct_sparse_99pct_null/cdc                       1.02     15.7±0.14ms  1027.9 MB/sec    1.00     15.4±0.08ms  1050.1 MB/sec
struct_sparse_99pct_null/default                   1.00      7.0±0.02ms     2.3 GB/sec    1.00      6.9±0.01ms     2.3 GB/sec
struct_sparse_99pct_null/parquet_2                 1.02      7.1±0.06ms     2.2 GB/sec    1.00      6.9±0.01ms     2.3 GB/sec
struct_sparse_99pct_null/zstd                      1.02      8.5±0.05ms  1907.7 MB/sec    1.00      8.3±0.01ms  1952.5 MB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.02      7.8±0.05ms     2.0 GB/sec    1.00      7.7±0.01ms     2.1 GB/sec
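
For reading the table above: the ratio columns appear to be each side's mean time divided by the faster side's mean time, so the faster side shows 1.00. The sketch below reproduces that arithmetic for the `string_dictionary` rows. It assumes the left column pair is the merge-base and the right pair is the branch; the table header is not visible here, and the helper names `relative_ratios` and `percent_change` are made up for illustration.

```rust
/// Returns (base_ratio, branch_ratio): each mean time divided by the faster
/// of the two, so the faster side is always 1.0.
fn relative_ratios(base_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = base_ms.min(branch_ms);
    (base_ms / fastest, branch_ms / fastest)
}

/// Percentage change of the branch relative to the base.
/// Positive means the branch is slower.
fn percent_change(base_ms: f64, branch_ms: f64) -> f64 {
    (branch_ms / base_ms - 1.0) * 100.0
}
```

With the `string_dictionary/default` timings (48.5 ms against 88.0 ms), `relative_ratios` gives roughly (1.00, 1.81) and `percent_change` gives roughly +81%.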

Resource Usage

base (merge-base)

Metric Value
Wall time 1945.4s
Peak memory 6.6 GiB
Avg memory 6.4 GiB
CPU user 1885.9s
CPU sys 57.1s
Peak spill 0 B

branch

Metric Value
Wall time 2115.5s
Peak memory 6.9 GiB
Avg memory 6.7 GiB
CPU user 2061.4s
CPU sys 51.7s
Peak spill 0 B

File an issue against this benchmark runner

…th out

The previous `#[inline]` hint was no longer enough once
`pick_sub_batch_size` grew the `ValueCountStrategy` match: LLVM
silently stopped inlining and the most recent GKE bench bounced
`string_dictionary/*` back to +46–84% (`default` +81%, `parquet_2`
+84%, `bloom_filter` +46%).

Fix:

1. Mark `pick_sub_batch_size` `#[inline(always)]`. The hot path is
   just `if static_always_fits || has_dictionary || chunk_size == 0 {
   return chunk_size; }` — one struct-field load + one virtual call —
   so unconditional inlining is the right call, not a heuristic
   suggestion.

2. Pull the byte-budget computation out into a separate
   `byte_budget_sub_batch_size` method marked `#[inline(never)]`. This
   keeps the inlined fast path small even as the slow path grows; the
   slow path is paid for explicitly when bypasses don't fire, not
   smuggled into every chunk's inline body.

Same behavior, just compiler-friendlier code layout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
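
The fast-path/slow-path split described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `SubBatchPicker` struct, the `chunk_bytes` parameter, and the `page_byte_limit` field are assumptions, and `has_dictionary` is modelled as a plain `bool` field where the commit message describes a virtual call.

```rust
/// Hypothetical stand-in for the writer state consulted per chunk.
struct SubBatchPicker {
    /// True when the column's values can never exceed the page budget.
    static_always_fits: bool,
    /// True while dictionary encoding is active (a method call in the real writer).
    has_dictionary: bool,
    /// Target upper bound for a data page, in bytes.
    page_byte_limit: usize,
}

impl SubBatchPicker {
    /// Fast path: a few cheap checks, forced inline into the per-chunk loop.
    #[inline(always)]
    fn pick_sub_batch_size(&self, chunk_size: usize, chunk_bytes: usize) -> usize {
        if self.static_always_fits || self.has_dictionary || chunk_size == 0 {
            return chunk_size;
        }
        self.byte_budget_sub_batch_size(chunk_size, chunk_bytes)
    }

    /// Slow path: kept out of line so the inlined caller stays small.
    #[inline(never)]
    fn byte_budget_sub_batch_size(&self, chunk_size: usize, chunk_bytes: usize) -> usize {
        // Average bytes per value, rounded up; both divisors are kept >= 1.
        let bytes_per_value = chunk_bytes.div_ceil(chunk_size.max(1)).max(1);
        // At least one value per sub-batch, never more than the chunk itself.
        (self.page_byte_limit / bytes_per_value).clamp(1, chunk_size.max(1))
    }
}
```

Under these assumptions, a picker with a 1 MiB budget returns the full chunk size for 8-byte values, 16 for 64 KiB values, and 1 for 5 MiB values, while any chunk with dictionary encoding active bypasses the byte-budget computation entirely.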
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4455497827-111-r8kgn 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-size-mid-batch (9d647dc) to 48fa8a7 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete




Labels

parquet Changes to the parquet crate


3 participants