
Pluggable page spilling for the Parquet ArrowWriter (PageStore)#10020

Open
adriangb wants to merge 5 commits into
apache:mainfrom
pydantic:parquet-page-spill

@adriangb
Contributor

@adriangb adriangb commented May 26, 2026

Problem description

We currently buffer entire row groups in memory. From our own docs:

The nature of Parquet requires buffering of an entire row group before it can be flushed to the underlying writer.

For our production workload, which has ~400 columns with heavy size skew (some much larger than others), this means at least 12 GB of memory consumed just to write Parquet.

When ArrowWriter writes a row group, record batches arrive with all columns
interleaved, but each Parquet column chunk must be contiguous on disk. So every
column's compressed pages are buffered for the whole row group and only
spliced into the output at flush. Peak ArrowWriter memory is therefore
≈ Σ(compressed bytes of every column chunk) for one row group, and it grows with
the row group size.

Today the only lever against this is [ArrowWriter::in_progress_size](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#method.in_progress_size)() + flushing
smaller row groups, which trades away compression and read-time (page/row-group)
pruning, with negative consequences for encoding efficiency, read performance, etc. Parquet already has pages; we don't need one column to force the layout of another. Ideally, what we'd want in a case like this is a large (let's say 1M-row) row group with ~4 1 MB pages for the id: i32 column and N ~1 MB pages for the large_text column. Reading the small id column then has no data fragmentation penalty, no page index bloat penalty, etc.
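
To make the arithmetic above concrete, here is a small sketch. The function names and the sizes passed to them are illustrative assumptions, not part of the parquet crate, and the page count ignores encoding and compression.

```rust
/// Peak bytes buffered when every column chunk stays resident until flush:
/// the sum of the compressed column-chunk sizes of one row group.
fn peak_buffered_bytes(compressed_chunk_bytes: &[u64]) -> u64 {
    compressed_chunk_bytes.iter().sum()
}

/// Number of pages a fixed-width column needs for `rows` values,
/// ignoring encoding and compression (rounds up to whole pages).
fn pages_for_fixed_width(rows: u64, bytes_per_value: u64, page_bytes: u64) -> u64 {
    let total = rows * bytes_per_value;
    (total + page_bytes - 1) / page_bytes
}
```

For a 1M-row `id: i32` column and 1 MB pages, `pages_for_fixed_width(1_000_000, 4, 1_000_000)` gives the ~4 pages mentioned above, while `peak_buffered_bytes` shows why one large column dominates the total.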

Related issues

Some of the issues I could find

Proposed solution

Introduce a trait for pluggable buffering. In particular, we would like to implement spilling (writing buffered, completed pages to disk). If this works well, a spilling implementation can be upstreamed and made easily configurable and usable for all Arrow users. I am not adding one here, to avoid having to settle those APIs now (is it a temp dir, how does it get configured, etc.).

What changes are included in this PR?

A small, intentionally "dumb" key/value store trait and its wiring, in four
stacked commits:

  1. PageStore + PageKey + PageStoreFactory + InMemoryPageStore, wired
    into ArrowWriter.
    The store maps an opaque, store-allocated PageKey to a
    blob of bytes and knows nothing about pages, dictionaries, ordering, or
    offsets — the caller keeps the handles and decides what they mean. The
    default InMemoryPageStore (a Vec<Bytes>) is byte-for-byte equivalent to
    the previous buffering with zero overhead. A PageStoreFactory is threaded
    through ArrowWriterOptions::with_page_store_factory ->
    ArrowRowGroupWriterFactory -> ArrowColumnWriterFactory.
  2. Stream column chunks out of the store at splice. Replaces the
    materialize-then-copy splice with a Read that takes each page blob back out
    of the store in write order as it is consumed and releases it immediately,
    so the splice never holds more than one page in memory at a time (essential
    for a spilling backend on skewed schemas). append_column is unchanged for
    external ChunkReader callers.
  3. Public PageKey::new/get (so external backends can mint their own
    handles) + a dhat memory regression test with a temp-file backend.
  4. Spill dictionary-column data pages too. Dictionary-encoded columns
    buffered every completed data page in GenericColumnWriter.data_pages until
    close() (the dictionary page must be written first but isn't final until all
    values are seen), so those pages never reached the store. A new
    PageWriter::defers_dictionary_ordering() lets a writer that buffers the
    whole chunk and splices later (the Arrow path) accept data pages before the
    dictionary page and order them itself; the column writer then streams
    dictionary-column data pages straight through. ArrowPageWriter holds the
    (bounded, ≤ dict_page_size_limit) dictionary page in memory — it now arrives
    last — and emits it first at splice, where the production-order page offsets
    are rewritten to the dictionary-first layout. The column-at-a-time
    SerializedFileWriter path is unchanged (it commits bytes live and still
    buffers, which is inherent there). This commit also fixes memory_size() to
    report bytes the writer actually holds resident (via
    PageStore::memory_size / PageWriter::buffered_memory_size) rather than
    bytes written, so it drops to ~0 once pages are spilled off-heap.
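
As a rough illustration of the "dumb" key/value contract described in item 1, the sketch below uses only the standard library. The method signatures, the use of `Vec<u8>` instead of `Bytes`, the `io::Result` error type, and the default of `memory_size` are all assumptions made for this example; only the names `PageStore`, `PageKey`, `PageKey::new`/`get`, `put`, `take`, `memory_size`, and `InMemoryPageStore` come from the description.

```rust
use std::io;

/// Opaque handle allocated by the store; the caller never interprets it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PageKey(u64);

impl PageKey {
    pub fn new(raw: u64) -> Self {
        PageKey(raw)
    }
    pub fn get(self) -> u64 {
        self.0
    }
}

/// Assumed shape of the trait: a blob goes in, a handle comes out.
pub trait PageStore {
    fn put(&mut self, blob: Vec<u8>) -> io::Result<PageKey>;
    /// Removes the blob and returns it, releasing what the store held for it.
    fn take(&mut self, key: PageKey) -> io::Result<Vec<u8>>;
    /// Bytes currently resident on the heap (the default value here is an assumption).
    fn memory_size(&self) -> usize {
        0
    }
}

/// In-memory backend: blobs live in a Vec, handles are indices into it.
#[derive(Default)]
pub struct InMemoryPageStore {
    blobs: Vec<Option<Vec<u8>>>,
}

impl PageStore for InMemoryPageStore {
    fn put(&mut self, blob: Vec<u8>) -> io::Result<PageKey> {
        self.blobs.push(Some(blob));
        Ok(PageKey::new(self.blobs.len() as u64 - 1))
    }

    fn take(&mut self, key: PageKey) -> io::Result<Vec<u8>> {
        self.blobs
            .get_mut(key.get() as usize)
            .and_then(Option::take)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown or already taken page key"))
    }

    fn memory_size(&self) -> usize {
        // Only blobs that have not been taken yet count as resident.
        self.blobs.iter().flatten().map(Vec::len).sum()
    }
}
```

The store never learns whether a blob is a data page or a dictionary page; the caller keeps the returned keys and decides the order in which to take them back.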

Are these changes tested?

Yes:

  • A byte-identical round-trip test using a custom PageStore with sparse,
    non-contiguous, HashMap-backed handles, proving the writer relies only on
    the opaque-handle contract across dictionary and non-dictionary columns and
    multiple row groups.

  • A dictionary round-trip test with the offset index disabled, covering the
    path where only the chunk-level dictionary/data page offsets are rewritten.

  • Unit tests for the in-memory backend contract and its resident-byte reporting.

  • An always-on dhat integration test (parquet/tests/page_spill_memory.rs)
    measuring peak heap, for both a skewed wide row group (~16 MiB) and a
    low-cardinality dictionary column (~4.2M rows):

    scenario                        in-memory store    temp-file spill
    --------                        ---------------    ---------------
    skewed ~16 MiB row group        ~18.3 MiB          ~4.2 MiB
    dictionary column, 4.2M rows    ~2.69 MiB          ~0.48 MiB

    i.e. the spilling backend bounds peak write memory by the in-flight
    encoder/dictionary buffers rather than the row group size, for both the page
    buffer and the dictionary-column data pages.
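
The idea behind the sparse-handle round-trip test in the first bullet can be sketched as follows. `SparseStore`, its key allocation scheme, and `round_trip` are hypothetical stand-ins written for this example; they are not the test code from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical test backend: handles are sparse and non-contiguous on purpose.
#[derive(Default)]
struct SparseStore {
    blobs: HashMap<u64, Vec<u8>>,
    allocated: u64,
}

impl SparseStore {
    /// Stores a blob and returns a handle that is neither dense nor zero-based.
    fn put(&mut self, blob: Vec<u8>) -> u64 {
        let key = 1_000 + self.allocated * 37;
        self.allocated += 1;
        self.blobs.insert(key, blob);
        key
    }

    /// Removes the blob for `key`, if the key was handed out by `put`.
    fn take(&mut self, key: u64) -> Option<Vec<u8>> {
        self.blobs.remove(&key)
    }
}

/// Writes pages, keeps only the returned handles, then splices the pages
/// back together in write order without assuming anything about the keys.
fn round_trip(pages: &[Vec<u8>]) -> Vec<u8> {
    let mut store = SparseStore::default();
    let keys: Vec<u64> = pages.iter().map(|p| store.put(p.clone())).collect();
    let mut out = Vec::new();
    for key in keys {
        out.extend(store.take(key).expect("every handle came from put"));
    }
    out
}
```

If the caller ever indexed pages by position instead of by the returned key, a backend like this would return nothing and the round trip would fail.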

Are there any user-facing changes?

New, additive public API (default behavior unchanged):

  • ArrowWriterOptions::with_page_store_factory
  • PageStore, PageKey, PageStoreFactory, InMemoryPageStore,
    InMemoryPageStoreFactory (re-exported from parquet::arrow::arrow_writer,
    defined in parquet::column::page_store).
  • New defaulted PageWriter trait methods defers_dictionary_ordering() and
    buffered_memory_size() (both default to the previous behavior), and a
    defaulted PageStore::memory_size().

Not covered (by design)

  • The column-at-a-time SerializedFileWriter path still buffers
    dictionary-column data pages: it commits bytes to the file live, so the
    dictionary-first ordering must be resolved during encoding. That path already
    has minimal memory otherwise.
  • The in-flight encoder buffer and dictionary themselves stay resident
    (already bounded by the page/dict size limits), as do bloom filters.

🤖 Generated with Claude Code

adriangb and others added 3 commits May 26, 2026 10:13
Introduce a "dumb" key/value page store that the ArrowWriter uses to
buffer completed, serialized pages while a row group is being written.
The store maps an opaque, store-allocated PageKey to a blob of bytes and
knows nothing about pages, dictionaries, ordering, or offsets — the
caller keeps the handles and decides what they mean.

The default InMemoryPageStore keeps blobs in a Vec<Bytes>, byte-for-byte
equivalent to the previous buffering with zero overhead. A PageStoreFactory
is threaded through ArrowWriterOptions -> ArrowRowGroupWriterFactory ->
ArrowColumnWriterFactory so users can plug in a backend (temp file, object
storage) to bound peak write memory independently of row group size.

ArrowColumnChunkData now holds (store, keys) and materializes blobs in
write order at splice time, preserving the existing append_column path.

Tests:
- column::page_store unit tests for the in-memory backend contract.
- A byte-identical round-trip test using a custom HashMap-backed store
  with sparse, non-contiguous handles, proving the writer relies only on
  the opaque-handle contract.
- An always-on dhat integration test capturing the in-memory peak-heap
  baseline (memory grows with the row group), against which a spilling
  backend will be measured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the splice materialized an entire column chunk back into a
Vec<Bytes> before copying it into the output file, so peak memory during
the splice phase was bounded by the largest column chunk — defeating a
spilling backend for skewed schemas.

Replace the materialize-then-copy path with StreamingColumnChunkReader, a
Read that takes each page blob back out of the store in write order as it
is consumed and releases it immediately, so the splice holds at most one
page in memory at a time. SerializedRowGroupWriter::append_column is
refactored to delegate to a new append_column_from_read that consumes an
owned Read (append_column itself is unchanged for external ChunkReader
callers).

For the default in-memory store this is behavior-preserving (it already
holds the bytes); for a spilling store it keeps the splice within the
memory bound.
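
A minimal sketch of the streaming idea, assuming a reader shaped roughly like this. The closure `take` stands in for `PageStore::take`, and `StreamingChunkReader` is a simplified, hypothetical analogue of `StreamingColumnChunkReader`, not its actual implementation.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Reads a column chunk page by page, holding at most one page at a time.
struct StreamingChunkReader<T>
where
    T: FnMut(u64) -> io::Result<Vec<u8>>,
{
    take: T,              // stand-in for taking a blob out of the store
    keys: VecDeque<u64>,  // page handles in write order
    current: Vec<u8>,     // the only page held in memory
    pos: usize,           // read position inside `current`
}

impl<T> StreamingChunkReader<T>
where
    T: FnMut(u64) -> io::Result<Vec<u8>>,
{
    fn new(keys: Vec<u64>, take: T) -> Self {
        Self { take, keys: keys.into(), current: Vec::new(), pos: 0 }
    }
}

impl<T> Read for StreamingChunkReader<T>
where
    T: FnMut(u64) -> io::Result<Vec<u8>>,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Once the current page is consumed, fetch the next one; assigning
        // to `current` drops the previous page immediately.
        while self.pos == self.current.len() {
            match self.keys.pop_front() {
                Some(key) => {
                    self.current = (self.take)(key)?;
                    self.pos = 0;
                }
                None => return Ok(0), // no pages left: end of chunk
            }
        }
        let n = buf.len().min(self.current.len() - self.pos);
        buf[..n].copy_from_slice(&self.current[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```

Because the reader is a plain `Read`, the consumer can copy it into the output with `std::io::copy` without ever seeing the whole column chunk at once.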

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pilling backend

PageKey's field was private, so an external PageStore implementor could
not mint the handle it must return from put() — the trait was unusable
outside the crate. Add public PageKey::new/get so any backend can allocate
its own opaque, dense handles.

Extend the dhat integration test with a temp-file PageStore backend (one
unlinked temp file per column chunk; put appends, take seeks+reads) and
assert the headline invariant: writing a skewed ~16 MiB single row group,
peak heap drops from ~18 MiB with the in-memory store to ~3 MiB with the
spilling store — bounded by the in-flight encoder/dictionary buffers
rather than the row group size.
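
The "put appends, take seeks+reads" backend can be sketched over any seekable byte sink. `SpillStore` and its method signatures are assumptions for this example; the PR's test backend uses an unlinked temp file per column chunk, which is not reproduced here.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Spill backend over any seekable byte sink (a file in practice).
struct SpillStore<F: Read + Write + Seek> {
    file: F,
    /// (offset, length) of each page, indexed by its dense handle.
    extents: Vec<(u64, usize)>,
    /// Current end of the spilled data.
    end: u64,
}

impl<F: Read + Write + Seek> SpillStore<F> {
    fn new(file: F) -> Self {
        Self { file, extents: Vec::new(), end: 0 }
    }

    /// Appends the page to the sink and returns a dense handle.
    fn put(&mut self, page: &[u8]) -> io::Result<u64> {
        self.file.seek(SeekFrom::Start(self.end))?;
        self.file.write_all(page)?;
        self.extents.push((self.end, page.len()));
        self.end += page.len() as u64;
        Ok(self.extents.len() as u64 - 1)
    }

    /// Seeks to the page and reads it back into a fresh buffer.
    fn take(&mut self, key: u64) -> io::Result<Vec<u8>> {
        let &(offset, len) = self
            .extents
            .get(key as usize)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown page key"))?;
        let mut page = vec![0u8; len];
        self.file.seek(SeekFrom::Start(offset))?;
        self.file.read_exact(&mut page)?;
        Ok(page)
    }
}
```

`std::fs::File` implements `Read`, `Write`, and `Seek`, so a temp file can be passed to `SpillStore::new`; an in-memory `std::io::Cursor<Vec<u8>>` works the same way for quick experiments.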

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the parquet (Changes to the parquet crate) label May 26, 2026
adriangb and others added 2 commits May 26, 2026 11:39
…oint #2)

Dictionary-encoded columns buffered every completed data page in
GenericColumnWriter.data_pages until close(), because the dictionary page
must be written first but isn't final until all values are seen. Those
pages never reached the PageStore, so spilling couldn't bound them — a
low-cardinality 4.2M-row column peaked ~2.5 MiB regardless of backend.

Add PageWriter::defers_dictionary_ordering(): a writer that buffers the
whole chunk and splices it later (the Arrow path) can accept data pages
before the dictionary page and order them itself. When set, the column
writer streams dictionary-column data pages straight through instead of
buffering them. ArrowPageWriter returns true, holds the (bounded)
dictionary page in memory since it now arrives last, and at splice emits
it first; the buffer-relative page offsets recorded in production order
are rewritten to the dictionary-first layout there. The column-at-a-time
SerializedFileWriter path is unchanged (defaults to false).

Also fix memory_size() accounting: instead of counting bytes written
(which over-reports once pages are spilled off-heap), ask the page writer
how much it actually holds resident via PageWriter::buffered_memory_size()
and PageStore::memory_size(). For the in-memory store this is unchanged;
for a spilling store it drops to ~0 plus the retained dictionary page.

Result: the dict-column case drops from ~2.69 MiB to ~0.48 MiB peak heap
with a spilling backend. Adds an offset-index-disabled dictionary
round-trip test and store memory-size unit tests; extends the dhat test
with the dictionary-column scenario.
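
The offset rewrite can be pictured with a small model. It assumes that data page offsets are recorded relative to the start of the chunk buffer in production order, with no dictionary page in front of them; `ChunkLayout` and `rewrite_to_dictionary_first` are hypothetical names for this sketch, not the writer's actual code.

```rust
/// Offsets of one column chunk, relative to the start of the chunk.
#[derive(Debug, PartialEq)]
struct ChunkLayout {
    dictionary_page_offset: u64,
    data_page_offsets: Vec<u64>,
}

/// Data pages were produced first, so their recorded offsets start at 0.
/// The dictionary page (`dict_len` bytes) arrived last but is emitted
/// first, so every data page shifts forward by `dict_len`.
fn rewrite_to_dictionary_first(production_offsets: &[u64], dict_len: u64) -> ChunkLayout {
    ChunkLayout {
        dictionary_page_offset: 0,
        data_page_offsets: production_offsets.iter().map(|o| o + dict_len).collect(),
    }
}
```

For data pages of 100, 250, and 80 bytes (production offsets 0, 100, 350) and a 40-byte dictionary page, the rewritten data page offsets are 40, 140, and 390.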

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test uses parquet::arrow, so without a required-features entry it was
auto-discovered and compiled under --all-targets --no-default-features,
breaking that CI compilation check. Mirror the other arrow integration
tests with required-features = ["arrow"].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb
Contributor Author

run benchmark arrow_writer

@adriangb
Contributor Author

run benchmark writer_overhead

@adriangb
Contributor Author

run benchmark parquet_round_trip

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4546798203-334-v4p7f 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-spill (36db7ea) to fd1c5b3 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4546801136-335-vh6m4 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-spill (36db7ea) to fd1c5b3 (merge-base) diff
BENCH_NAME=writer_overhead
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench writer_overhead
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4546802794-336-plwpf 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-spill (36db7ea) to fd1c5b3 (merge-base) diff
BENCH_NAME=parquet_round_trip
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench parquet_round_trip
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                         main                                   parquet-page-spill
-----                         ----                                   ------------------
writer_overhead/10000_cols    1.00     37.3±0.30ms        ? ?/sec    1.01     37.5±0.40ms        ? ?/sec
writer_overhead/1000_cols     1.00      3.5±0.01ms        ? ?/sec    1.01      3.5±0.02ms        ? ?/sec
writer_overhead/5000_cols     1.00     19.7±0.12ms        ? ?/sec    1.02     20.0±0.14ms        ? ?/sec

Resource Usage

base (merge-base)

Metric         Value
------         -----
Wall time      35.0s
Peak memory    4.3 GiB
Avg memory     4.2 GiB
CPU user       32.2s
CPU sys        1.6s
Peak spill     0 B

branch

Metric         Value
------         -----
Wall time      35.0s
Peak memory    4.3 GiB
Avg memory     4.3 GiB
CPU user       32.1s
CPU sys        1.2s
Peak spill     0 B


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

group                                     main                                   parquet-page-spill
-----                                     ----                                   ------------------
read Binary(100) delta_byte_array         1.01      8.0±0.04ms        ? ?/sec    1.00      8.0±0.04ms        ? ?/sec
read Binary(100) delta_length             1.01      2.8±0.04ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
read Binary(100) dict                     1.00      3.9±0.01ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
read Binary(100) plain                    1.00      3.8±0.04ms        ? ?/sec    1.01      3.8±0.03ms        ? ?/sec
read Binary(20) delta_byte_array          1.00      4.8±0.01ms        ? ?/sec    1.01      4.8±0.01ms        ? ?/sec
read Binary(20) delta_length              1.00   1831.7±3.83µs        ? ?/sec    1.00   1827.9±6.76µs        ? ?/sec
read Binary(20) dict                      1.00      2.6±0.01ms        ? ?/sec    1.02      2.6±0.01ms        ? ?/sec
read Binary(20) plain                     1.02      2.9±0.08ms        ? ?/sec    1.00      2.8±0.06ms        ? ?/sec
read Fixed(16) byte_stream_split          1.00      5.4±0.00ms        ? ?/sec    1.00      5.4±0.00ms        ? ?/sec
read Fixed(16) delta_byte_array           1.00      3.3±0.01ms        ? ?/sec    1.02      3.4±0.01ms        ? ?/sec
read Fixed(16) dict                       1.00    605.5±1.51µs        ? ?/sec    1.00    606.1±1.53µs        ? ?/sec
read Fixed(16) plain                      1.00    603.5±1.46µs        ? ?/sec    1.00    605.0±1.59µs        ? ?/sec
read Fixed(2) byte_stream_split           1.00    933.3±1.37µs        ? ?/sec    1.00    936.2±2.38µs        ? ?/sec
read Fixed(2) delta_byte_array            1.00      3.2±0.01ms        ? ?/sec    1.00      3.2±0.01ms        ? ?/sec
read Fixed(2) dict                        1.00    437.1±1.17µs        ? ?/sec    1.00    437.8±1.73µs        ? ?/sec
read Fixed(2) plain                       1.01    444.4±1.67µs        ? ?/sec    1.00    439.7±1.62µs        ? ?/sec
read String(100) delta_byte_array         1.01      8.8±0.05ms        ? ?/sec    1.00      8.8±0.05ms        ? ?/sec
read String(100) delta_length             1.00      3.5±0.07ms        ? ?/sec    1.02      3.5±0.08ms        ? ?/sec
read String(100) dict                     1.03      3.9±0.01ms        ? ?/sec    1.00      3.8±0.01ms        ? ?/sec
read String(100) plain                    1.00      4.4±0.05ms        ? ?/sec    1.00      4.4±0.07ms        ? ?/sec
read String(20) delta_byte_array          1.00      5.2±0.01ms        ? ?/sec    1.00      5.2±0.02ms        ? ?/sec
read String(20) delta_length              1.01   1969.7±6.85µs        ? ?/sec    1.00   1951.0±5.25µs        ? ?/sec
read String(20) dict                      1.00      2.6±0.00ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
read String(20) plain                     1.00      3.0±0.08ms        ? ?/sec    1.00      3.0±0.07ms        ? ?/sec
read StringView(100) delta_byte_array     1.01      8.1±0.09ms        ? ?/sec    1.00      8.1±0.05ms        ? ?/sec
read StringView(100) delta_length         1.00      3.5±0.06ms        ? ?/sec    1.00      3.5±0.08ms        ? ?/sec
read StringView(100) dict                 1.00    997.4±2.13µs        ? ?/sec    1.00    993.4±2.85µs        ? ?/sec
read StringView(100) plain                1.00      3.1±0.07ms        ? ?/sec    1.01      3.2±0.07ms        ? ?/sec
read StringView(20) delta_byte_array      1.06      5.2±0.03ms        ? ?/sec    1.00      4.9±0.01ms        ? ?/sec
read StringView(20) delta_length          1.00      2.5±0.00ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
read StringView(20) dict                  1.00    984.3±1.46µs        ? ?/sec    1.00    985.9±2.11µs        ? ?/sec
read StringView(20) plain                 1.02  1640.3±11.32µs        ? ?/sec    1.00   1605.2±6.95µs        ? ?/sec
read f32 byte_stream_split                1.00   1020.4±2.66µs        ? ?/sec    1.00   1019.9±3.16µs        ? ?/sec
read f32 dict                             1.00   1454.0±2.18µs        ? ?/sec    1.00   1453.8±4.73µs        ? ?/sec
read f32 plain                            1.00    953.3±1.68µs        ? ?/sec    1.00    951.1±3.13µs        ? ?/sec
read f64 byte_stream_split                1.00   1520.2±2.09µs        ? ?/sec    1.00   1526.6±2.97µs        ? ?/sec
read f64 dict                             1.00   1480.0±2.49µs        ? ?/sec    1.00   1479.4±4.24µs        ? ?/sec
read f64 plain                            1.00   1066.6±2.13µs        ? ?/sec    1.01   1082.5±4.58µs        ? ?/sec
read int32 byte_stream_split              1.00   1077.2±1.64µs        ? ?/sec    1.00   1082.2±8.32µs        ? ?/sec
read int32 delta_binary                   1.00      2.1±0.01ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
read int32 dict                           1.00   1517.4±2.41µs        ? ?/sec    1.00   1523.5±6.06µs        ? ?/sec
read int32 plain                          1.00   1005.2±1.68µs        ? ?/sec    1.00   1009.2±6.22µs        ? ?/sec
read int64 byte_stream_split              1.00   1583.9±5.15µs        ? ?/sec    1.00   1583.4±3.00µs        ? ?/sec
read int64 delta_binary                   1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
read int64 dict                           1.00   1539.3±2.07µs        ? ?/sec    1.01   1555.4±5.77µs        ? ?/sec
read int64 plain                          1.00   1125.2±2.71µs        ? ?/sec    1.01   1136.8±5.21µs        ? ?/sec
write Binary(100) delta_byte_array        1.01     27.6±1.04ms        ? ?/sec    1.00     27.4±0.79ms        ? ?/sec
write Binary(100) delta_length            1.00     22.8±0.33ms        ? ?/sec    1.02     23.3±0.87ms        ? ?/sec
write Binary(100) dict                    1.00     17.6±0.05ms        ? ?/sec    1.00     17.7±0.04ms        ? ?/sec
write Binary(100) plain                   1.00     12.5±0.03ms        ? ?/sec    1.00     12.4±0.06ms        ? ?/sec
write Binary(20) delta_byte_array         1.00     12.6±0.04ms        ? ?/sec    1.01     12.8±0.04ms        ? ?/sec
write Binary(20) delta_length             1.00      9.1±0.03ms        ? ?/sec    1.00      9.1±0.04ms        ? ?/sec
write Binary(20) dict                     1.00     13.2±0.03ms        ? ?/sec    1.00     13.2±0.04ms        ? ?/sec
write Binary(20) plain                    1.00      8.1±0.03ms        ? ?/sec    1.00      8.1±0.04ms        ? ?/sec
write Fixed(16) byte_stream_split         1.02     27.4±0.16ms        ? ?/sec    1.00     26.9±0.20ms        ? ?/sec
write Fixed(16) delta_byte_array          1.00     64.5±0.18ms        ? ?/sec    1.01     65.0±0.24ms        ? ?/sec
write Fixed(16) dict                      1.01     22.8±0.14ms        ? ?/sec    1.00     22.5±0.17ms        ? ?/sec
write Fixed(16) plain                     1.01     22.7±0.13ms        ? ?/sec    1.00     22.5±0.16ms        ? ?/sec
write Fixed(2) byte_stream_split          1.00     21.3±0.06ms        ? ?/sec    1.02     21.6±0.08ms        ? ?/sec
write Fixed(2) delta_byte_array           1.00     63.2±0.16ms        ? ?/sec    1.01     64.0±0.25ms        ? ?/sec
write Fixed(2) dict                       1.00     22.5±0.09ms        ? ?/sec    1.00     22.5±0.06ms        ? ?/sec
write Fixed(2) plain                      1.00     22.4±0.08ms        ? ?/sec    1.01     22.5±0.07ms        ? ?/sec
write String(100) delta_byte_array        1.00     18.4±0.79ms        ? ?/sec    1.46     27.0±0.91ms        ? ?/sec
write String(100) delta_length            1.00     13.3±0.02ms        ? ?/sec    1.76     23.3±0.86ms        ? ?/sec
write String(100) dict                    1.00     17.7±0.06ms        ? ?/sec    1.00     17.7±0.04ms        ? ?/sec
write String(100) plain                   1.00     12.4±0.02ms        ? ?/sec    1.00     12.4±0.04ms        ? ?/sec
write String(20) delta_byte_array         1.00     12.9±0.03ms        ? ?/sec    1.01     12.9±0.03ms        ? ?/sec
write String(20) delta_length             1.03      9.4±0.03ms        ? ?/sec    1.00      9.1±0.02ms        ? ?/sec
write String(20) dict                     1.00     13.2±0.03ms        ? ?/sec    1.01     13.2±0.02ms        ? ?/sec
write String(20) plain                    1.00      8.2±0.02ms        ? ?/sec    1.00      8.1±0.02ms        ? ?/sec
write StringView(100) delta_byte_array    1.00     17.5±0.03ms        ? ?/sec    1.58     27.6±0.88ms        ? ?/sec
write StringView(100) delta_length        1.00     13.7±0.03ms        ? ?/sec    1.77     24.1±0.69ms        ? ?/sec
write StringView(100) dict                1.00     18.5±0.03ms        ? ?/sec    1.00     18.6±0.05ms        ? ?/sec
write StringView(100) plain               1.00     12.8±0.02ms        ? ?/sec    1.00     12.7±0.04ms        ? ?/sec
write StringView(20) delta_byte_array     1.00     12.8±0.03ms        ? ?/sec    1.01     12.9±0.05ms        ? ?/sec
write StringView(20) delta_length         1.02      9.4±0.03ms        ? ?/sec    1.00      9.1±0.05ms        ? ?/sec
write StringView(20) dict                 1.00     13.4±0.02ms        ? ?/sec    1.01     13.5±0.04ms        ? ?/sec
write StringView(20) plain                1.00      8.0±0.02ms        ? ?/sec    1.00      8.1±0.04ms        ? ?/sec
write f32 byte_stream_split               1.00      5.0±0.04ms        ? ?/sec    1.03      5.2±0.03ms        ? ?/sec
write f32 dict                            1.01     16.0±0.03ms        ? ?/sec    1.00     15.8±0.35ms        ? ?/sec
write f32 plain                           1.00      5.1±0.03ms        ? ?/sec    1.03      5.2±0.03ms        ? ?/sec
write f64 byte_stream_split               1.00      6.7±0.03ms        ? ?/sec    1.02      6.9±0.02ms        ? ?/sec
write f64 dict                            1.02     16.0±0.03ms        ? ?/sec    1.00     15.7±0.30ms        ? ?/sec
write f64 plain                           1.00      5.9±0.03ms        ? ?/sec    1.01      6.0±0.02ms        ? ?/sec
write int32 byte_stream_split             1.00      5.3±0.02ms        ? ?/sec    1.01      5.4±0.03ms        ? ?/sec
write int32 delta_binary                  1.00     10.7±0.03ms        ? ?/sec    1.01     10.8±0.06ms        ? ?/sec
write int32 dict                          1.02     17.0±0.49ms        ? ?/sec    1.00     16.5±0.30ms        ? ?/sec
write int32 plain                         1.00      5.5±0.02ms        ? ?/sec    1.01      5.5±0.03ms        ? ?/sec
write int64 byte_stream_split             1.00      6.9±0.03ms        ? ?/sec    1.02      7.0±0.03ms        ? ?/sec
write int64 delta_binary                  1.00     11.9±0.06ms        ? ?/sec    1.01     12.1±0.06ms        ? ?/sec
write int64 dict                          1.02     16.4±0.07ms        ? ?/sec    1.00     16.1±0.31ms        ? ?/sec
write int64 plain                         1.00      6.1±0.03ms        ? ?/sec    1.01      6.2±0.04ms        ? ?/sec

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 940.2s |
| Peak memory | 8.9 GiB |
| Avg memory | 5.5 GiB |
| CPU user | 903.8s |
| CPU sys | 33.1s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 940.2s |
| Peak memory | 8.9 GiB |
| Avg memory | 5.3 GiB |
| CPU user | 897.6s |
| CPU sys | 41.1s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangb marked this pull request as ready for review May 26, 2026 18:14
@adriangb
Contributor Author

run benchmark parquet_round_trip

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4547278197-337-j8vmb 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing parquet-page-spill (36db7ea) to fd1c5b3 (merge-base) diff
BENCH_NAME=parquet_round_trip
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench parquet_round_trip
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                              main                                   parquet-page-spill
-----                                              ----                                   ------------------
bool/bloom_filter                                  1.00     13.1±0.08ms    19.0 MB/sec    1.01     13.2±0.07ms    18.9 MB/sec
bool/cdc                                           1.00     15.8±0.09ms    15.8 MB/sec    1.00     15.8±0.07ms    15.8 MB/sec
bool/default                                       1.00     11.0±0.06ms    22.7 MB/sec    1.00     11.0±0.08ms    22.7 MB/sec
bool/parquet_2                                     1.00     14.8±0.07ms    16.9 MB/sec    1.00     14.9±0.07ms    16.8 MB/sec
bool/zstd                                          1.00     11.6±0.07ms    21.6 MB/sec    1.00     11.6±0.07ms    21.6 MB/sec
bool/zstd_parquet_2                                1.00     15.2±0.08ms    16.4 MB/sec    1.00     15.2±0.07ms    16.4 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.02ms    17.7 MB/sec
bool_non_null/cdc                                  1.00      6.8±0.02ms    18.3 MB/sec    1.01      6.9±0.03ms    18.2 MB/sec
bool_non_null/default                              1.00      4.3±0.03ms    29.1 MB/sec    1.01      4.3±0.02ms    28.8 MB/sec
bool_non_null/parquet_2                            1.00      9.1±0.04ms    13.8 MB/sec    1.00      9.1±0.04ms    13.8 MB/sec
bool_non_null/zstd                                 1.00      4.7±0.03ms    26.9 MB/sec    1.01      4.7±0.02ms    26.7 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.03ms    13.2 MB/sec    1.00      9.5±0.03ms    13.2 MB/sec
float_with_nans/bloom_filter                       1.01     93.3±0.33ms   150.0 MB/sec    1.00     92.7±0.25ms   151.1 MB/sec
float_with_nans/cdc                                1.00     82.0±0.20ms   170.8 MB/sec    1.00     81.8±0.48ms   171.2 MB/sec
float_with_nans/default                            1.00     74.7±0.20ms   187.5 MB/sec    1.00     74.5±0.16ms   188.0 MB/sec
float_with_nans/parquet_2                          1.00     95.0±0.38ms   147.3 MB/sec    1.00     94.6±0.32ms   148.1 MB/sec
float_with_nans/zstd                               1.00    112.3±0.26ms   124.7 MB/sec    1.00    112.1±0.20ms   124.9 MB/sec
float_with_nans/zstd_parquet_2                     1.00    132.3±0.39ms   105.8 MB/sec    1.00    131.8±0.32ms   106.2 MB/sec
list_primitive/bloom_filter                        1.00    332.6±1.35ms  1639.8 MB/sec    1.02    338.0±7.18ms  1613.3 MB/sec
list_primitive/cdc                                 1.00    365.4±1.54ms  1492.6 MB/sec    1.00    367.1±2.08ms  1485.4 MB/sec
list_primitive/default                             1.00    254.7±2.00ms     2.1 GB/sec    1.03    263.2±7.53ms     2.0 GB/sec
list_primitive/parquet_2                           1.00    275.3±0.62ms  1981.3 MB/sec    1.03    284.7±7.10ms  1915.7 MB/sec
list_primitive/zstd                                1.00    505.6±2.43ms  1078.7 MB/sec    1.02    514.5±7.38ms  1060.1 MB/sec
list_primitive/zstd_parquet_2                      1.00    498.2±1.33ms  1094.7 MB/sec    1.02    508.0±7.50ms  1073.6 MB/sec
list_primitive_non_null/bloom_filter               1.00    399.2±4.80ms  1363.3 MB/sec    1.08    431.4±5.72ms  1261.5 MB/sec
list_primitive_non_null/cdc                        1.01    441.7±7.11ms  1232.1 MB/sec    1.00    437.1±9.33ms  1245.0 MB/sec
list_primitive_non_null/default                    1.00    266.3±3.96ms  2043.9 MB/sec    1.11    296.7±7.64ms  1834.4 MB/sec
list_primitive_non_null/parquet_2                  1.00    293.2±0.80ms  1856.4 MB/sec    1.12    327.4±0.76ms  1662.6 MB/sec
list_primitive_non_null/zstd                       1.00    687.6±3.94ms   791.6 MB/sec    1.03   708.3±11.37ms   768.4 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    669.1±2.04ms   813.4 MB/sec    1.00    669.8±0.88ms   812.5 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.4±0.07ms     3.2 GB/sec    1.03     11.8±0.12ms     3.1 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     23.0±0.06ms  1626.4 MB/sec    1.01     23.3±0.06ms  1607.1 MB/sec
list_primitive_sparse_99pct_null/default           1.00     11.0±0.05ms     3.3 GB/sec    1.03     11.4±0.13ms     3.2 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.1±0.07ms     3.3 GB/sec    1.04     11.5±0.13ms     3.2 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.9±0.07ms     2.8 GB/sec    1.04     13.3±0.11ms     2.7 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.2±0.06ms     3.3 GB/sec    1.04     11.6±0.14ms     3.1 GB/sec
primitive/bloom_filter                             1.00    151.0±0.48ms   297.2 MB/sec    1.00    151.5±0.47ms   296.2 MB/sec
primitive/cdc                                      1.01    160.8±0.57ms   279.1 MB/sec    1.00    159.8±0.60ms   280.9 MB/sec
primitive/default                                  1.00    119.6±0.42ms   375.2 MB/sec    1.00    119.5±0.40ms   375.4 MB/sec
primitive/parquet_2                                1.00    134.4±0.48ms   333.8 MB/sec    1.00    134.5±0.47ms   333.7 MB/sec
primitive/zstd                                     1.00    149.1±0.45ms   300.9 MB/sec    1.00    149.3±0.37ms   300.7 MB/sec
primitive/zstd_parquet_2                           1.00    167.8±0.49ms   267.4 MB/sec    1.00    168.0±0.32ms   267.1 MB/sec
primitive_all_null/bloom_filter                    1.00    901.0±2.43µs    48.6 GB/sec    1.00    902.0±5.25µs    48.6 GB/sec
primitive_all_null/cdc                             1.03     19.5±0.22ms     2.2 GB/sec    1.00     18.9±0.37ms     2.3 GB/sec
primitive_all_null/default                         1.00    276.9±0.79µs   158.2 GB/sec    1.01    280.7±1.35µs   156.1 GB/sec
primitive_all_null/parquet_2                       1.00    278.7±1.22µs   157.3 GB/sec    1.03    286.9±1.60µs   152.8 GB/sec
primitive_all_null/zstd                            1.00    389.2±0.87µs   112.6 GB/sec    1.03    399.0±1.06µs   109.8 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    355.5±1.25µs   123.3 GB/sec    1.03    366.9±1.26µs   119.4 GB/sec
primitive_non_null/bloom_filter                    1.00    106.7±0.24ms   412.6 MB/sec    1.02    108.9±0.59ms   404.0 MB/sec
primitive_non_null/cdc                             1.00     90.6±0.49ms   485.9 MB/sec    1.00     90.5±0.41ms   486.4 MB/sec
primitive_non_null/default                         1.00     67.7±0.19ms   650.0 MB/sec    1.01     68.4±0.35ms   643.4 MB/sec
primitive_non_null/parquet_2                       1.00     89.1±0.24ms   493.6 MB/sec    1.01     89.9±0.18ms   489.4 MB/sec
primitive_non_null/zstd                            1.06    104.6±0.93ms   420.6 MB/sec    1.00     98.9±0.22ms   445.1 MB/sec
primitive_non_null/zstd_parquet_2                  1.04    128.8±2.49ms   341.6 MB/sec    1.00    123.6±0.16ms   356.0 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.00     18.8±0.10ms     2.3 GB/sec    1.01     18.9±0.10ms     2.3 GB/sec
primitive_sparse_99pct_null/cdc                    1.02     37.4±0.20ms  1199.7 MB/sec    1.00     36.8±0.29ms  1218.9 MB/sec
primitive_sparse_99pct_null/default                1.01     17.1±0.07ms     2.6 GB/sec    1.00     17.0±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.00     17.0±0.07ms     2.6 GB/sec    1.00     17.0±0.05ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.00     20.4±0.06ms     2.2 GB/sec    1.00     20.4±0.06ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     19.0±0.06ms     2.3 GB/sec    1.00     18.9±0.06ms     2.3 GB/sec
string/bloom_filter                                1.02   220.1±20.49ms     2.3 GB/sec    1.00   215.0±19.09ms     2.4 GB/sec
string/cdc                                         1.01    222.3±4.82ms     2.3 GB/sec    1.00    220.8±6.13ms     2.3 GB/sec
string/default                                     1.02   128.8±20.89ms     4.0 GB/sec    1.00   126.6±17.61ms     4.0 GB/sec
string/parquet_2                                   1.00    111.8±6.27ms     4.6 GB/sec    1.11    123.9±0.94ms     4.1 GB/sec
string/zstd                                        1.00    417.7±1.87ms  1255.1 MB/sec    1.04   434.3±15.80ms  1207.1 MB/sec
string/zstd_parquet_2                              1.00    402.1±6.49ms  1303.8 MB/sec    1.00    403.3±3.73ms  1299.9 MB/sec
string_and_binary_view/bloom_filter                1.02     64.9±0.26ms   496.9 MB/sec    1.00     63.9±0.28ms   504.5 MB/sec
string_and_binary_view/cdc                         1.01     58.7±0.20ms   549.6 MB/sec    1.00     58.2±0.18ms   553.9 MB/sec
string_and_binary_view/default                     1.01     48.5±0.20ms   664.9 MB/sec    1.00     48.2±0.15ms   668.9 MB/sec
string_and_binary_view/parquet_2                   1.00     59.2±0.20ms   545.1 MB/sec    1.00     59.2±0.20ms   545.0 MB/sec
string_and_binary_view/zstd                        1.00     85.0±0.20ms   379.4 MB/sec    1.00     84.6±0.21ms   381.2 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     73.0±0.20ms   441.5 MB/sec    1.00     73.0±0.20ms   441.9 MB/sec
string_dictionary/bloom_filter                     1.03     91.9±1.14ms     2.8 GB/sec    1.00     89.3±1.00ms     2.9 GB/sec
string_dictionary/cdc                              1.04     53.4±1.04ms     4.8 GB/sec    1.00     51.3±0.69ms     5.0 GB/sec
string_dictionary/default                          1.00     47.2±0.88ms     5.5 GB/sec    1.00     47.4±1.01ms     5.4 GB/sec
string_dictionary/parquet_2                        1.00     54.3±0.29ms     4.8 GB/sec    1.00     54.4±0.41ms     4.7 GB/sec
string_dictionary/zstd                             1.01    210.4±1.61ms  1255.2 MB/sec    1.00    207.9±1.40ms  1270.2 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.9±0.21ms  1327.7 MB/sec    1.00    199.1±0.17ms  1326.9 MB/sec
string_non_null/bloom_filter                       1.04   252.2±13.88ms     2.0 GB/sec    1.00   243.3±11.42ms     2.1 GB/sec
string_non_null/cdc                                1.00    267.0±2.82ms  1962.7 MB/sec    1.00    268.2±7.99ms  1953.7 MB/sec
string_non_null/default                            1.00   139.2±12.40ms     3.7 GB/sec    1.00   139.6±15.40ms     3.7 GB/sec
string_non_null/parquet_2                          1.00    130.7±2.87ms     3.9 GB/sec    1.00    130.6±1.07ms     3.9 GB/sec
string_non_null/zstd                               1.00    537.8±2.60ms   974.4 MB/sec    1.01    542.7±4.31ms   965.6 MB/sec
string_non_null/zstd_parquet_2                     1.00    504.0±1.59ms  1039.6 MB/sec    1.00    504.7±0.67ms  1038.2 MB/sec
struct_all_null/bloom_filter                       1.00    374.8±1.26µs    42.0 GB/sec    1.00    373.4±1.01µs    42.2 GB/sec
struct_all_null/cdc                                1.05      8.3±0.07ms  1954.4 MB/sec    1.00      7.9±0.19ms     2.0 GB/sec
struct_all_null/default                            1.00    119.8±0.24µs   131.4 GB/sec    1.01    120.6±0.64µs   130.6 GB/sec
struct_all_null/parquet_2                          1.00    120.4±0.48µs   130.8 GB/sec    1.03    124.0±0.67µs   127.0 GB/sec
struct_all_null/zstd                               1.00    167.3±0.38µs    94.2 GB/sec    1.02    171.3±1.35µs    91.9 GB/sec
struct_all_null/zstd_parquet_2                     1.00    153.4±0.58µs   102.6 GB/sec    1.03    158.7±0.57µs    99.2 GB/sec
struct_non_null/bloom_filter                       1.02     47.1±0.28ms   339.7 MB/sec    1.00     46.0±0.09ms   348.0 MB/sec
struct_non_null/cdc                                1.01     45.8±0.12ms   349.6 MB/sec    1.00     45.3±0.16ms   353.0 MB/sec
struct_non_null/default                            1.01     32.3±0.10ms   496.1 MB/sec    1.00     31.9±0.10ms   502.0 MB/sec
struct_non_null/parquet_2                          1.00     41.0±0.15ms   390.6 MB/sec    1.07     43.9±1.44ms   364.8 MB/sec
struct_non_null/zstd                               1.01     41.1±0.12ms   389.5 MB/sec    1.00     40.7±0.10ms   392.8 MB/sec
struct_non_null/zstd_parquet_2                     1.00     55.0±0.12ms   290.9 MB/sec    1.00     54.8±0.77ms   291.8 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      6.5±0.04ms     2.4 GB/sec    1.06      6.9±0.05ms     2.3 GB/sec
struct_sparse_99pct_null/cdc                       1.01     14.5±0.09ms  1111.8 MB/sec    1.00     14.4±0.10ms  1120.2 MB/sec
struct_sparse_99pct_null/default                   1.00      6.0±0.03ms     2.6 GB/sec    1.05      6.3±0.03ms     2.5 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      6.0±0.03ms     2.6 GB/sec    1.06      6.4±0.03ms     2.5 GB/sec
struct_sparse_99pct_null/zstd                      1.00      7.4±0.03ms     2.1 GB/sec    1.04      7.7±0.03ms     2.0 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      6.8±0.04ms     2.3 GB/sec    1.04      7.1±0.03ms     2.2 GB/sec
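
For reading the table above: the leading number in each column appears to be a relative factor, that is, each side's mean time divided by the faster side's mean, so the faster side shows 1.00. This is an assumption based on the usual critcmp-style layout, not something the bot output states. A minimal sketch of that arithmetic, using the rounded means from the `list_primitive_non_null/parquet_2` row (the function name `relative_factors` is hypothetical):

```rust
/// Relative factor for each side of a two-way comparison:
/// each mean divided by the smaller (faster) mean.
/// Assumed convention; the faster side comes out as 1.00.
fn relative_factors(main_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = main_ms.min(branch_ms);
    (main_ms / fastest, branch_ms / fastest)
}

fn main() {
    // Rounded means (ms) copied from the list_primitive_non_null/parquet_2 row.
    let (main, branch) = relative_factors(293.2, 327.4);
    // prints "main 1.00  parquet-page-spill 1.12"
    println!("main {main:.2}  parquet-page-spill {branch:.2}");
}
```

Because the table prints rounded means, recomputing a factor from the displayed values can differ from the printed factor in the last digit.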

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 1925.4s |
| Peak memory | 6.6 GiB |
| Avg memory | 6.4 GiB |
| CPU user | 1890.5s |
| CPU sys | 33.2s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 1935.4s |
| Peak memory | 6.6 GiB |
| Avg memory | 6.4 GiB |
| CPU user | 1885.2s |
| CPU sys | 47.6s |
| Peak spill | 0 B |


@adriangb
Contributor Author

run benchmark arrow_writer

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4547447661-338-jctsh 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing parquet-page-spill (36db7ea) to fd1c5b3 (merge-base) diff
BENCH_NAME=arrow_writer
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_writer
BENCH_FILTER=
Results will be posted here when complete



@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                     main                                   parquet-page-spill
-----                                     ----                                   ------------------
read Binary(100) delta_byte_array         1.02      9.1±0.17ms        ? ?/sec    1.00      8.9±0.09ms        ? ?/sec
read Binary(100) delta_length             1.06      3.6±0.13ms        ? ?/sec    1.00      3.4±0.11ms        ? ?/sec
read Binary(100) dict                     1.02      4.0±0.04ms        ? ?/sec    1.00      3.9±0.02ms        ? ?/sec
read Binary(100) plain                    1.00      4.6±0.20ms        ? ?/sec    1.00      4.6±0.14ms        ? ?/sec
read Binary(20) delta_byte_array          1.00      4.9±0.02ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
read Binary(20) delta_length              1.01  1927.2±32.01µs        ? ?/sec    1.00  1912.5±27.14µs        ? ?/sec
read Binary(20) dict                      1.00      2.6±0.02ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
read Binary(20) plain                     1.01      3.0±0.08ms        ? ?/sec    1.00      2.9±0.06ms        ? ?/sec
read Fixed(16) byte_stream_split          1.00      5.5±0.04ms        ? ?/sec    1.01      5.5±0.03ms        ? ?/sec
read Fixed(16) delta_byte_array           1.00      3.5±0.06ms        ? ?/sec    1.01      3.5±0.06ms        ? ?/sec
read Fixed(16) dict                       1.01   654.3±10.34µs        ? ?/sec    1.00   645.1±10.87µs        ? ?/sec
read Fixed(16) plain                      1.04   667.9±17.87µs        ? ?/sec    1.00    643.8±7.98µs        ? ?/sec
read Fixed(2) byte_stream_split           1.00    938.1±6.36µs        ? ?/sec    1.00    935.4±2.81µs        ? ?/sec
read Fixed(2) delta_byte_array            1.00      3.2±0.02ms        ? ?/sec    1.01      3.2±0.04ms        ? ?/sec
read Fixed(2) dict                        1.02    447.2±8.78µs        ? ?/sec    1.00    439.8±2.34µs        ? ?/sec
read Fixed(2) plain                       1.01    450.1±4.97µs        ? ?/sec    1.00    443.7±2.67µs        ? ?/sec
read String(100) delta_byte_array         1.01      9.7±0.18ms        ? ?/sec    1.00      9.6±0.07ms        ? ?/sec
read String(100) delta_length             1.00      4.6±0.19ms        ? ?/sec    1.01      4.6±0.12ms        ? ?/sec
read String(100) dict                     1.00      3.9±0.03ms        ? ?/sec    1.01      4.0±0.02ms        ? ?/sec
read String(100) plain                    1.00      5.4±0.32ms        ? ?/sec    1.04      5.6±0.14ms        ? ?/sec
read String(20) delta_byte_array          1.00      5.3±0.03ms        ? ?/sec    1.00      5.3±0.05ms        ? ?/sec
read String(20) delta_length              1.00      2.0±0.03ms        ? ?/sec    1.01      2.1±0.07ms        ? ?/sec
read String(20) dict                      1.00      2.6±0.00ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
read String(20) plain                     1.00      3.1±0.07ms        ? ?/sec    1.02      3.1±0.08ms        ? ?/sec
read StringView(100) delta_byte_array     1.01      9.1±0.15ms        ? ?/sec    1.00      9.0±0.11ms        ? ?/sec
read StringView(100) delta_length         1.00      4.9±0.18ms        ? ?/sec    1.00      4.9±0.13ms        ? ?/sec
read StringView(100) dict                 1.01    999.8±9.16µs        ? ?/sec    1.00    992.7±2.72µs        ? ?/sec
read StringView(100) plain                1.01      5.4±0.28ms        ? ?/sec    1.00      5.4±0.20ms        ? ?/sec
read StringView(20) delta_byte_array      1.06      5.3±0.04ms        ? ?/sec    1.00      5.0±0.03ms        ? ?/sec
read StringView(20) delta_length          1.03      2.7±0.07ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
read StringView(20) dict                  1.00   988.4±10.67µs        ? ?/sec    1.00    985.1±4.85µs        ? ?/sec
read StringView(20) plain                 1.00  1700.5±34.54µs        ? ?/sec    1.00  1693.0±53.47µs        ? ?/sec
read f32 byte_stream_split                1.00  1030.7±11.73µs        ? ?/sec    1.00  1030.2±11.74µs        ? ?/sec
read f32 dict                             1.00   1455.3±9.19µs        ? ?/sec    1.00   1455.7±5.42µs        ? ?/sec
read f32 plain                            1.01   972.3±17.49µs        ? ?/sec    1.00    961.7±6.52µs        ? ?/sec
read f64 byte_stream_split                1.01  1570.9±22.57µs        ? ?/sec    1.00  1550.2±28.34µs        ? ?/sec
read f64 dict                             1.01   1488.4±8.86µs        ? ?/sec    1.00   1479.5±7.26µs        ? ?/sec
read f64 plain                            1.00  1115.9±11.59µs        ? ?/sec    1.00  1116.0±29.18µs        ? ?/sec
read int32 byte_stream_split              1.00   1083.1±7.76µs        ? ?/sec    1.01   1091.3±5.80µs        ? ?/sec
read int32 delta_binary                   1.00      2.2±0.03ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
read int32 dict                           1.01   1525.1±8.21µs        ? ?/sec    1.00   1516.6±3.53µs        ? ?/sec
read int32 plain                          1.01   1023.2±9.44µs        ? ?/sec    1.00   1010.2±4.93µs        ? ?/sec
read int64 byte_stream_split              1.00  1606.3±16.62µs        ? ?/sec    1.02  1636.1±57.92µs        ? ?/sec
read int64 delta_binary                   1.00      2.5±0.05ms        ? ?/sec    1.01      2.5±0.06ms        ? ?/sec
read int64 dict                           1.00   1542.1±8.46µs        ? ?/sec    1.00   1542.0±8.38µs        ? ?/sec
read int64 plain                          1.00  1180.7±21.51µs        ? ?/sec    1.00  1175.1±15.01µs        ? ?/sec
write Binary(100) delta_byte_array        1.00     17.4±0.24ms        ? ?/sec    1.00     17.3±0.16ms        ? ?/sec
write Binary(100) delta_length            1.01     13.6±0.22ms        ? ?/sec    1.00     13.5±0.09ms        ? ?/sec
write Binary(100) dict                    1.00     17.6±0.03ms        ? ?/sec    1.01     17.8±0.05ms        ? ?/sec
write Binary(100) plain                   1.03     13.1±0.28ms        ? ?/sec    1.00     12.7±0.23ms        ? ?/sec
write Binary(20) delta_byte_array         1.00     12.4±0.03ms        ? ?/sec    1.00     12.4±0.04ms        ? ?/sec
write Binary(20) delta_length             1.01      9.2±0.05ms        ? ?/sec    1.00      9.1±0.04ms        ? ?/sec
write Binary(20) dict                     1.00     13.2±0.06ms        ? ?/sec    1.00     13.3±0.05ms        ? ?/sec
write Binary(20) plain                    1.00      8.2±0.06ms        ? ?/sec    1.00      8.2±0.05ms        ? ?/sec
write Fixed(16) byte_stream_split         1.00     27.4±0.21ms        ? ?/sec    1.01     27.6±0.13ms        ? ?/sec
write Fixed(16) delta_byte_array          1.00     64.3±0.46ms        ? ?/sec    1.01     64.8±0.29ms        ? ?/sec
write Fixed(16) dict                      1.02     23.0±0.27ms        ? ?/sec    1.00     22.6±0.14ms        ? ?/sec
write Fixed(16) plain                     1.00     22.5±0.17ms        ? ?/sec    1.01     22.7±0.14ms        ? ?/sec
write Fixed(2) byte_stream_split          1.00     21.4±0.11ms        ? ?/sec    1.01     21.6±0.06ms        ? ?/sec
write Fixed(2) delta_byte_array           1.00     62.9±0.43ms        ? ?/sec    1.01     63.6±0.25ms        ? ?/sec
write Fixed(2) dict                       1.00     22.4±0.08ms        ? ?/sec    1.01     22.5±0.06ms        ? ?/sec
write Fixed(2) plain                      1.00     22.4±0.06ms        ? ?/sec    1.01     22.6±0.06ms        ? ?/sec
write String(100) delta_byte_array        1.00     17.2±0.26ms        ? ?/sec    1.08     18.6±0.99ms        ? ?/sec
write String(100) delta_length            1.00     14.8±1.17ms        ? ?/sec    1.62     23.9±0.52ms        ? ?/sec
write String(100) dict                    1.00     17.6±0.07ms        ? ?/sec    1.00     17.7±0.03ms        ? ?/sec
write String(100) plain                   1.00     12.7±0.31ms        ? ?/sec    1.00     12.7±0.10ms        ? ?/sec
write String(20) delta_byte_array         1.00     13.0±0.04ms        ? ?/sec    1.00     13.0±0.11ms        ? ?/sec
write String(20) delta_length             1.03      9.5±0.03ms        ? ?/sec    1.00      9.2±0.05ms        ? ?/sec
write String(20) dict                     1.00     13.2±0.03ms        ? ?/sec    1.00     13.2±0.06ms        ? ?/sec
write String(20) plain                    1.00      8.2±0.02ms        ? ?/sec    1.00      8.2±0.05ms        ? ?/sec
write StringView(100) delta_byte_array    1.00     17.8±0.27ms        ? ?/sec    1.16     20.6±1.96ms        ? ?/sec
write StringView(100) delta_length        1.00     13.9±0.23ms        ? ?/sec    1.08     15.1±1.69ms        ? ?/sec
write StringView(100) dict                1.00     18.5±0.10ms        ? ?/sec    1.00     18.6±0.03ms        ? ?/sec
write StringView(100) plain               1.00     12.9±0.28ms        ? ?/sec    1.00     12.9±0.12ms        ? ?/sec
write StringView(20) delta_byte_array     1.00     12.9±0.12ms        ? ?/sec    1.01     12.9±0.04ms        ? ?/sec
write StringView(20) delta_length         1.03      9.5±0.09ms        ? ?/sec    1.00      9.2±0.02ms        ? ?/sec
write StringView(20) dict                 1.00     13.5±0.04ms        ? ?/sec    1.00     13.6±0.06ms        ? ?/sec
write StringView(20) plain                1.00      8.1±0.02ms        ? ?/sec    1.01      8.1±0.05ms        ? ?/sec
write f32 byte_stream_split               1.00      5.1±0.05ms        ? ?/sec    1.04      5.3±0.09ms        ? ?/sec
write f32 dict                            1.01     16.0±0.04ms        ? ?/sec    1.00     15.8±0.35ms        ? ?/sec
write f32 plain                           1.00      5.1±0.07ms        ? ?/sec    1.03      5.3±0.07ms        ? ?/sec
write f64 byte_stream_split               1.00      6.8±0.04ms        ? ?/sec    1.02      6.9±0.09ms        ? ?/sec
write f64 dict                            1.01     16.0±0.04ms        ? ?/sec    1.00     15.8±0.30ms        ? ?/sec
write f64 plain                           1.00      5.9±0.03ms        ? ?/sec    1.02      6.0±0.08ms        ? ?/sec
write int32 byte_stream_split             1.00      5.4±0.05ms        ? ?/sec    1.02      5.5±0.05ms        ? ?/sec
write int32 delta_binary                  1.00     10.8±0.06ms        ? ?/sec    1.01     10.9±0.07ms        ? ?/sec
write int32 dict                          1.01     16.8±0.09ms        ? ?/sec    1.00     16.6±0.30ms        ? ?/sec
write int32 plain                         1.00      5.5±0.05ms        ? ?/sec    1.02      5.6±0.05ms        ? ?/sec
write int64 byte_stream_split             1.00      6.9±0.05ms        ? ?/sec    1.02      7.0±0.06ms        ? ?/sec
write int64 delta_binary                  1.01     12.3±0.19ms        ? ?/sec    1.00     12.1±0.10ms        ? ?/sec
write int64 dict                          1.01     16.4±0.09ms        ? ?/sec    1.00     16.2±0.30ms        ? ?/sec
write int64 plain                         1.00      6.2±0.05ms        ? ?/sec    1.01      6.2±0.06ms        ? ?/sec

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 960.2s |
| Peak memory | 8.9 GiB |
| Avg memory | 5.6 GiB |
| CPU user | 930.3s |
| CPU sys | 25.3s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 945.2s |
| Peak memory | 8.9 GiB |
| Avg memory | 5.5 GiB |
| CPU user | 915.0s |
| CPU sys | 28.6s |
| Peak spill | 0 B |


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

group                                              main                                   parquet-page-spill
-----                                              ----                                   ------------------
bool/bloom_filter                                  1.00     13.0±0.04ms    19.3 MB/sec    1.01     13.1±0.06ms    19.1 MB/sec
bool/cdc                                           1.00     15.7±0.07ms    16.0 MB/sec    1.01     15.8±0.06ms    15.8 MB/sec
bool/default                                       1.00     10.9±0.04ms    23.0 MB/sec    1.01     10.9±0.07ms    22.9 MB/sec
bool/parquet_2                                     1.00     14.6±0.04ms    17.1 MB/sec    1.01     14.8±0.07ms    16.9 MB/sec
bool/zstd                                          1.00     11.4±0.04ms    21.9 MB/sec    1.01     11.5±0.07ms    21.7 MB/sec
bool/zstd_parquet_2                                1.00     15.0±0.05ms    16.6 MB/sec    1.01     15.1±0.05ms    16.5 MB/sec
bool_non_null/bloom_filter                         1.00      7.0±0.03ms    17.8 MB/sec    1.01      7.1±0.03ms    17.6 MB/sec
bool_non_null/cdc                                  1.00      6.9±0.04ms    18.2 MB/sec    1.01      6.9±0.05ms    18.1 MB/sec
bool_non_null/default                              1.00      4.3±0.02ms    29.2 MB/sec    1.02      4.4±0.02ms    28.7 MB/sec
bool_non_null/parquet_2                            1.00      9.0±0.05ms    13.8 MB/sec    1.01      9.1±0.03ms    13.7 MB/sec
bool_non_null/zstd                                 1.00      4.6±0.03ms    27.0 MB/sec    1.02      4.7±0.02ms    26.6 MB/sec
bool_non_null/zstd_parquet_2                       1.00      9.5±0.05ms    13.2 MB/sec    1.01      9.5±0.03ms    13.1 MB/sec
float_with_nans/bloom_filter                       1.00     92.6±0.45ms   151.3 MB/sec    1.00     93.0±0.41ms   150.6 MB/sec
float_with_nans/cdc                                1.00     81.7±0.24ms   171.4 MB/sec    1.00     82.0±0.34ms   170.7 MB/sec
float_with_nans/default                            1.00     74.7±0.30ms   187.5 MB/sec    1.00     74.7±0.33ms   187.5 MB/sec
float_with_nans/parquet_2                          1.00     94.7±0.56ms   147.9 MB/sec    1.00     94.8±0.40ms   147.8 MB/sec
float_with_nans/zstd                               1.00    112.0±0.39ms   125.0 MB/sec    1.00    112.3±0.29ms   124.7 MB/sec
float_with_nans/zstd_parquet_2                     1.00    131.8±0.63ms   106.3 MB/sec    1.00    132.3±0.61ms   105.8 MB/sec
list_primitive/bloom_filter                        1.01    337.9±2.06ms  1614.1 MB/sec    1.00    333.3±2.34ms  1636.5 MB/sec
list_primitive/cdc                                 1.00    367.9±1.75ms  1482.4 MB/sec    1.00    366.3±2.18ms  1488.7 MB/sec
list_primitive/default                             1.00    255.6±2.18ms     2.1 GB/sec    1.01    257.0±2.37ms     2.1 GB/sec
list_primitive/parquet_2                           1.01    277.1±1.30ms  1968.4 MB/sec    1.00    275.2±0.85ms  1982.0 MB/sec
list_primitive/zstd                                1.01    508.8±2.93ms  1071.9 MB/sec    1.00    503.6±1.38ms  1083.0 MB/sec
list_primitive/zstd_parquet_2                      1.01    500.6±0.85ms  1089.4 MB/sec    1.00    497.5±0.67ms  1096.1 MB/sec
list_primitive_non_null/bloom_filter               1.00    403.1±5.40ms  1350.1 MB/sec    1.08    434.9±6.03ms  1251.4 MB/sec
list_primitive_non_null/cdc                        1.01    440.6±7.71ms  1235.3 MB/sec    1.00    434.8±9.05ms  1251.7 MB/sec
list_primitive_non_null/default                    1.00    269.1±4.01ms  2022.2 MB/sec    1.11    299.7±7.94ms  1816.0 MB/sec
list_primitive_non_null/parquet_2                  1.00    295.9±1.09ms  1839.3 MB/sec    1.12    330.3±0.64ms  1647.8 MB/sec
list_primitive_non_null/zstd                       1.00    692.7±5.18ms   785.7 MB/sec    1.02   707.9±11.33ms   768.8 MB/sec
list_primitive_non_null/zstd_parquet_2             1.00    670.1±0.98ms   812.2 MB/sec    1.00    668.5±1.06ms   814.1 MB/sec
list_primitive_sparse_99pct_null/bloom_filter      1.00     11.3±0.12ms     3.2 GB/sec    1.02     11.4±0.06ms     3.2 GB/sec
list_primitive_sparse_99pct_null/cdc               1.00     22.9±0.13ms  1628.6 MB/sec    1.00     22.9±0.10ms  1631.3 MB/sec
list_primitive_sparse_99pct_null/default           1.00     10.9±0.11ms     3.3 GB/sec    1.02     11.2±0.04ms     3.3 GB/sec
list_primitive_sparse_99pct_null/parquet_2         1.00     11.0±0.13ms     3.3 GB/sec    1.01     11.1±0.03ms     3.3 GB/sec
list_primitive_sparse_99pct_null/zstd              1.00     12.8±0.13ms     2.8 GB/sec    1.01     13.0±0.05ms     2.8 GB/sec
list_primitive_sparse_99pct_null/zstd_parquet_2    1.00     11.1±0.11ms     3.3 GB/sec    1.02     11.3±0.04ms     3.2 GB/sec
primitive/bloom_filter                             1.01    151.5±0.56ms   296.3 MB/sec    1.00    149.8±0.60ms   299.7 MB/sec
primitive/cdc                                      1.01    160.8±0.74ms   279.0 MB/sec    1.00    159.5±0.60ms   281.3 MB/sec
primitive/default                                  1.00    118.7±0.56ms   377.9 MB/sec    1.00    118.4±0.48ms   379.0 MB/sec
primitive/parquet_2                                1.00    133.5±0.55ms   336.1 MB/sec    1.00    134.1±0.62ms   334.7 MB/sec
primitive/zstd                                     1.00    148.3±0.51ms   302.6 MB/sec    1.00    147.7±0.37ms   303.7 MB/sec
primitive/zstd_parquet_2                           1.00    167.0±0.55ms   268.7 MB/sec    1.00    166.5±0.38ms   269.4 MB/sec
primitive_all_null/bloom_filter                    1.02    895.4±3.08µs    48.9 GB/sec    1.00    881.2±2.18µs    49.7 GB/sec
primitive_all_null/cdc                             1.04     19.6±0.27ms     2.2 GB/sec    1.00     18.9±0.35ms     2.3 GB/sec
primitive_all_null/default                         1.00    279.4±0.82µs   156.8 GB/sec    1.00    280.7±1.51µs   156.1 GB/sec
primitive_all_null/parquet_2                       1.00    278.9±1.45µs   157.1 GB/sec    1.03    286.4±1.76µs   153.0 GB/sec
primitive_all_null/zstd                            1.00    390.3±0.97µs   112.3 GB/sec    1.02    398.5±1.24µs   110.0 GB/sec
primitive_all_null/zstd_parquet_2                  1.00    356.3±1.43µs   123.0 GB/sec    1.03    366.2±1.31µs   119.7 GB/sec
primitive_non_null/bloom_filter                    1.01    109.0±0.66ms   403.7 MB/sec    1.00    108.4±0.55ms   406.0 MB/sec
primitive_non_null/cdc                             1.00     90.5±0.46ms   486.5 MB/sec    1.00     90.9±2.25ms   484.1 MB/sec
primitive_non_null/default                         1.00     68.1±0.42ms   646.1 MB/sec    1.00     68.4±0.34ms   643.1 MB/sec
primitive_non_null/parquet_2                       1.00     89.7±0.47ms   490.6 MB/sec    1.00     90.0±0.32ms   488.8 MB/sec
primitive_non_null/zstd                            1.07    105.4±1.06ms   417.3 MB/sec    1.00     98.8±0.44ms   445.2 MB/sec
primitive_non_null/zstd_parquet_2                  1.05    129.5±2.63ms   339.8 MB/sec    1.00    123.4±0.42ms   356.6 MB/sec
primitive_sparse_99pct_null/bloom_filter           1.02     18.6±0.36ms     2.4 GB/sec    1.00     18.3±0.16ms     2.4 GB/sec
primitive_sparse_99pct_null/cdc                    1.03     37.4±0.32ms  1198.9 MB/sec    1.00     36.4±0.35ms  1231.5 MB/sec
primitive_sparse_99pct_null/default                1.02     17.1±0.12ms     2.6 GB/sec    1.00     16.8±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/parquet_2              1.01     17.0±0.15ms     2.6 GB/sec    1.00     16.8±0.06ms     2.6 GB/sec
primitive_sparse_99pct_null/zstd                   1.01     20.3±0.15ms     2.2 GB/sec    1.00     20.1±0.08ms     2.2 GB/sec
primitive_sparse_99pct_null/zstd_parquet_2         1.01     18.9±0.16ms     2.3 GB/sec    1.00     18.7±0.08ms     2.3 GB/sec
string/bloom_filter                                1.00   220.3±21.00ms     2.3 GB/sec    1.01   222.6±21.16ms     2.3 GB/sec
string/cdc                                         1.00    221.7±4.74ms     2.3 GB/sec    1.00    222.2±6.72ms     2.3 GB/sec
string/default                                     1.00   126.9±20.95ms     4.0 GB/sec    1.03   130.3±19.16ms     3.9 GB/sec
string/parquet_2                                   1.00    111.5±6.74ms     4.6 GB/sec    1.14    127.2±1.12ms     4.0 GB/sec
string/zstd                                        1.00    417.8±2.62ms  1254.6 MB/sec    1.04   435.8±16.87ms  1203.1 MB/sec
string/zstd_parquet_2                              1.00    402.1±6.87ms  1303.7 MB/sec    1.00    403.2±4.12ms  1300.2 MB/sec
string_and_binary_view/bloom_filter                1.00     64.4±0.40ms   500.4 MB/sec    1.00     64.7±0.42ms   498.4 MB/sec
string_and_binary_view/cdc                         1.00     58.4±0.16ms   552.3 MB/sec    1.00     58.4±0.19ms   552.6 MB/sec
string_and_binary_view/default                     1.00     48.1±0.15ms   670.8 MB/sec    1.00     48.1±0.18ms   670.1 MB/sec
string_and_binary_view/parquet_2                   1.00     59.1±0.17ms   546.0 MB/sec    1.00     59.1±0.22ms   545.9 MB/sec
string_and_binary_view/zstd                        1.00     84.8±0.17ms   380.4 MB/sec    1.00     84.5±0.18ms   381.6 MB/sec
string_and_binary_view/zstd_parquet_2              1.00     72.9±0.16ms   442.5 MB/sec    1.00     72.8±0.19ms   442.8 MB/sec
string_dictionary/bloom_filter                     1.01     91.9±1.30ms     2.8 GB/sec    1.00     90.6±1.51ms     2.8 GB/sec
string_dictionary/cdc                              1.02     53.0±1.11ms     4.9 GB/sec    1.00     51.7±1.00ms     5.0 GB/sec
string_dictionary/default                          1.00     47.0±1.10ms     5.5 GB/sec    1.01     47.4±1.26ms     5.4 GB/sec
string_dictionary/parquet_2                        1.00     54.3±0.46ms     4.8 GB/sec    1.00     54.4±0.26ms     4.7 GB/sec
string_dictionary/zstd                             1.01    210.5±1.90ms  1254.6 MB/sec    1.00    208.3±1.82ms  1268.0 MB/sec
string_dictionary/zstd_parquet_2                   1.00    198.9±0.43ms  1328.2 MB/sec    1.00    198.7±0.50ms  1329.2 MB/sec
string_non_null/bloom_filter                       1.02   253.7±13.69ms     2.0 GB/sec    1.00   248.8±12.71ms     2.1 GB/sec
string_non_null/cdc                                1.00    265.5±2.77ms  1973.6 MB/sec    1.01    269.0±8.74ms  1948.3 MB/sec
string_non_null/default                            1.00   138.4±11.75ms     3.7 GB/sec    1.03   142.7±16.24ms     3.6 GB/sec
string_non_null/parquet_2                          1.01    131.8±2.95ms     3.9 GB/sec    1.00    131.1±1.14ms     3.9 GB/sec
string_non_null/zstd                               1.00    536.9±3.36ms   976.0 MB/sec    1.01    543.1±4.32ms   964.8 MB/sec
string_non_null/zstd_parquet_2                     1.00    502.6±0.77ms  1042.6 MB/sec    1.00    504.1±0.90ms  1039.5 MB/sec
struct_all_null/bloom_filter                       1.00    368.6±1.30µs    42.7 GB/sec    1.00    369.4±1.19µs    42.6 GB/sec
struct_all_null/cdc                                1.06      8.3±0.09ms  1953.2 MB/sec    1.00      7.8±0.13ms     2.0 GB/sec
struct_all_null/default                            1.00    119.7±0.34µs   131.5 GB/sec    1.01    120.9±0.70µs   130.3 GB/sec
struct_all_null/parquet_2                          1.00    120.3±0.48µs   130.9 GB/sec    1.03    123.3±0.78µs   127.7 GB/sec
struct_all_null/zstd                               1.00    167.1±0.40µs    94.2 GB/sec    1.02    171.0±0.77µs    92.1 GB/sec
struct_all_null/zstd_parquet_2                     1.00    153.1±0.84µs   102.9 GB/sec    1.03    158.1±1.03µs    99.6 GB/sec
struct_non_null/bloom_filter                       1.00     45.8±0.24ms   349.7 MB/sec    1.00     45.7±0.15ms   350.2 MB/sec
struct_non_null/cdc                                1.01     45.4±0.26ms   352.7 MB/sec    1.00     45.0±0.13ms   355.5 MB/sec
struct_non_null/default                            1.01     31.8±0.16ms   502.9 MB/sec    1.00     31.6±0.11ms   505.6 MB/sec
struct_non_null/parquet_2                          1.00     40.6±0.17ms   394.5 MB/sec    1.02     41.5±1.79ms   385.4 MB/sec
struct_non_null/zstd                               1.01     40.7±0.14ms   393.1 MB/sec    1.00     40.4±0.12ms   395.8 MB/sec
struct_non_null/zstd_parquet_2                     1.00     54.7±0.18ms   292.5 MB/sec    1.00     54.5±0.21ms   293.5 MB/sec
struct_sparse_99pct_null/bloom_filter              1.00      6.5±0.10ms     2.4 GB/sec    1.05      6.8±0.04ms     2.3 GB/sec
struct_sparse_99pct_null/cdc                       1.03     14.5±0.13ms  1113.2 MB/sec    1.00     14.1±0.08ms  1141.7 MB/sec
struct_sparse_99pct_null/default                   1.00      5.9±0.06ms     2.7 GB/sec    1.05      6.2±0.02ms     2.5 GB/sec
struct_sparse_99pct_null/parquet_2                 1.00      5.9±0.05ms     2.7 GB/sec    1.05      6.2±0.02ms     2.5 GB/sec
struct_sparse_99pct_null/zstd                      1.00      7.3±0.07ms     2.2 GB/sec    1.04      7.6±0.03ms     2.1 GB/sec
struct_sparse_99pct_null/zstd_parquet_2            1.00      6.7±0.05ms     2.4 GB/sec    1.05      7.0±0.02ms     2.3 GB/sec
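
A note on reading the table above: the leading number in each column appears to be that build's mean time divided by the faster of the two, so `1.00` marks the faster side and `1.12` means roughly 12% slower. This is an assumption inferred from the printed values, not something the benchmark output states. A minimal sketch that checks it against three rows (the helper name `relative_factors` is hypothetical):

```rust
/// Relative factors for a pair of mean timings, assuming each side is
/// divided by the faster (smaller) of the two.
fn relative_factors(main_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = main_ms.min(branch_ms);
    (main_ms / fastest, branch_ms / fastest)
}

fn main() {
    // Mean timings in milliseconds, copied from rows of the table above.
    let rows = [
        ("list_primitive_non_null/parquet_2", 295.9, 330.3),
        ("string/parquet_2", 111.5, 127.2),
        ("primitive_non_null/zstd", 105.4, 98.8),
    ];
    for (name, main_ms, branch_ms) in rows {
        let (m, b) = relative_factors(main_ms, branch_ms);
        println!("{name}: main {m:.2}, branch {b:.2}");
    }
}
```

For these rows the computed factors match the printed ones (1.12 and 1.14 on the branch side, 1.07 on the main side). Because the table shows rounded timings, a recomputed factor can differ from the printed one in the last digit on other rows.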

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 1925.4s |
| Peak memory | 6.6 GiB |
| Avg memory | 6.4 GiB |
| CPU user | 1889.4s |
| CPU sys | 33.3s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 1935.4s |
| Peak memory | 6.6 GiB |
| Avg memory | 6.4 GiB |
| CPU user | 1880.6s |
| CPU sys | 50.8s |
| Peak spill | 0 B |



Labels

parquet (Changes to the parquet crate)
