bench: io_uring-backed LocalFileSystem for dfbench (Linux)#21793

Closed
Dandandan wants to merge 2 commits into apache:main from Dandandan:io-uring-local-fs

@Dandandan
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

When profiling DataFusion's local parquet reads under ClickBench, object_store::LocalFileSystem::get_ranges serializes all range reads inside a single spawn_blocking task:

```rust
async fn get_ranges(&self, location: &Path, ranges: &[Range<u64>]) -> Result<Vec<Bytes>> {
    let path = self.path_to_filesystem(location)?;
    let ranges = ranges.to_vec();
    maybe_spawn_blocking(move || {
        // Vectored IO might be faster
        let mut file = File::open(&path).map_err(|e| map_open_error(e, &path))?;
        ranges.into_iter().map(|r| read_range(&mut file, &path, r)).collect()
    }).await
}
```

That is one blocking thread issuing N sequential seek + read_exact pairs. On NVMe devices that can sustain a meaningful queue depth, and on cold-cache reads, this leaves much of the device's parallelism unused: the kernel block layer would happily service many concurrent reads if we asked it to.
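To make the queue-depth point concrete, here is a std-only sketch, assuming a Unix target. It does not use io_uring at all; it swaps in positional reads (`std::os::unix::fs::FileExt::read_exact_at`, i.e. pread), which do not share a file cursor and so can be in flight at the same time. `read_ranges_sequential` mirrors the one-at-a-time loop above, while `read_ranges_concurrent` (both are hypothetical helpers; the second uses one scoped thread per range) shows the kind of overlap the uring driver gets from the kernel with a single submitting thread.

```rust
use std::fs::File;
use std::io;
use std::ops::Range;
use std::os::unix::fs::FileExt;
use std::thread;

/// Queue depth 1: each read must finish before the next one is issued,
/// like the blocking loop inside `LocalFileSystem::get_ranges`.
pub fn read_ranges_sequential(file: &File, ranges: &[Range<u64>]) -> io::Result<Vec<Vec<u8>>> {
    ranges
        .iter()
        .map(|r| -> io::Result<Vec<u8>> {
            let mut buf = vec![0u8; (r.end - r.start) as usize];
            file.read_exact_at(&mut buf, r.start)?; // pread: no shared cursor
            Ok(buf)
        })
        .collect()
}

/// Queue depth N: every range is outstanding at once. Threads are only a
/// stand-in here; io_uring gets the same overlap without a thread per read.
pub fn read_ranges_concurrent(file: &File, ranges: &[Range<u64>]) -> io::Result<Vec<Vec<u8>>> {
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|r| {
                let r = r.clone();
                s.spawn(move || {
                    let mut buf = vec![0u8; (r.end - r.start) as usize];
                    file.read_exact_at(&mut buf, r.start).map(|_| buf)
                })
            })
            .collect();
        // Join in submission order so results line up with `ranges`.
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Both functions return the same bytes in the same order; only the number of reads outstanding at any moment differs, which is exactly what matters on a device that can service many requests in parallel.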

This PR adds a benchmark-only alternative ObjectStore (no changes to datafusion/ or object_store) that routes the range reads through an io_uring submission queue, so N preads become N concurrent kernel-side operations. It's intended as a tool for A/B measurement rather than a production-quality replacement.

What changes are included in this PR?

  • benchmarks/src/util/uring_local_fs.rs (new, ~480 lines): a UringLocalFileSystem implementing object_store::ObjectStore. It owns an inner LocalFileSystem (field inner) for non-read operations and a dedicated io-uring-driver OS thread that owns the IoUring instance and runs the submission/completion loop.
  • benchmarks/src/util/mod.rs: registers the module under #[cfg(target_os = "linux")].
  • benchmarks/src/util/options.rs: CommonOpt::build_runtime registers UringLocalFileSystem for file:/// by default on Linux, with DATAFUSION_IO_URING=0 as the opt-out. It composes with --simulate-latency as expected (LatencyObjectStore wraps the uring store).
  • benchmarks/Cargo.toml: io-uring = "0.7" is added under [target.'cfg(target_os = "linux")'.dependencies], so non-Linux targets don't pull it in.
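A minimal sketch of the backend selection described for CommonOpt::build_runtime, assuming the variable is read once at startup. `io_uring_requested` and `choose_backend` are hypothetical helpers, not the PR's code; the pure function takes the variable's value as a parameter so it can be checked without mutating the process environment. Treating every value other than `0` (including unset) as "enabled" is an assumption based on the description above.

```rust
/// Opt-out variable documented in this PR.
pub const IO_URING_ENV: &str = "DATAFUSION_IO_URING";

/// Hypothetical helper: should the io_uring store be registered for file:///?
/// `value` is the raw variable (None when unset); only "0" opts out.
pub fn io_uring_requested(value: Option<&str>) -> bool {
    // The uring module is cfg-gated to Linux, so other targets never select it.
    cfg!(target_os = "linux") && value != Some("0")
}

/// Hypothetical wrapper: read the real environment and announce the backend.
pub fn choose_backend() -> &'static str {
    let raw = std::env::var(IO_URING_ENV).ok();
    if io_uring_requested(raw.as_deref()) {
        println!("Using io_uring-backed LocalFileSystem");
        "uring"
    } else {
        "local"
    }
}
```

Keeping the decision in a pure function makes the Linux/non-Linux and opt-out cases easy to test, while the wrapper keeps the startup message visible so benchmark logs show which backend produced a given run.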

Driver shape:

  1. Any tokio task calls submit_read(Arc<File>, offset, len), a sync fn that sends a Cmd::Read over an mpsc channel and returns a oneshot::Receiver<io::Result<Bytes>>. It is sync on purpose: get_ranges enqueues all N ranges before awaiting any of them, so the driver sees the whole batch in one try_recv drain.
  2. The driver fills the SQ up to its free slots and, when work is outstanding, calls submit_and_wait(1) to flush the SQEs and block for at least one completion; it then drains the CQ and fires the oneshots. When nothing is queued it idles in blocking_recv().
  3. Buffers (Box<[u8]>) and the keep-alive Arc<File> live in the driver's in_flight map until the corresponding CQE arrives, so the kernel never writes into freed memory or a closed fd.
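The three steps above can be modelled with std primitives. This is a hypothetical, std-only sketch with no io_uring in it: a plain pread on the driver thread stands in for the kernel completing an SQE, `std::sync::mpsc::sync_channel(1)` stands in for tokio's `oneshot`, `Vec<u8>` stands in for `Box<[u8]>`/`Bytes`, and `MAX_IN_FLIGHT` stands in for the SQ's free slots. What it does mirror is the control flow (sync submit, block when idle, drain the batch with `try_recv`) and the ownership rule from step 3: the buffer and the `Arc<File>` stay in `in_flight` until that entry's "completion" is processed.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::sync::mpsc::{self, Receiver, Sender, SyncSender};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the submission queue's free slots.
const MAX_IN_FLIGHT: usize = 64;

type Reply = SyncSender<io::Result<Vec<u8>>>;
/// Buffer, keep-alive fd, offset and reply channel for one outstanding read.
type Entry = (Vec<u8>, Arc<File>, u64, Reply);

enum Cmd {
    Read { file: Arc<File>, offset: u64, len: usize, reply: Reply },
}

/// Cloneable handle held by callers; submitting is a plain (non-async) call.
#[derive(Clone)]
pub struct Submitter(Sender<Cmd>);

impl Submitter {
    /// Enqueue one read and return its reply channel without waiting.
    pub fn submit_read(&self, file: Arc<File>, offset: u64, len: usize) -> Receiver<io::Result<Vec<u8>>> {
        let (reply, rx) = mpsc::sync_channel(1);
        // If the driver has exited, the command (and `reply`) is dropped,
        // so the caller's `rx.recv()` returns an error instead of hanging.
        let _ = self.0.send(Cmd::Read { file, offset, len, reply });
        rx
    }
}

/// Start the dedicated driver thread and hand back a submitter.
pub fn spawn_driver() -> Submitter {
    let (tx, rx) = mpsc::channel::<Cmd>();
    thread::Builder::new()
        .name("io-uring-driver".into())
        .spawn(move || driver_loop(rx))
        .expect("failed to spawn driver thread");
    Submitter(tx)
}

fn admit(cmd: Cmd, in_flight: &mut HashMap<u64, Entry>, next_id: &mut u64) {
    let Cmd::Read { file, offset, len, reply } = cmd;
    // The driver allocates and owns the buffer; `next_id` plays the role of user_data.
    in_flight.insert(*next_id, (vec![0u8; len], file, offset, reply));
    *next_id += 1;
}

fn driver_loop(rx: Receiver<Cmd>) {
    let mut in_flight: HashMap<u64, Entry> = HashMap::new();
    let mut next_id = 0u64;
    loop {
        // Idle: block for the next command instead of spinning.
        if in_flight.is_empty() {
            match rx.recv() {
                Ok(cmd) => admit(cmd, &mut in_flight, &mut next_id),
                Err(_) => return, // every Submitter was dropped
            }
        }
        // Drain the rest of the queued batch, up to the free "slots".
        while in_flight.len() < MAX_IN_FLIGHT {
            match rx.try_recv() {
                Ok(cmd) => admit(cmd, &mut in_flight, &mut next_id),
                Err(_) => break, // empty or disconnected
            }
        }
        // "Completions": a real driver reaps CQEs here; this model performs
        // each pread, then removes the entry and fires its reply.
        for (_id, (mut buf, file, offset, reply)) in in_flight.drain() {
            let res = file.read_exact_at(&mut buf, offset).map(|_| buf);
            let _ = reply.send(res); // receiver gone: result is discarded
        }
    }
}
```

A caller reproduces the get_ranges shape by calling `submit_read` once per range, collecting the receivers, and only then waiting on each one in order.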

Known rough edges (documented in the module header):

  • No fd cache — one open(2) per get_ranges call (same as today).
  • No registered buffers / IORING_OP_READV — one SQE per range, heap allocation per op.
  • No IORING_OP_ASYNC_CANCEL on dropped-future cancellation; the submission runs to completion and its result is discarded.
  • Metrics / tracing not yet plumbed in.
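What "the submission runs to completion and its result is discarded" means on the driver side can be sketched in a few lines. This assumes a std `sync_channel(1)` as a stand-in for tokio's `oneshot`; both hand the value back inside the `Err` of `send` when the receiver has been dropped. `request` and `complete` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical: create the per-read reply channel (capacity 1, like a oneshot).
pub fn request() -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(1)
}

/// Hypothetical completion handler, called only after the read has finished.
/// Returns whether a caller was still waiting for the bytes.
pub fn complete(reply: &SyncSender<Vec<u8>>, buf: Vec<u8>) -> bool {
    match reply.send(buf) {
        Ok(()) => true,
        // The receiver was dropped (the caller's future was cancelled).
        // The buffer comes back inside the error and is freed here, which is
        // safe because the kernel has already finished writing into it.
        Err(returned) => {
            drop(returned.0);
            false
        }
    }
}
```

Without IORING_OP_ASYNC_CANCEL the driver has no earlier point at which it may free the buffer, so a cancelled read still costs a full device read; the result is simply thrown away at completion time.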

Not included in this PR: any change to object_store or datafusion core, or any production path. All non-Linux users get the stock LocalFileSystem via the existing cfg-gated code.

Are these changes tested?

  • cargo check -p datafusion-benchmarks and cargo clippy -p datafusion-benchmarks --all-targets -- -D warnings pass on macOS (the Linux module is cfg'd out).
  • The Linux build path has not yet been exercised on a real Linux toolchain in this change — please let CI / benchmark runners exercise it before merging. Running ./target/release-nonlto/dfbench clickbench --iterations 3 --path <hits_partitioned> --queries-path benchmarks/queries/clickbench/queries with and without DATAFUSION_IO_URING=0 is the expected first validation.

Are there any user-facing changes?

Only within dfbench on Linux:

  • Startup prints Using io_uring-backed LocalFileSystem so it's visible which backend is in effect.
  • DATAFUSION_IO_URING=0 in the environment restores the stock LocalFileSystem.

No API changes. No changes to any crate that downstream users depend on.

Adds `UringLocalFileSystem`, an `ObjectStore` that routes byte-range
reads through a dedicated io_uring driver thread and delegates all
other operations to an inner `LocalFileSystem`. Wired into
`CommonOpt::build_runtime` and registered for `file:///` by default on
Linux; opt out with `DATAFUSION_IO_URING=0`. Non-Linux targets are
unaffected — the module and the `io-uring` dep are both gated on
`target_os = "linux"`.

Design:
- `UringSubmitter` owns an `mpsc::UnboundedSender<Cmd>`; any tokio task
  calls `submit_read` (sync) to enqueue an `IORING_OP_READ`, getting back
  a `oneshot::Receiver<Result<Bytes>>`.
- A dedicated `io-uring-driver` thread owns the `IoUring` instance. Each
  iteration drains the mpsc up to the submission queue's free slots,
  `submit_and_wait(1)` to flush SQEs and block for at least one CQ entry
  when work is outstanding, then drains CQ entries and fires the
  oneshots. When nothing is in flight it `blocking_recv()` for the next
  command instead of spinning.
- `get_ranges` submits all N ranges into the mpsc synchronously before
  awaiting any of them, so the driver sees the whole batch in one drain
  and hands the kernel all N at once, giving the device a queue depth
  greater than 1 for cold reads instead of the one-read-at-a-time pattern
  that `LocalFileSystem::get_ranges` produces today.
- Buffers (`Box<[u8]>`) and the `Arc<File>` keeping the fd open are
  retained in the driver's `in_flight` map until the corresponding CQ
  entry arrives, so the kernel never writes into freed memory or a
  closed fd.

Rough edges (called out in module-level docs):
- No fd cache (one `open(2)` per `get_ranges` call, same as today).
- No registered buffers / `IORING_OP_READV` — one SQE per range, heap
  allocation per op.
- No `IORING_OP_ASYNC_CANCEL` on dropped-future cancellation; the
  submission runs to completion and its result is discarded.
- Metrics / tracing not yet plumbed in.

Intended as a first draft for benchmarking; good enough to A/B against
the stock `LocalFileSystem` on ClickBench-style workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
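The batching point in the Design notes (why `submit_read` is sync and `get_ranges` enqueues every range before awaiting any) can be illustrated with a toy probe that only counts outstanding requests. `DepthProbe`, `get_ranges_batched` and `get_ranges_interleaved` are hypothetical; no I/O happens, and the "result" of each read is just its length.

```rust
use std::cell::Cell;
use std::ops::Range;

/// Hypothetical probe standing in for the driver's queue: it tracks how many
/// reads are outstanding (submitted but not yet awaited) and the peak.
#[derive(Default)]
pub struct DepthProbe {
    outstanding: Cell<usize>,
    peak: Cell<usize>,
}

impl DepthProbe {
    /// "Submit" a read; the returned range plays the role of the receiver.
    fn submit(&self, r: Range<u64>) -> Range<u64> {
        self.outstanding.set(self.outstanding.get() + 1);
        self.peak.set(self.peak.get().max(self.outstanding.get()));
        r
    }

    /// "Await" a read; the pretend result is the number of bytes requested.
    fn wait(&self, ticket: Range<u64>) -> u64 {
        self.outstanding.set(self.outstanding.get() - 1);
        ticket.end - ticket.start
    }

    pub fn peak(&self) -> usize {
        self.peak.get()
    }
}

/// The PR's shape: enqueue every range first, then wait on each in order.
pub fn get_ranges_batched(p: &DepthProbe, ranges: &[Range<u64>]) -> Vec<u64> {
    let tickets: Vec<Range<u64>> = ranges.iter().cloned().map(|r| p.submit(r)).collect();
    tickets.into_iter().map(|t| p.wait(t)).collect()
}

/// The shape it avoids: submit one, wait for it, repeat.
pub fn get_ranges_interleaved(p: &DepthProbe, ranges: &[Range<u64>]) -> Vec<u64> {
    ranges
        .iter()
        .map(|r| {
            let t = p.submit(r.clone());
            p.wait(t)
        })
        .collect()
}
```

Both return the same results, but only the batched version ever has more than one request outstanding, which is what lets the driver hand the kernel a full batch in one drain.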
@Dandandan
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4299178582-1761-z64xs 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing io-uring-local-fs (8844eb9) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4299178582-1762-sbrvf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing io-uring-local-fs (8844eb9) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

   Compiling datafusion v53.1.0 (/workspace/datafusion-branch/datafusion/core)
   Compiling datafusion-benchmarks v53.1.0 (/workspace/datafusion-branch/benchmarks)
error[E0277]: `UringSubmitter` doesn't implement `std::fmt::Debug`
   --> benchmarks/src/util/uring_local_fs.rs:297:5
    |
294 | #[derive(Debug)]
    |          ----- in this derive macro expansion
...
297 |     submitter: UringSubmitter,
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `std::fmt::Debug` is not implemented for `UringSubmitter`
    |
    = note: add `#[derive(Debug)]` to `UringSubmitter` or manually `impl std::fmt::Debug for UringSubmitter`
help: consider annotating `UringSubmitter` with `#[derive(Debug)]`
    |
103 + #[derive(Debug)]
104 | pub struct UringSubmitter {
    |

For more information about this error, try `rustc --explain E0277`.
error: could not compile `datafusion-benchmarks` (lib) due to 1 previous error

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4299178582-1763-ngnpr 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing io-uring-local-fs (8844eb9) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete

@adriangbot

Benchmark for this request failed.

@adriangbot

Benchmark for this request failed.

The Debug derive on UringLocalFileSystem transitively requires Debug
on its fields; UringSubmitter is one of them, and tokio's
mpsc::UnboundedSender<T> already impls Debug regardless of T, so a
plain derive is enough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
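The fix in the commit above relies on channel sender types implementing `Debug` without requiring `T: Debug`. Here is a reduced sketch of the same shape, assuming `std::sync::mpsc::Sender` as a stand-in for tokio's `mpsc::UnboundedSender` (std's `Sender<T>` also implements `Debug` for every `T`). The struct names mirror the PR, but their fields and `new_fs` are hypothetical simplifications.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Deliberately *not* Debug, like a command carrying an fd and a reply channel.
#[allow(dead_code)]
pub enum Cmd {
    Read { offset: u64, len: usize },
}

/// Compiles because `Sender<T>` implements Debug for every T,
/// so no `Cmd: Debug` bound is needed.
#[derive(Debug)]
pub struct UringSubmitter {
    tx: Sender<Cmd>,
}

/// The outer derive only needs each field type to implement Debug.
#[derive(Debug)]
pub struct UringLocalFileSystem {
    submitter: UringSubmitter,
}

/// Hypothetical constructor; the receiver would normally go to the driver thread.
pub fn new_fs() -> (UringLocalFileSystem, Receiver<Cmd>) {
    let (tx, rx) = channel();
    (UringLocalFileSystem { submitter: UringSubmitter { tx } }, rx)
}
```

`#[derive(Debug)]` adds bounds only for a struct's own generic parameters; `UringSubmitter` has none, so the derive compiles as long as each field type implements `Debug`, which is the property the commit relies on.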
@Dandandan
Contributor Author

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4299315837-1766-xczpr 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing io-uring-local-fs (fbeeee2) to 9a1ed57 (merge-base) diff using: tpch
Results will be posted here when complete

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4299315837-1764-ck85p 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing io-uring-local-fs (fbeeee2) to 9a1ed57 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4299315837-1765-9cktr 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing io-uring-local-fs (fbeeee2) to 9a1ed57 (merge-base) diff using: tpcds
Results will be posted here when complete

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and io-uring-local-fs
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                     HEAD ┃                        io-uring-local-fs ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │              6.85 / 7.40 ±0.81 / 9.01 ms │              6.82 / 7.30 ±0.80 / 8.89 ms │     no change │
│ QQuery 2  │        147.63 / 147.98 ±0.33 / 148.45 ms │        145.70 / 145.89 ±0.17 / 146.11 ms │     no change │
│ QQuery 3  │        113.20 / 114.36 ±1.02 / 116.22 ms │        112.79 / 113.42 ±0.64 / 114.57 ms │     no change │
│ QQuery 4  │     1280.27 / 1291.44 ±8.73 / 1303.53 ms │     1282.24 / 1290.43 ±4.77 / 1296.36 ms │     no change │
│ QQuery 5  │        172.15 / 173.09 ±0.75 / 173.89 ms │        173.98 / 174.77 ±0.98 / 176.54 ms │     no change │
│ QQuery 6  │       801.41 / 839.75 ±21.99 / 866.46 ms │       821.78 / 835.59 ±11.26 / 854.35 ms │     no change │
│ QQuery 7  │        332.54 / 334.20 ±1.26 / 335.83 ms │        333.82 / 336.04 ±1.70 / 338.95 ms │     no change │
│ QQuery 8  │        112.30 / 113.33 ±1.31 / 115.88 ms │        112.10 / 113.11 ±0.78 / 114.27 ms │     no change │
│ QQuery 9  │        100.15 / 105.21 ±2.79 / 108.53 ms │        109.70 / 117.09 ±5.71 / 123.93 ms │  1.11x slower │
│ QQuery 10 │        100.40 / 102.88 ±2.19 / 106.75 ms │        101.67 / 103.54 ±2.71 / 108.89 ms │     no change │
│ QQuery 11 │        886.44 / 898.05 ±9.32 / 913.79 ms │        889.74 / 899.22 ±7.08 / 911.40 ms │     no change │
│ QQuery 12 │           43.06 / 43.41 ±0.25 / 43.80 ms │           43.16 / 43.55 ±0.31 / 44.03 ms │     no change │
│ QQuery 13 │        388.42 / 390.38 ±1.98 / 393.78 ms │        385.61 / 389.57 ±4.21 / 396.54 ms │     no change │
│ QQuery 14 │        986.15 / 993.13 ±5.10 / 999.89 ms │       992.19 / 997.50 ±4.19 / 1004.10 ms │     no change │
│ QQuery 15 │           14.53 / 14.72 ±0.13 / 14.90 ms │           14.49 / 14.73 ±0.25 / 15.22 ms │     no change │
│ QQuery 16 │              7.34 / 7.49 ±0.19 / 7.86 ms │              7.23 / 7.34 ±0.10 / 7.54 ms │     no change │
│ QQuery 17 │        221.91 / 223.62 ±1.26 / 225.84 ms │        220.84 / 223.26 ±2.26 / 227.33 ms │     no change │
│ QQuery 18 │        121.84 / 122.89 ±1.12 / 124.95 ms │        121.56 / 122.60 ±0.82 / 123.65 ms │     no change │
│ QQuery 19 │        153.51 / 154.40 ±0.71 / 155.45 ms │        152.13 / 153.73 ±2.50 / 158.72 ms │     no change │
│ QQuery 20 │           13.11 / 13.42 ±0.33 / 14.01 ms │           12.82 / 13.09 ±0.22 / 13.41 ms │     no change │
│ QQuery 21 │           18.93 / 19.28 ±0.32 / 19.76 ms │           18.93 / 19.27 ±0.29 / 19.80 ms │     no change │
│ QQuery 22 │        476.52 / 478.88 ±1.67 / 480.72 ms │        475.02 / 477.24 ±1.67 / 479.91 ms │     no change │
│ QQuery 23 │        815.34 / 822.23 ±3.70 / 825.60 ms │        816.40 / 829.55 ±7.40 / 837.58 ms │     no change │
│ QQuery 24 │        374.33 / 377.04 ±2.99 / 382.80 ms │        379.38 / 383.31 ±6.90 / 397.09 ms │     no change │
│ QQuery 25 │        331.83 / 334.25 ±1.71 / 336.92 ms │        333.60 / 339.08 ±6.38 / 350.82 ms │     no change │
│ QQuery 26 │           76.39 / 76.74 ±0.23 / 77.02 ms │           76.43 / 79.17 ±3.39 / 85.79 ms │     no change │
│ QQuery 27 │              6.81 / 6.97 ±0.15 / 7.25 ms │              6.85 / 6.98 ±0.16 / 7.27 ms │     no change │
│ QQuery 28 │        148.39 / 149.60 ±1.22 / 151.79 ms │        156.67 / 158.20 ±0.80 / 158.97 ms │  1.06x slower │
│ QQuery 29 │        272.19 / 278.34 ±8.34 / 294.46 ms │        271.28 / 276.52 ±5.96 / 286.82 ms │     no change │
│ QQuery 30 │           41.11 / 41.61 ±0.33 / 42.07 ms │           41.14 / 42.17 ±0.60 / 42.87 ms │     no change │
│ QQuery 31 │        164.48 / 166.73 ±3.34 / 173.34 ms │        168.12 / 168.93 ±0.90 / 170.67 ms │     no change │
│ QQuery 32 │           13.28 / 13.58 ±0.32 / 14.20 ms │           13.32 / 13.56 ±0.19 / 13.88 ms │     no change │
│ QQuery 33 │        138.13 / 139.40 ±0.84 / 140.63 ms │        138.40 / 140.12 ±2.00 / 143.78 ms │     no change │
│ QQuery 34 │              6.96 / 7.09 ±0.17 / 7.43 ms │              6.93 / 7.08 ±0.20 / 7.47 ms │     no change │
│ QQuery 35 │         99.37 / 100.50 ±0.88 / 101.80 ms │        101.07 / 102.78 ±1.19 / 104.00 ms │     no change │
│ QQuery 36 │              6.67 / 6.83 ±0.12 / 7.02 ms │              6.59 / 6.85 ±0.18 / 7.04 ms │     no change │
│ QQuery 37 │              8.13 / 8.34 ±0.15 / 8.56 ms │              8.05 / 8.21 ±0.08 / 8.29 ms │     no change │
│ QQuery 38 │           84.29 / 85.31 ±0.79 / 86.29 ms │           86.84 / 90.16 ±2.88 / 95.48 ms │  1.06x slower │
│ QQuery 39 │        117.62 / 120.83 ±3.91 / 128.52 ms │        117.56 / 119.22 ±0.98 / 120.32 ms │     no change │
│ QQuery 40 │        101.79 / 104.89 ±2.61 / 108.70 ms │        101.77 / 107.93 ±7.24 / 120.48 ms │     no change │
│ QQuery 41 │           13.97 / 14.20 ±0.24 / 14.61 ms │           13.89 / 14.14 ±0.25 / 14.58 ms │     no change │
│ QQuery 42 │        106.42 / 107.80 ±1.57 / 110.88 ms │        105.33 / 106.16 ±0.48 / 106.64 ms │     no change │
│ QQuery 43 │              5.73 / 5.87 ±0.16 / 6.18 ms │              5.68 / 5.77 ±0.14 / 6.04 ms │     no change │
│ QQuery 44 │           11.54 / 11.65 ±0.07 / 11.76 ms │           11.50 / 11.66 ±0.10 / 11.79 ms │     no change │
│ QQuery 45 │           48.08 / 48.50 ±0.27 / 48.92 ms │           47.96 / 51.25 ±5.48 / 62.17 ms │  1.06x slower │
│ QQuery 46 │             8.42 / 9.53 ±1.99 / 13.49 ms │              8.24 / 8.41 ±0.14 / 8.64 ms │ +1.13x faster │
│ QQuery 47 │        681.53 / 692.84 ±5.89 / 698.43 ms │        690.91 / 698.02 ±5.18 / 706.13 ms │     no change │
│ QQuery 48 │        272.71 / 275.65 ±2.03 / 278.32 ms │        275.30 / 282.17 ±5.76 / 292.33 ms │     no change │
│ QQuery 49 │        248.53 / 251.21 ±2.72 / 255.31 ms │        247.92 / 250.13 ±2.40 / 254.69 ms │     no change │
│ QQuery 50 │        201.07 / 206.84 ±4.28 / 212.49 ms │       205.43 / 214.22 ±10.10 / 232.71 ms │     no change │
│ QQuery 51 │        174.86 / 178.98 ±3.06 / 184.20 ms │        177.78 / 182.48 ±7.40 / 197.21 ms │     no change │
│ QQuery 52 │        106.90 / 107.53 ±0.49 / 108.26 ms │        106.25 / 106.66 ±0.32 / 106.99 ms │     no change │
│ QQuery 53 │        101.64 / 103.09 ±1.21 / 105.25 ms │        101.10 / 102.64 ±2.19 / 106.98 ms │     no change │
│ QQuery 54 │        142.16 / 143.65 ±1.05 / 145.19 ms │        141.90 / 144.23 ±1.72 / 147.18 ms │     no change │
│ QQuery 55 │        105.98 / 106.72 ±1.00 / 108.69 ms │        105.30 / 105.74 ±0.45 / 106.56 ms │     no change │
│ QQuery 56 │        139.25 / 140.52 ±0.97 / 141.80 ms │        138.30 / 139.92 ±1.13 / 141.45 ms │     no change │
│ QQuery 57 │        164.78 / 165.85 ±0.71 / 166.98 ms │        166.25 / 168.09 ±2.17 / 172.01 ms │     no change │
│ QQuery 58 │        309.57 / 310.55 ±0.60 / 311.27 ms │        310.81 / 312.46 ±1.16 / 314.29 ms │     no change │
│ QQuery 59 │        198.42 / 199.94 ±1.54 / 202.80 ms │        194.89 / 195.52 ±0.65 / 196.78 ms │     no change │
│ QQuery 60 │        140.42 / 141.10 ±0.60 / 141.98 ms │        141.06 / 142.15 ±1.00 / 143.86 ms │     no change │
│ QQuery 61 │           13.23 / 13.36 ±0.21 / 13.78 ms │           13.46 / 13.58 ±0.19 / 13.96 ms │     no change │
│ QQuery 62 │        852.74 / 866.31 ±7.05 / 873.19 ms │        867.89 / 880.19 ±9.90 / 896.37 ms │     no change │
│ QQuery 63 │        102.21 / 103.05 ±0.57 / 103.66 ms │        101.43 / 105.33 ±6.65 / 118.63 ms │     no change │
│ QQuery 64 │        656.30 / 660.03 ±3.57 / 666.26 ms │        665.91 / 677.69 ±9.25 / 693.53 ms │     no change │
│ QQuery 65 │        242.94 / 247.07 ±5.59 / 257.95 ms │        243.54 / 246.56 ±5.08 / 256.69 ms │     no change │
│ QQuery 66 │        216.10 / 223.29 ±7.40 / 232.60 ms │       213.35 / 225.64 ±14.95 / 253.78 ms │     no change │
│ QQuery 67 │        291.82 / 296.42 ±6.47 / 308.95 ms │       294.59 / 318.19 ±26.09 / 362.28 ms │  1.07x slower │
│ QQuery 68 │            8.50 / 10.43 ±3.47 / 17.35 ms │              8.49 / 8.70 ±0.27 / 9.24 ms │ +1.20x faster │
│ QQuery 69 │          97.37 / 99.27 ±3.27 / 105.80 ms │        97.54 / 104.33 ±11.45 / 127.12 ms │  1.05x slower │
│ QQuery 70 │       309.07 / 325.76 ±12.79 / 347.50 ms │        326.43 / 330.15 ±3.13 / 333.80 ms │     no change │
│ QQuery 71 │        134.23 / 136.34 ±2.35 / 140.84 ms │        133.32 / 134.04 ±0.78 / 135.22 ms │     no change │
│ QQuery 72 │        594.45 / 608.31 ±9.05 / 618.64 ms │        587.80 / 596.52 ±6.63 / 607.20 ms │     no change │
│ QQuery 73 │              6.61 / 6.76 ±0.20 / 7.15 ms │              6.47 / 6.59 ±0.18 / 6.95 ms │     no change │
│ QQuery 74 │        563.39 / 573.72 ±7.61 / 586.85 ms │        557.69 / 564.51 ±5.67 / 573.67 ms │     no change │
│ QQuery 75 │        269.88 / 272.75 ±2.75 / 276.75 ms │        269.61 / 272.20 ±4.42 / 281.01 ms │     no change │
│ QQuery 76 │        130.80 / 132.67 ±1.08 / 134.04 ms │        131.42 / 134.51 ±5.06 / 144.58 ms │     no change │
│ QQuery 77 │        187.74 / 189.69 ±1.76 / 192.98 ms │        185.74 / 187.02 ±1.24 / 189.18 ms │     no change │
│ QQuery 78 │        328.81 / 333.17 ±2.79 / 336.40 ms │        330.64 / 334.79 ±3.05 / 339.09 ms │     no change │
│ QQuery 79 │        229.42 / 231.68 ±2.54 / 236.26 ms │        227.54 / 232.39 ±6.00 / 243.86 ms │     no change │
│ QQuery 80 │        319.03 / 321.56 ±2.12 / 324.44 ms │        319.64 / 322.28 ±2.64 / 327.29 ms │     no change │
│ QQuery 81 │           25.51 / 25.80 ±0.21 / 26.03 ms │           25.48 / 25.90 ±0.38 / 26.59 ms │     no change │
│ QQuery 82 │           39.17 / 39.62 ±0.23 / 39.77 ms │           39.17 / 39.39 ±0.24 / 39.85 ms │     no change │
│ QQuery 83 │           36.73 / 36.86 ±0.09 / 37.02 ms │           36.58 / 37.36 ±0.52 / 37.87 ms │     no change │
│ QQuery 84 │           45.59 / 46.03 ±0.61 / 47.23 ms │           45.90 / 46.12 ±0.23 / 46.56 ms │     no change │
│ QQuery 85 │        138.76 / 140.40 ±1.77 / 143.58 ms │        139.15 / 139.94 ±0.56 / 140.80 ms │     no change │
│ QQuery 86 │           37.20 / 37.49 ±0.31 / 37.93 ms │           36.87 / 39.61 ±4.66 / 48.90 ms │  1.06x slower │
│ QQuery 87 │              3.45 / 3.53 ±0.15 / 3.84 ms │              3.46 / 3.57 ±0.16 / 3.89 ms │     no change │
│ QQuery 88 │         98.40 / 100.86 ±2.56 / 105.22 ms │         99.81 / 100.68 ±0.47 / 101.21 ms │     no change │
│ QQuery 89 │        115.69 / 116.41 ±0.49 / 117.18 ms │        116.15 / 118.91 ±4.76 / 128.42 ms │     no change │
│ QQuery 90 │           22.26 / 22.59 ±0.31 / 23.12 ms │           22.27 / 22.53 ±0.31 / 23.12 ms │     no change │
│ QQuery 91 │           58.05 / 59.97 ±1.21 / 61.32 ms │           58.63 / 59.07 ±0.38 / 59.62 ms │     no change │
│ QQuery 92 │           55.63 / 56.27 ±0.44 / 56.84 ms │           55.80 / 56.55 ±0.62 / 57.31 ms │     no change │
│ QQuery 93 │        180.71 / 181.80 ±1.21 / 184.15 ms │        180.46 / 182.61 ±1.93 / 185.99 ms │     no change │
│ QQuery 94 │           60.02 / 60.61 ±0.56 / 61.64 ms │           60.12 / 61.45 ±1.84 / 65.09 ms │     no change │
│ QQuery 95 │        125.83 / 126.51 ±0.65 / 127.73 ms │        125.99 / 127.04 ±0.97 / 128.31 ms │     no change │
│ QQuery 96 │           68.31 / 69.08 ±0.50 / 69.65 ms │           69.47 / 72.48 ±3.32 / 78.83 ms │     no change │
│ QQuery 97 │        116.39 / 118.17 ±1.55 / 120.44 ms │        117.28 / 117.89 ±0.59 / 118.87 ms │     no change │
│ QQuery 98 │        148.51 / 150.69 ±2.08 / 154.00 ms │        149.57 / 150.51 ±1.27 / 153.00 ms │     no change │
│ QQuery 99 │ 10729.96 / 10789.96 ±44.14 / 10844.71 ms │ 10774.66 / 10797.16 ±18.38 / 10816.71 ms │     no change │
└───────────┴──────────────────────────────────────────┴──────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                │ 30485.87ms │
│ Total Time (io-uring-local-fs)   │ 30613.88ms │
│ Average Time (HEAD)              │   307.94ms │
│ Average Time (io-uring-local-fs) │   309.23ms │
│ Queries Faster                   │          2 │
│ Queries Slower                   │          7 │
│ Queries with No Change           │         90 │
│ Queries with Failure             │          0 │
└──────────────────────────────────┴────────────┘
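The Change column appears to compare the mean values, which are the second number in each `min / mean ±stddev / max` cell. It seems to apply a noise band of about ±5%. For example, QQuery 96 (69.08 → 72.48 ms, about +4.9%) is reported as "no change", while QQuery 69 (99.27 → 104.33 ms, about +5.1%) is reported as "1.05x slower".

The sketch below reproduces that behaviour. The threshold and the helper name `classify_change` are assumptions inferred from the rows above. They are not taken from the benchmark runner's source.

```rust
/// Hypothetical helper that reproduces the "Change" column from two mean
/// timings. The 5% noise band is an assumption inferred from the tables.
fn classify_change(base_mean_ms: f64, branch_mean_ms: f64, noise: f64) -> String {
    // ratio > 1.0 means the branch (io-uring-local-fs) was slower than HEAD
    let ratio = branch_mean_ms / base_mean_ms;
    if ratio < 1.0 - noise {
        // speedups are reported as HEAD / branch, e.g. "+1.13x faster"
        format!("+{:.2}x faster", 1.0 / ratio)
    } else if ratio > 1.0 + noise {
        format!("{:.2}x slower", ratio)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // (query, HEAD mean ms, branch mean ms) copied from the TPC-DS table
    let rows = [
        ("QQuery 9", 105.21, 117.09),
        ("QQuery 46", 9.53, 8.41),
        ("QQuery 96", 69.08, 72.48),
    ];
    for &(name, base, branch) in rows.iter() {
        println!("{}: {}", name, classify_change(base, branch, 0.05));
    }
}
```

Speedups and slowdowns are both expressed as ratios of at least 1.0. As a result, "+1.13x faster" and "1.13x slower" describe changes of the same magnitude in opposite directions.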

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 155.0s |
| Peak memory | 6.2 GiB |
| Avg memory | 5.5 GiB |
| CPU user | 255.2s |
| CPU sys | 8.1s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
|---|---|
| Wall time | 155.0s |
| Peak memory | 5.8 GiB |
| Avg memory | 5.1 GiB |
| CPU user | 255.7s |
| CPU sys | 9.2s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD and io-uring-local-fs
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃                     io-uring-local-fs ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.23 / 4.78 ±6.93 / 18.63 ms │          1.25 / 4.78 ±6.89 / 18.56 ms │     no change │
│ QQuery 1  │        12.99 / 13.43 ±0.25 / 13.69 ms │        12.75 / 13.25 ±0.27 / 13.56 ms │     no change │
│ QQuery 2  │        37.42 / 37.98 ±0.38 / 38.34 ms │        37.41 / 37.58 ±0.16 / 37.78 ms │     no change │
│ QQuery 3  │        31.96 / 32.90 ±0.92 / 34.56 ms │        40.67 / 41.26 ±0.73 / 42.68 ms │  1.25x slower │
│ QQuery 4  │     267.58 / 273.09 ±4.37 / 279.78 ms │    241.80 / 250.73 ±12.18 / 274.40 ms │ +1.09x faster │
│ QQuery 5  │     307.70 / 312.77 ±4.61 / 320.73 ms │    288.35 / 299.82 ±12.01 / 315.79 ms │     no change │
│ QQuery 6  │           6.78 / 7.06 ±0.26 / 7.50 ms │           7.21 / 7.86 ±0.52 / 8.54 ms │  1.11x slower │
│ QQuery 7  │        14.50 / 14.84 ±0.42 / 15.67 ms │        14.44 / 14.74 ±0.23 / 15.02 ms │     no change │
│ QQuery 8  │     356.08 / 358.84 ±2.21 / 362.36 ms │    328.20 / 343.16 ±13.80 / 367.11 ms │     no change │
│ QQuery 9  │     532.12 / 536.06 ±4.23 / 543.78 ms │    510.06 / 523.26 ±15.63 / 552.05 ms │     no change │
│ QQuery 10 │        77.79 / 78.52 ±0.52 / 79.14 ms │        79.03 / 82.67 ±4.15 / 89.77 ms │  1.05x slower │
│ QQuery 11 │        89.73 / 90.97 ±0.75 / 92.07 ms │        89.00 / 93.49 ±3.76 / 98.73 ms │     no change │
│ QQuery 12 │    277.14 / 289.83 ±11.90 / 309.54 ms │     301.47 / 307.47 ±4.40 / 312.74 ms │  1.06x slower │
│ QQuery 13 │     416.03 / 430.93 ±7.96 / 439.54 ms │     408.84 / 423.72 ±9.96 / 437.55 ms │     no change │
│ QQuery 14 │    294.90 / 306.10 ±13.08 / 327.37 ms │     290.54 / 303.46 ±9.94 / 318.08 ms │     no change │
│ QQuery 15 │    294.95 / 312.38 ±16.95 / 339.56 ms │     295.75 / 310.67 ±8.26 / 321.01 ms │     no change │
│ QQuery 16 │     645.21 / 653.33 ±6.89 / 662.72 ms │    678.97 / 704.29 ±16.03 / 727.80 ms │  1.08x slower │
│ QQuery 17 │    633.62 / 650.55 ±13.79 / 668.16 ms │     682.23 / 691.99 ±5.26 / 696.68 ms │  1.06x slower │
│ QQuery 18 │ 1303.89 / 1346.13 ±29.69 / 1389.52 ms │ 1321.94 / 1363.25 ±25.41 / 1395.08 ms │     no change │
│ QQuery 19 │        29.80 / 31.95 ±2.94 / 37.78 ms │        36.97 / 40.06 ±2.81 / 45.37 ms │  1.25x slower │
│ QQuery 20 │     524.93 / 533.90 ±9.45 / 551.38 ms │     569.65 / 574.13 ±3.46 / 577.81 ms │  1.08x slower │
│ QQuery 21 │     599.18 / 607.99 ±6.71 / 615.42 ms │     654.71 / 664.17 ±8.78 / 679.55 ms │  1.09x slower │
│ QQuery 22 │  1080.83 / 1096.23 ±9.34 / 1109.82 ms │ 1151.18 / 1179.22 ±21.67 / 1207.78 ms │  1.08x slower │
│ QQuery 23 │  3413.47 / 3428.66 ±9.05 / 3439.32 ms │ 3588.65 / 3637.67 ±44.81 / 3704.14 ms │  1.06x slower │
│ QQuery 24 │        42.00 / 43.36 ±1.16 / 45.27 ms │        50.97 / 57.93 ±6.31 / 65.74 ms │  1.34x slower │
│ QQuery 25 │     114.54 / 119.01 ±6.22 / 131.10 ms │     117.43 / 121.84 ±6.53 / 134.61 ms │     no change │
│ QQuery 26 │        44.64 / 48.36 ±5.08 / 58.39 ms │        49.87 / 51.67 ±1.52 / 53.71 ms │  1.07x slower │
│ QQuery 27 │    675.14 / 690.60 ±10.02 / 700.92 ms │     707.32 / 719.08 ±8.11 / 729.09 ms │     no change │
│ QQuery 28 │ 3070.84 / 3094.38 ±17.15 / 3119.39 ms │ 3059.66 / 3081.71 ±29.42 / 3139.54 ms │     no change │
│ QQuery 29 │       43.01 / 56.01 ±15.83 / 82.65 ms │        43.76 / 49.69 ±9.57 / 68.80 ms │ +1.13x faster │
│ QQuery 30 │    311.80 / 325.88 ±10.28 / 340.41 ms │     337.39 / 340.69 ±4.69 / 350.02 ms │     no change │
│ QQuery 31 │    303.34 / 317.98 ±10.28 / 332.71 ms │     343.95 / 359.15 ±9.16 / 371.16 ms │  1.13x slower │
│ QQuery 32 │ 1041.39 / 1068.18 ±30.05 / 1112.33 ms │ 1050.41 / 1077.91 ±19.35 / 1098.44 ms │     no change │
│ QQuery 33 │ 1494.45 / 1516.17 ±21.16 / 1548.56 ms │ 1494.64 / 1526.65 ±30.28 / 1584.09 ms │     no change │
│ QQuery 34 │ 1499.98 / 1525.86 ±32.19 / 1587.62 ms │ 1521.60 / 1560.04 ±30.78 / 1597.98 ms │     no change │
│ QQuery 35 │    286.68 / 312.92 ±25.48 / 359.67 ms │    298.40 / 332.70 ±35.04 / 391.87 ms │  1.06x slower │
│ QQuery 36 │        63.79 / 69.44 ±5.64 / 79.89 ms │        61.28 / 68.38 ±6.76 / 79.98 ms │     no change │
│ QQuery 37 │        37.99 / 42.69 ±5.46 / 53.09 ms │        35.39 / 36.08 ±0.92 / 37.88 ms │ +1.18x faster │
│ QQuery 38 │        40.55 / 42.86 ±2.91 / 48.11 ms │        43.75 / 47.59 ±2.39 / 51.23 ms │  1.11x slower │
│ QQuery 39 │     129.34 / 138.37 ±5.75 / 145.17 ms │     129.52 / 138.38 ±5.33 / 144.54 ms │     no change │
│ QQuery 40 │        14.50 / 15.67 ±0.71 / 16.44 ms │        14.65 / 14.86 ±0.13 / 15.05 ms │ +1.05x faster │
│ QQuery 41 │        13.70 / 15.20 ±2.61 / 20.40 ms │        14.11 / 14.46 ±0.28 / 14.75 ms │     no change │
│ QQuery 42 │        13.07 / 17.18 ±5.81 / 28.47 ms │        14.75 / 17.77 ±5.26 / 28.29 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                │ 20909.31ms │
│ Total Time (io-uring-local-fs)   │ 21529.28ms │
│ Average Time (HEAD)              │   486.26ms │
│ Average Time (io-uring-local-fs) │   500.68ms │
│ Queries Faster                   │          4 │
│ Queries Slower                   │         16 │
│ Queries with No Change           │         23 │
│ Queries with Failure             │          0 │
└──────────────────────────────────┴────────────┘
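The summary rows can be recomputed from the totals. Each average is the total divided by the number of queries, which here is 4 faster + 16 slower + 23 no change = 43. The overall ratio is the branch total divided by the HEAD total.

This is a minimal sketch. The function names are illustrative and not part of dfbench.

```rust
/// Recompute the "Benchmark Summary" rows from the per-run totals.
/// Average = total / number of queries; the overall ratio is branch / HEAD.
fn average_ms(total_ms: f64, queries: u32) -> f64 {
    total_ms / f64::from(queries)
}

fn overall_ratio(head_total_ms: f64, branch_total_ms: f64) -> f64 {
    branch_total_ms / head_total_ms
}

fn main() {
    // ClickBench partitioned: 4 faster + 16 slower + 23 no change = 43 queries
    let queries: u32 = 4 + 16 + 23;
    let (head, branch) = (20909.31, 21529.28);
    println!("avg HEAD   = {:.2} ms", average_ms(head, queries));
    println!("avg branch = {:.2} ms", average_ms(branch, queries));
    println!("branch / HEAD = {:.3}", overall_ratio(head, branch));
}
```

This reproduces the 486.26 ms and 500.68 ms averages and gives an overall ratio of about 1.03, so ClickBench partitioned takes roughly 3% longer in total on the branch.

Applying the same arithmetic to the TPC-DS summary (99 queries) gives 307.94 ms and 309.23 ms, with a ratio of about 1.004. That result is essentially flat.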

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 105.0s |
| Peak memory | 30.7 GiB |
| Avg memory | 23.4 GiB |
| CPU user | 1107.5s |
| CPU sys | 69.1s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|---|---|
| Wall time | 110.0s |
| Peak memory | 29.1 GiB |
| Avg memory | 22.0 GiB |
| CPU user | 1107.2s |
| CPU sys | 74.7s |
| Peak spill | 0 B |
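The merge-base and branch resource tables are easier to compare as relative changes. Below is a minimal sketch that computes the percent change for each metric. The helper `pct_change` is illustrative, and the inputs are copied from the clickbench_partitioned tables above.

```rust
/// Relative change of a resource metric from the merge-base run to the branch run.
fn pct_change(base: f64, branch: f64) -> f64 {
    (branch - base) / base * 100.0
}

fn main() {
    // (metric, merge-base, branch) from the clickbench_partitioned resource tables
    let metrics = [
        ("Wall time (s)", 105.0, 110.0),
        ("Peak memory (GiB)", 30.7, 29.1),
        ("Avg memory (GiB)", 23.4, 22.0),
        ("CPU user (s)", 1107.5, 1107.2),
        ("CPU sys (s)", 69.1, 74.7),
    ];
    for &(name, base, branch) in metrics.iter() {
        println!("{:<18} {:+.1}%", name, pct_change(base, branch));
    }
}
```

CPU sys time goes from 69.1s to 74.7s, about +8%, while CPU user time is essentially unchanged. The TPC-DS tables show a similar pattern: sys time goes from 8.1s to 9.2s (about +14%) and user time is nearly flat.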


@Dandandan closed this Apr 22, 2026
@adriangbot

Benchmark for this request hit the 7200s job deadline before finishing.

Benchmarks requested: tpch

Kubernetes message
Job was active longer than specified deadline

