feat(parquet): runtime row-group early stop via TopK dynamic filter by zhuqi-lucas · Pull Request #22450 · apache/datafusion

zhuqi-lucas · 2026-05-22T03:22:37Z

Which issue does this PR close?

Closes #22407.

Rationale for this change

DataFusion already prunes parquet at three granularities — file
(EarlyStoppingStream + FilePruner), row group at scan-startup
(PruningPredicate → RowGroupAccessPlanFilter), and row inside an
open RG (RowFilter).

There's a gap in the middle: once Layer 1 (RG-static) picks the row
groups at file open, that decision is frozen, because at that point the
dynamic filter is still lit(true). As TopK tightens its threshold at
runtime, subsequent RGs in the already-opened file keep getting decoded
even when their stats already prove they can't beat the threshold. This
is the dominant cost for ORDER BY ... LIMIT queries on multi-RG files
where file-level pruning can't help (a single large file, or multiple
files with scrambled RGs).

See the issue for a full architectural diagram and a concrete trace
showing where the wasted I/O / decompression / decode lives.

What changes are included in this PR?

Two coordinated pieces that close the gap:

1. RowGroupPruner (in datafusion/datasource-parquet/src/push_decoder.rs)
   mirrors FilePruner's pattern at row-group granularity. It tracks
   snapshot_generation(&predicate) so the cached PruningPredicate
   is rebuilt only when the dynamic filter has actually moved, then
   evaluates it against the next pending decoder run's row-group stats
   via the existing RowGroupPruningStatistics adapter. Errors fall
   back to "don't prune", so a flaky pruning path never silently drops
   data.
2. Per-row-group decoder splitting when the predicate is dynamic.
   ParquetAccessPlan::split_runs previously coalesced consecutive
   same-fully_matched RGs into a single run. For ORDER BY + LIMIT
   the initial dynamic filter is lit(true), so the static
   fully-matched analysis marks nothing and split_runs collapsed
   every RG into one run, leaving no inter-run hook. A new
   force_per_row_group flag (set by is_dynamic_physical_expr)
   disables coalescing for dynamic predicates only, so static
   WHERE queries pay nothing.
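
The coalescing rule and the effect of the new flag can be sketched as follows. This is a minimal, simplified model rather than the code in this PR: `SelectedRowGroup` and `split_runs_sketch` are hypothetical names, and the real `split_runs` operates on the access plan rather than on a plain slice.

```rust
/// Hypothetical, simplified stand-in for one row group selected by the scan.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SelectedRowGroup {
    /// Index of the row group inside the file.
    pub index: usize,
    /// Result of the static "fully matched" analysis for this row group.
    pub fully_matched: bool,
}

/// Group selected row groups into decoder runs (each run is a list of indices).
///
/// Without the flag, consecutive row groups that share the same
/// `fully_matched` value are coalesced into one run. With
/// `force_per_row_group` set, every row group gets its own run, which
/// leaves a boundary between runs where a runtime check can happen.
pub fn split_runs_sketch(
    row_groups: &[SelectedRowGroup],
    force_per_row_group: bool,
) -> Vec<Vec<usize>> {
    let mut runs: Vec<Vec<usize>> = Vec::new();
    let mut prev_fully_matched: Option<bool> = None;

    for rg in row_groups {
        let can_coalesce =
            !force_per_row_group && prev_fully_matched == Some(rg.fully_matched);
        if can_coalesce {
            // `prev_fully_matched` is only `Some` after a run has been pushed.
            runs.last_mut()
                .expect("a previous run exists")
                .push(rg.index);
        } else {
            runs.push(vec![rg.index]);
        }
        prev_fully_matched = Some(rg.fully_matched);
    }
    runs
}
```

With five row groups that the static analysis leaves unmarked (the `lit(true)` case), the sketch returns one run of five indices when the flag is off and five single-index runs when it is on.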

PendingDecoderRun wraps each queued decoder with its row group
indices. PushDecoderStreamState::transition consults the pruner at
every run boundary and skips runs whose row groups are proved
unwinnable.
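
A rough sketch of the run-boundary check, assuming an `ORDER BY key DESC LIMIT k` query whose dynamic filter has the form `key > threshold`. All types here (`DynamicFilter`, `PendingRun`, `PrunerSketch`, `next_surviving_run`) are hypothetical simplifications: the actual code uses `PruningPredicate` and `RowGroupPruningStatistics`, whereas this model caches only the threshold and looks only at a per-row-group max statistic.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the TopK dynamic filter `key > threshold`.
/// `threshold == None` models the initial `lit(true)` state.
#[derive(Debug, Default)]
pub struct DynamicFilter {
    threshold: Option<i64>,
    generation: u64,
}

impl DynamicFilter {
    /// Tighten the filter and bump the generation so observers can tell it moved.
    pub fn update(&mut self, threshold: i64) {
        self.threshold = Some(threshold);
        self.generation += 1;
    }
}

/// A queued decoder run: row-group indices plus each group's max statistic
/// (`None` = statistic missing).
pub struct PendingRun {
    pub row_groups: Vec<usize>,
    pub max_values: Vec<Option<i64>>,
}

/// Caches the last filter snapshot and rebuilds it only on a generation change.
#[derive(Debug, Default)]
pub struct PrunerSketch {
    seen_generation: Option<u64>,
    cached_threshold: Option<i64>,
    pub rebuilds: usize,
}

impl PrunerSketch {
    /// True only if every row group in the run is proved unable to pass the filter.
    pub fn should_prune(&mut self, filter: &DynamicFilter, run: &PendingRun) -> bool {
        if self.seen_generation != Some(filter.generation) {
            self.cached_threshold = filter.threshold;
            self.seen_generation = Some(filter.generation);
            self.rebuilds += 1;
        }
        // Still `lit(true)`: nothing can be proved, so keep the run.
        let Some(threshold) = self.cached_threshold else {
            return false;
        };
        // Missing statistics fall back to "don't prune".
        !run.max_values.is_empty()
            && run
                .max_values
                .iter()
                .all(|max| matches!(max, Some(m) if *m <= threshold))
    }
}

/// Pop pending runs until one survives pruning, counting skipped row groups.
pub fn next_surviving_run(
    pending: &mut VecDeque<PendingRun>,
    filter: &DynamicFilter,
    pruner: &mut PrunerSketch,
    pruned_row_groups: &mut usize,
) -> Option<PendingRun> {
    while let Some(run) = pending.pop_front() {
        if pruner.should_prune(filter, &run) {
            *pruned_row_groups += run.row_groups.len();
            continue;
        }
        return Some(run);
    }
    None
}
```

In this model, five single-row-group runs with max values 500, 400, 300, 200 and 100 behave like the SLT case described under testing: the first run is decoded while the filter is still `lit(true)`, and once the threshold moves to, say, 450, the remaining four runs are skipped with a single cache rebuild.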

Observability

- New Count metric row_groups_pruned_dynamic_filter on
  ParquetFileMetrics surfaces the runtime saving.
- New dynamic_rg_pruning=eligible marker on ParquetSource's
  EXPLAIN (fmt_extra Default + Verbose) signals plan-time
  eligibility. The marker says "eligible" rather than "true" because
  the static plan can't predict the runtime outcome.

Benchmarks (`benchmarks/sort_pushdown_inexact`, 5 iterations)

Query	main	this PR	Δ
Q1 `ORDER BY l_orderkey DESC LIMIT 100`	6.99 ms	3.80 ms	−46%
Q2 `ORDER BY l_orderkey DESC LIMIT 1000`	3.29 ms	1.33 ms	−60%
Q3 `SELECT * ... DESC LIMIT 100`	11.17 ms	9.91 ms	−11%
Q4 `SELECT * ... DESC LIMIT 1000`	9.28 ms	7.95 ms	−14%

Narrow-projection queries gain the most — their per-RG cost is
dominated by metadata + sort-column read, which this PR eliminates
for unwinnable RGs. Wide-projection queries gain less because the
kept RG's all-column decode dominates total time, but still see
meaningful savings.
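
The Δ column follows from the two timing columns as a relative change against main. A small helper (the name `percent_change` is illustrative only) makes the arithmetic explicit:

```rust
/// Relative change of `after_ms` versus `before_ms`, in percent.
/// Negative values mean the branch is faster than the baseline.
pub fn percent_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}
```

For Q1, `percent_change(6.99, 3.80)` is about −45.6, which rounds to the −46% shown in the table.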

Are these changes tested?

Yes. Three layers:

- 6 unit tests:
  - 3 in push_decoder.rs::tests: RowGroupPruner basic pruning,
    generation-tracked dynamic-filter updates, fallback when the
    predicate has no analyzable bounds.
  - 3 in source.rs::tests: dynamic_rg_pruning=eligible marker
    present on dynamic predicate, absent on static predicate, absent
    when there is no predicate at all.
- 2 integration tests in
  datafusion/core/tests/parquet/dynamic_row_group_pruning.rs:
  one asserts row_groups_pruned_dynamic_filter >= 1 end-to-end on a
  5-RG ORDER BY DESC LIMIT 5 scan, and the other asserts the metric
  stays at 0 when there is no TopK (no spurious firing).
- New SLT
  datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt
  asserts both EXPLAIN surfaces: plain EXPLAIN shows
  dynamic_rg_pruning=eligible, and EXPLAIN ANALYZE pins
  row_groups_pruned_dynamic_filter=4 (five RGs, four pruned at
  runtime).

129 parquet unit + 204 parquet integration + SLT all pass.
cargo clippy --all-targets --all-features -- -D warnings clean.

Are there any user-facing changes?

Two visible additions, both gated by the existing dynamic-filter
infrastructure:

- New row_groups_pruned_dynamic_filter counter visible in
  EXPLAIN ANALYZE for queries whose plan carries a
  DynamicFilterPhysicalExpr (today: only TopK with
  enable_topk_dynamic_filter_pushdown=true, which is the default).
- New dynamic_rg_pruning=eligible marker visible in EXPLAIN
  output for the same queries.

No config changes, no API breakage, no behavior change for queries
without a dynamic predicate.

Closes apache#22407.

## What

Adds runtime row-group pruning between push-decoder runs, driven by the dynamic predicate a TopK `SortExec` pushes down via `DynamicFilterPhysicalExpr`. As the heap fills, the threshold tightens, and subsequent row groups whose statistics prove they cannot contribute are skipped without ever invoking their decoder: zero IO, zero decode.

## Why

DataFusion already prunes parquet at three granularities: file (`EarlyStoppingStream`), row group at scan-startup (`PruningPredicate`), and row (`RowFilter`). There is a gap: once `Layer 1` selects a file's row groups, that decision is **frozen** at scan startup, when the dynamic filter is still `lit(true)`. As `TopK` tightens at runtime, subsequent RGs in the already-opened file keep being decoded even when stats prove they can't beat the threshold. This is the dominant cost for `ORDER BY ... LIMIT` queries on multi-RG files. See apache#22407 for the full architectural trace.

## How

Two coordinated pieces:

1. **`RowGroupPruner`** (in `push_decoder.rs`). Mirrors `FilePruner`'s pattern at row-group granularity: tracks `snapshot_generation` so the cached `PruningPredicate` is rebuilt only when the dynamic filter has actually moved; evaluates against the next pending run's row-group stats via the existing `RowGroupPruningStatistics` adapter from `row_group_filter.rs`. Errors fall back to "don't prune", so a flaky pruning path never silently drops data.
2. **Per-RG decoder splitting when the predicate is dynamic**. `ParquetAccessPlan::split_runs` previously coalesced consecutive same-`fully_matched` RGs into a single run. For ORDER BY + LIMIT the initial dynamic filter is `lit(true)`, the static fully-matched analysis marks nothing, and `split_runs` collapsed every RG into one run, leaving no inter-run hook for runtime pruning. A new `force_per_row_group` flag (set by `is_dynamic_physical_expr`) disables coalescing for dynamic predicates only, so static-WHERE queries pay nothing.

Plumbing: `PendingDecoderRun` wraps each queued decoder with its row group indices. `PushDecoderStreamState::transition` consults the pruner at every run boundary and skips runs whose row groups are proved unwinnable.

## Observability

- New `Count` metric `row_groups_pruned_dynamic_filter` on `ParquetFileMetrics` surfaces the runtime saving.
- New `dynamic_rg_pruning=eligible` marker on `ParquetSource`'s `EXPLAIN` (`fmt_extra` Default + Verbose) signals plan-time eligibility. It reads *eligible* rather than *true* because the static plan can't predict the runtime outcome.

## Benchmarks (`benchmarks/sort_pushdown_inexact`, 5 iters)

| Query | main | this PR | Δ |
|---|---|---|---|
| Q1 `ORDER BY l_orderkey DESC LIMIT 100` | 6.99 ms | 3.80 ms | **−46%** |
| Q2 `ORDER BY l_orderkey DESC LIMIT 1000` | 3.29 ms | 1.33 ms | **−60%** |
| Q3 `SELECT * ... DESC LIMIT 100` | 11.17 ms | 9.91 ms | −11% |
| Q4 `SELECT * ... DESC LIMIT 1000` | 9.28 ms | 7.95 ms | −14% |

Narrow-projection queries gain the most: their per-RG cost is dominated by metadata + sort-column read, which this PR eliminates for unwinnable RGs. Wide-projection queries gain less because the *kept* RG's all-column decode dominates total time, but still see meaningful savings.

## Test coverage

- 6 new unit tests: 3 on `RowGroupPruner` (basic pruning, generation-tracked dynamic updates, fallback when predicate has no analyzable bounds) + 3 on `fmt_extra` marker (present on dynamic predicate, absent on static, absent on no-predicate).
- 2 new integration tests in `datafusion/core/tests/parquet/dynamic_row_group_pruning.rs`: one asserts `row_groups_pruned_dynamic_filter >= 1` end-to-end on a 5-RG TopK query, and the other asserts the metric stays at 0 when no TopK is present (no spurious firing).
- New SLT `datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt` asserts both `EXPLAIN` surfaces: plain EXPLAIN shows `dynamic_rg_pruning=eligible`, and EXPLAIN ANALYZE pins `row_groups_pruned_dynamic_filter=4` (five RGs, four pruned).

129 parquet unit + 204 parquet integration + SLT all pass. `cargo clippy -D warnings` clean.

Two CI failures on PR apache#22450:

1. **cargo doc**: broken intra-doc link in `ParquetFileMetrics::row_groups_pruned_dynamic_filter`. Switch from ``[`row_groups_pruned_statistics`]`` to ``[`Self::row_groups_pruned_statistics`]`` so rustdoc can resolve it.
2. **sqllogictest substrait round-trip**: adding `dynamic_rg_pruning=eligible` to ParquetSource's `fmt_extra` output shifted every `EXPLAIN` line that already showed a `DynamicFilter` predicate. Add the marker to 13 SLT expectations:
   - clickbench, explain_analyze, limit, limit_pruning, dynamic_filter_pushdown_config, preserve_file_partitioning, projection_pushdown, push_down_filter_parquet, push_down_filter_regression, repartition_subset_satisfaction, sort_pushdown, statistics_registry, topk
   - 134 marker insertions total, all on `DataSourceExec:` lines whose predicate contains `DynamicFilter [`.

Two summary-level analyze tests also need the new `row_groups_pruned_dynamic_filter=0` counter in their metrics block (`limit_pruning.slt`, `dynamic_filter_pushdown_config.slt`). Dev-level analyze output elides zero-valued counters so the other files don't need it.

No behavior change beyond what was already in the previous commit.

CI runs `cargo doc --document-private-items`, which catches links on private items (the previous fix only covered public items). The `row_groups_pruned_dynamic` field's doc comment referenced ``[`row_group_pruner`]``, a same-struct field that needs `Self::` to resolve.
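
The two doc fixes above share one cause: a struct field is not in scope as a bare path for rustdoc's intra-doc links, so a link to a sibling field has to be qualified. A minimal illustration follows, using a hypothetical `FileMetricsSketch` struct rather than the real `ParquetFileMetrics`:

```rust
/// Hypothetical metrics struct used only to illustrate link resolution.
#[derive(Debug, Default)]
pub struct FileMetricsSketch {
    /// Row groups pruned at scan startup from static statistics.
    pub row_groups_pruned_statistics: usize,

    // A bare link to the sibling field (the field name in brackets without
    // a qualifier) is what `cargo doc` flagged as unresolved. Qualifying it
    // with `Self::`, as below, is the form the commits switched to.
    /// Row groups pruned at runtime by the dynamic filter.
    ///
    /// Counted separately from [`Self::row_groups_pruned_statistics`].
    pub row_groups_pruned_dynamic_filter: usize,
}

impl FileMetricsSketch {
    /// Sum of [`Self::row_groups_pruned_statistics`] and
    /// [`Self::row_groups_pruned_dynamic_filter`].
    pub fn total_pruned(&self) -> usize {
        self.row_groups_pruned_statistics + self.row_groups_pruned_dynamic_filter
    }
}
```

Running `cargo doc --document-private-items` applies the same link check to private fields, which is why the second fix was needed.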

Dandandan · 2026-05-22T05:25:39Z

run benchmarks

adriangbot · 2026-05-22T05:28:44Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-274-nhjxx 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-22T05:28:44Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-275-nknmm 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-05-22T05:28:51Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-276-gkpr8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-22T05:45:06Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 37.91 / 39.47 ±2.07 / 43.42 ms │     37.82 / 38.74 ±0.94 / 40.41 ms │ no change │
│ QQuery 2  │ 19.53 / 19.69 ±0.21 / 20.10 ms │     19.91 / 20.59 ±0.57 / 21.21 ms │ no change │
│ QQuery 3  │ 34.33 / 35.15 ±0.46 / 35.69 ms │     32.22 / 34.22 ±1.50 / 35.76 ms │ no change │
│ QQuery 4  │ 16.91 / 17.09 ±0.17 / 17.40 ms │     17.03 / 17.66 ±0.67 / 18.79 ms │ no change │
│ QQuery 5  │ 41.39 / 41.63 ±0.22 / 42.01 ms │     39.52 / 40.94 ±0.75 / 41.54 ms │ no change │
│ QQuery 6  │ 15.93 / 15.99 ±0.06 / 16.09 ms │     15.90 / 16.11 ±0.16 / 16.33 ms │ no change │
│ QQuery 7  │ 46.71 / 49.37 ±3.17 / 55.06 ms │     45.70 / 47.23 ±1.55 / 49.93 ms │ no change │
│ QQuery 8  │ 43.93 / 44.87 ±0.82 / 46.09 ms │     44.03 / 44.42 ±0.37 / 45.10 ms │ no change │
│ QQuery 9  │ 48.69 / 50.08 ±1.04 / 51.92 ms │     48.60 / 49.70 ±0.91 / 51.25 ms │ no change │
│ QQuery 10 │ 63.20 / 63.42 ±0.21 / 63.74 ms │     62.93 / 63.36 ±0.39 / 64.02 ms │ no change │
│ QQuery 11 │ 13.16 / 13.34 ±0.16 / 13.64 ms │     13.04 / 13.27 ±0.26 / 13.77 ms │ no change │
│ QQuery 12 │ 23.70 / 24.54 ±0.93 / 26.30 ms │     23.40 / 24.02 ±0.42 / 24.44 ms │ no change │
│ QQuery 13 │ 33.54 / 35.52 ±1.26 / 37.06 ms │     33.30 / 35.22 ±1.09 / 36.63 ms │ no change │
│ QQuery 14 │ 24.96 / 25.10 ±0.09 / 25.20 ms │     24.90 / 25.37 ±0.64 / 26.62 ms │ no change │
│ QQuery 15 │ 30.72 / 30.88 ±0.08 / 30.95 ms │     30.34 / 30.93 ±0.49 / 31.80 ms │ no change │
│ QQuery 16 │ 14.44 / 14.65 ±0.16 / 14.84 ms │     14.67 / 14.84 ±0.24 / 15.30 ms │ no change │
│ QQuery 17 │ 72.04 / 73.15 ±1.03 / 74.89 ms │     74.86 / 75.90 ±0.62 / 76.76 ms │ no change │
│ QQuery 18 │ 61.21 / 62.59 ±1.05 / 63.75 ms │     62.12 / 63.07 ±0.64 / 64.05 ms │ no change │
│ QQuery 19 │ 33.14 / 33.64 ±0.83 / 35.29 ms │     33.47 / 33.73 ±0.34 / 34.40 ms │ no change │
│ QQuery 20 │ 36.90 / 37.57 ±0.77 / 38.86 ms │     37.22 / 37.47 ±0.24 / 37.89 ms │ no change │
│ QQuery 21 │ 56.14 / 57.72 ±1.20 / 59.48 ms │     53.82 / 55.70 ±1.59 / 58.18 ms │ no change │
│ QQuery 22 │ 23.10 / 23.81 ±0.50 / 24.61 ms │     23.31 / 23.99 ±0.93 / 25.83 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 809.27ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 806.50ms │
│ Average Time (HEAD)                               │  36.78ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  36.66ms │
│ Queries Faster                                    │        0 │
│ Queries Slower                                    │        0 │
│ Queries with No Change                            │       22 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.1 GiB
CPU user	29.4s
CPU sys	1.9s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.0 GiB
CPU user	29.6s
CPU sys	1.8s
Peak spill	0 B

zhuqi-lucas · 2026-05-22T05:46:29Z

run benchmark sort_pushdown_inexact

adriangbot · 2026-05-22T05:46:53Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │           5.69 / 6.30 ±0.88 / 8.05 ms │           5.78 / 6.31 ±0.95 / 8.21 ms │ no change │
│ QQuery 2  │        80.75 / 81.16 ±0.31 / 81.58 ms │        80.16 / 80.81 ±0.40 / 81.25 ms │ no change │
│ QQuery 3  │        29.38 / 29.57 ±0.13 / 29.71 ms │        28.91 / 29.15 ±0.16 / 29.38 ms │ no change │
│ QQuery 4  │     512.73 / 515.37 ±2.82 / 520.75 ms │     509.87 / 514.41 ±2.49 / 516.87 ms │ no change │
│ QQuery 5  │        50.96 / 51.70 ±0.61 / 52.66 ms │        50.75 / 51.13 ±0.44 / 51.94 ms │ no change │
│ QQuery 6  │        35.28 / 35.77 ±0.38 / 36.31 ms │        35.30 / 35.86 ±0.37 / 36.34 ms │ no change │
│ QQuery 7  │     109.30 / 110.73 ±1.72 / 112.86 ms │     108.59 / 109.56 ±0.94 / 110.99 ms │ no change │
│ QQuery 8  │        36.33 / 36.65 ±0.31 / 37.03 ms │        36.32 / 37.02 ±0.53 / 37.95 ms │ no change │
│ QQuery 9  │        53.25 / 55.51 ±2.05 / 58.81 ms │        53.15 / 54.23 ±0.72 / 54.96 ms │ no change │
│ QQuery 10 │        80.67 / 82.08 ±1.67 / 85.32 ms │        81.02 / 81.34 ±0.21 / 81.55 ms │ no change │
│ QQuery 11 │     314.18 / 317.53 ±4.18 / 325.76 ms │     310.62 / 316.44 ±4.99 / 322.48 ms │ no change │
│ QQuery 12 │        28.43 / 28.63 ±0.20 / 28.94 ms │        28.49 / 28.69 ±0.17 / 28.94 ms │ no change │
│ QQuery 13 │     125.87 / 126.80 ±1.06 / 128.82 ms │     126.23 / 126.71 ±0.54 / 127.68 ms │ no change │
│ QQuery 14 │     502.60 / 505.33 ±1.50 / 506.71 ms │     502.81 / 505.10 ±1.82 / 507.78 ms │ no change │
│ QQuery 15 │        60.38 / 61.69 ±1.71 / 65.03 ms │        60.49 / 61.18 ±0.66 / 62.17 ms │ no change │
│ QQuery 16 │           6.30 / 6.52 ±0.26 / 7.03 ms │           6.34 / 6.51 ±0.16 / 6.81 ms │ no change │
│ QQuery 17 │        80.24 / 81.16 ±0.67 / 82.22 ms │        80.30 / 80.98 ±0.58 / 81.73 ms │ no change │
│ QQuery 18 │     152.37 / 153.04 ±0.40 / 153.48 ms │     151.31 / 152.10 ±0.68 / 153.13 ms │ no change │
│ QQuery 19 │        40.79 / 41.20 ±0.25 / 41.56 ms │        41.17 / 41.46 ±0.21 / 41.79 ms │ no change │
│ QQuery 20 │        34.92 / 35.59 ±0.65 / 36.69 ms │        35.35 / 35.91 ±0.30 / 36.17 ms │ no change │
│ QQuery 21 │        16.69 / 16.98 ±0.24 / 17.33 ms │        16.78 / 17.01 ±0.23 / 17.45 ms │ no change │
│ QQuery 22 │        61.65 / 62.39 ±0.46 / 63.01 ms │        61.77 / 62.59 ±1.17 / 64.87 ms │ no change │
│ QQuery 23 │     480.28 / 482.80 ±1.89 / 485.52 ms │     479.56 / 482.95 ±3.05 / 487.66 ms │ no change │
│ QQuery 24 │     236.03 / 239.91 ±6.29 / 252.43 ms │     233.29 / 235.80 ±2.41 / 239.92 ms │ no change │
│ QQuery 25 │     114.42 / 114.91 ±0.67 / 116.14 ms │     112.30 / 114.90 ±1.52 / 116.61 ms │ no change │
│ QQuery 26 │        70.92 / 71.14 ±0.34 / 71.82 ms │        69.91 / 70.46 ±0.30 / 70.78 ms │ no change │
│ QQuery 27 │           6.42 / 6.56 ±0.16 / 6.87 ms │           6.47 / 6.64 ±0.22 / 7.08 ms │ no change │
│ QQuery 28 │        57.25 / 60.78 ±1.85 / 62.74 ms │        57.93 / 61.09 ±1.61 / 62.34 ms │ no change │
│ QQuery 29 │      98.46 / 100.38 ±2.61 / 105.53 ms │      98.98 / 101.66 ±3.64 / 108.85 ms │ no change │
│ QQuery 30 │        30.12 / 30.48 ±0.31 / 31.00 ms │        29.94 / 30.30 ±0.30 / 30.83 ms │ no change │
│ QQuery 31 │     111.59 / 113.79 ±2.44 / 118.38 ms │     111.55 / 112.71 ±1.71 / 116.05 ms │ no change │
│ QQuery 32 │        20.35 / 20.93 ±0.34 / 21.38 ms │        20.17 / 20.46 ±0.27 / 20.79 ms │ no change │
│ QQuery 33 │        38.68 / 39.14 ±0.35 / 39.58 ms │        38.31 / 38.57 ±0.20 / 38.80 ms │ no change │
│ QQuery 34 │           9.29 / 9.58 ±0.29 / 9.98 ms │          9.20 / 9.57 ±0.36 / 10.20 ms │ no change │
│ QQuery 35 │        80.49 / 81.05 ±0.48 / 81.93 ms │        80.76 / 81.51 ±0.49 / 82.23 ms │ no change │
│ QQuery 36 │           5.75 / 5.91 ±0.17 / 6.25 ms │           5.89 / 6.01 ±0.15 / 6.30 ms │ no change │
│ QQuery 37 │           6.78 / 6.95 ±0.11 / 7.06 ms │           6.72 / 6.94 ±0.27 / 7.42 ms │ no change │
│ QQuery 38 │        68.29 / 69.85 ±1.23 / 71.85 ms │        68.57 / 69.09 ±0.38 / 69.67 ms │ no change │
│ QQuery 39 │        98.14 / 98.47 ±0.32 / 99.05 ms │        97.99 / 98.65 ±0.51 / 99.53 ms │ no change │
│ QQuery 40 │        22.81 / 23.33 ±0.79 / 24.91 ms │        23.09 / 23.27 ±0.15 / 23.51 ms │ no change │
│ QQuery 41 │        11.03 / 11.79 ±1.11 / 13.97 ms │        11.24 / 11.71 ±0.44 / 12.29 ms │ no change │
│ QQuery 42 │        23.77 / 24.22 ±0.29 / 24.69 ms │        24.09 / 24.42 ±0.37 / 25.05 ms │ no change │
│ QQuery 43 │           4.64 / 4.76 ±0.17 / 5.09 ms │           4.80 / 4.91 ±0.17 / 5.24 ms │ no change │
│ QQuery 44 │        10.51 / 10.58 ±0.07 / 10.71 ms │        10.60 / 10.85 ±0.17 / 11.09 ms │ no change │
│ QQuery 45 │        39.70 / 40.69 ±0.70 / 41.49 ms │        40.41 / 40.86 ±0.33 / 41.33 ms │ no change │
│ QQuery 46 │        12.83 / 13.15 ±0.27 / 13.55 ms │        12.69 / 12.87 ±0.16 / 13.12 ms │ no change │
│ QQuery 47 │     229.18 / 232.49 ±2.73 / 236.09 ms │     228.68 / 231.85 ±1.91 / 233.47 ms │ no change │
│ QQuery 48 │     102.74 / 103.73 ±0.80 / 104.88 ms │     103.30 / 103.95 ±0.91 / 105.69 ms │ no change │
│ QQuery 49 │        78.52 / 79.33 ±0.62 / 80.41 ms │        78.96 / 79.36 ±0.45 / 80.18 ms │ no change │
│ QQuery 50 │        59.63 / 60.33 ±0.37 / 60.68 ms │        59.40 / 59.73 ±0.22 / 60.07 ms │ no change │
│ QQuery 51 │        92.53 / 95.36 ±1.81 / 97.37 ms │       92.09 / 95.60 ±4.02 / 102.85 ms │ no change │
│ QQuery 52 │        23.93 / 24.34 ±0.37 / 25.02 ms │        23.81 / 24.28 ±0.38 / 24.74 ms │ no change │
│ QQuery 53 │        29.46 / 29.71 ±0.16 / 29.96 ms │        29.04 / 29.30 ±0.20 / 29.57 ms │ no change │
│ QQuery 54 │        53.72 / 54.35 ±0.36 / 54.67 ms │        54.44 / 55.05 ±0.44 / 55.75 ms │ no change │
│ QQuery 55 │        23.52 / 24.34 ±1.07 / 26.44 ms │        23.21 / 23.56 ±0.24 / 23.91 ms │ no change │
│ QQuery 56 │        39.01 / 39.29 ±0.26 / 39.78 ms │        38.43 / 38.84 ±0.28 / 39.20 ms │ no change │
│ QQuery 57 │     178.37 / 180.09 ±1.83 / 183.56 ms │     175.76 / 176.93 ±1.12 / 178.61 ms │ no change │
│ QQuery 58 │     118.44 / 118.92 ±0.37 / 119.44 ms │     117.20 / 117.99 ±0.54 / 118.58 ms │ no change │
│ QQuery 59 │     117.61 / 119.84 ±2.42 / 123.96 ms │     117.87 / 118.49 ±0.72 / 119.91 ms │ no change │
│ QQuery 60 │        39.05 / 39.84 ±0.52 / 40.49 ms │        38.74 / 39.22 ±0.42 / 39.82 ms │ no change │
│ QQuery 61 │        12.52 / 12.63 ±0.10 / 12.81 ms │        12.61 / 12.79 ±0.24 / 13.25 ms │ no change │
│ QQuery 62 │        46.89 / 47.53 ±0.38 / 47.98 ms │        46.32 / 46.78 ±0.48 / 47.68 ms │ no change │
│ QQuery 63 │        29.70 / 31.05 ±2.04 / 35.10 ms │        29.42 / 29.66 ±0.25 / 30.09 ms │ no change │
│ QQuery 64 │     462.94 / 468.99 ±7.22 / 482.49 ms │     458.19 / 462.97 ±5.14 / 471.10 ms │ no change │
│ QQuery 65 │     146.94 / 150.20 ±2.58 / 152.60 ms │     149.10 / 151.28 ±1.81 / 153.79 ms │ no change │
│ QQuery 66 │        78.92 / 81.62 ±3.98 / 89.53 ms │        78.42 / 80.72 ±2.20 / 84.79 ms │ no change │
│ QQuery 67 │     245.43 / 251.17 ±5.20 / 259.23 ms │     248.87 / 251.06 ±2.35 / 255.17 ms │ no change │
│ QQuery 68 │        12.91 / 13.11 ±0.22 / 13.53 ms │        13.06 / 13.23 ±0.15 / 13.51 ms │ no change │
│ QQuery 69 │        76.13 / 79.82 ±4.99 / 89.27 ms │        76.72 / 77.40 ±0.76 / 78.84 ms │ no change │
│ QQuery 70 │     106.87 / 110.44 ±3.00 / 115.47 ms │     105.01 / 109.92 ±6.93 / 123.68 ms │ no change │
│ QQuery 71 │        35.58 / 35.96 ±0.27 / 36.41 ms │        35.37 / 36.07 ±0.57 / 36.94 ms │ no change │
│ QQuery 72 │ 2132.94 / 2183.71 ±42.17 / 2236.73 ms │ 2111.52 / 2158.63 ±35.37 / 2214.83 ms │ no change │
│ QQuery 73 │           9.03 / 9.24 ±0.22 / 9.65 ms │           9.11 / 9.30 ±0.17 / 9.63 ms │ no change │
│ QQuery 74 │     177.63 / 180.03 ±2.51 / 183.70 ms │     175.98 / 181.10 ±5.65 / 191.23 ms │ no change │
│ QQuery 75 │     145.51 / 146.86 ±1.25 / 149.13 ms │     146.38 / 147.93 ±1.47 / 150.33 ms │ no change │
│ QQuery 76 │        35.58 / 35.88 ±0.29 / 36.38 ms │        35.34 / 35.87 ±0.46 / 36.52 ms │ no change │
│ QQuery 77 │        59.95 / 60.42 ±0.49 / 61.35 ms │        60.26 / 60.70 ±0.42 / 61.43 ms │ no change │
│ QQuery 78 │     188.05 / 191.81 ±4.01 / 199.22 ms │     187.70 / 191.78 ±3.19 / 195.27 ms │ no change │
│ QQuery 79 │        67.60 / 68.26 ±0.66 / 69.06 ms │        66.72 / 67.34 ±0.46 / 68.11 ms │ no change │
│ QQuery 80 │     100.87 / 101.21 ±0.28 / 101.63 ms │     100.03 / 101.28 ±1.64 / 104.53 ms │ no change │
│ QQuery 81 │        24.09 / 24.30 ±0.12 / 24.43 ms │        24.27 / 24.51 ±0.17 / 24.74 ms │ no change │
│ QQuery 82 │        16.37 / 16.55 ±0.19 / 16.91 ms │        16.57 / 16.72 ±0.17 / 17.03 ms │ no change │
│ QQuery 83 │        36.90 / 38.82 ±2.29 / 43.14 ms │        37.15 / 37.48 ±0.40 / 38.16 ms │ no change │
│ QQuery 84 │        43.61 / 43.96 ±0.34 / 44.57 ms │        44.04 / 45.58 ±1.86 / 49.12 ms │ no change │
│ QQuery 85 │     135.59 / 136.80 ±1.39 / 139.37 ms │     136.81 / 137.88 ±0.89 / 139.18 ms │ no change │
│ QQuery 86 │        24.79 / 25.64 ±0.98 / 27.53 ms │        25.15 / 25.53 ±0.22 / 25.76 ms │ no change │
│ QQuery 87 │        68.82 / 70.07 ±0.82 / 71.02 ms │        69.29 / 70.06 ±0.61 / 71.17 ms │ no change │
│ QQuery 88 │        61.33 / 61.95 ±0.45 / 62.63 ms │        62.63 / 63.33 ±0.57 / 64.00 ms │ no change │
│ QQuery 89 │        35.25 / 35.73 ±0.26 / 36.03 ms │        35.55 / 36.03 ±0.26 / 36.27 ms │ no change │
│ QQuery 90 │        16.65 / 16.84 ±0.17 / 17.15 ms │        16.98 / 17.23 ±0.20 / 17.48 ms │ no change │
│ QQuery 91 │        52.24 / 54.16 ±2.38 / 58.84 ms │        52.30 / 53.65 ±1.75 / 57.02 ms │ no change │
│ QQuery 92 │        29.88 / 30.31 ±0.50 / 31.18 ms │        29.84 / 30.43 ±0.40 / 30.97 ms │ no change │
│ QQuery 93 │        50.10 / 50.89 ±0.46 / 51.43 ms │        50.33 / 51.35 ±0.74 / 52.58 ms │ no change │
│ QQuery 94 │        37.79 / 38.44 ±0.65 / 39.69 ms │        37.56 / 38.59 ±0.57 / 39.14 ms │ no change │
│ QQuery 95 │        85.10 / 85.88 ±0.49 / 86.60 ms │        84.23 / 85.24 ±0.88 / 86.74 ms │ no change │
│ QQuery 96 │        24.11 / 24.27 ±0.22 / 24.69 ms │        24.29 / 24.50 ±0.19 / 24.83 ms │ no change │
│ QQuery 97 │        45.50 / 46.04 ±0.35 / 46.55 ms │        46.05 / 46.31 ±0.25 / 46.74 ms │ no change │
│ QQuery 98 │        41.76 / 42.66 ±0.54 / 43.28 ms │        42.44 / 43.15 ±0.62 / 43.97 ms │ no change │
│ QQuery 99 │        70.01 / 70.78 ±0.61 / 71.75 ms │        69.97 / 70.68 ±0.58 / 71.49 ms │ no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 10498.86ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 10448.87ms │
│ Average Time (HEAD)                               │   106.05ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   105.54ms │
│ Queries Faster                                    │          0 │
│ Queries Slower                                    │          0 │
│ Queries with No Change                            │         99 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	6.9 GiB
Avg memory	6.1 GiB
CPU user	234.8s
CPU sys	6.0s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	7.0 GiB
Avg memory	6.3 GiB
CPU user	232.0s
CPU sys	5.8s
Peak spill	0 B

adriangbot · 2026-05-22T05:47:32Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515459477-277-x4d6g 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete

adriangbot · 2026-05-22T05:50:29Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.13 / 4.63 ±6.95 / 18.54 ms │          1.11 / 4.59 ±6.90 / 18.39 ms │     no change │
│ QQuery 1  │        12.84 / 12.94 ±0.08 / 13.04 ms │        12.42 / 12.73 ±0.18 / 12.95 ms │     no change │
│ QQuery 2  │        35.38 / 35.81 ±0.30 / 36.23 ms │        35.63 / 35.87 ±0.28 / 36.35 ms │     no change │
│ QQuery 3  │        30.33 / 30.94 ±0.92 / 32.76 ms │        30.38 / 30.89 ±0.60 / 32.06 ms │     no change │
│ QQuery 4  │     221.08 / 222.52 ±1.32 / 224.77 ms │     217.25 / 223.77 ±4.47 / 230.65 ms │     no change │
│ QQuery 5  │     271.34 / 273.02 ±1.66 / 276.10 ms │     267.27 / 271.69 ±2.28 / 273.41 ms │     no change │
│ QQuery 6  │           1.16 / 1.31 ±0.22 / 1.75 ms │           1.18 / 1.32 ±0.21 / 1.73 ms │     no change │
│ QQuery 7  │        13.82 / 14.01 ±0.16 / 14.24 ms │        14.29 / 14.44 ±0.14 / 14.67 ms │     no change │
│ QQuery 8  │     319.01 / 323.19 ±3.92 / 329.01 ms │     342.38 / 356.25 ±7.86 / 364.96 ms │  1.10x slower │
│ QQuery 9  │     446.98 / 451.24 ±3.38 / 455.61 ms │    463.18 / 475.35 ±10.89 / 489.79 ms │  1.05x slower │
│ QQuery 10 │        68.91 / 69.79 ±0.81 / 71.12 ms │        70.01 / 71.13 ±0.70 / 71.92 ms │     no change │
│ QQuery 11 │        80.17 / 81.70 ±1.04 / 83.41 ms │        80.46 / 81.98 ±1.59 / 84.88 ms │     no change │
│ QQuery 12 │     262.57 / 271.05 ±6.00 / 278.82 ms │     263.13 / 272.47 ±6.80 / 283.05 ms │     no change │
│ QQuery 13 │    360.50 / 369.16 ±10.13 / 388.45 ms │     357.78 / 364.59 ±3.90 / 369.19 ms │     no change │
│ QQuery 14 │     276.41 / 281.73 ±4.61 / 289.96 ms │     280.05 / 286.88 ±6.92 / 298.91 ms │     no change │
│ QQuery 15 │    264.25 / 276.30 ±11.55 / 292.24 ms │    294.81 / 307.60 ±14.15 / 333.58 ms │  1.11x slower │
│ QQuery 16 │    627.27 / 659.89 ±17.60 / 680.12 ms │    616.34 / 643.05 ±19.90 / 673.10 ms │     no change │
│ QQuery 17 │    624.54 / 639.67 ±10.36 / 653.36 ms │     602.95 / 613.57 ±9.74 / 631.66 ms │     no change │
│ QQuery 18 │ 1231.71 / 1264.30 ±32.09 / 1324.58 ms │ 1230.87 / 1249.03 ±13.77 / 1269.77 ms │     no change │
│ QQuery 19 │        29.95 / 33.49 ±3.93 / 38.81 ms │        27.88 / 34.23 ±7.63 / 44.83 ms │     no change │
│ QQuery 20 │     531.85 / 542.39 ±8.77 / 552.84 ms │     518.59 / 522.88 ±3.22 / 527.36 ms │     no change │
│ QQuery 21 │     590.04 / 600.69 ±8.53 / 615.62 ms │     589.86 / 595.16 ±4.54 / 603.56 ms │     no change │
│ QQuery 22 │  1047.46 / 1051.15 ±3.07 / 1054.60 ms │ 1085.28 / 1112.76 ±21.95 / 1146.31 ms │  1.06x slower │
│ QQuery 23 │ 3143.81 / 3215.59 ±61.28 / 3327.12 ms │ 3009.38 / 3115.16 ±74.14 / 3210.79 ms │     no change │
│ QQuery 24 │        43.61 / 45.10 ±1.49 / 47.69 ms │        41.73 / 49.54 ±6.86 / 58.93 ms │  1.10x slower │
│ QQuery 25 │     116.41 / 117.72 ±0.81 / 118.72 ms │     112.35 / 113.26 ±0.66 / 114.42 ms │     no change │
│ QQuery 26 │        44.00 / 45.32 ±1.39 / 47.91 ms │        41.64 / 42.41 ±0.68 / 43.37 ms │ +1.07x faster │
│ QQuery 27 │     668.27 / 677.75 ±9.71 / 696.11 ms │     666.49 / 674.74 ±7.19 / 686.27 ms │     no change │
│ QQuery 28 │ 3004.40 / 3024.76 ±13.42 / 3039.76 ms │ 3014.79 / 3056.61 ±36.42 / 3111.94 ms │     no change │
│ QQuery 29 │       40.04 / 51.77 ±14.67 / 76.55 ms │       41.49 / 49.56 ±15.74 / 81.03 ms │     no change │
│ QQuery 30 │     325.79 / 329.77 ±3.21 / 335.46 ms │    311.00 / 329.53 ±13.27 / 351.40 ms │     no change │
│ QQuery 31 │     296.89 / 313.26 ±9.84 / 323.06 ms │    278.41 / 291.30 ±11.42 / 310.81 ms │ +1.08x faster │
│ QQuery 32 │     934.13 / 949.51 ±9.91 / 963.93 ms │    914.90 / 929.16 ±15.07 / 958.29 ms │     no change │
│ QQuery 33 │ 1410.05 / 1512.63 ±79.05 / 1623.38 ms │ 1406.57 / 1430.30 ±12.88 / 1443.94 ms │ +1.06x faster │
│ QQuery 34 │ 1422.56 / 1450.90 ±17.81 / 1474.31 ms │ 1424.50 / 1486.88 ±73.50 / 1624.17 ms │     no change │
│ QQuery 35 │    272.88 / 291.31 ±26.72 / 344.13 ms │    276.79 / 319.68 ±35.69 / 379.77 ms │  1.10x slower │
│ QQuery 36 │        61.63 / 71.05 ±7.09 / 80.23 ms │      65.89 / 82.70 ±12.28 / 102.66 ms │  1.16x slower │
│ QQuery 37 │        34.65 / 35.40 ±0.76 / 36.80 ms │        36.38 / 40.19 ±5.70 / 51.48 ms │  1.14x slower │
│ QQuery 38 │        42.14 / 46.54 ±3.98 / 52.83 ms │        41.07 / 43.14 ±1.72 / 45.06 ms │ +1.08x faster │
│ QQuery 39 │     144.74 / 151.58 ±5.96 / 160.59 ms │     151.07 / 153.66 ±2.72 / 158.21 ms │     no change │
│ QQuery 40 │        13.68 / 16.08 ±3.85 / 23.74 ms │        15.04 / 15.47 ±0.36 / 15.96 ms │     no change │
│ QQuery 41 │        13.32 / 13.46 ±0.13 / 13.69 ms │        14.42 / 16.08 ±3.04 / 22.15 ms │  1.19x slower │
│ QQuery 42 │        12.88 / 14.68 ±3.42 / 21.51 ms │        14.07 / 14.26 ±0.12 / 14.40 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 19885.11ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 19835.88ms │
│ Average Time (HEAD)                               │   462.44ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   461.30ms │
│ Queries Faster                                    │          4 │
│ Queries Slower                                    │          9 │
│ Queries with No Change                            │         30 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	29.8 GiB
Avg memory	22.9 GiB
CPU user	1032.4s
CPU sys	67.0s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	30.6 GiB
Avg memory	23.2 GiB
CPU user	1033.3s
CPU sys	67.3s
Peak spill	0 B

Copilot

Pull request overview

This PR adds runtime row-group pruning for Parquet scans driven by TopK’s dynamic filter, closing the gap where row groups selected at file open couldn’t be re-pruned after the TopK threshold tightens during execution.

Changes:

Introduces a runtime RowGroupPruner that re-evaluates a dynamic predicate at decoder-run boundaries and skips row groups proven unreachable.
Forces per-row-group decoder splitting when the predicate is dynamic so the runtime pruner has a boundary at every RG.
Adds observability: dynamic_rg_pruning=eligible in EXPLAIN and a new metric row_groups_pruned_dynamic_filter in EXPLAIN ANALYZE, plus tests/SLTs updated accordingly.
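The runtime pruning idea summarized above can be illustrated with a minimal, standalone sketch. It does not use DataFusion's actual types: `RowGroupStats` and `scan_with_runtime_pruning` are hypothetical names, and the "dynamic filter" is modeled as the current k-th smallest value of an `ORDER BY col ASC LIMIT k` heap. Under those assumptions, the threshold is re-read at every row-group boundary, and a row group whose minimum cannot beat it is skipped without being decoded.

```rust
use std::collections::BinaryHeap;

/// Min/max statistics for the sort column of one row group.
/// Hypothetical simplified stand-in for Parquet row-group statistics.
#[derive(Debug, Clone, Copy)]
pub struct RowGroupStats {
    pub min: i64,
    pub max: i64,
}

/// Simulates `ORDER BY col ASC LIMIT k` over a sequence of row groups,
/// re-checking the TopK threshold at every row-group boundary.
/// Returns `(decoded, skipped)` row-group indices.
pub fn scan_with_runtime_pruning(
    row_groups: &[(RowGroupStats, Vec<i64>)],
    k: usize,
) -> (Vec<usize>, Vec<usize>) {
    // Max-heap holding the k smallest values seen so far.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut decoded = Vec::new();
    let mut skipped = Vec::new();

    for (idx, (stats, values)) in row_groups.iter().enumerate() {
        // Until the heap is full there is no threshold, which corresponds
        // to the dynamic filter still being `lit(true)`.
        let threshold = if k > 0 && heap.len() == k {
            heap.peek().copied()
        } else {
            None
        };

        // The filter is `col < threshold`; if even the minimum fails it,
        // no row in this row group can enter the TopK.
        if let Some(t) = threshold {
            if stats.min >= t {
                skipped.push(idx);
                continue;
            }
        }

        // "Decode" the row group and feed its values to the TopK heap.
        decoded.push(idx);
        for &v in values {
            if heap.len() < k {
                heap.push(v);
            } else if let Some(&top) = heap.peek() {
                if v < top {
                    heap.pop();
                    heap.push(v);
                }
            }
        }
    }
    (decoded, skipped)
}
```

In this sketch the first row group is always decoded, because no threshold exists yet. That mirrors why a pruning decision made once at file open cannot help: the useful threshold only appears after some data has been read.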

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

File	Description
datafusion/datasource-parquet/src/push_decoder.rs	Adds `RowGroupPruner`, tracks row-group indices per decoder run, and skips prunable runs at runtime.
datafusion/datasource-parquet/src/opener/mod.rs	Forces per-RG runs for dynamic predicates; wires pending runs + runtime pruner into `PushDecoderStreamState`.
datafusion/datasource-parquet/src/access_plan.rs	Extends `split_runs` with `force_per_row_group` to avoid coalescing runs for dynamic predicates.
datafusion/datasource-parquet/src/source.rs	Adds `dynamic_rg_pruning=eligible` marker in `fmt_extra` and unit tests for the marker.
datafusion/datasource-parquet/src/row_group_filter.rs	Exposes `RowGroupPruningStatistics` to reuse stats adapter for runtime pruning.
datafusion/datasource-parquet/src/metrics.rs	Adds `row_groups_pruned_dynamic_filter` metric to `ParquetFileMetrics`.
datafusion/core/tests/parquet/mod.rs	Adds helper to read `row_groups_pruned_dynamic_filter` from metrics.
datafusion/core/tests/parquet/dynamic_row_group_pruning.rs	New integration tests validating metric fires for TopK and stays quiet otherwise.
datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt	New SLT covering both `EXPLAIN` marker and `EXPLAIN ANALYZE` metric value.
datafusion/sqllogictest/test_files/topk.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/statistics_registry.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/sort_pushdown.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/repartition_subset_satisfaction.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/push_down_filter_regression.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/push_down_filter_parquet.slt	Updates expected plans/metrics to include `dynamic_rg_pruning=eligible` and (where relevant) the new counter.
datafusion/sqllogictest/test_files/projection_pushdown.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/preserve_file_partitioning.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/limit.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/limit_pruning.slt	Updates expected metrics to include `row_groups_pruned_dynamic_filter=0` plus eligibility marker.
datafusion/sqllogictest/test_files/explain_analyze.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt	Updates expected plans/metrics to include eligibility marker and `row_groups_pruned_dynamic_filter=0` where applicable.
datafusion/sqllogictest/test_files/clickbench.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.

Comments suppressed due to low confidence (1)

datafusion/datasource-parquet/src/access_plan.rs:458

split_runs computes row_group_needs_filter as !fully_matched without considering the needs_filter argument. When force_per_row_group=true and the scan has no row filter (needs_filter=false), this will still mark all runs as needs_filter=true, causing the opener to treat them as filtered runs (e.g. attempting to fetch row filters / applying predicate-cache settings) even though no row-level filter exists. row_group_needs_filter should be derived as needs_filter && !fully_matched so the run metadata stays consistent with the caller’s capabilities.

        for (idx, (access, fully_matched)) in
            row_groups.into_iter().zip(fully_matched).enumerate()
        {
            if !access.should_scan() {
                continue;
            }

            let row_group_needs_filter = !fully_matched;
            // Coalesce consecutive RGs into a run only when (a) they share
            // the same filter requirement and (b) we're not forcing per-RG
            // splitting for runtime pruning.
            let can_coalesce = !force_per_row_group;
            if can_coalesce
                && let Some(run) = runs
                    .last_mut()
                    .filter(|run| run.needs_filter == row_group_needs_filter)
            {
                run.access_plan.set(idx, access);
                if fully_matched {
                    run.access_plan.mark_fully_matched(idx);
                }
            } else {
                let mut run_plan = ParquetAccessPlan::new_none(num_row_groups);
                run_plan.set(idx, access);
                if fully_matched {
                    run_plan.mark_fully_matched(idx);
                }
                runs.push(RowGroupRun::new(row_group_needs_filter, run_plan));
            }
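
The consistency fix suggested in this comment can be sketched in isolation. The version below is a simplified model, not the code in `access_plan.rs`: `Run` and the slice-based signature are hypothetical stand-ins for `RowGroupRun` and `ParquetAccessPlan`. It only shows the effect of deriving the per-row-group flag as `needs_filter && !fully_matched` while still honoring `force_per_row_group`.

```rust
/// Simplified stand-in for a decoder run (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Run {
    pub needs_filter: bool,
    pub row_groups: Vec<usize>,
}

/// Splits scannable row groups into runs.
/// - `needs_filter`: whether the scan has a row-level filter at all.
/// - `force_per_row_group`: emit one run per row group (dynamic predicate).
pub fn split_runs(
    should_scan: &[bool],
    fully_matched: &[bool],
    needs_filter: bool,
    force_per_row_group: bool,
) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (idx, (&scan, &matched)) in should_scan.iter().zip(fully_matched).enumerate() {
        if !scan {
            continue;
        }
        // A run can only need filtering if the caller has a filter to apply.
        let rg_needs_filter = needs_filter && !matched;

        // Coalesce only when not forcing per-RG runs and the previous run
        // has the same filter requirement.
        let can_coalesce = !force_per_row_group
            && runs
                .last()
                .map_or(false, |run| run.needs_filter == rg_needs_filter);

        if can_coalesce {
            if let Some(run) = runs.last_mut() {
                run.row_groups.push(idx);
            }
        } else {
            runs.push(Run {
                needs_filter: rg_needs_filter,
                row_groups: vec![idx],
            });
        }
    }
    runs
}
```

With `needs_filter = false` and `force_per_row_group = true`, this sketch yields one run per row group and every run carries `needs_filter == false`. That is the consistent state the comment asks for.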

adriangbot · 2026-05-22T05:57:44Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1    │ FAIL │                               FAIL │ incomparable │
│ Q2    │ FAIL │                               FAIL │ incomparable │
│ Q3    │ FAIL │                               FAIL │ incomparable │
│ Q4    │ FAIL │                               FAIL │ incomparable │
└───────┴──────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                 ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                 │ 0.00ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 0.00ms │
│ Average Time (HEAD)                               │ 0.00ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │ 0.00ms │
│ Queries Faster                                    │      0 │
│ Queries Slower                                    │      0 │
│ Queries with No Change                            │      0 │
│ Queries with Failure                              │      4 │
└───────────────────────────────────────────────────┴────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	3.7 GiB
Avg memory	3.7 GiB
CPU user	0.2s
CPU sys	0.1s
Peak spill	0 B

sort_pushdown_inexact — branch

Metric	Value
Wall time	5.0s
Peak memory	3.7 GiB
Avg memory	3.7 GiB
CPU user	0.1s
CPU sys	0.1s
Peak spill	0 B

Per Copilot review on apache#22450: `RowGroupPruner` was using a single `predicate_creation_errors` counter for both predicate construction (`build_pruning_predicate`) AND predicate evaluation (`PruningPredicate::prune`) failures. The log message also said "Ignoring error building..." when the failure was during evaluation.

This misattributed evaluation failures and made the metric semantics inconsistent with the static row-group pruning path in `RowGroupAccessPlanFilter::prune_by_statistics`, which already separates the two.

`RowGroupPruner::new` now takes both counters:

- `predicate_creation_errors`: bumped on `build_pruning_predicate` failures. Wired to `prepared.predicate_creation_errors` from the opener, the same field the static path uses.
- `predicate_evaluation_errors`: bumped on `PruningPredicate::prune` failures. Wired to `prepared.file_metrics.predicate_evaluation_errors`, the same field the static `prune_by_statistics` path uses, so the two paths accumulate into a shared counter.

The error log message is updated to say "evaluating" so the metric and the log agree.
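
A rough sketch of the counter split described in this commit message follows. It is an illustration under assumptions, not the real `RowGroupPruner`: `RuntimePruner`, its closure-based `should_prune` signature, and the `i64` "predicate" are all hypothetical. The sketch shows three behaviors from the PR: the predicate is rebuilt only when the generation changes, build and evaluation failures increment different counters, and any error falls back to "don't prune".

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical runtime pruner; an `i64` threshold stands in for the
/// cached pruning predicate.
pub struct RuntimePruner {
    cached_generation: Option<u64>,
    cached_predicate: Option<i64>,
    /// Number of rebuild attempts, exposed for illustration only.
    pub rebuilds: usize,
    creation_errors: Arc<AtomicUsize>,
    evaluation_errors: Arc<AtomicUsize>,
}

impl RuntimePruner {
    pub fn new(
        creation_errors: Arc<AtomicUsize>,
        evaluation_errors: Arc<AtomicUsize>,
    ) -> Self {
        Self {
            cached_generation: None,
            cached_predicate: None,
            rebuilds: 0,
            creation_errors,
            evaluation_errors,
        }
    }

    /// Returns `true` only when evaluation proves the row group prunable.
    /// `build` is called only if `generation` differs from the cached one.
    pub fn should_prune(
        &mut self,
        generation: u64,
        build: impl FnOnce() -> Result<i64, String>,
        eval: impl FnOnce(i64) -> Result<bool, String>,
    ) -> bool {
        if self.cached_generation != Some(generation) {
            self.rebuilds += 1;
            self.cached_generation = Some(generation);
            self.cached_predicate = match build() {
                Ok(p) => Some(p),
                Err(_) => {
                    // Construction failure: counted separately.
                    self.creation_errors.fetch_add(1, Ordering::Relaxed);
                    None
                }
            };
        }
        let Some(predicate) = self.cached_predicate else {
            return false; // no usable predicate: never prune
        };
        match eval(predicate) {
            Ok(prune) => prune,
            Err(_) => {
                // Evaluation failure: separate counter, and keep the data.
                self.evaluation_errors.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }
}
```

Passing the counters in as `Arc<AtomicUsize>` is one way to let the static and runtime paths accumulate into the same metric. Whether the real metrics types are shared this way is an assumption here.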

zhuqi-lucas · 2026-05-22T06:05:02Z

run benchmark sort_pushdown_inexact

adriangbot · 2026-05-22T06:08:17Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515622680-278-bcnbp 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (0828f1b) to a8f03fd (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-22T06:23:45Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1    │ FAIL │                               FAIL │ incomparable │
│ Q2    │ FAIL │                               FAIL │ incomparable │
│ Q3    │ FAIL │                               FAIL │ incomparable │
│ Q4    │ FAIL │                               FAIL │ incomparable │
└───────┴──────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                 ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                 │ 0.00ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 0.00ms │
│ Average Time (HEAD)                               │ 0.00ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │ 0.00ms │
│ Queries Faster                                    │      0 │
│ Queries Slower                                    │      0 │
│ Queries with No Change                            │      0 │
│ Queries with Failure                              │      4 │
└───────────────────────────────────────────────────┴────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	4.1 GiB
Avg memory	4.1 GiB
CPU user	0.1s
CPU sys	0.1s
Peak spill	0 B

sort_pushdown_inexact — branch

Metric	Value
Wall time	5.0s
Peak memory	4.1 GiB
Avg memory	4.1 GiB
CPU user	0.1s
CPU sys	0.1s
Peak spill	0 B

zhuqi-lucas · 2026-05-22T06:38:23Z

run benchmark topk_tpch

adriangbot · 2026-05-22T06:41:37Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515872406-280-dfq5d 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (0828f1b) to a8f03fd (merge-base) diff using: topk_tpch
Results will be posted here when complete

adriangbot · 2026-05-22T06:51:42Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.14 / 2.74 ±0.76 / 4.10 ms │        2.12 / 2.79 ±0.68 / 4.02 ms │     no change │
│ Q2    │ 10.66 / 11.36 ±0.68 / 12.23 ms │        2.81 / 3.61 ±0.87 / 4.72 ms │ +3.15x faster │
│ Q3    │ 31.77 / 32.15 ±0.43 / 32.83 ms │     31.71 / 31.92 ±0.16 / 32.18 ms │     no change │
│ Q4    │ 11.83 / 12.29 ±0.77 / 13.82 ms │        3.13 / 3.25 ±0.13 / 3.48 ms │ +3.78x faster │
│ Q5    │  9.94 / 10.14 ±0.18 / 10.46 ms │      9.95 / 10.02 ±0.05 / 10.09 ms │     no change │
│ Q6    │ 17.19 / 17.39 ±0.15 / 17.56 ms │     17.11 / 17.36 ±0.37 / 18.09 ms │     no change │
│ Q7    │ 37.07 / 38.08 ±1.17 / 40.08 ms │     37.00 / 37.41 ±0.37 / 38.07 ms │     no change │
│ Q8    │ 28.13 / 28.59 ±0.60 / 29.71 ms │        6.86 / 7.16 ±0.42 / 7.98 ms │ +3.99x faster │
│ Q9    │ 35.34 / 36.86 ±1.54 / 38.77 ms │        8.36 / 8.50 ±0.08 / 8.60 ms │ +4.34x faster │
│ Q10   │ 54.13 / 55.29 ±1.83 / 58.93 ms │     12.77 / 13.00 ±0.45 / 13.89 ms │ +4.25x faster │
│ Q11   │    3.75 / 3.91 ±0.11 / 4.05 ms │        3.82 / 4.08 ±0.31 / 4.68 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 248.79ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 139.08ms │
│ Average Time (HEAD)                               │  22.62ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  12.64ms │
│ Queries Faster                                    │        5 │
│ Queries Slower                                    │        0 │
│ Queries with No Change                            │        6 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘
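The "Change" column above appears consistent with dividing the mean HEAD time by the mean branch time and rounding to two decimals. The bot's actual formula and its threshold for reporting "no change" are not shown in this thread, so the sketch below is an assumption; `speedup` and `format_change` are hypothetical helper names.

```rust
/// Ratio of base time to branch time; values above 1.0 mean the branch is faster.
/// Assumes both inputs are mean wall-clock times in the same unit.
fn speedup(base_ms: f64, branch_ms: f64) -> f64 {
    base_ms / branch_ms
}

/// Formats the ratio the way the table prints it, e.g. "3.15x".
fn format_change(base_ms: f64, branch_ms: f64) -> String {
    format!("{:.2}x", speedup(base_ms, branch_ms))
}
```

Applied to the means in the table, `format_change(11.36, 3.61)` gives `3.15x` for Q2 and `format_change(55.29, 13.00)` gives `4.25x` for Q10. The same ratio over the reported totals, `format_change(248.79, 139.08)`, gives `1.79x` for the suite as a whole.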

Resource Usage

topk_tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	4.9 GiB
Avg memory	4.5 GiB
CPU user	11.4s
CPU sys	1.1s
Peak spill	0 B

topk_tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	4.4 GiB
Avg memory	4.4 GiB
CPU user	6.5s
CPU sys	0.6s
Peak spill	0 B

zhuqi-lucas · 2026-05-22T07:11:11Z

#22450 (comment)

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.14 / 2.74 ±0.76 / 4.10 ms │        2.12 / 2.79 ±0.68 / 4.02 ms │     no change │
│ Q2    │ 10.66 / 11.36 ±0.68 / 12.23 ms │        2.81 / 3.61 ±0.87 / 4.72 ms │ +3.15x faster │
│ Q3    │ 31.77 / 32.15 ±0.43 / 32.83 ms │     31.71 / 31.92 ±0.16 / 32.18 ms │     no change │
│ Q4    │ 11.83 / 12.29 ±0.77 / 13.82 ms │        3.13 / 3.25 ±0.13 / 3.48 ms │ +3.78x faster │
│ Q5    │  9.94 / 10.14 ±0.18 / 10.46 ms │      9.95 / 10.02 ±0.05 / 10.09 ms │     no change │
│ Q6    │ 17.19 / 17.39 ±0.15 / 17.56 ms │     17.11 / 17.36 ±0.37 / 18.09 ms │     no change │
│ Q7    │ 37.07 / 38.08 ±1.17 / 40.08 ms │     37.00 / 37.41 ±0.37 / 38.07 ms │     no change │
│ Q8    │ 28.13 / 28.59 ±0.60 / 29.71 ms │        6.86 / 7.16 ±0.42 / 7.98 ms │ +3.99x faster │
│ Q9    │ 35.34 / 36.86 ±1.54 / 38.77 ms │        8.36 / 8.50 ±0.08 / 8.60 ms │ +4.34x faster │
│ Q10   │ 54.13 / 55.29 ±1.83 / 58.93 ms │     12.77 / 13.00 ±0.45 / 13.89 ms │ +4.25x faster │
│ Q11   │    3.75 / 3.91 ±0.11 / 4.05 ms │        3.82 / 4.08 ±0.31 / 4.68 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘

cc @alamb @adriangb @Dandandan
This matches my local tests; sort_pushdown_inexact should also improve a lot!

Dandandan · 2026-05-22T07:11:22Z

Nice, impressive 🚀🚀🚀

github-actions Bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate labels May 22, 2026

zhuqi-lucas changed the title ~~feat(parquet): apply TopK threshold to row-group statistics mid-scan~~ feat(parquet): runtime row-group early stop via TopK dynamic filter May 22, 2026

zhuqi-lucas added 2 commits May 22, 2026 11:57

zhuqi-lucas marked this pull request as ready for review May 22, 2026 05:46

Copilot AI review requested due to automatic review settings May 22, 2026 05:46

Copilot started reviewing on behalf of zhuqi-lucas May 22, 2026 05:46

Copilot AI reviewed May 22, 2026

Comment thread datafusion/datasource-parquet/src/opener/mod.rs

Comment thread datafusion/datasource-parquet/src/push_decoder.rs

zhuqi-lucas and others added 2 commits May 22, 2026 13:58

Merge branch 'main' into feat/topk-rg-level-dynamic-pruning

f3adbeb

zhuqi-lucas requested review from adriangb and alamb May 22, 2026 06:24
