
feat(parquet): runtime row-group early stop via TopK dynamic filter#22450

Open
zhuqi-lucas wants to merge 5 commits into apache:main from zhuqi-lucas:feat/topk-rg-level-dynamic-pruning

@zhuqi-lucas
Contributor

Which issue does this PR close?

Closes #22407.

Rationale for this change

DataFusion already prunes parquet at three granularities: file
(EarlyStoppingStream + FilePruner), row group at scan startup
(PruningPredicate + RowGroupAccessPlanFilter), and row inside an
open RG (RowFilter).

There's a gap in the middle: once Layer 1 (static row-group pruning)
picks the row groups at file open, that decision is frozen, because at
that point the dynamic filter is still lit(true). As TopK tightens its
threshold at runtime, subsequent RGs in the already-opened file keep
getting decoded even when their stats already prove they can't beat the
threshold. This is the dominant cost for ORDER BY ... LIMIT queries on
multi-RG files where file-level pruning can't help (a single large file,
or multiple files with scrambled RGs).
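
To make "stats prove they can't beat the threshold" concrete, here is a minimal sketch. It assumes an `ORDER BY x DESC LIMIT k` query over a non-nullable column, where TopK publishes its current k-th largest value as the threshold (conceptually the filter `x > threshold`). `RowGroupStats` and `can_skip_desc` are hypothetical names for illustration, not DataFusion APIs; nullable columns (for example under `NULLS FIRST`) would need extra handling.

```rust
/// Hypothetical max statistic for the sort column of one row group.
/// `None` models a row group whose statistics are missing.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    max: Option<i64>,
}

/// For `ORDER BY x DESC LIMIT k`, once the TopK heap is full, a row can only
/// enter the result if `x > threshold`. A row group is skippable only when its
/// statistics prove that no row can pass, i.e. `max <= threshold`.
fn can_skip_desc(stats: RowGroupStats, threshold: Option<i64>) -> bool {
    match (stats.max, threshold) {
        (Some(max), Some(t)) => max <= t,
        // No threshold yet (filter is still lit(true)) or no stats:
        // fall back to "don't prune".
        _ => false,
    }
}

fn main() {
    let threshold = Some(9_000); // assumed current k-th largest value
    let rgs = [
        RowGroupStats { max: Some(12_000) }, // may still contain winners
        RowGroupStats { max: Some(8_500) },  // provably cannot beat 9_000
        RowGroupStats { max: None },         // no stats: must keep
    ];
    for (i, rg) in rgs.iter().enumerate() {
        println!("rg {i}: skip = {}", can_skip_desc(*rg, threshold));
    }
    // Before the heap fills there is no threshold, so nothing is skipped.
    assert!(!can_skip_desc(rgs[1], None));
}
```

This prints `skip = true` only for the second row group; the other two are kept because they either may contain winners or cannot be reasoned about.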

See the issue for a full architectural diagram and a concrete trace
showing where the wasted I/O / decompression / decode lives.

What changes are included in this PR?

Two coordinated pieces that close the gap:

  1. RowGroupPruner (in datafusion/datasource-parquet/src/push_decoder.rs)
    mirrors FilePruner's pattern at row-group granularity. Tracks
    snapshot_generation(&predicate) so the cached PruningPredicate
    is rebuilt only when the dynamic filter has actually moved, then
    evaluates against the next pending decoder run's row-group stats
    via the existing RowGroupPruningStatistics adapter. Errors fall
    back to "don't prune" — a flaky pruning path never silently drops
    data.

  2. Per-row-group decoder splitting when the predicate is dynamic.
    ParquetAccessPlan::split_runs previously coalesced consecutive
    same-fully_matched RGs into a single run. For ORDER BY + LIMIT
    the initial dynamic filter is lit(true), so the static
    fully-matched analysis marks nothing and split_runs collapsed
    every RG into one run — leaving no inter-run hook. A new
    force_per_row_group flag (set by is_dynamic_physical_expr)
    disables coalescing for dynamic predicates only, so static
    WHERE queries pay nothing.
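
The coalescing rule and the effect of force_per_row_group can be modeled in a few lines. This is a hypothetical simplification, not the real split_runs signature: each selected row group is an `(index, fully_matched)` pair, and "consecutive" is assumed to mean adjacent in the selection order.

```rust
/// Hypothetical model of run splitting. Consecutive row groups with the same
/// `fully_matched` flag are coalesced into one run unless
/// `force_per_row_group` is set, in which case every row group gets its own run.
fn split_runs(row_groups: &[(usize, bool)], force_per_row_group: bool) -> Vec<Vec<usize>> {
    let mut runs: Vec<Vec<usize>> = Vec::new();
    let mut prev_matched: Option<bool> = None;
    for &(idx, fully_matched) in row_groups {
        let extend = !force_per_row_group && prev_matched == Some(fully_matched);
        if extend {
            // `prev_matched` is Some, so a current run already exists.
            runs.last_mut()
                .expect("a run exists after the first row group")
                .push(idx);
        } else {
            runs.push(vec![idx]);
        }
        prev_matched = Some(fully_matched);
    }
    runs
}

fn main() {
    // ORDER BY ... LIMIT: the dynamic filter starts as lit(true), so static
    // analysis marks no row group as fully matched.
    let rgs: Vec<(usize, bool)> = (0..5).map(|i| (i, false)).collect();
    println!("{:?}", split_runs(&rgs, false)); // [[0, 1, 2, 3, 4]]
    println!("{:?}", split_runs(&rgs, true)); // [[0], [1], [2], [3], [4]]
}
```

With coalescing, the lit(true) case produces a single run and therefore no boundary at which a runtime pruner could act; forcing per-RG runs produces one boundary per row group.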

PendingDecoderRun wraps each queued decoder with its row group
indices. PushDecoderStreamState::transition consults the pruner at
every run boundary and skips runs whose row groups are proved
unwinnable.
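
The run-boundary check amounts to draining a queue with a skip test before each decode. A sketch under stated assumptions: PendingDecoderRun here holds only row group indices (the real type also wraps a decoder), `drive` is a hypothetical helper, and the TopK heap is reduced to a sorted `Vec`. The example mirrors a five-RG `DESC LIMIT 5` scan, assuming the first run read holds the largest values.

```rust
use std::cell::Cell;
use std::collections::VecDeque;

/// Hypothetical stand-in for PendingDecoderRun: only the covered row groups.
struct PendingDecoderRun {
    row_groups: Vec<usize>,
}

/// Drains the queue run by run. At each boundary `can_skip` decides whether
/// all of the run's row groups are provably unwinnable; skipped runs are never
/// decoded. Returns the number of skipped row groups (what a counter like
/// row_groups_pruned_dynamic_filter would record).
fn drive<F, D>(mut queue: VecDeque<PendingDecoderRun>, mut can_skip: F, mut decode: D) -> usize
where
    F: FnMut(&[usize]) -> bool,
    D: FnMut(&[usize]),
{
    let mut pruned = 0;
    while let Some(run) = queue.pop_front() {
        if can_skip(run.row_groups.as_slice()) {
            pruned += run.row_groups.len();
        } else {
            decode(run.row_groups.as_slice());
        }
    }
    pruned
}

fn main() {
    // RG i holds the ten values (max_i - 9)..=max_i.
    let maxes = [500_i64, 400, 300, 200, 100];
    let queue: VecDeque<PendingDecoderRun> = (0..maxes.len())
        .map(|i| PendingDecoderRun { row_groups: vec![i] })
        .collect();

    let k = 5; // ORDER BY x DESC LIMIT 5
    let threshold: Cell<Option<i64>> = Cell::new(None); // None == lit(true)
    let mut top: Vec<i64> = Vec::new(); // simplified TopK heap

    let pruned = drive(
        queue,
        |rgs: &[usize]| match threshold.get() {
            Some(t) => rgs.iter().all(|&rg| maxes[rg] <= t),
            None => false,
        },
        |rgs: &[usize]| {
            for &rg in rgs {
                top.extend((maxes[rg] - 9)..=maxes[rg]);
            }
            top.sort_unstable_by(|a, b| b.cmp(a));
            top.truncate(k);
            if top.len() == k {
                threshold.set(Some(top[k - 1])); // tighten the filter
            }
        },
    );
    println!("pruned {pruned} of {} row groups", maxes.len());
}
```

After the first run the threshold becomes 496, and the remaining four runs (max 400 and below) are skipped. The `Cell` lets the skip test and the decode callback share the threshold without conflicting borrows; the real code reads the shared dynamic filter instead.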

Observability

  • New Count metric row_groups_pruned_dynamic_filter on
    ParquetFileMetrics surfaces the runtime saving.
  • New dynamic_rg_pruning=eligible marker on ParquetSource's
    EXPLAIN (fmt_extra Default + Verbose) signals plan-time
    eligibility. Eligible rather than true because the static
    plan can't predict the runtime outcome.
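
The marker depends only on whether the pushed-down predicate is dynamic, never on runtime state. A minimal sketch, where the `Predicate` enum is a hypothetical stand-in for the is_dynamic_physical_expr check and the leading `, ` separator is an assumption about the surrounding fmt_extra output:

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for the result of is_dynamic_physical_expr.
enum Predicate {
    Static,
    Dynamic,
}

/// Appends the plan-time marker only for dynamic predicates. It says
/// "eligible" rather than "true": whether any row group is actually skipped
/// depends on how the TopK threshold evolves at runtime.
fn fmt_extra(predicate: Option<&Predicate>, f: &mut impl Write) -> fmt::Result {
    if matches!(predicate, Some(Predicate::Dynamic)) {
        write!(f, ", dynamic_rg_pruning=eligible")?;
    }
    Ok(())
}

fn render(predicate: Option<&Predicate>) -> String {
    let mut out = String::new();
    fmt_extra(predicate, &mut out).expect("writing to a String cannot fail");
    out
}

fn main() {
    println!("dynamic: {:?}", render(Some(&Predicate::Dynamic)));
    println!("static:  {:?}", render(Some(&Predicate::Static)));
    println!("none:    {:?}", render(None));
}
```

The three cases correspond to the three marker unit tests listed below: present on a dynamic predicate, absent on a static one, absent with no predicate.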

Benchmarks (benchmarks/sort_pushdown_inexact, 5 iterations)

| Query | main | this PR | Δ |
|---|---|---|---|
| Q1 ORDER BY l_orderkey DESC LIMIT 100 | 6.99 ms | 3.80 ms | −46% |
| Q2 ORDER BY l_orderkey DESC LIMIT 1000 | 3.29 ms | 1.33 ms | −60% |
| Q3 SELECT * ... DESC LIMIT 100 | 11.17 ms | 9.91 ms | −11% |
| Q4 SELECT * ... DESC LIMIT 1000 | 9.28 ms | 7.95 ms | −14% |
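
The Δ values are consistent with the relative change against main, (this PR − main) / main, rounded to the nearest percent. A quick check of the four rows:

```rust
/// Relative change of this PR vs. main, as a rounded percentage.
fn delta_pct(main_ms: f64, pr_ms: f64) -> i64 {
    ((pr_ms - main_ms) / main_ms * 100.0).round() as i64
}

fn main() {
    let rows = [
        ("Q1", 6.99, 3.80),
        ("Q2", 3.29, 1.33),
        ("Q3", 11.17, 9.91),
        ("Q4", 9.28, 7.95),
    ];
    for (query, main_ms, pr_ms) in rows {
        println!("{query}: {}%", delta_pct(main_ms, pr_ms));
    }
}
```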

Narrow-projection queries gain the most — their per-RG cost is
dominated by metadata + sort-column read, which this PR eliminates
for unwinnable RGs. Wide-projection queries gain less because the
kept RG's all-column decode dominates total time, but still see
meaningful savings.

Are these changes tested?

Yes. Three layers:

  • 6 unit tests:
    • 3 in push_decoder.rs::tests: RowGroupPruner basic pruning,
      generation-tracked dynamic-filter updates, fallback when the
      predicate has no analyzable bounds.
    • 3 in source.rs::tests: dynamic_rg_pruning=eligible marker
      present on dynamic predicate, absent on static predicate, absent
      when there is no predicate at all.
  • 2 integration tests in
    datafusion/core/tests/parquet/dynamic_row_group_pruning.rs:
    asserts row_groups_pruned_dynamic_filter >= 1 end-to-end on a
    5-RG ORDER BY DESC LIMIT 5 scan, and asserts the metric stays at
    0 when there is no TopK (no spurious firing).
  • New SLT
    datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt:
    asserts both EXPLAIN surfaces — plain EXPLAIN shows
    dynamic_rg_pruning=eligible, and EXPLAIN ANALYZE pins
    row_groups_pruned_dynamic_filter=4 (five RGs, four pruned at
    runtime).

129 parquet unit + 204 parquet integration + SLT all pass.
cargo clippy --all-targets --all-features -- -D warnings clean.

Are there any user-facing changes?

Two visible additions, both gated on the existing dynamic-filter
infrastructure (they only appear when a dynamic predicate is present):

  • New row_groups_pruned_dynamic_filter counter visible in
    EXPLAIN ANALYZE for queries whose plan carries a
    DynamicFilterPhysicalExpr (today: only TopK with
    enable_topk_dynamic_filter_pushdown=true, which is the default).
  • New dynamic_rg_pruning=eligible marker visible in EXPLAIN
    output for the same queries.

No config changes, no API breakage, no behavior change for queries
without a dynamic predicate.

Closes apache#22407.

## What

Adds runtime row-group pruning between push-decoder runs, driven by the
dynamic predicate a TopK `SortExec` pushes down via
`DynamicFilterPhysicalExpr`. As the heap fills, the threshold tightens,
and subsequent row groups whose statistics prove they cannot contribute
are skipped without ever invoking their decoder — zero IO, zero decode.

## Why

DataFusion already prunes parquet at three granularities — file
(`EarlyStoppingStream`), row group at scan-startup (`PruningPredicate`),
and row (`RowFilter`). There is a gap: once `Layer 1` selects a file's
row groups, that decision is **frozen** at scan startup, when the
dynamic filter is still `lit(true)`. As `TopK` tightens at runtime,
subsequent RGs in the already-opened file keep being decoded even when
stats prove they can't beat the threshold. This is the dominant cost
for `ORDER BY ... LIMIT` queries on multi-RG files. See apache#22407 for the
full architectural trace.

## How

Two coordinated pieces:

1. **`RowGroupPruner`** (in `push_decoder.rs`). Mirrors `FilePruner`'s
   pattern at row-group granularity: tracks `snapshot_generation` so
   the cached `PruningPredicate` is rebuilt only when the dynamic
   filter has actually moved; evaluates against the next pending run's
   row-group stats via the existing `RowGroupPruningStatistics`
   adapter from `row_group_filter.rs`. Errors fall back to "don't
   prune" — a flaky pruning path never silently drops data.

2. **Per-RG decoder splitting when the predicate is dynamic**.
   `RowGroupAccessPlan::split_runs` previously coalesced consecutive
   same-`fully_matched` RGs into a single run. For ORDER BY + LIMIT
   the initial dynamic filter is `lit(true)`, the static fully-matched
   analysis marks nothing, and `split_runs` collapsed every RG into
   one run — leaving no inter-run hook for runtime pruning. A new
   `force_per_row_group` flag (set by `is_dynamic_physical_expr`)
   disables coalescing for dynamic predicates only, so static-WHERE
   queries pay nothing.
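
The generation-tracking idea in isolation: a minimal sketch in which `DynamicFilter`, its `generation` counter, and this `RowGroupPruner` are hypothetical stand-ins (the real pruner rebuilds a `PruningPredicate`; here the cached state is just the threshold). A poisoned lock is treated as "don't prune", mirroring the fallback described above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical shared filter: the generation is bumped on every update.
struct DynamicFilter {
    generation: AtomicU64,
    threshold: Mutex<Option<i64>>,
}

impl DynamicFilter {
    fn new() -> Self {
        Self { generation: AtomicU64::new(0), threshold: Mutex::new(None) }
    }
    /// Called by the (hypothetical) TopK operator when its threshold tightens.
    fn update(&self, threshold: i64) {
        *self.threshold.lock().expect("lock poisoned") = Some(threshold);
        self.generation.fetch_add(1, Ordering::Release);
    }
    fn snapshot_generation(&self) -> u64 {
        self.generation.load(Ordering::Acquire)
    }
}

/// Caches the "compiled" predicate together with the generation it was
/// built from, and rebuilds only when the generation has moved.
struct RowGroupPruner {
    filter: Arc<DynamicFilter>,
    cached: Option<(u64, Option<i64>)>,
    rebuilds: usize,
}

impl RowGroupPruner {
    fn new(filter: Arc<DynamicFilter>) -> Self {
        Self { filter, cached: None, rebuilds: 0 }
    }

    /// DESC TopK rule: prune when the row group's max cannot exceed the threshold.
    fn should_prune(&mut self, rg_max: Option<i64>) -> bool {
        let generation = self.filter.snapshot_generation();
        let cached = self.cached;
        let threshold = match cached {
            Some((cached_gen, t)) if cached_gen == generation => t,
            _ => {
                let t = match self.filter.threshold.lock() {
                    Ok(guard) => *guard,
                    Err(_) => return false, // error: don't prune
                };
                self.cached = Some((generation, t));
                self.rebuilds += 1;
                t
            }
        };
        matches!((rg_max, threshold), (Some(max), Some(t)) if max <= t)
    }
}

fn main() {
    let filter = Arc::new(DynamicFilter::new());
    let mut pruner = RowGroupPruner::new(Arc::clone(&filter));
    assert!(!pruner.should_prune(Some(400))); // still lit(true)
    filter.update(496); // generation 0 -> 1
    assert!(pruner.should_prune(Some(400))); // rebuilt for generation 1
    assert!(pruner.should_prune(Some(300))); // cache hit, no rebuild
    assert!(!pruner.should_prune(Some(500)));
    println!("rebuilds = {}", pruner.rebuilds);
}
```

Four checks cost only two rebuilds here: the threshold is re-read once per generation rather than once per row group.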

Plumbing: `PendingDecoderRun` wraps each queued decoder with its row
group indices. `PushDecoderStreamState::transition` consults the
pruner at every run boundary and skips runs whose row groups are
proved unwinnable.

## Observability

- New `Count` metric `row_groups_pruned_dynamic_filter` on
  `ParquetFileMetrics` surfaces the runtime saving.
- New `dynamic_rg_pruning=eligible` marker on `ParquetSource`'s
  `EXPLAIN` (`fmt_extra` Default + Verbose) signals plan-time
  eligibility — *eligible* rather than *true* because the static plan
  can't predict the runtime outcome.

## Benchmarks (`benchmarks/sort_pushdown_inexact`, 5 iters)

| Query | main | this PR | Δ |
|---|---|---|---|
| Q1 `ORDER BY l_orderkey DESC LIMIT 100`  | 6.99 ms | 3.80 ms | **−46%** |
| Q2 `ORDER BY l_orderkey DESC LIMIT 1000` | 3.29 ms | 1.33 ms | **−60%** |
| Q3 `SELECT * ... DESC LIMIT 100`         | 11.17 ms | 9.91 ms | −11% |
| Q4 `SELECT * ... DESC LIMIT 1000`        | 9.28 ms | 7.95 ms | −14% |

Narrow-projection queries gain the most — their per-RG cost is
dominated by metadata + sort-column read, which this PR eliminates for
unwinnable RGs. Wide-projection queries gain less because the *kept*
RG's all-column decode dominates total time, but still see meaningful
savings.

## Test coverage

- 6 new unit tests: 3 on `RowGroupPruner` (basic pruning,
  generation-tracked dynamic updates, fallback when predicate has no
  analyzable bounds) + 3 on `fmt_extra` marker (present on dynamic
  predicate, absent on static, absent on no-predicate).
- 2 new integration tests in
  `datafusion/core/tests/parquet/dynamic_row_group_pruning.rs`:
  asserts `row_groups_pruned_dynamic_filter >= 1` end-to-end on a
  5-RG TopK query, and asserts the metric stays at 0 when no TopK is
  present (no spurious firing).
- New SLT
  `datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt`
  asserts both `EXPLAIN` surfaces: plain EXPLAIN shows
  `dynamic_rg_pruning=eligible`, and EXPLAIN ANALYZE pins
  `row_groups_pruned_dynamic_filter=4` (five RGs, four pruned).

129 parquet unit + 204 parquet integration + SLT all pass.
`cargo clippy --all-targets --all-features -- -D warnings` clean.
@github-actions (bot) added the labels core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), and datasource (Changes to the datasource crate) May 22, 2026
@zhuqi-lucas zhuqi-lucas changed the title feat(parquet): apply TopK threshold to row-group statistics mid-scan feat(parquet): runtime row-group early stop via TopK dynamic filter May 22, 2026
Two CI failures on PR apache#22450:

1. **cargo doc** — broken intra-doc link in
   `ParquetFileMetrics::row_groups_pruned_dynamic_filter`. Switch from
   ``[`row_groups_pruned_statistics`]`` to ``[`Self::row_groups_pruned_statistics`]``
   so rustdoc can resolve it.

2. **sqllogictest substrait round-trip** — adding
   `dynamic_rg_pruning=eligible` to ParquetSource's `fmt_extra` output
   shifted every `EXPLAIN` line that already showed a `DynamicFilter`
   predicate. Add the marker to 13 SLT expectations:

   - clickbench, explain_analyze, limit, limit_pruning,
     dynamic_filter_pushdown_config, preserve_file_partitioning,
     projection_pushdown, push_down_filter_parquet,
     push_down_filter_regression, repartition_subset_satisfaction,
     sort_pushdown, statistics_registry, topk
   - 134 marker insertions total, all on `DataSourceExec:` lines whose
     predicate contains `DynamicFilter [`.

   Two summary-level analyze tests also need the new
   `row_groups_pruned_dynamic_filter=0` counter in their metrics block
   (`limit_pruning.slt`, `dynamic_filter_pushdown_config.slt`).
   Dev-level analyze output elides zero-valued counters so the other
   files don't need it.

No behavior change beyond what was already in the previous commit.

CI runs `cargo doc --document-private-items`, which also checks links on
private items (the previous fix only covered public items). The
`row_groups_pruned_dynamic` field's doc comment referenced
``[`row_group_pruner`]``, a field of the same struct, so it needs `Self::` to resolve.
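
For reference, the resolving form of both links: struct fields are not in scope by bare name inside doc comments, so the links go through `Self::`. The structs and field types below are trimmed-down hypothetical stand-ins, not the real definitions.

```rust
/// Hypothetical, trimmed-down metrics struct.
pub struct ParquetFileMetrics {
    /// Row groups pruned at scan startup using statistics.
    pub row_groups_pruned_statistics: usize,
    /// Row groups skipped at runtime by the dynamic filter; complements
    /// [`Self::row_groups_pruned_statistics`].
    pub row_groups_pruned_dynamic_filter: usize,
}

/// Hypothetical private struct: private-item links are only checked with
/// `cargo doc --document-private-items`.
struct StreamStateSketch {
    /// Placeholder for the pruner.
    row_group_pruner: Option<()>,
    /// Incremented whenever [`Self::row_group_pruner`] skips a run.
    row_groups_pruned_dynamic: usize,
}

fn main() {
    let metrics = ParquetFileMetrics {
        row_groups_pruned_statistics: 0,
        row_groups_pruned_dynamic_filter: 4,
    };
    let state = StreamStateSketch { row_group_pruner: None, row_groups_pruned_dynamic: 4 };
    assert!(state.row_group_pruner.is_none());
    assert_eq!(metrics.row_groups_pruned_dynamic_filter, state.row_groups_pruned_dynamic);
}
```
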
@Dandandan
Contributor

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-274-nhjxx 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-275-nknmm 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-276-gkpr8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 37.91 / 39.47 ±2.07 / 43.42 ms │     37.82 / 38.74 ±0.94 / 40.41 ms │ no change │
│ QQuery 2  │ 19.53 / 19.69 ±0.21 / 20.10 ms │     19.91 / 20.59 ±0.57 / 21.21 ms │ no change │
│ QQuery 3  │ 34.33 / 35.15 ±0.46 / 35.69 ms │     32.22 / 34.22 ±1.50 / 35.76 ms │ no change │
│ QQuery 4  │ 16.91 / 17.09 ±0.17 / 17.40 ms │     17.03 / 17.66 ±0.67 / 18.79 ms │ no change │
│ QQuery 5  │ 41.39 / 41.63 ±0.22 / 42.01 ms │     39.52 / 40.94 ±0.75 / 41.54 ms │ no change │
│ QQuery 6  │ 15.93 / 15.99 ±0.06 / 16.09 ms │     15.90 / 16.11 ±0.16 / 16.33 ms │ no change │
│ QQuery 7  │ 46.71 / 49.37 ±3.17 / 55.06 ms │     45.70 / 47.23 ±1.55 / 49.93 ms │ no change │
│ QQuery 8  │ 43.93 / 44.87 ±0.82 / 46.09 ms │     44.03 / 44.42 ±0.37 / 45.10 ms │ no change │
│ QQuery 9  │ 48.69 / 50.08 ±1.04 / 51.92 ms │     48.60 / 49.70 ±0.91 / 51.25 ms │ no change │
│ QQuery 10 │ 63.20 / 63.42 ±0.21 / 63.74 ms │     62.93 / 63.36 ±0.39 / 64.02 ms │ no change │
│ QQuery 11 │ 13.16 / 13.34 ±0.16 / 13.64 ms │     13.04 / 13.27 ±0.26 / 13.77 ms │ no change │
│ QQuery 12 │ 23.70 / 24.54 ±0.93 / 26.30 ms │     23.40 / 24.02 ±0.42 / 24.44 ms │ no change │
│ QQuery 13 │ 33.54 / 35.52 ±1.26 / 37.06 ms │     33.30 / 35.22 ±1.09 / 36.63 ms │ no change │
│ QQuery 14 │ 24.96 / 25.10 ±0.09 / 25.20 ms │     24.90 / 25.37 ±0.64 / 26.62 ms │ no change │
│ QQuery 15 │ 30.72 / 30.88 ±0.08 / 30.95 ms │     30.34 / 30.93 ±0.49 / 31.80 ms │ no change │
│ QQuery 16 │ 14.44 / 14.65 ±0.16 / 14.84 ms │     14.67 / 14.84 ±0.24 / 15.30 ms │ no change │
│ QQuery 17 │ 72.04 / 73.15 ±1.03 / 74.89 ms │     74.86 / 75.90 ±0.62 / 76.76 ms │ no change │
│ QQuery 18 │ 61.21 / 62.59 ±1.05 / 63.75 ms │     62.12 / 63.07 ±0.64 / 64.05 ms │ no change │
│ QQuery 19 │ 33.14 / 33.64 ±0.83 / 35.29 ms │     33.47 / 33.73 ±0.34 / 34.40 ms │ no change │
│ QQuery 20 │ 36.90 / 37.57 ±0.77 / 38.86 ms │     37.22 / 37.47 ±0.24 / 37.89 ms │ no change │
│ QQuery 21 │ 56.14 / 57.72 ±1.20 / 59.48 ms │     53.82 / 55.70 ±1.59 / 58.18 ms │ no change │
│ QQuery 22 │ 23.10 / 23.81 ±0.50 / 24.61 ms │     23.31 / 23.99 ±0.93 / 25.83 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 809.27ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 806.50ms │
│ Average Time (HEAD)                               │  36.78ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  36.66ms │
│ Queries Faster                                    │        0 │
│ Queries Slower                                    │        0 │
│ Queries with No Change                            │       22 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 5.5 GiB |
| Avg memory | 5.1 GiB |
| CPU user | 29.4s |
| CPU sys | 1.9s |
| Peak spill | 0 B |

tpch — branch

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 5.5 GiB |
| Avg memory | 5.0 GiB |
| CPU user | 29.6s |
| CPU sys | 1.8s |
| Peak spill | 0 B |


@zhuqi-lucas
Contributor Author

run benchmark sort_pushdown_inexact

@zhuqi-lucas zhuqi-lucas marked this pull request as ready for review May 22, 2026 05:46
Copilot AI review requested due to automatic review settings May 22, 2026 05:46
@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │           5.69 / 6.30 ±0.88 / 8.05 ms │           5.78 / 6.31 ±0.95 / 8.21 ms │ no change │
│ QQuery 2  │        80.75 / 81.16 ±0.31 / 81.58 ms │        80.16 / 80.81 ±0.40 / 81.25 ms │ no change │
│ QQuery 3  │        29.38 / 29.57 ±0.13 / 29.71 ms │        28.91 / 29.15 ±0.16 / 29.38 ms │ no change │
│ QQuery 4  │     512.73 / 515.37 ±2.82 / 520.75 ms │     509.87 / 514.41 ±2.49 / 516.87 ms │ no change │
│ QQuery 5  │        50.96 / 51.70 ±0.61 / 52.66 ms │        50.75 / 51.13 ±0.44 / 51.94 ms │ no change │
│ QQuery 6  │        35.28 / 35.77 ±0.38 / 36.31 ms │        35.30 / 35.86 ±0.37 / 36.34 ms │ no change │
│ QQuery 7  │     109.30 / 110.73 ±1.72 / 112.86 ms │     108.59 / 109.56 ±0.94 / 110.99 ms │ no change │
│ QQuery 8  │        36.33 / 36.65 ±0.31 / 37.03 ms │        36.32 / 37.02 ±0.53 / 37.95 ms │ no change │
│ QQuery 9  │        53.25 / 55.51 ±2.05 / 58.81 ms │        53.15 / 54.23 ±0.72 / 54.96 ms │ no change │
│ QQuery 10 │        80.67 / 82.08 ±1.67 / 85.32 ms │        81.02 / 81.34 ±0.21 / 81.55 ms │ no change │
│ QQuery 11 │     314.18 / 317.53 ±4.18 / 325.76 ms │     310.62 / 316.44 ±4.99 / 322.48 ms │ no change │
│ QQuery 12 │        28.43 / 28.63 ±0.20 / 28.94 ms │        28.49 / 28.69 ±0.17 / 28.94 ms │ no change │
│ QQuery 13 │     125.87 / 126.80 ±1.06 / 128.82 ms │     126.23 / 126.71 ±0.54 / 127.68 ms │ no change │
│ QQuery 14 │     502.60 / 505.33 ±1.50 / 506.71 ms │     502.81 / 505.10 ±1.82 / 507.78 ms │ no change │
│ QQuery 15 │        60.38 / 61.69 ±1.71 / 65.03 ms │        60.49 / 61.18 ±0.66 / 62.17 ms │ no change │
│ QQuery 16 │           6.30 / 6.52 ±0.26 / 7.03 ms │           6.34 / 6.51 ±0.16 / 6.81 ms │ no change │
│ QQuery 17 │        80.24 / 81.16 ±0.67 / 82.22 ms │        80.30 / 80.98 ±0.58 / 81.73 ms │ no change │
│ QQuery 18 │     152.37 / 153.04 ±0.40 / 153.48 ms │     151.31 / 152.10 ±0.68 / 153.13 ms │ no change │
│ QQuery 19 │        40.79 / 41.20 ±0.25 / 41.56 ms │        41.17 / 41.46 ±0.21 / 41.79 ms │ no change │
│ QQuery 20 │        34.92 / 35.59 ±0.65 / 36.69 ms │        35.35 / 35.91 ±0.30 / 36.17 ms │ no change │
│ QQuery 21 │        16.69 / 16.98 ±0.24 / 17.33 ms │        16.78 / 17.01 ±0.23 / 17.45 ms │ no change │
│ QQuery 22 │        61.65 / 62.39 ±0.46 / 63.01 ms │        61.77 / 62.59 ±1.17 / 64.87 ms │ no change │
│ QQuery 23 │     480.28 / 482.80 ±1.89 / 485.52 ms │     479.56 / 482.95 ±3.05 / 487.66 ms │ no change │
│ QQuery 24 │     236.03 / 239.91 ±6.29 / 252.43 ms │     233.29 / 235.80 ±2.41 / 239.92 ms │ no change │
│ QQuery 25 │     114.42 / 114.91 ±0.67 / 116.14 ms │     112.30 / 114.90 ±1.52 / 116.61 ms │ no change │
│ QQuery 26 │        70.92 / 71.14 ±0.34 / 71.82 ms │        69.91 / 70.46 ±0.30 / 70.78 ms │ no change │
│ QQuery 27 │           6.42 / 6.56 ±0.16 / 6.87 ms │           6.47 / 6.64 ±0.22 / 7.08 ms │ no change │
│ QQuery 28 │        57.25 / 60.78 ±1.85 / 62.74 ms │        57.93 / 61.09 ±1.61 / 62.34 ms │ no change │
│ QQuery 29 │      98.46 / 100.38 ±2.61 / 105.53 ms │      98.98 / 101.66 ±3.64 / 108.85 ms │ no change │
│ QQuery 30 │        30.12 / 30.48 ±0.31 / 31.00 ms │        29.94 / 30.30 ±0.30 / 30.83 ms │ no change │
│ QQuery 31 │     111.59 / 113.79 ±2.44 / 118.38 ms │     111.55 / 112.71 ±1.71 / 116.05 ms │ no change │
│ QQuery 32 │        20.35 / 20.93 ±0.34 / 21.38 ms │        20.17 / 20.46 ±0.27 / 20.79 ms │ no change │
│ QQuery 33 │        38.68 / 39.14 ±0.35 / 39.58 ms │        38.31 / 38.57 ±0.20 / 38.80 ms │ no change │
│ QQuery 34 │           9.29 / 9.58 ±0.29 / 9.98 ms │          9.20 / 9.57 ±0.36 / 10.20 ms │ no change │
│ QQuery 35 │        80.49 / 81.05 ±0.48 / 81.93 ms │        80.76 / 81.51 ±0.49 / 82.23 ms │ no change │
│ QQuery 36 │           5.75 / 5.91 ±0.17 / 6.25 ms │           5.89 / 6.01 ±0.15 / 6.30 ms │ no change │
│ QQuery 37 │           6.78 / 6.95 ±0.11 / 7.06 ms │           6.72 / 6.94 ±0.27 / 7.42 ms │ no change │
│ QQuery 38 │        68.29 / 69.85 ±1.23 / 71.85 ms │        68.57 / 69.09 ±0.38 / 69.67 ms │ no change │
│ QQuery 39 │        98.14 / 98.47 ±0.32 / 99.05 ms │        97.99 / 98.65 ±0.51 / 99.53 ms │ no change │
│ QQuery 40 │        22.81 / 23.33 ±0.79 / 24.91 ms │        23.09 / 23.27 ±0.15 / 23.51 ms │ no change │
│ QQuery 41 │        11.03 / 11.79 ±1.11 / 13.97 ms │        11.24 / 11.71 ±0.44 / 12.29 ms │ no change │
│ QQuery 42 │        23.77 / 24.22 ±0.29 / 24.69 ms │        24.09 / 24.42 ±0.37 / 25.05 ms │ no change │
│ QQuery 43 │           4.64 / 4.76 ±0.17 / 5.09 ms │           4.80 / 4.91 ±0.17 / 5.24 ms │ no change │
│ QQuery 44 │        10.51 / 10.58 ±0.07 / 10.71 ms │        10.60 / 10.85 ±0.17 / 11.09 ms │ no change │
│ QQuery 45 │        39.70 / 40.69 ±0.70 / 41.49 ms │        40.41 / 40.86 ±0.33 / 41.33 ms │ no change │
│ QQuery 46 │        12.83 / 13.15 ±0.27 / 13.55 ms │        12.69 / 12.87 ±0.16 / 13.12 ms │ no change │
│ QQuery 47 │     229.18 / 232.49 ±2.73 / 236.09 ms │     228.68 / 231.85 ±1.91 / 233.47 ms │ no change │
│ QQuery 48 │     102.74 / 103.73 ±0.80 / 104.88 ms │     103.30 / 103.95 ±0.91 / 105.69 ms │ no change │
│ QQuery 49 │        78.52 / 79.33 ±0.62 / 80.41 ms │        78.96 / 79.36 ±0.45 / 80.18 ms │ no change │
│ QQuery 50 │        59.63 / 60.33 ±0.37 / 60.68 ms │        59.40 / 59.73 ±0.22 / 60.07 ms │ no change │
│ QQuery 51 │        92.53 / 95.36 ±1.81 / 97.37 ms │       92.09 / 95.60 ±4.02 / 102.85 ms │ no change │
│ QQuery 52 │        23.93 / 24.34 ±0.37 / 25.02 ms │        23.81 / 24.28 ±0.38 / 24.74 ms │ no change │
│ QQuery 53 │        29.46 / 29.71 ±0.16 / 29.96 ms │        29.04 / 29.30 ±0.20 / 29.57 ms │ no change │
│ QQuery 54 │        53.72 / 54.35 ±0.36 / 54.67 ms │        54.44 / 55.05 ±0.44 / 55.75 ms │ no change │
│ QQuery 55 │        23.52 / 24.34 ±1.07 / 26.44 ms │        23.21 / 23.56 ±0.24 / 23.91 ms │ no change │
│ QQuery 56 │        39.01 / 39.29 ±0.26 / 39.78 ms │        38.43 / 38.84 ±0.28 / 39.20 ms │ no change │
│ QQuery 57 │     178.37 / 180.09 ±1.83 / 183.56 ms │     175.76 / 176.93 ±1.12 / 178.61 ms │ no change │
│ QQuery 58 │     118.44 / 118.92 ±0.37 / 119.44 ms │     117.20 / 117.99 ±0.54 / 118.58 ms │ no change │
│ QQuery 59 │     117.61 / 119.84 ±2.42 / 123.96 ms │     117.87 / 118.49 ±0.72 / 119.91 ms │ no change │
│ QQuery 60 │        39.05 / 39.84 ±0.52 / 40.49 ms │        38.74 / 39.22 ±0.42 / 39.82 ms │ no change │
│ QQuery 61 │        12.52 / 12.63 ±0.10 / 12.81 ms │        12.61 / 12.79 ±0.24 / 13.25 ms │ no change │
│ QQuery 62 │        46.89 / 47.53 ±0.38 / 47.98 ms │        46.32 / 46.78 ±0.48 / 47.68 ms │ no change │
│ QQuery 63 │        29.70 / 31.05 ±2.04 / 35.10 ms │        29.42 / 29.66 ±0.25 / 30.09 ms │ no change │
│ QQuery 64 │     462.94 / 468.99 ±7.22 / 482.49 ms │     458.19 / 462.97 ±5.14 / 471.10 ms │ no change │
│ QQuery 65 │     146.94 / 150.20 ±2.58 / 152.60 ms │     149.10 / 151.28 ±1.81 / 153.79 ms │ no change │
│ QQuery 66 │        78.92 / 81.62 ±3.98 / 89.53 ms │        78.42 / 80.72 ±2.20 / 84.79 ms │ no change │
│ QQuery 67 │     245.43 / 251.17 ±5.20 / 259.23 ms │     248.87 / 251.06 ±2.35 / 255.17 ms │ no change │
│ QQuery 68 │        12.91 / 13.11 ±0.22 / 13.53 ms │        13.06 / 13.23 ±0.15 / 13.51 ms │ no change │
│ QQuery 69 │        76.13 / 79.82 ±4.99 / 89.27 ms │        76.72 / 77.40 ±0.76 / 78.84 ms │ no change │
│ QQuery 70 │     106.87 / 110.44 ±3.00 / 115.47 ms │     105.01 / 109.92 ±6.93 / 123.68 ms │ no change │
│ QQuery 71 │        35.58 / 35.96 ±0.27 / 36.41 ms │        35.37 / 36.07 ±0.57 / 36.94 ms │ no change │
│ QQuery 72 │ 2132.94 / 2183.71 ±42.17 / 2236.73 ms │ 2111.52 / 2158.63 ±35.37 / 2214.83 ms │ no change │
│ QQuery 73 │           9.03 / 9.24 ±0.22 / 9.65 ms │           9.11 / 9.30 ±0.17 / 9.63 ms │ no change │
│ QQuery 74 │     177.63 / 180.03 ±2.51 / 183.70 ms │     175.98 / 181.10 ±5.65 / 191.23 ms │ no change │
│ QQuery 75 │     145.51 / 146.86 ±1.25 / 149.13 ms │     146.38 / 147.93 ±1.47 / 150.33 ms │ no change │
│ QQuery 76 │        35.58 / 35.88 ±0.29 / 36.38 ms │        35.34 / 35.87 ±0.46 / 36.52 ms │ no change │
│ QQuery 77 │        59.95 / 60.42 ±0.49 / 61.35 ms │        60.26 / 60.70 ±0.42 / 61.43 ms │ no change │
│ QQuery 78 │     188.05 / 191.81 ±4.01 / 199.22 ms │     187.70 / 191.78 ±3.19 / 195.27 ms │ no change │
│ QQuery 79 │        67.60 / 68.26 ±0.66 / 69.06 ms │        66.72 / 67.34 ±0.46 / 68.11 ms │ no change │
│ QQuery 80 │     100.87 / 101.21 ±0.28 / 101.63 ms │     100.03 / 101.28 ±1.64 / 104.53 ms │ no change │
│ QQuery 81 │        24.09 / 24.30 ±0.12 / 24.43 ms │        24.27 / 24.51 ±0.17 / 24.74 ms │ no change │
│ QQuery 82 │        16.37 / 16.55 ±0.19 / 16.91 ms │        16.57 / 16.72 ±0.17 / 17.03 ms │ no change │
│ QQuery 83 │        36.90 / 38.82 ±2.29 / 43.14 ms │        37.15 / 37.48 ±0.40 / 38.16 ms │ no change │
│ QQuery 84 │        43.61 / 43.96 ±0.34 / 44.57 ms │        44.04 / 45.58 ±1.86 / 49.12 ms │ no change │
│ QQuery 85 │     135.59 / 136.80 ±1.39 / 139.37 ms │     136.81 / 137.88 ±0.89 / 139.18 ms │ no change │
│ QQuery 86 │        24.79 / 25.64 ±0.98 / 27.53 ms │        25.15 / 25.53 ±0.22 / 25.76 ms │ no change │
│ QQuery 87 │        68.82 / 70.07 ±0.82 / 71.02 ms │        69.29 / 70.06 ±0.61 / 71.17 ms │ no change │
│ QQuery 88 │        61.33 / 61.95 ±0.45 / 62.63 ms │        62.63 / 63.33 ±0.57 / 64.00 ms │ no change │
│ QQuery 89 │        35.25 / 35.73 ±0.26 / 36.03 ms │        35.55 / 36.03 ±0.26 / 36.27 ms │ no change │
│ QQuery 90 │        16.65 / 16.84 ±0.17 / 17.15 ms │        16.98 / 17.23 ±0.20 / 17.48 ms │ no change │
│ QQuery 91 │        52.24 / 54.16 ±2.38 / 58.84 ms │        52.30 / 53.65 ±1.75 / 57.02 ms │ no change │
│ QQuery 92 │        29.88 / 30.31 ±0.50 / 31.18 ms │        29.84 / 30.43 ±0.40 / 30.97 ms │ no change │
│ QQuery 93 │        50.10 / 50.89 ±0.46 / 51.43 ms │        50.33 / 51.35 ±0.74 / 52.58 ms │ no change │
│ QQuery 94 │        37.79 / 38.44 ±0.65 / 39.69 ms │        37.56 / 38.59 ±0.57 / 39.14 ms │ no change │
│ QQuery 95 │        85.10 / 85.88 ±0.49 / 86.60 ms │        84.23 / 85.24 ±0.88 / 86.74 ms │ no change │
│ QQuery 96 │        24.11 / 24.27 ±0.22 / 24.69 ms │        24.29 / 24.50 ±0.19 / 24.83 ms │ no change │
│ QQuery 97 │        45.50 / 46.04 ±0.35 / 46.55 ms │        46.05 / 46.31 ±0.25 / 46.74 ms │ no change │
│ QQuery 98 │        41.76 / 42.66 ±0.54 / 43.28 ms │        42.44 / 43.15 ±0.62 / 43.97 ms │ no change │
│ QQuery 99 │        70.01 / 70.78 ±0.61 / 71.75 ms │        69.97 / 70.68 ±0.58 / 71.49 ms │ no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 10498.86ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 10448.87ms │
│ Average Time (HEAD)                               │   106.05ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   105.54ms │
│ Queries Faster                                    │          0 │
│ Queries Slower                                    │          0 │
│ Queries with No Change                            │         99 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric        Value
Wall time     55.0s
Peak memory   6.9 GiB
Avg memory    6.1 GiB
CPU user      234.8s
CPU sys       6.0s
Peak spill    0 B

tpcds — branch

Metric        Value
Wall time     55.0s
Peak memory   7.0 GiB
Avg memory    6.3 GiB
CPU user      232.0s
CPU sys       5.8s
Peak spill    0 B

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515459477-277-x4d6g 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.13 / 4.63 ±6.95 / 18.54 ms │          1.11 / 4.59 ±6.90 / 18.39 ms │     no change │
│ QQuery 1  │        12.84 / 12.94 ±0.08 / 13.04 ms │        12.42 / 12.73 ±0.18 / 12.95 ms │     no change │
│ QQuery 2  │        35.38 / 35.81 ±0.30 / 36.23 ms │        35.63 / 35.87 ±0.28 / 36.35 ms │     no change │
│ QQuery 3  │        30.33 / 30.94 ±0.92 / 32.76 ms │        30.38 / 30.89 ±0.60 / 32.06 ms │     no change │
│ QQuery 4  │     221.08 / 222.52 ±1.32 / 224.77 ms │     217.25 / 223.77 ±4.47 / 230.65 ms │     no change │
│ QQuery 5  │     271.34 / 273.02 ±1.66 / 276.10 ms │     267.27 / 271.69 ±2.28 / 273.41 ms │     no change │
│ QQuery 6  │           1.16 / 1.31 ±0.22 / 1.75 ms │           1.18 / 1.32 ±0.21 / 1.73 ms │     no change │
│ QQuery 7  │        13.82 / 14.01 ±0.16 / 14.24 ms │        14.29 / 14.44 ±0.14 / 14.67 ms │     no change │
│ QQuery 8  │     319.01 / 323.19 ±3.92 / 329.01 ms │     342.38 / 356.25 ±7.86 / 364.96 ms │  1.10x slower │
│ QQuery 9  │     446.98 / 451.24 ±3.38 / 455.61 ms │    463.18 / 475.35 ±10.89 / 489.79 ms │  1.05x slower │
│ QQuery 10 │        68.91 / 69.79 ±0.81 / 71.12 ms │        70.01 / 71.13 ±0.70 / 71.92 ms │     no change │
│ QQuery 11 │        80.17 / 81.70 ±1.04 / 83.41 ms │        80.46 / 81.98 ±1.59 / 84.88 ms │     no change │
│ QQuery 12 │     262.57 / 271.05 ±6.00 / 278.82 ms │     263.13 / 272.47 ±6.80 / 283.05 ms │     no change │
│ QQuery 13 │    360.50 / 369.16 ±10.13 / 388.45 ms │     357.78 / 364.59 ±3.90 / 369.19 ms │     no change │
│ QQuery 14 │     276.41 / 281.73 ±4.61 / 289.96 ms │     280.05 / 286.88 ±6.92 / 298.91 ms │     no change │
│ QQuery 15 │    264.25 / 276.30 ±11.55 / 292.24 ms │    294.81 / 307.60 ±14.15 / 333.58 ms │  1.11x slower │
│ QQuery 16 │    627.27 / 659.89 ±17.60 / 680.12 ms │    616.34 / 643.05 ±19.90 / 673.10 ms │     no change │
│ QQuery 17 │    624.54 / 639.67 ±10.36 / 653.36 ms │     602.95 / 613.57 ±9.74 / 631.66 ms │     no change │
│ QQuery 18 │ 1231.71 / 1264.30 ±32.09 / 1324.58 ms │ 1230.87 / 1249.03 ±13.77 / 1269.77 ms │     no change │
│ QQuery 19 │        29.95 / 33.49 ±3.93 / 38.81 ms │        27.88 / 34.23 ±7.63 / 44.83 ms │     no change │
│ QQuery 20 │     531.85 / 542.39 ±8.77 / 552.84 ms │     518.59 / 522.88 ±3.22 / 527.36 ms │     no change │
│ QQuery 21 │     590.04 / 600.69 ±8.53 / 615.62 ms │     589.86 / 595.16 ±4.54 / 603.56 ms │     no change │
│ QQuery 22 │  1047.46 / 1051.15 ±3.07 / 1054.60 ms │ 1085.28 / 1112.76 ±21.95 / 1146.31 ms │  1.06x slower │
│ QQuery 23 │ 3143.81 / 3215.59 ±61.28 / 3327.12 ms │ 3009.38 / 3115.16 ±74.14 / 3210.79 ms │     no change │
│ QQuery 24 │        43.61 / 45.10 ±1.49 / 47.69 ms │        41.73 / 49.54 ±6.86 / 58.93 ms │  1.10x slower │
│ QQuery 25 │     116.41 / 117.72 ±0.81 / 118.72 ms │     112.35 / 113.26 ±0.66 / 114.42 ms │     no change │
│ QQuery 26 │        44.00 / 45.32 ±1.39 / 47.91 ms │        41.64 / 42.41 ±0.68 / 43.37 ms │ +1.07x faster │
│ QQuery 27 │     668.27 / 677.75 ±9.71 / 696.11 ms │     666.49 / 674.74 ±7.19 / 686.27 ms │     no change │
│ QQuery 28 │ 3004.40 / 3024.76 ±13.42 / 3039.76 ms │ 3014.79 / 3056.61 ±36.42 / 3111.94 ms │     no change │
│ QQuery 29 │       40.04 / 51.77 ±14.67 / 76.55 ms │       41.49 / 49.56 ±15.74 / 81.03 ms │     no change │
│ QQuery 30 │     325.79 / 329.77 ±3.21 / 335.46 ms │    311.00 / 329.53 ±13.27 / 351.40 ms │     no change │
│ QQuery 31 │     296.89 / 313.26 ±9.84 / 323.06 ms │    278.41 / 291.30 ±11.42 / 310.81 ms │ +1.08x faster │
│ QQuery 32 │     934.13 / 949.51 ±9.91 / 963.93 ms │    914.90 / 929.16 ±15.07 / 958.29 ms │     no change │
│ QQuery 33 │ 1410.05 / 1512.63 ±79.05 / 1623.38 ms │ 1406.57 / 1430.30 ±12.88 / 1443.94 ms │ +1.06x faster │
│ QQuery 34 │ 1422.56 / 1450.90 ±17.81 / 1474.31 ms │ 1424.50 / 1486.88 ±73.50 / 1624.17 ms │     no change │
│ QQuery 35 │    272.88 / 291.31 ±26.72 / 344.13 ms │    276.79 / 319.68 ±35.69 / 379.77 ms │  1.10x slower │
│ QQuery 36 │        61.63 / 71.05 ±7.09 / 80.23 ms │      65.89 / 82.70 ±12.28 / 102.66 ms │  1.16x slower │
│ QQuery 37 │        34.65 / 35.40 ±0.76 / 36.80 ms │        36.38 / 40.19 ±5.70 / 51.48 ms │  1.14x slower │
│ QQuery 38 │        42.14 / 46.54 ±3.98 / 52.83 ms │        41.07 / 43.14 ±1.72 / 45.06 ms │ +1.08x faster │
│ QQuery 39 │     144.74 / 151.58 ±5.96 / 160.59 ms │     151.07 / 153.66 ±2.72 / 158.21 ms │     no change │
│ QQuery 40 │        13.68 / 16.08 ±3.85 / 23.74 ms │        15.04 / 15.47 ±0.36 / 15.96 ms │     no change │
│ QQuery 41 │        13.32 / 13.46 ±0.13 / 13.69 ms │        14.42 / 16.08 ±3.04 / 22.15 ms │  1.19x slower │
│ QQuery 42 │        12.88 / 14.68 ±3.42 / 21.51 ms │        14.07 / 14.26 ±0.12 / 14.40 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 19885.11ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 19835.88ms │
│ Average Time (HEAD)                               │   462.44ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   461.30ms │
│ Queries Faster                                    │          4 │
│ Queries Slower                                    │          9 │
│ Queries with No Change                            │         30 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘
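Reading the Change column: the report does not state its rule, but the labels in the clickbench_partitioned table above appear to follow the ratio of the two mean times (the second figure in each `min / avg ±std / max` cell) with a 5% noise threshold. The sketch below assumes exactly that; `classify` is a hypothetical helper written for illustration, not code from the benchmark runner.

```rust
/// Hypothetical classifier for one benchmark row, given the mean time (ms)
/// on the base commit and on the branch.
/// Assumption: ratios within 5% are reported as "no change".
fn classify(base_ms: f64, branch_ms: f64) -> String {
    const THRESHOLD: f64 = 1.05;
    if branch_ms > base_ms * THRESHOLD {
        // Branch is slower: report branch / base.
        format!("{:.2}x slower", branch_ms / base_ms)
    } else if base_ms > branch_ms * THRESHOLD {
        // Branch is faster: report base / branch with a leading '+'.
        format!("+{:.2}x faster", base_ms / branch_ms)
    } else {
        "no change".to_string()
    }
}
```

Under this reading, QQuery 8 (323.19 ms vs 356.25 ms) is about 10% slower, while QQuery 25 (117.72 ms vs 113.26 ms) moves less than 5% and is reported as "no change".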

Resource Usage

clickbench_partitioned — base (merge-base)

Metric        Value
Wall time     100.0s
Peak memory   29.8 GiB
Avg memory    22.9 GiB
CPU user      1032.4s
CPU sys       67.0s
Peak spill    0 B

clickbench_partitioned — branch

Metric        Value
Wall time     100.0s
Peak memory   30.6 GiB
Avg memory    23.2 GiB
CPU user      1033.3s
CPU sys       67.3s
Peak spill    0 B


Contributor

Copilot AI left a comment


Pull request overview

This PR adds runtime row-group pruning for Parquet scans driven by TopK’s dynamic filter, closing the gap where row groups selected at file open could not be re-pruned as the TopK threshold tightened during execution.

Changes:

  • Introduces a runtime RowGroupPruner that re-evaluates a dynamic predicate at decoder-run boundaries and skips row groups proven unreachable.
  • Forces per-row-group decoder splitting when the predicate is dynamic so the runtime pruner has a boundary at every RG.
  • Adds observability: dynamic_rg_pruning=eligible in EXPLAIN and a new metric row_groups_pruned_dynamic_filter in EXPLAIN ANALYZE, plus tests/SLTs updated accordingly.
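A minimal sketch of the runtime pruning idea described above, assuming a single `i64` sort key and `ORDER BY v DESC LIMIT k`. `DynamicFilter`, `RowGroupStats`, and `RowGroupPrunerSketch` are hypothetical simplifications, not DataFusion's `RowGroupPruner` or `PruningPredicate` API: the filter publishes a lower bound together with a generation counter, and the pruner refreshes its cached bound only when the generation changes, then checks the next row group's max statistic.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical per-row-group statistics: for `ORDER BY v DESC LIMIT k`
/// only the max of the sort key matters.
#[derive(Clone, Copy)]
struct RowGroupStats {
    max: i64,
}

/// Hypothetical stand-in for a TopK dynamic filter. Once the heap is full,
/// rows with `v < bound` can no longer enter the result; the generation
/// increases every time the bound is tightened.
#[derive(Default)]
struct DynamicFilter {
    state: Mutex<(u64, Option<i64>)>, // (generation, bound)
}

impl DynamicFilter {
    fn tighten(&self, bound: i64) {
        let mut state = self.state.lock().unwrap();
        state.0 += 1;
        state.1 = Some(bound);
    }

    /// Returns `(generation, bound)` as one consistent snapshot.
    fn snapshot(&self) -> (u64, Option<i64>) {
        *self.state.lock().unwrap()
    }
}

/// Hypothetical runtime pruner, consulted once per decoder run (one row
/// group per run when per-RG splitting is forced).
struct RowGroupPrunerSketch {
    filter: Arc<DynamicFilter>,
    cached_generation: Option<u64>,
    cached_bound: Option<i64>,
    /// How often the cached predicate was refreshed. In DataFusion the costly
    /// step would be rebuilding a `PruningPredicate`; here it is a copy.
    rebuilds: usize,
    pruned: usize,
}

impl RowGroupPrunerSketch {
    fn new(filter: Arc<DynamicFilter>) -> Self {
        Self { filter, cached_generation: None, cached_bound: None, rebuilds: 0, pruned: 0 }
    }

    /// Returns `true` when the statistics prove no row in the row group can
    /// beat the current TopK threshold, so the run can be skipped.
    fn should_skip(&mut self, stats: RowGroupStats) -> bool {
        let (generation, bound) = self.filter.snapshot();
        if self.cached_generation != Some(generation) {
            self.cached_generation = Some(generation);
            self.cached_bound = bound;
            self.rebuilds += 1;
        }
        // Strict `<` is the conservative choice: a row group whose max equals
        // the bound is still decoded.
        let skip = matches!(self.cached_bound, Some(b) if stats.max < b);
        if skip {
            self.pruned += 1;
        }
        skip
    }
}
```

Before the TopK heap fills, the bound is `None` and nothing is skipped; after `tighten(50)`, a row group with `max = 10` is skipped while one with `max = 80` is still decoded, and the cached bound is refreshed only once per generation.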

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

  • datafusion/datasource-parquet/src/push_decoder.rs: Adds RowGroupPruner, tracks row-group indices per decoder run, and skips prunable runs at runtime.
  • datafusion/datasource-parquet/src/opener/mod.rs: Forces per-RG runs for dynamic predicates; wires pending runs + runtime pruner into PushDecoderStreamState.
  • datafusion/datasource-parquet/src/access_plan.rs: Extends split_runs with force_per_row_group to avoid coalescing runs for dynamic predicates.
  • datafusion/datasource-parquet/src/source.rs: Adds dynamic_rg_pruning=eligible marker in fmt_extra and unit tests for the marker.
  • datafusion/datasource-parquet/src/row_group_filter.rs: Exposes RowGroupPruningStatistics to reuse stats adapter for runtime pruning.
  • datafusion/datasource-parquet/src/metrics.rs: Adds row_groups_pruned_dynamic_filter metric to ParquetFileMetrics.
  • datafusion/core/tests/parquet/mod.rs: Adds helper to read row_groups_pruned_dynamic_filter from metrics.
  • datafusion/core/tests/parquet/dynamic_row_group_pruning.rs: New integration tests validating metric fires for TopK and stays quiet otherwise.
  • datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt: New SLT covering both EXPLAIN marker and EXPLAIN ANALYZE metric value.
  • datafusion/sqllogictest/test_files/topk.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/statistics_registry.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/sort_pushdown.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/repartition_subset_satisfaction.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/push_down_filter_regression.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/push_down_filter_parquet.slt: Updates expected plans/metrics to include dynamic_rg_pruning=eligible and (where relevant) the new counter.
  • datafusion/sqllogictest/test_files/projection_pushdown.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/preserve_file_partitioning.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/limit.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/limit_pruning.slt: Updates expected metrics to include row_groups_pruned_dynamic_filter=0 plus eligibility marker.
  • datafusion/sqllogictest/test_files/explain_analyze.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
  • datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt: Updates expected plans/metrics to include eligibility marker and row_groups_pruned_dynamic_filter=0 where applicable.
  • datafusion/sqllogictest/test_files/clickbench.slt: Updates expected plans to include dynamic_rg_pruning=eligible.
Comments suppressed due to low confidence (1)

datafusion/datasource-parquet/src/access_plan.rs:458

  • split_runs computes row_group_needs_filter as !fully_matched without considering the needs_filter argument. When force_per_row_group=true and the scan has no row filter (needs_filter=false), this will still mark every run that is not fully matched as needs_filter=true, causing the opener to treat them as filtered runs (e.g. attempting to fetch row filters / applying predicate-cache settings) even though no row-level filter exists. row_group_needs_filter should be derived as needs_filter && !fully_matched so the run metadata stays consistent with the caller’s capabilities.
        for (idx, (access, fully_matched)) in
            row_groups.into_iter().zip(fully_matched).enumerate()
        {
            if !access.should_scan() {
                continue;
            }

            let row_group_needs_filter = !fully_matched;
            // Coalesce consecutive RGs into a run only when (a) they share
            // the same filter requirement and (b) we're not forcing per-RG
            // splitting for runtime pruning.
            let can_coalesce = !force_per_row_group;
            if can_coalesce
                && let Some(run) = runs
                    .last_mut()
                    .filter(|run| run.needs_filter == row_group_needs_filter)
            {
                run.access_plan.set(idx, access);
                if fully_matched {
                    run.access_plan.mark_fully_matched(idx);
                }
            } else {
                let mut run_plan = ParquetAccessPlan::new_none(num_row_groups);
                run_plan.set(idx, access);
                if fully_matched {
                    run_plan.mark_fully_matched(idx);
                }
                runs.push(RowGroupRun::new(row_group_needs_filter, run_plan));
            }
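The suggested fix, sketched on simplified, hypothetical types: `Run` and `split_runs_sketch` are not the real `RowGroupRun` / `ParquetAccessPlan` API, and row-group access is reduced to one `bool` per row group. The relevant change is the single line that derives the per-row-group filter requirement from both the caller's `needs_filter` and the statistics.

```rust
/// Hypothetical, simplified decoder run: the row groups it covers and
/// whether the decoder must apply a row-level filter to it.
#[derive(Debug, PartialEq)]
struct Run {
    needs_filter: bool,
    row_groups: Vec<usize>,
}

/// Sketch of run splitting with the suggested fix applied.
/// `scan[i]`: row group `i` survived static pruning.
/// `fully_matched[i]`: statistics prove every row of row group `i` matches.
fn split_runs_sketch(
    scan: &[bool],
    fully_matched: &[bool],
    needs_filter: bool,
    force_per_row_group: bool,
) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (idx, (&should_scan, &matched)) in scan.iter().zip(fully_matched).enumerate() {
        if !should_scan {
            continue;
        }
        // Suggested fix: only require a filter if the scan has one at all.
        let rg_needs_filter = needs_filter && !matched;
        // Coalesce into the previous run unless per-RG splitting is forced
        // or the filter requirement differs.
        let can_coalesce = !force_per_row_group
            && runs.last().is_some_and(|run| run.needs_filter == rg_needs_filter);
        if can_coalesce {
            runs.last_mut().unwrap().row_groups.push(idx);
        } else {
            runs.push(Run { needs_filter: rg_needs_filter, row_groups: vec![idx] });
        }
    }
    runs
}
```

With `needs_filter = false` and forced splitting, every run now reports `needs_filter = false`; without forcing, row groups sharing a requirement still coalesce into one run.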


Comment thread datafusion/datasource-parquet/src/opener/mod.rs
Comment thread datafusion/datasource-parquet/src/push_decoder.rs
@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1    │ FAIL │                               FAIL │ incomparable │
│ Q2    │ FAIL │                               FAIL │ incomparable │
│ Q3    │ FAIL │                               FAIL │ incomparable │
│ Q4    │ FAIL │                               FAIL │ incomparable │
└───────┴──────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                 ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                 │ 0.00ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 0.00ms │
│ Average Time (HEAD)                               │ 0.00ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │ 0.00ms │
│ Queries Faster                                    │      0 │
│ Queries Slower                                    │      0 │
│ Queries with No Change                            │      0 │
│ Queries with Failure                              │      4 │
└───────────────────────────────────────────────────┴────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

Metric        Value
Wall time     5.0s
Peak memory   3.7 GiB
Avg memory    3.7 GiB
CPU user      0.2s
CPU sys       0.1s
Peak spill    0 B

sort_pushdown_inexact — branch

Metric        Value
Wall time     5.0s
Peak memory   3.7 GiB
Avg memory    3.7 GiB
CPU user      0.1s
CPU sys       0.1s
Peak spill    0 B


zhuqi-lucas and others added 2 commits May 22, 2026 13:58
Per Copilot review on apache#22450: `RowGroupPruner` was using a single
`predicate_creation_errors` counter for both predicate construction
(`build_pruning_predicate`) AND predicate evaluation
(`PruningPredicate::prune`) failures. The log message also said
"Ignoring error building..." when the failure was during evaluation.
This misattributed evaluation failures and made the metric semantics
inconsistent with the static row-group pruning path in
`RowGroupAccessPlanFilter::prune_by_statistics`, which already
separates the two.

`RowGroupPruner::new` now takes both counters:

- `predicate_creation_errors`: bumped on `build_pruning_predicate`
  failures. Wired to `prepared.predicate_creation_errors` from the
  opener — same field the static path uses.
- `predicate_evaluation_errors`: bumped on `PruningPredicate::prune`
  failures. Wired to `prepared.file_metrics.predicate_evaluation_errors`
  — same field the static `prune_by_statistics` path uses, so the two
  paths accumulate into a shared counter.

The error log message is updated to say "evaluating" so the metric and
the log agree.
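A minimal sketch of the error attribution described in this commit, using hypothetical types (`PruneError`, `PruneErrorCounters`, `keep_row_group`) rather than DataFusion's metric types or `PruningPredicate`. Construction and evaluation failures each bump their own counter, and both fall back to keeping the row group, so a failing pruning path never drops data.

```rust
/// Hypothetical error type standing in for a DataFusion error.
struct PruneError(&'static str);

/// Hypothetical counters mirroring the two metrics named above.
#[derive(Default)]
struct PruneErrorCounters {
    predicate_creation_errors: usize,
    predicate_evaluation_errors: usize,
}

/// Decide whether to keep a row group. `build` constructs the predicate and
/// `evaluate` applies it (returning `true` = keep). Any failure keeps the
/// row group and is attributed to the stage that produced it.
fn keep_row_group<P>(
    build: impl FnOnce() -> Result<P, PruneError>,
    evaluate: impl FnOnce(&P) -> Result<bool, PruneError>,
    counters: &mut PruneErrorCounters,
) -> bool {
    let predicate = match build() {
        Ok(predicate) => predicate,
        Err(e) => {
            counters.predicate_creation_errors += 1;
            eprintln!("Ignoring error building pruning predicate: {}", e.0);
            return true;
        }
    };
    match evaluate(&predicate) {
        Ok(keep) => keep,
        Err(e) => {
            counters.predicate_evaluation_errors += 1;
            eprintln!("Ignoring error evaluating pruning predicate: {}", e.0);
            true
        }
    }
}
```

A build failure and an evaluation failure each leave the row group in the scan, but they now land in different counters, matching the split the static `prune_by_statistics` path already makes.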
@zhuqi-lucas
Contributor Author

run benchmark sort_pushdown_inexact

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515622680-278-bcnbp 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (0828f1b) to a8f03fd (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1    │ FAIL │                               FAIL │ incomparable │
│ Q2    │ FAIL │                               FAIL │ incomparable │
│ Q3    │ FAIL │                               FAIL │ incomparable │
│ Q4    │ FAIL │                               FAIL │ incomparable │
└───────┴──────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                 ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                 │ 0.00ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 0.00ms │
│ Average Time (HEAD)                               │ 0.00ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │ 0.00ms │
│ Queries Faster                                    │      0 │
│ Queries Slower                                    │      0 │
│ Queries with No Change                            │      0 │
│ Queries with Failure                              │      4 │
└───────────────────────────────────────────────────┴────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

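┏━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric      ┃ Value   ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Wall time   │ 5.0s    │
│ Peak memory │ 4.1 GiB │
│ Avg memory  │ 4.1 GiB │
│ CPU user    │ 0.1s    │
│ CPU sys     │ 0.1s    │
│ Peak spill  │ 0 B     │
└─────────────┴─────────┘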

sort_pushdown_inexact — branch

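┏━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric      ┃ Value   ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Wall time   │ 5.0s    │
│ Peak memory │ 4.1 GiB │
│ Avg memory  │ 4.1 GiB │
│ CPU user    │ 0.1s    │
│ CPU sys     │ 0.1s    │
│ Peak spill  │ 0 B     │
└─────────────┴─────────┘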

@zhuqi-lucas zhuqi-lucas requested review from adriangb and alamb May 22, 2026 06:24
@zhuqi-lucas
Contributor Author

run benchmark topk_tpch

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515872406-280-dfq5d 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (0828f1b) to a8f03fd (merge-base) diff using: topk_tpch
Results will be posted here when complete

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.14 / 2.74 ±0.76 / 4.10 ms │        2.12 / 2.79 ±0.68 / 4.02 ms │     no change │
│ Q2    │ 10.66 / 11.36 ±0.68 / 12.23 ms │        2.81 / 3.61 ±0.87 / 4.72 ms │ +3.15x faster │
│ Q3    │ 31.77 / 32.15 ±0.43 / 32.83 ms │     31.71 / 31.92 ±0.16 / 32.18 ms │     no change │
│ Q4    │ 11.83 / 12.29 ±0.77 / 13.82 ms │        3.13 / 3.25 ±0.13 / 3.48 ms │ +3.78x faster │
│ Q5    │  9.94 / 10.14 ±0.18 / 10.46 ms │      9.95 / 10.02 ±0.05 / 10.09 ms │     no change │
│ Q6    │ 17.19 / 17.39 ±0.15 / 17.56 ms │     17.11 / 17.36 ±0.37 / 18.09 ms │     no change │
│ Q7    │ 37.07 / 38.08 ±1.17 / 40.08 ms │     37.00 / 37.41 ±0.37 / 38.07 ms │     no change │
│ Q8    │ 28.13 / 28.59 ±0.60 / 29.71 ms │        6.86 / 7.16 ±0.42 / 7.98 ms │ +3.99x faster │
│ Q9    │ 35.34 / 36.86 ±1.54 / 38.77 ms │        8.36 / 8.50 ±0.08 / 8.60 ms │ +4.34x faster │
│ Q10   │ 54.13 / 55.29 ±1.83 / 58.93 ms │     12.77 / 13.00 ±0.45 / 13.89 ms │ +4.25x faster │
│ Q11   │    3.75 / 3.91 ±0.11 / 4.05 ms │        3.82 / 4.08 ±0.31 / 4.68 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 248.79ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 139.08ms │
│ Average Time (HEAD)                               │  22.62ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  12.64ms │
│ Queries Faster                                    │        5 │
│ Queries Slower                                    │        0 │
│ Queries with No Change                            │        6 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘
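The "Change" figures and the summary totals can be rechecked from the table. This sketch rests on two assumptions:

- Each cell reads `min / mean ±stddev / max` in milliseconds.
- "Change" is the HEAD mean divided by the branch mean.

The numbers shown are consistent with that reading (for example, 11.36 / 3.61 ≈ 3.15 for Q2), but it is not a documented rule of the benchmark runner. Summing the rounded means gives 248.80 ms and 139.10 ms, which are within rounding of the reported 248.79 ms and 139.08 ms.

```rust
/// (query, HEAD mean ms, branch mean ms), copied from the topk_tpch results table.
const MEANS: [(&str, f64, f64); 11] = [
    ("Q1", 2.74, 2.79),
    ("Q2", 11.36, 3.61),
    ("Q3", 32.15, 31.92),
    ("Q4", 12.29, 3.25),
    ("Q5", 10.14, 10.02),
    ("Q6", 17.39, 17.36),
    ("Q7", 38.08, 37.41),
    ("Q8", 28.59, 7.16),
    ("Q9", 36.86, 8.50),
    ("Q10", 55.29, 13.00),
    ("Q11", 3.91, 4.08),
];

/// Ratio above 1.0 means the branch is faster than HEAD (assumed definition of "Change").
fn speedup(head_ms: f64, branch_ms: f64) -> f64 {
    head_ms / branch_ms
}

fn main() {
    let (mut head_total, mut branch_total) = (0.0_f64, 0.0_f64);
    for (query, head, branch) in MEANS {
        head_total += head;
        branch_total += branch;
        println!("{query}: {:.2}x", speedup(head, branch));
    }
    println!(
        "totals: HEAD {head_total:.2} ms, branch {branch_total:.2} ms, overall {:.2}x",
        speedup(head_total, branch_total)
    );
}
```

Q11 comes out at about 0.96x yet is labelled "no change". This suggests the runner applies a significance threshold, but its exact value is not shown in this thread.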

Resource Usage

topk_tpch — base (merge-base)

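┏━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric      ┃ Value   ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Wall time   │ 5.0s    │
│ Peak memory │ 4.9 GiB │
│ Avg memory  │ 4.5 GiB │
│ CPU user    │ 11.4s   │
│ CPU sys     │ 1.1s    │
│ Peak spill  │ 0 B     │
└─────────────┴─────────┘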

topk_tpch — branch

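┏━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric      ┃ Value   ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Wall time   │ 5.0s    │
│ Peak memory │ 4.4 GiB │
│ Avg memory  │ 4.4 GiB │
│ CPU user    │ 6.5s    │
│ CPU sys     │ 0.6s    │
│ Peak spill  │ 0 B     │
└─────────────┴─────────┘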

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented May 22, 2026

#22450 (comment)

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.14 / 2.74 ±0.76 / 4.10 ms │        2.12 / 2.79 ±0.68 / 4.02 ms │     no change │
│ Q2    │ 10.66 / 11.36 ±0.68 / 12.23 ms │        2.81 / 3.61 ±0.87 / 4.72 ms │ +3.15x faster │
│ Q3    │ 31.77 / 32.15 ±0.43 / 32.83 ms │     31.71 / 31.92 ±0.16 / 32.18 ms │     no change │
│ Q4    │ 11.83 / 12.29 ±0.77 / 13.82 ms │        3.13 / 3.25 ±0.13 / 3.48 ms │ +3.78x faster │
│ Q5    │  9.94 / 10.14 ±0.18 / 10.46 ms │      9.95 / 10.02 ±0.05 / 10.09 ms │     no change │
│ Q6    │ 17.19 / 17.39 ±0.15 / 17.56 ms │     17.11 / 17.36 ±0.37 / 18.09 ms │     no change │
│ Q7    │ 37.07 / 38.08 ±1.17 / 40.08 ms │     37.00 / 37.41 ±0.37 / 38.07 ms │     no change │
│ Q8    │ 28.13 / 28.59 ±0.60 / 29.71 ms │        6.86 / 7.16 ±0.42 / 7.98 ms │ +3.99x faster │
│ Q9    │ 35.34 / 36.86 ±1.54 / 38.77 ms │        8.36 / 8.50 ±0.08 / 8.60 ms │ +4.34x faster │
│ Q10   │ 54.13 / 55.29 ±1.83 / 58.93 ms │     12.77 / 13.00 ±0.45 / 13.89 ms │ +4.25x faster │
│ Q11   │    3.75 / 3.91 ±0.11 / 4.05 ms │        3.82 / 4.08 ±0.31 / 4.68 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘

cc @alamb @adriangb @Dandandan
This matches my local test results; sort_pushdown_inexact will also improve a lot!

@Dandandan
Contributor

Nice, impressive 🚀🚀🚀


Labels

core (Core DataFusion crate), datasource (Changes to the datasource crate), sqllogictest (SQL Logic Tests (.slt))

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Runtime row-group early stop via TopK dynamic filter

4 participants