feat: add sort_pushdown_inexact benchmark for RG reorder by zhuqi-lucas · Pull Request #21674 · apache/datafusion

zhuqi-lucas · 2026-04-16T12:28:08Z

Which issue does this PR close?

Rationale for this change

The existing sort_pushdown benchmarks only cover the Exact path (sort elimination). The Inexact path — where TopK is preserved and row group reorder by statistics helps threshold convergence — had no benchmark to measure its impact.

What changes are included in this PR?

New benchmark: sort_pushdown_inexact with 4 DESC LIMIT queries (narrow/wide rows, small/large LIMIT)
Data generation: single large parquet file with shuffled row groups (simulates append-heavy workloads)
Enables pushdown_filters in sort_pushdown benchmarks so TopK's dynamic filter is pushed to the parquet reader for late materialization

How to run

./bench.sh data sort_pushdown_inexact
./bench.sh run sort_pushdown_inexact

Or on GKE: @alamb benchmark sort_pushdown_inexact

Are these changes tested?

Benchmark code only — validated locally.

Are there any user-facing changes?

No. New benchmark only.

Copilot

Pull request overview

Adds a new benchmark scenario to measure DataFusion’s sort-pushdown Inexact path behavior (TopK preserved) and the impact of row-group reorder by statistics, while also enabling Parquet filter pushdown so TopK’s dynamic filters can benefit from late materialization.

Changes:

Enable execution.parquet.pushdown_filters in the sort_pushdown benchmark runner to allow TopK dynamic filter pushdown to Parquet decoding.
Add sort_pushdown_inexact to bench.sh, including data generation (single Parquet file) and a dedicated run target.
Add 4 new SQL queries under benchmarks/queries/sort_pushdown_inexact/ covering narrow/wide and small/large LIMIT variants for ORDER BY ... DESC.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
benchmarks/src/sort_pushdown.rs	Turns on Parquet filter pushdown for sort_pushdown benchmark runs.
benchmarks/bench.sh	Wires in `sort_pushdown_inexact` data + run targets and generates a single-file dataset.
benchmarks/queries/sort_pushdown_inexact/q1.sql	DESC TopK (narrow) query for Inexact path coverage.
benchmarks/queries/sort_pushdown_inexact/q2.sql	DESC TopK (narrow) with larger LIMIT to stress threshold convergence.
benchmarks/queries/sort_pushdown_inexact/q3.sql	DESC TopK (wide `SELECT *`) to show late materialization benefits.
benchmarks/queries/sort_pushdown_inexact/q4.sql	Wide `SELECT *` with larger LIMIT for cumulative effects.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T12:36:35Z

+run_sort_pushdown_inexact() {
+    INEXACT_DIR="${DATA_DIR}/sort_pushdown_inexact"
+    RESULTS_FILE="${RESULTS_DIR}/sort_pushdown_inexact.json"
+    echo "Running sort pushdown Inexact benchmark (row group reorder by statistics)..."
+    debug_run $CARGO_COMMAND --bin dfbench -- sort-pushdown --sorted --iterations 5 --path "${INEXACT_DIR}" --queries-path "${SCRIPT_DIR}/queries/sort_pushdown_inexact" -o "${RESULTS_FILE}" ${QUERY_ARG} ${LATENCY_ARG}


run_sort_pushdown_inexact passes --sorted, which tells DataFusion the file is ordered by l_orderkey (via WITH ORDER). However, the generated shuffled.parquet is intentionally not globally sorted by l_orderkey (row groups are arranged by the bucket key). This benchmark therefore relies on the planner staying on an Inexact path (keeping TopK/Sort for correctness). Please document this assumption explicitly and/or add a guard to ensure the benchmark can’t silently become incorrect if future optimizations start treating the reversed scan as Exact and eliminate the ordering operator.

I'm not sure what the --sorted flag is supposed to do but worth checking @zhuqi-lucas

--sorted adds WITH ORDER (l_orderkey ASC) which is needed to trigger the reverse scan path in try_pushdown_sort — currently the only path where reorder_by_statistics is called.

After reviewing this more carefully, I realized we need two benchmark suites to cover the full optimization path:

sort_pushdown_inexact (with --sorted, DESC queries) — tests the reverse scan path where RG reorder is already supported

sort_pushdown_inexact_unsorted (without --sorted, ASC+DESC queries) — tests the Unsupported path where RG reorder will be supported in a follow-up PR (feat: reorder row groups by statistics during sort pushdown #21580)

Updated in the latest push. This way each follow-up PR can run its corresponding benchmark to show the improvement.

adriangb · 2026-04-16T12:41:30Z

+        let mut config = self.common.config()?;
+        // Enable filter pushdown so TopK's dynamic filter is pushed to the
+        // parquet reader for late materialization — only sort-column rows
+        // pass the filter, non-sort columns are skipped for filtered rows.
+        config.options_mut().execution.parquet.pushdown_filters = true;


Does this enable it for all benchmarks? Generally we control this on a per-run basis.

Good catch @adriangb ! Fixed — reverted the sort_pushdown.rs change and moved pushdown_filters to an env var only for the inexact benchmark run. Other sort_pushdown benchmarks are unaffected.

adriangb · 2026-04-16T12:53:54Z

+run_sort_pushdown_inexact() {
+    INEXACT_DIR="${DATA_DIR}/sort_pushdown_inexact"
+    RESULTS_FILE="${RESULTS_DIR}/sort_pushdown_inexact.json"
+    echo "Running sort pushdown Inexact benchmark (row group reorder by statistics)..."
+    debug_run $CARGO_COMMAND --bin dfbench -- sort-pushdown --sorted --iterations 5 --path "${INEXACT_DIR}" --queries-path "${SCRIPT_DIR}/queries/sort_pushdown_inexact" -o "${RESULTS_FILE}" ${QUERY_ARG} ${LATENCY_ARG}


I'm not sure what the --sorted flag is supposed to do but worth checking @zhuqi-lucas

adriangb · 2026-04-16T12:56:02Z

+    # Use datafusion-cli to bucket rows into 64 groups by a deterministic
+    # scrambler, then sort within each bucket by orderkey. This produces
+    # ~64 RG-sized segments where each has a tight orderkey range but the
+    # segments appear in scrambled (non-sorted) order in the file.


I think another interesting benchmark, which is our case at least, is when there is overlap between the row groups but some general logic. E.g.:

rg,min,max 0,0,5 1,4,7 2,5,32

In our case, this happens because there's a stream of data that's coming in with time stamps, and it should arrive at around the same time it was created, but there are always some network delays, time skew, etc. that means that it's not perfect. But data that arrived 30 minutes later is guaranteed to have timestamps in a different range than data that arrive 30 minutes before it.

Great suggestion! Partially overlapping RGs from streaming data is a very realistic scenario. I will add a benchmark variant for this pattern when I update the PR — something like time-ordered chunks with small overlaps between adjacent chunks to simulate network delays / time skew.

Did you plan to include it in this PR or a followup?

Will add it in this PR — I'll create a third data variant with partially overlapping RGs (simulating streaming data with network delays) alongside the current shuffled data.

Amazing thanks so much

Added in the latest push — sort_pushdown_inexact_overlap generates a file with partially overlapping RGs (±2500 orderkey jitter between adjacent 100K-row chunks, simulating streaming data with network delays). 4 DESC LIMIT queries to match your use case.

@adriangb

Add three benchmark suites for the Inexact sort pushdown path: 1. sort_pushdown_inexact (--sorted, DESC queries): Tests reverse scan path where RG reorder by statistics is applied. Single file with shuffled (non-overlapping) row groups. 2. sort_pushdown_inexact_unsorted (no WITH ORDER): Tests Unsupported path for future RG reorder support without declared file ordering. ASC + DESC queries. 3. sort_pushdown_inexact_overlap (--sorted, DESC queries): Partially overlapping row groups simulating streaming data with network jitter — RGs are mostly in order but have ±2500 orderkey overlap between adjacent chunks. This matches the real-world pattern described by @adriangb. All use pushdown_filters via env var for late materialization. Closes apache#21582

adriangb

LGTM!

adriangb · 2026-04-17T13:33:57Z

run benchmark sort_pushdown_inexact_overlap

adriangbot · 2026-04-17T13:34:53Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4268402463-1421-h7zl5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/sort-pushdown-inexact-benchmark (f1215b9) to 7bfa3fb (merge-base) diff
BENCH_NAME=sort_pushdown_inexact_overlap
BENCH_COMMAND=cargo bench --features=parquet --bench sort_pushdown_inexact_overlap
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-04-17T13:34:59Z

Benchmark for this request failed.

Last 20 lines of output:

Click to expand

    struct_query_sql
    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    upper
    uuid
    window_query_sql
    with_hashes

File an issue against this benchmark runner

adriangb · 2026-04-17T14:13:39Z

run benchmark sort_pushdown_inexact_overlap

adriangbot · 2026-04-17T14:14:10Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4268806819-1422-9cxfj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/sort-pushdown-inexact-benchmark (f1215b9) to 7bfa3fb (merge-base) diff using: sort_pushdown_inexact_overlap
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-04-17T14:43:47Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and feat_sort-pushdown-inexact-benchmark
--------------------
Benchmark sort_pushdown_inexact_overlap.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_sort-pushdown-inexact-benchmark ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    4.89 / 5.96 ±0.89 / 7.55 ms │          4.63 / 5.61 ±1.02 / 7.44 ms │ +1.06x faster │
│ Q2    │    4.99 / 5.38 ±0.28 / 5.67 ms │          4.95 / 5.28 ±0.26 / 5.66 ms │     no change │
│ Q3    │ 13.87 / 14.66 ±0.61 / 15.73 ms │       14.43 / 14.93 ±0.42 / 15.62 ms │     no change │
│ Q4    │ 14.49 / 14.62 ±0.11 / 14.80 ms │       14.64 / 14.68 ±0.05 / 14.77 ms │     no change │
└───────┴────────────────────────────────┴──────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                                   ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                                   │ 40.62ms │
│ Total Time (feat_sort-pushdown-inexact-benchmark)   │ 40.50ms │
│ Average Time (HEAD)                                 │ 10.16ms │
│ Average Time (feat_sort-pushdown-inexact-benchmark) │ 10.12ms │
│ Queries Faster                                      │       1 │
│ Queries Slower                                      │       0 │
│ Queries with No Change                              │       3 │
│ Queries with Failure                                │       0 │
└─────────────────────────────────────────────────────┴─────────┘

Resource Usage

sort_pushdown_inexact_overlap — base (merge-base)

Metric	Value
Wall time	0.4s
Peak memory	4.7 GiB
Avg memory	4.7 GiB
CPU user	1.8s
CPU sys	0.2s
Peak spill	0 B

sort_pushdown_inexact_overlap — branch

Metric	Value
Wall time	0.4s
Peak memory	4.7 GiB
Avg memory	4.7 GiB
CPU user	1.8s
CPU sys	0.2s
Peak spill	0 B

File an issue against this benchmark runner

Copilot AI review requested due to automatic review settings April 16, 2026 12:28

Copilot started reviewing on behalf of zhuqi-lucas April 16, 2026 12:28 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

adriangb reviewed Apr 16, 2026

View reviewed changes

zhuqi-lucas mentioned this pull request Apr 16, 2026

feat: reorder row groups by statistics during sort pushdown #21580

Open

zhuqi-lucas force-pushed the feat/sort-pushdown-inexact-benchmark branch from ab5fe05 to bc87f85 Compare April 16, 2026 12:48

adriangb reviewed Apr 16, 2026

View reviewed changes

zhuqi-lucas force-pushed the feat/sort-pushdown-inexact-benchmark branch from bc87f85 to 9290773 Compare April 16, 2026 14:21

zhuqi-lucas force-pushed the feat/sort-pushdown-inexact-benchmark branch from 9290773 to f1215b9 Compare April 17, 2026 08:21

adriangb approved these changes Apr 17, 2026

View reviewed changes

adriangb added this pull request to the merge queue Apr 17, 2026

Merged via the queue into apache:main with commit afc0784 Apr 17, 2026
31 checks passed

Conversation

zhuqi-lucas commented Apr 16, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How to run

Are these changes tested?

Are there any user-facing changes?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zhuqi-lucas Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zhuqi-lucas Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adriangb left a comment

Choose a reason for hiding this comment

Uh oh!

adriangb commented Apr 17, 2026

Uh oh!

adriangbot commented Apr 17, 2026

Uh oh!

adriangbot commented Apr 17, 2026

Uh oh!

Uh oh!

adriangb commented Apr 17, 2026

Uh oh!

adriangbot commented Apr 17, 2026

Uh oh!

adriangbot commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zhuqi-lucas Apr 16, 2026 •

edited

Loading

zhuqi-lucas Apr 16, 2026 •

edited

Loading