Skip to content

feat: add sort_pushdown_inexact benchmark for RG reorder#21674

Merged
adriangb merged 1 commit intoapache:mainfrom
zhuqi-lucas:feat/sort-pushdown-inexact-benchmark
Apr 17, 2026
Merged

feat: add sort_pushdown_inexact benchmark for RG reorder#21674
adriangb merged 1 commit intoapache:mainfrom
zhuqi-lucas:feat/sort-pushdown-inexact-benchmark

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #21582

Rationale for this change

The existing sort_pushdown benchmarks only cover the Exact path (sort elimination). The Inexact path — where TopK is preserved and row group reorder by statistics helps threshold convergence — had no benchmark to measure its impact.

What changes are included in this PR?

  • New benchmark: sort_pushdown_inexact with 4 DESC LIMIT queries (narrow/wide rows, small/large LIMIT)
  • Data generation: single large parquet file with shuffled row groups (simulates append-heavy workloads)
  • Enables pushdown_filters in sort_pushdown benchmarks so TopK's dynamic filter is pushed to the parquet reader for late materialization

How to run

./bench.sh data sort_pushdown_inexact
./bench.sh run sort_pushdown_inexact

Or on GKE: @alamb benchmark sort_pushdown_inexact

Are these changes tested?

Benchmark code only — validated locally.

Are there any user-facing changes?

No. New benchmark only.

Copilot AI review requested due to automatic review settings April 16, 2026 12:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new benchmark scenario to measure DataFusion’s sort-pushdown Inexact path behavior (TopK preserved) and the impact of row-group reorder by statistics, while also enabling Parquet filter pushdown so TopK’s dynamic filters can benefit from late materialization.

Changes:

  • Enable execution.parquet.pushdown_filters in the sort_pushdown benchmark runner to allow TopK dynamic filter pushdown to Parquet decoding.
  • Add sort_pushdown_inexact to bench.sh, including data generation (single Parquet file) and a dedicated run target.
  • Add 4 new SQL queries under benchmarks/queries/sort_pushdown_inexact/ covering narrow/wide and small/large LIMIT variants for ORDER BY ... DESC.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
benchmarks/src/sort_pushdown.rs Turns on Parquet filter pushdown for sort_pushdown benchmark runs.
benchmarks/bench.sh Wires in sort_pushdown_inexact data + run targets and generates a single-file dataset.
benchmarks/queries/sort_pushdown_inexact/q1.sql DESC TopK (narrow) query for Inexact path coverage.
benchmarks/queries/sort_pushdown_inexact/q2.sql DESC TopK (narrow) with larger LIMIT to stress threshold convergence.
benchmarks/queries/sort_pushdown_inexact/q3.sql DESC TopK (wide SELECT *) to show late materialization benefits.
benchmarks/queries/sort_pushdown_inexact/q4.sql Wide SELECT * with larger LIMIT for cumulative effects.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread benchmarks/bench.sh
Comment thread benchmarks/bench.sh
Comment on lines +1193 to +1197
run_sort_pushdown_inexact() {
INEXACT_DIR="${DATA_DIR}/sort_pushdown_inexact"
RESULTS_FILE="${RESULTS_DIR}/sort_pushdown_inexact.json"
echo "Running sort pushdown Inexact benchmark (row group reorder by statistics)..."
debug_run $CARGO_COMMAND --bin dfbench -- sort-pushdown --sorted --iterations 5 --path "${INEXACT_DIR}" --queries-path "${SCRIPT_DIR}/queries/sort_pushdown_inexact" -o "${RESULTS_FILE}" ${QUERY_ARG} ${LATENCY_ARG}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_sort_pushdown_inexact passes --sorted, which tells DataFusion the file is ordered by l_orderkey (via WITH ORDER). However, the generated shuffled.parquet is intentionally not globally sorted by l_orderkey (row groups are arranged by the bucket key). This benchmark therefore relies on the planner staying on an Inexact path (keeping TopK/Sort for correctness). Please document this assumption explicitly and/or add a guard to ensure the benchmark can’t silently become incorrect if future optimizations start treating the reversed scan as Exact and eliminate the ordering operator.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the --sorted flag is supposed to do but worth checking @zhuqi-lucas

Copy link
Copy Markdown
Contributor Author

@zhuqi-lucas zhuqi-lucas Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--sorted adds WITH ORDER (l_orderkey ASC) which is needed to trigger the reverse scan path in try_pushdown_sort — currently the only path where reorder_by_statistics is called.

After reviewing this more carefully, I realized we need two benchmark suites to cover the full optimization path:

  1. sort_pushdown_inexact (with --sorted, DESC queries) — tests the reverse scan path where RG reorder is already supported
  2. sort_pushdown_inexact_unsorted (without --sorted, ASC+DESC queries) — tests the Unsupported path where RG reorder will be supported in a follow-up PR (feat: reorder row groups by statistics during sort pushdown #21580)

Updated in the latest push. This way each follow-up PR can run its corresponding benchmark to show the improvement.

Comment thread benchmarks/bench.sh
Comment thread benchmarks/src/sort_pushdown.rs Outdated
Comment on lines +162 to +166
let mut config = self.common.config()?;
// Enable filter pushdown so TopK's dynamic filter is pushed to the
// parquet reader for late materialization — only sort-column rows
// pass the filter, non-sort columns are skipped for filtered rows.
config.options_mut().execution.parquet.pushdown_filters = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this enable it for all benchmarks? Generally we control this on a per-run basis.

Copy link
Copy Markdown
Contributor Author

@zhuqi-lucas zhuqi-lucas Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @adriangb ! Fixed — reverted the sort_pushdown.rs change and moved pushdown_filters to an env var only for the inexact benchmark run. Other sort_pushdown benchmarks are unaffected.

Comment thread benchmarks/bench.sh
Comment on lines +1193 to +1197
run_sort_pushdown_inexact() {
INEXACT_DIR="${DATA_DIR}/sort_pushdown_inexact"
RESULTS_FILE="${RESULTS_DIR}/sort_pushdown_inexact.json"
echo "Running sort pushdown Inexact benchmark (row group reorder by statistics)..."
debug_run $CARGO_COMMAND --bin dfbench -- sort-pushdown --sorted --iterations 5 --path "${INEXACT_DIR}" --queries-path "${SCRIPT_DIR}/queries/sort_pushdown_inexact" -o "${RESULTS_FILE}" ${QUERY_ARG} ${LATENCY_ARG}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the --sorted flag is supposed to do but worth checking @zhuqi-lucas

Comment thread benchmarks/bench.sh
Comment on lines +1168 to +1171
# Use datafusion-cli to bucket rows into 64 groups by a deterministic
# scrambler, then sort within each bucket by orderkey. This produces
# ~64 RG-sized segments where each has a tight orderkey range but the
# segments appear in scrambled (non-sorted) order in the file.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another interesting benchmark, which is our case at least, is when there is overlap between the row groups but some general logic. E.g.:

rg,min,max
0,0,5
1,4,7
2,5,32

In our case, this happens because there's a stream of data that's coming in with time stamps, and it should arrive at around the same time it was created, but there are always some network delays, time skew, etc. that means that it's not perfect. But data that arrived 30 minutes later is guaranteed to have timestamps in a different range than data that arrive 30 minutes before it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Partially overlapping RGs from streaming data is a very realistic scenario. I will add a benchmark variant for this pattern when I update the PR — something like time-ordered chunks with small overlaps between adjacent chunks to simulate network delays / time skew.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you plan to include it in this PR or a followup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it in this PR — I'll create a third data variant with partially overlapping RGs (simulating streaming data with network delays) alongside the current shuffled data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing thanks so much

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the latest push — sort_pushdown_inexact_overlap generates a file with partially overlapping RGs (±2500 orderkey jitter between adjacent 100K-row chunks, simulating streaming data with network delays). 4 DESC LIMIT queries to match your use case.

@zhuqi-lucas zhuqi-lucas force-pushed the feat/sort-pushdown-inexact-benchmark branch from bc87f85 to 9290773 Compare April 16, 2026 14:21
Add three benchmark suites for the Inexact sort pushdown path:

1. sort_pushdown_inexact (--sorted, DESC queries):
   Tests reverse scan path where RG reorder by statistics is applied.
   Single file with shuffled (non-overlapping) row groups.

2. sort_pushdown_inexact_unsorted (no WITH ORDER):
   Tests Unsupported path for future RG reorder support without
   declared file ordering. ASC + DESC queries.

3. sort_pushdown_inexact_overlap (--sorted, DESC queries):
   Partially overlapping row groups simulating streaming data with
   network jitter — RGs are mostly in order but have ±2500 orderkey
   overlap between adjacent chunks. This matches the real-world
   pattern described by @adriangb.

All use pushdown_filters via env var for late materialization.

Closes apache#21582
@zhuqi-lucas zhuqi-lucas force-pushed the feat/sort-pushdown-inexact-benchmark branch from 9290773 to f1215b9 Compare April 17, 2026 08:21
Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@adriangb adriangb added this pull request to the merge queue Apr 17, 2026
@adriangb
Copy link
Copy Markdown
Contributor

run benchmark sort_pushdown_inexact_overlap

@adriangbot
Copy link
Copy Markdown

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4268402463-1421-h7zl5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/sort-pushdown-inexact-benchmark (f1215b9) to 7bfa3fb (merge-base) diff
BENCH_NAME=sort_pushdown_inexact_overlap
BENCH_COMMAND=cargo bench --features=parquet --bench sort_pushdown_inexact_overlap
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

Benchmark for this request failed.

Last 20 lines of output:

Click to expand
    struct_query_sql
    substr
    substr_index
    substring
    sum
    to_char
    to_hex
    to_local_time
    to_time
    to_timestamp
    topk_aggregate
    topk_repartition
    translate
    trim
    trunc
    unhex
    upper
    uuid
    window_query_sql
    with_hashes

File an issue against this benchmark runner

Merged via the queue into apache:main with commit afc0784 Apr 17, 2026
31 checks passed
@adriangb
Copy link
Copy Markdown
Contributor

run benchmark sort_pushdown_inexact_overlap

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4268806819-1422-9cxfj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/sort-pushdown-inexact-benchmark (f1215b9) to 7bfa3fb (merge-base) diff using: sort_pushdown_inexact_overlap
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and feat_sort-pushdown-inexact-benchmark
--------------------
Benchmark sort_pushdown_inexact_overlap.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_sort-pushdown-inexact-benchmark ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    4.89 / 5.96 ±0.89 / 7.55 ms │          4.63 / 5.61 ±1.02 / 7.44 ms │ +1.06x faster │
│ Q2    │    4.99 / 5.38 ±0.28 / 5.67 ms │          4.95 / 5.28 ±0.26 / 5.66 ms │     no change │
│ Q3    │ 13.87 / 14.66 ±0.61 / 15.73 ms │       14.43 / 14.93 ±0.42 / 15.62 ms │     no change │
│ Q4    │ 14.49 / 14.62 ±0.11 / 14.80 ms │       14.64 / 14.68 ±0.05 / 14.77 ms │     no change │
└───────┴────────────────────────────────┴──────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Benchmark Summary                                   ┃         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Total Time (HEAD)                                   │ 40.62ms │
│ Total Time (feat_sort-pushdown-inexact-benchmark)   │ 40.50ms │
│ Average Time (HEAD)                                 │ 10.16ms │
│ Average Time (feat_sort-pushdown-inexact-benchmark) │ 10.12ms │
│ Queries Faster                                      │       1 │
│ Queries Slower                                      │       0 │
│ Queries with No Change                              │       3 │
│ Queries with Failure                                │       0 │
└─────────────────────────────────────────────────────┴─────────┘

Resource Usage

sort_pushdown_inexact_overlap — base (merge-base)

Metric Value
Wall time 0.4s
Peak memory 4.7 GiB
Avg memory 4.7 GiB
CPU user 1.8s
CPU sys 0.2s
Peak spill 0 B

sort_pushdown_inexact_overlap — branch

Metric Value
Wall time 0.4s
Peak memory 4.7 GiB
Avg memory 4.7 GiB
CPU user 1.8s
CPU sys 0.2s
Peak spill 0 B

File an issue against this benchmark runner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add benchmark for sort pushdown Inexact path (row group reorder)

4 participants