test: benchmarks and SLT tests for push-down TopK through join #22760

Merged: adriangb merged 2 commits into apache:main from pydantic:push-down-topk-bench-tests on Jun 4, 2026

@adriangb commented Jun 4, 2026

Which issue does this PR close?

Rationale for this change

This splits the test and benchmark scaffolding out of #21621 so the
PushDownTopKThroughJoin optimizer rule itself can be reviewed in
isolation, with a small, focused diff.

The benchmark and SLT files here do not depend on the rule. They are
committed first so that:

  1. The benchmark can measure the rule's effect against a baseline that
    does not register it.
  2. The follow-up rule PR's diff shows exactly which plans change, since
    the EXPLAIN plans here capture the current (pre-rule) behavior.

What changes are included in this PR?

  • A push_down_topk benchmark (dfbench push-down-topk) that runs
    ORDER BY <cols> LIMIT N queries over outer joins against TPC-H
    customer/orders/nation, plus its query files under
    benchmarks/queries/push_down_topk/.
  • push_down_topk_through_join.slt covering the scenarios the rule
    handles: preserved-side sort keys, ineligible join types
    (inner/full/semi/anti), ON-clause filters, projection and
    SubqueryAlias resolution, existing child sorts, ties, multi-level
    joins, OFFSET, and volatile expressions.
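
To make the "preserved-side sort keys" scenario concrete, here is a minimal sketch in Python that uses the standard-library `sqlite3` module as a stand-in engine (not DataFusion). The miniature `customer`/`orders` tables, their rows, and the names `BASELINE`, `PUSHED`, and `run` are invented for illustration; the actual benchmark queries are not reproduced here. It assumes the rule does what its name suggests, that is, it limits the preserved (left) side of a LEFT JOIN before joining, and it checks that this hand-written form returns the same rows as the plain ORDER BY ... LIMIT query.

```python
import sqlite3

# Hypothetical miniature tables; column names mimic TPC-H, rows are made up.
SCHEMA = """
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_acctbal INTEGER);
CREATE TABLE orders   (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
INSERT INTO customer VALUES (1, 500), (2, 900), (3, 100), (4, 700);
INSERT INTO orders   VALUES (10, 2), (11, 3), (12, 3);
"""

# Baseline shape: sort and limit sit above the outer join.
BASELINE = """
SELECT c.c_custkey, c.c_acctbal, o.o_orderkey
FROM customer AS c
LEFT JOIN orders AS o ON c.c_custkey = o.o_custkey
ORDER BY c.c_acctbal DESC, c.c_custkey
LIMIT :n
"""

# Hand-written "pushed" shape (an assumption about what the rule produces):
# the preserved side is cut to its top n rows before the join, and the
# outer sort and limit are kept.
PUSHED = """
SELECT c.c_custkey, c.c_acctbal, o.o_orderkey
FROM (SELECT * FROM customer
      ORDER BY c_acctbal DESC, c_custkey LIMIT :n) AS c
LEFT JOIN orders AS o ON c.c_custkey = o.o_custkey
ORDER BY c.c_acctbal DESC, c.c_custkey
LIMIT :n
"""


def run(sql, n):
    """Load the toy data into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(sql, {"n": n}).fetchall()
    finally:
        conn.close()


baseline_rows = run(BASELINE, 3)
pushed_rows = run(PUSHED, 3)
print(baseline_rows)
print(baseline_rows == pushed_rows)
```

A LEFT JOIN emits at least one row per left-side row, so the first n joined rows can only come from the first n customers. The data is chosen so that the sort keys are unique among the top three rows, which sidesteps the ties scenario that the SLT file covers separately.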

The EXPLAIN plans assert current behavior (TopK not yet pushed through
the join). The follow-up PR that adds the rule updates those plans in
place; the query-result checks hold regardless of whether the rule is
enabled.

The new optimizer rule, the push_down_limit.rs changes, and the
optimizer_rule_reference.md update from #21621 are intentionally left
for the follow-up PR.

Are these changes tested?

Yes — this PR is the tests. push_down_topk_through_join.slt passes
against main, and the benchmark binary compiles and runs.

Are there any user-facing changes?

No. No API changes; only new benchmark and test files plus benchmark CLI
wiring.

Splits the test and benchmark scaffolding out of apache#21621 so the
`PushDownTopKThroughJoin` optimizer rule can be reviewed on its own.

- Adds a `push_down_topk` benchmark (`dfbench push-down-topk`) that runs
  ORDER BY ... LIMIT queries over outer joins against TPC-H data, so the
  rule's effect can be measured against a baseline that does not register
  it.
- Adds `push_down_topk_through_join.slt` covering the scenarios the rule
  handles (preserved-side sort keys, ineligible join types, semi/anti
  joins, projection/alias resolution, ties, multi-level joins,
  volatility). The EXPLAIN plans capture current (pre-rule) behavior so
  the follow-up rule PR's diff shows exactly which plans change; the
  query-result checks hold either way.
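
As an illustration of why an inner join would be an ineligible join type, the sketch below uses Python's standard-library `sqlite3` module as a stand-in engine (not DataFusion). The tables, rows, and the names `INNER_BASELINE`, `INNER_PRELIMITED`, and `run_inner` are hypothetical, and the explanation is a general property of inner joins rather than something stated in this PR: an inner join can drop left-side rows, so limiting the left side first can lose rows that belong in the result.

```python
import sqlite3

# Hypothetical toy data: customer 4 has a high balance but no orders.
INNER_SCHEMA = """
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_acctbal INTEGER);
CREATE TABLE orders   (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
INSERT INTO customer VALUES (1, 500), (2, 900), (4, 700);
INSERT INTO orders   VALUES (10, 2), (11, 1);
"""

# Correct query: join first, then sort and limit.
INNER_BASELINE = """
SELECT c.c_custkey, c.c_acctbal, o.o_orderkey
FROM customer AS c
JOIN orders AS o ON c.c_custkey = o.o_custkey
ORDER BY c.c_acctbal DESC, c.c_custkey
LIMIT :n
"""

# Incorrect rewrite: limiting the left side before an INNER JOIN.
INNER_PRELIMITED = """
SELECT c.c_custkey, c.c_acctbal, o.o_orderkey
FROM (SELECT * FROM customer
      ORDER BY c_acctbal DESC, c_custkey LIMIT :n) AS c
JOIN orders AS o ON c.c_custkey = o.o_custkey
ORDER BY c.c_acctbal DESC, c.c_custkey
LIMIT :n
"""


def run_inner(sql, n):
    """Load the toy data into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(INNER_SCHEMA)
        return conn.execute(sql, {"n": n}).fetchall()
    finally:
        conn.close()


inner_baseline = run_inner(INNER_BASELINE, 2)
inner_prelimited = run_inner(INNER_PRELIMITED, 2)
print(inner_baseline)
print(inner_prelimited)
```

With this data the pre-limited form keeps customers 2 and 4, customer 4 has no matching order, and the query returns one row where the baseline returns two.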

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions bot added the sqllogictest (SQL Logic Tests, .slt) label Jun 4, 2026

@alamb left a comment

Thank you @adriangb and @SubhamSinghal

Comment thread benchmarks/src/push_down_topk.rs Outdated
// specific language governing permissions and limitations
// under the License.

//! Benchmark for `push_down_topk_through_join`.

Do we really need a whole special executor for this benchmark? Can we use the new benchmark runner that @Omega359 is working on?

Contributor Author

I initially copied the executor over verbatim, but I will rework the benchmarks into the new framework.

Addresses review feedback: replace the bespoke `push_down_topk` Rust
executor with declarative `.benchmark` files under
`sql_benchmarks/push_down_topk/`, run by the existing `cargo bench
--bench sql` harness.

- Removes `benchmarks/src/push_down_topk.rs` and its `dfbench` subcommand
  wiring (`lib.rs`, `dfbench.rs` now match main).
- Adds `sql_benchmarks/push_down_topk/{benchmarks/q01..q05.benchmark,
  init/load.sql,init/cleanup.sql}`, reusing the TPC-H parquet data.
- Wires `push_down_topk` into `bench.sh` (data + run) and documents the
  suite in the sql_benchmarks README.

Verified all five queries load, assert, and run via
`BENCH_NAME=push_down_topk cargo bench --bench sql` against a generated
TPC-H dataset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@adriangb enabled auto-merge June 4, 2026 15:19
@adriangb added this pull request to the merge queue Jun 4, 2026
Merged via the queue into apache:main with commit 249b599 Jun 4, 2026
57 of 58 checks passed
@adriangb deleted the push-down-topk-bench-tests branch June 4, 2026 15:52