test: benchmarks and SLT tests for push-down TopK through join#22760
Merged
Splits the test and benchmark scaffolding out of apache#21621 so the `PushDownTopKThroughJoin` optimizer rule can be reviewed on its own.

- Adds a `push_down_topk` benchmark (`dfbench push-down-topk`) that runs ORDER BY ... LIMIT queries over outer joins against TPC-H data, so the rule's effect can be measured against a baseline that does not register it.
- Adds `push_down_topk_through_join.slt` covering the scenarios the rule handles (preserved-side sort keys, ineligible join types, semi/anti joins, projection/alias resolution, ties, multi-level joins, volatility).

The EXPLAIN plans capture current (pre-rule) behavior so the follow-up rule PR's diff shows exactly which plans change; the query-result checks hold either way.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alamb
approved these changes
Jun 4, 2026
Contributor
alamb
left a comment
Thank you @adriangb and @SubhamSinghal
// specific language governing permissions and limitations
// under the License.

//! Benchmark for `push_down_topk_through_join`.
Contributor
Do we really need a whole special executor for this benchmark? Can we use the new benchmark runner stuff that @Omega359 is working on?
Contributor
Author
I initially copied these over verbatim, but I will rework them into the new framework.
Addresses review feedback: replace the bespoke `push_down_topk` Rust
executor with declarative `.benchmark` files under
`sql_benchmarks/push_down_topk/`, run by the existing `cargo bench
--bench sql` harness.
- Removes `benchmarks/src/push_down_topk.rs` and its `dfbench` subcommand
wiring (`lib.rs`, `dfbench.rs` now match main).
- Adds `sql_benchmarks/push_down_topk/{benchmarks/q01..q05.benchmark,
init/load.sql,init/cleanup.sql}`, reusing the TPC-H parquet data.
- Wires `push_down_topk` into `bench.sh` (data + run) and documents the
suite in the sql_benchmarks README.
Verified all five queries load, assert, and run via
`BENCH_NAME=push_down_topk cargo bench --bench sql` against a generated
TPC-H dataset.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Rationale for this change
This splits the test and benchmark scaffolding out of #21621 so the
`PushDownTopKThroughJoin` optimizer rule itself can be reviewed in isolation, with a small, focused diff.
The benchmark and SLT files here do not depend on the rule. They are
committed first so that:
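- the rule's effect can be measured against a baseline that does not register it.
- the follow-up rule PR's diff shows exactly which plans change, because the EXPLAIN plans here capture the current (pre-rule) behavior.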
What changes are included in this PR?
- Adds a `push_down_topk` benchmark (`dfbench push-down-topk`) that runs `ORDER BY <cols> LIMIT N` queries over outer joins against TPC-H `customer`/`orders`/`nation`, plus its query files under `benchmarks/queries/push_down_topk/`.
- Adds `push_down_topk_through_join.slt` covering the scenarios the rule handles: preserved-side sort keys, ineligible join types (inner/full/semi/anti), ON-clause filters, projection and `SubqueryAlias` resolution, existing child sorts, ties, multi-level joins, OFFSET, and volatile expressions.

The EXPLAIN plans assert current behavior (TopK not yet pushed through
the join). The follow-up PR that adds the rule updates those plans in
place; the query-result checks hold regardless of whether the rule is
enabled.
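
To illustrate why the SLT file separates preserved-side sort keys from ineligible join types, here is a toy model in plain Rust. It is only a sketch of the general idea and is not taken from #21621: it uses no DataFusion APIs, and every name in it (`join_top_k`, `sample_data`, the tuple row types) is hypothetical. The assumption it demonstrates is that a left outer join emits at least one row per left row, so limiting the left input to its top `k` rows by a left-side sort key, while keeping the final TopK, cannot change the result; an inner join can drop unmatched left rows, so the same shortcut can return too few rows.

```rust
// Toy model only: rows are tuples and joins are nested loops.
// All names are hypothetical; this is not DataFusion code.

/// (customer key, customer name)
type Customer = (u32, &'static str);
/// (order key, customer key)
type Order = (u32, u32);
/// Joined row: customer key, name, order key (`None` when unmatched).
type Joined = (u32, &'static str, Option<u32>);

/// LEFT OUTER JOIN on customer key: every customer yields at least one row.
fn left_join(customers: &[Customer], orders: &[Order]) -> Vec<Joined> {
    let mut out = Vec::new();
    for &(ckey, name) in customers {
        let mut matched = false;
        for &(okey, order_ckey) in orders {
            if order_ckey == ckey {
                out.push((ckey, name, Some(okey)));
                matched = true;
            }
        }
        if !matched {
            out.push((ckey, name, None));
        }
    }
    out
}

/// INNER JOIN on customer key: unmatched customers are dropped.
fn inner_join(customers: &[Customer], orders: &[Order]) -> Vec<Joined> {
    left_join(customers, orders)
        .into_iter()
        .filter(|row| row.2.is_some())
        .collect()
}

/// ORDER BY key LIMIT k, using a stable sort so ties keep input order.
fn top_k_by_key<T: Clone>(rows: &[T], key: impl Fn(&T) -> u32, k: usize) -> Vec<T> {
    let mut sorted = rows.to_vec();
    sorted.sort_by_key(|row| key(row));
    sorted.truncate(k);
    sorted
}

/// Runs `join` followed by ORDER BY customer key LIMIT k.
/// With `push_down`, the left input is first cut to its own top k rows.
fn join_top_k(
    join: fn(&[Customer], &[Order]) -> Vec<Joined>,
    customers: &[Customer],
    orders: &[Order],
    k: usize,
    push_down: bool,
) -> Vec<Joined> {
    let left: Vec<Customer> = if push_down {
        top_k_by_key(customers, |c| c.0, k)
    } else {
        customers.to_vec()
    };
    let joined = join(&left, orders);
    top_k_by_key(&joined[..], |row| row.0, k)
}

/// Made-up data: customer 1 has no orders, customer 3 has two.
fn sample_data() -> (Vec<Customer>, Vec<Order>) {
    let customers = vec![(3, "c"), (1, "a"), (2, "b"), (4, "d")];
    let orders = vec![(10, 2), (11, 3), (12, 3), (13, 4)];
    (customers, orders)
}

fn main() {
    let (customers, orders) = sample_data();
    let k = 2;

    // Left join: pre-limiting the preserved side leaves the result unchanged.
    let left_base = join_top_k(left_join, &customers, &orders, k, false);
    let left_pushed = join_top_k(left_join, &customers, &orders, k, true);
    assert_eq!(left_base, left_pushed);

    // Inner join: the same shortcut loses a row, so it is not safe here.
    let inner_base = join_top_k(inner_join, &customers, &orders, k, false);
    let inner_pushed = join_top_k(inner_join, &customers, &orders, k, true);
    assert_ne!(inner_base, inner_pushed);

    println!("left join:  {:?} == {:?}", left_base, left_pushed);
    println!("inner join: {:?} != {:?}", inner_base, inner_pushed);
}
```

In this toy data the customer keys are unique and the sort is stable, which is why the two left-join pipelines agree row for row. With ties on the sort key, a real engine may legitimately pick different tied rows, which is presumably why the SLT file has dedicated tie cases.
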
The new optimizer rule, the `push_down_limit.rs` changes, and the
`optimizer_rule_reference.md` update from #21621 are intentionally left
for the follow-up PR.
Are these changes tested?
Yes — this PR is the tests.
`push_down_topk_through_join.slt` passes against
`main`, and the benchmark binary compiles and runs.
Are there any user-facing changes?
No. No API changes; only new benchmark and test files plus benchmark CLI
wiring.