[feature](fe) Push down limit into CTE producer #63675
Open
CalvinKirs wants to merge 4 commits into
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Push row limits from CTE consumers into the CTE producer when every consumer is bounded by a plain row-preserving limit, using the maximum rows needed across consumers.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- env MAVEN_OPTS=-Xmx4g -XX:MaxMetaspaceSize=1g ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CollectLimitAboveConsumerTest,org.apache.doris.nereids.rules.rewrite.RewriteCteChildrenLimitPushdownTest
- Behavior changed: Yes. Adds CTE producer limit pushdown for bounded CTE consumers.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add planner-level coverage for CTE limit pushdown boundaries, including offset handling, producer output pruning, max rows across limited consumers, full-row consumers, and filter boundaries.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- env MAVEN_OPTS=-Xmx4g -XX:MaxMetaspaceSize=1g ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CteLimitPushdownPlanTest
- env MAVEN_OPTS=-Xmx4g -XX:MaxMetaspaceSize=1g ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CollectLimitAboveConsumerTest,org.apache.doris.nereids.rules.rewrite.RewriteCteChildrenLimitPushdownTest,org.apache.doris.nereids.rules.rewrite.CteLimitPushdownPlanTest
- Red check: temporarily disabled producer-side limit construction; CteLimitPushdownPlanTest failed 3 positive cases as expected.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add planner and regression coverage for CTE limit pushdown edge cases described in the design, including local split limits, non-row-preserving consumers, and nonmatching Filter/TopN/Join/Aggregate/Window shapes.
### Release note
None
### Check List (For Author)
- Test:
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CollectLimitAboveConsumerTest
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CteLimitPushdownPlanTest
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.RewriteCteChildrenLimitPushdownTest
- Regression test: ./run-regression-test.sh --run --conf /tmp/cte_limit_pushdown_regression-conf.groovy -d nereids_rules_p0/cte_limit_pushdown -s test_cte_limit_pushdown
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Remove a redundant row-preserving project check after the pattern guard and keep the maximum collected CTE consumer limit when the same consumer is collected multiple times.
### Release note
None
### Check List (For Author)
- Test:
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CollectLimitAboveConsumerTest
- Regression test: ./run-regression-test.sh --run --conf /tmp/cte_limit_pushdown_regression-conf.groovy -d nereids_rules_p0/cte_limit_pushdown -s test_cte_limit_pushdown
- Behavior changed: No
- Does this need documentation: No
TPC-H: Total hot run time: 31805 ms
TPC-DS: Total hot run time: 172322 ms
This PR adds CTE producer-side limit pushdown in Nereids.
When all CTE consumers only need a bounded number of rows, the optimizer collects the required row count from each consumer, takes
the maximum value, and pushes that limit into the CTE producer. The original consumer-side limit is still kept.
The rule only handles safe shapes: `Limit -> CTEConsumer` and `Limit -> Project -> CTEConsumer`.
The project must be row-preserving.
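The shape check can be pictured with a small model. This is a minimal Python sketch, not the Doris implementation: the node classes (`Consumer`, `Project`, `Filter`, `Limit`), the `row_preserving` flag, and `required_rows` are hypothetical names that only mirror the two accepted shapes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Consumer:
    cte_id: int


@dataclass(frozen=True)
class Project:
    child: "Node"
    row_preserving: bool = True  # assumption: a flag standing in for the real check


@dataclass(frozen=True)
class Filter:
    child: "Node"


@dataclass(frozen=True)
class Limit:
    limit: int
    offset: int
    child: "Node"


Node = Union[Consumer, Project, Filter, Limit]


def required_rows(plan: Node) -> Optional[int]:
    """Return the rows a consumer needs, or None if the shape is not matched."""
    if not isinstance(plan, Limit):
        return None
    child = plan.child
    # Look through at most one row-preserving project.
    if isinstance(child, Project) and child.row_preserving:
        child = child.child
    if isinstance(child, Consumer):
        return plan.limit + plan.offset
    return None
```

In this model, anything other than the two listed shapes yields `None`, which stands for "this consumer needs the full CTE output".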
Scenarios
1. Direct Limit
The consumer only needs 10 rows, so the CTE producer can produce at most 10 rows.
2. Project + Limit
A normal project only prunes columns and does not change row count, so the producer can still be limited to 10 rows.
3. Multiple Consumers + Limit
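For multiple CTE consumers, the producer limit is the maximum number of rows required by any single consumer. For example, if the largest requirement among the consumers is 20 rows, the pushed producer limit is 20.
If any consumer needs full CTE data, pushdown is skipped.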
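The combination rule for multiple consumers can be sketched as follows. This is an illustrative Python model under stated assumptions, not the rule's actual code: `producer_limit` is a hypothetical helper, and `None` is used here to mean "this consumer needs the full CTE output".

```python
from typing import Iterable, Optional


def producer_limit(required: Iterable[Optional[int]]) -> Optional[int]:
    """Combine per-consumer row requirements into one producer limit.

    Each entry is the number of rows one consumer needs, or None when that
    consumer is unbounded. Returns None when pushdown must be skipped.
    """
    needs = list(required)
    if not needs or any(n is None for n in needs):
        return None  # at least one consumer needs every row
    return max(needs)
```

Taking the maximum rather than the minimum is what keeps every consumer correct: each consumer still applies its own limit on top of the truncated producer output.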
4. Limit + Offset
The consumer needs to skip 100 rows and then return 10 rows, so the producer must provide at least 110 rows.
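The producer side only truncates rows to limit + offset (110 in this case) and does not apply the offset; the consumer-side limit, which is kept, still skips the first 100 rows.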
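A small Python sketch of why truncating to limit + offset is enough. It assumes, for illustration only, that the producer emits rows in a fixed order; `producer_truncate` and `consumer_result` are hypothetical helpers, not Doris functions.

```python
def producer_truncate(rows, limit, offset):
    """Producer side: keep only the first limit + offset rows, no skipping."""
    return rows[: limit + offset]


def consumer_result(rows, limit, offset):
    """Consumer side: skip `offset` rows, then return up to `limit` rows."""
    return rows[offset : offset + limit]


rows = list(range(1000))
without_pushdown = consumer_result(rows, 10, 100)
with_pushdown = consumer_result(producer_truncate(rows, 10, 100), 10, 100)
```

Under that assumption, both paths yield the same 10 rows, while the producer emits 110 rows instead of 1000.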
5. SplitLimit
Doris may split a limit into local and global limits. The local limit closest to the CTE consumer already represents
limit + offset. The collector uses the local limit value directly and does not add the offset again.
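The double-counting risk can be shown with a toy model. This Python sketch assumes the split produces a global limit that keeps the original limit and offset, above a local limit of limit + offset with no offset; `LimitNode`, `split`, `rows_needed`, and the phase labels are names chosen for the sketch.

```python
from typing import NamedTuple, Tuple


class LimitNode(NamedTuple):
    phase: str  # "ORIGIN", "GLOBAL" or "LOCAL" (labels for this sketch)
    limit: int
    offset: int


def split(limit: int, offset: int) -> Tuple[LimitNode, LimitNode]:
    """Return (global, local); the local limit already includes the offset."""
    return (
        LimitNode("GLOBAL", limit, offset),
        LimitNode("LOCAL", limit + offset, 0),
    )


def rows_needed(node: LimitNode) -> int:
    """Rows the consumer must receive from the producer."""
    if node.phase == "LOCAL":
        return node.limit  # offset already folded in, do not add it again
    return node.limit + node.offset


global_limit, local_limit = split(10, 100)
```

Here `rows_needed(local_limit)` is 110, the same value an unsplit limit of 10 with offset 100 would give.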
6. Filter + Limit Is Not Matched
Filter can reduce rows before limit, so the producer may need more than 10 input rows. This rule does not push limit through filter.
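A tiny counterexample makes the filter boundary concrete. This is an illustrative Python sketch with made-up data: truncating the input to 10 rows before the filter loses rows the consumer would have returned.

```python
rows = list(range(1, 101))


def is_even(value):
    return value % 2 == 0


# Correct plan: filter the full producer output, then take 10 rows.
correct = [r for r in rows if is_even(r)][:10]

# Unsafe plan: truncate the producer to 10 rows first, then filter.
unsafe = [r for r in rows[:10] if is_even(r)][:10]
```

The correct plan returns 10 rows, while the unsafe plan returns only 5, which is why the limit stays above the filter.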
7. TopN Is Not Matched
`ORDER BY ... LIMIT` is TopN. It needs the first N rows after ordering, so it cannot be treated as a normal limit.
8. Join / Aggregate / Window / Sort Are Not Matched
These operators can change row cardinality or ordering semantics. Unless other rules have already rewritten the shape into
`Limit -> CTEConsumer` or `Limit -> Project -> CTEConsumer`, this collector skips them.
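To illustrate why ordering-sensitive and cardinality-changing shapes are excluded, here is a short Python sketch with made-up data. It compares the correct result with what would happen if the producer were truncated to N rows before a sort or an aggregate.

```python
rows = [5, 3, 9, 1, 7, 2, 8]
n = 3

# TopN: order all rows, then keep the first n.
correct_topn = sorted(rows)[:n]
# Unsafe: truncate the producer to n rows, then order.
unsafe_topn = sorted(rows[:n])[:n]

# Aggregate: a count over the CTE needs every row, even under LIMIT 1.
correct_count = len(rows)
unsafe_count = len(rows[:1])
```

In both cases the truncated producer changes the answer, so the collector treats these consumers as needing the full CTE output.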