-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Eliminate multi limit-offset nodes to emptyRelation #2823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2823 +/- ##
==========================================
- Coverage 85.25% 85.21% -0.04%
==========================================
Files 274 275 +1
Lines 48762 48829 +67
==========================================
+ Hits 41570 41608 +38
- Misses 7192 7221 +29
Continue to review full report at Codecov.
|
|
cc @ming535 - would you like to review this PR? |
|
@AssHero Hi thanks for your work! I looked into this PR, and have some concerns on logical plan node like |
@ming535 Thanks!But I didn't see any comments here, please let me know if I miss something? |
|
|
||
| fn eliminate_limit( | ||
| _optimizer: &EliminateLimit, | ||
| upper_offset: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can use ancestor_skip instead of upper_offset to make it consistent with code in limit_push_down
| input, | ||
| .. | ||
| }) => { | ||
| // If upper's offset is equal or greater than current's limit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If ancestor's offset/skip is equal or greater than current's fetch,
// replaces the current plan with an LogicalPlan::EmptyRelation.
// For example, in the following query, the subquery test_b should be optimized as an empty table: SELECT * FROM (SELECT * FROM test_a LIMIT 5) AS test_b LIMIT 2 OFFSET 5;
| None => {} | ||
| } | ||
|
|
||
| let offset = match offset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may want to use this let skip = skip.unwrap_or(0)
| LogicalPlan::Limit(Limit { | ||
| skip: offset, | ||
| fetch: limit, | ||
| input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be we should keep use skip and fetch instead of offset and limit to keep it consistent with code in limit_push_down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix this, Thanks!
| Self {} | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also add some comments in top this file to clarify the intention of this optimizer rule in line 18 and line 27.
//! Optimizer rule to replace...
| _ => { | ||
| // For those plans(projection/sort/..) which do not affect the output rows of sub-plans, we still use upper_offset; | ||
| // otherwise, use 0 instead. | ||
| let offset = match plan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this case: a Join has an input node that has limit 10, and the ancestor of Join has offset 11. If the optimizer is allowed to optimize in this situation, then the result might be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For join plan, the upper_offset is set to 0, and the limit node for the join is not affected.
let offset = match plan {
LogicalPlan::Projection { .. } | LogicalPlan::Sort { .. } => upper_offset,
_ => 0, // join/agg/filter and any others...
};
|
Improvements according to the comments. |
| plan: &LogicalPlan, | ||
| _optimizer_config: &OptimizerConfig, | ||
| ) -> Result<LogicalPlan> { | ||
| match plan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
match plan {
LogicalPlan::Limit(Limit {skip, Some(fetch), input, ..}})...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limit with only skip should be handled here.
For query: SELECT * FROM (SELECT * FROM test_a LIMIT 5) AS test_b OFFSET 5;
what do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT * FROM (SELECT * FROM test_a LIMIT 5) AS test_b OFFSET 5;
In the above query, the current plan is the one with only fetch (the subquery).
I thought when the current node is Limit with only skip, it should be handled the same as Projection, Sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ancestor LIMIT node is OFFSET 5 for query "SELECT * FROM (SELECT * FROM test_a LIMIT 5) AS test_b OFFSET 5".
For this case, I think we should use 5 as ancestor's skip to optimizer inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limit with only skip should be handled here. For query: SELECT * FROM (SELECT * FROM test_a LIMIT 5) AS test_b OFFSET 5;
what do you think about this?
I see, I misunderstood your code.
| \n TableScan: test"; | ||
| assert_optimized_plan_eq(&plan, expected); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, can we add more tests here to handle all combination of skip and fetch between ancestor and child?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'll add more test cases.
|
Add more test cases. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @AssHero for the code and @ming535 for the review -- I think the code looks good to me.
I wonder if you have seen queries that produce 0 rows much in practice (as in I wonder how common will this optimization kick in / be useful)
| // select * from (select * from xxx limit 5) a limit 2 offset 5; | ||
| match fetch { | ||
| Some(fetch) => { | ||
| if *fetch == 0 || ancestor_skip >= *fetch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| // otherwise, use NotRelevant instead. | ||
| let ancestor = match plan { | ||
| LogicalPlan::Projection { .. } | LogicalPlan::Sort { .. } => ancestor, | ||
| _ => &Ancestor::NotRelevant, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if Ancester::Unknown would make the intent clearer compared to Ancester::NotRelevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if
Ancester::Unknownwould make the intent clearer compared toAncester::NotRelevant?
This is consistent with limit_push_down.
| } | ||
|
|
||
| #[test] | ||
| fn limit_fetch_with_ancestor_limit_fetch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
Thanks again all! |
* eliminate multi limit-offset * refine the code * add more test cases
* eliminate multi limit-offset * refine the code * add more test cases
* eliminate multi limit-offset * refine the code * add more test cases
Which issue does this PR close?
Closes #2822
Rationale for this change
For logical plans, if upper offset is equal or greater than current's limit, replaces with an [LogicalPlan::EmptyRelation].
For this query: select * from (select * from t1 limit 2 offset 2 ) a order by a.column1 offset 2, we get
the Logical Plan before:
Limit: skip=2, fetch=None
-Sort: #a.column1 ASC NULLS LAST
--Projection: #a.column1
---Limit: skip=2, fetch=2
----Projection: #t1.column1, alias=a
-----TableScan: t1 projection=Some([column1])
And after:
Limit: skip=2, fetch=None
-Sort: #a.column1 ASC NULLS LAST
--Projection: #a.column1
---EmptyRelation
What changes are included in this PR?
add rules to eliminate multi limit-offset nodes to emptyRelation in datafusion/optimizer/src/eliminate_limit.rs