[FLINK-12600] [table-planner-blink] Introduce planner rules to do deterministic rewriting on RelNode #8520
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot, and I help the community track review progress.
Please see the Pull Request Review Guide for a full explanation of the review process. The bot tracks the review progress through labels, which are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
import org.apache.calcite.rel.RelNode
import org.apache.calcite.rel.core.{Join, JoinRelType, Values}

object FlinkPruneEmptyRules {
Do you intend to introduce more than one rule in this object?
PruneEmptyRules in Calcite contains more than one rule, and I intend to keep the Calcite style for this rule. Later we may need to copy other rules from PruneEmptyRules to this file.
/**
  * Planner rule that rewrites filter condition like:
  * `(select count(*) from T) > 0` to `exists(select * from T)`,
Why is a semi join better than an aggregate?
BTW, I think that for a query like this, a more efficient rewrite is to convert it to exists(select * from T limit 1).
The estimation for SEMI/ANTI join is very inaccurate, so we intend to do deterministic rewriting on SEMI/ANTI join. We can move this rule to CBO after we improve the estimation of SEMI/ANTI join.
Yes, we can do a similar rewriting for exists(select * from T limit 1) later.
I'm confused: the original query is an aggregate, but you want to convert it to a semi join, knowing that the estimation of semi join is not reliable.
Taking a step back, why is a semi join better than an aggregate in this case?
The original query is a scalar subquery, which Calcite rules convert to an aggregate + join.
For example, for a full SQL statement like SELECT * FROM x WHERE (SELECT COUNT(*) FROM y WHERE d > 10) > 0, the logical plan produced by the Calcite rules is:
LogicalProject(a=[$0], b=[$1], c=[$2])
+- LogicalProject(a=[$0], b=[$1], c=[$2])
   +- LogicalFilter(condition=[>($3, 0)])
      +- LogicalJoin(condition=[true], joinType=[left])
         :- LogicalTableScan(table=[[x]])
         +- LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
            +- LogicalProject($f0=[0])
               +- LogicalFilter(condition=[>($0, 10)])
                  +- LogicalTableScan(table=[[y]])
I will update the class comments to make this clearer.
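To make the rewrite under discussion concrete, here is a minimal sketch on a toy expression tree. The case classes below are hypothetical stand-ins and are not Calcite's RexNode/RelNode API; the actual FlinkRewriteSubQueryRule in this PR may be structured quite differently. The sketch only illustrates that `(select count(*) from T) > 0` and `exists(select * from T)` agree on every input, which is what makes the rewrite safe to apply deterministically.

```scala
// Toy model only: hypothetical stand-ins for a planner's expression tree.
object RewriteSubQuerySketch {
  sealed trait Expr
  final case class Literal(value: Long) extends Expr
  final case class CountStarSubQuery(table: String) extends Expr // (select count(*) from T)
  final case class Exists(table: String) extends Expr            // exists(select * from T)
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Rewrites `(select count(*) from T) > 0` to `exists(select * from T)`.
  // Conjunctions are traversed; every other expression is left unchanged.
  def rewrite(expr: Expr): Expr = expr match {
    case GreaterThan(CountStarSubQuery(t), Literal(0L)) => Exists(t)
    case And(l, r) => And(rewrite(l), rewrite(r))
    case other => other
  }

  // Evaluates a condition against in-memory tables so both forms can be compared.
  def evalCondition(expr: Expr, tables: Map[String, Seq[Int]]): Boolean = expr match {
    case GreaterThan(CountStarSubQuery(t), Literal(n)) => tables(t).size > n
    case Exists(t) => tables(t).nonEmpty
    case And(l, r) => evalCondition(l, tables) && evalCondition(r, tables)
    case other => throw new IllegalArgumentException(s"not a boolean condition: $other")
  }
}
```

In this toy model the `Exists` form only needs to know whether a table is non-empty, while the count form has to look at every row, which is the intuition behind the `limit 1` suggestion above.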
…erministic rewriting on RelNode
rules include:
1. FlinkLimit0RemoveRule, that rewrites `limit 0` to empty Values
2. FlinkRewriteSubQueryRule, that rewrites a Filter with condition: `(select count(*) from T) > 0` to a Filter with condition: `exists(select * from T)`, which could be converted to SEMI Join by FlinkSubQueryRemoveRule
3. ReplaceIntersectWithSemiJoinRule, that rewrites distinct Intersect to a distinct Aggregate on a SEMI Join
4. ReplaceMinusWithAntiJoinRule, that rewrites distinct Minus to a distinct Aggregate on an ANTI Join
ps, introduce FlinkPruneEmptyRules#JOIN_RIGHT_INSTANCE to handle ANTI join with empty right
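The set-operation rewrites listed in this commit can be checked on plain Scala collections. This is a rough sketch under stated assumptions: rows are modelled as tuples, the join key is the whole row, and SQL NULL handling is ignored. All function names are hypothetical and do not come from the Flink code base.

```scala
// Toy model only: in-memory rows instead of RelNodes, no NULL semantics.
object SetOpRewriteSketch {
  type Row = (Int, String)

  // SEMI join on all columns: keep left rows that have a match on the right.
  def semiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val rightSet = right.toSet
    left.filter(rightSet.contains)
  }

  // ANTI join on all columns: keep left rows that have no match on the right.
  // With an empty right side this returns the left side unchanged.
  def antiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val rightSet = right.toSet
    left.filterNot(rightSet.contains)
  }

  // distinct Intersect expressed as a distinct Aggregate on a SEMI join.
  def intersectViaSemiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    semiJoin(left, right).distinct

  // distinct Minus expressed as a distinct Aggregate on an ANTI join.
  def minusViaAntiJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    antiJoin(left, right).distinct

  // `limit 0` never produces rows, so it can be replaced by empty Values.
  def limit(rows: Seq[Row], n: Int): Seq[Row] =
    if (n == 0) Seq.empty else rows.take(n)
}
```

The empty-right case of `antiJoin` corresponds to the situation that FlinkPruneEmptyRules#JOIN_RIGHT_INSTANCE is introduced to handle.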
update comments for FlinkRewriteSubQueryRule
+1
What is the purpose of the change
Introduce planner rules to do deterministic rewriting on RelNode
Brief change log
Rewrite `limit 0` to empty Values.
Rewrite a Filter with condition `(select count(*) from T) > 0` to a Filter with condition `exists(select * from T)`, which could be converted to SEMI Join by FlinkSubQueryRemoveRule.
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
Documentation