[CALCITE-5837] RexUtil#pullFactors output's order should be deterministic even when the RexNode kind is OR #3316
The changes look good to me; I left only one minor comment. I've also approved the CI run for the first-time contributor.
Force-pushed from 88402f0 to 08b208f.
Hi @libenchao, thank you very much for reviewing my PR. I have modified the code based on your suggestions. I also changed the PR commits so that each commit message starts with an upper-case letter.
Force-pushed from fc0dbe9 to 9e46c9e.
Hi @libenchao, @asolimando, could you help me approve the workflow job?
I would be very grateful if anyone could help me approve the workflow job. :)
"AND(=(?0.a, ?0.b), ?0.c, "
+ "SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), "
+ "OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11),"
The PR looks good.
@LakeShen I have a small question.
If it's out of order, what might it look like?
Hi @JiajunBernoulli, thank you very much for reviewing my PR.
Let me explain this using TPC-H Q19.
Sometimes the SQL plan is:
LogicalAggregate(group=[{}], revenue=[SUM($0)])
LogicalProject($f0=[*($5, -(1, $6))])
LogicalFilter(condition=[AND(=($16, $1), SEARCH($14, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), =($13, 'DELIVER IN PERSON'), OR(AND(=($19, 'Brand#12'), SEARCH($22, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7)), >=($4, 1), <=($4, +(1, 10)), SEARCH($21, Sarg[[1..5]])), AND(=($19, 'Brand#23'), SEARCH($22, Sarg['MED BAG':CHAR(8), 'MED BOX':CHAR(8), 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8)), >=($4, 10), <=($4, +(10, 10)), SEARCH($21, Sarg[[1..10]])), AND(=($19, 'Brand#34'), SEARCH($22, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)), >=($4, 20), <=($4, +(20, 10)), SEARCH($21, Sarg[[1..15]]))))])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[tpch, LINEITEM]])
LogicalTableScan(table=[[tpch, PART]])
At other times, the SQL plan is:
LogicalAggregate(group=[{}], revenue=[SUM($0)])
LogicalProject($f0=[*($5, -(1, $6))])
LogicalFilter(condition=[AND(=($16, $1), =($13, 'DELIVER IN PERSON'), SEARCH($14, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), OR(AND(=($19, 'Brand#12'), SEARCH($22, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7)), >=($4, 1), <=($4, +(1, 10)), SEARCH($21, Sarg[[1..5]])), AND(=($19, 'Brand#23'), SEARCH($22, Sarg['MED BAG':CHAR(8), 'MED BOX':CHAR(8), 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8)), >=($4, 10), <=($4, +(10, 10)), SEARCH($21, Sarg[[1..10]])), AND(=($19, 'Brand#34'), SEARCH($22, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)), >=($4, 20), <=($4, +(20, 10)), SEARCH($21, Sarg[[1..15]]))))])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[tpch, LINEITEM]])
LogicalTableScan(table=[[tpch, PART]])
Although the conditions in the Calcite unit test are not textually identical to the TPC-H Q19 conditions above, they have the same overall shape.
The ordering has no effect on the SQL execution results, but the variability of the plan makes it difficult for me to monitor my plans.
Hi @JiajunBernoulli, I have reproduced it locally using my test case.
Sometimes the RexNode is:
AND(=(?0.a, ?0.b), SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), ?0.c, OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11), SEARCH(?0.k, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7))), AND(=(?0.j, 'Brand#13'), >=(?0.h, 10), <=(?0.h, 20), SEARCH(?0.k, Sarg['MED BOX':CHAR(8), 'MED CASE', 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8))), AND(=(?0.j, 'Brand#14'), >=(?0.h, 20), <=(?0.h, 30), SEARCH(?0.k, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)))))
At other times, the RexNode is:
AND(?0.c, SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), =(?0.a, ?0.b), OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11), SEARCH(?0.k, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7))), AND(=(?0.j, 'Brand#13'), >=(?0.h, 10), <=(?0.h, 20), SEARCH(?0.k, Sarg['MED BOX':CHAR(8), 'MED CASE', 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8))), AND(=(?0.j, 'Brand#14'), >=(?0.h, 20), <=(?0.h, 30), SEARCH(?0.k, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)))))
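For readers following the thread, here is a minimal sketch of how this kind of ordering variability can arise and how it can be avoided. It is not Calcite's `RexUtil#pullFactors` implementation: the class `PullFactorsSketch` and the method `commonFactors` are hypothetical, and terms are modeled as plain strings rather than `RexNode` objects. The assumption being illustrated is that collecting the common factors in a hash-based collection yields an order that depends on hash codes, while an insertion-ordered collection such as `LinkedHashSet` yields the same order on every run.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the code from this PR.
public class PullFactorsSketch {

  /**
   * Returns the terms that appear in every conjunction of an OR,
   * in the order in which they appear in the first conjunction.
   */
  public static List<String> commonFactors(List<List<String>> disjuncts) {
    if (disjuncts.isEmpty()) {
      return new ArrayList<>();
    }
    // LinkedHashSet iterates in insertion order. A plain HashSet would
    // iterate in hash order, which can differ between runs when the
    // elements' hashCode is not stable across JVM instances.
    Set<String> common = new LinkedHashSet<>(disjuncts.get(0));
    for (List<String> conjunction : disjuncts.subList(1, disjuncts.size())) {
      common.retainAll(conjunction); // keeps the surviving terms in order
    }
    return new ArrayList<>(common);
  }

  public static void main(String[] args) {
    List<List<String>> or = List.of(
        List.of("a = b", "c", "i IN ('AIR', 'AIR REG')", "j = 'Brand#12'"),
        List.of("a = b", "c", "i IN ('AIR', 'AIR REG')", "j = 'Brand#13'"));
    // prints [a = b, c, i IN ('AIR', 'AIR REG')]
    System.out.println(commonFactors(or));
  }
}
```

With an insertion-ordered collection, the pulled-up factors always come out in the order of the first operand, which matches the kind of stable expected string that the test in this PR asserts on.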
@LakeShen, @JiajunBernoulli, I feel that the key point here is determinism for OR, or at least this is what I would search for in Jira, but the title does not really reflect it. What if we rephrased the ticket's title along the lines of "RexUtil#pullFactors output's order should be deterministic even when the RexNode kind is OR"?
Force-pushed from 9e46c9e to 1f6bf26.
Hi @asolimando, thank you very much for your advice. I have rephrased the ticket's title.
Hi @asolimando, I have modified the PR according to your suggestion. If you have time, please help me approve the CI/CD workflow. If you have any further suggestions, please let me know; I would really appreciate it.
Kudos, SonarCloud Quality Gate passed!
Hi @asolimando, @libenchao, @JiajunBernoulli, how does this PR look now? :)
Thanks for taking care of the pending points @LakeShen, LGTM, will merge in around 24 hours if nobody else has more comments by then.