Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-5837] RexUtil#pullFactors output's order should be deterministic even when the RexNode kind is OR #3316

Merged
merged 1 commit into from Aug 2, 2023

Conversation

LakeShen
Copy link
Contributor

No description provided.

Copy link
Member

@libenchao libenchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes looks good to me, I only left one minor comment. And I've approved the CI for the first-time contributor.

core/src/main/java/org/apache/calcite/rex/RexUtil.java Outdated Show resolved Hide resolved
@LakeShen LakeShen changed the title [CALCITE-5837] make RexUtil#pullFactors method RexNode condition ordered when input RexNode's kind is OR [CALCITE-5837] Make RexUtil#pullFactors method RexNode condition ordered when input RexNode's kind is OR Jul 14, 2023
@LakeShen
Copy link
Contributor Author

The changes looks good to me, I only left one minor comment. And I've approved the CI for the first-time contributor.

Hi @libenchao,thank you very much for reviewing my PR. I have modified the code based on your suggestions

I aslo change the PR commits,make the commits start with upper-case letter

@LakeShen LakeShen force-pushed the fix_CALCITE-5837 branch 4 times, most recently from fc0dbe9 to 9e46c9e Compare July 18, 2023 19:40
@LakeShen
Copy link
Contributor Author

Hi @libenchao , @asolimando ,could you help me approve the workflow job.

@LakeShen
Copy link
Contributor Author

I'm very grateful for that If anyone can help me approve the workflow job. :)


"AND(=(?0.a, ?0.b), ?0.c, "
+ "SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), "
+ "OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11),"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good.

@LakeShen I have a small question.
If it's out of order, what might it look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good.

@LakeShen I have a small question. If it's out of order, what might it look like?

Hi @JiajunBernoulli ,thank you very much for reviewing my pr.
Let me explain this using tpc-h q19.
Sometimes the sql plan is :

LogicalAggregate(group=[{}], revenue=[SUM($0)])
  LogicalProject($f0=[*($5, -(1, $6))])
    LogicalFilter(condition=[AND(=($16, $1), SEARCH($14, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), =($13, 'DELIVER IN PERSON'), OR(AND(=($19, 'Brand#12'), SEARCH($22, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7)), >=($4, 1), <=($4, +(1, 10)), SEARCH($21, Sarg[[1..5]])), AND(=($19, 'Brand#23'), SEARCH($22, Sarg['MED BAG':CHAR(8), 'MED BOX':CHAR(8), 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8)), >=($4, 10), <=($4, +(10, 10)), SEARCH($21, Sarg[[1..10]])), AND(=($19, 'Brand#34'), SEARCH($22, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)), >=($4, 20), <=($4, +(20, 10)), SEARCH($21, Sarg[[1..15]]))))])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[tpch, LINEITEM]])
        LogicalTableScan(table=[[tpch, PART]])

Sometimes the sql plan is :

LogicalAggregate(group=[{}], revenue=[SUM($0)])
  LogicalProject($f0=[*($5, -(1, $6))])
    LogicalFilter(condition=[AND(=($16, $1), =($13, 'DELIVER IN PERSON'), SEARCH($14, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), OR(AND(=($19, 'Brand#12'), SEARCH($22, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7)), >=($4, 1), <=($4, +(1, 10)), SEARCH($21, Sarg[[1..5]])), AND(=($19, 'Brand#23'), SEARCH($22, Sarg['MED BAG':CHAR(8), 'MED BOX':CHAR(8), 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8)), >=($4, 10), <=($4, +(10, 10)), SEARCH($21, Sarg[[1..10]])), AND(=($19, 'Brand#34'), SEARCH($22, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)), >=($4, 20), <=($4, +(20, 10)), SEARCH($21, Sarg[[1..15]]))))])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[tpch, LINEITEM]])
        LogicalTableScan(table=[[tpch, PART]])

Although the content of the conditions in the calcite single test and the tpch q19 conditions above were not 100% identical in text, the overall conditions were in the same format.

This has no effect on the SQL execution results, but it is difficult for me to monitor my plan because of the variability of the plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good.

@LakeShen I have a small question. If it's out of order, what might it look like?

Hi @JiajunBernoulli ,I have reproduced it locally using my test case.
Sometimes the RexNode is :

AND(=(?0.a, ?0.b), SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), ?0.c, OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11), SEARCH(?0.k, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7))), AND(=(?0.j, 'Brand#13'), >=(?0.h, 10), <=(?0.h, 20), SEARCH(?0.k, Sarg['MED BOX':CHAR(8), 'MED CASE', 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8))), AND(=(?0.j, 'Brand#14'), >=(?0.h, 20), <=(?0.h, 30), SEARCH(?0.k, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)))))

Sometimes the RexNode is :

AND(?0.c, SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), =(?0.a, ?0.b), OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11), SEARCH(?0.k, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7))), AND(=(?0.j, 'Brand#13'), >=(?0.h, 10), <=(?0.h, 20), SEARCH(?0.k, Sarg['MED BOX':CHAR(8), 'MED CASE', 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8))), AND(=(?0.j, 'Brand#14'), >=(?0.h, 20), <=(?0.h, 30), SEARCH(?0.k, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)))))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good.

@LakeShen I have a small question. If it's out of order, what might it look like?

Hi @JiajunBernoulli ,I have reproduced it locally using my test case.
Sometimes the RexNode is :

AND(=(?0.a, ?0.b), SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), ?0.c, OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11), SEARCH(?0.k, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7))), AND(=(?0.j, 'Brand#13'), >=(?0.h, 10), <=(?0.h, 20), SEARCH(?0.k, Sarg['MED BOX':CHAR(8), 'MED CASE', 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8))), AND(=(?0.j, 'Brand#14'), >=(?0.h, 20), <=(?0.h, 30), SEARCH(?0.k, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)))))

Sometimes the RexNode is :

AND(?0.c, SEARCH(?0.i, Sarg['AIR':CHAR(7), 'AIR REG']:CHAR(7)), =(?0.a, ?0.b), OR(AND(=(?0.j, 'Brand#12'), >=(?0.h, 1), <=(?0.h, 11), SEARCH(?0.k, Sarg['SM BOX':CHAR(7), 'SM CASE', 'SM PACK', 'SM PKG':CHAR(7)]:CHAR(7))), AND(=(?0.j, 'Brand#13'), >=(?0.h, 10), <=(?0.h, 20), SEARCH(?0.k, Sarg['MED BOX':CHAR(8), 'MED CASE', 'MED PACK', 'MED PKG':CHAR(8)]:CHAR(8))), AND(=(?0.j, 'Brand#14'), >=(?0.h, 20), <=(?0.h, 30), SEARCH(?0.k, Sarg['LG BOX':CHAR(7), 'LG CASE', 'LG PACK', 'LG PKG':CHAR(7)]:CHAR(7)))))

@asolimando
Copy link
Member

@LakeShen, @JiajunBernoulli, I feel that the key point here is determinism for OR, or at least this is what I would search for in Jira, but the title does not really reflect it.

What if we rephrased the ticket's title along the line of "RexUtil#pullFactors output's order should be deterministic even when the RexNode kind is OR"?

@LakeShen LakeShen changed the title [CALCITE-5837] Make RexUtil#pullFactors method RexNode condition ordered when input RexNode's kind is OR [CALCITE-5837] RexUtil#pullFactors output's order should be deterministic even when the RexNode kind is OR Jul 31, 2023
@LakeShen
Copy link
Contributor Author

@LakeShen, @JiajunBernoulli, I feel that the key point here is determinism for OR, or at least this is what I would search for in Jira, but the title does not really reflect it.

What if we rephrased the ticket's title along the line of "RexUtil#pullFactors output's order should be deterministic even when the RexNode kind is OR"?

Hi @asolimando ,thank you very much for your advice, I have rephrased the ticket's title.

@LakeShen
Copy link
Contributor Author

Hi @asolimando ,I modified PR according to your suggestion. If you have time, please help me approve the CI/CD workflow.

If you have any suggestions, please let me know, I would really appreciate it.

@sonarcloud
Copy link

sonarcloud bot commented Jul 31, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@LakeShen
Copy link
Contributor Author

LakeShen commented Aug 1, 2023

Hi @asolimando @libenchao @JiajunBernoulli ,how about this PR:)
If you have any suggestions, please let me know, I would really appreciate it.

Copy link
Member

@asolimando asolimando left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of the pending points @LakeShen, LGTM, will merge in around 24 hours if nobody else has more comments by then.

@asolimando asolimando merged commit 11546ce into apache:main Aug 2, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants