[CALCITE-4364] `a IN (1, 2) AND a = 1` should be simplified to `a = 1` by danny0405 · Pull Request #2238 · apache/calcite

danny0405 · 2020-10-30T11:19:41Z

No description provided.

julianhyde · 2020-11-01T04:50:17Z

The fix doesn't seem right to me. We should merge two sargs if they apply to the same argument, regardless of whether they are all points.

Also, please apply the fix after CALCITE-4352.

danny0405 · 2020-11-02T07:24:49Z

We should merge two sargs if they apply to the same argument

Totally agree, and actually this patch makes more sargs merged.

julianhyde · 2020-11-02T20:13:35Z

core/src/main/java/org/apache/calcite/rex/RexSimplify.java

      }
    }

+    /** Checks whether it is worth to fix and convert to {@code SEARCH} calls. */


s/Checks/Returns/; "Checks" often means that the method will throw if the check fails, whereas "Returns" is unambiguous.

Put the criteria in the javadoc, and describe the purpose of newTermsCnt.

I wouldn't abbreviate count to cnt. Save to characters, but convert a word into a non-word.

julianhyde · 2020-11-02T20:17:07Z

Regarding the commit message:

a in (1, 2) and a = 1 should be simplified to a=1

Use upper-case for SQL, and spaces around =.

eolivelli · 2020-11-02T20:23:07Z

core/src/main/java/org/apache/calcite/rex/RexSimplify.java

+      // Fix and converts to SEARCH if:
+      // 1. A Sarg has complexity greater than 1;
+      // 2. The terms are reduced as simpler Sarg points.
+      return map.values().stream().anyMatch(b -> b.complexity() > 1)


Streams are cool but they are very expensive.
Probably we are in some hot path here.
Is it worth to use a simple loop?

+1 for this suggestion.
By using a simple loop, we can combine the two stream expressions?
In addition, checking condition newTermsCnt == 1 is cheap, can we move it forward?

Sorry i found that it is hard to keep the decision branches in just one for loop and make the logic clear and clean. So i would not follow that.

vlsi · 2020-11-02T20:51:23Z

core/src/main/java/org/apache/calcite/rex/RexSimplify.java

    final SargCollector sargCollector = new SargCollector(rexBuilder, true);
    operands.forEach(t -> sargCollector.accept(t, terms));
-    if (sargCollector.map.values().stream().anyMatch(b -> b.complexity() > 1)) {
+    if (sargCollector.needToFix(terms.size())) {


What if we always convert to sarg format? Are there drawbacks?

The drawback is there are many unnecessary plan change.

What do you mean unnecessary plan change?
Do you mean plan changes when compared with 1.25 or 1.26?

I guess 1.26 is not really viable (since there are significant issues with Sarg), so I would skip 1.26 from consideration with regard to plan changes.

For example, a simple $0=1 would be converted to Sarg which is meaningless.

liyafan82 · 2020-11-03T03:18:18Z

core/src/test/java/org/apache/calcite/rex/RexProgramTest.java

    checkSimplify2(e, "SEARCH(?0.int0, Sarg[10])", "=(?0.int0, 10)");
  }

+  @Test void testSimplifyInAnd() {


Maybe we need another test case:

deptno in (20, 10) and deptno = 30
==> false?

danny0405 · 2020-11-04T08:45:37Z

@vlsi Do you have other comments ? I'm planning to merge in the following 24 hours.

vlsi · 2020-11-04T08:49:49Z

I believe the logic with needToFix makes no sense. Sarg replacement should be on per-sarg basis rather than "all or nothing"

danny0405 · 2020-11-04T09:20:01Z

I believe the logic with needToFix makes no sense. Sarg replacement should be on per-sarg basis rather than "all or nothing"

I'm not sure, because the Sarg structure as an intermediate can make more simplification possibilities. So when there are fix condition match, it does not harm to try to make each term as a SEARCH Sarg if possible.

But anyway, the logic is already there and this PR is an improvement. So i would merge it soon.

godfreyhe · 2020-11-04T12:38:19Z

core/src/test/java/org/apache/calcite/rex/RexProgramTest.java

+        eq(vInt(), literal(30))),
+        "false");
+  }
+


I find deptno > 0 or deptno in (20, 10) can' t be simplified as deptno > 0

Thanks, i have updated the test case.

vlsi · 2020-11-04T19:45:42Z

@danny0405 , before you commit more fixes to RexSimplify could I kindly ask you to review the regression which was introduced a year ago https://issues.apache.org/jira/browse/CALCITE-3457?focusedCommentId=17224481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17224481 ?

You've committed CALCITE-3457 and it turns out to cause AssertionErrors on certain inputs.
Would you please fix or revert 3457 before you proceed with committing more changes to RexSimplify?

java.lang.AssertionError
	at org.apache.calcite.rex.RexSimplify.validateStrongPolicy(RexSimplify.java:903)
	at org.apache.calcite.rex.RexSimplify.simplifyIs2(RexSimplify.java:746)
	at org.apache.calcite.rex.RexSimplify.simplifyIs1(RexSimplify.java:713)
	at org.apache.calcite.rex.RexSimplify.simplifyIs(RexSimplify.java:684)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:295)
	at org.apache.calcite.rex.RexSimplify.simplifyOrTerms(RexSimplify.java:567)
	at org.apache.calcite.rex.RexSimplify.simplifyOr(RexSimplify.java:1771)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:276)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:368)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:345)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:302)
	at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:243)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify3_(RexProgramTestBase.java:134)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify(RexProgramTestBase.java:93)
	at org.apache.calcite.rex.RexProgramTest.reproducerFor3457(RexProgramTest.java:514)

danny0405 · 2020-11-05T07:10:07Z

@danny0405 , before you commit more fixes to RexSimplify could I kindly ask you to review the regression which was introduced a year ago https://issues.apache.org/jira/browse/CALCITE-3457?focusedCommentId=17224481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17224481 ?

You've committed CALCITE-3457 and it turns out to cause AssertionErrors on certain inputs.
Would you please fix or revert 3457 before you proceed with committing more changes to RexSimplify?

java.lang.AssertionError
	at org.apache.calcite.rex.RexSimplify.validateStrongPolicy(RexSimplify.java:903)
	at org.apache.calcite.rex.RexSimplify.simplifyIs2(RexSimplify.java:746)
	at org.apache.calcite.rex.RexSimplify.simplifyIs1(RexSimplify.java:713)
	at org.apache.calcite.rex.RexSimplify.simplifyIs(RexSimplify.java:684)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:295)
	at org.apache.calcite.rex.RexSimplify.simplifyOrTerms(RexSimplify.java:567)
	at org.apache.calcite.rex.RexSimplify.simplifyOr(RexSimplify.java:1771)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:276)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:368)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:345)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:302)
	at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:243)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify3_(RexProgramTestBase.java:134)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify(RexProgramTestBase.java:93)
	at org.apache.calcite.rex.RexProgramTest.reproducerFor3457(RexProgramTest.java:514)

This bug was not a blocker of this one, i don't think we should fix it first before this patch. And this is open community, everyone can contribute if they have time to, although i'm the code reviewer, it does not mean i "have to" fix the bug and no one can ensure that he always commits no-bug codes.

vlsi · 2020-11-05T07:19:54Z

3457 is very related because it was the reason to disable fuzzer testing which is a significant test case for RexSimplify logic.

For instance, you've just pushed canNotMerge which has comments like "if the semantics change".
How do you know you have test cases to cover the change?

danny0405 · 2020-11-05T07:34:35Z

3457 is very related because it was the reason to disable fuzzer testing which is a significant test case for RexSimplify logic.

For instance, you've just pushed canNotMerge which has comments like "if the semantics change".
How do you know you have test cases to cover the change?

RexProgramTest already does that.

Can we not make the fuzzer testing a random one ? It is hard to debug and figure out where is wrong. Although some stacktrace throws from 3457 code, that does not mean 3457's code is wrong. Each pr that changes the nullability can cause it fails.

danny0405 · 2020-11-05T07:41:45Z

core/src/test/java/org/apache/calcite/rex/RexProgramTest.java

        or(ne(bRef, literal(1)),
            eq(bRef, literal(1)));
-    checkSimplifyFilter(neOrEq, "OR(<>(?0.b, 1), =(?0.b, 1))");
+    checkSimplifyFilter(neOrEq, "SEARCH(?0.b, Sarg[NOT NULL])");


An improvement change.

danny0405 · 2020-11-05T07:41:57Z

core/src/test/java/org/apache/calcite/rex/RexProgramTest.java

                    le(vInt(), literal(1))))),
-        "AND(=(?0.int2, 2), OR(=(?0.int3, 3), AND(>=(?0.int0, 1), <=(?0.int0, 1))))",
+        "AND(=(?0.int2, 2), OR(=(?0.int3, 3), SEARCH(?0.int0, Sarg[1])))",
        "AND(=(?0.int2, 2), OR(=(?0.int3, 3), =(?0.int0, 1)))");


An improvement change.

core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml

vlsi · 2020-11-05T07:49:55Z

RexProgramTest already does that.

It has a very limited set of checks. Do you have an exhaustive check? I do not see that, so you can't claim RexProgramTest covers your changes.

Although some stacktrace throws from 3457 code, that does not mean 3457's code is wrong

The test case was added right after 3457 was merged. The test worked OK before 3754, and it started to fail after 3457. I would say it is very likely the test case failure is caused by 3457.

Can we not make the fuzzer testing a random one ? It is hard to debug and figure out where is wrong.

First of all, reproducerFor3457 is a static test case, with no randomization at all.
Then, the removal of random from fuzzer would ruin the purpose of the test: it randomizes input data, so it explores the search space better. As it fails, it prints exact source code that re-creates the problematic expression.
You copy-paste it to a separate test method and that that is it. That was the way I created reproducerFor3457.
An alternative option is to put the seed into singleFuzzyTest. Then you have a single test which you could.

Have you tried that? Could you clarify what is the exact issue you have with reproducing/debugging the issues?

danny0405 · 2020-11-05T08:13:33Z

RexProgramTest already does that.

It has a very limited set of checks. Do you have an exhaustive check? I do not see that, so you can't claim RexProgramTest covers your changes.

Although some stacktrace throws from 3457 code, that does not mean 3457's code is wrong

The test case was added right after 3457 was merged. The test worked OK before 3754, and it started to fail after 3457. I would say it is very likely the test case failure is caused by 3457.

Can we not make the fuzzer testing a random one ? It is hard to debug and figure out where is wrong.

First of all, reproducerFor3457 is a static test case, with no randomization at all.
Then, the removal of random from fuzzer would ruin the purpose of the test: it randomizes input data, so it explores the search space better. As it fails, it prints exact source code that re-creates the problematic expression.
You copy-paste it to a separate test method and that that is it. That was the way I created reproducerFor3457.
An alternative option is to put the seed into singleFuzzyTest. Then you have a single test which you could.

Have you tried that? Could you clarify what is the exact issue you have with reproducing/debugging the issues?

I don't want to argue something, please review if you have time https://github.com/apache/calcite/pull/2246/files.

vlsi · 2020-11-05T09:32:12Z

I don't want to argue something, please review if you have time https://github.com/apache/calcite/pull/2246/files.

That looks good.

Would you please fix npe in sarg as well?

julianhyde · 2020-11-05T21:37:09Z

@danny0405 The comment threads on this PR are too long and too forked. I can't see what is the current consensus. I have concerns about generating a large number of diffs.

Please move discussion to the JIRA case, and do not merge until we have consensus there.

julianhyde · 2020-11-09T18:50:29Z

core/src/main/java/org/apache/calcite/util/RangeSets.java

+  public static <C extends Comparable<C>> boolean isOpenInterval(RangeSet<C> rangeSet) {
+    final Set<Range<C>> ranges = rangeSet.asRanges();
+    final Range<C> range = ranges.iterator().next();
+    return ranges.size() == 1


Will throw if ranges.isEmpty() because you iterate before you check size. Add a test where this is called on an empty range set.

Agree, have added the tests.

julianhyde · 2020-11-09T18:53:13Z

core/src/main/java/org/apache/calcite/util/Sarg.java

+   * Returns whether this Sarg can be expanded to more simple form, e.g.
+   * the IN call or single comparison.
+   */
+  public boolean isSimple() {


There is no objective definition of 'simple'. I would claim that 'x <> 5' is simple, and so is 'x IS NOT NULL'. You might disagree.

So, this method doesn't belong on Sarg. It belongs in whichever piece of code needs a particular definition of 'simple'.

julianhyde · 2020-11-09T18:57:02Z

core/src/test/java/org/apache/calcite/rex/RexProgramTest.java

-        "AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))";
+    final String simplified = "AND(SEARCH($0, Sarg[(0..10)]), IS NOT NULL($1))";
+    final String expanded = "AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))";
    checkSimplify(expr, simplified)


Do you think we should flatten those ANDs in AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))? I know it might be difficult to achieve efficiently.

I agree, have promoted it in the new commit.

danny0405 force-pushed the CALCITE-4364 branch 2 times, most recently from ba956b3 to 306544a Compare November 2, 2020 07:13

julianhyde reviewed Nov 2, 2020

View reviewed changes

eolivelli reviewed Nov 2, 2020

View reviewed changes

vlsi reviewed Nov 2, 2020

View reviewed changes

danny0405 force-pushed the CALCITE-4364 branch 2 times, most recently from eec5954 to d5f20ff Compare November 3, 2020 03:07

liyafan82 reviewed Nov 3, 2020

View reviewed changes

danny0405 changed the title ~~[CALCITE-4364] a in (1, 2) and a = 1 should be simplified to a=1~~ [CALCITE-4364] a IN (1, 2) AND a = 1 should be simplified to a = 1 Nov 3, 2020

danny0405 force-pushed the CALCITE-4364 branch 3 times, most recently from c1ff54f to 4c5ca31 Compare November 3, 2020 06:59

godfreyhe reviewed Nov 4, 2020

View reviewed changes

danny0405 force-pushed the CALCITE-4364 branch from 4c5ca31 to 061ba0a Compare November 5, 2020 07:02

danny0405 commented Nov 5, 2020

View reviewed changes

core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml Show resolved Hide resolved

danny0405 force-pushed the CALCITE-4364 branch from 061ba0a to 2d3554f Compare November 5, 2020 08:34

danny0405 force-pushed the CALCITE-4364 branch 3 times, most recently from d36d91a to 74ec77a Compare November 6, 2020 06:38

julianhyde reviewed Nov 9, 2020

View reviewed changes

[CALCITE-4364] a IN (1, 2) AND a = 1 should be simplified to a = 1

117c44e

danny0405 force-pushed the CALCITE-4364 branch from 6ac6756 to 117c44e Compare November 12, 2020 04:08

danny0405 merged commit 5e9943a into apache:master Nov 12, 2020

Conversation

danny0405 commented Oct 30, 2020

Uh oh!

julianhyde commented Nov 1, 2020

Uh oh!

danny0405 commented Nov 2, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

julianhyde commented Nov 2, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liyafan82 Nov 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danny0405 Nov 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danny0405 commented Nov 4, 2020

Uh oh!

vlsi commented Nov 4, 2020

Uh oh!

danny0405 commented Nov 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

godfreyhe Nov 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vlsi commented Nov 4, 2020

Uh oh!

danny0405 commented Nov 5, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vlsi commented Nov 5, 2020

Uh oh!

danny0405 commented Nov 5, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vlsi commented Nov 5, 2020

Uh oh!

danny0405 commented Nov 5, 2020

Uh oh!

vlsi commented Nov 5, 2020

Uh oh!

julianhyde commented Nov 5, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

liyafan82 Nov 3, 2020 •

edited

Loading

danny0405 Nov 5, 2020 •

edited

Loading

danny0405 commented Nov 4, 2020 •

edited

Loading

godfreyhe Nov 4, 2020 •

edited

Loading

danny0405 commented Nov 5, 2020 •

edited

Loading