Skip to content

[CALCITE-4364] a IN (1, 2) AND a = 1 should be simplified to a = 1#2238

Merged
danny0405 merged 1 commit intoapache:masterfrom
danny0405:CALCITE-4364
Nov 12, 2020
Merged

[CALCITE-4364] a IN (1, 2) AND a = 1 should be simplified to a = 1#2238
danny0405 merged 1 commit intoapache:masterfrom
danny0405:CALCITE-4364

Conversation

@danny0405
Copy link
Contributor

No description provided.

@julianhyde
Copy link
Contributor

The fix doesn't seem right to me. We should merge two sargs if they apply to the same argument, regardless of whether they are all points.

Also, please apply the fix after CALCITE-4352.

@danny0405 danny0405 force-pushed the CALCITE-4364 branch 2 times, most recently from ba956b3 to 306544a Compare November 2, 2020 07:13
@danny0405
Copy link
Contributor Author

We should merge two sargs if they apply to the same argument

Totally agree, and actually this patch makes more sargs merged.

}
}

/** Checks whether it is worth to fix and convert to {@code SEARCH} calls. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Checks/Returns/; "Checks" often means that the method will throw if the check fails, whereas "Returns" is unambiguous.

Put the criteria in the javadoc, and describe the purpose of newTermsCnt.

I wouldn't abbreviate count to cnt. Save to characters, but convert a word into a non-word.

@julianhyde
Copy link
Contributor

Regarding the commit message:

a in (1, 2) and a = 1 should be simplified to a=1

Use upper-case for SQL, and spaces around =.

// Fix and converts to SEARCH if:
// 1. A Sarg has complexity greater than 1;
// 2. The terms are reduced as simpler Sarg points.
return map.values().stream().anyMatch(b -> b.complexity() > 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Streams are cool but they are very expensive.
Probably we are in some hot path here.
Is it worth to use a simple loop?

Copy link
Contributor

@liyafan82 liyafan82 Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this suggestion.
By using a simple loop, we can combine the two stream expressions?
In addition, checking condition newTermsCnt == 1 is cheap, can we move it forward?

Copy link
Contributor Author

@danny0405 danny0405 Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry i found that it is hard to keep the decision branches in just one for loop and make the logic clear and clean. So i would not follow that.

final SargCollector sargCollector = new SargCollector(rexBuilder, true);
operands.forEach(t -> sargCollector.accept(t, terms));
if (sargCollector.map.values().stream().anyMatch(b -> b.complexity() > 1)) {
if (sargCollector.needToFix(terms.size())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we always convert to sarg format? Are there drawbacks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The drawback is there are many unnecessary plan change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean unnecessary plan change?
Do you mean plan changes when compared with 1.25 or 1.26?

I guess 1.26 is not really viable (since there are significant issues with Sarg), so I would skip 1.26 from consideration with regard to plan changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, a simple $0=1 would be converted to Sarg which is meaningless.

@danny0405 danny0405 force-pushed the CALCITE-4364 branch 2 times, most recently from eec5954 to d5f20ff Compare November 3, 2020 03:07
checkSimplify2(e, "SEARCH(?0.int0, Sarg[10])", "=(?0.int0, 10)");
}

@Test void testSimplifyInAnd() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need another test case:

deptno in (20, 10) and deptno = 30
==> false?

@danny0405 danny0405 changed the title [CALCITE-4364] a in (1, 2) and a = 1 should be simplified to a=1 [CALCITE-4364] a IN (1, 2) AND a = 1 should be simplified to a = 1 Nov 3, 2020
@danny0405 danny0405 force-pushed the CALCITE-4364 branch 3 times, most recently from c1ff54f to 4c5ca31 Compare November 3, 2020 06:59
@danny0405
Copy link
Contributor Author

@vlsi Do you have other comments ? I'm planning to merge in the following 24 hours.

@vlsi
Copy link
Contributor

vlsi commented Nov 4, 2020

I believe the logic with needToFix makes no sense. Sarg replacement should be on per-sarg basis rather than "all or nothing"

@danny0405
Copy link
Contributor Author

danny0405 commented Nov 4, 2020

I believe the logic with needToFix makes no sense. Sarg replacement should be on per-sarg basis rather than "all or nothing"

I'm not sure, because the Sarg structure as an intermediate can make more simplification possibilities. So when there are fix condition match, it does not harm to try to make each term as a SEARCH Sarg if possible.

But anyway, the logic is already there and this PR is an improvement. So i would merge it soon.

eq(vInt(), literal(30))),
"false");
}

Copy link
Contributor

@godfreyhe godfreyhe Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find deptno > 0 or deptno in (20, 10) can' t be simplified as deptno > 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, i have updated the test case.

@vlsi
Copy link
Contributor

vlsi commented Nov 4, 2020

@danny0405 , before you commit more fixes to RexSimplify could I kindly ask you to review the regression which was introduced a year ago https://issues.apache.org/jira/browse/CALCITE-3457?focusedCommentId=17224481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17224481 ?

You've committed CALCITE-3457 and it turns out to cause AssertionErrors on certain inputs.
Would you please fix or revert 3457 before you proceed with committing more changes to RexSimplify?

java.lang.AssertionError
	at org.apache.calcite.rex.RexSimplify.validateStrongPolicy(RexSimplify.java:903)
	at org.apache.calcite.rex.RexSimplify.simplifyIs2(RexSimplify.java:746)
	at org.apache.calcite.rex.RexSimplify.simplifyIs1(RexSimplify.java:713)
	at org.apache.calcite.rex.RexSimplify.simplifyIs(RexSimplify.java:684)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:295)
	at org.apache.calcite.rex.RexSimplify.simplifyOrTerms(RexSimplify.java:567)
	at org.apache.calcite.rex.RexSimplify.simplifyOr(RexSimplify.java:1771)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:276)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:368)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:345)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:302)
	at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:243)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify3_(RexProgramTestBase.java:134)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify(RexProgramTestBase.java:93)
	at org.apache.calcite.rex.RexProgramTest.reproducerFor3457(RexProgramTest.java:514)

@danny0405
Copy link
Contributor Author

danny0405 commented Nov 5, 2020

@danny0405 , before you commit more fixes to RexSimplify could I kindly ask you to review the regression which was introduced a year ago https://issues.apache.org/jira/browse/CALCITE-3457?focusedCommentId=17224481&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17224481 ?

You've committed CALCITE-3457 and it turns out to cause AssertionErrors on certain inputs.
Would you please fix or revert 3457 before you proceed with committing more changes to RexSimplify?

java.lang.AssertionError
	at org.apache.calcite.rex.RexSimplify.validateStrongPolicy(RexSimplify.java:903)
	at org.apache.calcite.rex.RexSimplify.simplifyIs2(RexSimplify.java:746)
	at org.apache.calcite.rex.RexSimplify.simplifyIs1(RexSimplify.java:713)
	at org.apache.calcite.rex.RexSimplify.simplifyIs(RexSimplify.java:684)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:295)
	at org.apache.calcite.rex.RexSimplify.simplifyOrTerms(RexSimplify.java:567)
	at org.apache.calcite.rex.RexSimplify.simplifyOr(RexSimplify.java:1771)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:276)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:368)
	at org.apache.calcite.rex.RexSimplify.simplifyComparison(RexSimplify.java:345)
	at org.apache.calcite.rex.RexSimplify.simplify(RexSimplify.java:302)
	at org.apache.calcite.rex.RexSimplify.simplifyUnknownAs(RexSimplify.java:243)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify3_(RexProgramTestBase.java:134)
	at org.apache.calcite.rex.RexProgramTestBase.checkSimplify(RexProgramTestBase.java:93)
	at org.apache.calcite.rex.RexProgramTest.reproducerFor3457(RexProgramTest.java:514)

This bug was not a blocker of this one, i don't think we should fix it first before this patch. And this is open community, everyone can contribute if they have time to, although i'm the code reviewer, it does not mean i "have to" fix the bug and no one can ensure that he always commits no-bug codes.

@vlsi
Copy link
Contributor

vlsi commented Nov 5, 2020

3457 is very related because it was the reason to disable fuzzer testing which is a significant test case for RexSimplify logic.

For instance, you've just pushed canNotMerge which has comments like "if the semantics change".
How do you know you have test cases to cover the change?

@danny0405
Copy link
Contributor Author

3457 is very related because it was the reason to disable fuzzer testing which is a significant test case for RexSimplify logic.

For instance, you've just pushed canNotMerge which has comments like "if the semantics change".
How do you know you have test cases to cover the change?

RexProgramTest already does that.

Can we not make the fuzzer testing a random one ? It is hard to debug and figure out where is wrong. Although some stacktrace throws from 3457 code, that does not mean 3457's code is wrong. Each pr that changes the nullability can cause it fails.

or(ne(bRef, literal(1)),
eq(bRef, literal(1)));
checkSimplifyFilter(neOrEq, "OR(<>(?0.b, 1), =(?0.b, 1))");
checkSimplifyFilter(neOrEq, "SEARCH(?0.b, Sarg[NOT NULL])");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An improvement change.

le(vInt(), literal(1))))),
"AND(=(?0.int2, 2), OR(=(?0.int3, 3), AND(>=(?0.int0, 1), <=(?0.int0, 1))))",
"AND(=(?0.int2, 2), OR(=(?0.int3, 3), SEARCH(?0.int0, Sarg[1])))",
"AND(=(?0.int2, 2), OR(=(?0.int3, 3), =(?0.int0, 1)))");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An improvement change.

@vlsi
Copy link
Contributor

vlsi commented Nov 5, 2020

RexProgramTest already does that.

It has a very limited set of checks. Do you have an exhaustive check? I do not see that, so you can't claim RexProgramTest covers your changes.

Although some stacktrace throws from 3457 code, that does not mean 3457's code is wrong

The test case was added right after 3457 was merged. The test worked OK before 3754, and it started to fail after 3457. I would say it is very likely the test case failure is caused by 3457.

Can we not make the fuzzer testing a random one ? It is hard to debug and figure out where is wrong.

First of all, reproducerFor3457 is a static test case, with no randomization at all.
Then, the removal of random from fuzzer would ruin the purpose of the test: it randomizes input data, so it explores the search space better. As it fails, it prints exact source code that re-creates the problematic expression.
You copy-paste it to a separate test method and that that is it. That was the way I created reproducerFor3457.
An alternative option is to put the seed into singleFuzzyTest. Then you have a single test which you could.

Have you tried that? Could you clarify what is the exact issue you have with reproducing/debugging the issues?

@danny0405
Copy link
Contributor Author

RexProgramTest already does that.

It has a very limited set of checks. Do you have an exhaustive check? I do not see that, so you can't claim RexProgramTest covers your changes.

Although some stacktrace throws from 3457 code, that does not mean 3457's code is wrong

The test case was added right after 3457 was merged. The test worked OK before 3754, and it started to fail after 3457. I would say it is very likely the test case failure is caused by 3457.

Can we not make the fuzzer testing a random one ? It is hard to debug and figure out where is wrong.

First of all, reproducerFor3457 is a static test case, with no randomization at all.
Then, the removal of random from fuzzer would ruin the purpose of the test: it randomizes input data, so it explores the search space better. As it fails, it prints exact source code that re-creates the problematic expression.
You copy-paste it to a separate test method and that that is it. That was the way I created reproducerFor3457.
An alternative option is to put the seed into singleFuzzyTest. Then you have a single test which you could.

Have you tried that? Could you clarify what is the exact issue you have with reproducing/debugging the issues?

I don't want to argue something, please review if you have time https://github.com/apache/calcite/pull/2246/files.

@vlsi
Copy link
Contributor

vlsi commented Nov 5, 2020

I don't want to argue something, please review if you have time https://github.com/apache/calcite/pull/2246/files.

That looks good.

Would you please fix npe in sarg as well?

@julianhyde
Copy link
Contributor

@danny0405 The comment threads on this PR are too long and too forked. I can't see what is the current consensus. I have concerns about generating a large number of diffs.

Please move discussion to the JIRA case, and do not merge until we have consensus there.

@danny0405 danny0405 force-pushed the CALCITE-4364 branch 3 times, most recently from d36d91a to 74ec77a Compare November 6, 2020 06:38
public static <C extends Comparable<C>> boolean isOpenInterval(RangeSet<C> rangeSet) {
final Set<Range<C>> ranges = rangeSet.asRanges();
final Range<C> range = ranges.iterator().next();
return ranges.size() == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will throw if ranges.isEmpty() because you iterate before you check size. Add a test where this is called on an empty range set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, have added the tests.

* Returns whether this Sarg can be expanded to more simple form, e.g.
* the IN call or single comparison.
*/
public boolean isSimple() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no objective definition of 'simple'. I would claim that 'x <> 5' is simple, and so is 'x IS NOT NULL'. You might disagree.

So, this method doesn't belong on Sarg. It belongs in whichever piece of code needs a particular definition of 'simple'.

"AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))";
final String simplified = "AND(SEARCH($0, Sarg[(0..10)]), IS NOT NULL($1))";
final String expanded = "AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))";
checkSimplify(expr, simplified)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should flatten those ANDs in AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))? I know it might be difficult to achieve efficiently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, have promoted it in the new commit.

@danny0405 danny0405 merged commit 5e9943a into apache:master Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants