[CALCITE-4364] a IN (1, 2) AND a = 1 should be simplified to a = 1 #2238
danny0405 merged 1 commit into apache:master
The fix doesn't seem right to me. We should merge two sargs if they apply to the same argument, regardless of whether they are all points. Also, please apply the fix after CALCITE-4352.
Force-pushed from ba956b3 to 306544a.
Totally agree; in fact, this patch now merges more sargs.
| } | ||
| } | ||
|
|
||
| /** Checks whether it is worth to fix and convert to {@code SEARCH} calls. */ |
s/Checks/Returns/; "Checks" often means that the method will throw if the check fails, whereas "Returns" is unambiguous.
Put the criteria in the javadoc, and describe the purpose of newTermsCnt.
I wouldn't abbreviate count to cnt. It saves two characters, but turns a word into a non-word.
|
Regarding the commit message:
Use upper-case for SQL, and spaces around |
| // Fix and converts to SEARCH if: | ||
| // 1. A Sarg has complexity greater than 1; | ||
| // 2. The terms are reduced as simpler Sarg points. | ||
| return map.values().stream().anyMatch(b -> b.complexity() > 1) |
Streams are cool but they are very expensive.
We are probably on a hot path here.
Is it worth using a simple loop?
+1 for this suggestion.
With a simple loop, could we combine the two stream expressions?
In addition, checking the condition newTermsCnt == 1 is cheap; can we move it earlier?
Sorry, I found it hard to keep the decision branches in a single for loop while keeping the logic clear and clean, so I would not follow that suggestion.
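To make the stream-versus-loop suggestion concrete, here is a minimal sketch of the first check only (whether any Sarg has complexity greater than 1). It is an illustration, not the code in this PR: the class `AnyComplexSketch`, both method names, and the `complexities` list (standing in for the `complexity()` value of each collected Sarg) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class AnyComplexSketch {
  /** Stream form, shaped like the expression in the diff. */
  static boolean anyComplexStream(List<Integer> complexities) {
    return complexities.stream().anyMatch(c -> c > 1);
  }

  /** Loop form: same result, without building a stream pipeline. */
  static boolean anyComplexLoop(List<Integer> complexities) {
    for (int c : complexities) {
      if (c > 1) {
        return true; // stops at the first match, like anyMatch
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Hypothetical complexities of three collected Sargs.
    List<Integer> complexities = Arrays.asList(1, 1, 3);
    System.out.println(anyComplexStream(complexities));
    System.out.println(anyComplexLoop(complexities));
  }
}
```

Both forms short-circuit, so the difference is only in per-call overhead; whether that matters here depends on how hot the path really is.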
| final SargCollector sargCollector = new SargCollector(rexBuilder, true); | ||
| operands.forEach(t -> sargCollector.accept(t, terms)); | ||
| if (sargCollector.map.values().stream().anyMatch(b -> b.complexity() > 1)) { | ||
| if (sargCollector.needToFix(terms.size())) { |
What if we always convert to Sarg format? Are there drawbacks?
The drawback is that there would be many unnecessary plan changes.
What do you mean by unnecessary plan changes?
Do you mean plan changes when compared with 1.25 or 1.26?
I guess 1.26 is not really viable (since there are significant issues with Sarg), so I would skip 1.26 from consideration with regard to plan changes.
For example, a simple $0=1 would be converted to a Sarg, which is meaningless.
Force-pushed from eec5954 to d5f20ff.
| checkSimplify2(e, "SEARCH(?0.int0, Sarg[10])", "=(?0.int0, 10)"); | ||
| } | ||
|
|
||
| @Test void testSimplifyInAnd() { |
Maybe we need another test case:
deptno in (20, 10) and deptno = 30
==> false?
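The reasoning behind both the PR title and this extra test case can be sketched with plain sets: an AND of two point predicates on the same column is the intersection of their allowed values. The sketch below is only an illustration under that assumption; `PointSetAndSketch`, `and`, and `render` are hypothetical names, it does not use Calcite's Sarg classes, and it ignores NULL handling.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class PointSetAndSketch {
  /** Intersects two sets of allowed values for the same column (AND semantics). */
  static Set<Integer> and(Set<Integer> left, Set<Integer> right) {
    Set<Integer> result = new TreeSet<>(left);
    result.retainAll(right);
    return result;
  }

  /** Renders the merged point set as a predicate on the given column name. */
  static String render(String column, Set<Integer> points) {
    if (points.isEmpty()) {
      return "false"; // no value satisfies both conjuncts
    }
    if (points.size() == 1) {
      return column + " = " + points.iterator().next();
    }
    return column + " IN " + points.toString().replace('[', '(').replace(']', ')');
  }

  public static void main(String[] args) {
    Set<Integer> in12 = new TreeSet<>(Arrays.asList(1, 2));
    Set<Integer> eq1 = new TreeSet<>(Arrays.asList(1));
    System.out.println(render("a", and(in12, eq1))); // a = 1

    Set<Integer> in2010 = new TreeSet<>(Arrays.asList(20, 10));
    Set<Integer> eq30 = new TreeSet<>(Arrays.asList(30));
    System.out.println(render("deptno", and(in2010, eq30))); // false
  }
}
```

For a nullable column the empty intersection is UNKNOWN rather than FALSE when the value is NULL, so the `false` result here assumes a non-nullable column or a filter context.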
Title changed from "a in (1, 2) and a = 1 should be simplified to a=1" to "a IN (1, 2) AND a = 1 should be simplified to a = 1"
Force-pushed from c1ff54f to 4c5ca31.
@vlsi Do you have any other comments? I'm planning to merge within the next 24 hours.
I believe the logic with |
I'm not sure, because the But anyway, the logic is already there and this PR is an improvement. So i would merge it soon. |
| eq(vInt(), literal(30))), | ||
| "false"); | ||
| } | ||
|
|
I find that deptno > 0 OR deptno IN (20, 10) can't be simplified to deptno > 0
Thanks, I have updated the test case.
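The expected simplification in this thread follows from treating OR as a union: points that already lie inside the range add nothing to it. A minimal sketch of that check, with the hypothetical names `OrAbsorbSketch` and `pointsOutsideGreaterThan` and no use of Calcite's classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OrAbsorbSketch {
  /** Returns the points that are NOT already covered by the range (lower, +inf). */
  static List<Integer> pointsOutsideGreaterThan(int lower, List<Integer> points) {
    List<Integer> remaining = new ArrayList<>();
    for (int p : points) {
      if (!(p > lower)) {
        remaining.add(p); // this point still has to be kept in the OR
      }
    }
    return remaining;
  }

  public static void main(String[] args) {
    // deptno > 0 OR deptno IN (20, 10): both points lie inside (0, +inf).
    System.out.println(pointsOutsideGreaterThan(0, Arrays.asList(20, 10)).isEmpty());
    // deptno > 0 OR deptno IN (20, -5): the point -5 must be kept.
    System.out.println(pointsOutsideGreaterThan(0, Arrays.asList(20, -5)));
  }
}
```

When no points remain outside the range, the whole disjunction is equivalent to `deptno > 0` alone.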
|
@danny0405 , before you commit more fixes to You've committed CALCITE-3457 and it turns out to cause AssertionErrors on certain inputs. |
Force-pushed from 4c5ca31 to 061ba0a.
This bug is not a blocker for this one, so I don't think we should fix it before this patch. This is an open community, and everyone can contribute if they have time. Although I'm the code reviewer, that does not mean I "have to" fix the bug, and no one can guarantee that they always commit bug-free code.
|
3457 is very related because it was the reason to disable fuzzer testing which is a significant test case for RexSimplify logic. For instance, you've just pushed |
Could we make the fuzzer testing non-random? It is hard to debug and to figure out what is wrong. Although some stack traces are thrown from the 3457 code, that does not mean the 3457 code is wrong. Any PR that changes nullability can cause it to fail.
| or(ne(bRef, literal(1)), | ||
| eq(bRef, literal(1))); | ||
| checkSimplifyFilter(neOrEq, "OR(<>(?0.b, 1), =(?0.b, 1))"); | ||
| checkSimplifyFilter(neOrEq, "SEARCH(?0.b, Sarg[NOT NULL])"); |
An improvement change.
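Why `b <> 1 OR b = 1` can become a NOT NULL search in a filter can be checked with SQL's three-valued logic: the disjunction is TRUE for every non-NULL `b` and UNKNOWN for NULL, and a filter keeps only TRUE rows. The sketch below assumes that reading of `Sarg[NOT NULL]`; the class `NeOrEqSketch` and its helpers are hypothetical, with a Java `null` standing in for UNKNOWN.

```java
final class NeOrEqSketch {
  /** SQL three-valued OR; a Java null stands for UNKNOWN. */
  static Boolean or(Boolean x, Boolean y) {
    if (Boolean.TRUE.equals(x) || Boolean.TRUE.equals(y)) {
      return Boolean.TRUE;
    }
    if (x == null || y == null) {
      return null;
    }
    return Boolean.FALSE;
  }

  /** b = v under SQL semantics: UNKNOWN when b is NULL. */
  static Boolean eq(Integer b, int v) {
    return b == null ? null : Boolean.valueOf(b.intValue() == v);
  }

  /** b <> v under SQL semantics: UNKNOWN when b is NULL. */
  static Boolean ne(Integer b, int v) {
    return b == null ? null : Boolean.valueOf(b.intValue() != v);
  }

  /** A filter keeps a row only when the predicate evaluates to TRUE. */
  static boolean passesFilter(Boolean predicate) {
    return Boolean.TRUE.equals(predicate);
  }

  public static void main(String[] args) {
    Integer[] samples = {null, 0, 1, 2};
    for (Integer b : samples) {
      boolean original = passesFilter(or(ne(b, 1), eq(b, 1)));
      boolean isNotNull = b != null;
      System.out.println(b + ": " + original + " / " + isNotNull);
    }
  }
}
```

The two columns agree for every sample, which is the equivalence the updated expectation relies on; outside a filter context the NULL case would still need to yield UNKNOWN.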
| le(vInt(), literal(1))))), | ||
| "AND(=(?0.int2, 2), OR(=(?0.int3, 3), AND(>=(?0.int0, 1), <=(?0.int0, 1))))", | ||
| "AND(=(?0.int2, 2), OR(=(?0.int3, 3), SEARCH(?0.int0, Sarg[1])))", | ||
| "AND(=(?0.int2, 2), OR(=(?0.int3, 3), =(?0.int0, 1)))"); |
An improvement change.
It has a very limited set of checks. Do you have an exhaustive check? I do not see that, so you can't claim
The test case was added right after 3457 was merged. The test worked OK before 3457, and it started to fail after 3457. I would say it is very likely the test case failure is caused by 3457.
First of all, Have you tried that? Could you clarify what is the exact issue you have with reproducing/debugging the issues? |
I don't want to argue; please review https://github.com/apache/calcite/pull/2246/files if you have time.
Force-pushed from 061ba0a to 2d3554f.
That looks good. Would you please fix the NPE in Sarg as well?
|
@danny0405 The comment threads on this PR are too long and too forked. I can't see what the current consensus is. I have concerns about generating a large number of diffs. Please move discussion to the JIRA case, and do not merge until we have consensus there.
Force-pushed from d36d91a to 74ec77a.
| public static <C extends Comparable<C>> boolean isOpenInterval(RangeSet<C> rangeSet) { | ||
| final Set<Range<C>> ranges = rangeSet.asRanges(); | ||
| final Range<C> range = ranges.iterator().next(); | ||
| return ranges.size() == 1 |
This will throw if ranges.isEmpty(), because you iterate before you check the size. Add a test where this is called on an empty range set.
Agreed, I have added the tests.
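The ordering problem raised here can be reproduced with any `java.util.Set`, since `iterator().next()` on an empty set throws `NoSuchElementException`. The sketch below is a stand-in, not the PR's method: it uses `Set<String>` in place of Guava's `Set<Range<C>>`, and the `!range.isEmpty()` test is only a placeholder for whatever condition follows `ranges.size() == 1` in the real code. All names are hypothetical.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.Set;

final class SingleRangeSketch {
  /** Mirrors the order in the diff: fetches the element before checking the size. */
  static boolean unsafeCheck(Set<String> ranges) {
    final String range = ranges.iterator().next(); // throws on an empty set
    return ranges.size() == 1 && !range.isEmpty();
  }

  /** Checks the size first, so an empty set yields false instead of throwing. */
  static boolean safeCheck(Set<String> ranges) {
    if (ranges.size() != 1) {
      return false;
    }
    final String range = ranges.iterator().next();
    return !range.isEmpty(); // placeholder for the real per-range condition
  }

  public static void main(String[] args) {
    Set<String> empty = Collections.emptySet();
    System.out.println(safeCheck(empty));
    try {
      unsafeCheck(empty);
    } catch (NoSuchElementException e) {
      System.out.println("unsafeCheck threw NoSuchElementException");
    }
  }
}
```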
| * Returns whether this Sarg can be expanded to more simple form, e.g. | ||
| * the IN call or single comparison. | ||
| */ | ||
| public boolean isSimple() { |
There is no objective definition of 'simple'. I would claim that 'x <> 5' is simple, and so is 'x IS NOT NULL'. You might disagree.
So, this method doesn't belong on Sarg. It belongs in whichever piece of code needs a particular definition of 'simple'.
| "AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))"; | ||
| final String simplified = "AND(SEARCH($0, Sarg[(0..10)]), IS NOT NULL($1))"; | ||
| final String expanded = "AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))"; | ||
| checkSimplify(expr, simplified) |
Do you think we should flatten those ANDs in AND(AND(>($0, 0), <($0, 10)), IS NOT NULL($1))? I know it might be difficult to achieve efficiently.
I agree, have promoted it in the new commit.
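As an illustration of what flattening the nested ANDs means, here is a small sketch over a toy expression tree. It is not Calcite code: `FlattenAndSketch`, `Node`, and `flattenAnd` are hypothetical, and operands are plain strings rather than `RexNode`s.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlattenAndSketch {
  /** Minimal expression node: an operator with operands, or a leaf with none. */
  static final class Node {
    final String op;
    final List<Node> operands;

    Node(String op, Node... operands) {
      this.op = op;
      this.operands = Arrays.asList(operands);
    }

    @Override public String toString() {
      if (operands.isEmpty()) {
        return op;
      }
      StringBuilder sb = new StringBuilder(op).append('(');
      for (int i = 0; i < operands.size(); i++) {
        if (i > 0) {
          sb.append(", ");
        }
        sb.append(operands.get(i));
      }
      return sb.append(')').toString();
    }
  }

  /** Pulls the operands of nested ANDs up into a single top-level AND. */
  static Node flattenAnd(Node node) {
    if (!node.op.equals("AND")) {
      return node;
    }
    List<Node> flat = new ArrayList<>();
    collect(node, flat);
    return new Node("AND", flat.toArray(new Node[0]));
  }

  private static void collect(Node node, List<Node> out) {
    for (Node operand : node.operands) {
      if (operand.op.equals("AND")) {
        collect(operand, out); // recurse into nested AND
      } else {
        out.add(operand);
      }
    }
  }

  public static void main(String[] args) {
    Node nested = new Node("AND",
        new Node("AND", new Node(">($0, 0)"), new Node("<($0, 10)")),
        new Node("IS NOT NULL($1)"));
    System.out.println(flattenAnd(nested)); // AND(>($0, 0), <($0, 10), IS NOT NULL($1))
  }
}
```

The traversal visits each operand once, so the cost is linear in the size of the AND tree; the harder part in a real simplifier is doing this without re-walking subtrees on every pass.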
Force-pushed from 6ac6756 to 117c44e.