-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-2838] Simplification: Remove redundant IS TRUE/IS NOT FALSE … #1044
Conversation
if (kind == SqlKind.IS_NOT_FALSE && unknownAs == RexUnknownAs.TRUE) { | ||
return simplify(argument, unknownAs); | ||
} | ||
final RexNode a = simplify(call.getOperands().get(0), RexUnknownAs.UNKNOWN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does look like O(N^2)
.
Can you please avoid simplifying the same nodes again and again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this shouldn't be N^2
; because it actually drops the "IS X" part and continues with the recursion on the argument.
I've deliberitely placed the existing simplify
call after the checks of redundancy.
Thank you for taking a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess simplifyIs
is used in simplifyAnd2 as well, so a generic simplify
looks very odd inside simplifyIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh...it seems simplifyIs2
is also invoking simplify
- I think recursion should happen as the first thing when entering sub-simplification methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgyrtkirk , I agree: https://issues.apache.org/jira/browse/CALCITE-2449 :)
Hey @kgyrtkirk , I see you have a nice PR here. What do you think on Would you please avoid adding Thanks |
I kinda went a little defensive after the jira comments :) |
Well, in the code you do hardcode IS TRUE/IS NOT FALSE, and you had nothing regarding IS NOT TRUE/IS FALSE. I think the rest operations should be covered either in the code or in comments that explain why handing those makes no sense. For instance, if replacement of IS_FALSE+UnknownAs.FALSE to NOT(x) is worth doing, a comment is required to clarify why it works. By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and other Unknown.As.TRUE cases? |
sure, I'll include a brief explanation about why these work. I think we can only add these 4 right now.
I think in that case the |
Can you elaborate? |
An alternate way to think about
Let's consider
Question is: is there an What could be used to construct
I think it's not possible to give a an So I think the existing 4 is all what we can do right now; I'm considering adding something like the following as a comment:
Note that both
|
@@ -262,11 +262,12 @@ private void checkUnknownAs(RexNode node, RexUnknownAs unknownAs) { | |||
} | |||
} | |||
} | |||
if (opt.getType().isNullable() && !node.getType().isNullable()) { | |||
if (unknownAs == RexUnknownAs.UNKNOWN | |||
&& opt.getType().isNullable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means a simplified node may change nullability if RexUnknownAs is not UNKNOWN,during planner node transformation, sometimes we need all the node types must be equal including nullability, e.g. VolcanoPlanner RelSubsets, what do you think about this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danny0405 thank you for taking a look
I would like to try to convince you that that will not happen:
- simplification in non-UNKNOWN mode should not have any effect on a RelNode's rowType; there is matchNullability in ReduceExpressions which is another line of defense against that; this test is running directly RexSimplify.
- a Project may never run with anything other than RexUnknownAs.UNKNOWN mode - that would lead to interesting things quickly
- setting RexUnknownAs to anything which is not UNKNOWN means that it's doing something closely related to boolean logic: it might be a Filter or a Join condition both of these are setting unknownAs.FALSE mode because both of them operates in an implicit "condtion IS TRUE" fashion
- I think one way to "remove" this unknownAs option would be to force the special call sites; like join/filter to add an "is true" around the condition - and then the actual filter/join implementation may remove the extra "IS TRUE" clause; if it could operate in UnknownAs.FALSE mode
- the good side of this would be that: there would be no UnknownAs anymore outside of RexSimplify and that Filter/Join would always seen a non-nullable boolean type at all time.
- by this patch It may only change between boolean non-nullable to boolean nullable ; because the input mode "UnknownAs.FALSE" means that null will be handled as false.
- I've taken a look at all the callsites which might lead to a RexSimplify running with
RexUnknownAs.FALSE
- and all of them have been coming from ReduceExpressionsRule- for Calc, Project: with RexUnknownAs.UNKNOWN
- for Join, Filter: with RexUnknownAs.FALSE
Right now I think that this will not have any adverse effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgyrtkirk Thx for you clarification. I think the the ReduceExpressions is the key to defense against the nullability change. That make sense.
@vlsi , thanks for your feedback. Do you think this PR is ready to be merged now? |
…checks Earlier expressions like ((x IS TRUE) IS TRUE) were left as is, the new behaviour recognizes if the IS TRUE/IS NOT FALSE check is redundant. In case ((x IS TRUE) IS TRUE) is a filter expression, it is simplified to 'x'.
0b3d1f4
to
72457f7
Compare
…checks
Earlier expressions like ((x IS TRUE) IS TRUE) were left as is, the new behaviour
recognizes if the IS TRUE/IS NOT FALSE check is redundant.
In case ((x IS TRUE) IS TRUE) is a filter expression, it is simplified to 'x'.