[CALCITE-2838] Simplification: Remove redundant IS TRUE/IS NOT FALSE … #1044

kgyrtkirk · 2019-02-14T13:29:58Z

…checks

Previously, expressions like ((x IS TRUE) IS TRUE) were left as is. The new behaviour
recognizes when an IS TRUE/IS NOT FALSE check is redundant.
When ((x IS TRUE) IS TRUE) is a filter expression, it is simplified to 'x'.

vlsi · 2019-02-14T13:43:46Z

core/src/main/java/org/apache/calcite/rex/RexSimplify.java

+    if (kind == SqlKind.IS_NOT_FALSE && unknownAs == RexUnknownAs.TRUE) {
+      return simplify(argument, unknownAs);
+    }
+    final RexNode a = simplify(call.getOperands().get(0), RexUnknownAs.UNKNOWN);


This does look like O(N^2).
Can you please avoid simplifying the same nodes again and again?

I think this shouldn't be O(N^2), because it actually drops the "IS X" part and continues the recursion on the argument.
I've deliberately placed the existing simplify call after the redundancy checks.
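To make the linearity argument concrete, here is a minimal sketch that counts calls when each redundant IS TRUE wrapper is dropped and the recursion continues on the argument. It is a toy model, not RexSimplify: `PeelCount`, `Expr`, `nest` and `countCalls` are hypothetical names, and the only expression shape modelled is a chain of IS TRUE wrappers around a leaf in "unknown as false" mode. It does not model the other callers of simplifyIs.

```java
/** Toy model: a chain of redundant IS TRUE wrappers around a leaf. Not Calcite code. */
final class PeelCount {
  /** Either a leaf (arg == null) or an IS TRUE wrapper around arg. */
  static final class Expr {
    final Expr arg;
    Expr(Expr arg) { this.arg = arg; }
    boolean isLeaf() { return arg == null; }
  }

  static int calls = 0;

  /** In "unknown as false" mode IS TRUE is redundant: drop it and recurse once on the argument. */
  static Expr simplify(Expr e) {
    calls++;
    if (e.isLeaf()) {
      return e;
    }
    return simplify(e.arg);
  }

  /** Builds ((...(x IS TRUE)...) IS TRUE) with the given number of wrappers. */
  static Expr nest(int depth) {
    Expr e = new Expr(null);
    for (int i = 0; i < depth; i++) {
      e = new Expr(e);
    }
    return e;
  }

  /** Number of simplify calls needed for a chain of the given depth. */
  static int countCalls(int depth) {
    calls = 0;
    simplify(nest(depth));
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(countCalls(10));
    System.out.println(countCalls(1000));
  }
}
```

In this model every node is visited exactly once, so the call count grows as depth + 1.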

Thank you for taking a look!

I guess simplifyIs is used in simplifyAnd2 as well, so a generic simplify looks very odd inside simplifyIs

Ugh... it seems simplifyIs2 also invokes simplify. I think recursion should be the first thing that happens when entering the sub-simplification methods.

@kgyrtkirk , I agree: https://issues.apache.org/jira/browse/CALCITE-2449 :)

vlsi · 2019-02-14T13:55:32Z

Hey @kgyrtkirk , I see you have a nice PR here.

What do you think about IS_NOT_TRUE / IS_FALSE?

Would you please avoid adding O(N^2) behaviors to RexSimplify?

Thanks

kgyrtkirk · 2019-02-14T14:50:10Z

I kinda went a little defensive after the Jira comments :)
If you mean rewriting x IS FALSE to NOT x in the case of UnknownAs.FALSE, I think that will work; I'll include it.

vlsi · 2019-02-14T16:49:15Z

> If you mean rewriting x IS FALSE to NOT x in the case of UnknownAs.FALSE, I think that will work; I'll include it.

Well, the code hardcodes IS TRUE/IS NOT FALSE, and it has nothing for IS NOT TRUE/IS FALSE. I think the remaining operations should be covered either in the code or in comments that explain why handling them makes no sense.

For instance, if replacing IS_FALSE+UnknownAs.FALSE with NOT(x) is worth doing, a comment is required to clarify why it works.
If the replacement does not play well (e.g. it increases expression size with no benefit), then a comment is required to clarify that.

By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and the other UnknownAs.TRUE cases?

kgyrtkirk · 2019-02-14T18:56:50Z

Sure, I'll include a brief explanation of why these work. I think we can only add these 4 right now.

> By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and the other UnknownAs.TRUE cases?

I think in that case the IS_TRUE in [UAT] x IS_TRUE actually marks that we are switching from UnknownAs.TRUE to UnknownAs.FALSE, so it can't be removed.

vlsi · 2019-02-14T19:14:35Z

> we are switching from UnknownAs.TRUE to UnknownAs.FALSE, so it can't be removed.

Can you elaborate?

kgyrtkirk · 2019-02-14T22:04:56Z

> By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and the other UnknownAs.TRUE cases?

> we are switching from UnknownAs.TRUE to UnknownAs.FALSE, so it can't be removed.

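An alternative way to think about the UnknownAs modes is in terms of the IS X operators (in the tables, -1 is FALSE, 0 is UNKNOWN and 1 is TRUE):

| x | x IS TRUE | [UAF] x | x IS NOT FALSE | [UAT] x |
|---|---|---|---|---|
| -1 | -1 | -1 | -1 | -1 |
| 0 | -1 | -1 | 1 | 1 |
| 1 | 1 | 1 | 1 | 1 |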

Let's consider [UAT] x IS TRUE; to move to proper 3-valued logic, I would instead write:
(x IS TRUE) IS NOT FALSE

| x | x IS TRUE | (x IS TRUE) IS NOT FALSE | F(x) IS NOT FALSE |
|---|---|---|---|
| -1 | -1 | -1 | -1 |
| 0 | -1 | -1 | -1 |
| 1 | 1 | 1 | 1 |

The question is: is there an F which satisfies the truth table?

What could be used to construct F?

- IS X operators are most probably out of scope, since we are trying to replace one of them
- NOT is fine, but it has no effect on "UNKNOWN"
- CASE, AND, OR, COALESCE, ... add much more complexity

I think it's not possible to give an F in this case, which might be for the best.
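The search for F can be checked by brute force. The sketch below enumerates all 27 unary functions over {-1, 0, 1} (the same encoding as in the tables) and keeps those for which F(x) IS NOT FALSE matches (x IS TRUE) IS NOT FALSE. `SearchForF` and its methods are hypothetical names for illustration only, not Calcite API.

```java
import java.util.ArrayList;
import java.util.List;

/** Brute-force search over unary three-valued functions. -1 = FALSE, 0 = UNKNOWN, 1 = TRUE. */
final class SearchForF {
  static int isTrue(int v) { return v == 1 ? 1 : -1; }
  static int isNotFalse(int v) { return v == -1 ? -1 : 1; }

  /**
   * Returns every F, written as the table {F(-1), F(0), F(1)}, such that
   * F(x) IS NOT FALSE equals (x IS TRUE) IS NOT FALSE for all three inputs.
   */
  static List<int[]> candidates() {
    int[] values = {-1, 0, 1};
    List<int[]> result = new ArrayList<>();
    for (int f : values) {
      for (int u : values) {
        for (int t : values) {
          int[] table = {f, u, t};
          boolean matches = true;
          for (int i = 0; i < values.length; i++) {
            if (isNotFalse(table[i]) != isNotFalse(isTrue(values[i]))) {
              matches = false;
            }
          }
          if (matches) {
            result.add(table);
          }
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    for (int[] table : candidates()) {
      System.out.println(table[0] + " " + table[1] + " " + table[2]);
    }
  }
}
```

Every surviving candidate maps UNKNOWN (0) to FALSE (-1). NOT and the identity both leave UNKNOWN as UNKNOWN, so no combination of them can produce such an F; something that behaves like IS TRUE on UNKNOWN is needed.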

So I think the existing 4 are all we can do right now. I'm considering adding something like the following as a comment:

UnknownAs.FALSE corresponds to x IS TRUE
UnknownAs.TRUE corresponds to x IS NOT FALSE

Note that both UnknownAs.TRUE and UnknownAs.FALSE only change the meaning of UNKNOWN.

1. if we are already in UnknownAs.FALSE mode, x IS TRUE can be simplified to x without any side effects
2. similarly, in UnknownAs.TRUE mode, x IS NOT FALSE can be simplified to x
3. x IS FALSE can be rewritten to (NOT x) IS TRUE, and from there rule 1 applies
4. x IS NOT TRUE can be rewritten to (NOT x) IS NOT FALSE, and from there rule 2 applies
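The four rewrites in the proposed comment can be verified exhaustively, since there are only three input values. The following is a minimal sketch under the assumption that an UnknownAs mode simply decides how the consumer reads a final UNKNOWN. `UnknownAsRules`, `Tri`, `Mode` and `observe` are hypothetical names and do not correspond to Calcite classes.

```java
import java.util.function.UnaryOperator;

/** Toy model of SQL three-valued logic and the UnknownAs modes. Not Calcite code. */
final class UnknownAsRules {
  enum Tri { FALSE, UNKNOWN, TRUE }
  enum Mode { FALSE, TRUE } // hypothetical stand-in for RexUnknownAs.FALSE / RexUnknownAs.TRUE

  static Tri not(Tri x) {
    switch (x) {
      case TRUE: return Tri.FALSE;
      case FALSE: return Tri.TRUE;
      default: return Tri.UNKNOWN;
    }
  }

  static Tri isTrue(Tri x) { return x == Tri.TRUE ? Tri.TRUE : Tri.FALSE; }
  static Tri isFalse(Tri x) { return x == Tri.FALSE ? Tri.TRUE : Tri.FALSE; }
  static Tri isNotTrue(Tri x) { return not(isTrue(x)); }
  static Tri isNotFalse(Tri x) { return not(isFalse(x)); }

  /** What the consumer sees: UNKNOWN is read as the mode's value. */
  static boolean observe(Tri x, Mode mode) {
    if (x == Tri.UNKNOWN) {
      return mode == Mode.TRUE;
    }
    return x == Tri.TRUE;
  }

  /** True if a and b are indistinguishable in the given mode for every input. */
  static boolean equivalent(UnaryOperator<Tri> a, UnaryOperator<Tri> b, Mode mode) {
    for (Tri x : Tri.values()) {
      if (observe(a.apply(x), mode) != observe(b.apply(x), mode)) {
        return false;
      }
    }
    return true;
  }

  /** Rules 1 to 4 from the proposed comment. */
  static boolean allFourRulesHold() {
    return equivalent(UnknownAsRules::isTrue, x -> x, Mode.FALSE)
        && equivalent(UnknownAsRules::isNotFalse, x -> x, Mode.TRUE)
        && equivalent(UnknownAsRules::isFalse, UnknownAsRules::not, Mode.FALSE)
        && equivalent(UnknownAsRules::isNotTrue, UnknownAsRules::not, Mode.TRUE);
  }

  /** The case asked about in review: can IS TRUE be dropped in UnknownAs.TRUE mode? */
  static boolean isTrueRemovableUnderUnknownAsTrue() {
    return equivalent(UnknownAsRules::isTrue, x -> x, Mode.TRUE);
  }

  public static void main(String[] args) {
    System.out.println(allFourRulesHold());
    System.out.println(isTrueRemovableUnderUnknownAsTrue());
  }
}
```

In this model the four rules hold, while dropping IS TRUE in UnknownAs.TRUE mode changes the observed result for UNKNOWN input.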

danny0405 · 2019-02-16T11:34:16Z

core/src/test/java/org/apache/calcite/test/fuzzer/RexProgramFuzzyTest.java

@@ -262,11 +262,12 @@ private void checkUnknownAs(RexNode node, RexUnknownAs unknownAs) {
        }
      }
    }
-    if (opt.getType().isNullable() && !node.getType().isNullable()) {
+    if (unknownAs == RexUnknownAs.UNKNOWN
+        && opt.getType().isNullable()


This means a simplified node may change nullability if RexUnknownAs is not UNKNOWN. During planner node transformation, we sometimes need all node types to be equal, including nullability (e.g. VolcanoPlanner RelSubsets). What do you think about this?

@danny0405 thank you for taking a look.
I would like to try to convince you that this will not happen:

Simplification in non-UNKNOWN mode should not have any effect on a RelNode's rowType; there is matchNullability in ReduceExpressions, which is another line of defense against that. This test runs RexSimplify directly.

A Project may never run with anything other than RexUnknownAs.UNKNOWN mode; that would lead to interesting things quickly.

Setting RexUnknownAs to anything other than UNKNOWN means that it's doing something closely related to boolean logic: it might be a Filter or a Join condition. Both of these set UnknownAs.FALSE mode because both operate in an implicit "condition IS TRUE" fashion.

I think one way to "remove" this unknownAs option would be to force the special call sites, like join/filter, to add an "IS TRUE" around the condition; the actual filter/join implementation could then remove the extra "IS TRUE" clause if it can operate in UnknownAs.FALSE mode.

The good side of this would be that there would be no UnknownAs anymore outside of RexSimplify, and that Filter/Join would always see a non-nullable boolean type.

With this patch, the type may only change between non-nullable boolean and nullable boolean, because the input mode "UnknownAs.FALSE" means that null will be handled as false.

I've taken a look at all the call sites which might lead to RexSimplify running with RexUnknownAs.FALSE, and all of them come from ReduceExpressionsRule:

for Calc, Project: with RexUnknownAs.UNKNOWN

for Join, Filter: with RexUnknownAs.FALSE

Right now I think that this will not have any adverse effects.
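The call-site mapping above, and the implicit "condition IS TRUE" behaviour of a filter, can be sketched as follows. This is an illustration under simplifying assumptions: a nullable `Boolean` stands in for a three-valued condition, and `FilterModes`, `RelKind`, `Mode`, `modeFor`, `filter` and `isTrue` are hypothetical names, not Calcite API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of per-call-site modes and filter semantics. Not Calcite code. */
final class FilterModes {
  enum Mode { UNKNOWN, FALSE } // hypothetical stand-in for RexUnknownAs values
  enum RelKind { CALC, PROJECT, JOIN, FILTER }

  /** Mode per call site, as listed in the comment above. */
  static Mode modeFor(RelKind kind) {
    switch (kind) {
      case JOIN:
      case FILTER:
        return Mode.FALSE;
      default:
        return Mode.UNKNOWN;
    }
  }

  /** Filter semantics: keep a row only when its condition is TRUE; null (UNKNOWN) drops it. */
  static List<Integer> filter(List<Boolean> condition) {
    List<Integer> keptRows = new ArrayList<>();
    for (int row = 0; row < condition.size(); row++) {
      if (Boolean.TRUE.equals(condition.get(row))) {
        keptRows.add(row);
      }
    }
    return keptRows;
  }

  /** condition IS TRUE: the result is never null. */
  static List<Boolean> isTrue(List<Boolean> condition) {
    List<Boolean> out = new ArrayList<>();
    for (Boolean value : condition) {
      out.add(Boolean.TRUE.equals(value));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Boolean> condition = Arrays.asList(true, null, false, true);
    System.out.println(filter(condition));
    System.out.println(filter(isTrue(condition)));
  }
}
```

Both calls keep the same rows, which is why an explicit IS TRUE around a filter condition adds nothing for the filter itself, even though it turns a nullable boolean into a non-nullable one.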

@kgyrtkirk Thanks for your clarification. I think ReduceExpressions is the key defense against the nullability change. That makes sense.

jcamachor · 2019-02-21T04:46:33Z

@vlsi , thanks for your feedback. Do you think this PR is ready to be merged now?

kgyrtkirk force-pushed the 2838-is-true branch from 0b3d1f4 to 72457f7 · February 23, 2019 08:52

kgyrtkirk merged commit c462838 into apache:master Feb 23, 2019

kgyrtkirk Feb 18, 2019

danny0405 Feb 20, 2019
