[CALCITE-2838] Simplification: Remove redundant IS TRUE/IS NOT FALSE … #1044

Merged: 1 commit, Feb 23, 2019

kgyrtkirk
Member

…checks

Earlier, expressions like ((x IS TRUE) IS TRUE) were left as is; the new behaviour
recognizes when the IS TRUE/IS NOT FALSE check is redundant.
If ((x IS TRUE) IS TRUE) is a filter expression, it is simplified to 'x'.

if (kind == SqlKind.IS_NOT_FALSE && unknownAs == RexUnknownAs.TRUE) {
  return simplify(argument, unknownAs);
}
final RexNode a = simplify(call.getOperands().get(0), RexUnknownAs.UNKNOWN);
Contributor

This does look like O(N^2).
Can you please avoid simplifying the same nodes again and again?

Member Author

I think this shouldn't be N^2, because it actually drops the "IS X" part and continues with the recursion on the argument.
I've deliberately placed the existing simplify call after the redundancy checks.

Thank you for taking a look!
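
To make the complexity argument concrete, here is a toy sketch of "drop the IS X wrapper, then recurse on the argument". It is not Calcite's RexSimplify: the `PeelDemo` class, its `Expr` type, and the `visits` counter are hypothetical names introduced only for illustration, and the only mode modelled is unknown-as-false, where IS TRUE is redundant.

```java
/** Toy model only: these types are hypothetical and are not Calcite's RexNode API. */
final class PeelDemo {
  /** Expression node: a leaf (arg == null) or an IS TRUE wrapper around arg. */
  static final class Expr {
    final String name; // set for leaves only
    final Expr arg;    // set for IS TRUE wrappers only

    Expr(String name, Expr arg) {
      this.name = name;
      this.arg = arg;
    }

    boolean isIsTrue() {
      return arg != null;
    }
  }

  /** Counts how many nodes the simplifier has looked at. */
  static int visits = 0;

  /** Simplifies in unknown-as-false mode, where an IS TRUE wrapper is redundant. */
  static Expr simplifyUnknownAsFalse(Expr e) {
    visits++;
    if (e.isIsTrue()) {
      // The redundancy check returns early, so the argument is simplified
      // exactly once and the wrapper itself is never revisited.
      return simplifyUnknownAsFalse(e.arg);
    }
    return e;
  }

  /** Builds leaf IS TRUE IS TRUE ... with the given number of wrappers. */
  static Expr nest(String leaf, int depth) {
    Expr e = new Expr(leaf, null);
    for (int i = 0; i < depth; i++) {
      e = new Expr(null, e);
    }
    return e;
  }

  public static void main(String[] args) {
    visits = 0;
    Expr out = simplifyUnknownAsFalse(nest("x", 1000));
    System.out.println(out.name + " after " + visits + " visits");
  }
}
```

With 1000 wrappers the counter ends at 1001, one visit per node. The quadratic case would only appear if a wrapper's argument were simplified once by the redundancy check and then again by a fall-through call.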

Contributor

I guess simplifyIs is used in simplifyAnd2 as well, so a generic simplify looks very odd inside simplifyIs.

Member Author

Ugh... it seems simplifyIs2 is also invoking simplify. I think recursion should happen as the first thing when entering sub-simplification methods.

@vlsi
Contributor

vlsi commented Feb 14, 2019

Hey @kgyrtkirk , I see you have a nice PR here.

What do you think about IS_NOT_TRUE / IS_FALSE?

Would you please avoid adding O(N^2) behaviors to RexSimplify?

Thanks

@kgyrtkirk
Member Author

I kinda went a little defensive after the Jira comments :)
If you mean rewriting x IS FALSE to NOT x in case of UnknownAs.FALSE, I think that will work; I'll include it.

@vlsi
Contributor

vlsi commented Feb 14, 2019

If you mean rewriting x IS FALSE to NOT x in case of UnknownAs.FALSE, I think that will work; I'll include it.

Well, in the code you hardcode IS TRUE/IS NOT FALSE, and you have nothing regarding IS NOT TRUE/IS FALSE. I think the remaining operations should be covered either in the code or in comments that explain why handling them makes no sense.

For instance, if replacing IS_FALSE+UnknownAs.FALSE with NOT(x) is worth doing, a comment is required to clarify why it works.
If the replacement does not play well (e.g. it increases expression size with no benefit), then a comment is required to clarify that.

By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and other UnknownAs.TRUE cases?

@kgyrtkirk
Member Author

sure, I'll include a brief explanation about why these work. I think we can only add these 4 right now.

By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and other UnknownAs.TRUE cases?

I think in that case the [UAT] x IS_TRUE actually marks that we are switching from UnknownAs.TRUE to UnknownAs.FALSE ; so it can't be removed.

@vlsi
Contributor

vlsi commented Feb 14, 2019

we are switching from UnknownAs.TRUE to UnknownAs.FALSE ; so it can't be removed.

Can you elaborate?

@kgyrtkirk
Member Author

By the way, why don't you consider IS_TRUE+UnknownAs.TRUE and other UnknownAs.TRUE cases?

we are switching from UnknownAs.TRUE to UnknownAs.FALSE ; so it can't be removed.

An alternative way to think about UnknownAs modes is in terms of IS X operators.

(-1 = FALSE, 0 = UNKNOWN, 1 = TRUE)

 x | x IS TRUE | [UAF] x | x IS NOT FALSE | [UAT] x
-1 |    -1     |   -1    |       -1       |   -1
 0 |    -1     |   -1    |        1       |    1
 1 |     1     |    1    |        1       |    1

Let's consider [UAT] x IS TRUE; to move to proper 3-valued logic, I would instead write:
(x IS TRUE) IS NOT FALSE

 x | x IS TRUE | (x IS TRUE) IS NOT FALSE | F(x) IS NOT FALSE
-1 |    -1     |           -1             |        -1
 0 |    -1     |           -1             |        -1
 1 |     1     |            1             |         1

Question is: is there an F which could satisfy the truth table?

What could be used to construct F?

  • IS X operators are most probably out of scope; since we are trying to replace one of them
  • NOT is good; but has no effect on "UNKNOWN"
  • CASE, AND, OR, COALESCE .... adds much more complexity

I think it's not possible to give an F in this case that would be any better.
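
The question about F can also be checked by brute force. The sketch below is only an illustration: the `FindF` class and its helpers are hypothetical names, and it uses the -1 / 0 / 1 values from the tables for FALSE / UNKNOWN / TRUE. It enumerates all 27 unary truth tables and keeps those for which F(x) IS NOT FALSE matches x IS TRUE on every input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Calcite: brute-force search for a suitable F. */
final class FindF {
  // Encoding: -1 = FALSE, 0 = UNKNOWN, 1 = TRUE.
  static final int[] VALUES = {-1, 0, 1};

  static int isTrue(int x) {
    return x == 1 ? 1 : -1;
  }

  static int isNotFalse(int x) {
    return x == -1 ? -1 : 1;
  }

  /**
   * Returns every unary truth table f (indexed by x + 1) such that
   * f(x) IS NOT FALSE equals x IS TRUE for all three inputs.
   */
  static List<int[]> candidates() {
    List<int[]> found = new ArrayList<>();
    for (int a : VALUES) {
      for (int b : VALUES) {
        for (int c : VALUES) {
          int[] f = {a, b, c};
          boolean ok = true;
          for (int x : VALUES) {
            ok &= isNotFalse(f[x + 1]) == isTrue(x);
          }
          if (ok) {
            found.add(f);
          }
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    for (int[] f : candidates()) {
      System.out.println("F(-1)=" + f[0] + " F(0)=" + f[1] + " F(1)=" + f[2]);
    }
  }
}
```

Every table that survives maps UNKNOWN to FALSE. Neither x nor NOT x does that, so an F built only from those cannot work; it would need an IS X-like operator, or something heavier such as CASE or COALESCE.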

So I think the existing 4 are all we can do right now; I'm considering adding something like the following as a comment:

UnknownAs.FALSE corresponds to x IS TRUE
UnknownAs.TRUE corresponds to x IS NOT FALSE

Note that both UnknownAs.TRUE and UnknownAs.FALSE only change the meaning of UNKNOWN

  • if we are already in UnknownAs.FALSE mode, x IS TRUE can be simplified to x without any side effects
  • similarly, in UnknownAs.TRUE mode, x IS NOT FALSE can be simplified to x
  • x IS FALSE can be rewritten to (NOT x) IS TRUE, and from there the first rule applies
  • x IS NOT TRUE can be rewritten to (NOT x) IS NOT FALSE, and from there the second rule applies
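
The four rules above can be sanity-checked over the three truth values. This is a minimal sketch with hypothetical names (`UnknownAsRules`, `uaf`, `uat`), using the same -1 / 0 / 1 encoding as the tables; it models the modes as plain functions and does not call RexSimplify.

```java
/** Hypothetical model of the UnknownAs modes; not Calcite code. */
final class UnknownAsRules {
  // Encoding: -1 = FALSE, 0 = UNKNOWN, 1 = TRUE.
  static int not(int x) {
    return -x; // UNKNOWN stays UNKNOWN
  }

  static int isTrue(int x) {
    return x == 1 ? 1 : -1;
  }

  static int isFalse(int x) {
    return x == -1 ? 1 : -1;
  }

  static int isNotTrue(int x) {
    return x == 1 ? -1 : 1;
  }

  static int isNotFalse(int x) {
    return x == -1 ? -1 : 1;
  }

  /** [UAF] x: read UNKNOWN as FALSE. */
  static int uaf(int x) {
    return x == 0 ? -1 : x;
  }

  /** [UAT] x: read UNKNOWN as TRUE. */
  static int uat(int x) {
    return x == 0 ? 1 : x;
  }

  /** Checks the four rewrites for every truth value. */
  static boolean allRulesHold() {
    for (int x = -1; x <= 1; x++) {
      boolean ok =
          uaf(isTrue(x)) == uaf(x)                // rule 1
              && uat(isNotFalse(x)) == uat(x)     // rule 2
              && uaf(isFalse(x)) == uaf(not(x))   // rule 3
              && uat(isNotTrue(x)) == uat(not(x)); // rule 4
      if (!ok) {
        return false;
      }
    }
    return true;
  }

  /** The case asked about earlier: is IS TRUE removable in UnknownAs.TRUE mode? */
  static boolean isTrueRedundantUnderUat() {
    for (int x = -1; x <= 1; x++) {
      if (uat(isTrue(x)) != uat(x)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("four rules hold: " + allRulesHold());
    System.out.println("IS TRUE redundant under UAT: " + isTrueRedundantUnderUat());
  }
}
```

In this model the four rules hold for all inputs, while dropping IS TRUE in UnknownAs.TRUE mode fails at x = UNKNOWN, where [UAT] (x IS TRUE) is FALSE but [UAT] x is TRUE.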

@@ -262,11 +262,12 @@ private void checkUnknownAs(RexNode node, RexUnknownAs unknownAs) {
}
}
}
if (opt.getType().isNullable() && !node.getType().isNullable()) {
if (unknownAs == RexUnknownAs.UNKNOWN
&& opt.getType().isNullable()
Contributor

This means a simplified node may change nullability if RexUnknownAs is not UNKNOWN. During planner node transformation, we sometimes need all the node types to be equal, including nullability (e.g. VolcanoPlanner RelSubsets). What do you think about this?

Member Author

@danny0405 thank you for taking a look
I would like to try to convince you that that will not happen:

  • Simplification in non-UNKNOWN mode should not have any effect on a RelNode's rowType; matchNullability in ReduceExpressions is another line of defense against that. This test runs RexSimplify directly.
  • A Project may never run in any mode other than RexUnknownAs.UNKNOWN; that would lead to interesting things quickly.
  • Setting RexUnknownAs to anything other than UNKNOWN means we are doing something closely related to boolean logic: it might be a Filter or a Join condition. Both of these set UnknownAs.FALSE mode because both operate in an implicit "condition IS TRUE" fashion.
    • I think one way to "remove" this unknownAs option would be to force the special call sites, like Join/Filter, to add an "IS TRUE" around the condition; the actual Filter/Join implementation could then remove the extra "IS TRUE" clause if it can operate in UnknownAs.FALSE mode.
    • The upside would be that there would be no UnknownAs outside of RexSimplify anymore, and that Filter/Join would always see a non-nullable boolean type.
  • With this patch, the type may only change between non-nullable boolean and nullable boolean, because the input mode UnknownAs.FALSE means that null will be handled as false.
  • I've taken a look at all the call sites which might lead to RexSimplify running with RexUnknownAs.FALSE, and all of them come from ReduceExpressionsRule:
    • for Calc, Project: with RexUnknownAs.UNKNOWN
    • for Join, Filter: with RexUnknownAs.FALSE

Right now I think that this will not have any adverse effects.

Contributor

@kgyrtkirk Thanks for your clarification. I think ReduceExpressions is the key defense against the nullability change. That makes sense.

@jcamachor
Contributor

@vlsi , thanks for your feedback. Do you think this PR is ready to be merged now?


4 participants