Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20121][SQL] simplify NullPropagation with NullIntolerant #17450

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

Instead of iterating all expressions that can return null for null inputs, we can just check NullIntolerant.

How was this patch tested?

existing tests

@cloud-fan
Copy link
Contributor Author

cc @gatorsmile

@SparkQA
Copy link

SparkQA commented Mar 28, 2017

Test build #75298 has started for PR 17450 at commit dfb70fb.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Mar 28, 2017

Test build #75318 has finished for PR 17450 at commit dfb70fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class BinaryArithmetic extends BinaryOperator with NullIntolerant
  • case class Add(left: Expression, right: Expression) extends BinaryArithmetic
  • case class Subtract(left: Expression, right: Expression) extends BinaryArithmetic
  • case class Multiply(left: Expression, right: Expression) extends BinaryArithmetic
  • case class Divide(left: Expression, right: Expression) extends BinaryArithmetic
  • case class Remainder(left: Expression, right: Expression) extends BinaryArithmetic
  • case class Pmod(left: Expression, right: Expression) extends BinaryArithmetic
  • case class Like(left: Expression, right: Expression) extends StringRegexExpression
  • case class RLike(left: Expression, right: Expression) extends StringRegexExpression
  • case class Contains(left: Expression, right: Expression) extends StringPredicate
  • case class StartsWith(left: Expression, right: Expression) extends StringPredicate
  • case class EndsWith(left: Expression, right: Expression) extends StringPredicate

@@ -1122,7 +1119,7 @@ case class StringSpace(child: Expression)
""")
// scalastyle:on line.size.limit
case class Substring(str: Expression, pos: Expression, len: Expression)
extends TernaryExpression with ImplicitCastInputTypes {
extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the function SUBSTRING null-intolerant? What is the return value if str is a null value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result can be null; if any argument is null, the result is the null value.

Ref: https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_bif_substr.html

Copy link
Contributor

@nsyca nsyca Mar 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be confused with the terminologies: NullIntolerant expression versus "null-intolerant predicate". But if SUBSTRING is marked null-intolerant expression, why do we not mark the class of string functions such as STARTSWITH, etc. the same way? Am I missing anything here?

Copy link
Member

@gatorsmile gatorsmile Mar 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should mark NullIntolerant to the other expressions, if possible, and also update the document.

Copy link
Member

@gatorsmile gatorsmile Mar 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsyca If you have a bandwidth, could you please review all the expressions and see whether they can be marked as NullIntolerant?

You can check the impl of these expressions and compare them with the corresponding ones in the other RDBMS. Thanks!

Below is a ref PR you can use: https://github.com/apache/spark/pull/15850/files. You can continue my work if you want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will certainly take a look. On a second thought, since "most" of the SQL functions are null-intolerant, isn't easier to mark only functions that are null-tolerant such as ISNOTNULL? I am just pitching an idea here, not indicating we should abandon this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the beginning, when we introduce NullIntolerant , @marmbrus said we should do it more carefully. We prefer to using the white-list solution for avoiding hidden bugs. Marking it NullIntolerant if and only if we are ensure that they are null intolerant. Thus, when we doing it, we should also add the corresponding test cases and documents.

// If the value expression is NULL then transform the In expression to
// Literal(null)
case In(Literal(null, _), list) => Literal.create(null, BooleanType)

// Put exceptional cases above if any
Copy link
Member

@gatorsmile gatorsmile Mar 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Attribute is also NullIntolerant Maybe add a comment?
Non-leaf NullIntolerant expressions will return null, if at least one of its children is a null literal.

@gatorsmile
Copy link
Member

Great! LGTM except a minor comment.

trait StringPredicate extends Predicate with ImplicitCastInputTypes {
self: BinaryExpression =>
abstract class StringPredicate extends BinaryExpression
with Predicate with ImplicitCastInputTypes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing with NullIntolerant here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is just to simplify the existing rule NullPropagation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above StringRegexExpression, similar to it, in order to simplify the NullPropagation, we need to add NullIntolerant, so it can propagate null value...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally got your point. StringPredicate is used for inferring the null constants in the rule NullPropagation. Thus, we should mark it as NullIntolerant .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. :-)

@viirya
Copy link
Member

viirya commented Mar 30, 2017

LGTM

@SparkQA
Copy link

SparkQA commented Mar 30, 2017

Test build #75390 has finished for PR 17450 at commit 63287ef.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Mar 30, 2017

Test build #75392 has finished for PR 17450 at commit 63287ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in c734fc5 Mar 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants