[SPARK-20121][SQL] simplify NullPropagation with NullIntolerant #17450
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -297,8 +297,8 @@ case class Lower(child: Expression) extends UnaryExpression with String2StringEx
 }

 /** A base trait for functions that compare two strings, returning a boolean. */
-trait StringPredicate extends Predicate with ImplicitCastInputTypes {
-  self: BinaryExpression =>
+abstract class StringPredicate extends BinaryExpression
+  with Predicate with ImplicitCastInputTypes {

   def compare(l: UTF8String, r: UTF8String): Boolean

@@ -313,8 +313,7 @@ trait StringPredicate extends Predicate with ImplicitCastInputTypes {
 /**
  * A function that returns true if the string `left` contains the string `right`.
  */
-case class Contains(left: Expression, right: Expression)
-  extends BinaryExpression with StringPredicate {
+case class Contains(left: Expression, right: Expression) extends StringPredicate {
   override def compare(l: UTF8String, r: UTF8String): Boolean = l.contains(r)
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     defineCodeGen(ctx, ev, (c1, c2) => s"($c1).contains($c2)")
@@ -324,8 +323,7 @@ case class Contains(left: Expression, right: Expression)
 /**
  * A function that returns true if the string `left` starts with the string `right`.
  */
-case class StartsWith(left: Expression, right: Expression)
-  extends BinaryExpression with StringPredicate {
+case class StartsWith(left: Expression, right: Expression) extends StringPredicate {
   override def compare(l: UTF8String, r: UTF8String): Boolean = l.startsWith(r)
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     defineCodeGen(ctx, ev, (c1, c2) => s"($c1).startsWith($c2)")
@@ -335,8 +333,7 @@ case class StartsWith(left: Expression, right: Expression)
 /**
  * A function that returns true if the string `left` ends with the string `right`.
  */
-case class EndsWith(left: Expression, right: Expression)
-  extends BinaryExpression with StringPredicate {
+case class EndsWith(left: Expression, right: Expression) extends StringPredicate {
   override def compare(l: UTF8String, r: UTF8String): Boolean = l.endsWith(r)
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     defineCodeGen(ctx, ev, (c1, c2) => s"($c1).endsWith($c2)")
@@ -1122,7 +1119,7 @@ case class StringSpace(child: Expression)
   """)
 // scalastyle:on line.size.limit
 case class Substring(str: Expression, pos: Expression, len: Expression)
-  extends TernaryExpression with ImplicitCastInputTypes {
+  extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {

Is the function SUBSTRING null-intolerant? What is the return value if

Ref: https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_bif_substr.html

I might be confusing the terminology: a NullIntolerant expression versus a "null-intolerant predicate". But if SUBSTRING is marked as a null-intolerant expression, why do we not mark string functions such as STARTSWITH the same way? Am I missing anything here?

Yes, we should mark

@nsyca If you have the bandwidth, could you please review all the expressions and see whether they can be marked as

You can check the implementation of these expressions and compare them with the corresponding ones in other RDBMSs. Thanks! Below is a reference PR you can use: https://github.com/apache/spark/pull/15850/files. You can continue my work if you want.

I will certainly take a look. On second thought, since "most" of the SQL functions are null-intolerant, isn't it easier to mark only the functions that are null-tolerant, such as ISNOTNULL? I am just pitching an idea here, not suggesting we abandon this PR.

At the beginning, when we introduce


   def this(str: Expression, pos: Expression) = {
     this(str, pos, Literal(Integer.MAX_VALUE))

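To make the term from the thread above concrete: an expression is null-intolerant when any NULL input forces a NULL result, while a null-tolerant one such as ISNOTNULL can still return a non-NULL value. The following is a minimal sketch in plain Scala, not Spark code. It models SQL NULL as `None`, and the object `NullSemantics` with its simplified `substring` and `isNotNull` helpers are hypothetical names used only for illustration.

```scala
// Toy model, not Spark's implementation: SQL NULL is represented as None.
object NullSemantics {
  // Null-intolerant: if any input is NULL (None), the result is NULL (None).
  def substring(str: Option[String], pos: Option[Int], len: Option[Int]): Option[String] =
    for {
      s <- str
      p <- pos
      l <- len
    } yield {
      // Simplified 1-based slicing, clamped to the string bounds.
      // Negative positions are not handled the way a real SQL engine would.
      val start = math.min(math.max(p - 1, 0), s.length)
      val end = math.min(start.toLong + math.max(l, 0), s.length.toLong).toInt
      s.substring(start, end)
    }

  // Null-tolerant: a NULL input still produces a non-NULL result.
  def isNotNull(v: Option[Any]): Option[Boolean] = Some(v.isDefined)
}
```

In this sketch, `NullSemantics.substring(None, Some(1), Some(5))` evaluates to `None`, whereas `NullSemantics.isNotNull(None)` evaluates to `Some(false)`.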
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -347,35 +347,30 @@ object LikeSimplification extends Rule[LogicalPlan] {
  * Null value propagation from bottom to top of the expression tree.
  */
 case class NullPropagation(conf: CatalystConf) extends Rule[LogicalPlan] {
-  private def nonNullLiteral(e: Expression): Boolean = e match {
-    case Literal(null, _) => false
-    case _ => true
+  private def isNullLiteral(e: Expression): Boolean = e match {
+    case Literal(null, _) => true
+    case _ => false
   }

   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsUp {
       case e @ WindowExpression(Cast(Literal(0L, _), _, _), _) =>
         Cast(Literal(0L), e.dataType, Option(conf.sessionLocalTimeZone))
-      case e @ AggregateExpression(Count(exprs), _, _, _) if !exprs.exists(nonNullLiteral) =>
+      case e @ AggregateExpression(Count(exprs), _, _, _) if exprs.forall(isNullLiteral) =>
         Cast(Literal(0L), e.dataType, Option(conf.sessionLocalTimeZone))
-      case e @ IsNull(c) if !c.nullable => Literal.create(false, BooleanType)
-      case e @ IsNotNull(c) if !c.nullable => Literal.create(true, BooleanType)
-      case e @ GetArrayItem(Literal(null, _), _) => Literal.create(null, e.dataType)
-      case e @ GetArrayItem(_, Literal(null, _)) => Literal.create(null, e.dataType)
-      case e @ GetMapValue(Literal(null, _), _) => Literal.create(null, e.dataType)
-      case e @ GetMapValue(_, Literal(null, _)) => Literal.create(null, e.dataType)
-      case e @ GetStructField(Literal(null, _), _, _) => Literal.create(null, e.dataType)
-      case e @ GetArrayStructFields(Literal(null, _), _, _, _, _) =>
-        Literal.create(null, e.dataType)
-      case e @ EqualNullSafe(Literal(null, _), r) => IsNull(r)
-      case e @ EqualNullSafe(l, Literal(null, _)) => IsNull(l)
       case ae @ AggregateExpression(Count(exprs), _, false, _) if !exprs.exists(_.nullable) =>
         // This rule should be only triggered when isDistinct field is false.
         ae.copy(aggregateFunction = Count(Literal(1)))

+      case IsNull(c) if !c.nullable => Literal.create(false, BooleanType)
+      case IsNotNull(c) if !c.nullable => Literal.create(true, BooleanType)
+
+      case EqualNullSafe(Literal(null, _), r) => IsNull(r)
+      case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
+
       // For Coalesce, remove null literals.
       case e @ Coalesce(children) =>
-        val newChildren = children.filter(nonNullLiteral)
+        val newChildren = children.filterNot(isNullLiteral)
         if (newChildren.isEmpty) {
           Literal.create(null, e.dataType)
         } else if (newChildren.length == 1) {
@@ -384,33 +379,13 @@ case class NullPropagation(conf: CatalystConf) extends Rule[LogicalPlan] {
           Coalesce(newChildren)
         }

-      case e @ Substring(Literal(null, _), _, _) => Literal.create(null, e.dataType)
-      case e @ Substring(_, Literal(null, _), _) => Literal.create(null, e.dataType)
-      case e @ Substring(_, _, Literal(null, _)) => Literal.create(null, e.dataType)
-
-      // Put exceptional cases above if any
-      case e @ BinaryArithmetic(Literal(null, _), _) => Literal.create(null, e.dataType)
-      case e @ BinaryArithmetic(_, Literal(null, _)) => Literal.create(null, e.dataType)
-
-      case e @ BinaryComparison(Literal(null, _), _) => Literal.create(null, e.dataType)
-      case e @ BinaryComparison(_, Literal(null, _)) => Literal.create(null, e.dataType)
-
-      case e: StringRegexExpression => e.children match {
-        case Literal(null, _) :: right :: Nil => Literal.create(null, e.dataType)
-        case left :: Literal(null, _) :: Nil => Literal.create(null, e.dataType)
-        case _ => e
-      }
-
-      case e: StringPredicate => e.children match {
-        case Literal(null, _) :: right :: Nil => Literal.create(null, e.dataType)
-        case left :: Literal(null, _) :: Nil => Literal.create(null, e.dataType)
-        case _ => e
-      }
-
       // If the value expression is NULL then transform the In expression to
       // Literal(null)
       case In(Literal(null, _), list) => Literal.create(null, BooleanType)

+      // Put exceptional cases above if any

Nit:

+      case e: NullIntolerant if e.children.exists(isNullLiteral) =>
+        Literal.create(null, e.dataType)
     }
   }
 }

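The rename from `nonNullLiteral` to `isNullLiteral` flips the helper's meaning, so each call site is rewritten to its logical dual: `!exprs.exists(nonNullLiteral)` becomes `exprs.forall(isNullLiteral)`, and `filter(nonNullLiteral)` becomes `filterNot(isNullLiteral)`. The sketch below checks that equivalence on a toy expression type. `Expr`, `NullLit`, `IntLit`, and the `HelperEquivalence` object are hypothetical stand-ins for Catalyst's `Expression` and `Literal`, not Spark classes.

```scala
// Toy check that the old and new call sites agree; not Spark code.
object HelperEquivalence {
  sealed trait Expr
  case object NullLit extends Expr                  // stands in for Literal(null, _)
  final case class IntLit(value: Int) extends Expr  // any other expression

  // Old helper: true for everything except a null literal.
  def nonNullLiteral(e: Expr): Boolean = e match {
    case NullLit => false
    case _ => true
  }

  // New helper: true only for a null literal.
  def isNullLiteral(e: Expr): Boolean = e match {
    case NullLit => true
    case _ => false
  }

  val samples: Seq[Seq[Expr]] = Seq(
    Seq.empty[Expr],
    Seq(NullLit),
    Seq(NullLit, NullLit),
    Seq(NullLit, IntLit(1)),
    Seq(IntLit(1), IntLit(2)))

  // Guard used for Count: "no non-null literal exists" equals "all are null literals".
  val guardsAgree: Boolean = samples.forall { es =>
    (!es.exists(nonNullLiteral)) == es.forall(isNullLiteral)
  }

  // Filter used for Coalesce: keeping non-null literals equals dropping null literals.
  val filtersAgree: Boolean = samples.forall { es =>
    es.filter(nonNullLiteral) == es.filterNot(isNullLiteral)
  }
}
```

Both forms also agree on the empty sequence, where `exists` is false and `forall` is vacuously true.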
Missing `with NullIntolerant` here?

This PR is just to simplify the existing rule `NullPropagation`.

See `StringRegexExpression` above. Similarly, in order to simplify `NullPropagation`, we need to add `NullIntolerant` here so that it can propagate the null value.

I finally got your point. `StringPredicate` is used for inferring the null constants in the rule `NullPropagation`. Thus, we should mark it as `NullIntolerant`.

Yeah. :-)
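The point settled in this thread can be sketched as follows: once the per-class cases are replaced by a single `case e: NullIntolerant` branch, only expressions carrying the marker trait are folded to a null literal, so a class that previously had its own case must gain the marker to keep the old behavior. This is a simplified model in plain Scala, not Catalyst. `Expr`, `NullLit`, `StrLit`, and `propagate` are hypothetical names, the marker is assumed to extend the expression type so that `children` is accessible in the pattern, and the function rewrites a single node rather than walking a plan bottom-up.

```scala
// Simplified model of a marker-trait-driven rewrite; not Spark's optimizer.
object NullPropagationSketch {
  sealed trait Expr { def children: Seq[Expr] }

  // Marker trait: any null input makes the whole expression null.
  trait NullIntolerant extends Expr

  case object NullLit extends Expr { val children: Seq[Expr] = Nil }
  final case class StrLit(value: String) extends Expr { val children: Seq[Expr] = Nil }

  // Marked: folded to NullLit when a child is a null literal.
  final case class Contains(left: Expr, right: Expr) extends NullIntolerant {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Not marked: null-tolerant, so the catch-all branch must not touch it.
  final case class Coalesce(children: Seq[Expr]) extends Expr

  // Single catch-all branch replacing one hand-written case per expression class.
  def propagate(e: Expr): Expr = e match {
    case n: NullIntolerant if n.children.contains(NullLit) => NullLit
    case other => other
  }
}
```

Here `propagate(Contains(StrLit("a"), NullLit))` returns `NullLit`, while `propagate(Coalesce(Seq(NullLit, StrLit("x"))))` returns its argument unchanged because `Coalesce` does not carry the marker.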