[SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral whenever possible #22857

aokolnychyi · 2018-10-27T11:13:28Z

What changes were proposed in this pull request?

This PR proposes a new optimization rule that replaces Literal(null, _) with FalseLiteral in conditions in Join and Filter, predicates in If, and conditions in CaseWhen.

The idea is that some expressions evaluate to false if the underlying expression is null (as an example see GeneratePredicate$create or doGenCode and eval methods in If and CaseWhen). Therefore, we can replace Literal(null, _) with FalseLiteral, which can lead to more optimizations later on.
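To make the claim above concrete, here is a minimal sketch in plain Scala that does not use any Spark classes. It assumes a toy model in which a SQL boolean is an `Option[Boolean]` and `None` stands for NULL; the names `keeps`, `sqlIf` and `nullToFalse` are hypothetical helpers invented for this illustration, not Catalyst APIs.

```scala
object NullPredicateDemo {
  // Toy model (assumption): a SQL boolean is Option[Boolean], None stands for NULL.
  type SqlBool = Option[Boolean]

  // Filter-like semantics: a row is kept only when the predicate is exactly true.
  def keeps(pred: SqlBool): Boolean = pred.contains(true)

  // IF-like semantics: a NULL condition selects the else branch, just like false.
  def sqlIf[A](cond: SqlBool, thenValue: A, elseValue: A): A =
    if (cond.contains(true)) thenValue else elseValue

  // The substitution under discussion: NULL becomes false, other values stay as they are.
  def nullToFalse(pred: SqlBool): SqlBool = pred.orElse(Some(false))

  def main(args: Array[String]): Unit = {
    val allValues: Seq[SqlBool] = Seq(Some(true), Some(false), None)
    for (p <- allValues) {
      // In predicate position the substitution never changes the outcome.
      assert(keeps(p) == keeps(nullToFalse(p)))
      assert(sqlIf(p, "then", "else") == sqlIf(nullToFalse(p), "then", "else"))
    }
    println("NULL and false behave the same in predicate position")
  }
}
```

The sketch only covers the top-level predicate position; it says nothing about NULL nested inside other expressions, which the review discussion treats separately.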

Let’s consider a few examples.

val df = spark.range(1, 100).select($"id".as("l"), ($"id" > 50).as("b"))
df.createOrReplaceTempView("t")
df.createOrReplaceTempView("p")

Case 1

spark.sql("SELECT * FROM t WHERE if(l > 10, false, NULL)").explain(true)

// without the new rule
…
== Optimized Logical Plan ==
Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- Filter if ((id#0L > 10)) false else null
   +- Range (1, 100, step=1, splits=Some(12))

== Physical Plan ==
*(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- *(1) Filter if ((id#0L > 10)) false else null
   +- *(1) Range (1, 100, step=1, splits=12)

// with the new rule
…
== Optimized Logical Plan ==
LocalRelation <empty>, [l#2L, s#3]

== Physical Plan ==
LocalTableScan <empty>, [l#2L, s#3]

Case 2

spark.sql("SELECT * FROM t WHERE CASE WHEN l < 10 THEN null WHEN l > 40 THEN false ELSE null END").explain(true)

// without the new rule
...
== Optimized Logical Plan ==
Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- Filter CASE WHEN (id#0L < 10) THEN null WHEN (id#0L > 40) THEN false ELSE null END
   +- Range (1, 100, step=1, splits=Some(12))

== Physical Plan ==
*(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
+- *(1) Filter CASE WHEN (id#0L < 10) THEN null WHEN (id#0L > 40) THEN false ELSE null END
   +- *(1) Range (1, 100, step=1, splits=12)

// with the new rule
...
== Optimized Logical Plan ==
LocalRelation <empty>, [l#2L, s#3]

== Physical Plan ==
LocalTableScan <empty>, [l#2L, s#3]

Case 3

spark.sql("SELECT * FROM t JOIN p ON IF(t.l > p.l, null, false)").explain(true)

// without the new rule
...
== Optimized Logical Plan ==
Join Inner, if ((l#2L > l#37L)) null else false
:- Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
:  +- Range (1, 100, step=1, splits=Some(12))
+- Project [id#0L AS l#37L, cast(id#0L as string) AS s#38]
   +- Range (1, 100, step=1, splits=Some(12))

== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, Inner, if ((l#2L > l#37L)) null else false
:- *(1) Project [id#0L AS l#2L, cast(id#0L as string) AS s#3]
:  +- *(1) Range (1, 100, step=1, splits=12)
+- BroadcastExchange IdentityBroadcastMode
   +- *(2) Project [id#0L AS l#37L, cast(id#0L as string) AS s#38]
      +- *(2) Range (1, 100, step=1, splits=12)


// with the new rule
...
== Optimized Logical Plan ==
LocalRelation <empty>, [l#2L, s#3, l#37L, s#38]

How was this patch tested?

This PR comes with a set of dedicated tests.

SparkQA · 2018-10-27T14:58:13Z

Test build #98127 has finished for PR 22857 at commit 1d8fefd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

aokolnychyi · 2018-10-27T15:16:04Z

@dbtsai @gatorsmile @cloud-fan could you guys, please, take a look?

dbtsai · 2018-10-28T08:18:31Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+    case j @ Join(_, _, _, Some(cond)) => j.copy(condition = Some(replaceNullWithFalse(cond)))
+    case p: LogicalPlan => p transformExpressions {
+      case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
+      case CaseWhen(branches, elseValue) =>


Nit,

case cw @ CaseWhen(branches, _) => .. .. cw.copy(branches = newBranches)

dbtsai · 2018-10-28T08:26:04Z

LGTM.

@cloud-fan and @gatorsmile, this is the PR I mentioned to you earlier this year in the SF Spark summit which can simplify some of our queries.

Also add @dongjoon-hyun and @viirya

Thanks.

dbtsai · 2018-10-28T08:28:50Z

...alyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseSuite.scala

+    val expectedPlan = func(testRelation, expectedExpr).analyze
+    comparePlans(optimizedPlan, expectedPlan)
+  }
+


remove extra line.

dbtsai · 2018-10-28T08:31:14Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+      }
+      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
+      CaseWhen(newBranches, newElseValue)
+    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>


Nit, in other places, we use trueValue and falseValue.

Yep, I shortened this to stay in one line below. I can either rename pred to p or split line 783 into multiple.

viirya · 2018-10-28T08:53:37Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = e match {


IsNull(Literal(null, _)) => IsNull(FalseLiteral)

Will this be a problem for this change?

We only do the replacements when 1) within Join or Filter such as Filter(If(cond, FalseLiteral, Literal(null, _))), or 2) If(Literal(null, _), trueValue, falseValue).

Also, that's the reason why we don't use transformExpressionsDown. We will stop the replacement as soon as we hit an expression that is not CaseWhen, If, And, Or or Literal(null, _). Therefore, If(IsNull(Literal(null, _))) won't be transformed.
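The stop-early behavior described in this reply can be sketched with a small stand-alone expression tree. This is only an illustration under simplifying assumptions: the `Expr` hierarchy below is a hypothetical toy model rather than Catalyst's `Expression`, every node is treated as boolean (so the data type check from the PR is omitted), and `CaseWhen` is left out for brevity.

```scala
object StopEarlyDemo {
  // Hypothetical toy expression tree; not Catalyst's Expression hierarchy.
  sealed trait Expr
  case object NullLit extends Expr
  case object TrueLit extends Expr
  case object FalseLit extends Expr
  final case class Attr(name: String) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr
  final case class IsNull(child: Expr) extends Expr
  final case class If(pred: Expr, thenValue: Expr, elseValue: Expr) extends Expr

  // Recurse only through If, And and Or. Any other node ends the traversal,
  // so a null literal nested under IsNull is never rewritten.
  def replaceNullWithFalse(e: Expr): Expr = e match {
    case If(p, t, f) =>
      If(replaceNullWithFalse(p), replaceNullWithFalse(t), replaceNullWithFalse(f))
    case And(l, r) => And(replaceNullWithFalse(l), replaceNullWithFalse(r))
    case Or(l, r) => Or(replaceNullWithFalse(l), replaceNullWithFalse(r))
    case NullLit => FalseLit
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // A null literal in a branch of a boolean If is replaced.
    println(replaceNullWithFalse(If(Attr("cond"), FalseLit, NullLit)))
    // The traversal stops at IsNull, so its child is left untouched.
    println(replaceNullWithFalse(If(IsNull(NullLit), TrueLit, NullLit)))
  }
}
```

A blind top-down transform would also visit the child of `IsNull` and turn `IsNull(null)` into `IsNull(false)`, which changes the result from true to false.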

dongjoon-hyun · 2018-10-28T18:49:51Z

...alyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseSuite.scala

+    testJoin(originalCond, expectedCond = FalseLiteral)
+  }
+
+  test("successful replacement of null literals in filter and join conditions (13)") {


Thank you for pinging me, @dbtsai. And thank you for the contribution, @aokolnychyi.
I also clearly see the benefit of this optimizer rule. It's worth reviewing thoroughly.

BTW, the test case names are very unclear to me. They only look like positive cases (1)~(13) and negative cases (1)~(3). Can we have more descriptive and specific names? It will help the readability of these test cases and shorten the review process.

- successful replacement of null literals in filter and join conditions (1)
...
- successful replacement of null literals in filter and join conditions (13)
- inability to replace null literals in filter and join conditions (1)
...
- inability to replace null literals in filter and join conditions (3)

dongjoon-hyun · 2018-10-28T18:54:05Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+ * As a result, many unnecessary computations can be removed in the query optimization phase.
+ *
+ * Similarly, the same logic can be applied to conditions in [[Join]], predicates in [[If]],
+ * conditions in [[CaseWhen]].


The examples are good, but we have to be clearer about the scope of this optimizer.
For now, this PR touches not only predicates in WHERE, but also some expressions in SELECT.
Also, it's unclear how it interacts with aggregation clauses like HAVING. Could you enumerate the targets a little more clearly in this documentation, @aokolnychyi?

dongjoon-hyun · 2018-10-28T18:57:28Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

@@ -83,6 +83,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
        BooleanSimplification,
        SimplifyConditionals,
        RemoveDispensableExpressions,
+        ReplaceNullWithFalse,


nit. Although this batch will be repeated, logically ReplaceNullWithFalse would benefit from using the result of SimplifyBinaryComparison. How about moving it after SimplifyBinaryComparison? In other words, switch ReplaceNullWithFalse and SimplifyBinaryComparison?

cloud-fan · 2018-10-29T01:48:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = e match {
+    case cw: CaseWhen if getValues(cw).forall(isNullOrBoolean) =>


how about cw.dataType == BooleanType || cw.dataType == NullType?

actually just cw.dataType == BooleanType. If an expression is NullType, it should be replaced by null literal already.

this applies to If as well.

cloud-fan · 2018-10-29T01:52:00Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+      CaseWhen(newBranches, newElseValue)
+    case If(pred, trueVal, falseVal) if Seq(trueVal, falseVal).forall(isNullOrBoolean) =>
+      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))
+    case And(left, right) =>


we need to be careful here. null && false is false, null || true is true. Please take a look at #22702

Could you elaborate a bit more on null && false?

I had in mind AND(true, null) and OR(false, null), which are tricky. For example, if we use AND(true, null) in SELECT, we will get null. However, if we use it inside Filter or predicate of If, it will be semantically equivalent to false (e.g., If$eval). Therefore, the proposed rule has a limited scope. I explored the source code & comments in And/Or to come up with an edge case that wouldn’t work. I could not find such a case. To me, it seems safe because the rule is applied only to expressions that evaluate to false if the underlying expression is null (i.e., conditions in Filter/Join, predicates in If, conditions in CaseWhen).

Please, let me know if you have a particular case to test.

I don't have a particular case, this is just to double check that these corner cases are considered. I think we are fine now :)
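The corner cases discussed in this exchange can be checked exhaustively in a small stand-alone model. The sketch below is plain Scala with no Spark dependency; it assumes SQL booleans are modeled as `Option[Boolean]` with `None` as NULL, and `and3`, `or3`, `keeps` and `nullToFalse` are hypothetical helper names chosen for this illustration.

```scala
object AndOrCornerCases {
  // Toy model (assumption): a SQL boolean is Option[Boolean], None stands for NULL.
  type SqlBool = Option[Boolean]

  // Three-valued AND: false dominates, otherwise NULL propagates.
  def and3(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true)) => Some(true)
    case _ => None
  }

  // Three-valued OR: true dominates, otherwise NULL propagates.
  def or3(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false)) => Some(false)
    case _ => None
  }

  // Predicate position: only an exact true keeps the row or takes the branch.
  def keeps(p: SqlBool): Boolean = p.contains(true)

  def nullToFalse(p: SqlBool): SqlBool = p.orElse(Some(false))

  def main(args: Array[String]): Unit = {
    val values: Seq[SqlBool] = Seq(Some(true), Some(false), None)
    for (a <- values; b <- values) {
      // As a predicate, rewriting NULL operands to false never changes the outcome.
      assert(keeps(and3(a, b)) == keeps(and3(nullToFalse(a), nullToFalse(b))))
      assert(keeps(or3(a, b)) == keeps(or3(nullToFalse(a), nullToFalse(b))))
    }
    // As a projected value the rewrite is visible: AND(true, NULL) is NULL, not false.
    assert(and3(Some(true), None) != and3(Some(true), Some(false)))
    println("And/Or rewrite is safe in predicate position only")
  }
}
```

This matches the reasoning in the thread: the rewrite is equivalent under Filter, Join conditions and If/CaseWhen predicates, but it would change the value returned by a SELECT expression.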

cloud-fan · 2018-10-29T01:54:46Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

+
+  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {
+
+    def checkPlanIsEmptyLocalScan(df: DataFrame): Unit = df.queryExecution.executedPlan match {


this assumes we run ConvertToLocalRelation, let's use withSQLConf to make sure this rule is on.

Do we actually have a way to enable/disable ConvertToLocalRelation?

yea we have. Take a look at TestHive, and we did something similar before

// Disable ConvertToLocalRelation for better test coverage. Test cases built on
// LocalRelation will exercise the optimization rules better by disabling it as
// this rule may potentially block testing of other optimization rules such as
// ConstantPropagation etc.
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)))

I see, thanks.

So you mean using withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> "") {...} to ensure that ConvertToLocalRelation is not excluded?

cloud-fan · 2018-10-30T01:33:29Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+   * Recursively replaces `Literal(null, _)` with `FalseLiteral`.
+   *
+   * Note that `transformExpressionsDown` can not be used here as we must stop as soon as we hit
+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.


Can we make it more general? I think the expected expression is:

It's NullIntolerant. If any child is null, it will be null.

it has a null child.

so I would write something like

def replaceNullWithFalse(e: Expression): Expression = e match {
  case _ if alwaysNull(e) => FalseLiteral
  case And...
  case Or...
  case _ => e
}

def alwaysNull(e: Expression): Boolean = e match {
  case Literal(null, _) => true
  case n: NullIntolerant => n.children.exists(alwaysNull)
  case _ => false
}

I like your snippet because it is clean. We also considered a similar approach.

Unfortunately, it does not handle nested If/CaseWhen expressions as they are not NullIntolerant. For example, cases like If(If(a > 1, FalseLiteral, Literal(null, _)), 1, 2) will not be optimized if we remove branches for If/CaseWhen.

If we just add one more branch to handle all NullIntolerant expressions, I am not sure it will give a lot of benefits as those expressions are transformed into Literal(null, _) by NullPropagation and we operate in the same batch.

As @gatorsmile said, we should be really careful. Generalization might be tricky. For example, Not is NullIntolerant. Not(null) is transformed into null by NullPropagation. We need to ensure that we do not replace null inside Not and do not convert Not(null) into Not(FalseLiteral).

Therefore, the intention was to keep things simple to be safe.
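The Not example can be illustrated with a few lines of plain Scala. This is a sketch under the assumption that SQL booleans are modeled as `Option[Boolean]` with `None` as NULL; `not3` and `keeps` are hypothetical helpers for this illustration and not Spark APIs.

```scala
object NotIsUnsafeDemo {
  // Toy model (assumption): a SQL boolean is Option[Boolean], None stands for NULL.
  type SqlBool = Option[Boolean]

  // Predicate position: only an exact true keeps the row.
  def keeps(p: SqlBool): Boolean = p.contains(true)

  // Three-valued NOT: NULL stays NULL, otherwise the value is negated.
  def not3(a: SqlBool): SqlBool = a.map(!_)

  def main(args: Array[String]): Unit = {
    val original = not3(None)          // NOT(NULL) evaluates to NULL
    val rewritten = not3(Some(false))  // NOT(false) evaluates to true
    assert(!keeps(original))  // the row is dropped before the rewrite
    assert(keeps(rewritten))  // the row would be kept after an unsafe rewrite
    println("Replacing NULL under Not changes which rows pass the filter")
  }
}
```

Because negation turns false into true, a NULL below Not is not interchangeable with false even when the whole expression sits in a filter, which is one reason to keep the set of traversed expressions small.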

SparkQA · 2018-10-30T02:50:24Z

Test build #98240 has finished for PR 22857 at commit 4c35955.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-10-30T03:22:22Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+      val newElseValue = cw.elseValue.map(replaceNullWithFalse)
+      CaseWhen(newBranches, newElseValue)
+    case i @ If(pred, trueVal, falseVal) if i.dataType == BooleanType =>
+      If(replaceNullWithFalse(pred), replaceNullWithFalse(trueVal), replaceNullWithFalse(falseVal))


When i.dataType != BooleanType, we can still do replaceNullWithFalse(pred), can't we?

This case is handled in apply and tested in "replace null in predicates of If", "replace null in predicates of If inside another If"

Let me know if I got you correctly here

The general rule for LogicalPlan in apply looks at the predicate of If, regardless of whether its dataType is BooleanType.

But in replaceNullWithFalse, the rule for If only works if its dataType is BooleanType. "replace null in predicates of If inside another If" is such a case. The If inside another If is of BooleanType. If the inner If is not of BooleanType, this rule doesn't work. And I think it should be OK to replace the null with false in its predicate even when the If is not of boolean type.

ah, I see. replaceNullWithFalse should only work in boolean type cases. Then I think we are fine with it.

viirya · 2018-10-30T03:24:07Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+   * an expression that is not [[CaseWhen]], [[If]], [[And]], [[Or]] or `Literal(null, _)`.
+   */
+  private def replaceNullWithFalse(e: Expression): Expression = e match {
+    case cw: CaseWhen if cw.dataType == BooleanType =>


When cw.dataType != BooleanType, we can still do replaceNullWithFalse(cond), can't we?

This case is also covered and tested in "replace null in conditions of CaseWhen", "replace null in conditions of CaseWhen inside another CaseWhen".

SparkQA · 2018-10-30T03:37:42Z

Test build #98239 has finished for PR 22857 at commit 0eac890.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

gatorsmile · 2018-10-30T05:02:12Z

Please be really careful with null handling. It could easily introduce correctness bugs like the ones we recently fixed.

cloud-fan · 2018-10-31T01:05:49Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

@@ -2585,4 +2585,45 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {

    checkAnswer(swappedDf.filter($"key"($"map") > "a"), Row(2, Map(2 -> "b")))
  }
+
+  test("SPARK-25860: Replace Literal(null, _) with FalseLiteral whenever possible") {


it's weird to put optimizer end-to-end test in DataFrameSuite. Can we create a ReplaceNullWithFalseEndToEndSuite?

cloud-fan · 2018-10-31T01:08:12Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

+    val q5 = df1.selectExpr("IF(l > 1 AND null, 5, 1) AS out")
+    checkAnswer(q5, Row(1) :: Row(1) :: Nil)
+    q5.queryExecution.executedPlan.foreach { p =>
+      assert(p.expressions.forall(e => e.find(_.isInstanceOf[If]).isEmpty))


This test can pass without the optimization. The ConvertToLocalRelation rule will eliminate the Project.

Can we use a table as input data? e.g.

withTable("t1", "t2") {
  Seq((1, true), (2, false)).toDF("l", "b").write.saveAsTable("t1")
  Seq(2, 3).toDF("l").write.saveAsTable("t2")
  val df1 = spark.table("t1")
  val df2 = spark.table("t2")
  ...
}

You are right, this can pass if ConvertToLocalRelation is enabled. When I tested this check, I did not take into account that SharedSparkSession disables ConvertToLocalRelation. So, the check worked correctly but only because ConvertToLocalRelation was disabled in SharedSparkSession. Let’s switch to tables. Thanks!

cloud-fan · 2018-10-31T01:08:58Z

LGTM except the end-to-end test

SparkQA · 2018-10-31T17:44:19Z

Test build #98317 has finished for PR 22857 at commit 5499651.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class ReplaceNullWithFalseEndToEndSuite extends QueryTest with SharedSQLContext

dbtsai · 2018-10-31T18:35:05Z

Thanks all for reviewing! The latest change looks good to me too. Merged into master.

gatorsmile · 2018-11-26T00:32:54Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+      And(replaceNullWithFalse(left), replaceNullWithFalse(right))
+    case Or(left, right) =>
+      Or(replaceNullWithFalse(left), replaceNullWithFalse(right))
+    case Literal(null, _) => FalseLiteral


Here, for safety, we should check the data types.

gatorsmile · 2018-11-26T00:34:21Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

+ *
+ * As a result, many unnecessary computations can be removed in the query optimization phase.
+ */
+object ReplaceNullWithFalse extends Rule[LogicalPlan] {


Let us move it to a new file. The file is growing too big.

gatorsmile · 2018-12-03T21:49:26Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

@@ -31,14 +31,14 @@ import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.expressions.Uuid
 import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
-import org.apache.spark.sql.catalyst.plans.logical.{Filter, OneRowRelation, Union}
+import org.apache.spark.sql.catalyst.plans.logical.{OneRowRelation, Union}


BTW, please do not remove these in a huge feature PR.

Yea, also it's unrelated import cleanup. It should be discouraged because it might make backporting / reverting potentially difficult, and sometimes those changes make readers confused.

Closes apache#22857 from aokolnychyi/spark-25860.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>

…higher-order functions: ArrayExists, ArrayFilter, MapFilter

## What changes were proposed in this pull request?

Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 (apache#22857) to also support optimizing predicates in higher-order functions of `ArrayExists`, `ArrayFilter`, `MapFilter`.

Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect its intent.

Example:

```sql
select filter(a, e -> if(e is null, null, true)) as b from (
  select array(null, 1, null, 3) as a)
```

The optimized logical plan:

**Before**:

```
== Optimized Logical Plan ==
Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) null else true, lambda e#13, false)) AS b#9]
+- OneRowRelation
```

**After**:

```
== Optimized Logical Plan ==
Project [filter([null,1,null,3], lambdafunction(if (isnull(lambda e#13)) false else true, lambda e#13, false)) AS b#9]
+- OneRowRelation
```

## How was this patch tested?

Added new unit test cases to the `ReplaceNullWithFalseInPredicateSuite` (renamed from `ReplaceNullWithFalseSuite`).

Closes apache#23079 from rednaxelafx/catalyst-master.

Authored-by: Kris Mok <kris.mok@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

…icate

## What changes were proposed in this pull request?

Based on apache#22857 and apache#23079, this PR did a few updates

- Limit the data types of NULL to Boolean.
- Limit the input data type of replaceNullWithFalse to Boolean; throw an exception in the testing mode.
- Create a new file for the rule ReplaceNullWithFalseInPredicate
- Update the description of this rule.

## How was this patch tested?

Added a test case

Closes apache#23139 from gatorsmile/followupSpark-25860.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>

aokolnychyi added 2 commits October 29, 2018 23:12

[SPARK-25860][SQL] Replace Literal(null, _) with FalseLiteral wheneve…

3aac923

…r possible

Address review comments

4c35955

aokolnychyi force-pushed the spark-25860 branch from 0eac890 to 4c35955 Compare October 29, 2018 23:20

SPARK-25860: Separate end-to-end suite and switch to tables

5499651

asfgit closed this in bc9f9b4 Oct 31, 2018

rednaxelafx mentioned this pull request Nov 18, 2018

[SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter #23079

Closed

gatorsmile mentioned this pull request Nov 26, 2018

[SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullWithFalseInPredicate #23139

Closed
