Skip to content

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Nov 29, 2016

What changes were proposed in this pull request?

Attribute is not NullIntolerant. This PR is to fix it.

Without the fix, the following test case will return empty.

val data = Seq[java.lang.Integer](1, null).toDF("key")
data.filter("not key is not null").show()

Before the fix, the optimized plan is like

== Optimized Logical Plan ==
Project [value#1 AS key#3]
+- Filter (isnotnull(value#1) && NOT isnotnull(value#1))
   +- LocalRelation [value#1]

After the fix, the optimized plan is like

== Optimized Logical Plan ==
Project [value#1 AS key#3]
+- Filter NOT isnotnull(value#1)
   +- LocalRelation [value#1]

How was this patch tested?

Added a test

@rxin
Copy link
Contributor

rxin commented Nov 29, 2016

Can you explain how did nullintolerant impact the case?

@gatorsmile
Copy link
Member Author

Sure, will update the PR description tomorrow. Thanks!

@SparkQA
Copy link

SparkQA commented Nov 29, 2016

Test build #69310 has finished for PR 16055 at commit 33c10a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class Attribute extends LeafExpression with NamedExpression

@gatorsmile
Copy link
Member Author

This does not resolve all the cases. Will submit a better fix today. : )

@gatorsmile gatorsmile closed this Nov 29, 2016
@gatorsmile
Copy link
Member Author

gatorsmile commented Nov 29, 2016

The above fix does not cover all the cases. Found the root cause.

The constraints of an operator is the expressions that evaluate to true for all the rows produced. That means, the expression result should be neither false nor unknown (NULL). Thus, we can conclude that IsNotNull on all the constraints, which are generated by its own predicates or propagated from the children. The constraint can be a complex expression. For better usage of these constraints, we try to push down IsNotNull to the lowest-level expressions. IsNotNull can be pushed through an expression when it is null intolerant. (When the input is NULL, the null-intolerant expression always evaluates to NULL.)

Below is the code we have for IsNotNull pushdown.

  var isNotNullConstraints: Set[Expression] =
      constraints.flatMap(scanNullIntolerantExpr).map(IsNotNull(_))

  private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
    case a: Attribute => Seq(a)
    case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
      expr.children.flatMap(scanNullIntolerantExpr)
    case _ => Seq.empty[Attribute]
  }

IsNotNull is not null-intolerant. It converts null to false. If the expression does not include any Not-like expression, it works; otherwise, it could generate a wrong result. The above function needs to be corrected to

  var isNotNullConstraints: Set[Expression] =
      constraints.flatMap(scanNullIntolerantExpr).map(IsNotNull(_))

  private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
    case a: Attribute => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scanNullIntolerantExpr)
    case _ => Seq.empty[Attribute]
  }

This fixes the problem, but we need a smarter fix for avoiding regressions. Now, working on a better fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants