
[SPARK-30353][SQL] Add IsNotNull check in SimplifyBinaryComparison optimization #27008

Closed
wants to merge 9 commits

Conversation

ulysses-you
Contributor

@ulysses-you ulysses-you commented Dec 25, 2019

What changes were proposed in this pull request?

Spark can now propagate constraints during SQL optimization when spark.sql.constraintPropagation.enabled is true, so where c = 1 is rewritten to where c = 1 and c is not null. We can also use these constraints in SimplifyBinaryComparison.

SimplifyBinaryComparison simplifies a comparison whose two sides are non-nullable and semanticEquals. We can also simplify when one side is inferred to be IsNotNull.

Why are the changes needed?

Simplify SQL.

create table test (c1 string);

explain extended select c1 from test where c1 = c1 limit 10;
-- before
GlobalLimit 10
+- LocalLimit 10
   +- Filter (isnotnull(c1#20) AND (c1#20 = c1#20))
      +- Relation[c1#20]
-- after
GlobalLimit 10
+- LocalLimit 10
   +- Filter isnotnull(c1#20)
      +- Relation[c1#20]

explain extended select c1 from test where c1 > c1 limit 10;
-- before
GlobalLimit 10
+- LocalLimit 10
   +- Filter (isnotnull(c1#20) AND (c1#20 > c1#20))
      +- Relation[c1#20]
-- after
LocalRelation <empty>, [c1#20]
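The effect of the proposed rule can be modeled outside Spark with a tiny standalone sketch; all types and names below (Attr, Cmp, Lit, simplify, knownNotNull) are illustrative stand-ins for Catalyst's expressions, not the actual API:

```scala
// Minimal standalone model of the proposed rule; these names are
// illustrative stand-ins, not Spark's Catalyst API.
sealed trait Expr
case class Attr(name: String, nullable: Boolean) extends Expr
case class Cmp(op: String, left: Expr, right: Expr) extends Expr
case class Lit(value: Boolean) extends Expr

def isNullable(e: Expr): Boolean = e match {
  case Attr(_, n) => n
  case _          => true // conservatively treat other expressions as nullable
}

// `knownNotNull` models the IsNotNull conjuncts already present on the
// Filter, e.g. those inferred by constraint propagation.
def simplify(e: Expr, knownNotNull: Set[Expr]): Expr = e match {
  case Cmp(op, l, r) if l == r && (!isNullable(l) || knownNotNull(l)) =>
    op match {
      case "=" | "<=" | ">=" => Lit(true)  // reflexive comparisons hold
      case "<" | ">"         => Lit(false) // strict self-comparisons never hold
      case _                 => e
    }
  case other => other
}
```

With an isnotnull(c1) conjunct available, c1 = c1 folds to true and c1 > c1 folds to false, which is the shape of the plan changes shown above.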

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add UT.

@SparkQA

SparkQA commented Dec 25, 2019

Test build #115772 has finished for PR 27008 at commit 3024dd8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -115,6 +114,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
Batch("Infer Filters", Once,
InferFiltersFromConstraints) ::
Batch("Operator Optimization after Inferring Filters", fixedPoint,
Seq(SimplifyBinaryComparison) ++
Contributor Author

We should apply InferFiltersFromConstraints first and then apply SimplifyBinaryComparison to retain isnotnull filter.

Member

I think the rules in operatorOptimizationRuleSet are run after inferring filters and SimplifyBinaryComparison is in the rules, no?

Contributor Author

Because operatorOptimizationRuleSet runs twice: once before and once after InferFiltersFromConstraints.

We should only do this optimization after InferFiltersFromConstraints.

@SparkQA

SparkQA commented Dec 26, 2019

Test build #115779 has finished for PR 27008 at commit 0da71f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

if (SQLConf.get.constraintPropagationEnabled) {
canSimplify(left, right) || plan.constraints.exists {
Member

Can we limit this constraint check to Filter plans only?

Contributor Author

I think yes; this relies on the Filter constraint that InferFiltersFromConstraints adds.

case a GreaterThan b if !a.nullable && !b.nullable && a.semanticEquals(b) => FalseLiteral
case a LessThan b if !a.nullable && !b.nullable && a.semanticEquals(b) => FalseLiteral
case a GreaterThan b if canSimplifyWithConstraints(l, a, b) => FalseLiteral
case a LessThan b if canSimplifyWithConstraints(l, a, b) => FalseLiteral
Member

canSimplifyWithConstraints => canSimplifyComparison?

@maropu
Member

maropu commented Dec 26, 2019

cc: @viirya

@SparkQA

SparkQA commented Dec 26, 2019

Test build #115815 has finished for PR 27008 at commit 79c97ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


private def canSimplifyComparison(
plan: LogicalPlan, left: Expression, right: Expression): Boolean = {
def canSimplify(left: Expression, right: Expression): Boolean = {
Member

@viirya viirya Dec 26, 2019

canSimplify is not reused. We can just inline it; that will look clearer.

Contributor Author

OK.

return true
}

if (SQLConf.get.constraintPropagationEnabled) {
Member

Is this optimization constraint propagation only? When constraintPropagationEnabled is disabled, a Filter with predicate like IsNotNull(e) && e > e can still be simplified, no?

Contributor Author

Good catch! It can also optimize IsNotNull(e) && e > e without constraintPropagationEnabled.

@ulysses-you
Contributor Author

ulysses-you commented Dec 27, 2019

We can make this simpler.

We just need to check whether IsNotNull exists; if it does, optimize, otherwise don't. Constraints are used by InferFiltersFromConstraints, so we don't need to rely on them strongly.

I changed the pr title.
cc @maropu @viirya

@ulysses-you ulysses-you changed the title [SPARK-30353][SQL] Use constraints in SimplifyBinaryComparison optimization [SPARK-30353][SQL] Add IsNotNull check in SimplifyBinaryComparison optimization Dec 27, 2019
@@ -122,4 +127,49 @@ class BinaryComparisonSimplificationSuite extends PlanTest with PredicateHelper

comparePlans(optimized, correctAnswer)
}

test("Simplify null and nonnull with filter constraints") {
Seq('a === 'a, 'a <= 'a, 'a >= 'a, 'a < 'a, 'a > 'a).foreach { condition =>
Member

nit: can't we avoid to use symbol? See https://github.com/databricks/scala-style-guide#symbol

Contributor Author

Thanks for the reminder. Is there a simple way to replace the symbol?

Member

At the very least you can do Symbol(...). This isn't a nit actually because we should support Scala 2.13 properly.

Contributor Author

Oh I see, the scala-style-guide has a new update. Thanks again.

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115827 has finished for PR 27008 at commit 984e607.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115828 has finished for PR 27008 at commit 2c4dd63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 27, 2019

Test build #115846 has finished for PR 27008 at commit 9b0d268.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Dec 28, 2019

retest this please

@@ -384,22 +384,40 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
/**
* Simplifies binary comparisons with semantically-equal expressions:
* 1) Replace '<=>' with 'true' literal.
* 2) Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable.
* 3) Replace '<' and '>' with 'false' literal if both operands are non-nullable.
Member

Do we need to update these statements above? IIUC this PR just adds more checks for the non-nullable cases?

Contributor Author

Thanks for the reminder; it seems that's not needed. Reverted.

@SparkQA

SparkQA commented Dec 28, 2019

Test build #115883 has finished for PR 27008 at commit 9b0d268.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

case _ => false
}
Member

How about this?

  private def canSimplifyComparison(
      plan: LogicalPlan, left: Expression, right: Expression): Boolean = {
    if (!left.nullable && !right.nullable && left.semanticEquals(right)) {
      true
    } else {
      // We do more checks for non-nullable cases
      plan match {
        case Filter(fc, _) =>
          splitConjunctivePredicates(fc).exists { condition =>
            condition.semanticEquals(IsNotNull(left)) && condition.semanticEquals(IsNotNull(right))
          }
        case _ =>
          false
      }
    }
  }

Contributor Author

Looks fine.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115922 has finished for PR 27008 at commit 87d1c6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 30, 2019

Test build #115923 has finished for PR 27008 at commit 1c1630e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Contributor Author

@maropu @HyukjinKwon @viirya Do you have time to review again? Any feedback is great.
Also cc @cloud-fan .

plan match {
case Filter(fc, _) =>
splitConjunctivePredicates(fc).exists { condition =>
condition.semanticEquals(IsNotNull(left)) && condition.semanticEquals(IsNotNull(right))
Contributor

this is always false IIUC. How can a condition be both IsNotNull(left) and IsNotNull(right)?

Contributor Author

In this scenario, left is semanticEquals to right, as in where c > c; then, if there exists an IsNotNull(c) conjunct, it will return true.

Contributor

then how about

if (left.semanticEquals(right)) {
  if (!left.nullable && !right.nullable) {
    true
  } else {
    plan match ...
      condition.semanticEquals(IsNotNull(left))
  }
} else {
  false
}

Contributor Author

Makes sense.
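The agreed-upon shape of the check can be filled out as a self-contained sketch; FilterInfo and the other names here are illustrative stand-ins, not Catalyst classes:

```scala
// Self-contained sketch of the final check; names are illustrative,
// not Spark's actual Catalyst classes.
case class Attr(name: String, nullable: Boolean)
case class FilterInfo(notNullAttrs: Set[Attr]) // IsNotNull(...) conjuncts of the Filter

def canSimplifyComparison(filter: FilterInfo, left: Attr, right: Attr): Boolean =
  if (left == right) { // stands in for left.semanticEquals(right)
    if (!left.nullable) {
      true // non-nullable operands are always simplifiable
    } else {
      filter.notNullAttrs.contains(left) // one IsNotNull(left) conjunct suffices
    }
  } else {
    false
  }
```

Checking left only is enough because the branch is reached exclusively when left and right are semantically equal, which resolves the "always false" concern above.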

@SparkQA

SparkQA commented Jan 10, 2020

Test build #116473 has finished for PR 27008 at commit af0fbd6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.foreach { condition =>
val plan = nullableRelation.where(condition).analyze
val actual = Optimize.execute(plan)
val correctAnswer = nullableRelation.analyze
Contributor

shouldn't this be nullableRelation.where(false)?

Contributor Author

The false literal has been removed by PruneFilters, so the result is just an empty LocalRelation.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 823e3d3 Jan 12, 2020
@ulysses-you
Contributor Author

@cloud-fan thanks for merging. Thanks all !

maropu pushed a commit that referenced this pull request Jan 15, 2020
…me complexity

### What changes were proposed in this pull request?

The changes in the rule `SimplifyBinaryComparison` from #27008 could bring performance regression in the optimizer when there are a large set of filter conditions.

We need to improve the implementation and reduce the time complexity.

### Why are the changes needed?

Need to fix the potential performance regression in the optimizer.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing unit tests.
Also ran a micro benchmark in `BinaryComparisonSimplificationSuite`:
```
object Optimize extends RuleExecutor[LogicalPlan] {
  val batches =
    Batch("Constant Folding", FixedPoint(50),
      SimplifyBinaryComparison) :: Nil
}

test("benchmark") {
  val a = Symbol("a")
  val condition = (1 to 500).map(_ => EqualTo(a, a)).reduceLeft(And)
  val finalCondition = And(condition, IsNotNull(a))
  val plan = nullableRelation.where(finalCondition).analyze
  val start = System.nanoTime()
  Optimize.execute(plan)
  println((System.nanoTime() - start) / 1000000)
}
```

Before the changes: 2507ms
After the changes: 3ms
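The quadratic behavior comes from rescanning every filter conjunct for each comparison being simplified. The linear-time idea can be sketched standalone (this is illustrative only, not the actual #27212 patch): collect the IsNotNull targets once per Filter and reuse the set.

```scala
// Illustrative sketch only: build the set of known-not-null attributes
// once per Filter, so each trivial-comparison check is an O(1) set
// lookup instead of an O(n) rescan of all conjuncts.
case class Attr(name: String)
sealed trait Pred
case class NotNull(a: Attr) extends Pred
case class SelfEq(a: Attr) extends Pred // models a trivially-true `a = a` conjunct

def pruneTrivial(conjuncts: Seq[Pred]): Seq[Pred] = {
  val notNull: Set[Attr] = conjuncts.collect { case NotNull(a) => a }.toSet // built once
  conjuncts.filterNot {
    case SelfEq(a) => notNull(a) // `a = a` is true, hence removable, when a is not null
    case _         => false
  }
}
```

Under this shape, 500 `a = a` conjuncts need one pass to build the set and one pass to prune, consistent in spirit with the drop from 2507ms to 3ms reported above.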

Closes #27212 from gengliangwang/SimplifyBinaryComparison.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@ulysses-you ulysses-you deleted the SPARK-30353 branch July 4, 2022 10:00