New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30353][SQL] Add IsNotNull check in SimplifyBinaryComparison optimization #27008
Conversation
Test build #115772 has finished for PR 27008 at commit
|
@@ -115,6 +114,7 @@ abstract class Optimizer(catalogManager: CatalogManager) | |||
Batch("Infer Filters", Once, | |||
InferFiltersFromConstraints) :: | |||
Batch("Operator Optimization after Inferring Filters", fixedPoint, | |||
Seq(SimplifyBinaryComparison) ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should apply InferFiltersFromConstraints
first and then apply SimplifyBinaryComparison
to retain isnotnull
filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the rules in operatorOptimizationRuleSet
are run after inferring filters and SimplifyBinaryComparison
is in the rules, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because operatorOptimizationRuleSet
will run double time. Once before and once after InferFiltersFromConstraints
.
We should only do thie optimization after InferFiltersFromConstraints
.
Test build #115779 has finished for PR 27008 at commit
|
} | ||
|
||
if (SQLConf.get.constraintPropagationEnabled) { | ||
canSimplify(left, right) || plan.constraints.exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we limit this constrain check only in Filter
plans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes, this rely on Filter
constraint which InferFiltersFromConstraints
add.
case a GreaterThan b if !a.nullable && !b.nullable && a.semanticEquals(b) => FalseLiteral | ||
case a LessThan b if !a.nullable && !b.nullable && a.semanticEquals(b) => FalseLiteral | ||
case a GreaterThan b if canSimplifyWithConstraints(l, a, b) => FalseLiteral | ||
case a LessThan b if canSimplifyWithConstraints(l, a, b) => FalseLiteral |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
canSimplifyWithConstraints
=> canSimplifyComparison
?
cc: @viirya |
Test build #115815 has finished for PR 27008 at commit
|
|
||
private def canSimplifyComparison( | ||
plan: LogicalPlan, left: Expression, right: Expression): Boolean = { | ||
def canSimplify(left: Expression, right: Expression): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
canSimplify is not reuse. We can just inline it. It will looks more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
return true | ||
} | ||
|
||
if (SQLConf.get.constraintPropagationEnabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this optimization constraint propagation only? When constraintPropagationEnabled is disabled, a Filter with predicate like IsNotNull(e) && e > e
can still be simplified, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It also can optimize suchIsNotNull(e) && e > e
without constraintPropagationEnabled
.
@@ -122,4 +127,49 @@ class BinaryComparisonSimplificationSuite extends PlanTest with PredicateHelper | |||
|
|||
comparePlans(optimized, correctAnswer) | |||
} | |||
|
|||
test("Simplify null and nonnull with filter constraints") { | |||
Seq('a === 'a, 'a <= 'a, 'a >= 'a, 'a < 'a, 'a > 'a).foreach { condition => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can't we avoid to use symbol? See https://github.com/databricks/scala-style-guide#symbol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the remind. Is there any simply way to instead of symbol ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least you can do Symbol(...)
. This isn't a nit actually because we should support Scala 2.13 properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. The scala-style-guide has the new update. Thanks again.
Test build #115827 has finished for PR 27008 at commit
|
Test build #115828 has finished for PR 27008 at commit
|
Test build #115846 has finished for PR 27008 at commit
|
retest this please |
@@ -384,22 +384,40 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper { | |||
/** | |||
* Simplifies binary comparisons with semantically-equal expressions: | |||
* 1) Replace '<=>' with 'true' literal. | |||
* 2) Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable. | |||
* 3) Replace '<' and '>' with 'false' literal if both operands are non-nullable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update these statements above? IIUC this pr just checks more for non-nullable cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for remind, seems not need. Revert this.
Test build #115883 has finished for PR 27008 at commit
|
} | ||
|
||
case _ => false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
private def canSimplifyComparison(
plan: LogicalPlan, left: Expression, right: Expression): Boolean = {
if (!left.nullable && !right.nullable && left.semanticEquals(right)) {
true
} else {
// We do more checks for non-nullable cases
plan match {
case Filter(fc, _) =>
splitConjunctivePredicates(fc).exists { condition =>
condition.semanticEquals(IsNotNull(left)) && condition.semanticEquals(IsNotNull(right))
}
case _ =>
false
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine.
Test build #115922 has finished for PR 27008 at commit
|
Test build #115923 has finished for PR 27008 at commit
|
@maropu @HyukjinKwon @viirya Do you have time to review again? Any feedback is great. |
plan match { | ||
case Filter(fc, _) => | ||
splitConjunctivePredicates(fc).exists { condition => | ||
condition.semanticEquals(IsNotNull(left)) && condition.semanticEquals(IsNotNull(right)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is always false IIUC. How can a condition be both IsNotNull(left)
and IsNotNull(right)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this scene, left is semanticEquals to right. Like this where c > c
, and if there exists IsNotNull(c)
, then will return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then how about
if (left.semanticEquals(right)) {
if (!left.nullable && !right.nullable) {
true
} else {
plan match ...
condition.semanticEquals(IsNotNull(left))
}
} else {
false
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
Test build #116473 has finished for PR 27008 at commit
|
.foreach { condition => | ||
val plan = nullableRelation.where(condition).analyze | ||
val actual = Optimize.execute(plan) | ||
val correctAnswer = nullableRelation.analyze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be nullableRelation.where(false)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
False
has been remove by PruneFilters
, so result is just empty LocalRelation.
thanks, merging to master! |
@cloud-fan thanks for merging. Thanks all ! |
…me complexity ### What changes were proposed in this pull request? The changes in the rule `SimplifyBinaryComparison` from #27008 could bring performance regression in the optimizer when there are a large set of filter conditions. We need to improve the implementation and reduce the time complexity. ### Why are the changes needed? Need to fix the potential performance regression in the optimizer. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing unit tests. Also run a micor benchmark in `BinaryComparisonSimplificationSuite` ``` object Optimize extends RuleExecutor[LogicalPlan] { val batches = Batch("Constant Folding", FixedPoint(50), SimplifyBinaryComparison) :: Nil } test("benchmark") { val a = Symbol("a") val condition = (1 to 500).map(i => EqualTo(a, a)).reduceLeft(And) val finalCondition = And(condition, IsNotNull(a)) val plan = nullableRelation.where(finalCondition).analyze val start = System.nanoTime() Optimize.execute(plan) println((System.nanoTime() - start) /1000000) } ``` Before the changes: 2507ms After the changes: 3ms Closes #27212 from gengliangwang/SimplifyBinaryComparison. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
Now Spark can propagate constraint during sql optimization when
spark.sql.constraintPropagation.enabled
is true, thenwhere c = 1
will convert towhere c = 1 and c is not null
. We also can use constraint inSimplifyBinaryComparison
.SimplifyBinaryComparison
will simplify expression which is not nullable and semanticEquals. And we also can simplify if one expression is inferedIsNotNull
.Why are the changes needed?
Simplify SQL.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Add UT.