-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34222][SQL] Enhance boolean simplification rule #31318
Conversation
@cloud-fan,@maropu, |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
ok to test |
cc: @wangyum |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135209 has finished for PR 31318 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135275 has finished for PR 31318 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
and | ||
} else { | ||
// (((a && b) && a && (a && c))) => a && b && c | ||
distinct.reduce(And) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one concern is, if there are many conjunctive predicates, here we may build a left-deep tree while the original one is balanced. Can we add a util function in PredicateHelper
to build a balanced And/Or tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan yes that would be helpful, especially in filter pushdown of data sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @cloud-fan, good observation. I have added required Utils to address this.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135580 has finished for PR 31318 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
Outdated
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135606 has finished for PR 31318 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left the minor comments and it looks fine. Could you take another look? @cloud-fan @gengliangwang @c21
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
Test build #135624 has started for PR 31318 at commit |
Kubernetes integration test starting |
Kubernetes integration test status failure |
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
...lyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work, @Swinky !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with only one minor comment.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
Outdated
Show resolved
Hide resolved
Test build #135669 has started for PR 31318 at commit |
Kubernetes integration test starting |
Kubernetes integration test status success |
thanks, merging to master! |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Show resolved
Hide resolved
or | ||
} else { | ||
// (a || b) || a || (a || c) => a || b || c | ||
buildBalancedPredicate(distinct.toSeq, Or) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to call distinct.toSeq
here? The current implementation of ExpressionSet
keeps the original order of expressions, but I'm not sure that its design guarantees the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu - ExpressionSet
implements Iterable.iterator()
method, so .toSeq
will return the expressions in original order, right? Do you think we need to add more documentation on ExpressionSet
to harden this assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we need to add more documentation on ExpressionSet to harden this assumption?
Yea, documenting it explicitly looks better to me.
What changes were proposed in this pull request?
Enhance boolean simplification rule by handling following scenarios:
(((a && b) && a && (a && c))) => a && b && c)
(((a || b) || a || (a || c))) => a || b || c
Why are the changes needed?
Minor improvement
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added UTs