New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND #29794
Conversation
@@ -80,6 +80,13 @@ object Canonicalize { | |||
orderCommutative(a, { case And(l, r) if l.deterministic && r.deterministic => Seq(l, r)}) | |||
.reduce(And) | |||
|
|||
case o: BitwiseOr => | |||
orderCommutative(o, { case BitwiseOr(l, r) => Seq(l, r) }).reduce(BitwiseOr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check whether both branches are deterministic like or/and in the above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bitwise operations will always execute both sides. Boolean AND and OR can short circuit and not execute one side.
The bitwise operations are more like + and * in this regard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I was thinking more about side-effect from non-deterministic expressions (e.g., println), and the evaluation order matters in this case, but it seems other existing transformations don't handle this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Canonicalized expression is only used for determining if two expressions have same result. We don't run it directly. And we usually consider deterministic
together when using canonicalized
, e.g. semanticEquals
.
ok to test |
add to whiltelist |
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
Outdated
Show resolved
Hide resolved
@@ -95,4 +96,19 @@ class CanonicalizeSuite extends SparkFunSuite { | |||
val castWithTimeZoneId = Cast(literal, LongType, Some(TimeZone.getDefault.getID)) | |||
assert(castWithTimeZoneId.semanticEquals(cast)) | |||
} | |||
|
|||
test("SPARK-32927: Bitwise operations are commutative") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add more test cases, e.g., non-deterministic exprs, literals, and mixed cases BitwiseOr
/BitwiseAnd
/BitwiseXor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me too
Test build #128842 has finished for PR 29794 at commit
|
Test build #128852 has finished for PR 29794 at commit
|
retest this please |
Test build #128866 has finished for PR 29794 at commit
|
@maropu , @HyukjinKwon are there any more issues to solve before this can be merged? |
cc @cloud-fan and @viirya from #15388. Will merge this in if there are no more comments from you. |
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status success |
Merged to master. |
Test build #129157 has finished for PR 29794 at commit
|
What changes were proposed in this pull request?
Add canonicalization rules for commutative bitwise operations.
Why are the changes needed?
Canonical form is used in many other optimization rules. Reduces the number of cases, where plans with identical results are considered to be distinct.
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT