Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND #29794

Closed
wants to merge 2 commits into from

Conversation

tanelk
Copy link
Contributor

@tanelk tanelk commented Sep 17, 2020

What changes were proposed in this pull request?

Add canonicalization rules for commutative bitwise operations.

Why are the changes needed?

Canonical form is used in many other optimization rules. Reduces the number of cases, where plans with identical results are considered to be distinct.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@tanelk tanelk changed the title Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND [SPARK-32927][SQL] Bitwise OR, AND and XOR should have similar canonicalization rules to boolean OR and AND Sep 17, 2020
@@ -80,6 +80,13 @@ object Canonicalize {
orderCommutative(a, { case And(l, r) if l.deterministic && r.deterministic => Seq(l, r)})
.reduce(And)

case o: BitwiseOr =>
orderCommutative(o, { case BitwiseOr(l, r) => Seq(l, r) }).reduce(BitwiseOr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check whether both branches are deterministic like or/and in the above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitwise operations will always execute both sides. Boolean AND and OR can short circuit and not execute one side.
The bitwise operations are more like + and * in this regard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I was thinking more about side-effect from non-deterministic expressions (e.g., println), and the evaluation order matters in this case, but it seems other existing transformations don't handle this as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Canonicalized expression is only used for determining if two expressions have same result. We don't run it directly. And we usually consider deterministic together when using canonicalized, e.g. semanticEquals.

@maropu
Copy link
Member

maropu commented Sep 18, 2020

ok to test

@maropu
Copy link
Member

maropu commented Sep 18, 2020

add to whiltelist

@@ -95,4 +96,19 @@ class CanonicalizeSuite extends SparkFunSuite {
val castWithTimeZoneId = Cast(literal, LongType, Some(TimeZone.getDefault.getID))
assert(castWithTimeZoneId.semanticEquals(cast))
}

test("SPARK-32927: Bitwise operations are commutative") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more test cases, e.g., non-deterministic exprs, literals, and mixed cases BitwiseOr/BitwiseAnd/BitwiseXor?

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me too

@SparkQA
Copy link

SparkQA commented Sep 18, 2020

Test build #128842 has finished for PR 29794 at commit cf6c7e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 18, 2020

Test build #128852 has finished for PR 29794 at commit 456ffd2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Sep 18, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Sep 18, 2020

Test build #128866 has finished for PR 29794 at commit 456ffd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk
Copy link
Contributor Author

tanelk commented Sep 25, 2020

@maropu , @HyukjinKwon are there any more issues to solve before this can be merged?

@HyukjinKwon
Copy link
Member

cc @cloud-fan and @viirya from #15388. Will merge this in if there are no more comments from you.

@maropu
Copy link
Member

maropu commented Sep 27, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Sep 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33772/

@SparkQA
Copy link

SparkQA commented Sep 28, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33772/

@HyukjinKwon
Copy link
Member

Merged to master.

@SparkQA
Copy link

SparkQA commented Sep 28, 2020

Test build #129157 has finished for PR 29794 at commit 456ffd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants