[SPARK-34222][SQL] Enhance boolean simplification rule #31318

Closed
wants to merge 8 commits into from

@Swinky Swinky commented Jan 25, 2021

What changes were proposed in this pull request?

Enhance the boolean simplification rule by handling the following scenarios:
((a && b) && a && (a && c)) => a && b && c
((a || b) || a || (a || c)) => a || b || c
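To illustrate the rewrite, here is a minimal sketch on a toy predicate tree. The types `Pred`, `Atom`, `Conj` and the object `DedupConjuncts` are hypothetical and are not Catalyst classes. The sketch also compares operands by structural equality, whereas the rule in this PR works on Catalyst expressions, so treat it as an approximation of the idea rather than the patch itself.

```scala
// Toy predicate tree; illustrative only, not Spark's Expression hierarchy.
sealed trait Pred
final case class Atom(name: String) extends Pred
final case class Conj(left: Pred, right: Pred) extends Pred

object DedupConjuncts {
  // Collect the operands of a nested conjunction, left to right.
  def flatten(p: Pred): List[Pred] = p match {
    case Conj(l, r) => flatten(l) ++ flatten(r)
    case other      => List(other)
  }

  // Drop repeated operands (first occurrence wins) and rebuild the conjunction.
  // flatten never returns an empty list, so reduceLeft is safe here.
  def simplify(p: Pred): Pred =
    flatten(p).distinct.reduceLeft[Pred](Conj(_, _))
}
```

The disjunctive case is symmetric: flatten over `||`, deduplicate, and rebuild with `||`.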

Why are the changes needed?

Minor improvement

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UTs

@github-actions github-actions bot added the SQL label Jan 25, 2021
@Swinky
Contributor Author

Swinky commented Feb 17, 2021

@cloud-fan, @maropu,
could you please take a look at this? Thanks!

@maropu
Member

maropu commented Feb 18, 2021

ok to test

@maropu
Member

maropu commented Feb 18, 2021

cc: @wangyum

@SparkQA

SparkQA commented Feb 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39790/

@SparkQA

SparkQA commented Feb 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39790/

@SparkQA

SparkQA commented Feb 18, 2021

Test build #135209 has finished for PR 31318 at commit 8b9ba7f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39856/

@SparkQA

SparkQA commented Feb 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39856/

@SparkQA

SparkQA commented Feb 19, 2021

Test build #135275 has finished for PR 31318 at commit 3b188a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

and
} else {
// (((a && b) && a && (a && c))) => a && b && c
distinct.reduce(And)
Contributor

One concern: if there are many conjunctive predicates, this may build a left-deep tree even when the original one is balanced. Can we add a util function in PredicateHelper to build a balanced And/Or tree?

cc @gengliangwang

Member

@cloud-fan yes that would be helpful, especially in filter pushdown of data sources.

Contributor Author

Thanks @cloud-fan, good observation. I have added the required util to address this.
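For readers following the left-deep versus balanced concern, the difference can be sketched on a toy expression tree. Everything below (`Expr`, `Leaf`, `And`, `Or`, `BalancedPredicate`, `buildBalanced`, `depth`) is hypothetical and is not the helper that was added to `PredicateHelper`; it only shows that splitting the operand list in half keeps the tree depth logarithmic, while a plain left fold grows it linearly.

```scala
// Toy expression tree; illustrative only, not Catalyst's And/Or.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object BalancedPredicate {
  // Combine operands by recursively splitting the sequence in half,
  // so the resulting tree has depth O(log n) instead of O(n).
  def buildBalanced(exprs: IndexedSeq[Expr], op: (Expr, Expr) => Expr): Expr = {
    require(exprs.nonEmpty, "need at least one expression")
    if (exprs.size == 1) {
      exprs.head
    } else {
      val (left, right) = exprs.splitAt(exprs.size / 2)
      op(buildBalanced(left, op), buildBalanced(right, op))
    }
  }

  // Number of nodes on the longest root-to-leaf path.
  def depth(e: Expr): Int = e match {
    case Leaf(_)   => 1
    case And(l, r) => 1 + math.max(depth(l), depth(r))
    case Or(l, r)  => 1 + math.max(depth(l), depth(r))
  }
}
```

With eight operands, a left fold such as `leaves.reduceLeft[Expr](And(_, _))` yields a chain of seven nested `And` nodes, whereas `buildBalanced` yields three levels of `And` above the leaves. A shallower tree matters for code that walks predicates recursively, such as the data source filter pushdown mentioned above.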

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40161/

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40161/

@SparkQA

SparkQA commented Mar 1, 2021

Test build #135580 has finished for PR 31318 at commit 7ce6731.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40187/

@SparkQA

SparkQA commented Mar 1, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40187/

@SparkQA

SparkQA commented Mar 1, 2021

Test build #135606 has finished for PR 31318 at commit 2c0d72c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@maropu maropu left a comment

Left some minor comments; otherwise it looks fine. Could you take another look? @cloud-fan @gengliangwang @c21

@SparkQA

SparkQA commented Mar 2, 2021

Test build #135624 has started for PR 31318 at commit 5988c8f.

@SparkQA

SparkQA commented Mar 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40204/

@SparkQA

SparkQA commented Mar 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40204/

Member

@gengliangwang gengliangwang left a comment

Thanks for the work, @Swinky !

Contributor

@c21 c21 left a comment

LGTM with only one minor comment.

@SparkQA

SparkQA commented Mar 2, 2021

Test build #135669 has started for PR 31318 at commit 699c1f4.

@SparkQA

SparkQA commented Mar 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40251/

@SparkQA

SparkQA commented Mar 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40251/

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 229d2e0 Mar 3, 2021
or
} else {
// (a || b) || a || (a || c) => a || b || c
buildBalancedPredicate(distinct.toSeq, Or)
Member

Is it okay to call distinct.toSeq here? The current implementation of ExpressionSet keeps the original order of expressions, but I'm not sure that its design guarantees the order.

Contributor

@maropu - ExpressionSet implements Iterable.iterator() method, so .toSeq will return the expressions in original order, right? Do you think we need to add more documentation on ExpressionSet to harden this assumption?

Member

> Do you think we need to add more documentation on ExpressionSet to harden this assumption?

Yea, documenting it explicitly looks better to me.
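As a point of comparison for the ordering question, the Scala standard library offers `mutable.LinkedHashSet`, a set type whose iteration order is documented to follow insertion order. The sketch below uses it for order-preserving deduplication. The names `OrderedDistinct` and `distinctInOrder` are hypothetical, and this says nothing about how `ExpressionSet` is implemented; it only shows what relying on an explicitly ordered collection looks like.

```scala
import scala.collection.mutable

object OrderedDistinct {
  // Deduplicate while keeping the first occurrence of each element,
  // relying on LinkedHashSet's insertion-ordered iteration.
  def distinctInOrder[A](xs: Seq[A]): Seq[A] = {
    val seen = mutable.LinkedHashSet.empty[A]
    xs.foreach(x => seen += x)
    seen.toSeq
  }
}
```

When the order is only an accident of the current implementation, documenting it (as suggested above) or converting through a collection with a stated ordering guarantee avoids surprises if the internals change.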

7 participants