[SPARK-33272][SQL] prune the attributes mapping in QueryPlan.transformUpWithNewOutput #30173

Closed
wants to merge 1 commit into from


cloud-fan
Contributor

What changes were proposed in this pull request?

For complex query plans, QueryPlan.transformUpWithNewOutput keeps accumulating the attribute mapping that it propagates to parent nodes, which may hurt performance. This PR prunes the attribute mapping before propagating it.
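
To illustrate the idea, here is a minimal sketch on a toy plan tree. It is not Spark's implementation: `Attr`, `Leaf`, `Project`, `PruneSketch`, and the `pruneMapping` flag are hypothetical names, and the pruning criterion used here (keep an entry only if its new attribute is still in the node's output) is an assumption made for the example.

```scala
// Toy model for illustration only; these are not Spark's classes.
final case class Attr(name: String, id: Int)

sealed trait Plan { def output: Seq[Attr] }
final case class Leaf(output: Seq[Attr]) extends Plan
// A Project exposes only `output`; child attributes not listed there are hidden from parents.
final case class Project(output: Seq[Attr], child: Plan) extends Plan

object PruneSketch {
  type Mapping = Seq[(Attr, Attr)] // (old attribute, new attribute)

  // Assumed criterion: keep only entries whose new attribute is visible in the node's output.
  def prune(mapping: Mapping, plan: Plan): Mapping = {
    val visible = plan.output.toSet
    mapping.filter { case (_, newAttr) => visible.contains(newAttr) }
  }

  // Bottom-up rewrite. Every Leaf gets fresh ids (a stand-in for a real rule),
  // and each node returns the mapping that its parent will receive.
  def rewrite(plan: Plan, pruneMapping: Boolean): (Plan, Mapping) = plan match {
    case Leaf(out) =>
      val fresh = out.map(a => a.copy(id = a.id + 1000))
      (Leaf(fresh), out.zip(fresh))
    case Project(out, child) =>
      val (newChild, childMapping) = rewrite(child, pruneMapping)
      val lookup = childMapping.toMap
      // Rewrite this node's own attributes using the mapping from below.
      val newPlan = Project(out.map(a => lookup.getOrElse(a, a)), newChild)
      val propagated = if (pruneMapping) prune(childMapping, newPlan) else childMapping
      (newPlan, propagated)
  }
}
```

For `Project(Seq(a), Leaf(Seq(a, b, c)))`, calling `PruneSketch.rewrite` with `pruneMapping = false` propagates three mapping entries past the `Project`, while `pruneMapping = true` propagates only the entry for `a`. In a deep plan the unpruned mapping keeps growing as children are merged, which is the cost the description refers to.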

Why are the changes needed?

A simple perf improvement.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

@cloud-fan
Contributor Author

cc @maropu @viirya

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34978/

@SparkQA

SparkQA commented Oct 28, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34978/

Comment on lines +183 to +185
* @param canGetOutput a boolean condition to indicate if we can get the output of a plan node
* to prune the attributes mapping to be propagated. The default value is true
* as only unresolved logical plan can't get output.
Member


nit: Does this need to be parameterized? It seems we only need to check for an unresolved logical plan.

Contributor Author

@cloud-fan cloud-fan Oct 28, 2020


Do you mean something like

val canGetOutput = plan match {
  case l: LogicalPlan => l.resolved
  case _ => true
}

Contributor Author


One advantage of the current approach is that optimizer rules can skip the check entirely by using the default value (true).
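
The trade-off discussed in this thread can be sketched with a defaulted parameter. This is a hypothetical toy, not the signature from the PR: `Node`, `CanGetOutputSketch`, and `propagate` are invented names, attributes are plain strings, and modeling `canGetOutput` as a predicate on the node is an assumption.

```scala
// Hypothetical toy types; not Spark's API.
final case class Node(name: String, resolved: Boolean, output: Seq[String])

object CanGetOutputSketch {
  // Callers that know every node can report its output rely on the default
  // and never evaluate a resolution check.
  def propagate(
      node: Node,
      mapping: Seq[(String, String)],
      canGetOutput: Node => Boolean = _ => true): Seq[(String, String)] = {
    if (canGetOutput(node)) {
      // Output is available, so drop entries whose new attribute is not visible.
      val visible = node.output.toSet
      mapping.filter { case (_, newAttr) => visible.contains(newAttr) }
    } else {
      // Output is unavailable (for example, an unresolved node): propagate everything.
      mapping
    }
  }
}
```

In this sketch, a caller that only sees resolved plans would write `propagate(node, mapping)`, while a caller that may see unresolved plans would pass `_.resolved` explicitly. Hard-coding the pattern match inside the method would instead run the check for every caller.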

Member


Oh, I see. Makes sense.

@SparkQA

SparkQA commented Oct 28, 2020

Test build #130375 has finished for PR 30173 at commit 4f94664.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@maropu maropu left a comment


Good improvement! LGTM

@maropu
Member

maropu commented Oct 28, 2020

Thanks! Merged to master.

@maropu maropu closed this in 2639ad4 Oct 28, 2020
@gaoyajun02
Contributor

Hi @cloud-fan, could you help open backport PRs for 3.0.2? Thanks!

I found that pruning the attribute mapping is not only a performance optimization; it also works around a bug in attribute rewriting for some complex queries, e.g. https://issues.apache.org/jira/browse/SPARK-36815

@cloud-fan
Contributor Author

@gaoyajun02 can you cherry-pick this commit and open the backport PR? I'm quite busy this week and I can do the backport next week if you are not able to do it.

gaoyajun02 pushed a commit to gaoyajun02/spark that referenced this pull request Sep 22, 2021
…mUpWithNewOutput

Closes apache#30173 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

(cherry picked from commit 2639ad4)
viirya pushed a commit that referenced this pull request Sep 22, 2021
…QueryPlan.transformUpWithNewOutput

### What changes were proposed in this pull request?

This is a backport PR of #30173.

Closes #34068 from gaoyajun02/SPARK-33272.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>