[SPARK-9563][SQL] Remove repartition operators when they are the child of Exchange and shuffle=True #7959
Test build #39858 has finished for PR 7959 at commit
For completeness, we also need a test case to ensure that this pruning does not apply if shuffle=False.
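A test along the lines requested could check both directions: a shuffling repartition under an Exchange is pruned, and a coalesce-style one (shuffle = false) is left alone. The sketch below is only an illustration under assumptions. `Plan`, `Scan`, `Repartition`, `Exchange` and `prune` are hypothetical stand-ins, not Spark's actual operator classes or the rule in this PR.

```scala
// Hypothetical, simplified plan nodes (NOT Spark's real physical operators).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan
case class Exchange(partitioning: String, child: Plan) extends Plan

// Simplified version of the pruning under test: strip a shuffling
// Repartition that is the direct child of an Exchange.
def prune(plan: Plan): Plan = plan match {
  case Exchange(p, Repartition(_, true, grandChild)) => prune(Exchange(p, grandChild))
  case Exchange(p, child)                            => Exchange(p, prune(child))
  case Repartition(n, s, child)                      => Repartition(n, s, prune(child))
  case leaf: Scan                                    => leaf
}

// Positive case: a Repartition with shuffle = true is removed.
assert(prune(Exchange("hash(key)", Repartition(10, shuffle = true, Scan("t")))) ==
  Exchange("hash(key)", Scan("t")))

// Negative case: with shuffle = false the Repartition must survive.
val coalesced = Exchange("hash(key)", Repartition(10, shuffle = false, Scan("t")))
assert(prune(coalesced) == coalesced)
println("ok")
```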
One other thought: a related optimization might be to collapse adjacent repartition calls. E.g. if I call […] Similar optimizations may apply to […]
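One way to picture collapsing adjacent repartitions: when the outer operator performs a full shuffle, every row is redistributed again, so the inner operator's partitioning never reaches the output and the inner operator can be dropped. This is a sketch under assumptions. The classes and `collapseRepartitions` are hypothetical, and the rule is deliberately limited to an outer shuffle = true so that a coalesce on top is never rewritten.

```scala
// Hypothetical, simplified plan nodes for illustration only (not Spark's classes).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan

// Collapse a Repartition whose child is another Repartition, but only when the
// outer one shuffles: that shuffle redistributes every row, so the inner
// operator's layout is discarded anyway.
def collapseRepartitions(plan: Plan): Plan = plan match {
  case Repartition(n, true, Repartition(_, _, grandChild)) =>
    collapseRepartitions(Repartition(n, true, grandChild))
  case Repartition(n, s, child) => Repartition(n, s, collapseRepartitions(child))
  case leaf: Scan               => leaf
}

val chained = Repartition(5, shuffle = true, Repartition(20, shuffle = true, Scan("t")))
println(collapseRepartitions(chained)) // Repartition(5,true,Scan(t))

// An outer coalesce (shuffle = false) is left untouched.
val outerCoalesce = Repartition(5, shuffle = false, Repartition(20, shuffle = true, Scan("t")))
println(collapseRepartitions(outerCoalesce) == outerCoalesce) // true
```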
To optimise […]
Test build #40000 has finished for PR 7959 at commit
retest this please.
Test build #248 has finished for PR 7959 at commit
Test build #40022 has finished for PR 7959 at commit
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
Test build #40285 has finished for PR 7959 at commit
How can this case ever occur, given that Exchange operators are currently added all at once in a single pass? Are you planning ahead in case we implement Repartition using Exchange, as in #8030?
Hmm, logical.RepartitionByExpression is also planned as an execution.Exchange, so I was wondering whether an Exchange could appear here.
Using Exchange to implement Repartition would be another such case.
/cc @yhuai, do you have any thoughts on this PR? The basic change here seems fine to me.
test this please.
retest this please.
Test build #40429 has finished for PR 7959 at commit
@JoshRosen Hi, this patch has been open for quite a while. Is it OK to merge it?
@yhuai, any thoughts? I think the high-level approach makes sense.
JIRA: https://issues.apache.org/jira/browse/SPARK-9563
We should remove repartition operators when they are the child of Exchange and shuffle=True.
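The idea can be sketched on a toy plan tree. As we understand it, an Exchange redistributes every row into its own partitioning, so a shuffling repartition directly beneath it only adds a second, wasted shuffle. The names below (`Plan`, `Scan`, `Repartition`, `Exchange`, `pruneRepartitionUnderExchange`) are hypothetical and do not reflect Spark's actual classes or the code in this PR.

```scala
// Hypothetical, simplified plan nodes (not Spark's actual operator classes).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Repartition(numPartitions: Int, shuffle: Boolean, child: Plan) extends Plan
case class Exchange(partitioning: String, child: Plan) extends Plan

def pruneRepartitionUnderExchange(plan: Plan): Plan = plan match {
  // A shuffling Repartition directly below an Exchange is redundant. Re-checking
  // the rebuilt Exchange also removes a chain of such Repartitions.
  case Exchange(p, Repartition(_, true, grandChild)) =>
    pruneRepartitionUnderExchange(Exchange(p, grandChild))
  case Exchange(p, child)       => Exchange(p, pruneRepartitionUnderExchange(child))
  case Repartition(n, s, child) => Repartition(n, s, pruneRepartitionUnderExchange(child))
  case leaf: Scan               => leaf
}

val plan = Exchange("hash(key)", Repartition(10, shuffle = true, Scan("t")))
println(pruneRepartitionUnderExchange(plan)) // Exchange(hash(key),Scan(t))

// Repartitions with shuffle = false are kept, per the rule's shuffle=True condition.
val coalesced = Exchange("hash(key)", Repartition(10, shuffle = false, Scan("t")))
println(pruneRepartitionUnderExchange(coalesced) == coalesced) // true
```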