-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-39911][SQL][3.3] Optimize global Sort to RepartitionByExpression #37373
Conversation
### What changes were proposed in this pull request? Optimize Global sort to RepartitionByExpression, for example: ``` Sort local Sort local Sort global => RepartitionByExpression ``` ### Why are the changes needed? If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort. ### Does this PR introduce _any_ user-facing change? no, only improve performance ### How was this patch tested? add test Closes apache#37330 from ulysses-you/optimize-sort. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cc @cloud-fan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you describe some context, please? The current PR description is only describing improvements which is not qualified for legitimate backporting. I believe you have a reason to fix some bugs or regression, @ulysses-you and @cloud-fan .
no, only improve performance
@dongjoon-hyun yes, the story is we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment) |
Than you, @ulysses-you . Please put the explanation into the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @huaxingao , too.
@dongjoon-hyun thank you, I have updated the description |
GA has passed: https://github.com/ulysses-you/spark/runs/7624731865 |
merging to 3.3! |
this is for backport #37330 into branch-3.3 ### What changes were proposed in this pull request? Optimize Global sort to RepartitionByExpression, for example: ``` Sort local Sort local Sort global => RepartitionByExpression ``` ### Why are the changes needed? If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort. ### Does this PR introduce _any_ user-facing change? we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment) ### How was this patch tested? add test Closes #37330 from ulysses-you/optimize-sort. Authored-by: ulysses-you <ulyssesyou18gmail.com> Signed-off-by: Wenchen Fan <wenchendatabricks.com> Closes #37373 from ulysses-you/SPARK-39911-3.3. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thank you all. |
this is for backport #37330 into branch-3.3
What changes were proposed in this pull request?
Optimize Global sort to RepartitionByExpression, for example:
Why are the changes needed?
If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort.
Does this PR introduce any user-facing change?
we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment)
How was this patch tested?
add test
Closes #37330 from ulysses-you/optimize-sort.
Authored-by: ulysses-you ulyssesyou18@gmail.com
Signed-off-by: Wenchen Fan wenchen@databricks.com