Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39911][SQL][3.3] Optimize global Sort to RepartitionByExpression #37373

Closed
wants to merge 1 commit into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Aug 2, 2022

this is for backport #37330 into branch-3.3

What changes were proposed in this pull request?

Optimize Global sort to RepartitionByExpression, for example:

Sort local             Sort local
  Sort global    =>      RepartitionByExpression

Why are the changes needed?

If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort.

Does this PR introduce any user-facing change?

we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment)

How was this patch tested?

add test

Closes #37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you ulyssesyou18@gmail.com
Signed-off-by: Wenchen Fan wenchen@databricks.com

### What changes were proposed in this pull request?

Optimize Global sort to RepartitionByExpression, for example:
```
Sort local             Sort local
  Sort global    =>      RepartitionByExpression
```

### Why are the changes needed?

If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort.

### Does this PR introduce _any_ user-facing change?

no, only improve performance

### How was this patch tested?

add test

Closes apache#37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@github-actions github-actions bot added the SQL label Aug 2, 2022
@ulysses-you
Copy link
Contributor Author

cc @cloud-fan

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you describe some context, please? The current PR description is only describing improvements which is not qualified for legitimate backporting. I believe you have a reason to fix some bugs or regression, @ulysses-you and @cloud-fan .

no, only improve performance

@ulysses-you
Copy link
Contributor Author

@dongjoon-hyun yes, the story is we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment)

@dongjoon-hyun
Copy link
Member

Than you, @ulysses-you . Please put the explanation into the PR description.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @huaxingao , too.

@ulysses-you
Copy link
Contributor Author

@dongjoon-hyun thank you, I have updated the description

@cloud-fan
Copy link
Contributor

@cloud-fan
Copy link
Contributor

merging to 3.3!

cloud-fan pushed a commit that referenced this pull request Aug 3, 2022
this is for backport #37330 into branch-3.3
### What changes were proposed in this pull request?

Optimize Global sort to RepartitionByExpression, for example:
```
Sort local             Sort local
  Sort global    =>      RepartitionByExpression
```

### Why are the changes needed?

If a global sort below a local sort, the only meaningful thing is it's distribution. So this pr optimizes that global sort to RepartitionByExpression to save a local sort.

### Does this PR introduce _any_ user-facing change?

we fix a bug in #37250 and that pr backport into branch-3.3. However, that fix may introduce performance regression. This pr itself is only to improve performance but in order to avoid the regression, we also backport this pr. see the details #37330 (comment)

### How was this patch tested?

add test

Closes #37330 from ulysses-you/optimize-sort.

Authored-by: ulysses-you <ulyssesyou18gmail.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>

Closes #37373 from ulysses-you/SPARK-39911-3.3.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this Aug 3, 2022
@ulysses-you ulysses-you deleted the SPARK-39911-3.3 branch August 3, 2022 03:14
@dongjoon-hyun
Copy link
Member

Thank you all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants