[SPARK-39325][CORE] Improve MapOutputTracker convertMapStatuses performance #36709
Can one of the admins verify this patch?
I will take a look at this later this week.
+CC @Ngone51 as well, since you had reviewed the original change.
Ngone51
left a comment
Nice improvement!
Thank you all. Merged to master.
Thank you, @wankunde, @mridulm, @Ngone51, @wangyum.
Just a question, was the slow performance a regression at Apache Spark 3.2.0 due to SPARK-32921 (and the umbrella issue SPARK-30602)?
@dongjoon-hyun I don't think this is a regression since all these changes are for push-based shuffles.
What changes were proposed in this pull request?
Optimize the MapOutputTracker.convertMapStatuses() method.
Why are the changes needed?
MapOutputTracker.convertMapStatuses() will be very slow if there are tens of thousands of MapStatuses and MergeStatuses.
Benchmark code:
Before this PR
After this PR
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing UTs.