[SPARK-35596][CORE] HighlyCompressedMapStatus should record accurately the size of skewed shuffle blocks #32733
Conversation
Can one of the admins verify this patch?
dongjoon-hyun left a comment
Thank you for making a PR, @exmy.
| "when fetch shuffle blocks.") | ||
| .version("3.2.0") | ||
| .bytesConf(ByteUnit.BYTE) | ||
| .createWithDefault(350 * 1024) |
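For context, the full config entry presumably looks roughly like the sketch below. It assumes Spark's internal ConfigBuilder API (which lives in Spark's config package object, so this would not compile as a standalone file); the constant name and the doc wording beyond the excerpt above are assumptions, not the PR's actual text.

```scala
import org.apache.spark.internal.config.ConfigBuilder
import org.apache.spark.network.util.ByteUnit

// Sketch only: reconstructed around the excerpt above; name and doc text are assumptions.
private[spark] val SHUFFLE_ACCURATE_SKEWED_BLOCK_THRESHOLD =
  ConfigBuilder("spark.shuffle.accurateSkewedBlockThreshold")
    .doc("Threshold in bytes above which a skewed shuffle block is recorded accurately " +
      "in HighlyCompressedMapStatus, to avoid underestimating its size " +
      "when fetch shuffle blocks.")
    .version("3.2.0")
    .bytesConf(ByteUnit.BYTE)
    .createWithDefault(350 * 1024)
```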
Could you describe the background of this value?
Thanks for the review. This is described in the reply to @mridulm below; I hope to get your opinion.
I am missing something here, if a block is <
  // Huge blocks are not included in the calculation for average size, thus size for smaller
  // blocks is more accurate.
- if (size < threshold) {
+ if ((size >= 5 * overallNonEmptyAvgSize && size >= minThreshold) || size >= threshold) {
Echoing @dongjoon-hyun's comment above - what is the background of this change?
We first compute an average size over the non-empty uncompressedSizes. If a block is > N * this average size, it is marked as a huge block. To avoid mistakenly marking a block that is > N * the average but not actually big enough, the new config spark.shuffle.accurateSkewedBlockThreshold is introduced: only a block that is both > N * the average size and > accurateSkewedBlockThreshold is marked as a huge block.
The reason accurateSkewedBlockThreshold defaults to 350K is that we assume 3000 partitions; only when the amount of data fetched by a reduce task is greater than 3000 * 350K ≈ 1G does this situation need to be considered.
I'm not sure whether N = 5 and accurateSkewedBlockThreshold = 350K are appropriate here, and I really hope to get your opinion.
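To make the rule above concrete, here is a minimal sketch of the proposed check with N = 5; the function and parameter names are illustrative and this is not the exact code from the PR:

```scala
// Minimal sketch of the huge-block rule described above; names are illustrative.
// skewedThreshold   = spark.shuffle.accurateSkewedBlockThreshold (350 KiB default in this PR)
// accurateThreshold = spark.shuffle.accurateBlockThreshold (the pre-existing config)
def isHugeBlock(size: Long,
                nonEmptyAvgSize: Long,
                skewedThreshold: Long,
                accurateThreshold: Long): Boolean = {
  // A block is recorded accurately either because it already exceeds accurateBlockThreshold,
  // or because it is both >= 5x the average non-empty block and >= the new skewed threshold.
  (size >= 5 * nonEmptyAvgSize && size >= skewedThreshold) || size >= accurateThreshold
}
```

For example, with the defaults a 400 KiB block among blocks averaging 50 KiB satisfies both 400 KiB >= 5 * 50 KiB and 400 KiB >= 350 KiB, so its size would be recorded individually instead of being reported as the ~50 KiB average.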
Thanks for the review. In our production,
The size estimation helps make a judgement about how many concurrent fetches to make, and whether that many concurrent in-flight requests can be handled in parallel - it would not result in all of them being fetched concurrently.
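A simplified sketch of that gating, to illustrate why under-estimated block sizes matter; the code below is illustrative and not Spark's actual ShuffleBlockFetcherIterator logic:

```scala
// Illustrative only: fetch requests are admitted while the *estimated* bytes and the
// request count in flight stay under the configured limits. If a skewed block's size is
// reported as the (much smaller) average, more real data arrives than the limits intend.
def planFetches(estimatedSizes: Seq[Long],
                maxBytesInFlight: Long,   // cf. spark.reducer.maxSizeInFlight
                maxReqsInFlight: Int      // cf. spark.reducer.maxReqsInFlight
               ): Seq[Long] = {
  val admitted = scala.collection.mutable.ArrayBuffer.empty[Long]
  var bytesInFlight = 0L
  for (size <- estimatedSizes) {
    if (admitted.size < maxReqsInFlight && bytesInFlight + size <= maxBytesInFlight) {
      admitted += size
      bytesInFlight += size
    }
  }
  admitted.toSeq
}
```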
@mridulm Tuning spark.reducer.maxSizeInFlight and spark.reducer.maxReqsInFlight is useful for one job, but there are thousands of jobs in a cluster and it's difficult to set these two configs for every job. Maybe #32287 can solve this problem, but recording the size of blocks more precisely may still be helpful.
These parameters are not for individual stages or jobs - they model behavior based on how many resources are available and what cost is acceptable for an application (memory, number of concurrent IOs, etc.). I would suggest looking more into tuning these for the specifics of the resources available.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
HighlyCompressedMapStatus now supports accurately recording the size of skewed shuffle blocks that are smaller than spark.shuffle.accurateBlockThreshold.
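For context, a simplified model of how HighlyCompressedMapStatus reports block sizes (illustrative, not the actual class): blocks tracked as "huge" keep an individually recorded size, everything else falls back to the average, and this PR widens what counts as huge so that moderately skewed blocks are no longer reported as the average.

```scala
// Simplified model, not the real org.apache.spark.scheduler.HighlyCompressedMapStatus:
// blocks present in hugeBlockSizes are reported individually; all others get avgSize.
class SimplifiedHighlyCompressedStatus(avgSize: Long, hugeBlockSizes: Map[Int, Long]) {
  def getSizeForBlock(reduceId: Int): Long =
    hugeBlockSizes.getOrElse(reduceId, avgSize)
}
```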
Why are the changes needed?
HighlyCompressedMapStatus currently cannot accurately record the size of shuffle blocks that are much greater than the other blocks but still smaller than spark.shuffle.accurateBlockThreshold, which is likely to lead to OOM when fetching shuffle blocks. We have to tune extra properties like spark.reducer.maxReqsInFlight to prevent it, so it is better to fix it in HighlyCompressedMapStatus.
Does this PR introduce any user-facing change?
Yes, a new config spark.shuffle.accurateSkewedBlockThreshold is added.
How was this patch tested?
Added a new unit test.