[WIP][SPARK-38965][SHUFFLE] Optimize RemoteBlockPushResolver with a memory pool #36279
wankunde wants to merge 7 commits into apache:master from
Conversation
Good catch!
Did you see an issue that you are trying to address here?
Thanks @otterc for your review.
Could we handle
Currently, RemoteBlockPushResolver does not limit the memory used to receive pushed blocks, so an OOM may occur in the NodeManager.
retest this please
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
- Buffer pushed blocks in a memory pool in RemoteBlockPushResolver before writing them to the data file, reducing the RemoteBlockPushResolver.BLOCK_APPEND_COLLISION_DETECTED failure that occurs because only one pushed map's data can write to the data file at the same time.
- Change BlockPushNonFatalFailure to BlockPushResponse, so the returnCode can be SUCCESS. We can also encode and decode it to a ByteBuffer, so the network RPC does not need to rely on BlockTransferMessage (core module). And remove the PUSH_BLOCK_RETURN_CODE type from BlockTransferMessage.

Why are the changes needed?
For the push-based shuffle service, there are many BLOCK_APPEND_COLLISION_DETECTED failures when there are many small map task outputs. In RemoteBlockPushResolver, if one map task's pushed blocks are being written, the pushed blocks of the other map tasks will fail in the onComplete() method. In addition, RemoteBlockPushResolver has no memory limit, so many executors will OOM when there are many small pushed blocks waiting to be written to the final data file.

Does this PR introduce any user-facing change?
No
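As a rough illustration of the memory-limit idea discussed above, the sketch below shows one way a byte-counting pool could cap the memory used for buffered pushed blocks. The class and method names (PushBlockMemoryPool, tryReserve, release) are hypothetical and do not come from this PR; it is a minimal sketch, not the PR's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a lock-free byte-counting pool that bounds how much
// memory may be used to buffer pushed blocks before they are written out.
public class PushBlockMemoryPool {
  private final long maxBytes;
  private final AtomicLong usedBytes = new AtomicLong(0);

  public PushBlockMemoryPool(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  // Try to reserve `size` bytes; returns false when the pool is exhausted,
  // in which case the caller would fall back (e.g. reject or write through).
  public boolean tryReserve(long size) {
    while (true) {
      long used = usedBytes.get();
      if (used + size > maxBytes) {
        return false;
      }
      if (usedBytes.compareAndSet(used, used + size)) {
        return true;
      }
    }
  }

  // Return previously reserved bytes to the pool once the block is flushed.
  public void release(long size) {
    usedBytes.addAndGet(-size);
  }

  public long used() {
    return usedBytes.get();
  }
}
```

With such a pool, a pushed block is buffered only when a reservation succeeds, so total buffered bytes can never exceed the configured limit.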
How was this patch tested?
Existing UTs.
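The encode/decode round trip mentioned in the proposed changes (a push-block response carried directly in a ByteBuffer, so the RPC layer does not depend on BlockTransferMessage) could look roughly like the following sketch. The single-int layout and the class name BlockPushResponseSketch are assumptions for illustration; the PR's actual wire format may differ.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: encode a push-block return code straight into a
// ByteBuffer so the network RPC layer needs no BlockTransferMessage type.
public class BlockPushResponseSketch {
  public static final int SUCCESS = 0;

  public final int returnCode;

  public BlockPushResponseSketch(int returnCode) {
    this.returnCode = returnCode;
  }

  // Serialize: in this sketch the payload is just the 4-byte return code.
  public ByteBuffer toByteBuffer() {
    ByteBuffer buf = ByteBuffer.allocate(4);
    buf.putInt(returnCode);
    buf.flip();
    return buf;
  }

  // Deserialize a buffer produced by toByteBuffer().
  public static BlockPushResponseSketch fromByteBuffer(ByteBuffer buf) {
    return new BlockPushResponseSketch(buf.getInt());
  }
}
```

Because the response always decodes to a returnCode, a successful push can be reported as SUCCESS through the same message rather than via an exception type.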