[SPARK-39072][SHUFFLE] Fast fail the remaining push blocks if shuffle … #36411
wankunde wants to merge 2 commits into apache:master from
Can one of the admins verify this patch?
+CC @otterc
@wankunde Can you please provide more details/logs of the problem that you are trying to solve? Specifically, can you provide some logs that exhibit the below
Hi @otterc, you are right: we do not need this PR, because ShuffleBlockPusher already uses the config maxBlocksInFlightPerAddress. However, I found that some ESS instances received the FinalizeShuffleMerge RPC only after a few seconds. I am not sure whether this is because many blocks were still in flight to those ESS instances.
Driver logs, ESS logs
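A per-address cap on in-flight blocks, like the one the maxBlocksInFlightPerAddress config controls, can be sketched roughly as below. This is a minimal, single-threaded Java illustration under stated assumptions, not ShuffleBlockPusher's actual logic: the class PerAddressInFlightLimiter and its methods are hypothetical, addresses and block IDs are plain strings, and each completed push (success or failure) is assumed to free exactly one slot for its address.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Spark's ShuffleBlockPusher): caps how many blocks may be
// in flight to each remote address, similar in spirit to maxBlocksInFlightPerAddress.
// Assumes single-threaded use; a real pusher would also need synchronization.
final class PerAddressInFlightLimiter {
    // One queued push request: the destination address and the block to push.
    private static final class PushRequest {
        final String address;
        final String blockId;

        PushRequest(String address, String blockId) {
            this.address = address;
            this.blockId = blockId;
        }
    }

    private final int maxBlocksInFlightPerAddress;
    private final Map<String, Integer> inFlight = new HashMap<>();
    private final Deque<PushRequest> pending = new ArrayDeque<>();

    PerAddressInFlightLimiter(int maxBlocksInFlightPerAddress) {
        if (maxBlocksInFlightPerAddress <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxBlocksInFlightPerAddress = maxBlocksInFlightPerAddress;
    }

    void enqueue(String address, String blockId) {
        pending.addLast(new PushRequest(address, blockId));
    }

    // Returns the block IDs that can be sent now without exceeding the per-address cap.
    // Requests for saturated addresses stay queued in their original order.
    List<String> drainSendable() {
        List<String> toSend = new ArrayList<>();
        Deque<PushRequest> deferred = new ArrayDeque<>();
        while (!pending.isEmpty()) {
            PushRequest req = pending.pollFirst();
            int count = inFlight.getOrDefault(req.address, 0);
            if (count < maxBlocksInFlightPerAddress) {
                inFlight.put(req.address, count + 1);
                toSend.add(req.blockId);
            } else {
                deferred.addLast(req);
            }
        }
        pending.addAll(deferred);
        return toSend;
    }

    // Frees one slot for the address when a push completes (success or failure).
    void onPushComplete(String address) {
        inFlight.computeIfPresent(address,
                (addr, count) -> count > 1 ? Integer.valueOf(count - 1) : null);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With a cap of 2, a third block to the same ESS stays queued until one of the earlier pushes to that ESS completes, while blocks to other addresses are not held back. Note that this sketch only bounds in-flight blocks; it does not drop blocks that are still queued when the stage is finalized.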
Closing this PR.
…stage finalized
What changes were proposed in this pull request?
Limit the number of push blocks in flight, and try to stop pushing the remaining blocks once the shuffle stage is finalized.
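The "stop pushing the remaining blocks" part of the proposal might look roughly like the following. This is a hypothetical Java illustration, not the patch itself: FinalizationAwarePusher, BlockSender, and onStageFinalized are invented names, finalization is assumed to be signaled by setting a flag from another thread, and pushNextBatch is assumed to run on a single pushing thread. Once the flag is set, blocks not yet sent are failed immediately instead of being pushed, on the assumption that the ESS would not merge blocks arriving after finalization anyway.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of "fast fail the remaining push blocks once the shuffle stage
// is finalized". All names are illustrative; this is not the patch or Spark's API.
final class FinalizationAwarePusher {
    // Hypothetical transport hook that pushes one block to the external shuffle service.
    interface BlockSender {
        void send(String blockId);
    }

    private final Deque<String> remaining = new ArrayDeque<>();
    private final List<String> failedFast = new ArrayList<>();
    // Set from another thread when finalization is signaled; AtomicBoolean makes the
    // update visible to the pushing thread without locking the queue.
    private final AtomicBoolean stageFinalized = new AtomicBoolean(false);
    private final BlockSender sender;

    FinalizationAwarePusher(List<String> blockIds, BlockSender sender) {
        this.remaining.addAll(blockIds);
        this.sender = sender;
    }

    // Assumed callback invoked when the shuffle stage is finalized.
    void onStageFinalized() {
        stageFinalized.set(true);
    }

    // Pushes up to maxBlocks blocks and returns how many were sent. Must be called from a
    // single pushing thread. If the stage has been finalized, every block not yet sent is
    // failed immediately instead of being pushed.
    int pushNextBatch(int maxBlocks) {
        int sent = 0;
        while (sent < maxBlocks && !remaining.isEmpty()) {
            if (stageFinalized.get()) {
                failedFast.addAll(remaining);
                remaining.clear();
                break;
            }
            sender.send(remaining.pollFirst());
            sent++;
        }
        return sent;
    }

    List<String> failedFastBlocks() {
        return failedFast;
    }

    int remainingCount() {
        return remaining.size();
    }
}
```

Blocks that were already sent before the flag was set are left to complete or fail on their own; only the queued remainder is short-circuited. Reporting the fast-failed blocks to whatever tracks push failures is omitted here.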
Why are the changes needed?
Currently, a map task tries to push all of its map outputs to the external shuffle service.
After the shuffle stage is finalized, the reducers' fetch-block RPCs will be blocked if many map output blocks are still in flight.
We can stop pushing the remaining blocks once the shuffle stage is finalized.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.