[SPARK-37023][CORE] Avoid fetching merge status when shuffleMergeEnabled is false for a shuffleDependency during retry #34461
cc @mridulm @Ngone51 @venkata91 @zhouyejoe Please take a look. Thanks.
Had a minor comment.
The change looks good to me.
+CC @Ngone51
Ok to test
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144884 has finished for PR 34461 at commit
Since this looks like a bug, could you please add a unit test case, @rmcyang?
Added unit test. cc @mridulm @dongjoon-hyun @zhouyejoe
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144940 has finished for PR 34461 at commit
I'm thinking that this condition may be wrong:
spark/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
Lines 134 to 136 in 31b6f61

```scala
if (baseShuffleHandle.dependency.shuffleMergeEnabled) {
  val res = SparkEnv.get.mapOutputTracker.getPushBasedShuffleMapSizesByExecutorId(
    handle.shuffleId, startMapIndex, endMapIndex, startPartition, endPartition)
```
For a retried map stage, we don't push and merge shuffle blocks for tasks because `dep.shuffleMergeEnabled=false`. However, `dep.shuffleMergeEnabled=false` should not be the reason for the reduce stage to skip fetching the existing merged shuffle data from the previous attempt of the map stage. Right?
IIUC, for a retried map stage, it would take the code path below due to
spark/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
Lines 138 to 140 in 31b6f61
Prior to this PR, `getMapSizesByExecutorId` would call `getPushBasedShuffleMapSizesByExecutorId`, which fetches `mergeOutputStatuses` in `getStatuses` even for a retried map stage where push-based shuffle is disabled. This in turn sets `enableBatchFetch=false` in `convertMapStatuses`, resulting in the assertion failure.
The proposed change tries to avoid this behavior - when
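To make the failure mode above concrete, here is a heavily simplified, hypothetical model of the pre-fix path. None of these types are Spark's real classes: `MapSizesByExecutorId`, `ShuffleState`, and `PreFixTracker` are illustrative stand-ins, block IDs are plain strings, and only the method names mirror the ones discussed in this thread. It shows how merge statuses that are still on the driver flip `enableBatchFetch` to false and trip the assertion.

```scala
// Hypothetical, simplified model of the pre-fix code path; NOT Spark's real classes.
// Only the method names mirror the ones mentioned in this thread.

// Stand-in for the result returned by getPushBasedShuffleMapSizesByExecutorId.
final case class MapSizesByExecutorId(blocks: Seq[String], enableBatchFetch: Boolean)

// Stand-in for the driver-side state of one shuffle (block IDs are plain strings here).
final case class ShuffleState(mapOutputs: Seq[String], mergeStatuses: Seq[String])

object PreFixTracker {
  // Before the fix: merge statuses are returned whenever the driver has them,
  // regardless of whether the caller can use merged output.
  private def getStatuses(state: ShuffleState): (Seq[String], Seq[String]) =
    (state.mapOutputs, state.mergeStatuses)

  // Batch fetch is only allowed when no merged output is part of the result.
  private def convertMapStatuses(
      maps: Seq[String],
      merges: Seq[String]): MapSizesByExecutorId =
    MapSizesByExecutorId(maps ++ merges, enableBatchFetch = merges.isEmpty)

  def getPushBasedShuffleMapSizesByExecutorId(state: ShuffleState): MapSizesByExecutorId = {
    val (maps, merges) = getStatuses(state)
    convertMapStatuses(maps, merges)
  }

  // The non-push-based entry point delegates to the push-based one, then asserts.
  def getMapSizesByExecutorId(state: ShuffleState): Seq[String] = {
    val res = getPushBasedShuffleMapSizesByExecutorId(state)
    assert(res.enableBatchFetch == true) // fails whenever merge statuses were picked up
    res.blocks
  }
}
```

In this model, a retried stage whose earlier attempt left merge statuses on the driver, e.g. `ShuffleState(Seq("map-0"), Seq("merged-0"))`, makes `getMapSizesByExecutorId` throw an `AssertionError`, while a shuffle with no merge statuses passes.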
That's what we have today. But I'm thinking the current behavior may be wrong. (Your fix, based on the current behavior, is correct.) For example, suppose we have stage 1 and stage 2, and stage 2 hits a fetch failure that causes stage 1 to rerun.
@Ngone51 For the example you gave, namely: Here, we have two cases:
In all of these combinations, whether stage2/attemptN uses merged output must be based only on whether finalized merged output from the parent stage is available for it to use. IMO we should add a new variable, say
Thoughts @Victsm, @zhouyejoe, @otterc, @rmcyang, @venkata91?
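A minimal, hypothetical sketch of the distinction being drawn here: the reader's decision should depend on whether finalized merged output exists for the parent shuffle, not on whether the current map-stage attempt pushes. `ParentShuffleInfo`, `hasFinalizedMergeOutput`, and `MergedReadPolicy` are illustrative names only; they are not the variable proposed in this comment and not Spark APIs. The retry scenario assumes, as in the stage 1 / stage 2 example, that the rerun does not push but the earlier attempt's merged output is still finalized (e.g. a determinate parent stage).

```scala
// Hypothetical sketch of the reader-side decision discussed in this thread.
// `hasFinalizedMergeOutput` is an illustrative name, not the variable proposed
// in the thread and not a Spark API.

final case class ParentShuffleInfo(
    shuffleMergeEnabled: Boolean,     // does the current map-stage attempt push and merge?
    hasFinalizedMergeOutput: Boolean) // is finalized merged output from an earlier attempt available?

object MergedReadPolicy {
  // Current behavior: mirrors the SortShuffleManager condition quoted earlier.
  def currentRule(p: ParentShuffleInfo): Boolean = p.shuffleMergeEnabled

  // Proposed behavior: depend only on whether finalized merged output exists.
  def proposedRule(p: ParentShuffleInfo): Boolean = p.hasFinalizedMergeOutput
}

// Stage 1 rerun after a fetch failure in stage 2: the retry does not push, but
// merged output from stage 1's earlier attempt is assumed to still be finalized.
val stage1Retry = ParentShuffleInfo(shuffleMergeEnabled = false, hasFinalizedMergeOutput = true)
```

For `stage1Retry`, the two rules disagree: `currentRule` returns false, so stage 2 ignores the existing merged data, whereas `proposedRule` returns true.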
OK, this makes sense to me. The current behavior indeed seems an inefficient way to handle stage retry when the parent stage is determinate and has merged data, which should not be ignored by that condition. Given that SPARK-37023 is meant to fix the `enableBatchFetch` assertion failure in the current behavior, shall we file another Spark JIRA and implement the improvement there? @Ngone51 @mridulm @Victsm @zhouyejoe @otterc @venkata91
That sounds fine to me. We can fix the immediate assertion issue (which fixes the problem of stage retry causing application failure when push-based shuffle is enabled) and follow up with a fix to ensure determinate stages are allowed to read merged output on stage retry.
sgtm.
…led is false for a shuffleDependency during retry

### What changes were proposed in this pull request?
At a high level, this creates a helper method `getMapSizesByExecutorIdImpl` on which both `getMapSizesByExecutorId` and `getPushBasedShuffleMapSizesByExecutorId` rely. It takes a parameter `useMergeResult`, which indicates whether fetching the merge result is needed, and passes it as `canFetchMergeResult` into `getStatuses`.

### Why are the changes needed?
In some stage retry cases, `shuffleDependency.shuffleMergeEnabled` can be set to false while `mergeStatus` still exists, because the driver has already collected the merge status for its shuffle dependency. In that case, the current implementation sets `enableBatchFetch` to false (since merge statuses are present), which causes the assertion in `MapOutputTracker.getMapSizesByExecutorId` to fail:

```
assert(mapSizesByExecutorId.enableBatchFetch == true)
```

The proposed fix resolves the issue.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passed the existing UTs.

Closes #34461 from rmcyang/SPARK-37023.

Authored-by: Minchu Yang <minyang@minyang-mn3.linkedin.biz>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(cherry picked from commit f1532a2)
Signed-off-by: Mridul Muralidharan <mridulatgmail.com>
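A hypothetical, simplified sketch of the fix described in the commit message. Only the names `getMapSizesByExecutorIdImpl`, `useMergeResult`, `canFetchMergeResult`, `getStatuses`, `getMapSizesByExecutorId`, and `getPushBasedShuffleMapSizesByExecutorId` come from the PR text; `MapSizes`, `ShuffleStatuses`, `PostFixTracker`, and the string block IDs are assumptions for illustration, and the signatures are not Spark's. The key idea is that the non-push-based entry point passes `useMergeResult = false`, so merge statuses left over from an earlier attempt are never fetched and batch fetch stays enabled.

```scala
// Hypothetical, simplified sketch of the fix; NOT Spark's actual implementation.
// Only the method/parameter names come from the PR text; everything else is assumed.

final case class MapSizes(blocks: Seq[String], enableBatchFetch: Boolean)
final case class ShuffleStatuses(mapOutputs: Seq[String], mergeStatuses: Seq[String])

object PostFixTracker {
  // Merge statuses are only returned when the caller says it can use them.
  private def getStatuses(
      state: ShuffleStatuses,
      canFetchMergeResult: Boolean): (Seq[String], Seq[String]) = {
    val merges = if (canFetchMergeResult) state.mergeStatuses else Seq.empty[String]
    (state.mapOutputs, merges)
  }

  // Shared helper: useMergeResult decides whether merged output is considered at all.
  private def getMapSizesByExecutorIdImpl(
      state: ShuffleStatuses,
      useMergeResult: Boolean): MapSizes = {
    val (maps, merges) = getStatuses(state, canFetchMergeResult = useMergeResult)
    // Batch fetch stays enabled as long as no merged blocks are part of the result.
    MapSizes(maps ++ merges, enableBatchFetch = merges.isEmpty)
  }

  // Used when shuffleMergeEnabled is false (e.g. a retried stage): never fetches merge statuses.
  def getMapSizesByExecutorId(state: ShuffleStatuses): Seq[String] = {
    val res = getMapSizesByExecutorIdImpl(state, useMergeResult = false)
    assert(res.enableBatchFetch == true) // now holds even if the driver has merge statuses
    res.blocks
  }

  // Used when shuffleMergeEnabled is true.
  def getPushBasedShuffleMapSizesByExecutorId(state: ShuffleStatuses): MapSizes =
    getMapSizesByExecutorIdImpl(state, useMergeResult = true)
}
```

With `ShuffleStatuses(Seq("map-0", "map-1"), Seq("merged-0"))`, `getMapSizesByExecutorId` now returns only the map outputs instead of failing, while the push-based entry point still sees the merged block and disables batch fetch.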
Merged to master and branch-3.2
Filed SPARK-37313 for the issue that @Ngone51 raised. Thanks for the reviews! @mridulm @dongjoon-hyun @Ngone51