[#1608] feat(spark3): Ensure the compatibility of reassign and stageRetry #1783

Merged: 5 commits, Jun 19, 2024

@zuston zuston commented Jun 13, 2024

What changes were proposed in this pull request?

Ensure the compatibility of reassign and stageRetry.

Why are the changes needed?

To improve job stability when both reassign and stage retry are in use.
For #1608

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests

@@ -638,7 +638,7 @@ public ShuffleHandleInfo getShuffleHandleInfoByShuffleId(int shuffleId) {
   @Override
   public int getMaxFetchFailures() {
     final String TASK_MAX_FAILURE = "spark.task.maxFailures";
-    return Math.max(1, sparkConf.getInt(TASK_MAX_FAILURE, 4) - 1);
+    return Math.max(0, sparkConf.getInt(TASK_MAX_FAILURE, 4) - 1);
Member Author

@zuston zuston Jun 13, 2024

I think the original lower bound of 1 is wrong: if spark.task.maxFailures=1, it won't trigger the stage retry.
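
To make the effect of the changed lower bound concrete, here is a minimal standalone sketch that evaluates the two expressions from the diff above for a few values of spark.task.maxFailures. It is not the project's code: the class name MaxFetchFailuresDemo and the helpers oldBound and newBound are hypothetical, and the sketch only shows the arithmetic, not how the result is consumed by the stage retry logic.

```java
class MaxFetchFailuresDemo {
    // Expression before this PR: the result is never below 1.
    static int oldBound(int taskMaxFailures) {
        return Math.max(1, taskMaxFailures - 1);
    }

    // Expression after this PR: the result may be 0.
    static int newBound(int taskMaxFailures) {
        return Math.max(0, taskMaxFailures - 1);
    }

    public static void main(String[] args) {
        // 4 is the default used in the diff when the setting is absent.
        for (int maxFailures : new int[] {1, 2, 4}) {
            System.out.println(
                "spark.task.maxFailures=" + maxFailures
                    + " old=" + oldBound(maxFailures)
                    + " new=" + newBound(maxFailures));
        }
    }
}
```

Of the values tried, only spark.task.maxFailures=1 gives different results (1 before, 0 after); for 2 and for the default of 4, both expressions agree.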

Contributor

Could you add this into the comment?

Member Author

After rethinking this part of the code, I think it needs to be optimized in follow-up PRs. The detailed problems are described in #1798 and #1801, so the comment may not be necessary.

Contributor

OK.

github-actions bot commented Jun 13, 2024

Test Results

2 441 files +7 | 2 441 suites +7 | 5h 2m 59s ⏱️ +2m 5s
938 tests +1 | 937 ✅ +1 | 1 💤 ±0 | 0 ❌ ±0
10 864 runs +7 | 10 850 ✅ +7 | 14 💤 ±0 | 0 ❌ ±0

Results for commit 9ebdc4e. ± Comparison against base commit 3d0c91a.

♻️ This comment has been updated with latest results.

@zuston zuston requested a review from jerqi June 14, 2024 02:33
Member Author

zuston commented Jun 17, 2024

ping @jerqi

@zuston zuston requested a review from rickyma June 19, 2024 01:57
Member Author

zuston commented Jun 19, 2024

cc @rickyma

Contributor

@rickyma rickyma left a comment

Overall LGTM.

@zuston zuston merged commit 38dcff1 into apache:master Jun 19, 2024
41 checks passed