[WIP][SPARK-17610][CORE][SCHEDULER]The failed stage caused by FetchFailed may never be resubmitted #15176

Closed
wants to merge 1 commit into from

Conversation

WangTaoTheTonic
Contributor

What changes were proposed in this pull request?

When a task fails with FetchFailed, the steps that handle the failure run in an improper order, which may cause the corresponding stage to never be resubmitted. For details, see: https://issues.apache.org/jira/browse/SPARK-17610

How was this patch tested?

Manual tests.

@SparkQA

SparkQA commented Sep 21, 2016

Test build #65707 has finished for PR 15176 at commit 910fd71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@WangTaoTheTonic WangTaoTheTonic changed the title [SPARK-17610][CORE][SCHEDULER]The failed stage caused by FetchFailed may never be resubmitted [WIP][SPARK-17610][CORE][SCHEDULER]The failed stage caused by FetchFailed may never be resubmitted Sep 21, 2016
@WangTaoTheTonic
Contributor Author

After a second look, I found that the current behavior will not cause problems, because dagScheduler.handleTaskCompletion is triggered by a CompletionEvent. The DAGScheduler handles events one at a time, so the ResubmitFailedStages event posted inside dagScheduler.handleTaskCompletion will not be handled until handleTaskCompletion has returned.
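
To illustrate the reasoning, here is a minimal sketch of a single-threaded event loop. It is a toy model, not Spark's actual DAGScheduler: `ToyEventLoop`, `Demo`, the `log` buffer, and the log messages are hypothetical names made up for the example, and only the event names are borrowed from the discussion. It shows that an event posted from inside a handler is only enqueued, so whatever the handler does after the post still happens before the posted event is processed.

```scala
import scala.collection.mutable

// Hypothetical event types, named after the events mentioned in this thread.
sealed trait SchedulerEvent
case object CompletionEvent extends SchedulerEvent
case object ResubmitFailedStages extends SchedulerEvent

// A toy single-threaded event loop; an assumption for illustration only.
class ToyEventLoop {
  private val queue = mutable.Queue.empty[SchedulerEvent]
  val log = mutable.ArrayBuffer.empty[String]

  // Posting only enqueues the event; it never runs a handler directly.
  def post(event: SchedulerEvent): Unit = queue.enqueue(event)

  // Drain the queue, handling exactly one event at a time.
  def run(): Unit = {
    while (queue.nonEmpty) {
      queue.dequeue() match {
        case CompletionEvent      => handleTaskCompletion()
        case ResubmitFailedStages => log += "resubmit failed stages"
      }
    }
  }

  private def handleTaskCompletion(): Unit = {
    log += "handleTaskCompletion: start"
    post(ResubmitFailedStages) // enqueued, not handled yet
    // Bookkeeping done after the post still runs before the resubmit.
    log += "handleTaskCompletion: record failed stage"
    log += "handleTaskCompletion: end"
  }
}

object Demo {
  // Posts a single CompletionEvent and returns the order of handling.
  def runDemo(): List[String] = {
    val loop = new ToyEventLoop
    loop.post(CompletionEvent)
    loop.run()
    loop.log.toList
  }
}
```

In this sketch, `Demo.runDemo()` returns a list in which the "resubmit failed stages" entry comes last, after every step of `handleTaskCompletion`, regardless of where the `post` call sits inside the handler.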

@WangTaoTheTonic
Contributor Author

I will close this.
