[SPARK-20205][core] Make sure StageInfo is updated before sending event. #17925
Test build #76697 has finished for PR 17925 at commit
Why not just put this before the old listenerBus.post() on L991?
This change also alters the behavior if there is an exception while creating the tasks: you no longer post a SparkListenerStageSubmitted.
I don't have any particular reason to prefer one behavior over the other, but without an argument for actually changing the behavior, I'd rather do the minimal change.
Interesting, I had not noticed the change in behavior.
Since the behavior of SparkListenerStageSubmitted is unfortunately not documented, I agree that perhaps we should not change the semantics here (I am curious whether it has any impact in practice, but it is good to err on the side of caution).
Actually I made a mental note to double-check this later, and neither option will really work because of this code in JobProgressListener:
    if (stageInfo.submissionTime.isEmpty) {
      // if this stage is pending, it won't complete, so mark it as "skipped":
      skippedStages += stageInfo
So the change in semantics I'm introducing is actually wrong, and I'll have to avoid it.
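To make the concern concrete, here is a minimal sketch of the check quoted above, assuming a simplified stand-in for the stage metadata. `StageInfoSketch` and `classify` are hypothetical names used only for illustration; they are not Spark's classes, and the real JobProgressListener does more bookkeeping than this.

```scala
// Hypothetical stand-in for Spark's StageInfo; only the field discussed here is modeled.
final case class StageInfoSketch(stageId: Int, submissionTime: Option[Long])

object SkippedStageSketch {
  // Mirrors the quoted check: a stage with no submission time is treated as skipped.
  // Returns (skipped, submitted).
  def classify(stages: Seq[StageInfoSketch]): (Seq[StageInfoSketch], Seq[StageInfoSketch]) =
    stages.partition(_.submissionTime.isEmpty)

  def main(args: Array[String]): Unit = {
    val stages = Seq(
      StageInfoSketch(0, Some(1000L)), // submission time was set
      StageInfoSketch(1, None)         // submission time was never set
    )
    val (skipped, submitted) = classify(stages)
    println(s"skipped: ${skipped.map(_.stageId)}, submitted: ${submitted.map(_.stageId)}")
  }
}
```

Under this assumption, any stage whose submission time is never set ends up in the skipped bucket, which is why the semantics of when that field is populated matter to listeners.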
The DAGScheduler was sending a "stage submitted" event before it properly updated the event's information. This meant that a listener (e.g. the event logging listener) could record wrong information about the event. This change sets the stage's submission time before the event is submitted, when there are tasks to be executed in the stage.
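The ordering problem described here can be illustrated with a small, synchronous sketch. Everything in it (`MutableStageInfo`, `RecordingListener`, `postThenUpdate`, `updateThenPost`) is a hypothetical stand-in, not Spark's actual code; in Spark the listener bus delivers events asynchronously, so the real issue is a race rather than the deterministic outcome shown here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for the mutable stage metadata.
final class MutableStageInfo(val stageId: Int) {
  var submissionTime: Option[Long] = None
}

// A listener that records what it sees at the moment the event arrives,
// much as a logging listener would serialize the event on delivery.
final class RecordingListener {
  val seen = ArrayBuffer.empty[(Int, Option[Long])]
  def onStageSubmitted(info: MutableStageInfo): Unit =
    seen += ((info.stageId, info.submissionTime))
}

object SubmissionOrderSketch {
  // Problematic ordering: post the event first, update the info afterwards.
  def postThenUpdate(listener: RecordingListener, now: Long): Unit = {
    val info = new MutableStageInfo(1)
    listener.onStageSubmitted(info)
    info.submissionTime = Some(now)
  }

  // Ordering described in the PR: update the info first, then post the event.
  def updateThenPost(listener: RecordingListener, now: Long): Unit = {
    val info = new MutableStageInfo(2)
    info.submissionTime = Some(now)
    listener.onStageSubmitted(info)
  }

  def main(args: Array[String]): Unit = {
    val listener = new RecordingListener
    postThenUpdate(listener, 1000L)
    updateThenPost(listener, 2000L)
    listener.seen.foreach(println)
  }
}
```

In the first ordering the listener records an empty submission time even though the field is set immediately afterwards; in the second it records the value that was set.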
Test build #76819 has finished for PR 17925 at commit
Test failure is because of SPARK-20666. All core tests passed.
retest this please
Test build #76947 has finished for PR 17925 at commit
Ping
Given the silence I assume no more feedback. Merging to master.
sorry didn't review this earlier, but in any case, lgtm
The DAGScheduler was sending a "stage submitted" event before it properly
updated the event's information. This meant that a listener (e.g. the
event logging listener) could record wrong information about the event.
This change sets the stage's submission time before the event is submitted,
when there are tasks to be executed in the stage.
Tested with existing unit tests.