Skip to content

Conversation

@vanzin
Copy link
Contributor

@vanzin vanzin commented May 9, 2017

The DAGScheduler was sending a "stage submitted" event before it properly
updated the event's information. This meant that a listener (e.g. the
even logging listener) could record wrong information about the event.

This change sets the stage's submission time before the event is submitted,
when there are tasks to be executed in the stage.

Tested with existing unit tests.

@vanzin
Copy link
Contributor Author

vanzin commented May 9, 2017

@squito

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76697 has finished for PR 17925 at commit 0d8f717.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just put this before the old listenerBus.post() on L991?

this change also changes the behavior if there is a an exception while creating the tasks -- you no longer post a SparkListenerStageSubmitted.

I don't have any particular reason why you'd want the behavior one way or the other, but w/out an argument for actually changing the behavior, I'd rather do the minimal change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, had not noticed the change in behavior.
Since behavior of SparkListenerStageSubmitted is unfortunately not documented, I agree that perhaps we should not change the semantics here (I am curious if it actually impacts in reality, but good to err on side of caution).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I made a mental note to double-check this later, and neither option will really work because of this code in JobProgressListener:

          if (stageInfo.submissionTime.isEmpty) {
            // if this stage is pending, it won't complete, so mark it as "skipped":
            skippedStages += stageInfo

So the change in semantics I'm introducing is actually wrong, and I'll have to avoid it.

The DAGScheduler was sending a "stage submitted" event before it properly
updated the event's information. This meant that a listener (e.g. the
even logging listener) could record wrong information about the event.

This change sets the stage's submission time before the event is submitted,
when there are tasks to be executed in the stage.
@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76819 has finished for PR 17925 at commit f1ca990.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented May 11, 2017

Test failure is because of SPARK-20666. All core tests passed.

@vanzin
Copy link
Contributor Author

vanzin commented May 15, 2017

retest this please

@SparkQA
Copy link

SparkQA commented May 15, 2017

Test build #76947 has finished for PR 17925 at commit f1ca990.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented May 16, 2017

Ping

@vanzin
Copy link
Contributor Author

vanzin commented May 24, 2017

Given the silence I assume no more feedback. Merging to master.

@asfgit asfgit closed this in 95aef66 May 25, 2017
@vanzin vanzin deleted the SPARK-20205 branch May 25, 2017 00:06
@squito
Copy link
Contributor

squito commented May 30, 2017

sorry didn't review this earlier, but in any case, lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants