Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10058][Core][Tests]Fix the flaky tests in HeartbeatReceiverSuite #8946

Closed
wants to merge 2 commits into from
Closed

[SPARK-10058][Core][Tests]Fix the flaky tests in HeartbeatReceiverSuite #8946

wants to merge 2 commits into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Sep 30, 2015

Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/

This failure is because HeartbeatReceiverSuite. heartbeatReceiver may receive SparkListenerExecutorAdded("driver") sent from LocalBackend.

There are other race conditions in HeartbeatReceiverSuite because HeartbeatReceiver.onExecutorAdded and HeartbeatReceiver.onExecutorRemoved are asynchronous. This PR also fixed them.

@zsxwing
Copy link
Member Author

zsxwing commented Sep 30, 2015

/cc @andrewor14

@SparkQA
Copy link

SparkQA commented Sep 30, 2015

Test build #43127 has finished for PR 8946 at commit 158e178.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -61,7 +62,12 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
this(sc, new SystemClock)
}

sc.addSparkListener(this)
if (clock.isInstanceOf[SystemClock]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a weird check; I'd favor either some explicit "testMode" boolean parameter to the constructor, or, my preferred, making the test ignore executors that match SparkContext.DRIVER_IDENTIFIER.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great. I will update it.

@zsxwing zsxwing changed the title [SPARK-10800][Core][Tests]Fix the flaky tests in HeartbeatReceiverSuite [SPARK-10058][Core][Tests]Fix the flaky tests in HeartbeatReceiverSuite Sep 30, 2015
@SparkQA
Copy link

SparkQA commented Sep 30, 2015

Test build #43134 has finished for PR 8946 at commit 0debc46.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 1, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Oct 1, 2015

Test build #43143 has finished for PR 8946 at commit 0debc46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Oct 1, 2015

LGTM.

asfgit pushed a commit that referenced this pull request Oct 1, 2015
…Suite

Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/

This failure is because `HeartbeatReceiverSuite. heartbeatReceiver` may receive `SparkListenerExecutorAdded("driver")` sent from [LocalBackend](https://github.com/apache/spark/blob/8fb3a65cbb714120d612e58ef9d12b0521a83260/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala#L121).

There are other race conditions in `HeartbeatReceiverSuite` because `HeartbeatReceiver.onExecutorAdded` and `HeartbeatReceiver.onExecutorRemoved` are asynchronous. This PR also fixed them.

Author: zsxwing <zsxwing@gmail.com>

Closes #8946 from zsxwing/SPARK-10058.

(cherry picked from commit 9b3e776)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@vanzin
Copy link
Contributor

vanzin commented Oct 1, 2015

Merged to master and branch-1.5, thanks!

@asfgit asfgit closed this in 9b3e776 Oct 1, 2015
@zsxwing zsxwing deleted the SPARK-10058 branch October 1, 2015 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants