Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30987][Core]Increase the timeout on local-cluster waitUntilExecutorsUp calls #27738

Closed
wants to merge 1 commit into from

Conversation

tgravescs
Copy link
Contributor

What changes were proposed in this pull request?

The ResourceDiscoveryPlugin tests intermittently timeout. They are timing out on just bringing up the local-cluster. I am not able to reproduce locally. I suspect the jenkins boxes are overloaded and taking longer then 10 seconds. There was another jira SPARK-29139 that increased timeout for some other of these as well. So try increasing the timeout to 60 seconds.

Examples of timeouts:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/

Why are the changes needed?

tests should no longer intermittently fail.

Does this PR introduce any user-facing change?

no

How was this patch tested?

unit tests ran.

@tgravescs
Copy link
Contributor Author

@Ngone51

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I saw the flaky test results as well.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this!

@SparkQA
Copy link

SparkQA commented Feb 28, 2020

Test build #119097 has finished for PR 27738 at commit 3fa43d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

jiangxb1987 pushed a commit that referenced this pull request Feb 28, 2020
…ecutorsUp calls

### What changes were proposed in this pull request?

The ResourceDiscoveryPlugin tests intermittently timeout. They are timing out on just bringing up the local-cluster. I am not able to reproduce locally.  I suspect the jenkins boxes are overloaded and taking longer then 10 seconds. There was another jira SPARK-29139 that increased timeout for some other of these as well. So try increasing the timeout to 60 seconds.

Examples of timeouts:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/

### Why are the changes needed?

tests should no longer intermittently fail.

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

unit tests ran.

Closes #27738 from tgravescs/SPARK-30987.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
(cherry picked from commit 6c0c41f)
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
@jiangxb1987
Copy link
Contributor

Thanks, merged to master/3.0 !

@dongjoon-hyun
Copy link
Member

+1, late LGTM. Thank you all for stabilizing branch-3.0.

@Ngone51
Copy link
Member

Ngone51 commented Feb 29, 2020

thanks, LGTM.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ecutorsUp calls

### What changes were proposed in this pull request?

The ResourceDiscoveryPlugin tests intermittently timeout. They are timing out on just bringing up the local-cluster. I am not able to reproduce locally.  I suspect the jenkins boxes are overloaded and taking longer then 10 seconds. There was another jira SPARK-29139 that increased timeout for some other of these as well. So try increasing the timeout to 60 seconds.

Examples of timeouts:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/

### Why are the changes needed?

tests should no longer intermittently fail.

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

unit tests ran.

Closes apache#27738 from tgravescs/SPARK-30987.

Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants