[SPARK-30987][Core] Increase the timeout on local-cluster waitUntilExecutorsUp calls #27738
LGTM. I saw the flaky test results as well.
LGTM, thanks for fixing this!
Test build #119097 has finished for PR 27738 at commit
…ecutorsUp calls
### What changes were proposed in this pull request?
The ResourceDiscoveryPlugin tests intermittently time out while bringing up the local-cluster. I am not able to reproduce this locally. I suspect the Jenkins boxes are overloaded and taking longer than 10 seconds. Another JIRA, SPARK-29139, increased the timeout for some other tests of this kind as well, so try increasing the timeout to 60 seconds.
Examples of timeouts:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/
### Why are the changes needed?
Tests should no longer fail intermittently.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Ran the unit tests.
Closes #27738 from tgravescs/SPARK-30987.
Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
(cherry picked from commit 6c0c41f)
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
Thanks, merged to master/3.0!
+1, late LGTM. Thank you all for stabilizing
Thanks, LGTM.
What changes were proposed in this pull request?
The ResourceDiscoveryPlugin tests intermittently time out while bringing up the local-cluster. I am not able to reproduce this locally. I suspect the Jenkins boxes are overloaded and taking longer than 10 seconds. Another JIRA, SPARK-29139, increased the timeout for some other tests of this kind as well, so try increasing the timeout to 60 seconds.
Examples of timeouts:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/
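For readers unfamiliar with this kind of wait, here is a minimal standalone sketch of a poll-until-deadline helper. It is not Spark's code: `wait_until` and `executors_up_after` are hypothetical names made up for illustration, and the delays are arbitrary. The point it tries to show is that the timeout is only an upper bound, so raising it should not slow down a run where the executors come up quickly.

```python
import time


def wait_until(condition, timeout_s, poll_interval_s=0.01):
    """Poll `condition` until it returns True or `timeout_s` seconds elapse.

    Hypothetical helper for illustration only; not part of Spark.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        time.sleep(poll_interval_s)


def executors_up_after(delay_s):
    """Hypothetical stand-in for a cluster whose executors register after `delay_s`."""
    ready_at = time.monotonic() + delay_s
    return lambda: time.monotonic() >= ready_at


# A fast "cluster" returns as soon as it is ready, regardless of the 60 s bound.
start = time.monotonic()
wait_until(executors_up_after(0.05), timeout_s=60)
elapsed = time.monotonic() - start
print(f"returned after {elapsed:.2f} s")
```

Under these assumptions, the larger bound only matters on an overloaded machine, where a short bound would raise `TimeoutError` before the executors had a chance to register.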
Why are the changes needed?
Tests should no longer fail intermittently.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Ran the unit tests.