
[SPARK-26941][YARN]Fix incorrect computation of maxNumExecutorFailures in ApplicationMaster for streaming #23845



@liupc liupc commented Feb 20, 2019

What changes were proposed in this pull request?

Currently, when streaming dynamic allocation is enabled for a streaming application, maxNumExecutorFailures in ApplicationMaster is still computed from spark.dynamicAllocation.maxExecutors.

It should be computed from spark.streaming.dynamicAllocation.maxExecutors instead.
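A minimal, self-contained Scala sketch of the intended computation follows. It is an illustration of the idea, not the patch itself. It assumes a plain `Map[String, String]` in place of `SparkConf`; the object and helper names are hypothetical, the `Int.MaxValue` and `0` defaults are assumptions, and so is the default failure budget of `max(3, 2 * executors)` with an overflow guard.

```scala
// Hypothetical sketch: a Map stands in for SparkConf; names and defaults are assumptions.
object MaxFailuresSketch {
  type Conf = Map[String, String]

  private def bool(conf: Conf, key: String): Boolean =
    conf.get(key).exists(_.toBoolean)

  private def intOr(conf: Conf, key: String, default: Int): Int =
    conf.get(key).map(_.toInt).getOrElse(default)

  // Executor count that the failure budget is derived from.
  def effectiveNumExecutors(conf: Conf): Int =
    if (bool(conf, "spark.streaming.dynamicAllocation.enabled")) {
      // Streaming dynamic allocation: use the streaming-specific upper bound.
      intOr(conf, "spark.streaming.dynamicAllocation.maxExecutors", Int.MaxValue)
    } else if (bool(conf, "spark.dynamicAllocation.enabled")) {
      // Core dynamic allocation: use the core upper bound.
      intOr(conf, "spark.dynamicAllocation.maxExecutors", Int.MaxValue)
    } else {
      // Static allocation: use the fixed number of executors.
      intOr(conf, "spark.executor.instances", 0)
    }

  // Assumed default budget: twice the executor count, at least 3,
  // clamped to Int.MaxValue so that 2 * n cannot overflow.
  def maxNumExecutorFailures(conf: Conf): Int = {
    val n = effectiveNumExecutors(conf)
    math.max(3, if (n > Int.MaxValue / 2) Int.MaxValue else 2 * n)
  }

  def main(args: Array[String]): Unit = {
    val streaming = Map(
      "spark.streaming.dynamicAllocation.enabled" -> "true",
      "spark.streaming.dynamicAllocation.maxExecutors" -> "10",
      "spark.dynamicAllocation.maxExecutors" -> "100")
    println(maxNumExecutorFailures(streaming)) // prints 20
  }
}
```

Checking the streaming flag first means the streaming-specific cap is used whenever streaming dynamic allocation is on, and the core maxExecutors setting is ignored in that case.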

Related code:

How was this patch tested?

NA

Please review http://spark.apache.org/contributing.html before opening a pull request.

@liupc liupc changed the title [SPARK-26941]Fix incorrect computation of maxNumExecutorFailures for streaming [SPARK-26941][YARN]Fix incorrect computation of maxNumExecutorFailures in ApplicationMaster for streaming Feb 20, 2019
@@ -100,7 +100,9 @@ private[spark] class ApplicationMaster(

  private val maxNumExecutorFailures = {
    val effectiveNumExecutors =
      if (Utils.isDynamicAllocationEnabled(sparkConf)) {
        if (Utils.isStreamingDynamicAllocationEnabled(sparkConf)) {
          sparkConf.get(STREAMING_DYN_ALLOCATION_MAX_EXECUTORS)
Member


For other reviewers -- this is the fix itself.

@SparkQA

SparkQA commented Feb 22, 2019

Test build #4565 has finished for PR 23845 at commit d5a5107.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Feb 25, 2019

@liupc could you have a look at the build failure?

@SparkQA

SparkQA commented Feb 26, 2019

Test build #4575 has finished for PR 23845 at commit 155f284.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liupc
Author

liupc commented Mar 1, 2019

retest this please!

@liupc
Author

liupc commented Mar 1, 2019

@srowen The unit tests seem to have failed on some unrelated tests. Can you help retest this, please?

@SparkQA

SparkQA commented Mar 1, 2019

Test build #4583 has finished for PR 23845 at commit 155f284.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liupc
Author

liupc commented Mar 5, 2019

retest this please

@liupc
Author

liupc commented Mar 6, 2019

retest this please

@liupc
Author

liupc commented Mar 6, 2019

cc @HyukjinKwon @srowen

@SparkQA

SparkQA commented Mar 6, 2019

Test build #4598 has started for PR 23845 at commit f825118.

@SparkQA

SparkQA commented Mar 9, 2019

Test build #4603 has finished for PR 23845 at commit f825118.

  • This patch fails Java style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 11, 2019

Test build #4608 has finished for PR 23845 at commit f825118.

  • This patch fails Java style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 11, 2019

Hm, that's weird. The failure is not directly related, and I thought we fixed this in #23887. Not sure what's going on. dev/lint-java and dev/sbt-checkstyle pass locally for master too, and this isn't failing other PR builds. Maybe some odd cached artifacts on a Jenkins build server? No idea.

@SparkQA

SparkQA commented Mar 11, 2019

Test build #4609 has started for PR 23845 at commit f825118.

@SparkQA

SparkQA commented Mar 12, 2019

Test build #4612 has finished for PR 23845 at commit f825118.

  • This patch fails Java style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Contributor

@vanzin vanzin left a comment


I'm really not familiar with streaming dynamic allocation, but I assume you can't enable both that and regular dynamic allocation together?

If for some reason that is allowed, then the fix should take the max of both values.

To fix the tests, this probably needs a merge with master. Maybe GitHub is confused.

@@ -332,6 +332,51 @@ package object config {
    ConfigBuilder("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout")
      .fallbackConf(DYN_ALLOCATION_SCHEDULER_BACKLOG_TIMEOUT)

  private[spark] val STREAMING_DYN_ALLOCATION_ENABLED =
Contributor


Could you add these to a new Streaming.scala instead? We shouldn't be adding more stuff to package.scala.

Author


@vanzin

If for some reason that is allowed, then the fix should take the max of both values.

Enabling both of them is not allowed. This check is done in scheduler/ExecutorAllocationManager.

I agree with putting these configs in Streaming.scala; it's clearer. I will update the PR.
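Because the two modes cannot be enabled together, at most one of the two maxExecutors settings ever applies, so there is no need to take the max of both. A hypothetical sketch of such a mutual-exclusion check is below; the object, type, and method names are illustrative only, and Spark's actual check in ExecutorAllocationManager may differ in wording and detail.

```scala
// Hypothetical sketch of a "not both at once" validation; names are illustrative.
object DynamicAllocationModes {
  sealed trait Mode
  case object Static extends Mode
  case object CoreDynamic extends Mode
  case object StreamingDynamic extends Mode

  def resolve(conf: Map[String, String]): Mode = {
    def enabled(key: String): Boolean = conf.get(key).exists(_.toBoolean)
    val core = enabled("spark.dynamicAllocation.enabled")
    val streaming = enabled("spark.streaming.dynamicAllocation.enabled")
    // Rejecting the combination guarantees that only one maxExecutors
    // setting is relevant, so callers never need to combine them.
    require(!(core && streaming),
      "core and streaming dynamic allocation cannot be enabled at the same time")
    if (streaming) StreamingDynamic
    else if (core) CoreDynamic
    else Static
  }
}
```

`require` throws an `IllegalArgumentException` when the condition is false, so an invalid configuration fails fast instead of silently picking one of the limits.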

@liupc liupc force-pushed the Fix-incorrect-maxNumExecutorFailures-for-streaming branch from f825118 to 882bd35 March 13, 2019 01:31
@vanzin
Contributor

vanzin commented Mar 13, 2019

ok to test

@SparkQA

SparkQA commented Mar 13, 2019

Test build #103459 has finished for PR 23845 at commit 2f46380.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 15, 2019

Test build #4625 has finished for PR 23845 at commit 2f46380.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 15, 2019

@liupc can you take a look at ...

[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/core/src/main/scala/org/apache/spark/util/Utils.scala:2495: not found: value STREAMING_DYN_ALLOCATION_ENABLED
[error]     val streamingDynamicAllocationEnabled = conf.get(STREAMING_DYN_ALLOCATION_ENABLED)
[error]                
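This error most likely means that `STREAMING_DYN_ALLOCATION_ENABLED` is no longer defined in the `config` package object after the move to Streaming.scala, so `Utils.scala` needs an explicit import of the new object's members. The sketch below illustrates only that scoping issue with hypothetical stand-ins (`ConfigEntry`, `Streaming`, `UtilsSketch`); it does not reproduce Spark's real `ConfigBuilder` API, and the defaults shown are assumptions.

```scala
// Hypothetical stand-in for a typed config entry; not Spark's ConfigEntry.
final case class ConfigEntry[T](key: String, default: T, parse: String => T) {
  def readFrom(conf: Map[String, String]): T =
    conf.get(key).map(parse).getOrElse(default)
}

// Streaming-specific entries grouped in their own object, not a package object.
object Streaming {
  val STREAMING_DYN_ALLOCATION_ENABLED: ConfigEntry[Boolean] =
    ConfigEntry("spark.streaming.dynamicAllocation.enabled", false, (s: String) => s.toBoolean)
  val STREAMING_DYN_ALLOCATION_MAX_EXECUTORS: ConfigEntry[Int] =
    ConfigEntry("spark.streaming.dynamicAllocation.maxExecutors", Int.MaxValue, (s: String) => s.toInt)
}

object UtilsSketch {
  // Without this import, the unqualified reference below fails to compile
  // with "not found: value STREAMING_DYN_ALLOCATION_ENABLED".
  import Streaming._

  def isStreamingDynamicAllocationEnabled(conf: Map[String, String]): Boolean =
    STREAMING_DYN_ALLOCATION_ENABLED.readFrom(conf)
}
```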

@SparkQA

SparkQA commented Mar 16, 2019

Test build #103570 has finished for PR 23845 at commit a46d4d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 17, 2019

merged to master

@srowen srowen closed this in cad475d Mar 17, 2019
4 participants