New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26941][YARN]Fix incorrect computation of maxNumExecutorFailures in ApplicationMaster for streaming #23845
[SPARK-26941][YARN]Fix incorrect computation of maxNumExecutorFailures in ApplicationMaster for streaming #23845
Conversation
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
@@ -100,7 +100,9 @@ private[spark] class ApplicationMaster( | |||
|
|||
private val maxNumExecutorFailures = { | |||
val effectiveNumExecutors = | |||
if (Utils.isDynamicAllocationEnabled(sparkConf)) { | |||
if (Utils.isStreamingDynamicAllocationEnabled(sparkConf)) { | |||
sparkConf.get(STREAMING_DYN_ALLOCATION_MAX_EXECUTORS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other reviewers -- this is the fix itself.
core/src/main/scala/org/apache/spark/internal/config/package.scala
Outdated
Show resolved
Hide resolved
Test build #4565 has finished for PR 23845 at commit
|
@liupc could you have a look at the build failure? |
Test build #4575 has finished for PR 23845 at commit
|
retest this please! |
@srowen UT seems failed with some unrelated tests, can you help to retest this please? |
Test build #4583 has finished for PR 23845 at commit
|
retest this please |
1 similar comment
retest this please |
Test build #4598 has started for PR 23845 at commit |
Test build #4603 has finished for PR 23845 at commit
|
Test build #4608 has finished for PR 23845 at commit
|
Hm, that's weird. The failure is not directly related, and I thought we fixed this in #23887 Not sure what's going on. |
Test build #4609 has started for PR 23845 at commit |
Test build #4612 has finished for PR 23845 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not familiar with streaming dynamic allocation, but I assume you can't enable both that and regular dynamic allocation together?
If for some reason that is allowed, then the fix should take the max
of both values.
To fix the tests, probably needs a merge with master. Maybe github is confused.
@@ -332,6 +332,51 @@ package object config { | |||
ConfigBuilder("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout") | |||
.fallbackConf(DYN_ALLOCATION_SCHEDULER_BACKLOG_TIMEOUT) | |||
|
|||
private[spark] val STREAMING_DYN_ALLOCATION_ENABLED = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add these to a new Streaming.scala
instead? We shouldn't be adding more stuff to package.scala
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If for some reason that is allowed, then the fix should take the max of both values.
This is not allowed to enable both of them. This check is done in the scheduler/ExecutorAllocationManager
spark/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala
Line 211 in 7beb464
throw new IllegalArgumentException( |
I agree to put these configs to Streaming.scala
, it's more clear. I will update.
f825118
to
882bd35
Compare
ok to test |
Test build #103459 has finished for PR 23845 at commit
|
Test build #4625 has finished for PR 23845 at commit
|
@liupc can you take a look at ...
|
Test build #103570 has finished for PR 23845 at commit
|
merged to master |
What changes were proposed in this pull request?
Currently, when enabled streaming dynamic allocation for streaming applications, the maxNumExecutorFailures in ApplicationMaster is still computed with
spark.dynamicAllocation.maxExecutors
.Actually, we should consider
spark.streaming.dynamicAllocation.maxExecutors
instead.Related codes:
spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
Line 101 in f87153a
How was this patch tested?
NA
Please review http://spark.apache.org/contributing.html before opening a pull request.