[SPARK-13889][YARN] Fix integer overflow when calculating the max number of executor failure #11713
Conversation
Test build #53162 has finished for PR 11713 at commit
@@ -73,7 +73,8 @@ private[spark] class ApplicationMaster(
     } else {
       sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
     }
-    val defaultMaxNumExecutorFailures = math.max(3, 2 * effectiveNumExecutors)
+    val defaultMaxNumExecutorFailures = math.max(3,
Can you add a comment here saying that effectiveNumExecutors is Int.MaxValue with dynamic allocation enabled by default?
Added a comment for it.
BTW, @carsonwang, can you also describe what would happen without this change to applications with dynamic allocation enabled? This will help people understand the impact of this bug fix.
Without this patch, an application with dynamic allocation enabled will fail after only 3 executors are lost.
Test build #53260 has finished for PR 11713 at commit
cc @rxin @JoshRosen
cc @andrewor14
LGTM.
Merged to master. It looks like this was not a problem in 1.6 because the default was not Int.MaxValue at that point.
Thanks @srowen. There is no integer overflow in 1.6, but the max number of executor failures is also 3 when dynamic allocation is enabled. It should use Int.MaxValue as the default, as the docs and other code do. Do you want me to submit a fix for 1.6?
OK. If it's a different change for the same issue, and therefore not a question of cherry-picking, go ahead and make the change you think needs to happen for 1.6 and open a PR.
Merge commit: [SPARK-13889][YARN] Fix integer overflow when calculating the max number of executor failures. Author: Carson Wang <carson.wang@intel.com>. Closes apache#11713 from carsonwang/IntOverflow.
What changes were proposed in this pull request?
The max number of executor failures before failing the application defaults to twice the maximum number of executors when dynamic allocation is enabled. The default value of "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, so the multiplication overflows and produces a wrong result: the calculated default max number of executor failures is 3. This PR adds a check to avoid the overflow.
How was this patch tested?
It checks whether the value is greater than Int.MaxValue / 2 to avoid the overflow when it is multiplied by 2.
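The overflow and the guard described above can be sketched as a minimal standalone example (the object and method names below are hypothetical, not Spark's actual code; the guard mirrors the fix in this PR):

```scala
// Demonstrates why 2 * effectiveNumExecutors overflows when the default
// spark.dynamicAllocation.maxExecutors (Int.MaxValue) is used, and how
// checking against Int.MaxValue / 2 before multiplying avoids it.
object MaxExecutorFailuresDemo {

  // Pre-fix behavior: 2 * Int.MaxValue wraps around to -2 in Int arithmetic,
  // so math.max(3, -2) silently yields 3 as the failure threshold.
  def buggyDefault(effectiveNumExecutors: Int): Int =
    math.max(3, 2 * effectiveNumExecutors)

  // Post-fix behavior: if doubling would overflow, cap at Int.MaxValue instead.
  def fixedDefault(effectiveNumExecutors: Int): Int =
    math.max(3,
      if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue
      else 2 * effectiveNumExecutors)

  def main(args: Array[String]): Unit = {
    println(buggyDefault(Int.MaxValue)) // prints 3: the overflow bug
    println(fixedDefault(Int.MaxValue)) // prints 2147483647: capped, no overflow
    println(fixedDefault(10))           // prints 20: small values are unaffected
  }
}
```

With the guard, an application running under dynamic allocation tolerates Int.MaxValue executor failures (effectively unlimited, matching the documented default) instead of failing after just 3.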