
[SPARK-13889][YARN] Fix integer overflow when calculating the max number of executor failure #11713

Closed
wants to merge 2 commits

Conversation

carsonwang
Contributor

What changes were proposed in this pull request?

The max number of executor failures before failing the application defaults to twice the maximum number of executors when dynamic allocation is enabled. The default value of "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, so doubling it causes an integer overflow and a wrong result: the computed default max number of executor failures is 3. This PR adds a check to avoid the overflow.

How was this patch tested?

It tests whether the value is greater than Int.MaxValue / 2 to avoid the overflow when multiplying by 2.
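To make the arithmetic concrete, here is a minimal standalone sketch (not the patch itself; only `effectiveNumExecutors` and the `math.max(3, ...)` shape are taken from the diff) showing how the unguarded doubling overflows and how the pre-check described above avoids it:

```scala
object OverflowSketch {
  def main(args: Array[String]): Unit = {
    // Default for spark.dynamicAllocation.maxExecutors
    val effectiveNumExecutors = Int.MaxValue

    // Unguarded: 2 * Int.MaxValue wraps to -2 in 32-bit arithmetic,
    // so math.max(3, -2) silently yields 3 executor failures allowed.
    val broken = math.max(3, 2 * effectiveNumExecutors)
    println(broken)  // prints 3

    // Guarded: if doubling would overflow, cap at Int.MaxValue instead.
    val fixed = math.max(3,
      if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue
      else 2 * effectiveNumExecutors)
    println(fixed)  // prints 2147483647
  }
}
```

The guard tests before multiplying rather than after, since the overflow has already corrupted the value by the time the product exists.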

@SparkQA

SparkQA commented Mar 15, 2016

Test build #53162 has finished for PR 11713 at commit 8469148.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -73,7 +73,8 @@ private[spark] class ApplicationMaster(
    } else {
      sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
    }
-   val defaultMaxNumExecutorFailures = math.max(3, 2 * effectiveNumExecutors)
+   val defaultMaxNumExecutorFailures = math.max(3,
Contributor

Can you add a comment here saying that effectiveNumExecutors is Int.MaxValue when dynamic allocation is enabled with the defaults?

Contributor Author

Added a comment for it.

@chenghao-intel
Contributor

BTW, @carsonwang, can you also describe what would happen, without this change, to applications with dynamic allocation enabled? This will help people understand the impact of this bug fix.

@carsonwang
Contributor Author

Without this patch, an application with dynamic allocation enabled will fail after only 3 executors are lost.

@SparkQA

SparkQA commented Mar 16, 2016

Test build #53260 has finished for PR 11713 at commit 1b4f1b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Contributor

cc @rxin @JoshRosen

@rxin
Contributor

rxin commented Mar 16, 2016

cc @andrewor14

@sarutak
Member

sarutak commented Mar 16, 2016

LGTM.

asfgit closed this in 496d2a2 on Mar 16, 2016
@srowen
Member

srowen commented Mar 16, 2016

Merged to master. It looks like this was not a problem in 1.6 because the default was not Int.MaxValue at that point.

@carsonwang
Contributor Author

Thanks @srowen. There is no integer overflow in 1.6, but the max number of executor failures is also 3 when dynamic allocation is enabled. It should use Int.MaxValue as the default, as the docs and other code do. Do you want me to submit a fix for 1.6?

@srowen
Member

srowen commented Mar 17, 2016

OK. If it's a different change for the same issue, and therefore not a question of cherry-picking, go ahead and make the change you think needs to happen for 1.6 and open a PR.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…ber of executor failure

## What changes were proposed in this pull request?
The max number of executor failures before failing the application defaults to twice the maximum number of executors when dynamic allocation is enabled. The default value of "spark.dynamicAllocation.maxExecutors" is Int.MaxValue, so doubling it causes an integer overflow and a wrong result: the computed default max number of executor failures is 3. This PR adds a check to avoid the overflow.

## How was this patch tested?
It tests whether the value is greater than Int.MaxValue / 2 to avoid the overflow when multiplying by 2.

Author: Carson Wang <carson.wang@intel.com>

Closes apache#11713 from carsonwang/IntOverflow.