SPARK-6735: [YARN] Add properties to disable the maximum number of executor failures check, or to make it relative to a duration #5449
Conversation
…ble: to disable the maximum executor failure check 2) spark.yarn.max.executor.failures.relative: to make the maximum executor failure count relative 3) spark.yarn.max.executor.failures.relative.window: the relative window duration in seconds, default 600
@@ -59,6 +59,10 @@ private[spark] class ApplicationMaster(
   private val maxNumExecutorFailures = sparkConf.getInt("spark.yarn.max.executor.failures",
     sparkConf.getInt("spark.yarn.max.worker.failures", math.max(args.numExecutors * 2, 3)))
 
+
+  // Disable the maximum executor failure check
+  private val disableMaxExecutorFailureCheck =
+    sparkConf.getBoolean("spark.yarn.max.executor.failures.disable", false)
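As a standalone illustration of what this diff is going for, here is a minimal sketch of how such a flag could gate the failure check. The object and method names are hypothetical and the conf is modelled as a plain Map; only the property key comes from the diff above.

```scala
// Hypothetical sketch (not the PR's actual code) of gating the executor
// failure check behind a boolean property.
object FailureCheckSketch {
  def shouldAbort(conf: Map[String, String], failureCount: Int, maxFailures: Int): Boolean = {
    // Assumed key from the diff; defaulting to "false" keeps the existing behaviour.
    val disabled = conf.getOrElse("spark.yarn.max.executor.failures.disable", "false").toBoolean
    !disabled && failureCount > maxFailures
  }
}
```

With the flag set, even a large failure count no longer triggers an abort; without it, the existing threshold applies.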
There's a cost and weight to making a flag for everything, and I think this doesn't add value. Just set max to a high value to "disable" it.
I agree. We could make a special value mean disabled, like 0 or -1.
Looks good; I will treat maxNumExecutorFailures = -1 as the disabled state for this check.
Hi @srowen, please review the changes. Thanks,
Hi, can somebody please review the change? Regards,
@@ -59,6 +59,9 @@ private[spark] class ApplicationMaster(
   private val maxNumExecutorFailures = sparkConf.getInt("spark.yarn.max.executor.failures",
     sparkConf.getInt("spark.yarn.max.worker.failures", math.max(args.numExecutors * 2, 3)))
 
+
+  // Disable the maximum executor failure check
+  private val disableMaxExecutorFailureCheck =  if (maxNumExecutorFailures == -1) true else false
There's an extra space here. Also, the right side of this assignment can just be maxNumExecutorFailures == -1.
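To spell out the simplification being suggested: an if/else that returns true or false is just the Boolean expression itself. A tiny sketch (the object and method names here are hypothetical, for illustration only):

```scala
// Sandy's suggested simplification: the comparison is already a Boolean,
// so no if/else is needed.
object DisableCheckSketch {
  // Equivalent to: if (maxNumExecutorFailures == -1) true else false
  def isCheckDisabled(maxNumExecutorFailures: Int): Boolean =
    maxNumExecutorFailures == -1
}
```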
@twinkle-sachdeva do you have time to address Sandy's comments?
Hi Tom, I will do that in one or two days. Thanks,
Hi Tom, It would be good if somebody else could take this up. I am not doing well. Sorry for the inconvenience caused. Regards,
Not a problem. Can you close this when you have a chance, and I'll post a comment to see if someone can take over? Thanks.
Can one of the admins verify this patch?
@twinkle-sachdeva can you close this PR? |
For long-running applications, users might want to disable this check or make it relative to a duration window, so that older failures do not cause the application to abort in the long run.
Added properties:
1) spark.yarn.max.executor.failures.disable: disable the maximum executor failure check
2) spark.yarn.max.executor.failures.relative: make the maximum executor failure count relative to a window
3) spark.yarn.max.executor.failures.relative.window: the relative window duration in seconds, default 600
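The relative-window idea from property 3 can be sketched as follows. This is an illustrative assumption about the mechanism, not the PR's actual implementation: only failures recorded within the last windowSec seconds count toward the limit, so old failures age out instead of accumulating forever.

```scala
import scala.collection.mutable

// Illustrative sketch of a relative failure window (class name is hypothetical).
// Failures older than windowSec seconds are evicted before counting.
class RelativeFailureTracker(windowSec: Long) {
  private val failureTimes = mutable.Queue[Long]()

  def recordFailure(nowSec: Long): Unit = failureTimes.enqueue(nowSec)

  def failuresInWindow(nowSec: Long): Int = {
    // Drop failures that fell out of the sliding window.
    while (failureTimes.nonEmpty && failureTimes.head < nowSec - windowSec)
      failureTimes.dequeue()
    failureTimes.size
  }
}
```

Comparing failuresInWindow(now) against maxNumExecutorFailures would then abort only when failures cluster within the window, which is the behaviour this PR is asking for on long-running applications.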