
[SPARK-26269][YARN]Yarnallocator should have same blacklist behaviour with yarn to maxmize use of cluster resource #23223

Conversation

@Ngone51 (Member) commented Dec 5, 2018

What changes were proposed in this pull request?

As mentioned in JIRA SPARK-26269, in order to maximize the use of cluster resources, this PR tries to make YarnAllocator follow the same blacklist behaviour as YARN.

How was this patch tested?

Unit tests were added.

@Ngone51 (Member, Author) commented Dec 5, 2018

Ping @attilapiros @vanzin @jerryshao for review.

@Ngone51 Ngone51 changed the title Yarnallocator should have same blacklist behaviour with yarn to maxmize use of cluster resource [SPARK-2629][YARN]Yarnallocator should have same blacklist behaviour with yarn to maxmize use of cluster resource Dec 5, 2018
@Ngone51 Ngone51 changed the title [SPARK-2629][YARN]Yarnallocator should have same blacklist behaviour with yarn to maxmize use of cluster resource [SPARK-26269][YARN]Yarnallocator should have same blacklist behaviour with yarn to maxmize use of cluster resource Dec 5, 2018
@attilapiros (Contributor) left a comment:

@Ngone51 I read your change, but I would like to understand whether the intention behind it is right.

I mean, if node blacklisting in Spark were perfectly aligned with YARN's, it would be redundant to have it in Spark in the first place. Take disk failure as an example. According to the design of Spark blacklisting:
"Resources are blacklisted even if they are only flaky (not just when they are completely
unusable), eg. because of one bad disk out of many."
This could only be the intended design for task-level blacklisting.
This is why I am a bit hesitant; it would be nice to support this change with a concrete case where it really helps and is needed.

Ping @squito, @tgravescs

@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
val message = "Container killed by YARN for exceeding physical memory limits. " +
s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
(true, message)
case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
(true, "Container marked as failed: " + containerId + onHostStr +
Contributor:

Nit: Use string interpolation.

@seregasheypak commented Jan 21, 2019:

concrete case where this really helps and needed.

There is a 1K-node cluster, and jobs suffer performance degradation because of a single node. It's rather hard to convince Cluster Ops to decommission a node because of "performance degradation". Imagine 10 dev teams chasing a single ops team, whether for a valid reason (the node has problems) or because the code has a bug, the data is skewed, or there are spots on the sun.

Simple solution:

  • Rerun the failed/delayed job and blacklist the "problematic" node.
  • Report the problem if the job then runs without anomalies.

Results

  • Ops are not spammed with weird requests from devs.
  • Devs are not blocked by a genuinely bad node.

@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
val message = "Container killed by YARN for exceeding physical memory limits. " +
s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
(true, message)
case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
Contributor:

I would prefer not to have it as a separate case, but just a new if around handleResourceAllocationFailure; and since NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS is introduced, it would make sense to separate it from the huge match on exitStatus. This way it would be easier to follow when it is really triggered (one should not have to check all the previous case branches before considering this condition with contains). That way, values like ContainerExitStatus.SUCCESS from the set would really be used.
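As a rough, hypothetical sketch of that restructuring (the object name, numeric exit codes, and messages below are illustrative stand-ins, not the real YarnAllocator code): checking the set with a plain `if` before the large match makes it obvious when the branch fires, and lets members of the set that earlier case branches would otherwise shadow (such as SUCCESS) actually take effect.

```scala
// Hedged sketch of the suggested shape: handle the "neither the app's
// nor the system's fault" set up front, before the big match on the
// exit status. The numeric values are illustrative stand-ins; YARN's
// real constants live in ContainerExitStatus.
object ExitStatusHandlingSketch {
  val Success = 0      // stand-in for ContainerExitStatus.SUCCESS
  val KilledByRm = -100 // illustrative value
  val Preempted = -102  // illustrative value

  // Exit codes that are neither the app's nor the system's fault.
  val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS: Set[Int] =
    Set(Success, KilledByRm, Preempted)

  def describe(exitStatus: Int): String = {
    // Checked before the match: every member of the set (including
    // Success) really takes this branch instead of being shadowed by
    // an earlier `case` below.
    if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exitStatus)) {
      s"not the app's or system's fault (exit status $exitStatus)"
    } else {
      exitStatus match {
        case 1 => "application error"
        case other => s"unclassified failure (exit status $other)"
      }
    }
  }
}
```

With the guard-in-match version, a reader must rule out every preceding case to know when the set is consulted; lifting it out makes the trigger condition local and explicit.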

Contributor:

Yeah, I agree this should be cleaned up; we already handle cases above that are in the NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS set.

Contributor:

Also, after this gets rearranged, I'd leave a comment here pointing to the Hadoop code you linked to on the JIRA.

Member (Author):

Updated. But I'm not sure about:

That way values like ContainerExitStatus.SUCCESS from the set would be really used.

this part. @attilapiros

val containerId = ContainerId.newContainerId(appAttemptId, containerNum)
def createContainer(
host: String,
containerId: ContainerId = ContainerId.newContainerId(appAttemptId, containerNum),
Contributor:

Just take containerNumber as a parameter, with containerNum as its default value?
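As a hedged illustration of that suggestion (the object, counter, and FakeContainer type below are hypothetical stand-ins, not the actual test suite's code), a default parameter keeps most call sites short while still letting a test pin a specific number when needed:

```scala
// Hypothetical stripped-down test helper demonstrating the suggestion:
// accept the container number as a parameter whose default comes from a
// running counter, instead of requiring a fully built ContainerId.
object CreateContainerSketch {
  // Stand-in for the suite's running containerNum counter.
  var containerNum: Int = 0

  // Stand-in for the real Container/ContainerId types.
  case class FakeContainer(host: String, containerNumber: Int)

  def createContainer(
      host: String,
      containerNumber: Int = containerNum): FakeContainer = {
    containerNum += 1 // advance the counter for the next default
    FakeContainer(host, containerNumber)
  }
}
```

The default expression is re-evaluated at each call, so consecutive calls without an explicit argument pick up successive counter values, while an explicit `containerNumber = ...` still overrides it.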

Member (Author):

Good idea.

@tgravescs (Contributor):

ok to test

@tgravescs (Contributor):

The approach here makes sense. Are you seeing actual issues where this blacklists when it shouldn't? I could see that being possible, and if so we should reclassify this as a defect and make sure it goes into 2.4.1.

@SparkQA commented Dec 5, 2018

Test build #99733 has finished for PR 23223 at commit 65a70dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito (Contributor) left a comment:

Like @tgravescs said, I'm curious whether you have cases where this is currently going wrong; if so, this should probably be a bug fix targeted at 2.4.1 as well.

val cs7 = ContainerStatus.newInstance(containers(7).getId, ContainerState.COMPLETE,
"aborted", ContainerExitStatus.ABORTED)
val cs8 = ContainerStatus.newInstance(containers(8).getId, ContainerState.COMPLETE,
"disk_failed", ContainerExitStatus.DISKS_FAILED)
Contributor:

Just a suggestion: you can avoid some repetition here.

val nonBlacklistedStatuses = Seq(ContainerExitStatus.SUCCESS, ..., ContainerExitStatus.DISKS_FAILED)
val containerStatuses = nonBlacklistedStatuses.zipWithIndex.map { case (status, idx) =>
  ContainerStatus.newInstance(containers(idx).getId, ContainerState.COMPLETE, "diagnostics", status)
}

Member (Author):

Nice suggestion!


@squito (Contributor) commented Dec 5, 2018

@attilapiros

I mean if node blacklisting in Spark would be perfectly aligned to YARN then it would be just redundant to have it in Spark in the first place.

I'm not super familiar with exactly how the blacklisting works in YARN itself -- it looks like it's only going to blacklist the node for the AM, not nodes for general containers. I don't totally follow where the KILLED_BY_RESOURCEMANAGER status is generated, but it does seem like a good idea to protect against this; maybe there is a race where the container is created by the RM, but before it reports back to the driver it gets killed with KILLED_BY_RESOURCEMANAGER. (Another reason I'm curious whether @Ngone51 is actually seeing this cause problems, or just noticed a case to improve.)

@Ngone51 (Member, Author) commented Dec 6, 2018

I mean if node blacklisting in Spark would be perfectly aligned to YARN then it would be just redundant to have it in Spark in the first place.

This change does seem to result in Spark's node blacklisting being perfectly aligned with YARN's, but my original thought was that some exit statuses (e.g. KILLED_BY_RESOURCEMANAGER) currently should not lead to node blacklisting. So being perfectly aligned with YARN is not actually the real target of this change, and we can still adopt a custom strategy for Spark.

Take for example disk failure.

For Spark task-level blacklisting, shouldn't it be delegated to schedulerBlacklist in YarnAllocatorBlacklistTracker?

And it seems ContainerExitStatus.DISKS_FAILED in YARN is not the same as a Spark task's disk failure.

@Ngone51 (Member, Author) commented Dec 6, 2018

Are you seeing actual issues with this blacklisting when it shouldn't?

Unfortunately, no. @tgravescs @squito

@Ngone51 (Member, Author) commented Dec 6, 2018

it looks like its only going to blacklist the node for the AM, not other nodes for general containers.

@squito YARN has a blacklist for the AM when am-scheduling.node-blacklisting-enabled=true is configured, and has ContainerFailureTracker for general containers (I haven't found a config for it).

@SparkQA commented Dec 6, 2018

Test build #99757 has finished for PR 23223 at commit 2d1c27a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor):

If you aren't seeing actual issues with this, it would be interesting to test it further and see whether they occur. I can see Spark blacklisting when it shouldn't for exit codes like the ones you mention (KILLED_BY_RESOURCEMANAGER), so I would like to see someone test this further and determine whether that happens. If it does, we should change this to a bug and put it into 2.4.1.

@Ngone51 (Member, Author) commented Dec 6, 2018

it would be interesting to test it further to see if it does.

@tgravescs Yeah, I have the same thought. I'd like to try it, but I can't guarantee I'll succeed, because I have never done this kind of test before. I'll try my best.

@Ngone51 (Member, Author) commented Dec 10, 2018

Hi @tgravescs, I tried, but found it difficult to produce the KILLED_BY_RESOURCEMANAGER exit status. I followed YARN-73 and YARN-495, but things didn't go as I expected.

@tgravescs (Contributor):

Ok, thanks for trying. If I get a chance I can try later in the week, but that doesn't have to block this now if someone else has time to review before I get to it. We can always pull it back later.

@tgravescs (Contributor):

Sorry for the delay on this; will try it out today.

// oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/ap
// ache/hadoop/yarn/util/Apps.java#L273 for details)
if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
(true, s"Container marked as failed: $containerId$onHostStr" +
Contributor:

I think we want to return false here for exitCausedByApp, since these don't seem to be issues with the app. From the comment in the YARN code: "// Neither the app's fault nor the system's fault. This happens by design, so no need for skipping nodes".

If we mark it as true, it counts against our task failures; if it's false, it doesn't. It seems like these shouldn't count against our failures.
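The distinction being discussed can be sketched as a hedged, self-contained approximation (the object name, helper, and numeric exit codes below are illustrative stand-ins for the real YarnAllocator code and YARN constants):

```scala
// Sketch of the suggested behaviour: for exit codes that are "neither
// the app's fault nor the system's fault", report exitCausedByApp =
// false so the failure does not count toward the application's failure
// limits. Numeric values are illustrative stand-ins.
object ExitCausedByAppSketch {
  val KilledByRm = -100 // illustrative value
  val Preempted = -102  // illustrative value
  val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS: Set[Int] = Set(KilledByRm, Preempted)

  // Mirrors the (exitCausedByApp, message) pair in the allocator.
  def classify(exitStatus: Int): (Boolean, String) =
    if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exitStatus)) {
      (false, s"Container exit status $exitStatus: neither the app's nor the system's fault")
    } else {
      (true, s"Container exit status $exitStatus: counted against the app")
    }

  // A consumer of that flag: only app-caused exits increment the count.
  def failureCount(exitStatuses: Seq[Int]): Int =
    exitStatuses.count(s => classify(s)._1)
}
```

The point of the `false` flag is visible in `failureCount`: RM-initiated kills and preemptions pass through without pushing the application toward its failure threshold.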

@Ngone51 (Member, Author) commented Dec 20, 2018:

Makes sense.

// ache/hadoop/yarn/util/Apps.java#L273 for details)
if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
(true, s"Container marked as failed: $containerId$onHostStr" +
s". Exit status: ${completedContainer.getExitStatus}" +
Contributor:

Remove the extra space after ':'.

Member (Author):

okay.

// completed container from a bad node
allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt)
(true, s"Container from a bad node: $containerId$onHostStr" +
s". Exit status: ${completedContainer.getExitStatus}" +
Contributor:

Remove the extra space after ':'.

@tgravescs (Contributor):

Haven't been able to reproduce this easily; will try again today.

@Ngone51 (Member, Author) commented Dec 20, 2018

Thank you for your efforts. @tgravescs

@SparkQA commented Dec 20, 2018

Test build #100336 has finished for PR 23223 at commit da8442e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs (Contributor) left a comment:

+1. I haven't had time to try further to reproduce, but the changes make sense.

@asfgit asfgit closed this in d6a5f85 Dec 21, 2018
@tgravescs (Contributor):

Merged to master; going to merge into 2.4 if it's a clean merge.

@tgravescs (Contributor):

This doesn't cleanly merge to 2.4. @Ngone51, would you be able to put up a PR against branch-2.4?

@Ngone51 (Member, Author) commented Dec 22, 2018

Sure, thanks a lot. @tgravescs

@dongjoon-hyun (Member):

Hi, @tgravescs.
SPARK-26269 is marked as 'Improvement'.
So, are we going to backport this improvement patch to branch-2.4?

@tgravescs (Contributor):

Thanks for pointing that out; I forgot to change the type and will fix it now. I consider this a defect, as we would blacklist things when we shouldn't.

@dongjoon-hyun (Member):

+1. Thanks!

holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…r with yarn to maxmize use of cluster resource

## What changes were proposed in this pull request?

As I mentioned in jira [SPARK-26269](https://issues.apache.org/jira/browse/SPARK-26269), in order to maxmize the use of cluster resource,  this pr try to make `YarnAllocator` have the same blacklist behaviour with YARN.

## How was this patch tested?

Added.

Closes apache#23223 from Ngone51/dev-YarnAllocator-should-have-same-blacklist-behaviour-with-YARN.

Lead-authored-by: wuyi <ngone_5451@163.com>
Co-authored-by: Ngone51 <ngone_5451@163.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019