[SPARK-26269][YARN] YarnAllocator should have same blacklist behaviour with YARN to maximize use of cluster resource #23223
Conversation
ping @attilapiros @vanzin @jerryshao for kind review.
@Ngone51 I read your change, but I would like to understand whether the intention of this change is right.
I mean, if node blacklisting in Spark were perfectly aligned with YARN, then it would be redundant to have it in Spark in the first place. Take disk failure, for example. According to the design of Spark blacklisting:
"Resources are blacklisted even if they are only flaky (not just when they are completely unusable), eg. because of one bad disk out of many."
This could be the intended design only for task-level blacklisting.
This is why I am a bit hesitant, so it would be nice to support it with some concrete case where this really helps and is needed.
Ping @squito, @tgravescs
@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
        val message = "Container killed by YARN for exceeding physical memory limits. " +
          s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
        (true, message)
      case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
        (true, "Container marked as failed: " + containerId + onHostStr +
Nit: Use string interpolation.
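The nit refers to Scala string interpolation. A sketch of the suggested change (the message tail is assumed from later revisions of the same diff):

```scala
// Before: plain concatenation
// (true, "Container marked as failed: " + containerId + onHostStr + ...)

// After: an s-interpolated string, matching the style of the
// surrounding messages in YarnAllocator
(true, s"Container marked as failed: $containerId$onHostStr" +
  s". Exit status: ${completedContainer.getExitStatus}")
```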
Regarding "some concrete case where this really helps and is needed":
There is a 1K-node cluster, and jobs have performance degradation because of a single node. It's rather hard to convince Cluster Ops to decommission a node because of "performance degradation". Imagine 10 dev teams chasing a single ops team, whether for a valid reason (the node has problems), or because the code has a bug, or the data is skewed, or there are spots on the sun.
Simple solution:
- Rerun the failed/delayed job and blacklist the "problematic" node.
- Report the problem if the job then works without anomalies.
Results:
- Ops are not spammed with weird requests from devs.
- Devs are not blocked because of a really bad node.
@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
        val message = "Container killed by YARN for exceeding physical memory limits. " +
          s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
        (true, message)
      case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
I would prefer not to have it as a separate case, but just a new if around handleResourceAllocationFailure.
And since NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS is introduced, it would make sense to separate it from the huge match on exitStatus. This way it would be easier to follow when it is really triggered (one should not have to check all the previous case branches before considering this condition with contains). That way, values like ContainerExitStatus.SUCCESS from the set would really be used.
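A rough sketch of the restructuring being suggested, with names taken from the diff (the exact surrounding code and message are assumptions):

```scala
// Hypothetical sketch: handle the by-design exit statuses up front,
// instead of as one more case branch in the large match on exitStatus.
if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exitStatus)) {
  // Neither the app's nor the system's fault: do not blacklist the
  // node, and skip handleResourceAllocationFailure entirely.
  (false, s"Container marked as failed: $containerId$onHostStr" +
    s". Exit status: $exitStatus")
} else {
  exitStatus match {
    // ... existing case branches for memory limits, preemption, etc. ...
  }
}
```

This keeps the set's membership check in one visible place, so values like ContainerExitStatus.SUCCESS in the set are actually consulted rather than shadowed by earlier case branches.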
yeah, I agree this should be cleaned up; we already handle cases above that are in the NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS set.
also after this gets rearranged, I'd leave a comment in here pointing to the code in hadoop you linked to on the jira.
Updated. But I'm not sure about this part: "That way values like ContainerExitStatus.SUCCESS from the set would be really used." @attilapiros
val containerId = ContainerId.newContainerId(appAttemptId, containerNum)
def createContainer(
    host: String,
    containerId: ContainerId = ContainerId.newContainerId(appAttemptId, containerNum),
Just containerNumber as a parameter, with a default value of containerNum?
Good idea.
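The suggestion, roughly: instead of defaulting the whole ContainerId, take just the container number and build the id inside the test helper. A sketch (the return type and body are assumptions about the test code):

```scala
// Hypothetical sketch of the simplified test helper: callers pass only
// a number, and the ContainerId is constructed in one place.
def createContainer(
    host: String,
    containerNumber: Int = containerNum): Container = {
  val containerId = ContainerId.newContainerId(appAttemptId, containerNumber)
  // ... build and return the mock Container for this host ...
}
```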
ok to test
the approach here makes sense. Are you seeing actual cases where this blacklists when it shouldn't? I could see that being possible, and if so we should move this to a defect and make sure it goes into 2.4.1
Test build #99733 has finished for PR 23223 at commit
like @tgravescs said, I'm curious if you have cases where this is going wrong currently; if so, it should probably be a bug fix targeted at 2.4.1 as well.
val cs7 = ContainerStatus.newInstance(containers(7).getId, ContainerState.COMPLETE,
  "aborted", ContainerExitStatus.ABORTED)
val cs8 = ContainerStatus.newInstance(containers(8).getId, ContainerState.COMPLETE,
  "disk_failed", ContainerExitStatus.DISKS_FAILED)
just a suggestion, you can avoid some repetition here
val nonBlacklistedStatuses = Seq(ContainerExitStatus.SUCCESS, ..., ContainerExitStatus.DISKS_FAILED)
val containerStatuses = nonBlacklistedStatuses.zipWithIndex.map { case (state, idx) =>
  ContainerStatus.newInstance(containers(idx).getId, ContainerState.COMPLETE, "diagnostics", state)
}
Nice suggestion!
I'm not super familiar with exactly how the blacklisting works in YARN itself -- it looks like it's only going to blacklist the node for the AM, not other nodes for general containers. I don't totally follow where the
This change seems to result in Spark's node blacklisting being perfectly aligned with YARN, but my original thought was that some exit statuses (e.g. KILLED_BY_RESOURCEMANAGER) currently should not lead to node blacklisting. So, actually, being perfectly aligned with YARN is not the real target of this change, and we can also make some custom strategy for Spark.
As for Spark task-level blacklisting, should it be delegated to schedulerBlacklist in YarnAllocatorBlacklistTracker? And it seems ContainerExitStatus.DISKS_FAILED in YARN is not the same as Spark tasks' disk failures.
Unfortunately, no. @tgravescs @squito
@squito YARN has a blacklist for the AM when the config
Test build #99757 has finished for PR 23223 at commit
if you aren't seeing actual issues with this, I guess it would be interesting to test it further to see if they occur. I can see Spark blacklisting when it shouldn't for exit codes like the one you mention (KILLED_BY_RESOURCEMANAGER). So I would like to see someone test this further and determine whether that happens. If it does, we should change this to a bug and put it into 2.4.1.
@tgravescs Yeah, I have the same thought. I'd like to try it, but I can't guarantee that I can achieve it... because I have never done this kind of test before. I'll try my best.
Hi @tgravescs, I tried it, but found it's difficult to produce the KILLED_BY_RESOURCEMANAGER exit status. I followed YARN-73 and YARN-495, but things didn't go as I expected.
ok, thanks for trying. If I get a chance I can try later in the week, but that doesn't have to block this now if someone else has time to review before I get to it. We can always pull it back later.
sorry for the delay on this, will try it out today
// oop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java#L273 for details)
if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
  (true, s"Container marked as failed: $containerId$onHostStr" +
I think we want to return false here for exitCausedByApp, since these don't seem to be issues with the app. From the comment in the YARN code:
// Neither the app's fault nor the system's fault. This happens by design,
// so no need for skipping nodes
If we mark it as true then it counts against our task failures; if it's false, it doesn't. It seems like these shouldn't count against our failures.
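Concretely, the suggestion is to flip the first element of the returned tuple, which is the exitCausedByApp flag (a sketch based on the diff above; the message text mirrors the existing one):

```scala
// exitCausedByApp = false: by-design exits (neither the app's nor the
// system's fault, per the YARN comment) should not count against the
// application's executor-failure budget.
(false, s"Container marked as failed: $containerId$onHostStr" +
  s". Exit status: ${completedContainer.getExitStatus}")
```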
Makes sense.
// ache/hadoop/yarn/util/Apps.java#L273 for details)
if (NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(other_exit_status)) {
  (true, s"Container marked as failed: $containerId$onHostStr" +
    s". Exit status: ${completedContainer.getExitStatus}" +
remove extra space after :
okay.
// completed container from a bad node
allocatorBlacklistTracker.handleResourceAllocationFailure(hostOpt)
(true, s"Container from a bad node: $containerId$onHostStr" +
  s". Exit status: ${completedContainer.getExitStatus}" +
remove extra space after :
haven't been able to easily reproduce, will try again today
Thank you for your efforts. @tgravescs
Test build #100336 has finished for PR 23223 at commit
+1. I haven't had time to try further to reproduce, but the changes make sense.
merged to master, going to merge into 2.4 if it's a clean merge
this doesn't cleanly merge to 2.4, @Ngone51 would you be able to put up a PR against branch-2.4?
sure, thanks a lot. @tgravescs
Hi, @tgravescs .
thanks for pointing that out, I forgot to change the type, will fix it now. I consider this a defect as we would blacklist things when we shouldn't.
+1. Thanks!
…r with yarn to maxmize use of cluster resource

## What changes were proposed in this pull request?

As I mentioned in jira [SPARK-26269](https://issues.apache.org/jira/browse/SPARK-26269), in order to maxmize the use of cluster resource, this pr try to make `YarnAllocator` have the same blacklist behaviour with YARN.

## How was this patch tested?

Added.

Closes apache#23223 from Ngone51/dev-YarnAllocator-should-have-same-blacklist-behaviour-with-YARN.

Lead-authored-by: wuyi <ngone_5451@163.com>
Co-authored-by: Ngone51 <ngone_5451@163.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
What changes were proposed in this pull request?
As I mentioned in JIRA SPARK-26269, in order to maximize the use of cluster resources, this PR tries to make `YarnAllocator` have the same blacklist behaviour as YARN.
How was this patch tested?
Tests were added.