[SPARK-15273] YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user #13057

tedyu · 2016-05-11T20:22:44Z

What changes were proposed in this pull request?

As Nirav reported in this thread:
http://search-hadoop.com/m/q3RTtdF3yNLMd7u

YarnSparkHadoopUtil#getOutOfMemoryErrorArgument previously specified 'kill %p' unconditionally.
We should respect the OnOutOfMemoryError parameter given by the user.

How was this patch tested?

Existing tests


SparkQA · 2016-05-11T20:29:16Z

Test build #58397 has finished for PR 13057 at commit 1fa3634.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-11T20:39:04Z

Test build #58398 has finished for PR 13057 at commit b0a84ff.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-11T21:00:51Z

Test build #58400 has finished for PR 13057 at commit 9bf7967.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-05-12T10:19:28Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

    if (Utils.isWindows) {
      escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID %%%%p")
    } else {
-      "-XX:OnOutOfMemoryError='kill %p'"
+      val onOOME = javaOpts.find(x => x.contains("-XX:OnOutOfMemoryError"))
+      if (onOOME == None) {


There are several things wrong with this Scala code, but more generally, this isn't a good way to design this method. If you really mean to optionally add an argument, then see how things like the PermGen argument are handled and make it work more that way.

SparkQA · 2016-05-12T13:08:16Z

Test build #58483 has finished for PR 13057 at commit 044ca7e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-12T13:18:59Z

Test build #58485 has finished for PR 13057 at commit 2e05881.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-12T13:33:20Z

Test build #58486 has finished for PR 13057 at commit c7b17ed.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-12T13:37:35Z

Test build #58487 has started for PR 13057 at commit 67e1d3a.

shaneknapp · 2016-05-12T14:49:09Z

I will retrigger this build once maintenance is over.

tedyu · 2016-05-12T14:51:45Z

@srowen
Can you take another look ?

Thanks

srowen · 2016-05-12T15:18:51Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

@@ -418,11 +420,11 @@ object YarnSparkHadoopUtil {
   *
   * @return The correct OOM Error handler JVM option, platform dependent.
   */
-  def getOutOfMemoryErrorArgument: String = {
+  def getOutOfMemoryErrorArgument(sparkConf: SparkConf, javaOpts: ListBuffer[String]): String = {


This doesn't need sparkConf as an arg right? and javaOpts can just be a Seq?
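A minimal sketch of what this comment seems to be asking for, assuming the method only needs to read the options: `SeqParamSketch` and `defaultOomArgument` are hypothetical names, not the code that was merged, and `scala.collection.Seq` is spelled out so that both immutable lists and a `ListBuffer` are accepted regardless of Scala version.

```scala
object SeqParamSketch {
  // Hypothetical read-only variant: no SparkConf needed, and any Seq will do
  // because the options are only inspected, never modified.
  def defaultOomArgument(javaOpts: scala.collection.Seq[String]): Option[String] =
    if (javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {
      None // the user already supplied a handler; leave it alone
    } else {
      Some("-XX:OnOutOfMemoryError='kill %p'") // default taken from the diff above
    }
}
```

Returning an `Option` rather than an empty string is a design choice of this sketch; it lets the caller append the argument only when one is actually produced.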

shaneknapp · 2016-05-12T16:21:11Z

jenkins, test this please

shaneknapp · 2016-05-12T16:27:51Z

jenkins, test this please

SparkQA · 2016-05-12T18:28:06Z

Test build #58498 has finished for PR 13057 at commit bf06049.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tedyu · 2016-05-13T09:11:21Z

@srowen
Mind taking another look ?

Thanks

SparkQA · 2016-05-13T11:43:15Z

Test build #58562 has finished for PR 13057 at commit c5a0971.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-13T11:57:59Z

Test build #58568 has finished for PR 13057 at commit 7f05b71.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tedyu · 2016-05-13T13:57:21Z

@srowen
I think I have addressed your comments.

Cheers

tedyu · 2016-05-15T20:20:19Z

@srowen
Gentle ping.

tedyu · 2016-05-16T20:10:51Z

@srowen
Pardon for the ping.

srowen · 2016-05-17T15:44:53Z

launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java

@@ -334,6 +334,18 @@ static void addPermGenSizeOpt(List<String> cmd) {
  }

  /**
+   * Gets the OutOfMemoryError option for Spark if the user hasn't set it.
+   */
+  public static void addOutOfMemoryErrorArgument(List<String> cmd) {


Sorry, last question: why does this method need to be in this class? It's not used, it seems, except from YarnSparkHadoopUtil.

Please suggest a suitable class that would be a better host for this Java method.

Just YarnSparkHadoopUtil? It can be inlined in the one place it's called, or am I overlooking something?

YarnSparkHadoopUtil is written in Scala while this method is in Java.

Why can't it be written in Scala?

Can you take a look at my initial attempt ?

+      val onOOME = javaOpts.find(x => x.contains("-XX:OnOutOfMemoryError"))
+      if (onOOME == None) {
+        "-XX:OnOutOfMemoryError='kill %p'"
+      } else {
+        ""
+      }

Sure, maybe ...

if (!javaOpts.exists(_.contains("..."))) { javaOpts += "..." }
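The one-liner suggested above could be fleshed out roughly as follows. This is a sketch under assumptions, not the merged code: the object and method names are hypothetical, and the flag strings are the ones quoted earlier in this thread.

```scala
import scala.collection.mutable.ListBuffer

object OomOptionSketch {
  // Prefix of the JVM flag; whatever the user put after '=' is left untouched.
  val OomFlag = "-XX:OnOutOfMemoryError"

  // Hypothetical helper: appends the default 'kill %p' handler only when
  // no option mentioning the flag is already present.
  def addDefaultOomHandler(javaOpts: ListBuffer[String]): Unit = {
    if (!javaOpts.exists(_.contains(OomFlag))) {
      javaOpts += s"$OomFlag='kill %p'"
    }
  }
}
```

Compared with returning an empty string when the user has set the flag, mutating the buffer in place avoids adding a blank argument to the command line.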

SparkQA · 2016-05-17T17:47:59Z

Test build #58707 has finished for PR 13057 at commit 1ff83ff.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class CommandBuilderUtils

SparkQA · 2016-05-17T18:01:06Z

Test build #58703 has finished for PR 13057 at commit 0847e6e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-05-17T18:14:29Z

Test build #58709 has finished for PR 13057 at commit 67d5bfc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-05-18T11:12:12Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala

@@ -128,6 +128,11 @@ private[yarn] class ExecutorRunnable(
    }
  }

+  // Kill if OOM is raised - leverage yarn's failure handling to cause rescheduling.


This should be a comment on addOutOfMemoryErrorArgument, right? That's what I meant.

SparkQA · 2016-05-18T12:37:09Z

Test build #58778 has finished for PR 13057 at commit 09fcb1e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tedyu · 2016-05-19T08:12:14Z

@srowen
See if I have addressed all your comments.

tedyu · 2016-05-20T12:04:29Z

@srowen
Gentle ping.

srowen · 2016-05-20T13:19:12Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala

    if (Utils.isWindows) {
-      escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID %%%%p")
+      if (!javaOpts.exists(_.contains("-XX:OnOutOfMemoryError"))) {


OK, but this duplicates the condition. You probably want to flip the nesting of the if conditions here to avoid it.
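One way to read this comment is to test for a user-supplied handler once and branch on the platform inside that check. The sketch below assumes that reading; `LiftedCheckSketch`, `addOomHandler`, and the `isWindows` parameter are hypothetical (the real code consults `Utils.isWindows` and wraps the Windows value in `escapeForShell`, which is omitted here).

```scala
import scala.collection.mutable.ListBuffer

object LiftedCheckSketch {
  val OomFlag = "-XX:OnOutOfMemoryError"

  // The "has the user set it?" test appears once, as the outer condition,
  // instead of being repeated in both platform branches.
  def addOomHandler(javaOpts: ListBuffer[String], isWindows: Boolean): Unit = {
    if (!javaOpts.exists(_.contains(OomFlag))) {
      if (isWindows) {
        // Shell escaping from the original diff is left out of this sketch.
        javaOpts += s"$OomFlag=taskkill /F /PID %%%%p"
      } else {
        javaOpts += s"$OomFlag='kill %p'"
      }
    }
  }
}
```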

SparkQA · 2016-05-20T14:04:42Z

Test build #58995 has finished for PR 13057 at commit e7c4472.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

tedyu · 2016-05-20T22:32:55Z

@srowen :
Is this ready to go in ?

Thanks

srowen · 2016-05-20T22:39:01Z

@tedyu yes, but that's a lot of pinging. I think many iterations on this simple change could have been saved, so I'd focus not on hurrying to merge, but thinking through the feedback and changes you're making more holistically.

…respect OnOutOfMemoryError parameter given by user

## What changes were proposed in this pull request?

As Nirav reported in this thread:
http://search-hadoop.com/m/q3RTtdF3yNLMd7u

YarnSparkHadoopUtil#getOutOfMemoryErrorArgument previously specified 'kill %p' unconditionally.
We should respect the parameter given by user.

## How was this patch tested?

Existing tests

Author: tedyu <yuzhihong@gmail.com>

Closes #13057 from tedyu/master.

(cherry picked from commit 06c9f52)
Signed-off-by: Sean Owen <sowen@cloudera.com>

srowen · 2016-05-20T23:13:45Z

Merged to master/2.0

YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user

1fa3634

tedyu changed the title ~~YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user~~ [SPARK-15273] YarnSparkHadoopUtil#getOutOfMemoryErrorArgument should respect OnOutOfMemoryError parameter given by user May 11, 2016

Add import

b0a84ff

Remove extraneous equal sign

9bf7967

srowen reviewed May 12, 2016

tedyu added 2 commits May 12, 2016 05:57

Address Sean's comment

044ca7e

Add missing import

3963acd

Add missing import

2e05881

Make getOutOfMemoryErrorArgument() public

c7b17ed

Make CommandBuilderUtils public

67e1d3a

srowen reviewed May 12, 2016

Remove sparkConf parameter

bf06049

tedyu added 2 commits May 13, 2016 02:42

Return user specified OnOutOfMemoryError argument

c5a0971

Address Sean's comment

7f05b71

srowen reviewed May 17, 2016

tedyu added 2 commits May 17, 2016 08:59

Address Sean's comments

0847e6e

Address Sean's comment

1ff83ff

Correct syntax

67d5bfc

srowen reviewed May 18, 2016

Address Sean's comments

09fcb1e

srowen reviewed May 20, 2016

Lift common if check

e7c4472

asfgit closed this in 06c9f52 May 20, 2016
