[SPARK-22941][core] Do not exit JVM when submit fails with in-process launcher. #20925
Conversation
… launcher.

The current in-process launcher implementation just calls the SparkSubmit object, which, in case of errors, will more often than not exit the JVM. This is not desirable since this launcher is meant to be used inside other applications, and that would kill the application.

The change turns SparkSubmit into a class, and abstracts away some of the functionality used to print error messages and abort the submission process. The default implementation uses the logging system for messages, and throws exceptions for errors. As part of that I also moved some code that doesn't really belong in SparkSubmit to a better location.

The command line invocation of spark-submit now uses a special implementation of the SparkSubmit class that overrides those behaviors to do what is expected from the command line version (print to the terminal, exit the JVM, etc.). A lot of the changes are to replace calls to methods such as "printErrorAndExit" with the new API.

As part of adding tests for this, I had to fix some small things in the launcher option parser so that things like "--version" can work when used in the launcher library.

There is still code that prints directly to the terminal, like all the Ivy-related code in SparkSubmitUtils, and other areas where some refactoring would help, like the CommandLineUtils class, but I chose to leave those alone to keep this change more focused.

Aside from existing and added unit tests, I ran command line tools with a bunch of different arguments to make sure messages and errors behave like before.
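The overridable error-reporting pattern the description outlines can be sketched roughly as below. This is a minimal illustration, not Spark's actual API: the class and method names (`SparkSubmitSketch`, `CommandLineSubmit`, `submit`) are made up for the example.

```java
// Hypothetical sketch of the pattern described above: a base class whose
// error hook throws (safe for in-process/library use), and a command-line
// subclass that overrides the hook to print and exit the JVM instead.
class SparkSubmitSketch {
    // Library (in-process) default: report errors via exception so the
    // embedding application's JVM is never killed.
    protected void error(String msg) {
        throw new IllegalArgumentException(msg);
    }

    String submit(String appResource) {
        if (appResource == null) {
            error("missing application resource");
        }
        return appResource;
    }
}

// The command-line entry point overrides the hook to behave like the
// traditional spark-submit script: print to the terminal and exit.
class CommandLineSubmit extends SparkSubmitSketch {
    @Override
    protected void error(String msg) {
        System.err.println("Error: " + msg);
        System.exit(1);
    }
}
```

An embedding application simply catches the exception; only the command-line subclass ever calls `System.exit`.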
Test build #88675 has finished for PR 20925 at commit
This is a lot of changes to check at once. I will continue the review later and go a bit deeper.
@@ -289,27 +288,26 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
}
This might be a good candidate for your new error method instead of throwing the exception directly. It might happen that a client is catching both Exception and SparkException and doing very different things, but I guess that is a very unlikely case.
@@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
}

private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit = {
Consider renaming the method. What about printUsageAndThrowException?
The intent is to "exit" the submission process (even if there's no actual "exit" in some cases). A different name would also feel weird given the "exitCode" parameter. So even if not optimal, I prefer the current name.
@@ -88,7 +88,8 @@
SparkLauncher.NO_RESOURCE);
}

final List<String> sparkArgs;
final List<String> userArgs;
Consider making it private and accessing via methods.
That's overkill for final fields. Even more so if those fields are package-private.
boolean isExample = false;
List<String> submitArgs = args;
this.userArgs = null;
Consider Collections.emptyList(). I see these two constructors cover two different use cases. An abstract base class with two derived classes could express these two use cases better, but I know it is out of scope for now. Does it make sense to create a Jira ticket for refactoring this?
If you want to take a stab at refactoring... I'm not so sure you'd be able to make things much better, though, since the parameters just control shared logic that is applied later.
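The reviewer's suggestion above, preferring `Collections.emptyList()` over `null`, could look roughly like this. The class and field names are hypothetical stand-ins for the launcher's real ones:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: back the "no user args" constructor with an empty
// list instead of null, so later checks like userArgs.isEmpty() never need
// a null guard. Two constructors cover the two construction use cases.
class SubmitCommandSketch {
    final List<String> userArgs;
    final boolean isExample;

    // Use case 1: built from raw spark-submit command-line arguments.
    SubmitCommandSketch() {
        this.userArgs = Collections.emptyList();
        this.isExample = false;
    }

    // Use case 2: built programmatically with explicit user arguments.
    SubmitCommandSketch(List<String> userArgs) {
        this.userArgs = userArgs;
        this.isExample = false;
    }
}
```

With an empty list, both constructors produce an object whose fields can be read uniformly, which is the main point of the suggestion.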
@@ -400,6 +419,11 @@ private boolean isThriftServer(String mainClass) {
private class OptionParser extends SparkSubmitOptionParser {

boolean isAppResourceReq = true;
boolean errorOnUnknownArgs;
private
Test build #89002 has finished for PR 20925 at commit
just a couple things to clarify or add comments to
@@ -775,17 +781,17 @@ class SparkSubmitSuite
}

test("SPARK_CONF_DIR overrides spark-defaults.conf") {
-  forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
+  forConfDir(Map("spark.executor.memory" -> "3g")) { path =>
why this change? you no longer support fractional values?
It's just that now an exception is actually thrown, instead of only printing an error to the output.
[info] SparkSubmitSuite:
[info] - SPARK_CONF_DIR overrides spark-defaults.conf *** FAILED *** (144 milliseconds)
[info] org.apache.spark.SparkException: Executor Memory cores must be a positive number
[info] at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:652)
[info] at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:267)
That is because:
scala> c.set("spark.abcde", "2.3g")
res0: org.apache.spark.SparkConf = org.apache.spark.SparkConf@cd5ff55
scala> c.getSizeAsBytes("spark.abcde")
java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
Fractional values are not supported. Input was: 2.3
Just noticed that the error message is kinda wrong, but also this whole validation function (validateSubmitArguments) leaves a lot to be desired...
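The suffixed-integer behavior described above (a value like "3g" parses, "2.3g" does not) can be approximated with a small parser. This sketch mirrors the observed behavior of `SparkConf.getSizeAsBytes`, not Spark's implementation; `SizeParserSketch` is a made-up name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: accept integer sizes with an optional binary-unit suffix
// (b, k, m, g, t, p), reject anything else, including fractional values.
class SizeParserSketch {
    private static final Pattern SIZE = Pattern.compile("([0-9]+)([bkmgtp]?)");

    static long parseSizeAsBytes(String size) {
        Matcher m = SIZE.matcher(size.trim().toLowerCase());
        if (!m.matches()) {
            // "2.3g" falls through to here because the digits group
            // does not allow a decimal point.
            throw new NumberFormatException(
                "Fractional values are not supported. Input was: " + size);
        }
        long value = Long.parseLong(m.group(1));
        // Each suffix step is a factor of 1024, i.e. a shift of 10 bits:
        // b -> 0, k -> 10, m -> 20, g -> 30, t -> 40, p -> 50.
        String suffix = m.group(2).isEmpty() ? "b" : m.group(2);
        int shift = "bkmgtp".indexOf(suffix) * 10;
        return value << shift;
    }
}
```

So "3g" yields 3 * 1024^3 bytes, while "2.3g" raises the fractional-value error quoted in the REPL session above.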
@@ -154,9 +165,17 @@
List<String> buildSparkSubmitArgs() {
List<String> args = new ArrayList<>();
-  SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+  OptionParser parser = new OptionParser(false);
What's the reason for allowing unknown args here? Is it so an old launcher can start a newer Spark, which may accept more args? A comment would be helpful.
This is explained in the public API (addSparkArg, which is declared in AbstractLauncher).
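The `errorOnUnknownArgs` flag discussed in this thread can be sketched as follows: the same parser either rejects unknown options (strict command-line use) or keeps them for pass-through, so options added by a newer Spark can still be forwarded. Names here are illustrative, not the launcher's real API:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a parser whose behavior on unrecognized options is
// configurable. Strict mode throws; lenient mode collects the option so
// it can be passed through to the spark-submit invocation unchanged.
class OptionParserSketch {
    private final boolean errorOnUnknownArgs;
    final List<String> passthrough = new ArrayList<>();

    OptionParserSketch(boolean errorOnUnknownArgs) {
        this.errorOnUnknownArgs = errorOnUnknownArgs;
    }

    void parse(List<String> args) {
        // A tiny stand-in for the real option table.
        List<String> known =
            java.util.Arrays.asList("--class", "--master", "--deploy-mode");
        for (String arg : args) {
            if (!known.contains(arg)) {
                handleUnknown(arg);
            }
        }
    }

    private void handleUnknown(String opt) {
        if (errorOnUnknownArgs) {
            throw new IllegalArgumentException("Unrecognized option: " + opt);
        }
        passthrough.add(opt); // keep it; a newer Spark may understand it
    }
}
```

This is why `buildSparkSubmitArgs()` above constructs its parser with `false`: arguments added via addSparkArg must survive even when this launcher version does not recognize them.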
// the user is trying to run, so that checks below are correct.
if (!userArgs.isEmpty()) {
parser.parse(userArgs);
isStartingApp = parser.isAppResourceReq;
I don't really care whether the name is isStartingApp or isAppResourceReq, but it seems the name should be the same here and in OptionParser, unless there is some difference I'm missing.
I have finished my review and have not found any additional issue. LGTM
lgtm assuming tests pass
Test build #89074 has finished for PR 20925 at commit
Flaky test I've seen before: https://issues.apache.org/jira/browse/SPARK-23894 Jenkins, retest this please
Test build #4150 has finished for PR 20925 at commit
another unrelated flaky test, I filed https://issues.apache.org/jira/browse/SPARK-23962
merged to master
…class is required.

## What changes were proposed in this pull request?

With [PR 20925](#20925) now it's not possible to execute the following commands:
* run-example
* run-example --help
* run-example --version
* run-example --usage-error
* run-example --status ...
* run-example --kill ...

In this PR the execution will be allowed for the mentioned commands.

## How was this patch tested?

Existing unit tests extended + additional written.

Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>

Closes #21450 from gaborgsomogyi/SPARK-24319.
…ssDefFoundError in SparkSubmit to Error

## What changes were proposed in this pull request?

In my local setup, I set log4j root category as ERROR (https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console , first item show up if we google search "set spark log level".) When I run such command

```
spark-submit --class foo bar.jar
```

Nothing shows up, and the script exits.

After quick investigation, I think the log level for ClassNotFoundException/NoClassDefFoundError in SparkSubmit should be ERROR instead of WARN. Since the whole process exit because of the exception/error. Before apache#20925, the message is not controlled by `log4j.rootCategory`.

## How was this patch tested?

Manual check.

Closes apache#23189 from gengliangwang/changeLogLevel.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
### What changes were proposed in this pull request?

This PR aims to remove the deprecated Java `System.setSecurityManager` usage for Apache Spark 4.0. Note that this is the only usage in Apache Spark AS-IS code.

```
$ git grep setSecurityManager
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:        System.setSecurityManager(sm)
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:      System.setSecurityManager(currentSm)
```

This usage was added at `Apache Spark 1.5.0`.
- #5841

Since `Apache Spark 2.4.0`, we don't need `setSecurityManager` due to the following improvement.
- #20925

### Why are the changes needed?

```
$ java -version
openjdk version "21-ea" 2023-09-19
OpenJDK Runtime Environment (build 21-ea+32-2482)
OpenJDK 64-Bit Server VM (build 21-ea+32-2482, mixed mode, sharing)
```

```
max spark-3.5.0-bin-hadoop3:$ bin/spark-sql --help
...
CLI options:
Exception in thread "main" java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
	at java.base/java.lang.System.setSecurityManager(System.java:429)
	at org.apache.spark.deploy.SparkSubmitArguments.getSqlShellOptions(SparkSubmitArguments.scala:623)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

```
$ build/sbt test:package -Phive -Phive-thriftserver
$ bin/spark-sql --help
...
CLI options:
 -d,--define <key=value>          Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
    --database <databasename>    Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>  Use value for given property
    --hivevar <key=value>        Variable substitution to apply to Hive commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42901 from dongjoon-hyun/SPARK-45147.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>