
[SPARK-22941][core] Do not exit JVM when submit fails with in-process launcher. #20925

Closed
wants to merge 4 commits

Conversation

@vanzin (Contributor) commented Mar 28, 2018

The current in-process launcher implementation just calls the SparkSubmit
object, which, in case of errors, will more often than not exit the JVM.
This is not desirable since this launcher is meant to be used inside other
applications, and that would kill the application.

The change turns SparkSubmit into a class, and abstracts away some of
the functionality used to print error messages and abort the submission
process. The default implementation uses the logging system for messages,
and throws exceptions for errors. As part of that I also moved some code
that doesn't really belong in SparkSubmit to a better location.

The command line invocation of spark-submit now uses a special implementation
of the SparkSubmit class that overrides those behaviors to do what is expected
from the command line version (print to the terminal, exit the JVM, etc).
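
In rough terms, the split looks like the hedged sketch below (illustrative names only, not the actual Spark sources):

```scala
// The in-process default reports through the logging system and signals
// failures with exceptions; the command-line entry point overrides that
// behaviour to print to the terminal and exit the JVM.
class SparkSubmit {
  protected def logInfo(msg: String): Unit =
    org.slf4j.LoggerFactory.getLogger(getClass).info(msg)

  // Abort the submission without killing the hosting JVM.
  protected def error(msg: String): Unit =
    throw new IllegalArgumentException(msg)

  def doSubmit(args: Array[String]): Unit = {
    if (args.isEmpty) error("No application resource specified")
    // ... parse arguments and run the requested submission action ...
  }
}

// Command-line wrapper: keeps the old terminal-and-exit behaviour.
object SparkSubmitCli {
  def main(args: Array[String]): Unit = {
    val submit = new SparkSubmit {
      override protected def logInfo(msg: String): Unit = Console.err.println(msg)
      override protected def error(msg: String): Unit = {
        Console.err.println(s"Error: $msg")
        sys.exit(1)
      }
    }
    submit.doSubmit(args)
  }
}
```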

A lot of the changes are to replace calls to methods such as "printErrorAndExit"
with the new API.

As part of adding tests for this, I had to fix some small things in the
launcher option parser so that things like "--version" can work when
used in the launcher library.

There is still code that prints directly to the terminal, like all the
Ivy-related code in SparkSubmitUtils, and other areas where some refactoring
would help, like the CommandLineUtils class, but I chose to leave those
alone to keep this change more focused.

Aside from existing and added unit tests, I ran command line tools with
a bunch of different arguments to make sure messages and errors behave
like before.

@SparkQA commented Mar 29, 2018

Test build #88675 has finished for PR 20925 at commit 466f84a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • logInfo(s"Failed to load main class $childMainClass.")
  • error(s"Cannot load main class from JAR $primaryResource")
  • error("No main class set in JAR; please specify one with --class")

@attilapiros (Contributor) left a comment

It is a lot of changes to check at once. I will continue the review later to go a bit deeper.

@@ -289,27 +288,26 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
}
Contributor

This might be a good candidate to use your new error method instead of throwing the exception directly. It might happen that there is a client catching both Exception and SparkException and doing very different things, but I guess that is a very unlikely case.

@@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
}

private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit = {
Contributor

Consider renaming the method. What about printUsageAndThrowException?

Contributor Author

The intent is to "exit" the submission process (even if there's no actual exit in some cases). A different name would also feel weird given the "exitCode" parameter, so even if not optimal I prefer the current name.

@@ -88,7 +88,8 @@
SparkLauncher.NO_RESOURCE);
}

final List<String> sparkArgs;
final List<String> userArgs;
Contributor

Consider making these private and accessing them via methods.

Contributor Author

That's overkill for final fields, even more so if those fields are package-private.

boolean isExample = false;
List<String> submitArgs = args;
this.userArgs = null;
Contributor

Consider Collections.emptyList(). I see these two constructors cover two different use cases. An abstract base class with two derived classes could express these two use cases better, but I know that is out of scope for now. Does it make sense to create a JIRA ticket for refactoring this?

Contributor Author

If you want to take a stab at refactoring... I'm not so sure you'd be able to make things much better though, since the parameters just control shared logic that is applied later.

@@ -400,6 +419,11 @@ private boolean isThriftServer(String mainClass) {
private class OptionParser extends SparkSubmitOptionParser {

boolean isAppResourceReq = true;
boolean errorOnUnknownArgs;
Contributor

private

@SparkQA commented Apr 7, 2018

Test build #89002 has finished for PR 20925 at commit d208e33.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@squito (Contributor) left a comment

Just a couple of things to clarify or add comments to.

@@ -775,17 +781,17 @@ class SparkSubmitSuite
}

test("SPARK_CONF_DIR overrides spark-defaults.conf") {
forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
forConfDir(Map("spark.executor.memory" -> "3g")) { path =>
Contributor

Why this change? Do you no longer support fractional values?

Contributor Author

It's just that now, instead of only printing an error to the output, an exception is actually thrown.

[info] SparkSubmitSuite:
[info] - SPARK_CONF_DIR overrides spark-defaults.conf *** FAILED *** (144 milliseconds)
[info]   org.apache.spark.SparkException: Executor Memory cores must be a positive number
[info]   at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:652)
[info]   at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:267)

That is because:

scala> c.set("spark.abcde", "2.3g")
res0: org.apache.spark.SparkConf = org.apache.spark.SparkConf@cd5ff55

scala> c.getSizeAsBytes("spark.abcde")
java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
Fractional values are not supported. Input was: 2.3

Just noticed that the error message is kinda wrong, but also this whole validation function (validateSubmitArguments) leaves a lot to be desired...

@@ -154,9 +165,17 @@

List<String> buildSparkSubmitArgs() {
List<String> args = new ArrayList<>();
SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
OptionParser parser = new OptionParser(false);
Contributor

What's the reason for allowing unknown args here? Is it so an old launcher can start a newer Spark, which may accept more args? A comment would be helpful.

Contributor Author

This is explained in the public API (addSparkArg, which is declared in AbstractLauncher).

// the user is trying to run, so that checks below are correct.
if (!userArgs.isEmpty()) {
parser.parse(userArgs);
isStartingApp = parser.isAppResourceReq;
Contributor

I don't really care whether the name is isStartingApp or isAppResourceReq, but it seems the name should be the same here and in OptionParser, unless there is some difference I'm missing.

@attilapiros (Contributor)

I have finished my review and have not found any additional issues.

LGTM

@squito (Contributor) commented Apr 9, 2018

lgtm assuming tests pass

@SparkQA commented Apr 9, 2018

Test build #89074 has finished for PR 20925 at commit 262bad8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito (Contributor) commented Apr 10, 2018

Flaky test I've seen before: https://issues.apache.org/jira/browse/SPARK-23894

Jenkins, retest this please

@SparkQA commented Apr 10, 2018

Test build #4150 has finished for PR 20925 at commit 262bad8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito (Contributor) commented Apr 11, 2018

Another unrelated flaky test; I filed https://issues.apache.org/jira/browse/SPARK-23962

@squito (Contributor) commented Apr 11, 2018

merged to master

@asfgit closed this in 3cb8204 on Apr 11, 2018
@vanzin deleted the SPARK-22941 branch on April 13, 2018 at 21:21
asfgit pushed a commit that referenced this pull request Jun 14, 2018
…class is required.

## What changes were proposed in this pull request?

With [PR 20925](#20925) it is no longer possible to execute the following commands:
* run-example
* run-example --help
* run-example --version
* run-example --usage-error
* run-example --status ...
* run-example --kill ...

This PR allows execution of the mentioned commands again.

## How was this patch tested?

Existing unit tests were extended, and additional ones were written.

Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>

Closes #21450 from gaborgsomogyi/SPARK-24319.
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ssDefFoundError in SparkSubmit to Error

## What changes were proposed in this pull request?

In my local setup, I set the log4j root category to ERROR (https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console, the first result when searching for "set spark log level"). When I run a command such as
```
spark-submit --class foo bar.jar
```
Nothing shows up, and the script exits.

After a quick investigation, I think the log level for ClassNotFoundException/NoClassDefFoundError in SparkSubmit should be ERROR instead of WARN, since the whole process exits because of the exception/error.

Before apache#20925, the message is not controlled by `log4j.rootCategory`.
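
A hedged sketch of the proposed behaviour (illustrative code, not the actual SparkSubmit diff; it assumes the slf4j logger used throughout Spark):

```scala
import org.slf4j.LoggerFactory

object MainClassLoading {
  private val log = LoggerFactory.getLogger(getClass)

  // Load the user's main class and report failures at ERROR, so the message
  // is still visible when log4j.rootCategory is set to ERROR.
  def loadMainClass(name: String): Option[Class[_]] =
    try {
      Some(Class.forName(name))
    } catch {
      case e: ClassNotFoundException =>
        log.error(s"Failed to load class $name.", e)
        None
      case e: NoClassDefFoundError =>
        log.error(s"Failed to load $name; a required dependency may be missing.", e)
        None
    }
}
```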

## How was this patch tested?

Manual check.

Closes apache#23189 from gengliangwang/changeLogLevel.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
dongjoon-hyun added a commit that referenced this pull request Sep 13, 2023
### What changes were proposed in this pull request?

This PR aims to remove the deprecated Java `System.setSecurityManager` usage for Apache Spark 4.0.

Note that this is the only usage in Apache Spark AS-IS code.
```
$ git grep setSecurityManager
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:      System.setSecurityManager(sm)
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:      System.setSecurityManager(currentSm)
```
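
For context, the pattern being removed looks roughly like the hedged sketch below (illustrative, not the exact Spark code): install a SecurityManager that intercepts System.exit around code that would otherwise terminate the JVM, then restore the previous manager.

```scala
// Note: SecurityManager and System.setSecurityManager are deprecated and
// disallowed by default on recent JDKs, which is why this usage is removed.
def withExitIntercepted(body: => Unit): Unit = {
  val currentSm = System.getSecurityManager
  val sm = new SecurityManager {
    override def checkExit(status: Int): Unit =
      throw new SecurityException(s"intercepted System.exit($status)")
    // Allow everything else.
    override def checkPermission(perm: java.security.Permission): Unit = ()
  }
  System.setSecurityManager(sm)
  try {
    body
  } catch {
    case _: SecurityException => // the wrapped code tried to exit; ignore
  } finally {
    System.setSecurityManager(currentSm)
  }
}
```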

This usage was added in `Apache Spark 1.5.0`.

- #5841

Since `Apache Spark 2.4.0`, we don't need `setSecurityManager` due to the following improvement.

- #20925

### Why are the changes needed?

```
$ java -version
openjdk version "21-ea" 2023-09-19
OpenJDK Runtime Environment (build 21-ea+32-2482)
OpenJDK 64-Bit Server VM (build 21-ea+32-2482, mixed mode, sharing)
```

```
max spark-3.5.0-bin-hadoop3:$ bin/spark-sql --help
...
CLI options:
Exception in thread "main" java.lang.UnsupportedOperationException:
The Security Manager is deprecated and will be removed in a future release
  at java.base/java.lang.System.setSecurityManager(System.java:429)
  at org.apache.spark.deploy.SparkSubmitArguments.getSqlShellOptions(SparkSubmitArguments.scala:623)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.
```
$ build/sbt test:package -Phive -Phive-thriftserver
$ bin/spark-sql --help
...
CLI options:
 -d,--define <key=value>          Variable substitution to apply to Hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to Hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42901 from dongjoon-hyun/SPARK-45147.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023