[SPARK-22941][core] Do not exit JVM when submit fails with in-process launcher. #20925
Conversation
… launcher.

The current in-process launcher implementation just calls the SparkSubmit object, which, in case of errors, will more often than not exit the JVM. This is not desirable since this launcher is meant to be used inside other applications, and that would kill the application.

The change turns SparkSubmit into a class, and abstracts away some of the functionality used to print error messages and abort the submission process. The default implementation uses the logging system for messages, and throws exceptions for errors. As part of that I also moved some code that doesn't really belong in SparkSubmit to a better location.

The command line invocation of spark-submit now uses a special implementation of the SparkSubmit class that overrides those behaviors to do what is expected from the command line version (print to the terminal, exit the JVM, etc.). A lot of the changes are to replace calls to methods such as "printErrorAndExit" with the new API.

As part of adding tests for this, I had to fix some small things in the launcher option parser so that things like "--version" can work when used in the launcher library.

There is still code that prints directly to the terminal, like all the Ivy-related code in SparkSubmitUtils, and other areas where some refactoring would help, like the CommandLineUtils class, but I chose to leave those alone to keep this change more focused.

Aside from existing and added unit tests, I ran command line tools with a bunch of different arguments to make sure messages and errors behave like before.
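The overridable error-reporting pattern the description outlines can be sketched roughly as below. This is a minimal illustration, not Spark's actual API: the class and method names (`SparkSubmitSketch`, `CommandLineSubmit`, `submit`) are made up for the example.

```java
// Hypothetical sketch of the pattern described above: a base class whose
// error hook throws (safe for in-process/library use), and a command-line
// subclass that overrides the hook to print and exit the JVM instead.
class SparkSubmitSketch {
    // Library (in-process) default: report errors via exception so the
    // embedding application's JVM is never killed.
    protected void error(String msg) {
        throw new IllegalArgumentException(msg);
    }

    String submit(String appResource) {
        if (appResource == null) {
            error("missing application resource");
        }
        return appResource;
    }
}

// The command-line entry point overrides the hook to behave like the
// traditional spark-submit script: print to the terminal and exit.
class CommandLineSubmit extends SparkSubmitSketch {
    @Override
    protected void error(String msg) {
        System.err.println("Error: " + msg);
        System.exit(1);
    }
}
```

An embedding application simply catches the exception; only the command-line subclass ever calls `System.exit`.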
Test build #88675 has finished for PR 20925 at commit
This is a lot of changes to check at once. I will continue the review later and go a bit deeper.
@@ -289,27 +288,26 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
}
This might be a good candidate for your new error method instead of throwing the exception directly. It might happen that a client is catching both Exception and SparkException and doing very different things, but I guess that is a very unlikely case.
@@ -499,20 +497,18 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
}

private def printUsageAndExit(exitCode: Int, unknownParam: Any = null): Unit = {
Consider renaming the method. What about printUsageAndThrowException?
The intent is to "exit" the submission process (even if there's no actual "exit" in some cases). A different name would also feel weird given the "exitCode" parameter. So even if not optimal, I prefer the current name.
@@ -88,7 +88,8 @@
SparkLauncher.NO_RESOURCE);
}

final List<String> sparkArgs;
final List<String> userArgs;
Consider making it private and accessing via methods.
That's overkill for final fields. Even more so if those fields are package-private.
boolean isExample = false;
List<String> submitArgs = args;
this.userArgs = null;
Consider Collections.emptyList(). I see these two constructors cover two different use cases. An abstract base class with two derived classes could express these two use cases better, but I know it is out of scope for now. Does it make sense to create a Jira ticket for refactoring this?
If you want to take a stab at refactoring... I'm not so sure you'd be able to make things much better, though, since the parameters just control shared logic that is applied later.
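The reviewer's suggestion above, preferring `Collections.emptyList()` over `null`, could look roughly like this. The class and field names are hypothetical stand-ins for the launcher's real ones:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: back the "no user args" constructor with an empty
// list instead of null, so later checks like userArgs.isEmpty() never need
// a null guard. Two constructors cover the two construction use cases.
class SubmitCommandSketch {
    final List<String> userArgs;
    final boolean isExample;

    // Use case 1: built from raw spark-submit command-line arguments.
    SubmitCommandSketch() {
        this.userArgs = Collections.emptyList();
        this.isExample = false;
    }

    // Use case 2: built programmatically with explicit user arguments.
    SubmitCommandSketch(List<String> userArgs) {
        this.userArgs = userArgs;
        this.isExample = false;
    }
}
```

With an empty list, both constructors produce an object whose fields can be read uniformly, which is the main point of the suggestion.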
@@ -400,6 +419,11 @@ private boolean isThriftServer(String mainClass) {
private class OptionParser extends SparkSubmitOptionParser {

boolean isAppResourceReq = true;
boolean errorOnUnknownArgs;
private
Test build #89002 has finished for PR 20925 at commit
just a couple things to clarify or add comments to
@@ -775,17 +781,17 @@ class SparkSubmitSuite
}

test("SPARK_CONF_DIR overrides spark-defaults.conf") {
-  forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
+  forConfDir(Map("spark.executor.memory" -> "3g")) { path =>
why this change? you no longer support fractional values?
It's just that now an exception is actually thrown, instead of only printing an error to the output.
[info] SparkSubmitSuite:
[info] - SPARK_CONF_DIR overrides spark-defaults.conf *** FAILED *** (144 milliseconds)
[info] org.apache.spark.SparkException: Executor Memory cores must be a positive number
[info] at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:652)
[info] at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:267)
That is because:
scala> c.set("spark.abcde", "2.3g")
res0: org.apache.spark.SparkConf = org.apache.spark.SparkConf@cd5ff55
scala> c.getSizeAsBytes("spark.abcde")
java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
Fractional values are not supported. Input was: 2.3
Just noticed that the error message is kinda wrong, but also this whole validation function (validateSubmitArguments) leaves a lot to be desired...
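The suffixed-integer behavior described above (a value like "3g" parses, "2.3g" does not) can be approximated with a small parser. This sketch mirrors the observed behavior of `SparkConf.getSizeAsBytes`, not Spark's implementation; `SizeParserSketch` is a made-up name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: accept integer sizes with an optional binary-unit suffix
// (b, k, m, g, t, p), reject anything else, including fractional values.
class SizeParserSketch {
    private static final Pattern SIZE = Pattern.compile("([0-9]+)([bkmgtp]?)");

    static long parseSizeAsBytes(String size) {
        Matcher m = SIZE.matcher(size.trim().toLowerCase());
        if (!m.matches()) {
            // "2.3g" falls through to here because the digits group
            // does not allow a decimal point.
            throw new NumberFormatException(
                "Fractional values are not supported. Input was: " + size);
        }
        long value = Long.parseLong(m.group(1));
        // Each suffix step is a factor of 1024, i.e. a shift of 10 bits:
        // b -> 0, k -> 10, m -> 20, g -> 30, t -> 40, p -> 50.
        String suffix = m.group(2).isEmpty() ? "b" : m.group(2);
        int shift = "bkmgtp".indexOf(suffix) * 10;
        return value << shift;
    }
}
```

So "3g" yields 3 * 1024^3 bytes, while "2.3g" raises the fractional-value error quoted in the REPL session above.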
@@ -154,9 +165,17 @@
List<String> buildSparkSubmitArgs() {
List<String> args = new ArrayList<>();
-  SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+  OptionParser parser = new OptionParser(false);
What's the reason for allowing unknown args here? Is it so an old launcher can start a newer Spark, which may accept more args? A comment would be helpful.
This is explained in the public API (addSparkArg, which is declared in AbstractLauncher).
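The `errorOnUnknownArgs` flag discussed in this thread can be sketched as follows: the same parser either rejects unknown options (strict command-line use) or keeps them for pass-through, so options added by a newer Spark can still be forwarded. Names here are illustrative, not the launcher's real API:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a parser whose behavior on unrecognized options is
// configurable. Strict mode throws; lenient mode collects the option so
// it can be passed through to the spark-submit invocation unchanged.
class OptionParserSketch {
    private final boolean errorOnUnknownArgs;
    final List<String> passthrough = new ArrayList<>();

    OptionParserSketch(boolean errorOnUnknownArgs) {
        this.errorOnUnknownArgs = errorOnUnknownArgs;
    }

    void parse(List<String> args) {
        // A tiny stand-in for the real option table.
        List<String> known =
            java.util.Arrays.asList("--class", "--master", "--deploy-mode");
        for (String arg : args) {
            if (!known.contains(arg)) {
                handleUnknown(arg);
            }
        }
    }

    private void handleUnknown(String opt) {
        if (errorOnUnknownArgs) {
            throw new IllegalArgumentException("Unrecognized option: " + opt);
        }
        passthrough.add(opt); // keep it; a newer Spark may understand it
    }
}
```

This is why `buildSparkSubmitArgs()` above constructs its parser with `false`: arguments added via addSparkArg must survive even when this launcher version does not recognize them.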
// the user is trying to run, so that checks below are correct.
if (!userArgs.isEmpty()) {
parser.parse(userArgs);
isStartingApp = parser.isAppResourceReq;
I don't really care whether the name is isStartingApp or isAppResourceReq, but it seems the name should be the same here and in OptionParser, unless there is some difference I'm missing.
I have finished my review and have not found any additional issue. LGTM
lgtm assuming tests pass
Test build #89074 has finished for PR 20925 at commit
Flaky test I've seen before: https://issues.apache.org/jira/browse/SPARK-23894 Jenkins, retest this please
Test build #4150 has finished for PR 20925 at commit
another unrelated flaky test, I filed https://issues.apache.org/jira/browse/SPARK-23962
merged to master
…class is required.

## What changes were proposed in this pull request?

With [PR 20925](#20925) now it's not possible to execute the following commands:
* run-example
* run-example --help
* run-example --version
* run-example --usage-error
* run-example --status ...
* run-example --kill ...

In this PR the execution will be allowed for the mentioned commands.

## How was this patch tested?

Existing unit tests extended + additional written.

Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>

Closes #21450 from gaborgsomogyi/SPARK-24319.
…ssDefFoundError in SparkSubmit to Error

## What changes were proposed in this pull request?

In my local setup, I set log4j root category as ERROR (https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console , first item show up if we google search "set spark log level".) When I run such command

```
spark-submit --class foo bar.jar
```

Nothing shows up, and the script exits.

After quick investigation, I think the log level for ClassNotFoundException/NoClassDefFoundError in SparkSubmit should be ERROR instead of WARN. Since the whole process exit because of the exception/error. Before apache#20925, the message is not controlled by `log4j.rootCategory`.

## How was this patch tested?

Manual check.

Closes apache#23189 from gengliangwang/changeLogLevel.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
### What changes were proposed in this pull request?

This PR aims to remove the deprecated Java `System.setSecurityManager` usage for Apache Spark 4.0. Note that this is the only usage in Apache Spark AS-IS code.

```
$ git grep setSecurityManager
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:        System.setSecurityManager(sm)
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala:      System.setSecurityManager(currentSm)
```

This usage was added at `Apache Spark 1.5.0`.
- #5841

Since `Apache Spark 2.4.0`, we don't need `setSecurityManager` due to the following improvement.
- #20925

### Why are the changes needed?

```
$ java -version
openjdk version "21-ea" 2023-09-19
OpenJDK Runtime Environment (build 21-ea+32-2482)
OpenJDK 64-Bit Server VM (build 21-ea+32-2482, mixed mode, sharing)
```

```
max spark-3.5.0-bin-hadoop3:$ bin/spark-sql --help
...
CLI options:
Exception in thread "main" java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
	at java.base/java.lang.System.setSecurityManager(System.java:429)
	at org.apache.spark.deploy.SparkSubmitArguments.getSqlShellOptions(SparkSubmitArguments.scala:623)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

```
$ build/sbt test:package -Phive -Phive-thriftserver
$ bin/spark-sql --help
...
CLI options:
 -d,--define <key=value>          Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
    --database <databasename>    Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>  Use value for given property
    --hivevar <key=value>        Variable substitution to apply to Hive commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42901 from dongjoon-hyun/SPARK-45147.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>