-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6324] [core] Centralize handling of script usage messages. #5841
Conversation
Reorganize code so that the launcher library handles most of the work of printing usage messages, instead of having an awkward protocol between the library and the scripts for that. This mostly applies to SparkSubmit, since the launcher lib does not do command line parsing for classes invoked in other ways, and thus cannot handle failures for those. Most scripts end up going through SparkSubmit, though, so it all works. The change adds a new, internal command line switch, "--usage-error", which prints the usage message and exits with a non-zero status. Scripts can override the command printed in the usage message by setting an environment variable - this avoids having to grep the output of SparkSubmit to remove references to the "spark-submit" script. The only sub-optimal part of the change is the special handling for the spark-sql usage, which is now done in SparkSubmitArguments.
Merged build triggered. |
Merged build started. |
Test build #31582 has started for PR 5841 at commit |
Test build #31582 has finished for PR 5841 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build triggered. |
Merged build started. |
Test build #33996 has started for PR 5841 at commit |
LGTM after a read through the changes. I understand the intent and it makes sense. I haven't tested it myself. |
Test build #33996 timed out for PR 5841 at commit |
Merged build finished. Test FAILed. |
Jenkins, retest this please |
Merged build triggered. |
Merged build started. |
Test build #34019 has started for PR 5841 at commit |
Test build #34019 has finished for PR 5841 at commit
|
Merged build finished. Test PASSed. |
Reorganize code so that the launcher library handles most of the work of printing usage messages, instead of having an awkward protocol between the library and the scripts for that. This mostly applies to SparkSubmit, since the launcher lib does not do command line parsing for classes invoked in other ways, and thus cannot handle failures for those. Most scripts end up going through SparkSubmit, though, so it all works. The change adds a new, internal command line switch, "--usage-error", which prints the usage message and exits with a non-zero status. Scripts can override the command printed in the usage message by setting an environment variable - this avoids having to grep the output of SparkSubmit to remove references to the "spark-submit" script. The only sub-optimal part of the change is the special handling for the spark-sql usage, which is now done in SparkSubmitArguments. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#5841 from vanzin/SPARK-6324 and squashes the following commits: 2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324 bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help. c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages. 6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
Reorganize code so that the launcher library handles most of the work of printing usage messages, instead of having an awkward protocol between the library and the scripts for that. This mostly applies to SparkSubmit, since the launcher lib does not do command line parsing for classes invoked in other ways, and thus cannot handle failures for those. Most scripts end up going through SparkSubmit, though, so it all works. The change adds a new, internal command line switch, "--usage-error", which prints the usage message and exits with a non-zero status. Scripts can override the command printed in the usage message by setting an environment variable - this avoids having to grep the output of SparkSubmit to remove references to the "spark-submit" script. The only sub-optimal part of the change is the special handling for the spark-sql usage, which is now done in SparkSubmitArguments. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#5841 from vanzin/SPARK-6324 and squashes the following commits: 2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324 bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help. c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages. 6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
### What changes were proposed in this pull request? This PR aims to remove the deprecate Java `System.setSecurityManager` usage for Apache Spark 4.0. Note that this is the only usage in Apache Spark AS-IS code. ``` $ git grep setSecurityManager core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala: System.setSecurityManager(sm) core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala: System.setSecurityManager(currentSm) ``` This usage was added at `Apache Spark 1.5.0`. - #5841 Since `Apache Spark 2.4.0`, we don't need `setSecurityManager` due to the following improvement. - #20925 ### Why are the changes needed? ``` $ java -version openjdk version "21-ea" 2023-09-19 OpenJDK Runtime Environment (build 21-ea+32-2482) OpenJDK 64-Bit Server VM (build 21-ea+32-2482, mixed mode, sharing) ``` ``` max spark-3.5.0-bin-hadoop3:$ bin/spark-sql --help ... CLI options: Exception in thread "main" java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release at java.base/java.lang.System.setSecurityManager(System.java:429) at org.apache.spark.deploy.SparkSubmitArguments.getSqlShellOptions(SparkSubmitArguments.scala:623) ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual test. ``` $ build/sbt test:package -Phive -Phive-thriftserver $ bin/spark-sql --help ... CLI options: -d,--define <key=value> Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B --database <databasename> Specify the database to use -e <quoted-query-string> SQL from command line -f <filename> SQL from files -H,--help Print help information --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable substitution to apply to Hive commands. e.g. --hivevar A=B -i <filename> Initialization SQL file -S,--silent Silent mode in interactive shell -v,--verbose Verbose mode (echo executed SQL to the console) ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42901 from dongjoon-hyun/SPARK-45147. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request? This PR aims to remove the deprecate Java `System.setSecurityManager` usage for Apache Spark 4.0. Note that this is the only usage in Apache Spark AS-IS code. ``` $ git grep setSecurityManager core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala: System.setSecurityManager(sm) core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala: System.setSecurityManager(currentSm) ``` This usage was added at `Apache Spark 1.5.0`. - apache#5841 Since `Apache Spark 2.4.0`, we don't need `setSecurityManager` due to the following improvement. - apache#20925 ### Why are the changes needed? ``` $ java -version openjdk version "21-ea" 2023-09-19 OpenJDK Runtime Environment (build 21-ea+32-2482) OpenJDK 64-Bit Server VM (build 21-ea+32-2482, mixed mode, sharing) ``` ``` max spark-3.5.0-bin-hadoop3:$ bin/spark-sql --help ... CLI options: Exception in thread "main" java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release at java.base/java.lang.System.setSecurityManager(System.java:429) at org.apache.spark.deploy.SparkSubmitArguments.getSqlShellOptions(SparkSubmitArguments.scala:623) ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual test. ``` $ build/sbt test:package -Phive -Phive-thriftserver $ bin/spark-sql --help ... CLI options: -d,--define <key=value> Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B --database <databasename> Specify the database to use -e <quoted-query-string> SQL from command line -f <filename> SQL from files -H,--help Print help information --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable substitution to apply to Hive commands. e.g. --hivevar A=B -i <filename> Initialization SQL file -S,--silent Silent mode in interactive shell -v,--verbose Verbose mode (echo executed SQL to the console) ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#42901 from dongjoon-hyun/SPARK-45147. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Reorganize code so that the launcher library handles most of the work
of printing usage messages, instead of having an awkward protocol between
the library and the scripts for that.
This mostly applies to SparkSubmit, since the launcher lib does not do
command line parsing for classes invoked in other ways, and thus cannot
handle failures for those. Most scripts end up going through SparkSubmit,
though, so it all works.
The change adds a new, internal command line switch, "--usage-error",
which prints the usage message and exits with a non-zero status. Scripts
can override the command printed in the usage message by setting an
environment variable - this avoids having to grep the output of
SparkSubmit to remove references to the "spark-submit" script.
The only sub-optimal part of the change is the special handling for the
spark-sql usage, which is now done in SparkSubmitArguments.