Skip to content

Conversation

pan3793
Copy link
Member

@pan3793 pan3793 commented Jul 31, 2025

Backport #51725 to branch-3.5

What changes were proposed in this pull request?

Currently, JPMS args are only applied to SparkSubmit-based processes, for non-SparkSubmit processes, e.g. Spark History Server, BeeLine, JPMS args are not applied automatically.

For example, when trying run Spark History Server with Java 24, I see JEP 472 complains

$ spark-4.1.0-SNAPSHOT-bin-hadoop3 SPARK_NO_DAEMONIZE=1 sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/logs/spark-chengpan-org.apache.spark.deploy.history.HistoryServer-1-H27212-MAC-01.local.out
Spark Command: /Users/chengpan/.sdkman/candidates/java/24.0.2-tem/bin/java -cp /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/conf/:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/jars/slf4j-api-2.0.17.jar:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/07/30 21:28:39 INFO HistoryServer: Started daemon with process name: 54576H27212-MAC-01.local
25/07/30 21:28:39 INFO SignalUtils: Registering signal handler for TERM
25/07/30 21:28:39 INFO SignalUtils: Registering signal handler for HUP
25/07/30 21:28:39 INFO SignalUtils: Registering signal handler for INT
WARNING: A restricted method in java.lang.System has been called
WARNING: java.lang.System::loadLibrary has been called by org.apache.hadoop.util.NativeCodeLoader in an unnamed module (file:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/jars/hadoop-client-api-3.4.1.jar)
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for callers in this module
WARNING: Restricted methods will be blocked in a future release unless native access is enabled

25/07/30 21:28:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
...

Why are the changes needed?

I think the JPMS args defined at JavaModuleOptions.DEFAULT_MODULE_OPTIONS are intended to be applied to all Spark process, so I would classify this as a bug.

Does this PR introduce any user-facing change?

Seems no, as I haven't seen Spark deamon process failures caused by missing JPMS args in our real use cases.

How was this patch tested?

UT is added. Also verified Spark History Server start command:

$ spark-4.1.0-SNAPSHOT-bin-SPARK-53020 SPARK_NO_DAEMONIZE=1 sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/logs/spark-chengpan-org.apache.spark.deploy.history.HistoryServer-1-H27212-MAC-01.local.out
Spark Command: /Users/chengpan/.sdkman/candidates/java/24.0.2-tem/bin/java -cp /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/conf/:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/jars/slf4j-api-2.0.17.jar:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions --add-modules=jdk.incubator.vector --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false -Dio.netty.tryReflectionSetAccessible=true --enable-native-access=ALL-UNNAMED org.apache.spark.deploy.history.HistoryServer
========================================
WARNING: Using incubator modules: jdk.incubator.vector
WARNING: package sun.security.action not in java.base
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/07/30 22:06:38 INFO HistoryServer: Started daemon with process name: 60622H27212-MAC-01.local
25/07/30 22:06:38 INFO SignalUtils: Registering signal handler for TERM
25/07/30 22:06:38 INFO SignalUtils: Registering signal handler for HUP
25/07/30 22:06:38 INFO SignalUtils: Registering signal handler for INT
25/07/30 22:06:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
...

Was this patch authored or co-authored using generative AI tooling?

No.

…process

### What changes were proposed in this pull request?

Currently, JPMS args are only applied to `SparkSubmit`-based processes, for non-`SparkSubmit` processes, e.g. Spark History Server, BeeLine, JPMS args are not applied automatically.

For example, when trying run Spark History Server with Java 24, I see [JEP 472](https://openjdk.org/jeps/472) complains
```
$ spark-4.1.0-SNAPSHOT-bin-hadoop3 SPARK_NO_DAEMONIZE=1 sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/logs/spark-chengpan-org.apache.spark.deploy.history.HistoryServer-1-H27212-MAC-01.local.out
Spark Command: /Users/chengpan/.sdkman/candidates/java/24.0.2-tem/bin/java -cp /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/conf/:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/jars/slf4j-api-2.0.17.jar:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/07/30 21:28:39 INFO HistoryServer: Started daemon with process name: 54576H27212-MAC-01.local
25/07/30 21:28:39 INFO SignalUtils: Registering signal handler for TERM
25/07/30 21:28:39 INFO SignalUtils: Registering signal handler for HUP
25/07/30 21:28:39 INFO SignalUtils: Registering signal handler for INT
WARNING: A restricted method in java.lang.System has been called
WARNING: java.lang.System::loadLibrary has been called by org.apache.hadoop.util.NativeCodeLoader in an unnamed module (file:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-hadoop3/jars/hadoop-client-api-3.4.1.jar)
WARNING: Use --enable-native-access=ALL-UNNAMED to avoid a warning for callers in this module
WARNING: Restricted methods will be blocked in a future release unless native access is enabled

25/07/30 21:28:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
...
```

### Why are the changes needed?

I think the JPMS args defined at `JavaModuleOptions.DEFAULT_MODULE_OPTIONS` are intended to be applied to all Spark process, so I would classify this as a bug.

### Does this PR introduce _any_ user-facing change?

Seems no, as I haven't seen Spark deamon process failures caused by missing JPMS args in our real use cases.

### How was this patch tested?

UT is added. Also verified Spark History Server start command:

```
spark-4.1.0-SNAPSHOT-bin-SPARK-53020 SPARK_NO_DAEMONIZE=1 sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/logs/spark-chengpan-org.apache.spark.deploy.history.HistoryServer-1-H27212-MAC-01.local.out
Spark Command: /Users/chengpan/.sdkman/candidates/java/24.0.2-tem/bin/java -cp /Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/conf/:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/jars/slf4j-api-2.0.17.jar:/Users/chengpan/app/spark-4.1.0-SNAPSHOT-bin-SPARK-53020/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions --add-modules=jdk.incubator.vector --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false -Dio.netty.tryReflectionSetAccessible=true --enable-native-access=ALL-UNNAMED org.apache.spark.deploy.history.HistoryServer
========================================
WARNING: Using incubator modules: jdk.incubator.vector
WARNING: package sun.security.action not in java.base
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/07/30 22:06:38 INFO HistoryServer: Started daemon with process name: 60622H27212-MAC-01.local
25/07/30 22:06:38 INFO SignalUtils: Registering signal handler for TERM
25/07/30 22:06:38 INFO SignalUtils: Registering signal handler for HUP
25/07/30 22:06:38 INFO SignalUtils: Registering signal handler for INT
25/07/30 22:06:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#51725 from pan3793/SPARK-53020.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@pan3793 pan3793 changed the title [SPARK-53020][DEPLOY] JPMS args should also apply to non-SparkSubmit process [SPARK-53020][DEPLOY][3.5] JPMS args should also apply to non-SparkSubmit process Jul 31, 2025
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (Pending CIs)

@pan3793
Copy link
Member Author

pan3793 commented Jul 31, 2025

@dongjoon-hyun it seems that SPARK-45197 is not landed on branch-3.5, should I backport it too, or just abandon this one?

@dongjoon-hyun
Copy link
Member

Oh, in that case, SPARK-53020 is not applicable to branch-3.5. We can close this PR simply, @pan3793 .

@pan3793
Copy link
Member Author

pan3793 commented Jul 31, 2025

okay

@pan3793 pan3793 closed this Jul 31, 2025
@dongjoon-hyun
Copy link
Member

Thank you for verifying that and closing this, @pan3793 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants