[SPARK-53084][CORE] Supplement default GC options in SparkContext initialization by wangyum · Pull Request #51796 · apache/spark

wangyum · 2025-08-03T05:35:41Z

What changes were proposed in this pull request?

This PR introduces a method supplementJavaGCOptions in SparkContext to ensure that a default garbage collection (GC) algorithm (UseParallelGC) is supplemented when no GC algorithm is explicitly specified in the JVM options for the driver or executor.

This method checks DRIVER_JAVA_OPTIONS and EXECUTOR_JAVA_OPTIONS in the Spark configuration. If neither contains an explicit GC algorithm flag (excluding certain policy flags like -XX:UseAdaptiveSizePolicyWithSystemGC), -XX:+UseParallelGC is prepended to the options.

Why are the changes needed?

Since Java 9+, the default GC algorithm has changed from UseParallelGC to UseG1GC(more details: https://blog.gceasy.io/what-is-javas-default-gc-algorithm). This change cause hundreds of jobs to fail due to OOM errors in our cluster.
Smooth upgrade from Spark 3.5 to 4.0
Smooth upgrade from Java 8 to 17.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests to verify that UseParallelGC is supplemented when no GC algorithm is specified.
Verified that existing GC options are not overridden if already present.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun · 2025-08-03T05:44:45Z

+      "UseMaximumCompactionOnSystemGC")
+
+    val gcAlgorithms = try {
+      val output = javaFlagCommand.!!


Oh, is this a way to detect JDK_JAVA_OPTIONS too, @wangyum ? I was curious how we can handle the user environment.

To test all supported GC Algorithm in current Java.

The new implementation checks if a GC algorithm is specified in the configuration: 38eacf0

dongjoon-hyun

I understand why this is proposed. However, IMO, this looks like a breaking change between 4.0 to 4.1 because Apache Spark 4.0.0 is already out with the default GC which is G1GC. WDYT about the exiting Apache Spark 4 users?

pan3793 · 2025-08-04T02:53:13Z

We also see increased OOM errors after upgrading from Java 8 to 17 (we use the default GC algorithm so it also means changing from ParallelGC to G1), according to https://tech.clevertap.com/demystifying-g1-gc-gclocker-jni-critical-and-fake-oom-exceptions/, 1) increase GCLockerRetryAllocationCount by setting -XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=100 helps mitigate some OOM error, and 2) upgrade to Java 22+ which contains https://openjdk.org/jeps/423 should almost solve the issue.

I checked 1), it does help, but some jobs run well on Java 8 still hit OOM errors on Java 17. I haven't had a chance to run Spark with Java 22 to verify 2).

I'm not sure if Spark should disable G1 by default, because I think G1 is a future-proofing GC algorithm. BTW, Trino used to add -XX:GCLockerRetryAllocationCount=32 to its default jvm.config until requires Java 22+ trinodb/trino#19026

dongjoon-hyun · 2025-08-04T04:18:16Z

Thank you for sharing the details, @pan3793 .

HyukjinKwon · 2025-08-04T23:19:36Z

+      "UseMaximumCompactionOnSystemGC")
+
+    val gcAlgorithms = try {
+      val output = javaFlagCommand.!!


This will create a subprocess .. can we avoid this way?

There is no official Java API to list all supported GC algorithms at runtime.
Use Java ManagementFactory and reflection to infer which GC algorithms are available also seem not a good way:

import java.lang.management.ManagementFactory import java.util.Locale import scala.jdk.CollectionConverters._ def getSupportedGCAlgorithms(): Set[String] = { val knownGCs = Set( "-XX:+UseSerialGC", "-XX:+UseParallelGC", "-XX:+UseG1GC", "-XX:+UseZGC", "-XX:+UseShenandoahGC", "-XX:+UseConcMarkSweepGC" ) val gcBeans = ManagementFactory.getGarbageCollectorMXBeans.asScala val activeGCs = gcBeans.map(_.getName.toLowerCase(Locale.ROOT)) def isClassAvailable(className: String): Boolean = { try { Class.forName(className) true } catch { case _: ClassNotFoundException => false } } val detected = scala.collection.mutable.Set[String]() if (isClassAvailable("com.sun.management.GarbageCollectionNotificationInfo") || activeGCs.exists(_.contains("g1"))) { detected += "-XX:+UseG1GC" } if (isClassAvailable("sun.management.ZGarbageCollectorMXBean") || activeGCs.exists(_.contains("z"))) { detected += "-XX:+UseZGC" } if (isClassAvailable("sun.management.shenandoah.ShenandoahHeuristics") || activeGCs.exists(_.contains("shenandoah"))) { detected += "-XX:+UseShenandoahGC" } detected += "-XX:+UseSerialGC" detected += "-XX:+UseParallelGC" if (detected.isEmpty) knownGCs else detected.toSet

HyukjinKwon · 2025-08-04T23:20:41Z

  }
+
+  private def supplementJavaGCOptions(conf: SparkConf): Unit = {
+    val javaFlagCommand = "java -XX:+PrintFlagsFinal -version"


I think this actually does not guarantee the same way how the current JVM is created .. Different java exec could be used.

The new implementation checks if a GC algorithm is specified in the configuration: 38eacf0

cloud-fan · 2025-08-06T05:14:30Z

This might be a breaking change for people who are already using Spark 4.0 with G1GC. BTW, driver and executor have very different memory usage patterns, shall we have different default GC algorithm for them?

wangyum · 2025-08-06T08:26:38Z

shall we have different default GC algorithm for them?

It seems so. Prior to Java 25, the default executor GC algorithm seems should be UseParallelGC, unless we have specifically tuned it for our business scenario.
In fact, to achieve better performance, other JVM options for the executor may also need to be adjusted in addition to the GC algorithm. For example: #48202

pan3793 · 2025-08-06T08:31:18Z

instead of changing GC options silently, how about writing a GC tuning guide in docs?

mridulm · 2025-08-07T07:08:35Z

I agree with @dongjoon-hyun and @cloud-fan , this is a breaking change.
It is better for users to specify GC options explicitly (or via the deployment defaults).

Also note that the change for DRIVER_JAVA_OPTIONS does not work anyway.

Even if we are going down this approach - doing it in submission is where the change should be targeted, not SparkContext.

cloud-fan · 2025-08-07T08:53:58Z

@wangyum shall we just add a GC tunning guide as @pan3793 suggested?

wangyum · 2025-08-08T05:57:56Z

Added the doc to tuning guide: #51920

wangyum added 2 commits August 3, 2025 12:35

fix

adcb849

fix

7592987

github-actions Bot added the CORE label Aug 3, 2025

wangyum changed the title ~~[SPARK-53084][CORE] Supplement Default GC Options in SparkContext Initialization~~ [SPARK-53084][CORE] Supplement default GC options in SparkContext initialization Aug 3, 2025

dongjoon-hyun reviewed Aug 3, 2025

View reviewed changes

HyukjinKwon reviewed Aug 4, 2025

View reviewed changes

wangyum added 2 commits August 5, 2025 11:45

fix

38eacf0

fix

bf50815

wangyum force-pushed the SPARK-53084 branch from b9a0556 to bf50815 Compare August 5, 2025 08:08

fix

44ccf4d

wangyum closed this Aug 8, 2025

wangyum deleted the SPARK-53084 branch February 13, 2026 23:15

pan3793 mentioned this pull request May 8, 2026

[SPARK-53327][SQL] Workaround datasketches-memory Java 25 support #54477

Closed

Conversation

wangyum commented Aug 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

dongjoon-hyun Aug 3, 2025

Choose a reason for hiding this comment

Uh oh!

wangyum Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

wangyum Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

pan3793 commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dongjoon-hyun commented Aug 4, 2025

Uh oh!

HyukjinKwon Aug 4, 2025

Choose a reason for hiding this comment

Uh oh!

wangyum Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon Aug 4, 2025

Choose a reason for hiding this comment

Uh oh!

wangyum Aug 5, 2025

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Aug 6, 2025

Uh oh!

wangyum commented Aug 6, 2025

Uh oh!

pan3793 commented Aug 6, 2025

Uh oh!

mridulm commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cloud-fan commented Aug 7, 2025

Uh oh!

wangyum commented Aug 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

wangyum commented Aug 3, 2025 •

edited

Loading

pan3793 commented Aug 4, 2025 •

edited

Loading

mridulm commented Aug 7, 2025 •

edited

Loading