[SPARK-53084][CORE] Supplement default GC options in SparkContext initialization#51796
[SPARK-53084][CORE] Supplement default GC options in SparkContext initialization#51796wangyum wants to merge 5 commits into
Conversation
| "UseMaximumCompactionOnSystemGC") | ||
|
|
||
| val gcAlgorithms = try { | ||
| val output = javaFlagCommand.!! |
There was a problem hiding this comment.
Oh, is this a way to detect JDK_JAVA_OPTIONS too, @wangyum ? I was curious how we can handle the user environment.
There was a problem hiding this comment.
To test all supported GC Algorithm in current Java.
There was a problem hiding this comment.
The new implementation checks if a GC algorithm is specified in the configuration: 38eacf0
dongjoon-hyun
left a comment
There was a problem hiding this comment.
I understand why this is proposed. However, IMO, this looks like a breaking change between 4.0 to 4.1 because Apache Spark 4.0.0 is already out with the default GC which is G1GC. WDYT about the exiting Apache Spark 4 users?
|
We also see increased OOM errors after upgrading from Java 8 to 17 (we use the default GC algorithm so it also means changing from ParallelGC to G1), according to https://tech.clevertap.com/demystifying-g1-gc-gclocker-jni-critical-and-fake-oom-exceptions/, 1) increase GCLockerRetryAllocationCount by setting I checked 1), it does help, but some jobs run well on Java 8 still hit OOM errors on Java 17. I haven't had a chance to run Spark with Java 22 to verify 2). I'm not sure if Spark should disable G1 by default, because I think G1 is a future-proofing GC algorithm. BTW, Trino used to add |
|
Thank you for sharing the details, @pan3793 . |
| "UseMaximumCompactionOnSystemGC") | ||
|
|
||
| val gcAlgorithms = try { | ||
| val output = javaFlagCommand.!! |
There was a problem hiding this comment.
This will create a subprocess .. can we avoid this way?
There was a problem hiding this comment.
There is no official Java API to list all supported GC algorithms at runtime.
Use Java ManagementFactory and reflection to infer which GC algorithms are available also seem not a good way:
import java.lang.management.ManagementFactory
import java.util.Locale
import scala.jdk.CollectionConverters._
def getSupportedGCAlgorithms(): Set[String] = {
val knownGCs = Set(
"-XX:+UseSerialGC",
"-XX:+UseParallelGC",
"-XX:+UseG1GC",
"-XX:+UseZGC",
"-XX:+UseShenandoahGC",
"-XX:+UseConcMarkSweepGC"
)
val gcBeans = ManagementFactory.getGarbageCollectorMXBeans.asScala
val activeGCs = gcBeans.map(_.getName.toLowerCase(Locale.ROOT))
def isClassAvailable(className: String): Boolean = {
try {
Class.forName(className)
true
} catch {
case _: ClassNotFoundException => false
}
}
val detected = scala.collection.mutable.Set[String]()
if (isClassAvailable("com.sun.management.GarbageCollectionNotificationInfo") ||
activeGCs.exists(_.contains("g1"))) {
detected += "-XX:+UseG1GC"
}
if (isClassAvailable("sun.management.ZGarbageCollectorMXBean") ||
activeGCs.exists(_.contains("z"))) {
detected += "-XX:+UseZGC"
}
if (isClassAvailable("sun.management.shenandoah.ShenandoahHeuristics") ||
activeGCs.exists(_.contains("shenandoah"))) {
detected += "-XX:+UseShenandoahGC"
}
detected += "-XX:+UseSerialGC"
detected += "-XX:+UseParallelGC"
if (detected.isEmpty) knownGCs else detected.toSet| } | ||
|
|
||
| private def supplementJavaGCOptions(conf: SparkConf): Unit = { | ||
| val javaFlagCommand = "java -XX:+PrintFlagsFinal -version" |
There was a problem hiding this comment.
I think this actually does not guarantee the same way how the current JVM is created .. Different java exec could be used.
There was a problem hiding this comment.
The new implementation checks if a GC algorithm is specified in the configuration: 38eacf0
|
This might be a breaking change for people who are already using Spark 4.0 with G1GC. BTW, driver and executor have very different memory usage patterns, shall we have different default GC algorithm for them? |
It seems so. Prior to Java 25, the default executor GC algorithm seems should be |
|
instead of changing GC options silently, how about writing a GC tuning guide in docs? |
|
I agree with @dongjoon-hyun and @cloud-fan , this is a breaking change. Also note that the change for Even if we are going down this approach - doing it in submission is where the change should be targeted, not SparkContext. |
|
Added the doc to tuning guide: #51920 |
What changes were proposed in this pull request?
This PR introduces a method
supplementJavaGCOptionsinSparkContextto ensure that a default garbage collection (GC) algorithm (UseParallelGC) is supplemented when no GC algorithm is explicitly specified in the JVM options for the driver or executor.This method checks
DRIVER_JAVA_OPTIONSandEXECUTOR_JAVA_OPTIONSin the Spark configuration. If neither contains an explicit GC algorithm flag (excluding certain policy flags like-XX:UseAdaptiveSizePolicyWithSystemGC),-XX:+UseParallelGCis prepended to the options.Why are the changes needed?
UseParallelGCtoUseG1GC(more details: https://blog.gceasy.io/what-is-javas-default-gc-algorithm). This change cause hundreds of jobs to fail due to OOM errors in our cluster.Does this PR introduce any user-facing change?
No.
How was this patch tested?
UseParallelGCis supplemented when no GC algorithm is specified.Was this patch authored or co-authored using generative AI tooling?
No.