[SPARK-56238][K8S] Fix app ID propagation in KubernetesClusterSchedulerBackend for client mode submission#55355
[SPARK-56238][K8S] Fix app ID propagation in KubernetesClusterSchedulerBackend for client mode submission#55355xiaoxuandev wants to merge 2 commits into
client mode submission#55355Conversation
…erBackend
### What changes were proposed in this pull request?
Cache the application ID at construction time in `KubernetesClusterSchedulerBackend` so that `applicationId()` returns a stable value across calls.
Previously, `applicationId()` fell back to `KubernetesConf.getKubernetesAppId()` when `spark.app.id` was not yet set, which generates a new random UUID on every call. In client mode, `SparkContext` sets `spark.app.id` only after `start()` returns, so during `start()` the multiple calls to `applicationId()` (for `podAllocator.start()`, `watchEvents.start()`, `pollEvents.start()`, and `setUpExecutorConfigMap()`) each received a different ID. This caused subsystems to use inconsistent app IDs for pod labeling and filtering.
This only affects client mode. In cluster mode, the submission client generates the app ID upfront and writes it into `spark.app.id` via `BasicDriverFeatureStep` before the driver pod starts, so `conf.getOption("spark.app.id")` always returns a value and the `getOrElse` branch is never reached.
The fix adds a `private val appId` that resolves the ID once at construction time and returns it consistently, matching the pattern used by `SchedulerBackend`, `LocalSchedulerBackend`, and other backends.
### Why are the changes needed?
Without this fix, the Kubernetes scheduler backend could propagate different app IDs to different subsystems during `start()` in client mode, leading to:
- Pod allocator, watch events, and poll events using different app IDs
- `stop()` unable to clean up resources created by `start()` (services, PVCs, config maps, executor pods) because the label selector uses a different ID
### Does this PR introduce _any_ user-facing change?
No direct user-facing change. This fixes an internal consistency issue that could cause resource leaks in Kubernetes client-mode deployments.
### How was this patch tested?
- Unit tests in `KubernetesClusterSchedulerBackendSuite` verifying `applicationId()` stability both when `spark.app.id` is set and when it is not set.
- Existing tests for `DeploymentAllocatorSuite`, `ExecutorPodsLifecycleManagerSuite`, and `StatefulSetAllocatorSuite` continue to pass.
### Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with Kiro.
|
|
||
| test("SPARK-56238: applicationId() returns consistent value when spark.app.id is set") { | ||
| val id1 = schedulerBackendUnderTest.applicationId() | ||
| val id2 = schedulerBackendUnderTest.applicationId() |
There was a problem hiding this comment.
This test case doesn't make sense to me in this PR's context. Please remove this test case because this passes without your PR, @xiaoxuandev .
There was a problem hiding this comment.
Removed, thanks!
| assert(id1 === TEST_SPARK_APP_ID) | ||
| } | ||
|
|
||
| test("SPARK-56238: applicationId() is stable across calls when spark.app.id is not set") { |
There was a problem hiding this comment.
This test case seems to reproduce the reported scenario.
| when(localRpcEnv.setupEndpoint(any(), any())).thenReturn(driverEndpointRef) | ||
| val localTaskScheduler = mock(classOf[TaskSchedulerImpl]) | ||
| when(localTaskScheduler.sc).thenReturn(localSc) | ||
| val backendWithoutAppId = new KubernetesClusterSchedulerBackend( |
There was a problem hiding this comment.
Do you happen to know when this situation happens in the production environment, @xiaoxuandev ? I'm wondering if this is a valid case in the Apache Spark usage.
This only affects client mode.
One more question. Do you know if this is a regression or not? (as Enrico claims)
There was a problem hiding this comment.
Yes, this is a regression introduced by #54269. The original code cached the generated ID in private val appId, which was correct.
Affected versions: v4.2.0-preview3+. All 4.0.x and 4.1.x releases are clean.
Regarding production usage: this affects Kubernetes client mode (--deploy-mode client), where the driver runs outside the K8s cluster. In that path, spark.app.id is not pre-set before backend.start(), so each call to applicationId() during start() would generate a different UUID.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, LGTM. Thank you, @xiaoxuandev .
client mode submission
|
Thanks for fixing this! |
What changes were proposed in this pull request?
Cache the application ID at construction time in
KubernetesClusterSchedulerBackendso thatapplicationId()returns a stable value across calls.Previously,
applicationId()fell back toKubernetesConf.getKubernetesAppId()whenspark.app.idwas not yet set, which generates a new random UUID on every call. In client mode,SparkContextsetsspark.app.idonly afterstart()returns, so duringstart()the multiple calls toapplicationId()(forpodAllocator.start(),watchEvents.start(),pollEvents.start(), andsetUpExecutorConfigMap()) each received a different ID. This caused subsystems to use inconsistent app IDs for pod labeling and filtering.This only affects client mode. In cluster mode, the submission client generates the app ID upfront and writes it into
spark.app.idviaBasicDriverFeatureStepbefore the driver pod starts, soconf.getOption("spark.app.id")always returns a value and thegetOrElsebranch is never reached.The fix adds a
private val appIdthat resolves the ID once at construction time and returns it consistently, matching the pattern used bySchedulerBackend,LocalSchedulerBackend, and other backends.Why are the changes needed?
Without this fix, the Kubernetes scheduler backend could propagate different app IDs to different subsystems during
start(), leading to:stop()unable to clean up resources created bystart()(services, PVCs, config maps, executor pods) because the label selector uses a different IDDoes this PR introduce any user-facing change?
No direct user-facing change. This fixes an internal consistency issue that could cause resource leaks in Kubernetes deployments.
How was this patch tested?
KubernetesClusterSchedulerBackendSuiteverifyingapplicationId()stability both whenspark.app.idis set and when it is not set.DeploymentAllocatorSuite,ExecutorPodsLifecycleManagerSuite, andStatefulSetAllocatorSuitecontinue to pass.Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with Kiro.