@dongjoon-hyun dongjoon-hyun commented Nov 5, 2025

What changes were proposed in this pull request?

This PR aims to support spark.kubernetes.executor.deletedExecutorsCacheTimeout.

Why are the changes needed?

To allow users to control the TTL for the deleted executors cache.

Previously, the TTL was hard-coded. In some very slow clusters, we need to remember deleted executors for longer than 3 minutes.

private lazy val removedExecutorsCache =
  CacheBuilder.newBuilder()
    .expireAfterWrite(3, TimeUnit.MINUTES)
    .build[java.lang.Long, java.lang.Long]()
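
For illustration only, here is a minimal sketch of how the hard-coded cache above might take its TTL from a caller-supplied value. It assumes Guava is on the classpath. The names `DeletedExecutorsCacheSketch` and `buildCache`, and the fake ticker used to advance time, are hypothetical and are not taken from this patch.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicLong

import com.google.common.base.Ticker
import com.google.common.cache.{Cache, CacheBuilder}

object DeletedExecutorsCacheSketch {
  // Hypothetical helper: same cache shape as above, but the TTL is a parameter
  // instead of a fixed 3 minutes. The ticker is injectable so that expiry can
  // be exercised without sleeping.
  def buildCache(
      timeoutSeconds: Long,
      ticker: Ticker = Ticker.systemTicker()): Cache[java.lang.Long, java.lang.Long] =
    CacheBuilder.newBuilder()
      .expireAfterWrite(timeoutSeconds, TimeUnit.SECONDS)
      .ticker(ticker)
      .build[java.lang.Long, java.lang.Long]()

  def main(args: Array[String]): Unit = {
    // Fake clock (nanoseconds) that only moves when we advance it.
    val nowNanos = new AtomicLong(0L)
    val fakeTicker = new Ticker { override def read(): Long = nowNanos.get() }

    // A 10-minute TTL, as a slow cluster might want.
    val cache = buildCache(timeoutSeconds = 600L, ticker = fakeTicker)
    val execId = java.lang.Long.valueOf(1L)
    cache.put(execId, execId)

    // After 5 simulated minutes the entry should still be remembered.
    nowNanos.addAndGet(TimeUnit.MINUTES.toNanos(5))
    println(s"remembered after 5 min: ${cache.getIfPresent(execId) != null}")

    // After 11 simulated minutes in total the entry should have expired.
    nowNanos.addAndGet(TimeUnit.MINUTES.toNanos(6))
    println(s"remembered after 11 min: ${cache.getIfPresent(execId) != null}")
  }
}
```

With a 3-minute TTL the same entry would already be gone at the 5-minute mark, which is the situation slow clusters run into.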

Does this PR introduce any user-facing change?

No behavior change because the default value is the same.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

.createWithDefault(true)

val KUBERNETES_DELETED_EXECUTORS_CACHE_TTL_SECONDS =
  ConfigBuilder("spark.kubernetes.executor.deletedExecutorsCacheTTLSeconds")
Member

It also accepts human-readable strings such as 10s or 3min. Can we drop Seconds from the config name, then?

Member Author

My bad. You're right. I blindly searched for the TTL keyword and followed a bad practice. Let me revise this.
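
To illustrate the point about human-readable time strings, here is a small sketch using the public `SparkConf.getTimeAsSeconds` API. It assumes Spark is on the classpath and uses the key name from the final PR title. The object `TimeoutConfSketch`, the helper `timeoutSeconds`, and the `"3min"` fallback (chosen to match the previous 3-minute value) are hypothetical. The patch itself goes through Spark's internal `ConfigBuilder`, which is not shown here.

```scala
import org.apache.spark.SparkConf

object TimeoutConfSketch {
  // Key name taken from the final PR title.
  val Key = "spark.kubernetes.executor.deletedExecutorsCacheTimeout"

  // Hypothetical helper: reads the timeout as seconds, accepting time strings
  // such as "10s" or "3min". The "3min" fallback is an assumption that mirrors
  // the old hard-coded 3 minutes.
  def timeoutSeconds(conf: SparkConf): Long =
    conf.getTimeAsSeconds(Key, "3min")

  def main(args: Array[String]): Unit = {
    // loadDefaults = false keeps the example independent of system properties.
    val unset = new SparkConf(false)
    val tenSeconds = new SparkConf(false).set(Key, "10s")
    val tenMinutes = new SparkConf(false).set(Key, "10min")

    println(timeoutSeconds(unset))
    println(timeoutSeconds(tenSeconds))
    println(timeoutSeconds(tenMinutes))
  }
}
```

Because the value carries its own unit, a name without a `Seconds` suffix avoids implying that only a bare number of seconds is accepted.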

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala

@dongjoon-hyun
Member Author

Thank you, @pan3793 .

Also, cc @peter-toth .

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54184][K8S] Support spark.kubernetes.executor.deletedExecutorsCacheTTLSeconds [SPARK-54184][K8S] Support spark.kubernetes.executor.deletedExecutorsCacheTimeout Nov 5, 2025
@dongjoon-hyun
Member Author

Thank you, @yaooqinn and @pan3793 .

Merged to master/4.1.

dongjoon-hyun added a commit that referenced this pull request Nov 5, 2025
…sCacheTimeout`

### What changes were proposed in this pull request?

This PR aims to support `spark.kubernetes.executor.deletedExecutorsCacheTimeout`.

### Why are the changes needed?

To allow users to control the TTL for the deleted executors cache.

Previously, the TTL was hard-coded. In some very slow clusters, we need to remember deleted executors for longer than 3 minutes.

https://github.com/apache/spark/blob/a8e35c407bc5340f83b35e5a2f0b0767c6baadb0/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala#L54-L57

### Does this PR introduce _any_ user-facing change?

No behavior change because the default value is the same.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52884 from dongjoon-hyun/SPARK-54184.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit ada1908)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-54184 branch November 5, 2025 05:57
Contributor

@peter-toth peter-toth left a comment

Late LGTM.

@dongjoon-hyun
Member Author

Thank you, @peter-toth .

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…sCacheTimeout`

Closes apache#52884 from dongjoon-hyun/SPARK-54184.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>