
Conversation

@YaoRazor commented Nov 24, 2025

What changes were proposed in this pull request?

Intro

This PR represents Pinterest's work to boost Spark cluster efficiency. It proposes Canon, a novel burst-aware memory allocation algorithm that partitions part of the cluster memory into fixed and burst segments. The burst segments are shared among different pods, improving overall memory utilization.

Concretely, this PR implements Canon, a burst-aware memory allocation algorithm, for memoryOverhead in Spark. The basic idea is that, since memoryOverhead usage is quite bursty, we can split memoryOverhead into two parts: a fixed part (F) and a shared part (S). Using the Kubernetes request/limit concept, the executor pod's memory request equals heap size (H) + F, and its limit is H + F + S.

To calculate F and S, we introduce spark.executor.memoryOverheadBurstyFactor (f) as the control factor. Assuming the user specified spark.executor.memoryOverhead as O, then

F = O - min{(H + O) * (f - 1), O}
S = O - F

so the pod's limit H + F + S stays equal to H + O, while its request H + F shrinks as f grows.
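To make the arithmetic concrete, here is a minimal Scala sketch (not code from this PR; the object name and the numbers are made-up examples):

// Illustrative sketch of Canon's overhead split; not code from this PR,
// and the numbers below are made-up examples.
object CanonSplitExample {
  // Given heap H, overhead O (both in MiB) and bursty factor f,
  // compute the fixed part F and shared (burst) part S of O.
  def split(heapMiB: Long, overheadMiB: Long, f: Double): (Long, Long) = {
    // S = min{(H + O) * (f - 1), O}
    val s = math.min(((heapMiB + overheadMiB) * (f - 1)).toLong, overheadMiB)
    // F = O - S, so F + S = O
    (overheadMiB - s, s)
  }

  def main(args: Array[String]): Unit = {
    // Example: H = 8192 MiB, O = 2048 MiB, f = 1.1
    val (fixed, shared) = split(8192, 2048, 1.1)
    println(s"F = $fixed MiB, S = $shared MiB")       // F = 1024 MiB, S = 1024 MiB
    println(s"request = ${8192 + fixed} MiB")         // H + F = 9216 MiB
    println(s"limit = ${8192 + fixed + shared} MiB")  // H + F + S = 10240 MiB
  }
}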

Users can use spark.executor.memoryOverheadBursty.enabled to control whether this functionality is enabled, and spark.executor.memoryOverheadBurstyFactor to control how aggressively part of memoryOverhead is shared among different pods, as in the sketch below.
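For instance, a job could opt in like this (a minimal sketch: the two memoryOverheadBursty keys are the ones introduced in this PR, while the memory sizes are arbitrary example values):

import org.apache.spark.SparkConf

// Minimal sketch: opt in to bursty memoryOverhead sharing.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")                       // heap size H
  .set("spark.executor.memoryOverhead", "2g")               // O
  .set("spark.executor.memoryOverheadBursty.enabled", "true")
  .set("spark.executor.memoryOverheadBurstyFactor", "1.1")  // f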

The effectiveness of this algorithm has been validated through production tests at Pinterest.

Acknowledgement

The code in this PR was mainly implemented by Nan Zhu (@CodingCat) while he was working at Pinterest. The algorithm itself is based on https://www.vldb.org/pvldb/vol17/p3759-shi.pdf

Why are the changes needed?

To improve Spark cluster memory utilization: memoryOverhead usage is bursty, so reserving all of it per pod wastes memory, while sharing the burst portion across pods (via the Kubernetes request/limit split described above) reclaims that waste.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests
Production tests at Pinterest

Was this patch authored or co-authored using generative AI tooling?

No

private[spark] val EXECUTOR_BURSTY_MEMORY_OVERHEAD_ENABLED =
  ConfigBuilder("spark.executor.memoryOverheadBursty.enabled")
    .doc("Whether to enable bursty memory overhead allocation")
    .version("3.2.0")
Contributor

let's change these versions to 4.2.0?

newSparkProperties = Map(EXECUTOR_BURSTY_MEMORY_OVERHEAD.key ->
  newMemoryOverheadMiB.toString))
logInfo(s"newAppEnvironment spark properties count:" +
  s" ${newAppEnvironment.sparkProperties.size}")
Contributor

this logging can be removed

val klass = classOf[ApplicationEnvironmentInfoWrapper]
val currentAppEnvironment = sparkContext._statusStore.store.read(klass, klass.getName()).info
logInfo(s"currentAppEnvironment spark properties count:" +
s" ${currentAppEnvironment.sparkProperties.size}")
Contributor

this logging can be removed

sparkContext._statusStore.store.write(new ApplicationEnvironmentInfoWrapper(
  newAppEnvironment))
// we have to post full information here, but need to ensure that the downstream pipeline can
// consume duplicate entries properly
Contributor

this comment can also be removed

}
}
}
logInfo(s"posted memoryoverhead update event")
Contributor

this can be removed

baseConf.remove(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)
baseConf.set("spark.app.name", "xyz.abc _i_am_a_app_name_w/_some_abbrs")
val basePod = SparkPod.initialPod()
// scalastyle:off
Contributor

this can be removed

baseConf.remove(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)
baseConf.set("spark.app.name", "xyz.abc _i_am_a_app_name_w/_some_abbrs")
val basePod = SparkPod.initialPod()
// scalastyle:off
Contributor

this can be removed

baseConf.remove(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)
baseConf.set("spark.app.name", "xyz.abc _i_am_a_app_name_w/_some_abbrs")
val basePod = SparkPod.initialPod()
// scalastyle:off
Contributor

this can be removed

baseConf.remove(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)
baseConf.set("spark.app.name", "xyz.abc _i_am_a_app_name_w/_some_abbrs")
val basePod = SparkPod.initialPod()
// scalastyle:off
Contributor

this can be removed

@CodingCat (Contributor) commented Nov 24, 2025

Thank you @YaoRazor for open sourcing it. We have deployed Canon to 1000s of machines at Pinterest, and hopefully it will benefit the broader community as well.

And, most importantly, we really appreciate the innovation from the ByteDance team: this algorithm is implemented based on their paper, https://www.vldb.org/pvldb/vol17/p3759-shi.pdf

@YaoRazor would you mind marking this PR as ready for review?

Hi @sunchao, as we discussed offline, would you mind giving it a review?

@YaoRazor marked this pull request as ready for review November 24, 2025 21:53
@holdenk (Contributor) commented Nov 24, 2025

Oh this is interesting :)

@holdenk (Contributor) commented Nov 24, 2025

So this probably requires an SPIP

