
Conversation

@dfercode

Currently the Spark driver and executor pods always get the same value for memory request and memory limit; they cannot be set separately. When a cluster has different quotas for memory requests and memory limits, Spark cannot fully use the cluster's memory.

E.g. a cluster has a total quota of 50G memory requests and 200G memory limits, but because every Spark pod's memory limit equals its memory request, the total memory Spark can use is bounded by the smaller quota: 50G.

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@dfercode dfercode force-pushed the feature/k8s_mem_req branch 5 times, most recently from b9ce5be to 81f78b1 on April 13, 2023 13:03
@dfercode dfercode force-pushed the feature/k8s_mem_req branch from 81f78b1 to 9cacc19 on April 13, 2023 13:55
@dfercode
Author

cc @dongjoon-hyun @yaooqinn @Yikun, can anyone help review?

@yaooqinn
Member

The cpu limits are set by spark.kubernetes.{driver,executor}.limit.cores. The cpu is set by spark.{driver,executor}.cores. The memory request and limit are set by summing the values of spark.{driver,executor}.memory and spark.{driver,executor}.memoryOverhead. Other resource limits are set by spark.{driver,executor}.resources.{resourceName}.* configs.

Referring to the doc, we can actually set the driver pod memory alone
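As a quick sketch of how those keys fit together (the values below are placeholders chosen for illustration, not recommendations): the CPU request and limit come from two different keys and can differ, while the memory request and limit are both derived from the same sum of memory and memoryOverhead.

```scala
import org.apache.spark.SparkConf

// Illustrative values only; the config keys are the ones cited above.
val conf = new SparkConf()
  // CPU: request comes from spark.{driver,executor}.cores,
  // the limit from spark.kubernetes.{driver,executor}.limit.cores.
  .set("spark.executor.cores", "2")
  .set("spark.kubernetes.executor.limit.cores", "4")
  // Memory: both the request AND the limit are the sum of
  // spark.{driver,executor}.memory + spark.{driver,executor}.memoryOverhead.
  .set("spark.executor.memory", "4g")
  .set("spark.executor.memoryOverhead", "1g")
```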

@dfercode
Author

dfercode commented Apr 14, 2023

> The cpu limits are set by spark.kubernetes.{driver,executor}.limit.cores. The cpu is set by spark.{driver,executor}.cores. The memory request and limit are set by summing the values of spark.{driver,executor}.memory and spark.{driver,executor}.memoryOverhead. Other resource limits are set by spark.{driver,executor}.resources.{resourceName}.* configs.
>
> Referring to the doc, we can actually set the driver pod memory alone

It's about the container memory request & limit, not about the driver or executor specifically.
The current code sets the pod's request.memory and limit.memory to the same value, obtained by summing spark.{driver,executor}.memory and spark.{driver,executor}.memoryOverhead.
But request.memory and limit.memory are different kinds of parameters in Kubernetes, and always keeping them equal may not be good practice. In most cases the memory request quota we get from the infrastructure team is smaller than the memory limit quota. If Spark pods' request.memory can only equal limit.memory, then the total memory we can use is bounded by the smaller quota, so we cannot fully use the memory resource.
requests defines the minimum amount of resources the corresponding container needs.
limits defines the upper bound on the resources the corresponding container may consume.
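A rough sketch of what that coupling looks like at the container level, assuming the fabric8 builder pattern commonly used for building pod specs; the names and values here are illustrative, not the actual Spark feature-step code.

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, Quantity}

// Sketch: one quantity is derived from memory + memoryOverhead and applied
// to BOTH requests.memory and limits.memory, so the two cannot diverge today.
// Numbers and the container name are examples, not Spark defaults.
val executorMemoryMiB = 4096 + 1024 // spark.executor.memory + spark.executor.memoryOverhead
val memoryQuantity = new Quantity(s"${executorMemoryMiB}Mi")

val container = new ContainerBuilder()
  .withName("spark-kubernetes-executor")
  .editOrNewResources()
    .addToRequests("memory", memoryQuantity) // request.memory
    .addToLimits("memory", memoryQuantity)   // limit.memory == request.memory
  .endResources()
  .build()
```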

@zwangsheng
Contributor

There was a discussion about this on the Spark dev mailing list earlier; hope it helps you:
spark executor pod has same memory value for request and limit

@dfercode
Author

@zwangsheng do we have any timeline to add this feature?

@zwangsheng
Contributor

> spark executor pod has same memory value for request and limit

You can find this in the mail discussion:

> There is a very good reason for this. It is recommended when using k8s that you set memory request and limit to the same value, set a cpu request, but not a cpu limit. More info here: https://home.robusta.dev/blog/kubernetes-memory-limit
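For comparison, the pattern that recommendation describes (memory request equal to limit, a CPU request but no CPU limit) would look roughly like this at the container level; a hedged sketch with example values, not a prescription.

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, Quantity}

// Recommended k8s pattern per the quoted reply: pin memory (request == limit),
// request CPU, but leave the CPU limit unset. Values are examples only.
val recommended = new ContainerBuilder()
  .withName("spark-kubernetes-executor")
  .editOrNewResources()
    .addToRequests("memory", new Quantity("5Gi"))
    .addToLimits("memory", new Quantity("5Gi")) // memory limit == request
    .addToRequests("cpu", new Quantity("2"))    // CPU request only, no CPU limit
  .endResources()
  .build()
```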

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 28, 2023
@github-actions github-actions bot closed this Jul 29, 2023
@bnetzi

bnetzi commented Jan 28, 2025

I know this PR has been closed for a while, but I think the decision not to merge it was not correct.
As mentioned in these issues:
https://issues.apache.org/jira/browse/SPARK-35723
and
https://issues.apache.org/jira/browse/SPARK-37358

Although defaulting the limit to the same value as the request makes some sense, in some cases it is much wiser to allow a larger limit.

Say many executors/drivers share the same instance: if the limit and the request must be equal, you have to allocate to each pod the maximum memory it might use, causing huge over-allocation, even though, depending on the workload, it is quite reasonable to expect that they will not all peak at the same time.
Momentarily high memory usage is very typical; every user should be able to decide whether a larger limit makes sense for them. It has its risks, but in many use cases the cost savings may be worth it.

For now, our environment has actually been using webhooks (for almost a year) to override this behavior; I think it should be supported natively.

@dhia-gharsallaoui

Hi @dfercode and @bnetzi,

I've been following this issue as we've encountered the same limitation in our Kubernetes environment. Deploying hundreds of Spark applications becomes challenging when memory requests/limits are strictly coupled, significantly reducing our cluster's elasticity and forcing us to operate at the lower request-based capacity ceiling.

@bnetzi makes a critical point - while equal requests/limits might be safe defaults, real-world workloads often benefit from strategic overcommitment where transient spikes can be tolerated. This capability is particularly valuable for cost optimization in large-scale deployments.

+1 for reopening/reconsidering this PR (SPARK-35723). In the meantime, @bnetzi, sharing your webhook-based approach would be invaluable to many of us working around this limitation. Could you elaborate on your implementation or share code examples?

@bnetzi

bnetzi commented Jan 29, 2025

Hi @dhia-gharsallaoui, our approach is implemented using the Spark Operator webhook. We've recently opened a PR for the new version, available here: kubeflow/spark-operator#2383
