[SPARK-35723] Set k8s pod container request and limit memory separately #40771
Conversation
cc @dongjoon-hyun @yaooqinn @Yikun, could anyone help review?
Referring to the docs, we can actually set the driver pod memory on its own.
It's about the container memory request and limit, not about driver vs. executor.
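For context, here is a minimal Scala sketch (not the actual Spark code path, and the quantity value is made up) of how the Kubernetes feature steps build container resources today with the fabric8 client: the same memory Quantity is applied to both the request and the limit, so the two cannot diverge.

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, Quantity, ResourceRequirementsBuilder}

// spark.executor.memory plus overhead, rendered as a Kubernetes Quantity (value is illustrative).
val executorMemoryQuantity = new Quantity("4096Mi")

val resources = new ResourceRequirementsBuilder()
  .addToRequests("memory", executorMemoryQuantity)
  .addToLimits("memory", executorMemoryQuantity) // the limit is forced to equal the request
  .build()

val executorContainer = new ContainerBuilder()
  .withName("spark-kubernetes-executor")
  .withResources(resources)
  .build()
```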
There was a discussion about this on the Spark dev mailing list earlier; hope it helps you.
@zwangsheng do we have any timeline for adding this feature?
You can find this in the mailing list discussion:
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
I know this PR has been closed for a while, but I think the decision not to merge it was not correct. Although defaulting the limit to the same value as the request makes some sense, in some cases it is much wiser to allow a larger limit. Say many executors/drivers share the same instance: if the limit and the request are equal, you must allocate to each pod the maximum memory it would ever use, causing huge over-allocation, even though it is very reasonable (depending on the workload) that they would not all peak at the same time. For now, in our environment we have actually been using webhooks for almost a year to override this behavior; I think it should be supported natively.
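A hypothetical back-of-the-envelope sketch of the packing difference described above (all numbers are made up for illustration):

```scala
object OvercommitSketch extends App {
  val nodeMemoryGi   = 64 // assumed memory of one shared instance
  val peakUsageGi    = 8  // worst-case usage of a single executor
  val typicalUsageGi = 3  // what an executor usually uses

  // With request == limit, every pod must reserve its peak usage up front.
  val executorsWhenCoupled = nodeMemoryGi / peakUsageGi      // 8 executors per node

  // If the request tracked typical usage while a larger limit covered the peak,
  // the scheduler could pack more executors onto the node, relying on them not
  // all peaking at the same time.
  val executorsWhenDecoupled = nodeMemoryGi / typicalUsageGi // 21 executors per node

  println(s"coupled: $executorsWhenCoupled executors, decoupled: $executorsWhenDecoupled executors")
}
```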
I've been following this issue as we've encountered the same limitation in our Kubernetes environment. Deploying hundreds of Spark applications becomes challenging when memory requests/limits are strictly coupled, significantly reducing our cluster's elasticity and forcing us to operate at the lower request-based capacity ceiling. @bnetzi makes a critical point: while equal requests/limits might be safe defaults, real-world workloads often benefit from strategic overcommitment where transient spikes can be tolerated. This capability is particularly valuable for cost optimization in large-scale deployments. +1 for reopening/reconsidering this PR (SPARK-35723). In the meantime, @bnetzi, sharing your webhook-based approach would be invaluable to many of us working around this limitation. Could you elaborate on your implementation or share code examples?
Hi @dhia-gharsallaoui, our approach is implemented using the Spark Operator webhook. We've recently opened a PR for the new version, available here: kubeflow/spark-operator#2383
Currently, the Spark driver and executor pod memory request and limit can only be the same value; they cannot be set separately. When a cluster has different memory request and limit quotas, Spark cannot fully use the cluster's memory.
E.g. a cluster has a total quota of 50G of memory requests and 200G of memory limits, but every Spark pod's memory limit equals its memory request, so the total memory Spark can use in this cluster is bounded by the smaller quota: 50G.
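To make the proposal concrete, here is a hedged sketch of what the configuration surface could look like. The CPU settings below already exist in Spark on Kubernetes; the memory limit property is only illustrative of what this PR was aiming at and is not an actual Spark configuration.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")                   // today this drives both the request and the limit
  .set("spark.kubernetes.executor.request.cores", "1")  // existing: CPU request set separately...
  .set("spark.kubernetes.executor.limit.cores", "4")    // ...from the CPU limit
  .set("spark.kubernetes.executor.limit.memory", "16g") // hypothetical property name, not merged into Spark
```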
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?