Set 0.05 CPU guarantee for all user pods #801
Conversation
When a limit is set but no guarantee, the guarantee is set to match the limit! For most use cases, when we set a limit but not a guarantee, we want to offer no guarantee and a limit. This doesn't seem to be possible at all (need to investigate why). In the meantime, setting a super low guarantee here means we can guard against issues like 2i2c-org#790, where setting the limit but not the guarantee just gave users a huge guarantee, causing node spin ups to fail.
Ref 2i2c-org#790
@consideRatio how would kubespawner currently represent 'no guarantee but a limit'? Is that even possible?
From the k8s docs at https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits
I assume kubespawner would represent no cpu/memory guarantee as unset resource requests, which then makes the k8s api-server automatically set the resource requests to the explicit limit, as documented.
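As a rough illustration of that defaulting behaviour, here is a minimal, hypothetical pod spec (the pod name and image are made up and not from this PR):

# Only a CPU limit is declared. When this spec is submitted, the k8s
# api-server copies the limit into the unset request, so the container
# effectively gets a 2-CPU guarantee as well.
apiVersion: v1
kind: Pod
metadata:
  name: limit-only-example
spec:
  containers:
    - name: notebook
      image: jupyter/base-notebook
      resources:
        limits:
          cpu: "2"
        # requests: intentionally left unset; defaulted to match limits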
hub-templates/basehub/values.yaml
Outdated
cpu:
  # When limit is set but not guarantee, guarantee is set to match limit! For most
  # use cases, when we set limit but not guarantee, we want to offer no guarantee and
  # a limit. This doesn't seem to be possible at all (need to investigate why). In the
  # meantime, setting a super low guarantee here means we can guard against issues like
  # https://github.com/2i2c-org/infrastructure/issues/790, where setting the limit
  # but not guarantee just gave users a huge guarantee, causing node spin ups to fail
  guarantee: 0.01
This can make sense in a way, but it also forces everyone to learn about 2i2c-specific configuration.
It's a trade-off between users being able to assume k8s-native behavior, and them not being able to but potentially avoiding mistakes caused by that behavior.
I'm conflicted about setting various defaults in the base chart overall due to this, but since we already have defaults about resource requests/limits I don't see it as an escalation of such complexity.
> I'm conflicted about setting various defaults in the base chart overall due to this, but since we already have defaults about resource requests/limits I don't see it as an escalation of such complexity.

+1. Specifically, we already set default memory requests and limits here, and I don't think of these as anything 2i2c-specific.
@consideRatio right - so this means that 'limit but no guarantee' is unrepresentable in kubernetes?
You can set resource requests to zero explicitly to represent 'limit but no guarantee', but leaving them unset makes them imply something other than 'no guarantee'.
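For illustration, a hedged sketch of that explicit 'limit but no guarantee' shape, as a resources block fragment with made-up values:

resources:
  requests:
    cpu: "0"      # explicit zero request: no CPU guarantee, and no defaulting from the limit
  limits:
    cpu: "2"      # still capped at 2 CPUs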
Since we are here... do we want to also have a CPU limit?
@damianavila I believe that this PR is setting a guarantee on the CPU - it already exists for RAM, I believe. EDIT: I see you're talking about limits, not guarantees - ignore my comment, sorry for the noise! Also, it feels like this is something any community would run into themselves if they ran their own Z2JH setup, so to me this doesn't feel 2i2c-specific.
@damianavila for very efficient CPU utilization and avoiding starvation of CPU resources, I think a good strategy is to set a CPU guarantee but use unbounded CPU limits. By setting a CPU limit you constrain pods' CPU use without accomplishing anything, I'd say - unless there was a container that got CPU starved. A pod/container can be starved of CPU if it is guaranteed/requested something very, very small (a minimum of 0.002 I think - you always get at least that much) while another container goes full throttle with a much larger request/guarantee, like 2 - then they share the CPU capacity in proportion to their requests. So in the 2 vs 0.002 example, on a 4-core machine the pod with 2 in CPU requests/guarantee would end up using roughly 3.996 of the CPU, and the other roughly 0.004, which could be too little to function.
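To make that arithmetic concrete, here is a hedged sketch (hypothetical pods; the names, busybox image, and busy-loop command are made up) of two pods competing on a 4-core node. With no CPU limits set, CPU time under contention is shared in proportion to the requests:

# Two pods on the same 4-core node, neither with a CPU limit.
# Under contention, CPU time is split in proportion to requests:
#   heavy: 2.000 / (2.000 + 0.002) * 4 cores ≈ 3.996 cores
#   tiny:  0.002 / (2.000 + 0.002) * 4 cores ≈ 0.004 cores  (effectively starved)
apiVersion: v1
kind: Pod
metadata:
  name: heavy
spec:
  containers:
    - name: burner
      image: busybox
      command: ["sh", "-c", "while true; do :; done"]
      resources:
        requests:
          cpu: "2"
---
apiVersion: v1
kind: Pod
metadata:
  name: tiny
spec:
  containers:
    - name: burner
      image: busybox
      command: ["sh", "-c", "while true; do :; done"]
      resources:
        requests:
          cpu: "2m"     # 0.002 CPU, roughly the minimum mentioned above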
So, given that all the user pods have the same CPU request/guarantee value, you are not expecting a starvation scenario like the one you described above, and so there are no additional benefits to using a CPU limit - am I correct?
I see there's some discussion in this PR but I'm not quite sure how/if it's blocking an approval?
I suggest we close this PR. This PR is trying to mitigate us making a common mistake, but by doing so, it also makes us more prone to actually make that mistake in other contexts where this mitigation strategy isn't applied behind the curtains. I think the best bet we have is to embrace the complexity without trying to mitigate it. Practically, I suggest we never set a guarantee or a limit without setting both, whenever we set either for cpu or memory in a chart - unless it's the final user config for some chart.
I'm fine with it being merged, but I find that this kind of complexity has made my work on the JMTE config a bit harder overall, as the configuration is harder to understand without manually inspecting it. Most Helm charts leave all resource specifications unset, both guarantees and limits, so it generally breaks expectations if one specific part of the resource requests/limits in a chart is already specified.
If we are going with this approach, we need a unified direction... I guess it should be all or nothing... Closing this PR without further modifications leaves us in the middle, where we have limits/guarantees for RAM but not CPU. In addition, if we go with the above approach, we need to make sure we set limits and guarantees in all the final/end hub configs and, maybe, establish some "checks" so we are sure we do not miss adding those values, right?
I think the practice is worth thinking about, but I don't feel confident championing any one specific choice because there are trade-offs. I don't want to hold back on taking any kind of action, or invest my own time to think much more about it, since I think everything is acceptable, even though I have a preference towards the options that close this PR.
I think our alternatives are:
My current preference is for (2), mostly because I think the advantage we get out of it is worth that. (1) is more perfect, but I don't want that to block this good-enough PR.
@yuvipanda I agree with your assessment. Let's merge this and open a new issue to iterate on implementation improvements!
Yep, let's merge this one as is and open a follow-up issue for option 1.
So I ended up encountering a related problem on the Berkeley DataHub a few days ago (berkeley-dsep-infra/datahub#2966), where a few users running expensive simulations meant basically the entire hub was 'down' for most people - even trivial cells like print("hello world") wouldn't run.
The new rationale is sensible to me so re-approving this one.
If no CPU limit is set, it is possible for a single user or group of users to starve everyone else of CPU time on a node, even causing new user pods to completely fail, as the notebook server process gets no CPU to complete the auth handshake with the server, and even trivial cells like print("hello world") may not run.

Unlike memory guarantees, CPU guarantees are actually enforced by the Linux Kernel (see https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b).

By giving each user a 5% CPU guarantee (represented by 0.05), we ensure that:

- We don't accidentally end up with a limit but no guarantee, at which point kubernetes treats the limit as the guarantee! This causes far more nodes to be scaled up than needed, making everything super slow (like in [Incident] PaleoHack User sessions won't start #790)
- Scheduling is unchanged: new nodes are spun up when a node is full on its memory guarantees. But a 0.05 guarantee means an n1-highmem-8 node can fit 160 user pods, and since kubernetes already caps us at 100 pods a node, this guarantee doesn't actually change our scheduling.

Ref #790
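For reference, a minimal sketch of what the merged default could look like in hub-templates/basehub/values.yaml; the nesting under jupyterhub.singleuser is an assumption based on the z2jh config layout, and only the cpu.guarantee value comes from this PR:

# Sketch only - nesting assumed to follow the z2jh singleuser config layout.
jupyterhub:
  singleuser:
    cpu:
      # 5% of a core guaranteed per user; no cpu limit is set, so users can
      # still burst into whatever CPU is idle on the node.
      guarantee: 0.05
# Capacity check: an n1-highmem-8 node has 8 vCPUs, so CPU guarantees alone
# would allow 8 / 0.05 = 160 user pods per node; the ~100 pods-per-node cap
# mentioned above is hit first, so this guarantee doesn't change scheduling.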