
Set 0.05 CPU guarantee for all user pods #801

Merged (2 commits) on Nov 9, 2021

Conversation

@yuvipanda (Member) commented Nov 1, 2021

If no CPU limit is set, it is possible for a single user or group of users to
starve everyone else of CPU time on a node. This can even cause new user pods to
fail completely, because the notebook server process gets no CPU to complete the auth
handshake with the server, and even trivial cells like print("hello world") may not run.
Unlike memory guarantees, CPU guarantees are actually enforced by the Linux kernel
(see https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b).
By giving each user a 5% CPU guarantee (represented by 0.05), we ensure that:

  1. Simple cells will always execute
  2. Notebook server processes will always start, so users won't see server spawn failures
  3. We don't accidentally set just a high limit for a particular hub without setting a
    guarantee, at which point Kubernetes treats the limit as the guarantee! That causes
    far more nodes to be scaled up than needed, making everything super slow (like in
    [Incident] PaleoHack User sessions won't start #790)
  4. Most of our workloads are still memory bound, and we want scaling to happen only
    when a node is full on its memory guarantees. A 0.05 guarantee means an n1-highmem-8
    node can fit 160 user pods, and since Kubernetes already caps us at 100 pods per node,
    this guarantee doesn't actually change our scheduling.

Ref #790
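As a concrete sketch of what this amounts to in chart config, assuming z2jh-style singleuser resource keys (key paths and nesting here are illustrative, not copied from this PR's diff):

  jupyterhub:
    singleuser:
      cpu:
        # 5% of a core guaranteed to every user pod, enforced by the kernel via
        # cgroup CPU shares; enough to start the notebook server and run trivial
        # cells even on a heavily loaded node.
        guarantee: 0.05
        # No cpu limit is set, so users can still burst into idle CPU on the node.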

When a limit is set but no guarantee, the guarantee is set to match the limit! For most
use cases, when we set a limit but not a guarantee, we want to offer no guarantee and
only a limit. This doesn't seem to be possible at all (need to investigate why). In the
meantime, setting a super low guarantee here means we can guard against issues like
2i2c-org#790, where setting the limit
but not the guarantee just gave users a huge guarantee, causing node spin-ups to fail

Ref 2i2c-org#790
@yuvipanda (Member, Author)

@consideRatio how would kubespawner currently represent 'no guarantee but a limit'? Is that even possible?

@consideRatio (Member)

From the k8s docs at https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits

Note: If a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit. Similarly, if a Container specifies its own CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit.

I assume kubespawner would represent no cpu/memory guarantee as an unset resource requests, which then makes the k8s api-server automatically set the resource requests to the explicit limit as documented.
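For illustration, here is roughly how that defaulting plays out on a container's resources block (values are made up):

  resources:
    limits:
      cpu: "2"
      memory: 4Gi
    # requests left unset in the manifest; the api-server defaults them to match
    # the limits, so the stored pod spec effectively contains:
    #   requests:
    #     cpu: "2"
    #     memory: 4Gi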

Comment on lines 153 to 160
cpu:
# When limit is set but not guarantee, guarantee is set to match limit! For most
# use cases, when we set limit but not guarantee, we want to offer no guarantee and
# a limit. This doesn't seem to be possible at all (need to investigate why). In the
# meantime, setting a super low guarantee here means we can guard against issues like
# https://github.com/2i2c-org/infrastructure/issues/790, where setting the limit
# but not guarantee just gave users a huge guarantee, causing node spin ups to fail
guarantee: 0.01
Review comment from a Member:

This can make sense in a way, but it also forces everyone to learn about 2i2c-specific configuration.

It's a trade-off between users being able to assume Kubernetes-native behavior, versus them not being able to but potentially avoiding mistakes caused by that behavior.

I'm conflicted about setting various defaults in the base chart overall due to this, but since we already have defaults about resource requests/limits I don't see it as an escalation of such complexity.

@yuvipanda (Member, Author) Nov 1, 2021

I'm conflicted about setting various defaults in the base chart overall due to this, but since we already have defaults about resource requests/limits I don't see it as an escalation of such complexity.

+1. Specifically, we already set default memory requests and limits here, and I don't think of these as anything 2i2c specific.

@yuvipanda (Member, Author)

@consideRatio right - so this means that 'limit but no guarantee' is unrepresentable in kubernetes?

@consideRatio (Member)

so this means that 'limit but no guarantee' is unrepresentable in kubernetes?

You can set resource requests to zero explicitly to represent 'limit but no guarantee', but leaving them unset implies something other than 'no guarantee'.
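So 'limit but no guarantee' would have to be spelled out explicitly, roughly like this (a sketch, not config from this repo):

  resources:
    requests:
      cpu: "0"    # explicitly zero: no guarantee, only whatever CPU is left over
    limits:
      cpu: "2"    # hard cap, enforced by CFS throttling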

@damianavila (Contributor)

Since we are here... do we also want to have a limit on the CPU?

@choldgraf (Member) commented Nov 1, 2021

@damianavila I believe this PR is setting a guarantee on the CPU - one already exists for RAM, I believe. EDIT: I see you're talking about limits, not guarantees; ignore my comment, sorry for the noise!

Also, it feels like this is something that any community would run into themselves if they ran their own Z2JH setup, so to me this feels non-2i2c-specific.

@consideRatio (Member)

@damianavila for very efficient CPU utilization and to avoid starvation of CPU resources, I think a good strategy is to set a CPU guarantee but leave the CPU limit unbounded.

By setting a CPU limit, you constrain pods' CPU use without really accomplishing anything, I'd say - unless there is a container being starved of CPU. A pod/container can be starved of CPU if it is guaranteed/requested something very, very small (0.002 is the minimum, I think - you always get at least that much), while another container goes full throttle with a much larger request/guarantee, like 2 - they then share the CPU capacity in proportion to their requests. So, in the 2 vs 0.002 example, on a 4-core machine the pod with 2 in CPU requests/guarantee would make use of 3.992 of the CPU, and the other would be using 0.008 of the CPU, which could be too little to function.
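A sketch of the two situations described above, using the same 2 vs 0.002 numbers (illustrative pod fragments, not config from this repo):

  containers:
    - name: busy            # requests 2 CPUs, no limit set
      resources:
        requests:
          cpu: "2"
    - name: tiny            # requests the ~0.002 minimum, no limit set
      resources:
        requests:
          cpu: "2m"         # 2 millicores = 0.002 CPU
  # Under full contention on a 4-core node, CPU is divided in proportion to the
  # requests, so "busy" gets nearly everything while "tiny" gets only a few
  # thousandths of a core, which may be too little to function.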

@damianavila
Copy link
Contributor

So, given that all the user pods have the same CPU request/guarantee value, you are not expecting a starvation scenario like the one you described above, and therefore there are no additional benefits to using a CPU limit - am I correct?

@sgibson91 (Member)

I see there's some discussion in this PR but I'm not quite sure how/if it's blocking an approval?

@consideRatio (Member) commented Nov 4, 2021

I suggest we close this PR.

This PR is trying to mitigate a common mistake we make, but by doing so, it also makes us more prone to actually making that mistake in other contexts where this mitigation isn't applied behind the scenes.

I think the best bet we have is to embrace the complexity without trying to mitigate it. Practically, I suggest we never set a guarantee or a limit without setting both, whenever we set either for CPU or memory in a chart - unless it's the final user config for some chart.

From the k8s docs at https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits

Note: If a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit. Similarly, if a Container specifies its own CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit.


I'm fine with it being merged, but I find that this kind of complexity has made my work on the JMTE config a bit harder overall, as the effective configuration is harder to understand without manually inspecting it.

Most Helm charts leave all resource specifications unset, for both CPU and memory, so it generally breaks expectations if a Helm chart already specifies one specific part of the resource requests/limits.
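For what that practice would look like in a final hub config, a rough sketch (values are illustrative, not taken from any 2i2c hub):

  jupyterhub:
    singleuser:
      cpu:
        guarantee: 0.05   # set both guarantee and limit, or neither, per resource
        limit: 2
      memory:
        guarantee: 1G
        limit: 2G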

@damianavila (Contributor)

I think the best bet we have is to embrace the complexity without trying to mitigate it. Practically, I suggest we never set a guarantee or a limit without setting both, whenever we set either for CPU or memory in a chart - unless it's the final user config for some chart.

If we are going with this approach, we need a unified direction... I guess it should be all or nothing... Closing this PR without further modifications leaves us somewhere in the middle, where we have limits/guarantees for RAM but not for CPU.

In addition, if we go with the above approach, we need to make sure we set limits and guarantees in all the final/end hub configs and, maybe, establish some "checks" so we are sure we do not miss adding those values, right?

@consideRatio (Member)

I think the practice is worth thinking about, but I don't feel confident championing any one specific choice because there are trade-offs.

I don't want to hold back any kind of action, or invest much more of my own time thinking about it, since I think every option is acceptable, even though I have a preference towards either of the closing options:

  • merging this
  • closing this and doing nothing
  • closing this and systematically making basehub not request anything, requiring the end configuration to set it all

@yuvipanda (Member, Author) commented Nov 5, 2021

I think our alternatives are:

  1. Ensure somehow that each hub in our config has explicit CPU and memory set, and validate this on CI
  2. Merge this PR as is, and open an issue for (1)

My current preference is for (2), mostly because I think the advantage we get out of it is worth it. (1) is the more complete solution, but I don't want it to block this good-enough PR.

@sgibson91 (Member)

@yuvipanda I agree with your assessment. Let's merge this and open a new issue to iterate on implementation improvements!

@damianavila (Contributor) left a comment

Yep, let's merge this one as is and open a follow-up issue for option 1.

@yuvipanda changed the title from "Set a tiny CPU guarantee for all user pods" to "Set 0.5 CPU guarantee for all user pods" on Nov 9, 2021
@yuvipanda (Member, Author)

So I ended up encountering a related problem on the Berkeley DataHub a few days ago (berkeley-dsep-infra/datahub#2966), where a few users running expensive simulations meant that basically the entire hub was 'down' for most people - even print("hello world") wouldn't work. In investigating and fixing that, I've actually changed my approach to this PR and have increased the CPU guarantee as well. I've laid out my new rationale in the PR description now, although I believe this is something that should actually make it into the z2jh docs...

@damianavila (Contributor) left a comment

The new rationale is sensible to me so re-approving this one.

@consideRatio changed the title from "Set 0.5 CPU guarantee for all user pods" to "Set 0.05 CPU guarantee for all user pods" on Nov 9, 2021
@yuvipanda yuvipanda merged commit f859c86 into 2i2c-org:master Nov 9, 2021