
Add k8s resource limit patch lib #19

Merged
merged 40 commits into main from feature/resource-patch on Aug 3, 2022

Conversation

@sed-i (Contributor) commented Jun 21, 2022

Issue

Need to be able to limit resource usage of a charm.

Crossref: OPENG-272

Solution

  • Create lib to patch k8s compute resource limits and requests.
  • Turn the placeholder charm into a tester charm, and use it in itests.

Context

NTA.

Testing Instructions

  • Deploy prom/318.
  • Compare the patched StatefulSet to the one produced by kubectl patch statefulset prom -n welcome -p '{"spec": {"template": {"spec": {"containers": [{"name":"prometheus", "resources": {"limits": {"cpu": "2", "memory":"2Gi"}, "requests": {"cpu": "2", "memory": "1Gi"}} }] }}}}' (see the verification sketch below).
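The patched values can also be read back programmatically. A minimal verification sketch, assuming the kubernetes Python client and a reachable kubeconfig; the prom/welcome names come from the command above and everything else is illustrative, not part of this PR:

```python
# Illustrative only: read back the resources section of the patched
# StatefulSet so it can be compared with the values passed to `kubectl patch`.
# Assumes a kubeconfig is available to the test runner.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

sts = apps.read_namespaced_stateful_set(name="prom", namespace="welcome")
resources = sts.spec.template.spec.containers[0].resources
print("limits:  ", resources.limits)    # expect {'cpu': '2', 'memory': '2Gi'}
print("requests:", resources.requests)  # expect {'cpu': '2', 'memory': '1Gi'}
```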

Release Notes

Add k8s resource limit patch lib.

@sed-i (Contributor, Author) commented Jun 23, 2022

There's a complication:

  • When patching the StatefulSet with {"memory": "0.9Gi"}, k8s generates a PodSpec with {"memory": "966367641600m"}, i.e. millibytes (to convert it back to GiB: value = bitmath.Byte(float(memory[:-1]) / 1000).to_GiB().value).
  • The k8s docs claim that the m suffix is part of SI, but that seems to be false, at least for binary multiples.

The .is_ready() method compares the StatefulSet to the actual pod, and it is currently buggy because of the above.

Is there a way to force K8s to KEEP the same unit the user provided? @simskij @rbarry82

UPDATE: similarly, {"cpu": "0.30000000000000004"} -> {"cpu": "301m"}
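A minimal sketch of that round-trip, assuming the bitmath package; the helper name is hypothetical and only mirrors the inline expression above:

```python
# Illustrative only: convert a k8s "milli" quantity such as "966367641600m"
# back to GiB. Assumes the bitmath package is installed.
import bitmath

def milli_quantity_to_gib(memory: str) -> float:
    """Convert a k8s quantity with an 'm' (milli) suffix to GiB."""
    millibytes = float(memory[:-1])  # strip the trailing 'm'
    return bitmath.Byte(millibytes / 1000).to_GiB().value

print(milli_quantity_to_gib("966367641600m"))  # 0.9
```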

@simskij (Member) commented Jun 23, 2022

> There's a complication:
>
>   • When patching the StatefulSet with {"memory": "0.9Gi"}, k8s generates a PodSpec with {"memory": "966367641600m"}, i.e. millibytes.
>   • The k8s docs claim that the m suffix is part of SI, but that seems to be false, at least for binary multiples.
>
> The .is_ready() method compares the StatefulSet to the actual pod, and it is currently buggy because of the above.
>
> Is there a way to force K8s to KEEP the same unit the user provided? @simskij @rbarry82

Not that I know of, but I'll dig. What happens if you instead of 0.9Gi set it to 966367641600m from the get-go? Does it still bug?

@simskij (Member) commented Jun 23, 2022

> Is there a way to force K8s to KEEP the same unit the user provided?
>
> Not that I know of, but I'll dig. What happens if you instead of 0.9Gi set it to 966367641600m from the get-go? Does it still bug?

Ok, so I found it. The reason for this is that 0.9Gi (gibibytes), in its canonical form (i.e. without fractional digits, using the largest possible suffix), ends up with the m suffix. From the k8s docs:

> Before serializing, Quantity will be put in "canonical form". This means that Exponent/suffix will be adjusted up or down (with a corresponding increase or decrease in Mantissa) such that:
> a. No precision is lost
> b. No fractional digits will be emitted
> c. The exponent (or suffix) is as large as possible.
> The sign will be omitted unless the number is negative.

If we just disallow setting fractions in the first place, the problem is solved. For instance, 900Mi will not get converted, while 0.9Gi will. Likely, 900Mi was also what the user tried to express when they put in 0.9Gi, which isn't entirely accurate, as 0.9Gi = 921.6Mi.
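A minimal sketch of that validation, assuming quantities arrive as plain strings; the regex and function name are illustrative, not the lib's API:

```python
# Illustrative only: reject quantities with fractional digits so that k8s's
# canonicalization cannot change the unit behind the user's back
# (e.g. "900Mi" survives as-is, while "0.9Gi" would become "966367641600m").
import re

# integer value followed by an optional k8s suffix (binary or decimal SI)
_QUANTITY_RE = re.compile(r"^\d+(Ki|Mi|Gi|Ti|Pi|Ei|m|k|M|G|T|P|E)?$")

def validate_quantity(value: str) -> None:
    """Raise ValueError for fractional quantities such as '0.9Gi'."""
    if not _QUANTITY_RE.match(value):
        raise ValueError(
            f"{value!r} is not an integer k8s quantity; "
            "use e.g. '900Mi' instead of '0.9Gi'"
        )

validate_quantity("900Mi")   # ok
validate_quantity("0.9Gi")   # raises ValueError
```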

@sed-i (Contributor, Author) commented Jun 23, 2022

> If we just disallow setting fractions in the first place, the problem is solved.

There's also the case of 1000000Mi. We should probably support what k8s supports and do proper conversion.
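A minimal sketch of such a conversion, assuming both sides of the comparison are normalized to an exact count of milli base units (via Decimal) before comparing; the names and suffix table are illustrative, not the lib's implementation:

```python
# Illustrative only: normalize k8s quantities to an exact integer number of
# "milli" base units so that values like "0.9Gi" and "966367641600m" compare
# equal regardless of the suffix k8s chose when canonicalizing.
from decimal import Decimal

_MULTIPLIERS = {
    "m": Decimal("0.001"),
    "k": Decimal(1000), "M": Decimal(1000) ** 2, "G": Decimal(1000) ** 3,
    "T": Decimal(1000) ** 4, "P": Decimal(1000) ** 5, "E": Decimal(1000) ** 6,
    "Ki": Decimal(1024), "Mi": Decimal(1024) ** 2, "Gi": Decimal(1024) ** 3,
    "Ti": Decimal(1024) ** 4, "Pi": Decimal(1024) ** 5, "Ei": Decimal(1024) ** 6,
}

def to_millis(quantity: str) -> Decimal:
    """Return the quantity as an exact number of milli base units."""
    # check longer (binary) suffixes before single-letter ones
    for suffix in sorted(_MULTIPLIERS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _MULTIPLIERS[suffix] * 1000
    return Decimal(quantity) * 1000  # bare number, e.g. "2" cpus

assert to_millis("0.9Gi") == to_millis("966367641600m")
assert to_millis("2") == to_millis("2000m")
```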

Abuelodelanada previously approved these changes Jun 29, 2022
sed-i requested a review from dstathis June 29, 2022 13:36
sed-i dismissed stale reviews from balbirthomas and Abuelodelanada via 4a8ab5a July 5, 2022 06:17
dstathis previously approved these changes Jul 6, 2022
@sed-i (Contributor, Author) commented Jul 7, 2022

Problem: when there is a single unit of prometheus and the user sets resource limits that are too high, juju is stuck in

prometheus-k8s/0  unknown   lost   0/1

because the pod cannot be scheduled, so there is no running charm to pick up juju config changes and re-patch the statefulset.

Any ideas @simskij @rbarry82?
For example, is there a way to ask K8s ahead of time whether a given limit is going to be a problem, so I could set BlockedStatus instead?

@rbarry82 (Contributor) commented Jul 7, 2022

> Problem: when there is a single unit of prometheus and the user sets resource limits that are too high, juju is stuck in
>
> prometheus-k8s/0  unknown   lost   0/1
>
> because the pod cannot be scheduled, so there is no running charm to pick up juju config changes and re-patch the statefulset.
>
> Any ideas @simskij @rbarry82? For example, is there a way to ask K8s ahead of time whether a given limit is going to be a problem, so I could set BlockedStatus instead?

This is kind of a consistent mess with the kube scheduler. That is, the kube scheduler is not aware of what else is happening on the system, and a trivial process which just malloc()s up to 80% of memory will happily let "guaranteed" pods be scheduled, which are then OOM killed.

Inside kube itself, though...

In general, kubelet itself can (and by best practice, does) reserve a certain amount of memory beyond which it will refuse to schedule because the kernel OOM killer may accidentally kill important things (like the kubelet, even, or anything else) otherwise, or swap it out.

Inside the pod, /proc is present, so at a basic level you can check /proc/meminfo and compare to see whether there is free memory. Then you'd need to determine where the pod is running (hoping/assuming that, once a limit is set, the pod can be rescheduled on the same node even if the other nodes in the cluster are under MemoryPressure, though that could also be racy), and describe the node to see whether there's room. Such as:

status:
  ...
  allocatable:
    cpu: "4"
    ephemeral-storage: "113180564088"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 7895304Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 122808772Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 7997704Ki
    pods: "110"
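For reference, a minimal sketch of fetching those figures from Python, assuming the kubernetes client and a kubeconfig (or in-cluster config); the node name is a placeholder:

```python
# Illustrative only: describe a node to compare its allocatable memory with
# the limit we are about to request. Node name is hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the pod
core = client.CoreV1Api()

node = core.read_node("worker-node-1")
print("allocatable memory:", node.status.allocatable["memory"])
print("capacity memory:   ", node.status.capacity["memory"])
```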

Lastly, you'd need to check on the pod itself, which may be further limited (if there are no resource limits set in the podspec, this is probably unnecessary paperwork, but regardless), via /sys/fs/cgroup/memory/memory.limit_in_bytes inside the pod.

Trivially, you can check /proc/meminfo for the amount of free memory (which kube doesn't actually care about, but we may, lest the pod get OOM killed immediately) and compare it against status["allocatable"]["memory"] on a single node, under the guess that the cluster is homogeneous (and/or the Juju admin deployed with some constraints applied, so we'll end up on an identical worker node). Differencing /proc/meminfo's MemTotal against the node's allocatable memory tells you how much memory is reserved in the kubelet, if any; if there isn't a limit set, memory.limit_in_bytes will be something like 9.2 exabytes.

If there isn't a reservation, then all you have to go on is whether a particular node (the one you're running on, maybe) has enough free memory. Or you can check all of them and then 🤞 that some other pod which is pending or cannot be scheduled due to MemoryPressure doesn't steal it from you when you patch the StatefulSet. Or just set BlockedStatus with a message that the requested limits won't be applied.
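A minimal sketch of the in-pod checks described above, assuming cgroup v1 paths and that the code runs inside the pod; the function names and heuristic are illustrative, not part of the lib:

```python
# Illustrative only: rough pre-flight check before patching the StatefulSet
# with a higher memory limit. Reads /proc/meminfo and the (cgroup v1) pod
# memory limit; both paths are assumptions about the pod environment.
from pathlib import Path

def meminfo_kib(field: str) -> int:
    """Return a /proc/meminfo field (e.g. 'MemTotal', 'MemAvailable') in KiB."""
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)

def cgroup_limit_bytes() -> int:
    """Pod memory limit from cgroup v1; ~9.2 EB means 'no limit set'."""
    return int(Path("/sys/fs/cgroup/memory/memory.limit_in_bytes").read_text())

def limit_looks_schedulable(requested_bytes: int) -> bool:
    """Heuristic only: the kube scheduler, not us, has the final word."""
    available = meminfo_kib("MemAvailable") * 1024
    return requested_bytes <= min(available, cgroup_limit_bytes())

if not limit_looks_schedulable(2 * 1024**3):  # e.g. a 2Gi limit
    print("Requested limit probably cannot be satisfied; consider BlockedStatus.")
```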

simskij previously approved these changes Aug 1, 2022
@simskij (Member) left a comment

Love the reduced complexity from dropping the scaling-factor functionality! Good job!

sed-i dismissed Abuelodelanada's stale review August 2, 2022 09:15

Resolved and updated the loki PR accordingly.

sed-i requested a review from simskij August 2, 2022 22:53
@sed-i (Contributor, Author) commented Aug 2, 2022

Ready for re-review, @simskij @rbarry82 @dstathis.

sed-i merged commit af7dffa into main Aug 3, 2022
sed-i deleted the feature/resource-patch branch August 3, 2022 14:16