
Enable k8s reserved cpus #3964

Open · wants to merge 2 commits into develop

Conversation

@james-masson commented May 16, 2024

Issue number:

As per discussions with @yeazelm

Description of changes:

Adds support for K8s reserved-cpus functionality

https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list

Testing done:

Warning: not yet functional.

It seems I've added a new package to Cargo for the migrations, but I can't figure out how to update Cargo.lock with only my package changes.

Local build, unit tests, deployed an AMI using the new config.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@yeazelm (Contributor) left a comment:

This is looking like a reasonable addition, thanks for the PR! I'd like to note the migration has to go into the next release, so I've put some comments in the PR to shift you there. This is the first PR that has a migration for 1.21.0 (which may become 1.20.1, but the maintainers can sort that out if it changes), so that is why there are no directories showing you the way. You'll just want to shift the path for the actual migration crate to v1.21.0 and rename them. Otherwise the migration looks right from reading the code. If you can do the rename, I'll kick off some testing to ensure it works.

Release.toml: outdated review thread, resolved.
@yeazelm (Contributor) left a comment:

I was able to pull this branch and validate that the settings take. I couldn't verify that the reserved CPUs were actually reserved. Do you have some commands we can document to use for validation of this change?

Release.toml: outdated review thread, resolved.
@yeazelm (Contributor) commented Jun 3, 2024

I've tested that the migration works as well. We're pretty close; as a tidiness ask, can you squash the commits into two?

  • one to cover adding the setting and templates
  • another to add the migration

If you can provide the commands to validate the actual functionality and clean up the commits, we should be in a good spot to merge! Thanks for the work on this @james-masson

@james-masson (Author) commented:

> I was able to pull this branch and validate that the settings take. I couldn't verify that the reserved CPUs were actually reserved. Do you have some commands we can document to use for validation of this change?

Given an 8-core system as an example:

  1. Configure the system:
[settings.kubernetes]
cpu-manager-policy = "static"
reserved-cpus = "0-1" # don't use these CPUs for any high-priority k8s workloads
  2. Inject a guaranteed-class pod into the cluster, with an integer number of CPUs matching the number of CPUs that are not in the "reserved-cpus" mask (in this case, 6).

This assumes the node is otherwise empty apart from daemonsets; otherwise, you'll have to reduce the number of CPUs the pod requests to get it to fit.

e.g.:

---
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-qos
spec:
  containers:
    - name: qos-demo
      image: nginx
      resources:
        limits:
          memory: "200Mi"
          cpu: "6"
  3. Look at the cpuset assigned to the guaranteed-class pod.

Assuming cgroup v2:

kubectl exec -it <guaranteed pod> -- cat /sys/fs/cgroup/cpuset.cpus.effective

The set should not include the CPU numbers in the "reserved-cpus" mask from step 1.

  4. Look at the cpuset assigned to a random daemonset pod:
kubectl -n kube-system exec -it <daemonset pod> -- cat /sys/fs/cgroup/cpuset.cpus.effective

The set should include the CPU numbers in the "reserved-cpus" mask from step 1, and should not overlap with the guaranteed-class pod's cpuset values.
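As a quick cross-check of steps 3 and 4, the disjointness property can also be verified programmatically. A small illustrative Rust sketch, where the expand helper and the example values are hypothetical, not part of this PR:

use std::collections::BTreeSet;

// Expand a kernel cpuset list such as "0-1,4" into a set of CPU numbers.
fn expand(list: &str) -> BTreeSet<u32> {
    let mut cpus = BTreeSet::new();
    for part in list.trim().split(',') {
        match part.split_once('-') {
            Some((lo, hi)) => {
                let (lo, hi): (u32, u32) = (lo.parse().unwrap(), hi.parse().unwrap());
                cpus.extend(lo..=hi); // inclusive, e.g. "0-1" -> {0, 1}
            }
            None => {
                cpus.insert(part.parse().unwrap());
            }
        }
    }
    cpus
}

fn main() {
    let reserved = expand("0-1"); // the reserved-cpus mask from step 1
    let pod = expand("2-7");      // cpuset.cpus.effective from the guaranteed pod
    // The guaranteed pod's cpuset must be disjoint from the reserved mask.
    assert!(reserved.is_disjoint(&pod));
}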

@james-masson (Author) commented:

@yeazelm - This has now been built and tested, and should have all the changes you asked for.

@@ -295,6 +295,7 @@ struct KubernetesSettings {
shutdown_grace_period_for_critical_pods: KubernetesDurationValue,
memory_manager_reserved_memory: HashMap<Identifier, KubernetesMemoryReservation>,
memory_manager_policy: KubernetesMemoryManagerPolicy,
reserved_cpus: SingleLineString,
A contributor commented on this line:

I read the code of how Kubernetes parses this line, and I think we should implement the same parsing logic in a struct. We already do something similar for other values Kubernetes expects in other places (see KubernetesDurationValue). This will prevent users from setting a random value, since the API will refuse to apply it, which provides a better experience than chasing errors in the kubelet logs. The new type could be KubernetesCpuSet, and implement what the Kubernetes folks did here.
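For illustration only, a validating newtype along those lines might look like the sketch below. The name, error handling, and exact parsing rules are assumptions for the example, not the code that ended up in this PR:

use std::collections::BTreeSet;

// Sketch of a typed cpuset string, in the spirit of KubernetesDurationValue.
#[derive(Debug)]
pub struct KubernetesCpuSet(String);

impl KubernetesCpuSet {
    // Validate a kernel cpuset list such as "0-1,4,6-7".
    pub fn try_new(input: &str) -> Result<Self, String> {
        let mut cpus = BTreeSet::new();
        for part in input.split(',').map(str::trim) {
            match part.split_once('-') {
                // A range like "0-3": both ends must parse and be ordered.
                Some((lo, hi)) => {
                    let lo: u32 = lo.parse().map_err(|_| format!("bad cpu in '{part}'"))?;
                    let hi: u32 = hi.parse().map_err(|_| format!("bad cpu in '{part}'"))?;
                    if lo > hi {
                        return Err(format!("inverted range '{part}'"));
                    }
                    cpus.extend(lo..=hi);
                }
                // A single CPU number like "4".
                None => {
                    let cpu: u32 = part.parse().map_err(|_| format!("bad cpu '{part}'"))?;
                    cpus.insert(cpu);
                }
            }
        }
        if cpus.is_empty() {
            return Err("empty cpuset".to_string());
        }
        Ok(KubernetesCpuSet(input.to_string()))
    }
}

Bottlerocket's modeled types typically also wire this kind of check into deserialization (e.g. a TryFrom implementation), so an invalid value is rejected when the setting is applied rather than surfacing later in kubelet logs.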

@james-masson (Author) replied:

Arguably this is not a Kubernetes-specific string; it's a kernel concept, and "cpuset string" is effectively a standard:

https://docs.kernel.org/admin-guide/cgroup-v2.html#cpuset-interface-files

It is probably a useful concept to have in Bottlerocket, as my future PRs will include many more uses of cpuset strings that are not tied to Kubernetes.

So, if you want this, it's best as KernelCpuSet, not KubernetesCpuSet.

But it's almost academic - I lack the skills to do this in Rust ;-)

A contributor replied:

0c9d63c: I wrote up a quick approach to this to assist with getting you there. This should work for your needs, but I didn't test the integration myself. I'm happy to help get you the rest of the way if needed on this change!


/// Add the option to set Kubernetes reserved-cpus
fn run() -> Result<()> {
    migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
}
A contributor commented:
Don't we need to limit this to k8s variants only? And have a conditional here to perform a no-op migration?
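For context, a variant-gated migration along the lines this comment suggests might look roughly like the sketch below. The NoOpMigration helper and the build-time feature check are assumptions for illustration, not code from this PR; check what migration-helpers actually provides:

use migration_helpers::common_migrations::{AddSettingsMigration, NoOpMigration};
use migration_helpers::{migrate, Result};

// Hypothetical: only add the setting on Kubernetes variants and run a
// no-op everywhere else.
fn run() -> Result<()> {
    if cfg!(feature = "k8s") {
        // Illustrative gate only; variants may not actually be detected this way.
        migrate(AddSettingsMigration(&["settings.kubernetes.reserved-cpus"]))
    } else {
        migrate(NoOpMigration)
    }
}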
