pod: set both the resource CPU request and limit #112
Conversation
lgtm
hmm. I kind of liked the fact that if there were spare resources available we could "go faster". I guess there's no difference between "use resources if they are available" and "prevent other people from using resources we are currently using". i.e. it would be nice to use CPUs when they're idling, but to yield that extra CPU back if other pods get scheduled on the node.
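The behavior wished for above corresponds to a Burstable pod, where the limit is set higher than the request. A hypothetical resources stanza for a container spec:

```yaml
# Hypothetical "burstable" shape: the scheduler reserves 2 CPUs, but the
# container may opportunistically use up to 4 while the node has idle CPU.
resources:
  requests:
    cpu: "2"
  limits:
    cpu: "4"
```

Notably, when the node is busy, CFS divides CPU time in proportion to requests, so a bursting pod does yield the extra CPU back once other pods are scheduled.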
Right now, we're not setting any CPU limit in the pods we schedule. This means our workloads run without constraints and can hog the hosts they're running on if there are no namespace-wide default resource limits. Unlike memory requests/limits, CPU limits are enforced at the CPU scheduling level (by throttling), so IIUC it's not possible to be evicted due to excessive CPU usage. So let's start defaulting to a reasonable, not-too-large value and ensure we set both the CPU request and limit. This also allows cgroups-aware applications to derive the correct number of "equivalent CPUs" to use for multi-threaded/multi-processing steps.
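For illustration, here's a minimal sketch (not the library's actual template) of a pod spec with both values set; the pod name and image are hypothetical, and "2" matches the default discussed below:

```yaml
# Hypothetical pod spec: CPU request and limit set to the same value.
apiVersion: v1
kind: Pod
metadata:
  name: example-cosa-pod   # hypothetical name
spec:
  containers:
  - name: worker
    image: quay.io/coreos-assembler/coreos-assembler:latest  # illustrative image
    resources:
      requests:
        cpu: "2"   # what the scheduler reserves on the node
      limits:
        cpu: "2"   # hard cap, enforced by CFS throttling rather than eviction
```

With the limit in place, a cgroups-aware tool inside the container can recover its CPU budget, e.g. on cgroups v2 by dividing the quota by the period in /sys/fs/cgroup/cpu.max ("200000 100000" → 2 CPUs); with no limit, that file just reads "max" and there's nothing to derive.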
LGTM
Yeah, this is an intricate topic, and I'm definitely not an SME. But the base problem as I understand it is that there's no reliable way to know how much CPU we should use when there is only a soft limit (e.g. Kubernetes' CPU requests). Being able to declare upfront how much CPU we need also means we'll be nicer to others and ourselves (since we schedule multiple pods across multiple namespaces and jobs) and could actually improve CI and pipeline reliability. It might lead to realizing we're not reserving enough CPU, in which case we should feel free to request more as necessary, e.g. in the pipeline.
LGTM
We weren't specifying a number. Previously that applied no limit at all, but with coreos/coreos-ci-lib#112, the default will now be 2.
With the recent changes in coreos/coreos-assembler#2975 we now need to set a CPU request/limit. cosaPod() does this for us now that coreos/coreos-ci-lib#112 is merged. Let's convert our jobs to use cosaPod().
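As a rough sketch of what such a conversion might look like (the cosaPod() parameter names and the shwrap() helper here are assumptions based on this conversation, not a confirmed API — check coreos-ci-lib for the real signatures):

```groovy
// Hypothetical Jenkinsfile: run the job inside cosaPod() so the pod gets
// the CPU request/limit added by coreos/coreos-ci-lib#112 (default: 2).
// Parameter names below are assumptions; see the library for the real ones.
cosaPod(cpus: 2, memory: "1Gi") {
    shwrap("cosa init https://github.com/coreos/fedora-coreos-config")
    shwrap("cosa fetch && cosa build")
}
```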
coreos-boot-edit.service wasn't sequenced Before anything, so under heavy I/O contention it could race with services being terminated prior to switching to the real root. Before=initrd.target should fix this, so we specify it, but it isn't enough for the same unknown reason as in 8b80486. Also specify Before=initrd-parse-etc.service to avoid that problem. Fixes occasional multipath.day1 flakes in coreos/fedora-coreos-tracker#1105, which became more frequent after coreos/coreos-ci-lib#112 landed. Fixes coreos/fedora-coreos-tracker#1105.
coreos-boot-edit.service wasn't sequenced Before anything, so under heavy I/O contention it could race with services being terminated prior to switching to the real root. Before=initrd.target should fix this, so we specify it, but it isn't enough for the same unknown reason as in 8b80486. Also specify Before=initrd-parse-etc.service to avoid that problem. Fixes occasional multipath.day1 flakes in coreos/fedora-coreos-tracker#1105, which became more frequent after coreos/coreos-ci-lib#112 landed. Fixes coreos/fedora-coreos-tracker#1105. (cherry picked from commit 651846e)
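For reference, a minimal sketch of the ordering stanza these commit messages describe; only the directives named above are shown, and the rest of coreos-boot-edit.service is omitted:

```ini
# coreos-boot-edit.service (excerpt): order the unit before initrd teardown.
[Unit]
Before=initrd.target
Before=initrd-parse-etc.service
```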
We weren't specifying a number. Previously that applied no limit at all, but with coreos/coreos-ci-lib#112, the default will now be 2. (cherry picked from commit f634965)