
pod: set both the resource CPU request and limit #112

Merged · jlebon merged 1 commit into coreos:main on Jul 29, 2022

Conversation

@jlebon (Member) commented Jul 27, 2022

Right now, we're not setting any CPU limit in the pods we schedule. This means our workloads run without constraints and can hog the hosts they're running on if there are no namespace-wide default resource limits.

Unlike memory requests/limits, CPU limits are enforced at the CPU scheduling level, so IIUC it's not possible to be evicted due to excessive CPU usage.

So let's start defaulting to a reasonable, not-too-large value and ensure we set both the CPU request and limit. This also allows cgroups-aware applications to derive the correct number of "equivalent CPUs" to use for multi-threaded/multi-processing steps.
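For illustration, the effect of this change on a scheduled pod's spec looks roughly like the stanza below. This is a minimal sketch, not the library's actual template: the container name and image are hypothetical, but the `resources` fields are standard Kubernetes API, and the value of 2 matches the default mentioned in the follow-up commits further down.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ci-worker                # hypothetical name
spec:
  containers:
  - name: worker                 # hypothetical name
    image: registry.example.com/ci-worker:latest   # hypothetical image
    resources:
      requests:
        cpu: "2"    # what the scheduler reserves; becomes cpu.shares/cpu.weight
      limits:
        cpu: "2"    # hard cap, enforced by CFS throttling rather than eviction
```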

@ravanelli (Member) left a comment

lgtm

@dustymabe (Member) commented

hmm. I kind of liked the fact that if there were spare resources available we could "go faster". I guess there's no difference between "use resources if they are available" and "prevent other people from using resources we are currently using".

i.e. it would be nice if CPUs are idling for us to use them, but if other pods got scheduled on the node to be able to yield that extra CPU back to them.

@bgilbert (Contributor) left a comment

LGTM

@jlebon (Member, Author) commented Jul 28, 2022

> hmm. I kind of liked the fact that if there were spare resources available we could "go faster". I guess there's no difference between "use resources if they are available" and "prevent other people from using resources we are currently using".
>
> i.e. it would be nice if CPUs are idling for us to use them, but if other pods got scheduled on the node to be able to yield that extra CPU back to them.

Yeah, this is an intricate topic, and I'm definitely not an SME. But the base problem as I understand it is that there's no reliable way to know how much CPU we should use when there's only a soft limit (e.g. Kubernetes' request, which gets translated into cpu.shares). It feels wrong to have e.g. a compressor think it can use 64 threads. And in theory, being able to use idle CPU sounds nice, but in practice it makes us noisy neighbours, which can affect performance (esp. latency) in colocated workloads.

Being able to declare upfront how much CPU we need also means we'll be nicer to others and to ourselves (since we schedule multiple pods across multiple namespaces and jobs), and it could actually improve CI and pipeline reliability. It might also reveal that we're not reserving enough CPU, in which case we should feel free to request more as necessary, e.g. in the pipeline.
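To make the cgroups angle concrete: once a hard limit is set, a cgroups-aware application can derive its equivalent CPU count from the cgroup v2 cpu.max interface. The Python sketch below illustrates the mechanism (it assumes the unified cgroup v2 hierarchy is visible at /sys/fs/cgroup, as inside a container with a cgroup namespace). Note that with only a request and no limit, cpu.max reads "max" and there's no reliable answer, which is exactly the base problem described above.

```python
import os

def equivalent_cpus(cgroup_root="/sys/fs/cgroup"):
    """Derive "equivalent CPUs" from the cgroup v2 CPU limit.

    cpu.max contains "<quota> <period>" in microseconds, or "max <period>"
    when no hard limit is set; quota/period is the fraction of CPU time
    the cgroup may use, e.g. "200000 100000" means 2 equivalent CPUs.
    """
    try:
        with open(os.path.join(cgroup_root, "cpu.max")) as f:
            quota, period = f.read().split()
    except FileNotFoundError:
        return os.cpu_count()  # not on cgroup v2; fall back to the host count
    if quota == "max":
        # Only the soft request (cpu.weight/cpu.shares) applies: there is
        # no reliable per-pod answer -- the situation this PR avoids.
        return os.cpu_count()
    return max(1, int(quota) // int(period))

print(equivalent_cpus())  # with request == limit == 2, prints 2
```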

@dustymabe (Member) left a comment

LGTM

jlebon pushed a commit to jlebon/coreos-assembler that referenced this pull request Jul 29, 2022
We weren't specifying a number.  Previously that applied no limit at
all, but with coreos/coreos-ci-lib#112, the
default will now be 2.
@jlebon merged commit 7b552e1 into coreos:main on Jul 29, 2022 and deleted the pr/cpu-limit branch on July 29, 2022 at 14:52.
jlebon pushed a commit to coreos/coreos-assembler that referenced this pull request Jul 29, 2022 (same commit message as above).
dustymabe added a commit to dustymabe/fedora-coreos-pipeline that referenced this pull request Aug 1, 2022
With the recent changes in coreos/coreos-assembler#2975 we now need to set a CPU request/limit. cosaPod() does this for us now that coreos/coreos-ci-lib#112 is merged. Let's convert our jobs to use cosaPod().
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this pull request Aug 2, 2022
coreos-boot-edit.service wasn't sequenced Before anything, so under heavy
I/O contention it could race with services being terminated prior to
switching to the real root.  Before=initrd.target should fix this, so we
specify it, but it isn't enough for the same unknown reason as in
8b80486.  Also specify Before=initrd-parse-etc.service to avoid that
problem.

Fixes occasional multipath.day1 flakes in
coreos/fedora-coreos-tracker#1105, which became
more frequent after coreos/coreos-ci-lib#112
landed.

Fixes coreos/fedora-coreos-tracker#1105.
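For reference, the ordering fix described in this commit message boils down to a unit-file stanza along these lines (a sketch reconstructed from the message above; the rest of the unit is omitted):

```ini
# coreos-boot-edit.service -- ordering excerpt only
[Unit]
# Run before the initrd finishes, so heavy I/O contention can't race us
# against services being terminated prior to switch-root.
Before=initrd.target
# Not sufficient on its own (same unknown reason as in 8b80486), so also
# order before initrd-parse-etc.service.
Before=initrd-parse-etc.service
```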
mike-nguyen pushed a commit to mike-nguyen/fedora-coreos-config that referenced this pull request Aug 3, 2022 (same commit message as above; cherry picked from commit 651846e).
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this pull request Aug 3, 2022 (same commit message as above; cherry picked from commit 651846e).
dustymabe pushed two commits to dustymabe/coreos-assembler that referenced this pull request Nov 3, 2022 (same commit message as the coreos-assembler commit above; cherry picked from commit f634965).
HuijingHei pushed two commits to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023 (same commit message as the fedora-coreos-config commits above).