This repository has been archived by the owner. It is now read-only.

CFS scheduler bug throttles highly threaded I/O blocked applications in Kubernetes #2623

dharmab opened this issue Oct 30, 2019 · 8 comments



@dharmab dharmab commented Oct 30, 2019

Issue Report


Container Linux Version

NAME="Container Linux by CoreOS"
PRETTY_NAME="Container Linux by CoreOS 2191.5.0 (Rhyolite)"


Environment

Azure, AWS, and VMware

Expected Behavior

Highly threaded, I/O blocked containers running on Kubernetes on CoreOS should be able to use their full configured CPU request.

Actual Behavior

Highly threaded, I/O blocked containers running on Kubernetes on CoreOS are heavily throttled well before they approach their CPU request. We are seeing a 50% CPU performance impact in production for some Java web applications; i.e., if we request 6 cores, we are throttled at 3.

See the kernel patch that fixes this, which is landing in 5.4.

Because this bug heavily impacts the main intended use case of CoreOS Container Linux, would it be possible to prioritize this patch for backport?
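Not from the thread, but for anyone trying to confirm they're hitting this: throttling like the above shows up in a container's cgroup v1 `cpu.stat` counters (`nr_periods`, `nr_throttled`, `throttled_time`). A minimal sketch of reading them (the sample numbers below are illustrative, not from our production workload):

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the key/value lines of a cgroup v1 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttled_fraction(stats: dict) -> float:
    """Fraction of CFS enforcement periods in which the cgroup was throttled."""
    if stats["nr_periods"] == 0:
        return 0.0
    return stats["nr_throttled"] / stats["nr_periods"]


# Illustrative contents of /sys/fs/cgroup/cpu/cpu.stat for a heavily
# throttled container; on a real node, read the file for the pod's cgroup.
sample = """nr_periods 1000
nr_throttled 480
throttled_time 92000000000"""

stats = parse_cpu_stat(sample)
print(throttled_fraction(stats))  # 0.48 -> throttled in 48% of periods
```

A high `nr_throttled`/`nr_periods` ratio while CPU usage sits well below the configured limit is the signature of this bug.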


@jcrowthe jcrowthe commented Oct 30, 2019

Small clarification: the issue lies in CFS, the CPU bandwidth control mechanism. Due to this bug in the kernel, CFS may throttle a pod well before its limits are reached. Hence the issue is in how the kernel enforces Kubernetes pod limits rather than pod requests.
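To make the limits-vs-requests distinction concrete, here is a sketch (assumptions, not from the thread) of how a Kubernetes CPU limit maps onto the CFS bandwidth knobs, using the kernel's default 100 ms enforcement period:

```python
# With the default cfs_period_us of 100 ms, a limit of N CPUs becomes a
# quota of N * 100 ms of runtime per period, shared by ALL the
# container's threads. A highly threaded app spreads that quota across
# many per-CPU slices, which is where the expiration bug bites.

DEFAULT_PERIOD_US = 100_000  # kernel default cfs_period_us


def cfs_quota_us(cpu_limit: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """cfs_quota_us for a given CPU limit (whole or fractional CPUs)."""
    return int(cpu_limit * period_us)


print(cfs_quota_us(6))    # 600000: 600 ms of runtime per 100 ms period
print(cfs_quota_us(0.5))  # 50000: half a CPU
```

Requests, by contrast, don't touch these knobs at all; they feed the scheduler's proportional weighting (see below in the thread).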


@jhohertz jhohertz commented Nov 4, 2019

Just wanted to reference some work I've done on this, overlay commit with the patch here:


Edit: This ⬆️ is obsoleted by the PR below ⬇️.


@chiluk chiluk commented Nov 4, 2019

CPU ".requests" map to cpu.shares, which is a different mechanism from ".limits", which uses CFS bandwidth control. If you really are only using .requests, you are probably being hit by over-committed CPUs on your nodes.
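A sketch of the requests side of that distinction (assumptions, not from the thread): a CPU request becomes a cpu.shares weight, which only matters when CPUs are contended and never throttles on its own. Kubernetes converts millicores to shares at 1024 shares per CPU:

```python
# cpu.shares is a relative weight: a cgroup with twice the shares gets
# twice the CPU time under contention, but is never capped by it.
SHARES_PER_CPU = 1024


def cpu_shares(request_millicores: int) -> int:
    """cpu.shares weight derived from a CPU request (in millicores)."""
    return request_millicores * SHARES_PER_CPU // 1000


print(cpu_shares(6000))  # 6144: weight for a 6-CPU request
print(cpu_shares(1000))  # 1024: baseline weight for one CPU
```

So hard throttling at 3 of 6 cores can only come from the limits/quota path, which is exactly where the pre-5.4 slice-expiration bug lives.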

That being said, yes, CoreOS should backport the following Linux commits onto their kernel if they haven't already.


@jhohertz jhohertz commented Nov 5, 2019

Update: I am reworking my PR with input from the developers now.


@dharmab dharmab commented Nov 6, 2019

@chiluk good callout. We were setting both requests and limits to 6 in our test, and reproduced the throttling on a node we tainted to run only that one application Pod, on a node with many more cores available than the Pod requested.

@jhohertz thanks for submitting the patches! We've built a CoreOS image with those patches for some testing and are also awaiting an official release.


@dharmab dharmab commented Nov 7, 2019

We confirmed this patch fixed our throttling issue! Thank you!

@dharmab dharmab closed this Nov 7, 2019

@bgilbert bgilbert commented Nov 7, 2019

This should be fixed in alpha 2317.0.1, due shortly. We'll probably roll the fix into beta and stable more quickly than the normal promotion schedule would suggest. Reopening until that happens.

@bgilbert bgilbert reopened this Nov 7, 2019

@bgilbert bgilbert commented Nov 19, 2019

This will also be fixed in beta 2303.2.0 and stable 2247.7.0, due shortly. Thanks for reporting.

@bgilbert bgilbert closed this Nov 19, 2019