Use kernel 4.18 in EKS and ECS Amazon Linux AMIs to solve CFS throttling issues. #175

Closed
willejs opened this issue Feb 27, 2019 · 16 comments
Labels
AL2 Amazon Linux Proposed Community submitted issue

willejs commented Feb 27, 2019

Apologies, as this is really an Amazon Linux 2 issue, but it directly affects EKS and ECS.
Does anyone know when the Linux kernel version 4.18 will be shipped in Amazon Linux 2?
When it is, can the EKS and ECS images be rebuilt?

torvalds/linux@512ac99#diff-1c5364196d98130348bddabaad0a701f
The patch above should fix the issue with CFS quotas leading to process throttling. This should enable us to use limits everywhere again without setting off Prometheus alerts or degrading performance in latency-sensitive components!
kubernetes/kubernetes#67577
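
For readers new to the problem, a CPU limit is enforced as a CFS quota: the amount of CPU time a container may consume per scheduling period. The sketch below shows only that arithmetic. The function name is hypothetical, the 100 ms period is the usual default rather than something confirmed for any particular node, and any minimum quota the kubelet may enforce is ignored.

```python
# Assumed default for cpu.cfs_period_us (100 ms); nodes can be configured differently.
DEFAULT_CFS_PERIOD_US = 100_000


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """Return the CPU time (microseconds) a container may use per CFS period.

    Hypothetical helper for illustration: a limit of 1000 millicores (one CPU)
    maps to a quota equal to one full period.
    """
    if cpu_limit_millicores <= 0:
        raise ValueError("CPU limit must be positive")
    return cpu_limit_millicores * period_us // 1000
```

With a 500m limit this gives 50,000 µs of CPU time per 100 ms period. Once a container has used its quota it is throttled until the next period starts, which is why latency-sensitive workloads are hit hardest.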

tabern added the EKS (Amazon Elastic Kubernetes Service), ECS (Amazon Elastic Container Service), and AL2 (Amazon Linux) labels and removed the ECS and EKS labels on Mar 2, 2019
abby-fuller added the Proposed (Community submitted issue) label on Mar 7, 2019

zflamig commented Mar 12, 2019

+1 to this; it would also be great for the FUSE changes that shipped in 4.18.

BrianChristie commented Mar 25, 2019

According to an excellent, detailed response from AWS Support, the patch "sched/fair: Fix bandwidth timer clock drift condition" [0] has been merged into the Linux kernel 4.14.y branch and is available as of the 4.14.95 release [1].

The latest EKS Optimized Amazon Linux 2 AMI has a Linux kernel version which includes the patch.

# uname -a
Linux ip-192-168-213-62.eu-west-1.compute.internal 4.14.97-90.72.amzn2.x86_64 #1 SMP Tue Feb 5 20:46:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Reference:
[0] torvalds/linux@512ac99#diff-1c5364196d98130348bddabaad0a701f

[1] Linux kernel patch 4.14.95
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-4.14.y&ofs=1000

I believe this issue can be closed.

szymonpk commented Mar 26, 2019

I have switched all the nodes to 4.14.104-95.84.amzn2.x86_64, but it didn't help much. I still see throttled containers where CPU usage is minimal, mostly node-exporters.
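
One way to quantify this kind of report is to read the container's cgroup `cpu.stat` file and compute the share of periods in which it was throttled. The following is a minimal sketch that assumes the cgroup v1 field names `nr_periods` and `nr_throttled`; the function names are hypothetical, and the location of `cpu.stat` depends on the node's cgroup layout.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats: dict) -> float:
    """Return the fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no periods recorded yet, so nothing to report
    return stats.get("nr_throttled", 0) / periods
```

Pass the contents of the file to `parse_cpu_stat` and the result to `throttled_fraction`. A high fraction on a container with low average CPU usage matches the symptom described here.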

otterley commented Jun 4, 2019

It was subsequently reported that torvalds/linux@512ac99 introduced a regression. A patch to correct this can be found at https://lkml.org/lkml/2019/5/17/581 (not yet merged AFAICT).

whereisaaron commented Jun 5, 2019

Not merged. The author, Dave Chiluk, is looking for support on the LKML to get attention to this patch, to get it reviewed and merged.

Author

willejs commented Jul 16, 2019

Judging from that thread, the patch is getting closer to being merged. I imagine it's still probably months away from getting into the Amazon Linux build, though.

whereisaaron commented Jul 31, 2019

Reports of great improvements using 4.14.133 with 512ac99 and Dave Chiluk’s patch backported into it: kubernetes/kubernetes#67577 (comment) https://gist.github.com/PaulFurtado/ff6c67ec87416b66ba1c6fc70f7beec1

Hopefully test results like these will help it get merged soon.

andrew-howden commented Aug 1, 2019

In our case we're looking for:

https://github.com/torvalds/linux/commit/79e9fed46038/

As we're exhausting the ephemeral TCP port range in containers.

Also in 4.18.

hugoprudente commented Oct 2, 2019

@andrew-howden,

As per the release notes, Amazon Linux 2, the base image for the EKS/ECS Optimized AMIs, already has kernel 4.19.x available [1].

I upgraded my EKS Optimized AMI to the 4.19.x kernel to confirm the version and that the patch is available.

$ uname -r
4.19.72-25.58.amzn2.x86_64

With that, I was able to confirm that the patch you need, "net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization", is already available in the 4.19 kernel provided by AL2.
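
To see which mode a node is actually using, the sysctl can be read from `/proc/sys/net/ipv4/tcp_tw_reuse`. The sketch below interprets its value based on the kernel's documented meanings (0, 1, and the loopback-only value 2 that this patch adds). The helper name is hypothetical and the mapping text is paraphrased.

```python
# Meanings of net.ipv4.tcp_tw_reuse on kernels that include the loopback-only patch.
TCP_TW_REUSE_MODES = {
    0: "disabled",
    1: "enabled for all connections",
    2: "enabled for loopback traffic only",
}


def describe_tcp_tw_reuse(raw: str) -> str:
    """Translate the raw sysctl file contents into a human-readable mode."""
    value = int(raw.strip())
    return TCP_TW_REUSE_MODES.get(value, f"unknown value {value}")
```

For example, `describe_tcp_tw_reuse(open("/proc/sys/net/ipv4/tcp_tw_reuse").read())` reports the current setting on a Linux host.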

Now it is a matter of time until Amazon Linux 2 starts using kernel 4.19 by default. Alternatively, you can use https://github.com/awslabs/amazon-eks-ami to build your own AMI with the AWS-supported kernel as the default.

Sadly, Dave Chiluk’s patch has not yet been backported to either version of the kernel, although commit 512ac99 is available in both 4.19 and 4.14.

Reference
[1]: https://aws.amazon.com/amazon-linux-2/release-notes/

alakesh commented Oct 2, 2019

Amazon Linux extras provides kernel-ng, which is a 4.19-based kernel.
$ sudo amazon-linux-extras install kernel-ng

whereisaaron commented Oct 4, 2019

Dave Chiluk’s patch is en route to be in the 5.4 kernel (the next release now that 5.3 is out), and should shortly be available in 5.4-rc1.
kubernetes/kubernetes#67577 (comment)

This has now been merged into Linus' tree and should be released with 5.4. I also just submitted it to linux-stable, and assuming that goes smoothly all distros that correctly follow stable process should start picking it up shortly.

hugoprudente commented Nov 13, 2019

Hi all,

Dave Chiluk’s patch was backported today; it is available from v4.14.154 and v4.19.84 onwards.

Now we just need to wait for it to become available on Amazon Linux 2 so that we can yum update and pick up the fix.

I'll keep track of it and let you guys know.
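
In the meantime, here is a rough sketch for checking a node's `uname -r` output against the versions mentioned in this thread (4.14.154, 4.19.84, and mainline 5.4). The function names are hypothetical, and the check is only a heuristic: a distro kernel may carry the patch as a backport under a lower version number.

```python
import re
from typing import Optional

# First stable releases reported in this thread to include Dave Chiluk's patch.
MIN_PATCHED = {(4, 14): 154, (4, 19): 84}
MAINLINE_FIXED = (5, 4)  # the patch was merged for the 5.4 release


def parse_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '4.14.154-128.181.amzn2.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(group) for group in match.groups())


def has_chiluk_patch(release: str) -> Optional[bool]:
    """Return True/False when the thread's version data decides it, else None."""
    major, minor, patch = parse_release(release)
    if (major, minor) >= MAINLINE_FIXED:
        return True
    minimum = MIN_PATCHED.get((major, minor))
    if minimum is None:
        return None  # the thread says nothing about this stable branch
    return patch >= minimum
```

Feeding it the output of `uname -r` gives a quick answer per node. A `None` result means the branch is not covered by the versions quoted above.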

hugoprudente commented Nov 26, 2019

Amazon Linux 2 has just rolled out the new kernel: version 4.14.154 is now available.

[ec2-user@ip-172-31-38-202 ~]$ uname -a
Linux ip-172-31-38-202.eu-west-1.compute.internal 4.14.154-128.181.amzn2.x86_64 #1 SMP Sat Nov 16 21:49:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

f3d3Cz commented Nov 26, 2019

juniorz commented Dec 17, 2019

EKS AMIs have just been released with Kernel 4.14.154

sh-4.2$ uname -a
Linux ip-10-0-0-35.ec2.internal 4.14.154-128.181.amzn2.x86_64 #1 SMP Sat Nov 16 21:49:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

See ami-087a82f6b78a07557 in us-east-1 for example.

Contributor

mikestef9 commented Dec 17, 2019

Closing this issue as resolved with the release of the latest EKS AMI. See release notes here

https://github.com/awslabs/amazon-eks-ami/releases/tag/v20191213
