DO NOT MERGE: patches runc for al2023 kind image running on al2 host #2821

Closed
wants to merge 1 commit into from


@jaxesn (Member) commented Jan 16, 2024

Issue #, if available:

Description of changes:

Background

Our kind image is based on AL23. When we GA'd eks-a, it was based on AL2 to ensure we were shipping Amazon-built components. AL2 is old, and its version of systemd only supports cgroup v1. A couple of years ago this started to become an issue as newer OSs, such as Ubuntu 22.04, switched to cgroup v2. When users tried to use admin machines running an OS with cgroup v2, our kind image would stall because systemd did not support v2.
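Whether a given host (for example an admin machine) runs cgroup v1 or v2 can be checked from the filesystem type mounted at /sys/fs/cgroup: on a pure cgroup v2 (unified) host it is cgroup2fs, whose statfs magic is CGROUP2_SUPER_MAGIC (0x63677270). The Go sketch below is a minimal, Linux-only illustration of that check, similar in spirit to how runc itself detects unified mode; the function names are hypothetical, and it assumes the conventional /sys/fs/cgroup mount point. A hybrid host (v1 controllers plus a v2 mount under /sys/fs/cgroup/unified) is reported as not unified.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from <linux/magic.h>.
const cgroup2SuperMagic = 0x63677270

// isCgroup2Magic reports whether a statfs f_type value identifies a cgroup v2 filesystem.
// Hypothetical helper, split out so the comparison can be checked without a real mount.
func isCgroup2Magic(fsType int64) bool {
	return fsType == cgroup2SuperMagic
}

// isUnifiedCgroupHost stats the given cgroup mount point and reports whether it is
// a cgroup v2 (unified) mount. Assumes the host mounts cgroups at the usual location.
func isUnifiedCgroupHost(mountpoint string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountpoint, &st); err != nil {
		return false, err
	}
	// The width of Statfs_t.Type differs between architectures, so normalize to int64.
	return isCgroup2Magic(int64(st.Type)), nil
}

func main() {
	v2, err := isUnifiedCgroupHost("/sys/fs/cgroup")
	if err != nil {
		fmt.Println("could not stat /sys/fs/cgroup:", err)
		return
	}
	if v2 {
		fmt.Println("cgroup v2 (unified)")
	} else {
		fmt.Println("cgroup v1 or hybrid")
	}
}
```

By this check, an AL2 host would be expected to report "cgroup v1 or hybrid" and an Ubuntu 22.04 host "cgroup v2 (unified)".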

Shortly after AL23 went GA, we switched our image to AL23, solving the v2 issue. This also kept support for older OSs like AL2, since newer systemd versions still support cgroup v1. It's important to note that we still use AL2 for our e2e test host machines. That said, it's rare in real-world usage for customers to be using AL2, since for most use cases they are on infrastructure providers outside of AWS and generally use Ubuntu or RHEL.

Around the middle of last year there was the runc/kubelet issue caused by runc's 1.1.6 release. This required new releases of Kubernetes and changes to kind to support it. At that time we introduced this patch, locking the versions of containerd and runc, because we saw issues with our kind image on AL2 nodes. While debugging, and without finding the answer, we (incorrectly) assumed the issue had to do with the misc controller, which was the cause of the runc 1.1.6 issue. Even then this did not make sense, because that cgroup controller was not introduced until a later kernel that was not available on AL2. Since then we have kept runc at 1.1.5 in our kind image.

I wanted to see whether we still hit the same issue if we update to the latest runc today. We do, in fact. I tried reverting the misc controller change, since I was still convinced it was related; that had no effect. I then found this change: if you follow the original PR and discussions, it is mentioned that the code was originally added to support docker-in-docker workflows. It was also noted that this code predates the availability of cgroup namespaces.

Around the time this runc change was made, the kind maintainers changed kind to require cgroupns=private. When a private cgroup namespace is in use, the runc change is not a problem at all, and, based on the discussion and research done by the runc maintainers, it is likely fine in almost all cases that matter.
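One rough way to tell, from inside a container such as a kind node, whether it was started with a private cgroup namespace is to look at /proc/self/cgroup. Each line has the form hierarchy-ID:controllers:path, and with a private namespace the paths are usually just "/", whereas with the host namespace they typically include the container's location in the host hierarchy (for example a docker-&lt;id&gt;.scope path). The Go sketch below implements that heuristic; it is an illustration under that assumption rather than a reliable detector, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// allCgroupPathsAreRoot parses /proc/<pid>/cgroup content ("hierarchy-ID:controllers:path"
// per line) and reports whether every entry's path is "/". Heuristic only: inside a
// container with a private cgroup namespace this is usually true, while with the host
// namespace the paths usually point somewhere below the host's cgroup root.
func allCgroupPathsAreRoot(content string) bool {
	seen := 0
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// SplitN keeps any ':' inside the path as part of the third field.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return false // unexpected format; do not guess
		}
		seen++
		if parts[2] != "/" {
			return false
		}
	}
	return seen > 0
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("could not read /proc/self/cgroup:", err)
		return
	}
	fmt.Println("all cgroup paths are root (likely private cgroupns):", allCgroupPathsAreRoot(string(data)))
}
```

The same check works for both cgroup v1 output (one line per hierarchy) and cgroup v2 output (a single "0::" line).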

Kind 0.20.0 is the release that included the cgroupns=private change, and unfortunately, when we updated, we ran into issues with, surprise surprise, AL2. The version of Docker shipped in AL2 supports cgroupns=private; however, something about the AL2 kernel does not. See this issue for more info. To work around this, we patch this change out of our kind build.

The combination of not being able to use a private cgroup namespace and running runc 1.1.6+ prevents us from using AL2 nodes for our capd-based clusters. Oddly, bootstrap clusters, which do not use Cilium (or much of anything, for that matter), work fine with the new runc. I assume this is just luck; the issue likely has something to do with the kinds or number of containers being launched.

What do we do?

I don't know. For now we can probably do nothing, since our kind image is not meant for production workloads; however, at some point we need to be able to update runc. For that, I see a few options:

  • Patch runc to remove this change. If we did this, I would either make the revert configurable so that it only applies inside our kind image, and runc still behaves as intended when running on a real node, or we could have a separate runc build just for our kind image versus our real node images.
  • Switch our e2e test hosts to AL23, which does not have this problem. We could then probably remove the cgroupns patch and deprecate usage of AL2 with eks-anywhere, or we could keep the patch and still "sorta" support AL2 as admin machines, but not for capd clusters.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jaxesn. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot eks-distro-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 16, 2024
@eks-distro-bot (Collaborator)

@jaxesn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| runc-tooling-presubmit-arm64 | 1a729aa | link | true | /test runc-tooling-presubmit-arm64 |
| runc-tooling-presubmit | 1a729aa | link | true | /test runc-tooling-presubmit |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jaxesn closed this Jun 11, 2024