
Support cgroups v2 #900

Closed
nsmith5 opened this issue Oct 15, 2019 · 30 comments · Fixed by #2844

Comments

@nsmith5

nsmith5 commented Oct 15, 2019

k3s-agent is failing to run on Fedora 31. I suspect this is because cgroups v2 is the default on Fedora 31, but I can't quite figure out how to see which version of runc ships with k3s (runc recently landed support for cgroups v2).

Version:
k3s version v0.9.1 (755bd1c6)

Describe the bug
k3s-agent exits 1 and dies repeatedly with the following log output:

Oct 15 07:41:51 mercury systemd[1]: k3s-agent.service: Failed with result 'exit-code'.
Oct 15 07:41:51 mercury systemd[1]: k3s-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 15 07:41:51 mercury k3s[23212]: time="2019-10-15T07:41:51.403568863-07:00" level=fatal msg="failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Rasp>
Oct 15 07:41:51 mercury k3s[23212]: time="2019-10-15T07:41:51.403559607-07:00" level=error msg="Failed to find memory cgroup, you may need to add \"cgroup_memory=1 cgroup_enable=memory\" to your linux cmdline (/boot/cmdline.txt on a Rasp>
Oct 15 07:41:51 mercury k3s[23212]: time="2019-10-15T07:41:51.403539664-07:00" level=warning msg="Failed to find cpuset cgroup, you may need to add \"cgroup_enable=cpuset\" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)"
Oct 15 07:41:51 mercury k3s[23212]: time="2019-10-15T07:41:51.403421758-07:00" level=info msg="Starting k3s agent v0.9.1 (755bd1c6)"
Oct 15 07:41:51 mercury systemd[1]: Started Lightweight Kubernetes.
Oct 15 07:41:51 mercury systemd[1]: Starting Lightweight Kubernetes...
Oct 15 07:41:51 mercury systemd[1]: Stopped Lightweight Kubernetes.
Oct 15 07:41:51 mercury systemd[1]: k3s-agent.service: Scheduled restart job, restart counter is at 1076.
Oct 15 07:41:51 mercury systemd[1]: k3s-agent.service: Service RestartSec=5s expired, scheduling restart.
...

To Reproduce
Start up a k3s agent on Fedora 31 Server

Expected behavior
Should start

Actual behavior
Doesn't start; it repeatedly emits the logs above.

Additional context
Again, I think this is related to the cgroups v2 change, but I couldn't figure out which version of runc is currently packaged with k3s to confirm. The most recent version of runc (1.0.0-rc9) is required for cgroups v2 to function.

@AkihiroSuda
Contributor

rc9 still doesn't fully support cgroup v2; in particular, it still lacks support for the device controller.

@nsmith5
Author

nsmith5 commented Oct 15, 2019

Bummer. Ok well I suppose there is nothing k3s can really do to move that along so feel free to close this issue if you wish.

@leigh-j

leigh-j commented Oct 30, 2019

https://github.com/containers/crun is a good drop-in replacement for runc and has full cgroup v2 support. Not a trivial change, but crun seems better than runc in every way.

@davidnuzik davidnuzik added this to the Backlog milestone Nov 5, 2019
@davidnuzik davidnuzik added [zube]: To Triage kind/bug Something isn't working labels Nov 5, 2019
@T0MASD

T0MASD commented Nov 8, 2019

I'm hitting the same issue on k3s master on Fedora 31. As a workaround I've added systemd.unified_cgroup_hierarchy=0 to GRUB_CMDLINE_LINUX in /etc/default/grub.

As per:
https://fedoraproject.org/wiki/Common_F31_bugs#Docker_package_no_longer_available_and_will_not_run_by_default_.28due_to_switch_to_cgroups_v2.29
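The workaround described above can be sketched as shell commands. This is a sketch, not an official procedure: it assumes a Fedora host using GRUB2 with a BIOS-style install (the grub.cfg path differs on EFI systems), and it reverts the host to the legacy cgroup v1 hierarchy on next boot.

```shell
# Prepend the flag to the kernel command line in /etc/default/grub,
# then regenerate grub.cfg and reboot for it to take effect.
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.unified_cgroup_hierarchy=0 /' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
```

The grubby command mentioned below achieves the same result in one step and handles the config paths for you.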

@fire

fire commented Nov 8, 2019

Running this command from the Fedora wiki worked:

sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"

@rektide

rektide commented Dec 21, 2019

I'm sorry, I'm kind of confused. What are the main things k3s depends on that need to gain cgroup v2 support?

Here are some tickets for various projects. I'm not sure which of these we are going to need.

I also enjoyed a ~Nov 1 post from @AkihiroSuda, The current adoption status of cgroup v2 in containers, which helped me understand some of this.

@nsmith5
Author

nsmith5 commented Dec 21, 2019

Here is my understanding: k3s uses containerd which uses runc which only supports cgroup v1. crun is an alternative to runc that supports cgroup v2. If containerd supports crun then containerd will support cgroup v2 inherently and so too will k3s.
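For anyone following along, a quick way to tell which hierarchy a host is actually using is to look at the filesystem type mounted at /sys/fs/cgroup. This is a sketch; `cgroup2fs` indicates the pure cgroup v2 layout that Fedora 31 defaults to.

```shell
# Report which cgroup hierarchy the host is running.
# "cgroup2fs" = pure cgroup v2 (unified); "tmpfs" = cgroup v1 (legacy or hybrid).
fstype=$(stat -fc %T /sys/fs/cgroup)
case "$fstype" in
  cgroup2fs) echo "cgroup v2 (unified hierarchy)" ;;
  tmpfs)     echo "cgroup v1 (legacy or hybrid hierarchy)" ;;
  *)         echo "unrecognized filesystem: $fstype" ;;
esac
```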

@AkihiroSuda
Contributor

The biggest missing part is kubelet
kubernetes/enhancements#1370

@leigh-j

leigh-j commented Apr 8, 2020

crio already supports crun; for a minimalist Kubernetes deployment, crio would be simpler than containerd. containerd has a lot of baggage and is currently the main blocker for 1370.

Edit: open issue for cgroup v2 support in containerd containerd/containerd#3726

The work to fix runc is partly being done by porting across what crun has already done.
opencontainers/runtime-spec#1002 (comment)

@AkihiroSuda
Contributor

containerd has a lot of baggage and currently the main blocker for 1370.

Wrong. containerd already supports cgroup v2.

@o-lenczyk

Is there any plan to support cgroup v2?

@nefelim4ag

Kubelet now supports cgroup v2.

@juchiast

Every linked issue on this thread is closed.

@AkihiroSuda
Contributor

cgroup2 PR for k3s is here #2584

@cablespaghetti

cablespaghetti commented Nov 27, 2020 via email

@nsmith5
Author

nsmith5 commented Nov 27, 2020

❤️ @AkihiroSuda Stunning work. I saw your Kubecon talk and hoped this might be on the horizon. I see rootless is coming up afterwards from the PR description. That is fantastic to hear! 🍻

@brandond
Contributor

brandond commented Dec 10, 2020

Based on that error I think we need some additional changes on both the server and agent side:
https://github.com/k3s-io/k3s/blob/master/pkg/daemons/agent/agent.go#L133
https://github.com/k3s-io/k3s/blob/master/pkg/daemons/agent/agent.go#L178

@AkihiroSuda
Contributor

Does it work if you run echo +pids > /sys/fs/cgroup/cgroup.subtree_control?
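For anyone trying this suggestion, the relevant cgroup v2 interface files look like the following. This is a sketch: the write requires root and only succeeds on a host running the unified (v2) hierarchy.

```shell
# Controllers available at the cgroup v2 root, e.g. "cpuset cpu io memory pids".
cat /sys/fs/cgroup/cgroup.controllers
# Controllers currently delegated to child cgroups.
cat /sys/fs/cgroup/cgroup.subtree_control
# Delegate the pids controller to children, as suggested above (root only).
echo +pids > /sys/fs/cgroup/cgroup.subtree_control
```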

@rancher-max
Contributor

No it doesn't. I tried it in a fresh setup, running as root, but got the same error.

@davidnuzik
Contributor

Hi. I don't want to put pressure on anyone; I just want to set expectations. We are planning to release v1.20.0+k3s1 no later than December 16th. Our due date to get things in is pretty much today; Monday the 14th is kind of a stretch. If we can't verify this is working, I will need to move this to our next milestone, which is slated for a January 13th release (when upstream will deliver the next set of patches).

I will move this issue to the next milestone (January) if it looks like it requires much more effort.

@zdzichu

zdzichu commented Dec 12, 2020

Just to note, it was working fine as of fadc5a8 (i.e. before 1.20 landed and before f3de60f).

@davidnuzik davidnuzik modified the milestones: v1.20.0+k3s1, v1.20.1+k3s1 Dec 15, 2020
@davidnuzik
Contributor

We need some more time to resolve the problems Max outlined. I've bumped this to our next planned release in Mid-January.

@AkihiroSuda
Contributor


Is anyone working on this?

@brandond
Contributor

@AkihiroSuda not at the moment; if you have cycles to dig into this we'd all be very grateful ;)

@brandond brandond moved this from Working to Next Up in Development [DEPRECATED] Jan 14, 2021
@davidnuzik davidnuzik modified the milestones: v1.20.2+k3s1, v1.20.3+k3s1 Jan 15, 2021
@AkihiroSuda
Contributor

PR: #2844

@davidnuzik davidnuzik moved this from Next Up to Peer Review in Development [DEPRECATED] Jan 25, 2021
@davidnuzik
Contributor

PR exists and is in review, so I have set the issue to Peer Review status for you.

Development [DEPRECATED] automation moved this from Peer Review to Done Issue / Merged PR Jan 26, 2021
@rancher-max
Contributor

This has been validated as working. Validated using commit f3c41b7650340bddfa44129c72e7f9fb79061b90. Ensured working on hybrid, unified, and legacy cgroups. Am able to join server and agent nodes, deploy workloads, and do standard operations successfully. Thank you for the fix!

@brandond
Contributor

I just want to call out that hybrid mode is only supported if the required cgroup controllers are on v1. If the v2 hierarchy has claimed them, we will not look for them there while in hybrid mode.
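A rough way to see which hierarchy has claimed a given controller on a hybrid host is sketched below. The paths are the conventional systemd mount points (the `unified` mount only exists in hybrid mode), and `memory` is just an example controller.

```shell
# Check whether a controller lives on the v1 or the v2 hierarchy (sketch).
ctl=memory
if [ -d "/sys/fs/cgroup/$ctl" ]; then
  echo "$ctl: claimed by the v1 hierarchy"
elif grep -qw "$ctl" /sys/fs/cgroup/unified/cgroup.controllers 2>/dev/null; then
  echo "$ctl: claimed by the v2 (unified) hierarchy"
else
  echo "$ctl: not found on either hierarchy"
fi
```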
