Worker nodes fail to start after reboot as nf_conntrack kernel module not loaded #4462

Closed
rp42 opened this issue Mar 15, 2024 · 8 comments · Fixed by #4705

Comments

@rp42

rp42 commented Mar 15, 2024

Summary

I added a node as a worker to a small cluster that had GPU enabled. It worked fine initially, but after rebooting the node the microk8s.daemon-kubelite.service fails to start because it is unable to open /proc/sys/net/netfilter/nf_conntrack_max:

microk8s.daemon-kubelite[2119]: E0314 18:28:26.546586    2119 server.go:537] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"

Adding nf_conntrack to the end of /etc/modules-load.d/modules.conf in the worker node VM works around the issue.
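
A minimal sketch of that workaround as a re-runnable helper, assuming a POSIX shell. The function name `ensure_module` and its arguments are hypothetical and are not part of microk8s. It appends the module name only when the file does not already contain it on a line of its own, so running it repeatedly does not add duplicate entries.

```shell
# ensure_module MODULE FILE (hypothetical helper, not part of microk8s)
# Appends MODULE to a modules-load.d style FILE unless an identical line exists.
ensure_module() {
    module="$1"
    conf="$2"
    touch "$conf"
    if ! grep -qxF "$module" "$conf"; then
        # Add a newline first if the file's last line is unterminated.
        if [ -n "$(tail -c1 "$conf")" ]; then echo >> "$conf"; fi
        printf '%s\n' "$module" >> "$conf"
    fi
}

# Example (needs root, path taken from the report above):
#   ensure_module nf_conntrack /etc/modules-load.d/modules.conf
```

The entry only takes effect when modules are loaded at boot, so a reboot is still needed to confirm that the node comes back on its own.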

Nodes are all running Ubuntu Server 22.04.4 and microk8s v1.28.7 from snap. They run as VMs in a Proxmox cluster.

What Should Happen Instead?

The node should return to Ready status after it is rebooted.

Reproduction Steps

  1. Single node with GPU enabled, but no GPU hardware
  2. Add a GPU node as a worker to the non-GPU node
  3. Verify all nodes are ready and cluster is functional
  4. Reboot the GPU node and wait for it to return to Ready status
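
After step 4, one way to confirm that a node is hitting this specific problem is to check for the file named in the error and for the module itself. The sketch below is an illustration only: the function names and the optional path arguments are assumptions added so the checks can be pointed at a test directory, and they default to the real /proc locations.

```shell
# conntrack_status [PROC_ROOT] (hypothetical helper)
# Prints "present" if the sysctl file from the error message exists, else "missing".
conntrack_status() {
    proc_root="${1:-/proc}"
    if [ -e "$proc_root/sys/net/netfilter/nf_conntrack_max" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# module_loaded MODULE [MODULES_FILE] (hypothetical helper)
# Succeeds if MODULE appears as the first field of a line in /proc/modules.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Example on the rebooted node:
#   conntrack_status
#   module_loaded nf_conntrack && echo "nf_conntrack is loaded"
```

If the first check prints "missing" and the second fails, the node is in the state described in this report.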

Introspection Report

Please contact me directly if this is required.

Can you suggest a fix?

Ensure the nf_conntrack module is loaded on worker nodes as it is on full nodes.

Are you interested in contributing with a fix?

Not sure where to fix this issue properly.

@andrew-landsverk-win

We're running microk8s on Red Hat 9 and saw the same problem during our patching for this cycle. The suggested fix of adding nf_conntrack to modules.conf has also corrected the issue on our end. Is there a long-term fix coming for this issue?

Thanks!

@geocomm-jmeunier

We're running microk8s on Ubuntu 22.04 and saw this problem in different environments. The suggested fix of adding nf_conntrack to modules.conf has fixed our issue. We would appreciate a long-term fix.

@SphtKr

SphtKr commented Jun 22, 2024

Seeing this also with 1.29.4 (snap) on Ubuntu 22.04, single node cluster with Calico (and multus, if relevant). The modules.conf workaround worked for me as well, but I have no idea what changed or why it wasn't required before.

@giner
Contributor

giner commented Jul 1, 2024

We are experiencing the same issue with Microk8s 1.29.4 running on AWS EC2 with Ubuntu 22.04. The current workaround is to force the module to load at boot; however, it's not clear what has changed or why we now have to do this manually.

echo nf_conntrack | sudo tee /etc/modules-load.d/nf_conntrack.conf
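
The file above is only read at boot. To recover a node that is already down without rebooting it, the module can be loaded by hand and the failed service restarted. The sketch below assumes the snap service name `microk8s.daemon-kubelite` from the report at the top of this issue. The `run` and `apply_now` helpers and the `DRY_RUN` variable are hypothetical and exist only so the commands can be previewed before running them with root privileges.

```shell
# run CMD... (hypothetical helper)
# Executes the command, or only prints it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# apply_now (hypothetical helper)
# Loads nf_conntrack immediately, then restarts the kubelite service.
apply_now() {
    run sudo modprobe nf_conntrack
    run sudo snap restart microk8s.daemon-kubelite
}

# Preview the commands:  DRY_RUN=1; apply_now
# Run them for real:     unset DRY_RUN; apply_now
```

This does not replace the modules-load.d entry, which is still what keeps the module loaded after the next reboot.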

@pvginkel

I suddenly got the same issue on Ubuntu 24.04. The fix solved the issue for me also.

@hackstepz

I suddenly got the same issue on Ubuntu 24.04 Desktop. The fix solved the issue for me also.

@henioStraszny

We're running microk8s on Debian 12 and saw this problem. The suggested fix of adding nf_conntrack to modules.conf has fixed our issue. We would appreciate a long-term fix.

@bschimke95
Contributor

Hey folks,
The fix for this issue has just been backported as far back as 1.28 and should be available in a couple of hours. (See e.g. #4733)
