Worker nodes fail to start after reboot as nf_conntrack kernel module not loaded #4462

rp42 · 2024-03-15T15:53:41Z

Summary

I added a node as a worker to a small cluster that had GPU enabled. It works fine initially, but after the node is rebooted, microk8s.daemon-kubelite.service fails to start because it cannot open /proc/sys/net/netfilter/nf_conntrack_max:

microk8s.daemon-kubelite[2119]: E0314 18:28:26.546586    2119 server.go:537] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory"

Adding nf_conntrack to the end of /etc/modules-load.d/modules.conf in the worker node VM works around the issue.
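For anyone applying the workaround on several nodes, here is a minimal sketch of an idempotent version of that edit. The helper name `ensure_module_line` is hypothetical and not part of microk8s. It assumes the target file either does not exist yet or ends with a newline.

```shell
# Hypothetical helper (not shipped with microk8s): add a kernel module name
# to a modules-load.d style file unless that exact line is already present.
ensure_module_line() {
  module="$1"   # e.g. nf_conntrack
  conf="$2"     # e.g. /etc/modules-load.d/modules.conf
  # -x matches the whole line, -F treats the name as a fixed string.
  # If the file is missing or has no such line, append the module name.
  grep -qxF "$module" "$conf" 2>/dev/null || printf '%s\n' "$module" >> "$conf"
}
```

Run as root, `ensure_module_line nf_conntrack /etc/modules-load.d/modules.conf` should make the module load on the next boot, and `modprobe nf_conntrack` loads it right away without a reboot.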

Nodes are all running Ubuntu Server 22.04.4 and microk8s v1.28.7 from snap. They run as VMs in a Proxmox cluster.

What Should Happen Instead?

The node should return to Ready status after it is rebooted.

Reproduction Steps

Start with a single node with GPU enabled but no GPU hardware.
Add a GPU node as a worker to the non-GPU node.
Verify that all nodes are ready and the cluster is functional.
Reboot the GPU node and wait for it to return to Ready status.
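After the reboot in the last step, a quick way to tell whether a node is affected is to check for the file named in the kubelite error. This is only a sketch: `conntrack_sysctl_present` is a hypothetical helper, and the optional path argument exists purely so the check can be exercised against another file.

```shell
# Hypothetical helper (not part of microk8s): succeeds if the conntrack
# sysctl file that kube-proxy tries to open is present.
conntrack_sysctl_present() {
  # Default to the path from the kubelite error message in the summary.
  path="${1:-/proc/sys/net/netfilter/nf_conntrack_max}"
  [ -e "$path" ]
}
```

If `conntrack_sysctl_present` fails on the rebooted node, the nf_conntrack module is most likely not loaded. `lsmod | grep nf_conntrack` can confirm that.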

Introspection Report

Please contact me directly if this is required.

Can you suggest a fix?

Ensure the nf_conntrack module is loaded on worker nodes as it is on full nodes.

Are you interested in contributing with a fix?

Not sure where to fix this issue properly.


andrew-landsverk-win · 2024-04-24T16:48:09Z

We're running microk8s on Red Hat 9 and saw the same problem during our patching for this cycle. The suggested fix of adding nf_conntrack to modules.conf has also corrected the issue on our end. Is there a long-term fix coming for this issue?

Thanks!

geocomm-jmeunier · 2024-05-22T16:57:43Z

We're running microk8s on Ubuntu 22.04 and saw this problem in different environments. The suggested fix of adding nf_conntrack to modules.conf has fixed our issue. We would appreciate a long-term fix.

SphtKr · 2024-06-22T23:12:36Z

Seeing this also with 1.29.4 (snap) on Ubuntu 22.04, single-node cluster with Calico (and multus, if relevant). The modules.conf workaround worked for me as well, but I have no idea what changed or why it wasn't required before.

giner · 2024-07-01T07:18:14Z

We are experiencing the same issue with Microk8s 1.29.4 running on AWS EC2 with Ubuntu 22.04. The current workaround is to force the module to load at boot. However, it's not clear what has changed or why we now have to do this manually.

echo nf_conntrack | sudo tee /etc/modules-load.d/nf_conntrack.conf

pvginkel · 2024-07-14T06:40:01Z

I suddenly got the same issue on Ubuntu 24.04. The fix solved the issue for me also.

hackstepz · 2024-07-22T14:11:55Z

I suddenly got the same issue on Ubuntu 24.04 Desktop. The fix solved the issue for me also.

henioStraszny · 2024-10-01T14:17:59Z

We're running microk8s on Debian 12 and saw this problem. The suggested fix of adding nf_conntrack to modules.conf has fixed our issue. We would appreciate a long-term fix.

bschimke95 · 2024-10-29T17:39:10Z

Hey folks,
the fix for this issue has just been backported as far back as 1.28 and should be available in a couple of hours. (See e.g. #4733)

bschimke95 mentioned this issue Aug 28, 2024: kube-proxy service start failure on boot canonical/k8s-snap#626 (Open)

aznashwan mentioned this issue Oct 16, 2024: fix: ensure nf_conntrack module loaded for kubelite. #4705 (Merged)

bschimke95 closed this as completed in #4705 Oct 23, 2024
