calico-kube-controllers can't get API Server: context deadline exceeded #3085

Closed
atimin opened this issue Apr 22, 2022 · 46 comments · Fixed by #3102
Assignees
Labels
inactive kind/bug Something isn't working version/1.23 affects microk8s version 1.23

Comments

@atimin

atimin commented Apr 22, 2022

I have this problem after installing microk8s on my Ubuntu 21.10 server:

sudo snap install microk8s --channel=1.23 --classic

I checked the pods and saw that one had crashed:

$ microk8s.kubectl get pods -n kube-system
calico-node-c7h46                          1/1     Running            1 (7m38s ago)   10m
calico-kube-controllers-5ddf994775-gp8cv   0/1     CrashLoopBackOff   7 (34s ago)     10m

In the logs, I see that something is wrong with the API server:

$ microk8s.kubectl logs calico-kube-controllers-5ddf994775-gp8cv  -n kube-system
2022-04-22 13:26:15.311 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0422 13:26:15.312587       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-04-22 13:26:15.313 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2022-04-22 13:26:25.313 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-22 13:26:25.313 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

Result of microk8s.inspect

inspection-report-20220422_132011.tar.gz

@balchua
Collaborator

balchua commented Apr 23, 2022

I see these in the kubelite logs:

Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.899796     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.916612     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.918799     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.930530     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.931278     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="nf_conntrack"

I don't know if there is a need to load these manually. I suppose kube-proxy loads them on start, which is why it triggered these messages.
Can you try manually loading them like this?

$ sudo modprobe ip_vs
$ sudo modprobe ip_vs_rr
$ sudo modprobe ip_vs_wrr
$ sudo modprobe ip_vs_sh
$ sudo modprobe nf_conntrack

Then restart microk8s. I am not sure if this will work though.
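
If loading them manually does help, here is a minimal sketch for checking the modules and making them persist across reboots. It uses the standard systemd-modules-load mechanism; the file name ipvs.conf is just an example and nothing MicroK8s-specific:

# Check whether the modules are already loaded
lsmod | grep -E 'ip_vs|nf_conntrack'

# Persist them across reboots (systemd loads /etc/modules-load.d/*.conf at boot)
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
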
@neoaggelos @ktsakalozos thoughts?

@nc-kab

nc-kab commented Apr 23, 2022

I have exactly the same problem with my Raspberry Pi cluster running MicroK8s.
Manually loading the above-mentioned modules did not help.

@burtonr

burtonr commented Apr 24, 2022

I too have the same issue, although mine is an existing cluster.

$ snap list
microk8s  v1.22.8        3057   1.22/stable      canonical✓  classic

I noticed the issue after installing the latest (5.13.0-40-generic) kernel and restarting the server.

$ uname -a
Linux homeserver 5.13.0-40-generic #45-Ubuntu SMP Tue Mar 29 14:48:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl get pods -n kube-system 
NAME                                       READY   STATUS             RESTARTS         AGE
calico-node-cc4m2                          1/1     Running            1 (11m ago)      30m
coredns-7f9c69c78c-kh7df                   0/1     Running            9 (11m ago)      139d
calico-kube-controllers-75c5f98cdc-9xhc7   0/1     CrashLoopBackOff   14 (4m2s ago)    29m
hostpath-provisioner-5c65fbdb4f-w298z      0/1     CrashLoopBackOff   504 (2m5s ago)   139d

The logs of the calico-kube-controllers pod are the same as above.

I've tried to specify the calico interface in the /var/snap/microk8s/current/args/cni-network/cni.yml file, but that had no effect. I see the calico-node pod is running, and the logs show the expected IP address.

Happy to help in any way.
inspection-report-20220424_004208.tar.gz

@oleo65

oleo65 commented Apr 24, 2022

I am having the same issue on Ubuntu 21.10 running on multiple Raspberry Pi 4s with the latest kernel installed (5.13.0-1025-raspi).

After a lot of trial and error, I at least found a way to temporarily get the cluster more or less up and running again.

Possible temporary mitigation/solution

  • From the logs of various pods (I have a kubernetes dashboard, traefik, among others) it seems that the master control plane node is not reachable via the internal ip address 10.152.183.1 (same as in opening post).
  • As soon as I forcefully moved the calico-kube-controller pod to the master control plane node, it started working again (a rough command sketch follows this list).
  • From there I drained and cordoned all other nodes except the master (only one in my setup now).
  • After some time the workloads healed and came all up again. 🎉
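
A rough sketch of the commands this could amount to; the node names and the nodeSelector patch are illustrative, not taken from my actual setup:

$ microk8s kubectl cordon worker-1
$ microk8s kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
$ # Pin the controller to the control plane node (hypothetical node name)
$ microk8s kubectl -n kube-system patch deployment calico-kube-controllers \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"master-node"}}}}}'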

Side note:

  • I also downgraded the kernel to the previous version (sudo flash-kernel --force 5.13.0-1024-raspi), as I suspected some change from the last apt upgrade.
  • This alone did not restore the cluster, so I took the route described above.

Hopefully this helps in fixing the issue or at least in restoring some smaller clusters. 😉

I am happy to provide more diagnostics if needed.

@atimin
Author

atimin commented Apr 25, 2022

Hey @balchua, the modules are loaded:

$ sudo lsmod | grep ip_vs
ip_vs_wrr              16384  0
ip_vs_sh               16384  0
ip_vs_rr               16384  0
ip_vs                 163840  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          151552  6 xt_conntrack,nf_nat,xt_nat,xt_MASQUERADE,ip_vs,xt_REDIRECT
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
libcrc32c              16384  7 nf_conntrack,nf_nat,btrfs,nf_tables,xfs,raid456,ip_vs

That may be the reason on a Raspberry Pi, but I'm using Ubuntu Server 21.10 on amd64.

@JoergSnn

Dear all,

I have the same issue on a single node raspberry pi 4 deployment on ubuntu server 21.10.
I set up a clean virtual machine (amd64) with ubuntu server 21.10 and ran into the same issue after installing microk8s.

Regards

Jörg

@blicknix

Dear all,

I have the same problem on an AMD Ubuntu Server 21.10 deployment. I even tried updating Ubuntu to 22.04 and MicroK8s to 1.23.5, but the problem still exists.
I'm happy to test any suggestions on the VM.

Best regards,
Samuel

@AlexsJones
Contributor

Hey folks, we'll take a look at this right away. Thank you all for raising this!

@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

Can I please ask you to try our other channel? snap refresh microk8s --channel=latest/edge ?
This will help us/me eliminate a few variables.

@AlexsJones AlexsJones added kind/bug Something isn't working version/1.23 affects microk8s version 1.23 labels Apr 25, 2022
@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

What's the output of ethtool --show-offload vxlan.calico ?

If it's on, please try running ethtool --offload vxlan.calico rx off tx off, then restart microk8s.

This might be offloading related 🤔
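
In other words, something along these lines (just a sketch, and it assumes the vxlan.calico interface already exists on the node):

$ ethtool --show-offload vxlan.calico | grep checksumming
$ sudo ethtool --offload vxlan.calico rx off tx off
$ microk8s stop && microk8s start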

@blicknix

blicknix commented Apr 25, 2022

Changed the channel to latest/edge and did an update of microk8s.

root@kubernetes:~# ethtool --offload vxlan.calico rx off tx off
netlink error: no device matches name (offset 24)
netlink error: No such device

A second attempt at the command produced no output.

@nc-kab

nc-kab commented Apr 25, 2022

Updating to edge did not make a difference for me.
ethtool --offload vxlan.calico rx off tx off doesn't give any output.

@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

Changed the channel to latest/edge and did an update of microk8s.

root@kubernetes:~# ethtool --offload vxlan.calico rx off tx off
netlink error: no device matches name (offset 24)
netlink error: No such device

A second attempt at the command produced no output.

Thanks @blicknix. The idea here is that we sometimes come across issues with offloading being enabled.

Most likely that interface doesn't exist because calico-node isn't running yet.

@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

Updating to edge did not make a difference for me.
ethtool --offload vxlan.calico rx off tx off doesn't give any output.

Apologies, I reworded the ask: this will disable offloading. If you restart the pods/microk8s, let's see whether that brings the controller back up.

I meant to ask for ethtool --show-offload vxlan.calico to see what your settings were.

continuing to debug

@nc-kab

nc-kab commented Apr 25, 2022

I did:
kubectl rollout restart deployment -n kube-system calico-kube-controllers
The pod is still in a crash loop.

Here is the output of ethtool --show-offload vxlan.calico

Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: off
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp-ecn-segmentation: off [requested on]
        tx-tcp-mangleid-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

@dereisele

Hi all, I've got the same error on Ubuntu 21.10 on AMD64 with Linux 5.13.0-40-generic and MicroK8s 1.23/stable. Let me know if I should try something

@burtonr

burtonr commented Apr 25, 2022

Similar to @nc-kab, I've updated to the latest/edge and restarted the calico pods. Still having the same issue.

The output of ethtool --show-offload vxlan.calico is the same (adding this because I believe @nc-kab is on a Raspberry Pi, while I am running an R720 with Ubuntu 21.10, in case there is a difference).

$ ethtool --show-offload vxlan.calico
Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: off
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [requested on]
	tx-tcp-mangleid-segmentation: off [requested on]
	tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

@ktsakalozos
Member

ktsakalozos commented Apr 25, 2022

Here is a source of errors I would appreciate it if you could help eliminate.

When the calico CNI sets up the network it needs to select a network interface through which it will route traffic. In /var/snap/microk8s/current/args/cni-network/cni.yaml, search for IP_AUTODETECTION_METHOD and you will see that calico uses the "first-found" interface by default to route traffic. It is possible this interface auto-detection method is selecting an inappropriate interface (e.g. an lxd interface). Let's try to provide a hint on which interface should be used: edit /var/snap/microk8s/current/args/cni-network/cni.yaml and replace first-found with can-reach=<IP_IN_NETWORK_TO_BE_USED>, where <IP_IN_NETWORK_TO_BE_USED> is an IP of a machine in the network we want to use for routing traffic. I think that could be the public-facing IP of the host. Then reapply the manifest with microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml. In multi-node clusters we can identify which network to route traffic through, because we know the address the joining node reaches us on, so this problem should not be present there.
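
For example, a sketch of that edit using sed; it assumes the manifest contains value: "first-found" as in a stock MicroK8s install, and 192.168.1.10 stands in for an IP on the network you actually want to route through:

$ sudo sed -i 's/"first-found"/"can-reach=192.168.1.10"/' /var/snap/microk8s/current/args/cni-network/cni.yaml
$ microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml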

@blicknix

I changed IP_AUTODETECTION_METHOD, but it didn't change the outcome. Note that I only run a single-node cluster.

@dereisele

Here is a source of errors I would appreciate it if you could help eliminate.

When the calico CNI sets up the network it needs to select a network interface through which it will route traffic. In /var/snap/microk8s/current/args/cni-network/cni.yaml, search for IP_AUTODETECTION_METHOD and you will see that calico uses the "first-found" interface by default to route traffic. It is possible this interface auto-detection method is selecting an inappropriate interface (e.g. an lxd interface). Let's try to provide a hint on which interface should be used: edit /var/snap/microk8s/current/args/cni-network/cni.yaml and replace first-found with can-reach=<IP_IN_NETWORK_TO_BE_USED>, where <IP_IN_NETWORK_TO_BE_USED> is an IP of a machine in the network we want to use for routing traffic. I think that could be the public-facing IP of the host. Then reapply the manifest with microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml. In multi-node clusters we can identify which network to route traffic through, because we know the address the joining node reaches us on, so this problem should not be present there.

Didn't work for me either, but thank you.

@burtonr

burtonr commented Apr 25, 2022

@ktsakalozos I've tried that and it had no effect on my cluster. I am also running a single node.
Looking at the logs of the calico-node pod, I can see that it is selecting the appropriate interface. Both before adjusting the IP_AUTODETECTION_METHOD, and after explicitly setting that value.

# Log from calico-node pod
2022-04-25 13:24:05.708 [INFO][9] startup/startup.go 402: Checking datastore connection
2022-04-25 13:24:05.762 [INFO][9] startup/startup.go 426: Datastore connection verified
2022-04-25 13:24:05.762 [INFO][9] startup/startup.go 109: Datastore is ready
2022-04-25 13:24:05.815 [INFO][9] startup/startup.go 714: Using autodetected IPv4 address on interface eno1: 192.168.0.70/24
2022-04-25 13:24:05.815 [INFO][9] startup/startup.go 791: No AS number configured on node resource, using global value
2022-04-25 13:24:05.849 [INFO][9] startup/startup.go 646: FELIX_IPV6SUPPORT is false through environment variable
# Log from calico-kube-controller pod
2022-04-25 14:34:05.963 [INFO][1] main.go 115: Ensuring Calico datastore is initialized
2022-04-25 14:34:15.964 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-25 14:34:15.964 [FATAL][1] main.go 120: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

@ktsakalozos
Member

Could you also modprobe br_netfilter ?

@burtonr

burtonr commented Apr 25, 2022

Could you also modprobe br_netfilter ?

$ modprobe br_netfilter
modprobe: ERROR: could not insert 'br_netfilter': Operation not permitted
$ sudo modprobe br_netfilter
[sudo] password for burtonr: 
$ 

@dereisele

Same for me. I'm able to run all the modprobes in my terminal, but the error message is still there in the kubelite log.

@atimin
Author

atimin commented Apr 25, 2022

Hey, @AlexsJones

Can I please ask you to try our other channel? snap refresh microk8s --channel=latest/edge ?

I did it:

$ sudo snap refresh microk8s --channel=latest/edge
microk8s (edge) v1.23.6 from Canonical✓ refreshed

The same problem:

microk8s.kubectl logs calico-kube-controllers-5c668bb7c-dnlnm -n kube-system
2022-04-25 15:21:04.734 [INFO][1] main.go 94: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0425 15:21:04.735812       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-04-25 15:21:04.736 [INFO][1] main.go 115: Ensuring Calico datastore is initialized
2022-04-25 15:21:14.737 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-25 15:21:14.737 [FATAL][1] main.go 120: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

Also please try showing us output from ethtool --offload vxlan.calico rx off tx off

 ethtool --show-offload vxlan.calico
Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: off
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [requested on]
	tx-tcp-mangleid-segmentation: off [requested on]
	tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

@atimin
Author

atimin commented Apr 25, 2022

Changing IP_AUTODETECTION_METHOD doesn't work for me either.

@dereisele

I just upgraded to Ubuntu 22.04 with kernel 5.15 and I still got the same error

@ktsakalozos
Member

@dereisele could you share a microk8s inspect tarball?

@blicknix

Sorry if this is a stupid question: is there an easy way to share the tarball with you without anybody else getting the information?

@ktsakalozos
Member

Is there an easy way to share the tarball with you and without anybody getting the information?

@blicknix you can find us in #microk8s on the Kubernetes slack. I am kjackal there, ping me.

@ktsakalozos
Member

Hi, could you please try this (a consolidated command sketch follows the list):

  1. Edit /etc/modules and add a new line: br_netfilter. This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop the MicroK8s services.
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode argument completely.
  4. sudo modprobe br_netfilter to load br_netfilter if it is not already loaded.
  5. sudo microk8s start to start the MicroK8s services.
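
Roughly, the steps above as a single shell sequence (a sketch; the sed line simply deletes any --proxy-mode line, so double-check the file before and after):

$ echo br_netfilter | sudo tee -a /etc/modules
$ sudo microk8s stop
$ sudo sed -i '/--proxy-mode/d' /var/snap/microk8s/current/args/kube-proxy
$ sudo modprobe br_netfilter
$ sudo microk8s start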

@nc-kab

nc-kab commented Apr 27, 2022

Hi could you please try this.

  1. Edit /etc/modules and add in there a new line: br_netfilter . This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop MicroK8s services
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode completely.
  4. sudo modprobe br_netfilter to load the br_netfilter if not already loaded.
  5. sudo microk8s start to start MicroK8s services

It looks like this works! 🎉

@dereisele

Hi could you please try this.

1. Edit `/etc/modules` and add in there a new line: `br_netfilter` . This will load `br_netfilter` at boot time.

2. `sudo microk8s stop` to stop MicroK8s services

3. Edit `/var/snap/microk8s/current/args/kube-proxy` and remove the `--proxy-mode` completely.

4. `sudo modprobe br_netfilter` to load the `br_netfilter` if not already loaded.

5. `sudo microk8s start` to start MicroK8s services

This worked for me, too. Thank you very much! 🎉

@burtonr

burtonr commented Apr 27, 2022

This worked for me as well. Thank you @ktsakalozos!

Would it be possible to explain what happened, and why br_netfilter and removing the proxy was the fix? I only ask to satisfy my own curiosity. Thanks again

@jramoseguinoa

Yesterday I was having a hard time with a new install and came across this issue. I can confirm that this works for me too.
Thanks @ktsakalozos !

@usersina

I don't see any --proxy-mode flag under /var/snap/microk8s/current/args/kube-proxy using MicroK8s v1.23.5:

# Content of /var/snap/microk8s/current/args/kube-proxy
--kubeconfig=${SNAP_DATA}/credentials/proxy.config
--cluster-cidr=10.1.0.0/16
--healthz-bind-address=127.0.0.1
--profiling=false

Hi could you please try this.

  1. Edit /etc/modules and add in there a new line: br_netfilter . This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop MicroK8s services
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode completely.
  4. sudo modprobe br_netfilter to load the br_netfilter if not already loaded.
  5. sudo microk8s start to start MicroK8s services

@ktsakalozos
Member

Would it be possible to explain what happened, and why br_netfilter and removing the proxy was the fix? I only ask to satisfy my own curiosity.

The CNI used by default in MicroK8s is Calico. Calico works best with the br_netfilter kernel module loaded. When MicroK8s starts it tries to load the br_netfilter module; if that fails, it sets the proxy-mode to userspace. Userspace routing means that routing is taken care of in userspace instead of via iptables rules. This proxy-mode is the oldest mode and is kept for compatibility reasons. The issue you are seeing is that MicroK8s fails to load the kernel module and Calico then fails to play well with userspace routing. Reproducing this issue is not straightforward. I see it happening under certain conditions on Ubuntu 21.10, but not on any of 18.04, 20.04, or 22.04. Maybe some combination of libraries that I only happen to find in 21.10 is at fault here.

In any case, we will be shipping a patch for this issue in the following days. We would appreciate it if you could verify that the edge channel of the track you are using works for you. You can test this by doing a fresh install or refreshing to the respective channel; e.g. assuming you are on the 1.23 track, you can do sudo snap refresh microk8s --channel=1.23/edge. Thank you, and apologies for any trouble we may have caused.
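
A quick way to check whether you are in the failure mode described above (a sketch, assuming the default MicroK8s file locations):

$ lsmod | grep br_netfilter                                   # no output means the module is not loaded
$ grep proxy-mode /var/snap/microk8s/current/args/kube-proxy  # a userspace proxy-mode here points at this issue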

@usrbinkat

usrbinkat commented May 16, 2022

I just validated this workaround on 3 different Fedora 36 microk8s clusters based on comment 1111290817. Thank you!

@andrew-landsverk-win

I also had to make these changes on a fresh cluster under Rocky Linux 8. @ktsakalozos, do the changes (#3085 (comment)) made to /var/snap/microk8s/current/args/kube-proxy persist after a MicroK8s auto-update?

@andrew-landsverk-win

For a bit of extra context, we are targeting 1.21/stable

@ktsakalozos
Member

ktsakalozos commented May 26, 2022

do these changes (#3085 (comment)) made to /var/snap/microk8s/current/args/kube-proxy persist after a microk8s auto update?

Yes these changes will persist through snap refreshes.

@andrew-landsverk-win

do these changes (#3085 (comment)) made to /var/snap/microk8s/current/args/kube-proxy persist after a microk8s auto update?

Yes these changes will persist through snap refreshes.

Awesome, thank you!

@fcastello

This is happening to me on 1.24, Ubuntu 22.04, on a Raspberry Pi.

@svabra

svabra commented Dec 27, 2022

Yes, it appears the error was reintroduced. I added br_netfilter to the /etc/modules file and restarted the entire system, with no resolution.
k3s works smoothly; all pods are up and running without restarts.

ENVIRONMENT

MicroK8s v1.26.0 revision 4390 on a NUC Intel Celeron N5095, 16GB RAM, 1TB SSD
MicroK8s v1.26.0 revision 4390 on a NUC Intel Celeron N3350, 4GB RAM, 512GB SSD

Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy

To reproduce:

NAMESPACE     NAME                                           READY   STATUS             RESTARTS      AGE
kube-system   pod/calico-node-gpj5s                          1/1     Running            3 (32s ago)   57m
kube-system   pod/calico-kube-controllers-7874bcdbb4-5ftc2   0/1     CrashLoopBackOff   14 (9s ago)   57m

NAMESPACE   NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   57m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   1         1         1       1            1           kubernetes.io/os=linux   57m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           57m

NAMESPACE     NAME                                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-79568db7f8   0         0         0       57m
kube-system   replicaset.apps/calico-kube-controllers-7874bcdbb4   1         1         1       57m

kc logs pod/calico-kube-controllers-7874bcdbb4-5ftc2 -n kube-system -f

2022-12-27 22:40:30.430 [WARNING][1] runconfig.go 162: unable to get KubeControllersConfiguration(default) error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:30.430453       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:30.430461       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.152.183.1:443/api/v1/nodes?resourceVersion=3520": dial tcp 10.152.183.1:443: connect: no route to host
2022-12-27 22:40:34.242 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-12-27 22:40:34.242 [ERROR][1] main.go 242: Failed to verify datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-12-27 22:40:34.498 [ERROR][1] main.go 277: Received bad status code from apiserver error=Get "https://10.152.183.1:443/healthz?timeout=20s": dial tcp 10.152.183.1:443: connect: no route to host status=0
2022-12-27 22:40:34.498 [WARNING][1] runconfig.go 162: unable to get KubeControllersConfiguration(default) error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.152.183.1:443: connect: no route to host
W1227 22:40:34.498077       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:34.498257       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host


stale bot commented Nov 23, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive label Nov 23, 2023
@stale stale bot closed this as completed Dec 23, 2023
@raveesh-me

Following the steps here:
Rpi Cluster using microk8s

I get a similar error:

raveesh@rpifour:~$ sudo microk8s.kubectl get node
E0121 17:12:01.856814  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": net/http: TLS handshake timeout
E0121 17:12:39.979944  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": net/http: TLS handshake timeout
E0121 17:13:26.410135  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": context deadline exceeded
E0121 17:14:17.193100  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0121 17:14:32.681800  184356 request.go:697] Waited for 1.720689523s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:16443/api?timeout=32s
E0121 17:15:17.290230  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": context deadline exceeded
Unable to connect to the server: context deadline exceeded

What could be the reason?
