calico-kube-controllers can't get API Server: context deadline exceeded #3085

Closed
atimin opened this issue Apr 22, 2022 · 46 comments · Fixed by #3102
Assignees
Labels
inactive kind/bug Something isn't working version/1.23 affects microk8s version 1.23

Comments

@atimin

atimin commented Apr 22, 2022

I have this problem after installing microk8s on my Ubuntu 21.10 server:

sudo snap install microk8s --channel=1.23 --classic

I checked the pods and saw that one had crashed:

$ microk8s.kubectl get pods -n kube-system
calico-node-c7h46                          1/1     Running            1 (7m38s ago)   10m
calico-kube-controllers-5ddf994775-gp8cv   0/1     CrashLoopBackOff   7 (34s ago)     10m

In the logs, I see that something is wrong with the API server:

$ microk8s.kubectl logs calico-kube-controllers-5ddf994775-gp8cv  -n kube-system
2022-04-22 13:26:15.311 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0422 13:26:15.312587       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-04-22 13:26:15.313 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2022-04-22 13:26:25.313 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-22 13:26:25.313 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

Result of microk8s.inspect

inspection-report-20220422_132011.tar.gz

@balchua
Collaborator

balchua commented Apr 23, 2022

I see these in the kubelite logs:

Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.899796     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.916612     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_rr"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.918799     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_wrr"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.930530     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="ip_vs_sh"
Apr 22 13:19:39 drift-test-rig microk8s.daemon-kubelite[843]: I0422 13:19:39.931278     843 proxier.go:657] "Failed to load kernel module with modprobe, you can ignore this message when kube-proxy is running inside container without mounting /lib/modules" moduleName="nf_conntrack"

I don't know if there is a need to load these manually. I suppose kube-proxy loads them on start, which is why it triggered these messages.
Can you try manually loading them like this?

$ sudo modprobe ip_vs
$ sudo modprobe ip_vs_rr
$ sudo modprobe ip_vs_wrr
$ sudo modprobe ip_vs_sh
$ sudo modprobe nf_conntrack

Then restart microk8s. I am not sure if this will work though.
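
If loading them manually does help, here is a minimal sketch for checking the modules and making them persist across reboots. It uses the standard systemd-modules-load mechanism; the file name ipvs.conf is just an example and nothing MicroK8s-specific:

# Check whether the modules are already loaded
lsmod | grep -E 'ip_vs|nf_conntrack'

# Persist them across reboots (systemd loads /etc/modules-load.d/*.conf at boot)
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
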
@neoaggelos @ktsakalozos thoughts?

@nc-kab

nc-kab commented Apr 23, 2022

I have exactly the same problem with my Raspberry Pi cluster running MicroK8s.
Manually loading the above-mentioned modules did not help.

@burtonr

burtonr commented Apr 24, 2022

I too have the same issue, although mine is an existing cluster.

$ snap list
microk8s  v1.22.8        3057   1.22/stable      canonical✓  classic

I noticed the issue after installing the latest (5.13.0-40-generic) kernel and restarting the server.

$ uname -a
Linux homeserver 5.13.0-40-generic #45-Ubuntu SMP Tue Mar 29 14:48:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl get pods -n kube-system 
NAME                                       READY   STATUS             RESTARTS         AGE
calico-node-cc4m2                          1/1     Running            1 (11m ago)      30m
coredns-7f9c69c78c-kh7df                   0/1     Running            9 (11m ago)      139d
calico-kube-controllers-75c5f98cdc-9xhc7   0/1     CrashLoopBackOff   14 (4m2s ago)    29m
hostpath-provisioner-5c65fbdb4f-w298z      0/1     CrashLoopBackOff   504 (2m5s ago)   139d

The logs of the calico-kube-controllers pod are the same as above.

I've tried to specify the calico interface in the /var/snap/microk8s/current/args/cni-network/cni.yml file, but that had no effect. I see the calico-node pod is running, and the logs show the expected IP address.

Happy to help in any way.
inspection-report-20220424_004208.tar.gz

@oleo65

oleo65 commented Apr 24, 2022

I am having the same issue on Ubuntu 21.10 running on multiple Raspberry Pi 4s with the latest kernel installed (5.13.0-1025-raspi).

After a lot of trial and error, I at least found a way to temporarily get the cluster more or less up and running again.

Possible temporary mitigation/solution

  • From the logs of various pods (I have a kubernetes dashboard, traefik, among others) it seems that the master control plane node is not reachable via the internal ip address 10.152.183.1 (same as in opening post).
  • As soon as I forcefully moved the calico-kube-controller pod to the master control plane node, it started working again (a rough command sketch follows this list).
  • From there I drained and cordoned all other nodes except the master (only one in my setup now).
  • After some time the workloads healed and came all up again. 🎉
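
A rough sketch of the commands this could amount to; the node names and the nodeSelector patch are illustrative, not taken from my actual setup:

$ microk8s kubectl cordon worker-1
$ microk8s kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
$ # Pin the controller to the control plane node (hypothetical node name)
$ microk8s kubectl -n kube-system patch deployment calico-kube-controllers \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"master-node"}}}}}'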

Side note:

  • I also downgraded the kernel to the previous version (sudo flash-kernel --force 5.13.0-1024-raspi), as I suspected some change from the last apt upgrade.
  • This alone did not restore the cluster, so I took the route described above.

Hopefully this helps in fixing the issue or at least in restoring some smaller clusters. 😉

I am happy to provide more diagnostics if needed.

@atimin
Author

atimin commented Apr 25, 2022

Hey @balchua, the modules are loaded:

$ sudo lsmod | grep ip_vs
ip_vs_wrr              16384  0
ip_vs_sh               16384  0
ip_vs_rr               16384  0
ip_vs                 163840  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          151552  6 xt_conntrack,nf_nat,xt_nat,xt_MASQUERADE,ip_vs,xt_REDIRECT
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
libcrc32c              16384  7 nf_conntrack,nf_nat,btrfs,nf_tables,xfs,raid456,ip_vs

That may be the reason on a Raspberry Pi, but I'm using Ubuntu Server 21.10 on amd64.

@JoergSnn

Dear all,

I have the same issue on a single node raspberry pi 4 deployment on ubuntu server 21.10.
I set up a clean virtual machine (amd64) with ubuntu server 21.10 and ran into the same issue after installing microk8s.

Regards

Jörg

@blicknix

Dear all,

I have the same problem on an AMD Ubuntu Server 21.10 deployment. I even tried updating Ubuntu to 22.04 and MicroK8s to 1.23.5, but the problem still exists.
I'm happy to test any suggestions on the VM.

Best regards,
Samuel

@AlexsJones
Contributor

Hey folks, we'll take a look at this right away. Thank you all for raising this!

@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

Can I please ask you to try our other channel? snap refresh microk8s --channel=latest/edge ?
This will help us/me eliminate a few variables.

@AlexsJones AlexsJones added kind/bug Something isn't working version/1.23 affects microk8s version 1.23 labels Apr 25, 2022
@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

What's the output of ethtool --show-offload vxlan.calico ?

If it's on, please try running ethtool --offload vxlan.calico rx off tx off, then restart microk8s.

This might be offloading related 🤔
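
In other words, something along these lines (just a sketch, and it assumes the vxlan.calico interface already exists on the node):

$ ethtool --show-offload vxlan.calico | grep checksumming
$ sudo ethtool --offload vxlan.calico rx off tx off
$ microk8s stop && microk8s start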

@blicknix

blicknix commented Apr 25, 2022

Changed the channel to latest/edge and did an update of microk8s.

root@kubernetes:~# ethtool --offload vxlan.calico rx off tx off
netlink error: no device matches name (offset 24)
netlink error: No such device

A second attempt at the command produced no output.

@nc-kab

nc-kab commented Apr 25, 2022

Updating to edge did not make a difference for me.
ethtool --offload vxlan.calico rx off tx off doesn't give any output.

@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

Changed the channel to latest/edge and did an update of microk8s.

root@kubernetes:~# ethtool --offload vxlan.calico rx off tx off
netlink error: no device matches name (offset 24)
netlink error: No such device

A second attempt at the command produced no output.

Thanks @blicknix. The idea here is that we sometimes come across issues with offloading being enabled.

Most likely that interface doesn't exist because calico-node isn't running yet.

@AlexsJones
Contributor

AlexsJones commented Apr 25, 2022

Updating to edge did not make a difference for me.
ethtool --offload vxlan.calico rx off tx off doesn't give any output.

Apologies, I reworded the ask: this will disable offloading. If you restart the pods/microk8s, let's see whether that brings the controller back up.

I meant to ask for ethtool --show-offload vxlan.calico to see what your settings were.

continuing to debug

@nc-kab

nc-kab commented Apr 25, 2022

I did:
kubectl rollout restart deployment -n kube-system calico-kube-controllers
The pod is still in a crash loop.

Here is the output of ethtool --show-offload vxlan.calico

Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: off
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp-ecn-segmentation: off [requested on]
        tx-tcp-mangleid-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

@dereisele

Hi all, I've got the same error on Ubuntu 21.10 on AMD64 with Linux 5.13.0-40-generic and MicroK8s 1.23/stable. Let me know if I should try something

@burtonr

burtonr commented Apr 25, 2022

Similar to @nc-kab, I've updated to the latest/edge and restarted the calico pods. Still having the same issue.

The output of ethtool --show-offload vxlan.calico is the same (adding this because I believe @nc-kab is on a Raspberry Pi, while I am running an R720 with Ubuntu 21.10, in case there is a difference).

$ ethtool --show-offload vxlan.calico
Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: off
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [requested on]
	tx-tcp-mangleid-segmentation: off [requested on]
	tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

@ktsakalozos
Member

ktsakalozos commented Apr 25, 2022

Here is a source of errors I would appreciate it if you could help eliminate.

When the calico CNI sets up the network it needs to select a network interface through which it will route traffic. In /var/snap/microk8s/current/args/cni-network/cni.yaml, search for IP_AUTODETECTION_METHOD and you will see that calico uses the "first-found" interface by default to route traffic. It is possible this interface auto-detection method is selecting an inappropriate interface (e.g. an lxd interface). Let's try to provide a hint on which interface should be used: edit /var/snap/microk8s/current/args/cni-network/cni.yaml and replace first-found with can-reach=<IP_IN_NETWORK_TO_BE_USED>, where <IP_IN_NETWORK_TO_BE_USED> is an IP of a machine in the network we want to use for routing traffic. I think that could be the public-facing IP of the host. Then reapply the manifest with microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml. In multi-node clusters we can identify which network to route traffic through, because we know the address the joining node reaches us on, so this problem should not be present there.
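
For example, a sketch of that edit using sed; it assumes the manifest contains value: "first-found" as in a stock MicroK8s install, and 192.168.1.10 stands in for an IP on the network you actually want to route through:

$ sudo sed -i 's/"first-found"/"can-reach=192.168.1.10"/' /var/snap/microk8s/current/args/cni-network/cni.yaml
$ microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml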

@blicknix

I changed IP_AUTODETECTION_METHOD, but it didn't change the outcome. Note that I only run a single-node cluster.

@dereisele

Here is a source of errors I would appreciate it if you could help eliminate.

When the calico CNI sets up the network it needs to select a network interface through which it will route traffic. In /var/snap/microk8s/current/args/cni-network/cni.yaml, search for IP_AUTODETECTION_METHOD and you will see that calico uses the "first-found" interface by default to route traffic. It is possible this interface auto-detection method is selecting an inappropriate interface (e.g. an lxd interface). Let's try to provide a hint on which interface should be used: edit /var/snap/microk8s/current/args/cni-network/cni.yaml and replace first-found with can-reach=<IP_IN_NETWORK_TO_BE_USED>, where <IP_IN_NETWORK_TO_BE_USED> is an IP of a machine in the network we want to use for routing traffic. I think that could be the public-facing IP of the host. Then reapply the manifest with microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml. In multi-node clusters we can identify which network to route traffic through, because we know the address the joining node reaches us on, so this problem should not be present there.

Didn't work for me either, but thank you.

@burtonr

burtonr commented Apr 25, 2022

@ktsakalozos I've tried that and it had no effect on my cluster. I am also running a single node.
Looking at the logs of the calico-node pod, I can see that it is selecting the appropriate interface. Both before adjusting the IP_AUTODETECTION_METHOD, and after explicitly setting that value.

# Log from calico-node pod
2022-04-25 13:24:05.708 [INFO][9] startup/startup.go 402: Checking datastore connection
2022-04-25 13:24:05.762 [INFO][9] startup/startup.go 426: Datastore connection verified
2022-04-25 13:24:05.762 [INFO][9] startup/startup.go 109: Datastore is ready
2022-04-25 13:24:05.815 [INFO][9] startup/startup.go 714: Using autodetected IPv4 address on interface eno1: 192.168.0.70/24
2022-04-25 13:24:05.815 [INFO][9] startup/startup.go 791: No AS number configured on node resource, using global value
2022-04-25 13:24:05.849 [INFO][9] startup/startup.go 646: FELIX_IPV6SUPPORT is false through environment variable
# Log from calico-kube-controller pod
2022-04-25 14:34:05.963 [INFO][1] main.go 115: Ensuring Calico datastore is initialized
2022-04-25 14:34:15.964 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-25 14:34:15.964 [FATAL][1] main.go 120: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

@ktsakalozos
Member

Could you also modprobe br_netfilter ?

@burtonr

burtonr commented Apr 25, 2022

Could you also modprobe br_netfilter ?

$ modprobe br_netfilter
modprobe: ERROR: could not insert 'br_netfilter': Operation not permitted
$ sudo modprobe br_netfilter
[sudo] password for burtonr: 
$ 

@dereisele

Same for me. I'm able to run all the modprobes in my terminal, but the error message is still there in the kubelite log.

@atimin
Author

atimin commented Apr 25, 2022

Hey, @AlexsJones

Can I please ask you to try our other channel? snap refresh microk8s --channel=latest/edge ?

I did it:

$ sudo snap refresh microk8s --channel=latest/edge
microk8s (edge) v1.23.6 from Canonical✓ refreshed

The same problem:

microk8s.kubectl logs calico-kube-controllers-5c668bb7c-dnlnm -n kube-system
2022-04-25 15:21:04.734 [INFO][1] main.go 94: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0425 15:21:04.735812       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2022-04-25 15:21:04.736 [INFO][1] main.go 115: Ensuring Calico datastore is initialized
2022-04-25 15:21:14.737 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-04-25 15:21:14.737 [FATAL][1] main.go 120: Failed to initialize Calico datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded

Also please try showing us output from ethtool --offload vxlan.calico rx off tx off

 ethtool --show-offload vxlan.calico
Features for vxlan.calico:
rx-checksumming: off
tx-checksumming: off
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: off
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
	tx-tcp-segmentation: off [requested on]
	tx-tcp-ecn-segmentation: off [requested on]
	tx-tcp-mangleid-segmentation: off [requested on]
	tx-tcp6-segmentation: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

@atimin
Author

atimin commented Apr 25, 2022

Changing IP_AUTODETECTION_METHOD doesn't work for me either.

@dereisele

I just upgraded to Ubuntu 22.04 with kernel 5.15 and I still got the same error

@ktsakalozos
Member

@dereisele could you share a microk8s inspect tarball?

@blicknix

Sorry if this is a stupid question: is there an easy way to share the tarball with you without anybody else getting the information?

@ktsakalozos
Member

Is there an easy way to share the tarball with you and without anybody getting the information?

@blicknix you can find us in #microk8s on the Kubernetes slack. I am kjackal there, ping me.

@ktsakalozos
Member

Hi, could you please try this (a consolidated command sketch follows the list):

  1. Edit /etc/modules and add a new line: br_netfilter. This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop the MicroK8s services.
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode argument completely.
  4. sudo modprobe br_netfilter to load br_netfilter if it is not already loaded.
  5. sudo microk8s start to start the MicroK8s services.
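
Roughly, the steps above as a single shell sequence (a sketch; the sed line simply deletes any --proxy-mode line, so double-check the file before and after):

$ echo br_netfilter | sudo tee -a /etc/modules
$ sudo microk8s stop
$ sudo sed -i '/--proxy-mode/d' /var/snap/microk8s/current/args/kube-proxy
$ sudo modprobe br_netfilter
$ sudo microk8s start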

@nc-kab

nc-kab commented Apr 27, 2022

Hi could you please try this.

  1. Edit /etc/modules and add in there a new line: br_netfilter . This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop MicroK8s services
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode completely.
  4. sudo modprobe br_netfilter to load the br_netfilter if not already loaded.
  5. sudo microk8s start to start MicroK8s services

It looks like this works! 🎉

@dereisele

Hi could you please try this.

1. Edit `/etc/modules` and add in there a new line: `br_netfilter` . This will load `br_netfilter` at boot time.

2. `sudo microk8s stop` to stop MicroK8s services

3. Edit `/var/snap/microk8s/current/args/kube-proxy` and remove the `--proxy-mode` completely.

4. `sudo modprobe br_netfilter` to load the `br_netfilter` if not already loaded.

5. `sudo microk8s start` to start MicroK8s services

This worked for me, too. Thank you very much! 🎉

@burtonr

burtonr commented Apr 27, 2022

This worked for me as well. Thank you @ktsakalozos!

Would it be possible to explain what happened, and why br_netfilter and removing the proxy was the fix? I only ask to satisfy my own curiosity. Thanks again

@jramoseguinoa

Yesterday I was having a hard time with a new install and came across this issue. I can confirm that this works for me too.
Thanks @ktsakalozos !

@usersina

I don't see any --proxy-mode flag under /var/snap/microk8s/current/args/kube-proxy using MicroK8s v1.23.5:

# Content of /var/snap/microk8s/current/args/kube-proxy
--kubeconfig=${SNAP_DATA}/credentials/proxy.config
--cluster-cidr=10.1.0.0/16
--healthz-bind-address=127.0.0.1
--profiling=false

Hi could you please try this.

  1. Edit /etc/modules and add in there a new line: br_netfilter . This will load br_netfilter at boot time.
  2. sudo microk8s stop to stop MicroK8s services
  3. Edit /var/snap/microk8s/current/args/kube-proxy and remove the --proxy-mode completely.
  4. sudo modprobe br_netfilter to load the br_netfilter if not already loaded.
  5. sudo microk8s start to start MicroK8s services

@ktsakalozos
Member

Would it be possible to explain what happened, and why br_netfilter and removing the proxy was the fix? I only ask to satisfy my own curiosity.

The CNI used by default in MicroK8s is Calico. Calico works best with the br_netfilter kernel module loaded. When MicroK8s starts it tries to load the br_netfilter module; if that fails, it sets the proxy-mode to userspace. Userspace routing means that routing is taken care of in userspace instead of via iptables rules. This proxy-mode is the oldest mode and is kept for compatibility reasons. The issue you are seeing is that MicroK8s fails to load the kernel module and Calico then fails to play well with userspace routing. Reproducing this issue is not straightforward. I see it happening under certain conditions on Ubuntu 21.10, but not on any of 18.04, 20.04, or 22.04. Maybe some combination of libraries that I only happen to find in 21.10 is at fault here.

In any case, we will be shipping a patch for this issue in the following days. We would appreciate it if you could verify that the edge channel of the track you are using works for you. You can test this by doing a fresh install or refreshing to the respective channel; e.g. assuming you are on the 1.23 track, you can do sudo snap refresh microk8s --channel=1.23/edge. Thank you, and apologies for any trouble we may have caused.
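
A quick way to check whether you are in the failure mode described above (a sketch, assuming the default MicroK8s file locations):

$ lsmod | grep br_netfilter                                   # no output means the module is not loaded
$ grep proxy-mode /var/snap/microk8s/current/args/kube-proxy  # a userspace proxy-mode here points at this issue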

@usrbinkat

usrbinkat commented May 16, 2022

I just validated this workaround on 3 different Fedora 36 microk8s clusters based on comment 1111290817. Thank you!

@andrew-landsverk-win

I also had to make these changes on a fresh cluster under Rocky Linux 8. @ktsakalozos, do the changes (#3085 (comment)) made to /var/snap/microk8s/current/args/kube-proxy persist after a MicroK8s auto-update?

@andrew-landsverk-win

For a bit of extra context, we are targeting 1.21/stable

@ktsakalozos
Member

ktsakalozos commented May 26, 2022

do these changes (#3085 (comment)) made to /var/snap/microk8s/current/args/kube-proxy persist after a microk8s auto update?

Yes these changes will persist through snap refreshes.

@andrew-landsverk-win

do these changes (#3085 (comment)) made to /var/snap/microk8s/current/args/kube-proxy persist after a microk8s auto update?

Yes these changes will persist through snap refreshes.

Awesome, thank you!

@fcastello

This is happening to me on 1.24, Ubuntu 22.04, on a Raspberry Pi.

@svabra

svabra commented Dec 27, 2022

Yes, it appears the error was reintroduced. I added br_netfilter to the /etc/modules file and restarted the entire system, with no resolution.
k3s works smoothly; all pods are up and running without restarts.

ENVIRONMENT

MicroK8s v1.26.0 revision 4390 on a NUC Intel Celeron N5095, 16GB RAM, 1TB SSD
MicroK8s v1.26.0 revision 4390 on a NUC Intel Celeron N3350, 4GB RAM, 512GB SSD

Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy

To reproduce:

NAMESPACE     NAME                                           READY   STATUS             RESTARTS      AGE
kube-system   pod/calico-node-gpj5s                          1/1     Running            3 (32s ago)   57m
kube-system   pod/calico-kube-controllers-7874bcdbb4-5ftc2   0/1     CrashLoopBackOff   14 (9s ago)   57m

NAMESPACE   NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   57m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   1         1         1       1            1           kubernetes.io/os=linux   57m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           57m

NAMESPACE     NAME                                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-79568db7f8   0         0         0       57m
kube-system   replicaset.apps/calico-kube-controllers-7874bcdbb4   1         1         1       57m

kc logs pod/calico-kube-controllers-7874bcdbb4-5ftc2 -n kube-system -f

2022-12-27 22:40:30.430 [WARNING][1] runconfig.go 162: unable to get KubeControllersConfiguration(default) error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:30.430453       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:30.430461       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.152.183.1:443/api/v1/nodes?resourceVersion=3520": dial tcp 10.152.183.1:443: connect: no route to host
2022-12-27 22:40:34.242 [ERROR][1] client.go 272: Error getting cluster information config ClusterInformation="default" error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-12-27 22:40:34.242 [ERROR][1] main.go 242: Failed to verify datastore error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2022-12-27 22:40:34.498 [ERROR][1] main.go 277: Received bad status code from apiserver error=Get "https://10.152.183.1:443/healthz?timeout=20s": dial tcp 10.152.183.1:443: connect: no route to host status=0
2022-12-27 22:40:34.498 [WARNING][1] runconfig.go 162: unable to get KubeControllersConfiguration(default) error=Get "https://10.152.183.1:443/apis/crd.projectcalico.org/v1/kubecontrollersconfigurations/default": dial tcp 10.152.183.1:443: connect: no route to host
W1227 22:40:34.498077       1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host
E1227 22:40:34.498257       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.23.3/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get "https://10.152.183.1:443/api/v1/pods?resourceVersion=3624": dial tcp 10.152.183.1:443: connect: no route to host


stale bot commented Nov 23, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive label Nov 23, 2023
@stale stale bot closed this as completed Dec 23, 2023
@raveesh-me

Following the steps here:
Rpi Cluster using microk8s

I get a similar error:

raveesh@rpifour:~$ sudo microk8s.kubectl get node
E0121 17:12:01.856814  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": net/http: TLS handshake timeout
E0121 17:12:39.979944  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": net/http: TLS handshake timeout
E0121 17:13:26.410135  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": context deadline exceeded
E0121 17:14:17.193100  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0121 17:14:32.681800  184356 request.go:697] Waited for 1.720689523s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:16443/api?timeout=32s
E0121 17:15:17.290230  184356 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:16443/api?timeout=32s": context deadline exceeded
Unable to connect to the server: context deadline exceeded

What could be the reason?
