calico pods report an error of no route to host #8764

Open
willzhang opened this issue Apr 26, 2024 · 5 comments

Comments


willzhang commented Apr 26, 2024

Expected Behavior

With irqaffinity configured on the kernel command line in /etc/default/grub, the Calico pods should run normally.

Current Behavior

With irqaffinity configured in /etc/default/grub, the Calico pods go into CrashLoopBackOff and report a "no route to host" error.

What changed, and the calico-apiserver logs:
(screenshot)

calico-kube-controllers logs:
(screenshot)

Possible Solution

Removing the irqaffinity kernel parameter restores normal Calico operation, but we need this parameter for CPU isolation to improve performance.

Steps to Reproduce (for bugs)

  1. Install Kubernetes with kubeadm and install Calico; the cluster and Calico are running.
  2. Add the irqaffinity=0,10 kernel option.
  3. Reboot the Kubernetes node.
  4. The Calico pods go into CrashLoopBackOff and report a "no route to host" error.
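Step 2 amounts to editing GRUB_CMDLINE_LINUX in /etc/default/grub. A minimal sketch of that edit as a pure text transformation, assuming the variable sits on one double-quoted line as in the node31 output; set_irqaffinity is a hypothetical helper, not part of GRUB or any Ubuntu tool:

```python
from __future__ import annotations

import re

# Matches GRUB_CMDLINE_LINUX="..." at the start of a line (not GRUB_CMDLINE_LINUX_DEFAULT).
_CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def set_irqaffinity(grub_text: str, cpus: str) -> str:
    """Return grub_text with irqaffinity=<cpus> set in GRUB_CMDLINE_LINUX.

    Any existing irqaffinity= token is dropped and the new one is placed first;
    all other tokens are kept in their original order.
    """

    def _rewrite(m: re.Match) -> str:
        tokens = [t for t in m.group(2).split() if not t.startswith("irqaffinity=")]
        tokens.insert(0, f"irqaffinity={cpus}")
        return m.group(1) + " ".join(tokens) + m.group(3)

    new_text, count = _CMDLINE_RE.subn(_rewrite, grub_text)
    if count == 0:
        raise ValueError('no GRUB_CMDLINE_LINUX="..." line found')
    return new_text
```

On Ubuntu the change only reaches the kernel after the file is written back, update-grub is run, and the node is rebooted; /proc/cmdline shows what the running kernel actually received.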
root@node31:~# cat /etc/default/grub | grep GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="irqaffinity=0,6 noirqbalance intel_iommu=on iommu=pt"
root@node31:~# 
root@node31:~# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-5.15.0-102-generic root=/dev/mapper/ubuntu--vg-lv--0 ro irqaffinity=0,10 noirqbalance intel_iommu=on iommu=pt
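In this output the GRUB file contains irqaffinity=0,6, while the running kernel (/proc/cmdline) was booted with irqaffinity=0,10, the value named in step 2. The file's value only takes effect after update-grub and a reboot, so it helps to check which one is live. A sketch of that comparison, assuming the same single-line GRUB_CMDLINE_LINUX format; all function names are hypothetical:

```python
from __future__ import annotations

import re


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a Linux CPU list such as "0,10-19" (stride syntax like "0-8:2" is not handled)."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irqaffinity_from_cmdline(cmdline: str) -> str | None:
    """Return the irqaffinity= value from a kernel command line (this sketch keeps the last one)."""
    value = None
    for token in cmdline.split():
        if token.startswith("irqaffinity="):
            value = token.split("=", 1)[1]
    return value


def irqaffinity_from_grub(grub_text: str) -> str | None:
    """Return the irqaffinity= value inside GRUB_CMDLINE_LINUX="..." if present."""
    m = re.search(r'^GRUB_CMDLINE_LINUX="([^"]*)"', grub_text, re.MULTILINE)
    return irqaffinity_from_cmdline(m.group(1)) if m else None


def affinity_drift(grub_text: str, proc_cmdline: str) -> tuple[set[int], set[int]] | None:
    """Return (configured, running) CPU sets when they differ, else None."""
    configured = irqaffinity_from_grub(grub_text)
    running = irqaffinity_from_cmdline(proc_cmdline)
    conf_set = parse_cpu_list(configured) if configured else set()
    run_set = parse_cpu_list(running) if running else set()
    return None if conf_set == run_set else (conf_set, run_set)
```

On the node this would be called as affinity_drift(open("/etc/default/grub").read(), open("/proc/cmdline").read()); None means the file and the running kernel agree.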

I also use reservedSystemCPUs in the kubelet config for system processes:

root@node31:~# cat /var/lib/kubelet/config.yaml  |grep -i cpu
cpuCFSQuota: true
cpuCFSQuotaPeriod: 100ms
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 10s
  cpu: 500m
reservedSystemCPUs: 0,10-19
  cpu: 500m
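With cpuManagerPolicy: static, kubelet does not hand out CPUs listed in reservedSystemCPUs as exclusive CPUs to Guaranteed pods, so a useful consistency check is that every IRQ CPU from irqaffinity is also reserved. A minimal sketch with a hypothetical helper name; it only compares the two CPU lists and does not read any kubelet state:

```python
from __future__ import annotations


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a Linux CPU list such as "0,10-19" (stride syntax is not handled)."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irq_cpus_outside_reserved(irqaffinity: str, reserved_system_cpus: str) -> set[int]:
    """CPUs that take IRQs (per irqaffinity) but are NOT in reservedSystemCPUs.

    An empty result means every IRQ CPU is reserved, so none of them can be
    assigned as an exclusive CPU by the static CPU manager.
    """
    return parse_cpu_list(irqaffinity) - parse_cpu_list(reserved_system_cpus)
```

With the running irqaffinity=0,10 and reservedSystemCPUs: 0,10-19 the result is empty; the 0,6 value from /etc/default/grub would leave CPU 6 outside the reserved set.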

Context

I need to isolate a set of CPUs for exclusive use by the VPP application, so I use irqaffinity to steer interrupts onto other CPUs (e.g. 0 and 10).
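To confirm that interrupts actually moved off the CPUs meant for VPP, each IRQ's allowed CPUs can be read from /proc/irq/<N>/smp_affinity_list. A sketch, assuming the VPP CPUs are 1-9 purely for illustration (the issue does not say which CPUs VPP uses); irqs_touching is a hypothetical helper. smp_affinity_list reflects the configured affinity, and some kernel-managed multi-queue interrupts may not follow irqaffinity at all:

```python
from __future__ import annotations

import os


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a Linux CPU list such as "0,10-19" (stride syntax is not handled)."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irqs_touching(cpus: set[int], proc_irq: str = "/proc/irq") -> dict[int, set[int]]:
    """Map IRQ number -> the subset of `cpus` listed in that IRQ's smp_affinity_list."""
    hits: dict[int, set[int]] = {}
    for name in os.listdir(proc_irq):
        if not name.isdigit():  # skip default_smp_affinity and other non-IRQ entries
            continue
        path = os.path.join(proc_irq, name, "smp_affinity_list")
        try:
            with open(path) as f:
                allowed = parse_cpu_list(f.read())
        except OSError:
            continue  # file missing or unreadable for this IRQ
        overlap = allowed & cpus
        if overlap:
            hits[int(name)] = overlap
    return hits
```

On the node, irqs_touching(set(range(1, 10))) returning an empty dict means no IRQ lists any of those CPUs.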

Your Environment

  • Calico version: v3.26.1, installed with Helm via the Calico operator.
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes v1.25.11 installed with kubeadm, containerd runtime, single master node.
  • Operating System and version: Ubuntu 22.04
  • Link to your project (optional):
willzhang changed the title from "Configuring 'CPU irqaffinity' causes the calico kube controller pod to report an error of no route to host" to "calico pods report an error of no route to host" on Apr 26, 2024

fs2016l commented Apr 27, 2024

I have the same problem:
Warning FailedCreatePodSandBox 114m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "3d6733531c3caf74141893557e9d697f5ba909a3a08874087ed6892142153048" network for pod "calico-kube-controllers-7b84757b95-576fg": networkPlugin cni failed to set up pod "calico-kube-controllers-7b84757b95-576fg_kube-system" network: plugin type="calico" failed (add): error creating calico client: stat /etc/cni/net.d/calico-kubeconfig: no such file or directory

Warning Unhealthy 113m (x7 over 114m) kubelet Readiness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: connect: no route to host


fs2016l commented Apr 27, 2024

When I shut down the firewall, the error disappeared, but I need this to work with the firewall running.
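The readiness-probe error above, dial tcp 10.96.0.1:443: connect: no route to host, is the EHOSTUNREACH errno from connect(). Besides a genuinely missing route, a host firewall REJECT rule that answers with an ICMP "host prohibited" or "administratively prohibited" message yields the same errno, which would be consistent with the error disappearing when the firewall is stopped. A minimal diagnostic sketch that repeats the same TCP connect and names the errno; probe and name_errno are hypothetical helpers, not part of Calico:

```python
from __future__ import annotations

import errno
import socket

# errno values worth naming when debugging connectivity to the API server Service.
_NAMES = {
    errno.EHOSTUNREACH: "no route to host",
    errno.ECONNREFUSED: "connection refused",
    errno.ENETUNREACH: "network is unreachable",
    errno.ETIMEDOUT: "timed out",
}


def name_errno(err: int) -> str:
    """Short human-readable name for a connect() errno."""
    return _NAMES.get(err, errno.errorcode.get(err, str(err)))


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """TCP-connect to host:port; return "ok" or a short name for the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timed out"  # no reply at all, e.g. a DROP rule rather than a REJECT
    except OSError as exc:
        return name_errno(exc.errno) if exc.errno is not None else str(exc)
```

Run on the node, probe("10.96.0.1", 443) returning "no route to host" with the firewall up and "ok" with it down would point at a REJECT rule rather than at routing.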

tomastigera (Contributor) commented

@willzhang, do you use Calico VPP?

willzhang (Author) commented

@willzhang do you use calico VPP?

No, just Calico in IPIP mode, installed with Helm.

tomastigera (Contributor) commented

@willzhang, could you provide logs from the failing pods? Why are they failing? It is not obvious why IRQ affinity would have such an effect; perhaps some network devices are misconfigured? Are you using an overlay? Are the queues on the overlay device assigned properly? I think vxlan.calico has only a single queue.
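The queue question can be answered from sysfs: each device has /sys/class/net/<dev>/queues/rx-N and tx-N directories, and rx-N/rps_cpus holds that queue's Receive Packet Steering CPU mask (a zero mask means RPS is off for the queue). The reporter uses IPIP, where Calico's tunnel device is tunl0 rather than vxlan.calico. A sketch with hypothetical helper names:

```python
from __future__ import annotations

import os


def queue_counts(dev: str, sys_net: str = "/sys/class/net") -> tuple[int, int]:
    """Return (rx_queues, tx_queues) for a device, counted from its sysfs queues/ directory."""
    entries = os.listdir(os.path.join(sys_net, dev, "queues"))
    rx = sum(1 for e in entries if e.startswith("rx-"))
    tx = sum(1 for e in entries if e.startswith("tx-"))
    return rx, tx


def rps_cpus(dev: str, sys_net: str = "/sys/class/net") -> dict[str, set[int]]:
    """Map each rx-N queue to the CPUs set in its rps_cpus hex mask (empty set: RPS off)."""
    qdir = os.path.join(sys_net, dev, "queues")
    result: dict[str, set[int]] = {}
    for entry in sorted(os.listdir(qdir)):
        if not entry.startswith("rx-"):
            continue
        try:
            with open(os.path.join(qdir, entry, "rps_cpus")) as f:
                # The mask is hex, split into comma-separated 32-bit words on large systems.
                mask = int(f.read().strip().replace(",", ""), 16)
        except OSError:
            continue  # rps_cpus absent, e.g. a kernel built without RPS
        result[entry] = {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}
    return result
```

On the node, queue_counts("tunl0") and rps_cpus("tunl0") would show how many queues the IPIP device has and which CPUs, if any, receive-side processing is steered to.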
