Services are unreachable after node is restarted #27848

Closed
2 tasks done
3u13r opened this issue Aug 31, 2023 · 24 comments
Labels
  • info-completed: The GH issue has received a reply from the author
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • sig/k8s: Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.

Comments

3u13r (Contributor) commented Aug 31, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When I install Cilium in version >= 1.14.0 and restart a Kubernetes node, the pods on that node cannot reach any service.

Reproduction:

  • Install Cilium 1.14.1 on Kubernetes 1.28
  • Restart a Kubernetes node: kubectl debug node/<node-name> --image=ubuntu -- bash -c "echo reboot > reboot.sh && chroot /host < reboot.sh"
  • Look at any pod that needs to connect to, e.g., the API server and observe: {"level":"error","ts":"2023-08-31T12:25:29Z","logger":"setup","msg":"unable to start manager","error":"Get \"https://10.96.0.1:443/api?timeout=32s\": dial tcp 10.96.0.1:443: i/o timeout"}

One can also observe the following traffic flow when doing an nslookup inside a pod on this node:

root@fedora:/home/cilium# tcpdump -n -i any host 10.96.0.10 or host 10.244.0.24
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:33:05.057842 lxcf96ef40da1c4 In  IP 10.244.0.24.33911 > 10.96.0.10.53: 4076+ A? kubernetes.default.default.svc.cluster.local. (62)
12:33:05.057919 ens5  Out IP 192.168.179.24.33911 > 10.96.0.10.53: 4076+ A? kubernetes.default.default.svc.cluster.local. (62)
12:33:10.057791 lxcf96ef40da1c4 In  IP 10.244.0.24.33911 > 10.96.0.10.53: 4076+ A? kubernetes.default.default.svc.cluster.local. (62)
12:33:10.057817 ens5  Out IP 192.168.179.24.33911 > 10.96.0.10.53: 4076+ A? kubernetes.default.default.svc.cluster.local. (62)
12:33:15.057884 lxcf96ef40da1c4 In  IP 10.244.0.24.33911 > 10.96.0.10.53: 4076+ A? kubernetes.default.default.svc.cluster.local. (62)
12:33:15.057915 ens5  Out IP 192.168.179.24.33911 > 10.96.0.10.53: 4076+ A? kubernetes.default.default.svc.cluster.local. (62)

For comparison, this is what the flow looks like on a node that has not yet been restarted:

root@fedora:/home/cilium# tcpdump -n -i any host 10.96.0.10 or host 10.244.3.160
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:36:12.381250 lxc15c1e0437997 In  IP 10.244.3.160.35892 > 10.244.1.159.53: 60216+ A? kubernetes.default.default.svc.cluster.local. (62)
12:36:12.381262 cilium_vxlan Out IP 10.244.3.160.35892 > 10.244.1.159.53: 60216+ A? kubernetes.default.default.svc.cluster.local. (62)
12:36:12.381741 cilium_vxlan P   IP 10.244.1.159.53 > 10.244.3.160.35892: 60216 NXDomain*- 0/1/0 (155)
12:36:12.381763 lxc15c1e0437997 Out IP 10.244.1.159.53 > 10.244.3.160.35892: 60216 NXDomain*- 0/1/0 (155)
12:36:12.381998 lxc15c1e0437997 In  IP 10.244.3.160.39099 > 10.244.2.24.53: 43204+ A? kubernetes.default.svc.cluster.local. (54)
12:36:12.382009 cilium_vxlan Out IP 10.244.3.160.39099 > 10.244.2.24.53: 43204+ A? kubernetes.default.svc.cluster.local. (54)
12:36:12.382417 cilium_vxlan P   IP 10.244.2.24.53 > 10.244.3.160.39099: 43204*- 1/0/0 A 10.96.0.1 (106)
12:36:12.382450 lxc15c1e0437997 Out IP 10.244.2.24.53 > 10.244.3.160.39099: 43204*- 1/0/0 A 10.96.0.1 (106)
12:36:12.382728 lxc15c1e0437997 In  IP 10.244.3.160.33187 > 10.244.1.159.53: 28871+ AAAA? kubernetes.default.svc.cluster.local. (54)
12:36:12.382735 cilium_vxlan Out IP 10.244.3.160.33187 > 10.244.1.159.53: 28871+ AAAA? kubernetes.default.svc.cluster.local. (54)
12:36:12.383005 cilium_vxlan P   IP 10.244.1.159.53 > 10.244.3.160.33187: 28871*- 0/1/0 (147)
12:36:12.383026 lxc15c1e0437997 Out IP 10.244.1.159.53 > 10.244.3.160.33187: 28871*- 0/1/0 (147)

Cilium Version

1.14.0, 1.14.1, main (90a9402)

Kernel Version

Linux fedora 6.1.45-100.constellation.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Aug 14 17:39:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.0", GitCommit:"855e7c48de7388eb330da0f8d9d2394ee818fb8d", GitTreeState:"clean", BuildDate:"2023-08-15T10:15:54Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

cilium-sysdump-20230831-142049.zip

Relevant log output

No response

Anything else?

The bug does not happen on 1.13.6, and I bisected the problem to the following commit: 68fd9ee (i.e. this commit is the first in which this error occurs).
When using Kubernetes 1.27 or 1.26 the problem disappears.

Code of Conduct

  • I agree to follow this project's Code of Conduct
3u13r added the kind/bug, kind/community-report and needs/triage labels Aug 31, 2023
3u13r (Contributor, Author) commented Aug 31, 2023

Ok, the bug disappears with Kubernetes v1.27.4. As 1.28 is not officially supported by Cilium, feel free to close the issue, or maybe use it as a to-do item for Cilium's K8s 1.28 support.

christarazi (Member) commented:

K8s 1.28 support was added in Cilium v1.15 (the current dev cycle, i.e. main). For v1.14, Cilium supports up to K8s 1.27. See https://docs.cilium.io/en/stable/network/kubernetes/compatibility/#kubernetes-compatibility.

christarazi added the need-more-info label Aug 31, 2023
3u13r (Contributor, Author) commented Sep 1, 2023

Ok, I didn't see the commit adding support. I also tried the latest main just now (90a9402); the problem still persists.

github-actions bot added the info-completed label and removed the need-more-info label Sep 1, 2023
christarazi (Member) commented:

@cilium/loader Could there be a behavioral difference on reboot before and after 68fd9ee?

youngnick added the sig/loader and sig/datapath labels and removed the needs/triage label Sep 4, 2023
rgo3 (Contributor) commented Sep 6, 2023

Yes, in general since 68fd9ee we try to use bpf_link to attach the socketlb programs to cgroups when the kernel is new enough (>=5.7). Scanning the sysdump for some relevant logs shows that updating the links after reboot was successful (or at least didn't yield an error), e.g. ... level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock6_post_bind for program cil_sock6_post_bind" subsys=socketlb.
I'm currently unable to repro the issue on a 1.28 kind cluster; I tried restarting a kind node as well as restarting the cilium agents to trigger link deletion/recreation and updating an existing link.
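For reference, the attach-or-update flow described above can be sketched with the cilium/ebpf library roughly as follows. This is a simplified illustration, not Cilium's actual loader code; the package name, helper name, single attach type and pin-path handling are assumptions made for the example.

// Hypothetical sketch of the "create or update a pinned cgroup bpf_link"
// pattern discussed above; not Cilium's real implementation.
package socketlb

import (
    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/link"
)

// attachOrUpdateCgroup attaches prog to the cgroup root via a bpf_link, or, if
// a pinned link already exists in bpffs, swaps the program on that link.
func attachOrUpdateCgroup(prog *ebpf.Program, cgroupRoot, pinPath string) error {
    // Reuse a previously pinned link if one exists; this corresponds to the
    // "Updated link ... for program ..." log lines quoted above.
    if l, err := link.LoadPinnedLink(pinPath, nil); err == nil {
        defer l.Close()
        return l.Update(prog)
    }

    // No usable pin: create a fresh cgroup link and pin it so that a later
    // agent restart can find and update it instead of re-attaching.
    l, err := link.AttachCgroup(link.CgroupOptions{
        Path:    cgroupRoot, // e.g. /run/cilium/cgroupv2
        Attach:  ebpf.AttachCGroupInet4Connect,
        Program: prog,
    })
    if err != nil {
        return err
    }
    defer l.Close()
    return l.Pin(pinPath)
}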

julianwiedmann added the area/loadbalancing label Sep 7, 2023
brb (Member) commented Sep 7, 2023

I'm currently unable to repro the issue on a 1.28 kind cluster, tried restarting a kind node as well as restarting the cilium agents to trigger link deletion/recreation and updating an existing link

@rgo3 #27900 (comment)

rgo3 (Contributor) commented Sep 7, 2023

@3u13r Do you see the same changes when running bpftool cgroup tree before and after restart in your environment, as described in #27900 (comment)? In a kind environment the links get deleted and recreated when restarting a kind node, while your sysdump indicates that your links get updated through their pin in bpffs after a restart.

3u13r (Contributor, Author) commented Sep 7, 2023

This is the output of bpftool cgroup tree on the host VM of a non-restarted node:

[root@fedora tmp]# ./bpftool cgroup tree
CgroupPath
ID       AttachType      AttachFlags     Name
/sys/fs/cgroup
1026     cgroup_inet4_connect multi           cil_sock4_connect
1028     cgroup_inet6_connect multi           cil_sock6_connect
1019     cgroup_inet4_post_bind multi           cil_sock4_post_bind
1020     cgroup_inet6_post_bind multi           cil_sock6_post_bind
1024     cgroup_udp4_sendmsg multi           cil_sock4_sendmsg
1025     cgroup_udp6_sendmsg multi           cil_sock6_sendmsg
1022     cgroup_udp4_recvmsg multi           cil_sock4_recvmsg
1021     cgroup_udp6_recvmsg multi           cil_sock6_recvmsg
1023     cgroup_inet4_getpeername multi           cil_sock4_getpeername
1027     cgroup_inet6_getpeername multi           cil_sock6_getpeername
/sys/fs/cgroup/system.slice/systemd-networkd.service
    144      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-udevd.service
    143      cgroup_inet_ingress multi           sd_fw_ingress
    142      cgroup_inet_egress multi           sd_fw_egress
/sys/fs/cgroup/system.slice/dbus-broker.service
    155      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-homed.service
    138      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-journald.service
    147      cgroup_inet_ingress multi           sd_fw_ingress
    146      cgroup_inet_egress multi           sd_fw_egress
    145      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-userdbd.service
    154      cgroup_inet_ingress multi           sd_fw_ingress
    153      cgroup_inet_egress multi           sd_fw_egress
    152      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-oomd.service
    141      cgroup_inet_ingress multi           sd_fw_ingress
    140      cgroup_inet_egress multi           sd_fw_egress
    139      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-resolved.service
    151      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-logind.service
    150      cgroup_inet_ingress multi           sd_fw_ingress
    149      cgroup_inet_egress multi           sd_fw_egress
    148      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8f14acbafb995305adc1c7e3d370f06a.slice/cri-containerd-5d7609a21c3c9249a19b44be05283a3d208b6245dcfbf75de6d1a82f4953e501.scope
    179      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8f14acbafb995305adc1c7e3d370f06a.slice/cri-containerd-a8323001dff882291aba6d177ad0028b3f1c2ac7c42d2968d96cc0992bb03f0f.scope
    159      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2cc49391_b9f1_4052_9016_1e04317cfa78.slice/cri-containerd-6a8d01f321fe54caa8aaa014624ebbf36ebba876bf1c0695a384b775292ffe88.scope
    720      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2cc49391_b9f1_4052_9016_1e04317cfa78.slice/cri-containerd-65e444e57c824eba7d014bf68290f08ece719ab47776b08bf9d3d22e561bb167.scope
    682      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc1f83fcba7fe3a518018789c8304ccd0.slice/cri-containerd-c84436797233cb7ccc1d5d8135017fe64a78d650dc03caea65d77f94495f58e5.scope
    199      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc1f83fcba7fe3a518018789c8304ccd0.slice/cri-containerd-71ec7cbbd80335c9e572f5a81cf936817bbdcb4836e1afcefcaab09f61c27912.scope
    195      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podaf9efeece93da5ba7866d2faee3ac42b.slice/cri-containerd-b4756c9bc4d611f2e3f7dc8967ffd8ed7288aaf2f8198af4de9e13fdb726e85c.scope
    167      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podaf9efeece93da5ba7866d2faee3ac42b.slice/cri-containerd-b39650447c3e8ba6c39546d2294b92144395af2c046de69f7f601d8fcfe59fbf.scope
    183      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode997d354ccd23f179e373c3eac760d7a.slice/cri-containerd-713d948cfb7bfe7e69d762eb154d9f4acb675b6e675cb1bad8f4b624fee94a6f.scope
    163      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode997d354ccd23f179e373c3eac760d7a.slice/cri-containerd-d66e9bdd5798bd3e391593b50a859af40edf61c17719d3026a076beaf5c94c47.scope
    187      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc4e8de36_61ad_409c_a060_0be2e9a5fc5f.slice/cri-containerd-5e4c9848f6e0b9db125f37a8ca6fc6a465fc7988c84c88ee06f418fd181f8253.scope
    1149     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc4e8de36_61ad_409c_a060_0be2e9a5fc5f.slice/cri-containerd-1df857158242dbc4b207d785dd117386856031a5c1175147520cff07d0f60413.scope
    1152     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podf2e7000210da9062e64f0a796d6567e7.slice/cri-containerd-9ec2b00f0ac7f9a971ddaa77acfe74aae2fbd983f0741aca32bac66653909a4f.scope
    191      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podf2e7000210da9062e64f0a796d6567e7.slice/cri-containerd-8c5fe7f8d654d36e604f46fe0fa877ea580bc188acf580a9d39c0f53fc9315c3.scope
    171      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod969f1369_a806_405d_8b13_f80b935d8517.slice/cri-containerd-d69b4b5653f1e7528d001380f4c61214338082727a3cac08edfedd48d9e8c69f.scope
    651      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod969f1369_a806_405d_8b13_f80b935d8517.slice/cri-containerd-fb874e6603203f9f61f590330fe2a5b5b76ff594f9aefe28a7a4c679ba412c64.scope
    685      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc68f81c4_36de_4217_9941_152c1fa00e6d.slice/cri-containerd-6b8059d159ccbda2ee08a460c9ddbed756a89fd589d83a8c0d2b35bc678bf197.scope
    612      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc68f81c4_36de_4217_9941_152c1fa00e6d.slice/cri-containerd-a052b21f78252edd567905a74191616dc206f5cdd9b6dab51c9426dec310ea47.scope
    689      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod46144d16_6001_4250_8fe9_63503fdfb186.slice/cri-containerd-2d3b996b5f40e3f5c341a06b3f81009b7825c380450602b76319a9152c104c8b.scope
    647      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod46144d16_6001_4250_8fe9_63503fdfb186.slice/cri-containerd-290457e9ced0d755adf3eb94852d147fadf001ece8f24113d0eefeb6db801971.scope
    693      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod214291a9_d617_49eb_8b03_e42253069427.slice/cri-containerd-4bc457b1a450001cf375cd41c652ba3b2580d4e10595d8c3016c81239e816d0d.scope
    678      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod214291a9_d617_49eb_8b03_e42253069427.slice/cri-containerd-c47c276e0f9e90751d94c91909113fc117aa039170672b2e4768c10ad207e598.scope
    641      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfb83de4a_ab4a_469b_aae0_7ca2bec408de.slice/cri-containerd-1578a695300edda110845cd767e13dc279b97c804f492611082ff26c75e3c656.scope
    586      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfb83de4a_ab4a_469b_aae0_7ca2bec408de.slice/cri-containerd-270665ed3a3513d497ad07e096ad743d2aa65b7943d160f27fed15fa9cb80b05.scope
    675      cgroup_device   multi

And this is the output after the restart:

[root@fedora tmp]# ./bpftool cgroup tree
CgroupPath
ID       AttachType      AttachFlags     Name
/sys/fs/cgroup/system.slice/systemd-networkd.service
    995      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-udevd.service
    988      cgroup_inet_ingress multi           sd_fw_ingress
    987      cgroup_inet_egress multi           sd_fw_egress
/sys/fs/cgroup/system.slice/dbus-broker.service
    1014     cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-homed.service
    1010     cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-journald.service
    1005     cgroup_inet_ingress multi           sd_fw_ingress
    1004     cgroup_inet_egress multi           sd_fw_egress
    1003     cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-userdbd.service
    1013     cgroup_inet_ingress multi           sd_fw_ingress
    1012     cgroup_inet_egress multi           sd_fw_egress
    1011     cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-oomd.service
    994      cgroup_inet_ingress multi           sd_fw_ingress
    993      cgroup_inet_egress multi           sd_fw_egress
    992      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-resolved.service
    989      cgroup_device   multi           sd_devices
/sys/fs/cgroup/system.slice/systemd-logind.service
    1001     cgroup_inet_ingress multi           sd_fw_ingress
    1000     cgroup_inet_egress multi           sd_fw_egress
    999      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7dbeea8a2d5a40badde3f5bc0bf67ba3.slice/cri-containerd-24c75378b46f0c02552892a7f8b43f5c2dedd75002aed0a76cba9c6c80d9bf68.scope
    133      cgroup_device   multi
    1008     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7dbeea8a2d5a40badde3f5bc0bf67ba3.slice/cri-containerd-8bfee0272065402bccad337047fd8e24e05a89e0a345974cb33308addf951474.scope
    105      cgroup_device   multi
    998      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod02d03117f7aa76ba06e486aad8a6362d.slice/cri-containerd-0e042929c210080a23a975edfc3c886ec6d167a7b58c8533c7258957bf6c880f.scope
    137      cgroup_device   multi
    1009     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod02d03117f7aa76ba06e486aad8a6362d.slice/cri-containerd-db13a572ddeca6979eeced244c0cc0868b6bbe1ae994073d6433e1bcbef2d4eb.scope
    114      cgroup_device   multi
    996      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9169eb295ec3ab0886eb40f49d6b9928.slice/cri-containerd-de3e93244818e5f637ce375eb09753d59b0defe68c114d79a3d9f7ca4e29ff69.scope
    117      cgroup_device   multi
    1019     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9169eb295ec3ab0886eb40f49d6b9928.slice/cri-containerd-ee9e9760455559caa7d7e1d48ad09ddddba9f7390d2acfadc6f7bb7a508905fb.scope
    129      cgroup_device   multi
    984      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb38e301b_c99a_4cb5_8ba4_f2aacf9b1ae8.slice/cri-containerd-136d681401ecb1468ba2688eb68607ceb55113d0c7886d10d86c4b6beb98be50.scope
    1152     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb38e301b_c99a_4cb5_8ba4_f2aacf9b1ae8.slice/cri-containerd-65d55e1ff33ee707fa030ce4bbbc0e04a7a23f79b5dba4e8ae36e576b10add44.scope
    1160     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podaee65d93db944d83f000555e6a498685.slice/cri-containerd-a75fc70b119ddeb8330b8bed0cf04f9c231292da4be6273b0a6693b6f07c6ded.scope
    109      cgroup_device   multi
    1017     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podaee65d93db944d83f000555e6a498685.slice/cri-containerd-b77aaa94320abbd74766bfa88aa8ef6c3658264c4ed91b1aaea2adaaf5bb2d55.scope
    141      cgroup_device   multi
    990      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9cf44c1a60ffa3e9e50c09bec8a42563.slice/cri-containerd-c0995b5cbdb1a1e388dd07efdbaf1708180c318eef4985233cc1814df74b77b9.scope
    145      cgroup_device   multi
    991      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9cf44c1a60ffa3e9e50c09bec8a42563.slice/cri-containerd-06cbedf384a7ff698f1c8178b9d76c683f221939c669ff0fe449fdc74d8b6980.scope
    121      cgroup_device   multi
    1026     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb35cb72e_4cd7_4c1a_b9b3_8c461f4a41bc.slice/cri-containerd-7eb38fd3501c77052dd008db5192b0a72dda8a5d140398aa16b9b3ae2eb3b925.scope
    569      cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb35cb72e_4cd7_4c1a_b9b3_8c461f4a41bc.slice/cri-containerd-0a79baf62b5d597debd4d88b71c0c7cb0175a9e2bbff3ec6b1f6e08729ffc528.scope
    543      cgroup_device   multi
    1027     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podd387a567_683d_4028_b448_565d60db667d.slice/cri-containerd-d634bde7433d5363d387425aa057101abfa3f55e255039d9c2217a9b1b39ca05.scope
    564      cgroup_device   multi
    997      cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podba5e83c7_7092_4530_a684_6b158a5ca663.slice/cri-containerd-f89891fc6a435b6646c2c7c80915fbb495f46d919e860ebe9e604ca821d46c11.scope
    1603     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podba5e83c7_7092_4530_a684_6b158a5ca663.slice/cri-containerd-e0bf855c1dfe9184d0b22aa72ea61664b23222f79490bc014c807a0127459041.scope
    1606     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod863e8544_30bc_4a9e_9e51_da414852a3b1.slice/cri-containerd-bfb215cc2007e72b02bfea35087cc1e1c82c8f3921a19d1f20b82096ef032fbf.scope
    595      cgroup_device   multi
    1023     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod863e8544_30bc_4a9e_9e51_da414852a3b1.slice/cri-containerd-1c71c28ebb5cbb92c0768690e23c76d606319c15217407c935ec5ca5d59ea842.scope
    525      cgroup_device   multi
    1002     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podcb6da0c8_358e_4538_bbb4_e2b217eb9644.slice/cri-containerd-22328b0792235fe26adc57f72dae79e6b9cacefb01211740105914dfa8a27707.scope
    1630     cgroup_device   multi
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod456e8d1b_796b_4a5b_8123_e5312ec2e16e.slice/cri-containerd-298b8e3bd0e796a409dea7f39d617d9efdd954b3fdbd183304e34dbe20f00e73.scope
    532      cgroup_device   multi
    1007     cgroup_device   multi           sd_devices
/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod456e8d1b_796b_4a5b_8123_e5312ec2e16e.slice/cri-containerd-3ab8619b7be4243645c0c8f4ce2284186850b643523d79f46d0d3ba07f60dc99.scope
    585      cgroup_device   multi
    986      cgroup_device   multi           sd_devices

Note that the restarted node's cilium logs include:

level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock6_getpeername for program cil_sock6_getpeername" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock4_connect for program cil_sock4_connect" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock4_sendmsg for program cil_sock4_sendmsg" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock4_getpeername for program cil_sock4_getpeername" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock6_recvmsg for program cil_sock6_recvmsg" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock6_post_bind for program cil_sock6_post_bind" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock4_recvmsg for program cil_sock4_recvmsg" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock4_post_bind for program cil_sock4_post_bind" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock6_connect for program cil_sock6_connect" subsys=socketlb
level=info msg="Updated link /sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock6_sendmsg for program cil_sock6_sendmsg" subsys=socketlb

jspaleta (Contributor) commented Sep 7, 2023

@3u13r I might have a reproducer of the symptoms using just a Kind cluster running on a CentOS 9 host... but in my procedure, dropping back to Cilium 1.13.6 doesn't fix the problem... which is different from your experience.

Ref Gist with my testing scenarios including the Cilium 1.13.6 scenario:
https://gist.github.com/jspaleta/6f3f5bdadc83790be978999ff9693082

My spidey sense tells me my procedure with the Kind node container restarts should be hitting the same underlying problem that your procedure is... but my Cilium 1.13.6 test failed and yours didn't... that's curious. Is there something racing on node restart that my procedure just triggers more frequently?

brb (Member) commented Sep 8, 2023

And this is the output after the restart:

From the output of @3u13r I don't see any Cilium cgroup BPF progs attached, which explains why services are not reachable.

rgo3 (Contributor) commented Sep 8, 2023

@jspaleta I think your reproducers trigger a different issue than what @3u13r is seeing. In your case, after a restart the cgroup BPF progs are attached to the wrong cgroup, as @brb has shown in #27900 (comment). From what I could see when debugging, this happens because /run/cilium/cgroupv2 points to different "locations" (not sure if that is the correct terminology here). This also explains why you can see the issue happening across Cilium versions (1.14 and 1.13.6 respectively): even though 68fd9ee changes how we attach cgroup programs, in both versions we try to attach to /run/cilium/cgroupv2. To understand the issue in your reproducer we'll need to understand why this doesn't point to the same cgroup root across restarts and whether it only happens in kind.
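As a purely illustrative aid (my assumption, not something Cilium ships), one way to check what a path like /run/cilium/cgroupv2 actually refers to is to confirm it is a cgroup2 mount and record identifiers that can be compared across restarts:

// Hypothetical diagnostic: verify that a path is a cgroup2 mount and print
// its root device/inode so the values can be compared before and after a
// node restart (and against /sys/fs/cgroup).
package main

import (
    "fmt"
    "log"

    "golang.org/x/sys/unix"
)

func main() {
    const path = "/run/cilium/cgroupv2"

    var fs unix.Statfs_t
    if err := unix.Statfs(path, &fs); err != nil {
        log.Fatalf("statfs %s: %v", path, err)
    }
    if fs.Type != unix.CGROUP2_SUPER_MAGIC {
        log.Fatalf("%s is not a cgroup2 mount (magic 0x%x)", path, fs.Type)
    }

    var st unix.Stat_t
    if err := unix.Stat(path, &st); err != nil {
        log.Fatalf("stat %s: %v", path, err)
    }
    fmt.Printf("%s: dev=%d ino=%d\n", path, st.Dev, st.Ino)
}

If the printed device/inode pair differs from /sys/fs/cgroup's root, or changes across reboots, the programs would be attached to a different cgroup root than the one the node's workloads actually live under.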

However, in @3u13r's case it looks like we have a bug when updating the bpf links after a restart. My current suspicion is that retrieving the link from bpffs and just updating the program after a restart doesn't work for some reason, because no cgroup progs are attached anymore. But as mentioned before, I don't have a good reproducer for this exact behavior yet 😞

rgo3 (Contributor) commented Sep 8, 2023

@3u13r bpf links and their pins in bpffs shouldn't actually be persistent across reboots, so it's strange that we can still pick up the pins and try to update them. Are you doing a proper reboot of the node?
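(For context, a hypothetical check of whether such a pin is still loadable after boot could look like the following sketch; the pin path is taken from the log lines quoted earlier, everything else is an assumption. Since bpffs is an in-memory filesystem, this is expected to fail right after a genuine reboot.)

// Hypothetical check, not Cilium code: try to open a pinned socketlb link.
package main

import (
    "fmt"
    "log"

    "github.com/cilium/ebpf/link"
)

func main() {
    // Pin path as seen in the agent logs quoted earlier in this issue.
    const pin = "/sys/fs/bpf/cilium/socketlb/links/cgroup/cil_sock4_connect"

    l, err := link.LoadPinnedLink(pin, nil)
    if err != nil {
        log.Fatalf("no usable pinned link at %s: %v", pin, err)
    }
    defer l.Close()
    fmt.Printf("pinned link at %s is still loadable\n", pin)
}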

3u13r (Contributor, Author) commented Sep 8, 2023

As mentioned, I do a kubectl debug node/<node-name> --image=ubuntu -- bash -c "echo reboot > reboot.sh && chroot /host < reboot.sh" and also see a full reboot log in the serial console of the VM. I'm testing this mostly on AWS, but I have also reproduced it on GCP and Azure.

jspaleta (Contributor) commented Sep 8, 2023

Okay @rgo3, so this one is probably not reproducible just with kind. Maybe I need to kick my k3s home lab worker node in just the right way...

3u13r (Contributor, Author) commented Sep 8, 2023

I'm 99% sure that the K8s upstream fix mentioned in the other issue will also solve this issue. Note that the dependency order changed in the commit I bisected, i.e. before that commit the pinning took place before the cilium-agent container and then after all the other init-containers.
Just waiting for a new K8s release now. If you want, I'm fine with closing this issue since it is a K8s upstream bug.

aditighag (Member) commented Sep 8, 2023

Looks like there could be multiple issues at play here. 😕 Issue (1): the k8s v1.28 regression, see #27900 (comment). This might be exposing other issues in the loader logic, as @rgo3 pointed out, due to which BPF cgroup programs are not getting attached at all after node restart?

I'm 99% sure that the K8s upstream fix mentioned in the other issue will also solve this issue. Note that the order of dependency changed in the commit I bisected i.e. before that commit the pinning took place before the cilium-agent container and then after all the other init-containers.

👍 Could you confirm the bisected commit, please? Thanks!

Just waiting for a new K8s release now. If you want I'm fine with closing this issue since it is a K8s upstream bug.

I would suggest keeping this issue open for the potential issue in the loader logic.

(Edit: Sorry, I accidentally deleted my previous comment while editing.)

3u13r (Contributor, Author) commented Sep 8, 2023

Hm, I just remembered that we also do a restart of the cilium pod once it is healthy, since there was a connection issue that was fixed by that. I don't think it's needed anymore, and it likely skews the behavior/logs. Let me remove that code and see if it makes a difference.

Could you confirm the bisected commit, please?

What do you mean? Should I get the cgroup attachment again for the commit (and the one before it)?

aditighag (Member) commented:

What do you mean? Should I get the cgroup attachment again for the commit (and the one before it)?

What's the bisected commit ID you were referring to? Is it the same as 68fd9ee?

3u13r (Contributor, Author) commented Sep 8, 2023

What's the bisected commit ID you were referring to? Is it the same as 68fd9ee?

Yes this is the commit that breaks the behavior on K8s 1.28 for me.

rgo3 (Contributor) commented Sep 8, 2023

that we also do a restart of the cilium pod

The restart of the cilium pod could explain the log line about an updated link, as it will have created one at initial startup, but it's still a bit odd that the update succeeds and bpftool cgroup tree is then empty.

jspaleta (Contributor) commented Sep 8, 2023

Okay, just for clarity: I got my k3s cluster into a bad state. I'm going to try downgrading cilium to 1.13.6 on it just to confirm I'm seeing the same issue now. I'll update this comment with more info. Sorry for the noise with the spurious other-issue reproducer.

So I rolled back to cilium 1.13.6 on my two-node Intel NUC k3s cluster using k8s 1.28, confirmed all pods were up and running and cilium status is in the green, rebooted both my k3s nodes, and the CoreDNS pods didn't come back up... so it's the same symptoms as my kind cluster reproducer...

Do I need to document this further? What information should I provide?

There are a lot of moving parts here; the k3s images are release candidates themselves, so ruling out k3s-RC-specific "type 2 fun" (tm) is not possible.

kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
nuc-02   Ready    control-plane,master   42h   v1.28.1-rc2+k3s1
nuc-01   Ready    <none>                 39h   v1.28.1-rc2+k3s1
cilium version
cilium-cli: v0.15.7 compiled with go1.21.0 on linux/amd64
cilium image (default): v1.14.1
cilium image (stable): v1.14.1
cilium image (running): 1.13.6
kubectl logs -n kube-system coredns-77ccd57875-p4lbq  |grep WARNING
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout

sysdump attached
cilium-sysdump-20230123-123225.zip

3u13r (Contributor, Author) commented Sep 11, 2023

I had a look at the logs of the pods again, and on the restarted node the cilium pod has the following warning:

level=warning msg="No valid cgroup base path found: socket load-balancing tracing with Hubble will not work.See the kubeproxy-free guide for more details." subsys=cgroup-manager

Otherwise, all observations from above regarding the unattached programs still apply.

aanm (Member) commented Sep 13, 2023

@3u13r would you be able to test it again with Kubernetes 1.28.2? Thank you

aanm added the need-more-info label and removed the info-completed label Sep 13, 2023
brb added the sig/k8s and info-completed labels and removed the kind/bug, need-more-info, sig/loader and area/loadbalancing labels Sep 14, 2023
3u13r (Contributor, Author) commented Sep 18, 2023

Sorry for the delay. I can confirm that using K8s v1.28.2 fixes the reported problem. Thanks for the help and investigation.

3u13r closed this as completed Sep 18, 2023