
Cilium v1.14.2 with Kubernetes v1.28 is unstable #27982

Closed
2 tasks done
dghubble opened this issue Sep 7, 2023 · 33 comments
Labels
info-completed The GH issue has received a reply from the author kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/agent Cilium agent related. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Comments

@dghubble
Contributor

dghubble commented Sep 7, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Starting in Cilium v1.14.0 on Kubernetes v1.28.1, Cilium agents can lose connection to kube-apiserver when using kube-proxy and the kubernetes service ClusterIP. This looks closely related to #27900.

Cilium supports hybrid modes in which it coexists with kube-proxy while taking over some or all of kube-proxy's responsibilities (there are reasons one might not wish to remove kube-proxy). Cilium v1.14 removed the kube-proxy-replacement partial mode, changing the setting to either true or false. But something else appears to have changed as well:

Consider a cluster with a kube-proxy daemonset. kube-proxy uses ipvs to load balance the default kubernetes service ClusterIP to a kube-apiserver endpoint.

kubectl get service
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
kubernetes        ClusterIP   10.3.0.1       <none>        443/TCP                      26h

kubectl get endpoints
NAME              ENDPOINTS                                         AGE
kubernetes      10.0.8.71:6443                                    26h

By default, Cilium agents respect the standard KUBERNETES_SERVICE_HOST environment variable (10.3.0.1 here), which usually works fine.

level=info msg="Establishing connection to apiserver" host="https://10.3.0.1:443" subsys=k8s-client
level=info msg="Establishing connection to apiserver" host="https://10.3.0.1:443" subsys=k8s-client
level=error msg="Unable to contact k8s api-server" error="Get \"https://10.3.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.3.0.1:443: connect: operation not permitted" ipAddr="https://10.3.0.1:443" subsys=k8s-client

But I've noticed there is a (yet unknown) sequence of events whereby connectivity to the kubernetes service Cluster IP breaks on certain nodes. This can happen after days of otherwise running normally. I think it's related to node restarts because I see it more on spot instances. The result is that the Cilium agent on those nodes crashloops, unable to reach the apiserver.

Workaround

The workaround is updating the Cilium agent to use an explicit kube-apiserver IP address or DNS record via the KUBERNETES_SERVICE_HOST environment variable, but this should not be necessary and is undesirable. Workloads (including the Cilium agent) on clusters with kube-proxy should be able to use in-cluster service discovery.
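For reference, a minimal sketch of this workaround when Cilium is installed via Helm (the hostname below is a placeholder; adjust the release name, namespace, and values to your install):

# Point the agents at an explicit apiserver address instead of the
# in-cluster ClusterIP; hostname and port here are placeholders
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set k8sServiceHost=apiserver.example.internal \
  --set k8sServicePort=6443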

I suspect the wrinkle here is that Cilium itself can interact with Kubernetes Service mappings. That or something about Kubernetes v1.28 itself.

Scope

I've observed this with KubeProxyReplacement false (enabling the individual features) and with KubeProxyReplacement true.

kube-proxy-replacement:  "false"
bpf-lb-sock: "true"
bpf-lb-external-clusterip: "true"
enable-node-port: "true"
enable-health-check-nodeport: "false"
enable-external-ips: "true"
enable-host-port: "true"

And with KubeProxyReplacement true

kube-proxy-replacement:  "true"

The choice of mode does not appear to matter for the issue or the fix.

Cilium Version

Cilium v1.14.0, v1.14.1

Kernel Version

Linux ip-10-0-11-132 6.4.7-200.fc38.aarch64 #1 SMP PREEMPT_DYNAMIC Thu Jul 27 20:22:11 UTC 2023 aarch64 GNU/Linux

Kubernetes Version

Kubernetes v1.28.1

Sysdump

No response

Relevant log output

No response

Anything else?

The fallback here should be kube-proxy's IPVS mode, which does program the right LVS rules:

ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.3.0.1:443 rr
  -> 10.0.8.71:6443               Masq    1      0          0
  ...

Code of Conduct

  • I agree to follow this project's Code of Conduct
@dghubble dghubble added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Sep 7, 2023
@dghubble dghubble changed the title from "Cilium v1.14 with Kubernetes v1.28 is unstable" to "Cilium v1.14.1 with Kubernetes v1.28 is unstable" Sep 7, 2023
@dghubble
Contributor Author

dghubble commented Sep 7, 2023

When this occurs on a node, the kubernetes Service cluster IP can't be used from the host, which aligns with what Pods see.

curl https://10.3.0.1
hangs

Once Cilium agents start (using the workaround above), the same curl command works from hosts. Seems like Cilium having run on the node in the past interferes with IPVS functionality.
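For anyone trying to reproduce, a rough diagnostic sketch (assuming the in-pod cilium CLI and the ClusterIP from above; the pod name is a placeholder):

# Check host connectivity to the kubernetes ClusterIP (hangs when broken)
curl -k --connect-timeout 5 https://10.3.0.1/healthz

# Compare what Kubernetes advertises with what Cilium has programmed
kubectl get endpoints kubernetes
kubectl -n kube-system exec <cilium-pod> -- cilium service list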

@youngnick
Contributor

Thanks for this issue @dghubble, and especially for the great investigation.

Cilium 1.14 actually only supports up to Kubernetes 1.27 - the client upgrade has only been merged into main at this point. It is odd that there are many more compatibility issues than usual (see #27900 and #27965 for other examples), so we'll see if we can do something to resolve all of these. It feels like, at the very least, we should call out that there are known issues with Cilium 1.14.x and Kubernetes 1.28, but I'll talk to some other folks and see what the consensus is.

@youngnick youngnick added the need-more-info More information is required to further debug or fix the issue. label Sep 8, 2023
@Silvest89

This actually happens after a node reboot. I am using a k3s cluster with kured to have automatic reboots after updates. After the reboot, the node can no longer reach kube-dns, so it cannot reach anything.

@youngnick
Contributor

It looks like there may be two issues in play here: An upstream issue (kubernetes/kubernetes#120247), and possibly #27848. The upstream issue fix is in and will be included in Kubernetes 1.28.2, due out soon, and the other investigation is ongoing at #27848.

@aanm
Member

aanm commented Sep 13, 2023

@dghubble @Silvest89 would you be able to test it again with Kubernetes 1.28.2? Thank you

@zhurkin

zhurkin commented Sep 14, 2023

@dghubble @Silvest89 would you be able to test it again with Kubernetes 1.28.2? Thank you
I don't know if this information will be useful.

cicd-kub-control-01:/home/icce# cilium status
Cilium:             1 errors, 3 warnings
Operator:           OK
Envoy DaemonSet:    disabled (using embedded mode)
Hubble Relay:       disabled
ClusterMesh:        disabled

DaemonSet              cilium             Desired: 3, Unavailable: 3/3
Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Pending: 3
                       cilium-operator    Running: 1
Cluster Pods:          0/2 managed by Cilium
Helm chart version:    1.15.0-pre.0
Image versions         cilium             quay.io/cilium/cilium:v1.15.0-pre.0: 3
                       cilium-operator    quay.io/cilium/operator-generic:v1.15.0-pre.0: 1
Errors:                cilium             cilium          3 pods of DaemonSet cilium are not ready
Warnings:              cilium             cilium-9ch5r    pod is pending
                       cilium             cilium-f8zgf    pod is pending
                       cilium             cilium-j6kql    pod is pending

cicd-kub-control-01:/home/icce# kubectl get nodes

NAME                  STATUS   ROLES           AGE     VERSION
cicd-kub-control-01   Ready    control-plane   3d1h    v1.28.2
cicd-kub-control-02   Ready    control-plane   2d23h   v1.28.2
cicd-kub-control-03   Ready    control-plane   2d22h   v1.28.2
cicd-kub-control-01:/home/icce# kubectl get po -A | grep -e cilium -e core
kube-system   cilium-9ch5r                       0/1   Init:CreateContainerError   0   98s
kube-system   cilium-f8zgf                       0/1   Init:CreateContainerError   0   98s
kube-system   cilium-j6kql                       0/1   Init:CreateContainerError   0   98s
kube-system   cilium-operator-756dfd6d4d-nfxk5   1/1   Running                     0   98s
kube-system   coredns-5dd5756b68-brfj8           0/1   Pending                     0   2d16h
kube-system   coredns-5dd5756b68-ql7mw           0/1   Pending                     0   2d16h

cicd-kub-control-01:/home/icce# kubectl -n kube-system logs cilium-9ch5r
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Error from server (BadRequest): container "cilium-agent" in pod "cilium-9ch5r" is waiting to start: PodInitializing

@dghubble
Contributor Author

Preliminarily, on a Kubernetes v1.28.2 cluster, I've not been able to reproduce the issue. Restarting nodes, Cilium can reach the apiserver just fine, which I suspected was the trigger before. I observed the original issue in real production clusters though, after several days of use, so I'll have more confidence in a few days.

@github-actions github-actions bot added info-completed The GH issue has received a reply from the author and removed need-more-info More information is required to further debug or fix the issue. labels Sep 14, 2023
@julianwiedmann
Member

Preliminarily, on a Kubernetes v1.28.2 cluster, I've not been able to reproduce the issue. Restarting nodes, Cilium can reach the apiserver just fine, which I suspected was the trigger before. I observed the original issue in real production clusters though, after several days of use, so I'll have more confidence in a few days.

Thank you for the feedback! Let's leave it in need-more-info then until you have full confidence.

@julianwiedmann julianwiedmann added need-more-info More information is required to further debug or fix the issue. and removed info-completed The GH issue has received a reply from the author labels Sep 15, 2023
@aojea
Contributor

aojea commented Sep 18, 2023

This seems a duplicate of #27900, should we close it @aanm @julianwiedmann ?

@dghubble
Contributor Author

I've seen this occur once on a new cluster with Kubernetes v1.28.2 and Cilium v1.14.2. Most clusters have been fine since those upgrades.

Is there anything specific I should be collecting to confirm it's the same issue? Unfortunately, I usually have to apply mitigations ASAP and can't afford to leave clusters in this broken state for long.

@github-actions github-actions bot added info-completed The GH issue has received a reply from the author and removed need-more-info More information is required to further debug or fix the issue. labels Sep 18, 2023
@Silvest89

@aanm @dghubble
Upgraded my cluster from 1.27.6 to 1.28.2. Everything seems to be working smoothly =]

@dghubble dghubble changed the title from "Cilium v1.14.1 with Kubernetes v1.28 is unstable" to "Cilium v1.14.2 with Kubernetes v1.28 is unstable" Sep 23, 2023
@dghubble
Contributor Author

dghubble commented Sep 23, 2023

This issue can still happen. I've had to explicitly set KUBERNETES_SERVICE_HOST to an external DNS record so that Cilium can reliably find the apiserver. This should not be required; in-cluster kube-proxy should be sufficient.

level=info msg="Establishing connection to apiserver" host="https://10.3.0.1:443" subsys=k8s-client
level=error msg="Unable to contact k8s api-server" error="Get \"https://10.3.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.3.0.1:443: connect: operation not permitted" ipAddr="https://10.3.0.1:443" subsys=k8s-client
level=error msg="Start hook failed" error="Get \"https://10.3.0.1:443/api/v1/namespaces/kube-system\": dial tcp 10.3.0.1:443: connect: operation not permitted" function="client.(*compositeClientset).onStart" subsys=hive

This can take days of real-world usage to become evident. Fresh clusters looked fine, but they're not fine.
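For anyone else hitting this, the mitigation I apply is roughly the following (a sketch; the DNS name is a placeholder and it assumes the stock cilium DaemonSet in kube-system):

# Override the apiserver address for the agents (placeholder values)
kubectl -n kube-system set env daemonset/cilium \
  KUBERNETES_SERVICE_HOST=apiserver.example.internal \
  KUBERNETES_SERVICE_PORT=6443
kubectl -n kube-system rollout status daemonset/cilium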

@lmb lmb added sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. sig/agent Cilium agent related. and removed needs/triage This issue requires triaging to establish severity and next steps. labels Sep 26, 2023
@aanm
Member

aanm commented Sep 26, 2023

@dghubble that could be related to an issue with kube-proxy rather than Cilium itself. As you pointed out, connecting directly to an external DNS record works, but connecting via the ClusterIP, for which kube-proxy does the service translation, does not.

@dghubble
Contributor Author

@squeed 👋🏻 long time! Yeah, my suspicion is that it's related to the overlapping responsibility kube-proxy and Cilium have for managing the apiserver's own Kubernetes Service traffic. Having socket-lb optionally exclude the apiserver itself could be helpful. The odd part is this was never an issue before; I'm not sure if something changed here with the shifting kube-proxy modes. At the next Kubernetes patch release, I should have an opportunity to test this again and capture logs (or sooner if I find time).

@tedli The workaround to this issue is giving Cilium explicit IP addresses for the apiserver (undesired). If you're seeing issues in that case, you're probably describing a separate issue.

tommyp1ckles added commits that referenced this issue Oct 17 and Oct 18, 2023
K8s v1.28.0 causes the following regression: #27982.

Most noticeably, this has been causing k8s conformance test failures.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
michi-covalent pushed a commit that referenced this issue Oct 18, 2023
K8s v1.28.0 causes the following regression: #27982.

Most noticeably, this has been causing k8s conformance test failures.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@dghubble
Contributor Author

I saw it recur on one node today! A Cilium agent pod was unable to reach kube-apiserver. My usual workaround is to modify the Cilium DaemonSet to explicitly set a KUBERNETES_SERVICE_HOST, but this time I tried rebooting the host, which does mitigate the issue. This seems to support the theory that stale BPF rules are left behind, preventing the new Cilium Pod from reaching kube-apiserver via kube-proxy as it normally can.

Bugtool: https://storage.googleapis.com/dghubble/bugtool.tar.gz (too big for GitHub)

@squeed
Contributor

squeed commented Oct 20, 2023

So, looking at the bugtool, I see the following set of backends for the apiserver service:

10.3.0.11:443        0.0.0.0:0 (16) (0) [ClusterIP]   (ignore this bit)
                     10.2.1.160:443 (16) (2)          
                     10.2.2.35:443 (16) (1)

As of the time of the bugtool, are those correct? My theory is that cilium is missing changes to the default/kubernetes service, and once that happens it can't recover. (This is why you should not use the apiserver ClusterIP with socket-lb).
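One way to check that theory the next time it happens (a rough sketch; the pod name is a placeholder and it assumes the in-pod cilium CLI) is to compare what Kubernetes currently advertises with what the datapath has programmed:

# Current apiserver endpoints according to Kubernetes
kubectl get endpointslices -l kubernetes.io/service-name=kubernetes

# Backends Cilium has programmed for the kubernetes ClusterIP (address from this cluster)
kubectl -n kube-system exec <cilium-pod> -- cilium bpf lb list | grep -A2 10.3.0.1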

Separately, #25169 may also be relevant to this.

@dghubble
Contributor Author

dghubble commented Oct 20, 2023

Yeah, this looks odd. Those two backend IPs are from the Pod CIDR range (10.2.0.0/16). But the apiserver runs on controller/master node(s) with host networking (10.0.4.0/22 in this case), as a static pod. Kubernetes and kube-proxy see this as I'd expect; there is just one apiserver/master in this cluster:

kubectl get endpoints kubernetes
NAME         ENDPOINTS        AGE
kubernetes   10.0.4.26:6443   29d

I'm no longer sure what the Pod IPs Cilium was seeing correspond to. Cilium now shows the right backend:

ID   Frontend             Service Type   Backend                          
...
2    10.3.0.1:443         ClusterIP      1 => 10.0.4.26:6443 (active)

kubectl get pods -n kube-system -o wide
NAME                                          READY   STATUS    RESTARTS      AGE   IP          NODE                           NOMINATED NODE   READINESS GATES
kube-apiserver-magnesium.region.dghubble.io   1/1     Running   3 (33h ago)   29d   10.0.4.26   magnesium.region.dghubble.io   <none>           <none>

Interesting that the apiserver restarted 33h ago, but maybe that's a coincidence. And only the Cilium agent on one node got into this bad state. The prior apiserver logs show it clearing the kubernetes endpoints:

W1019 06:10:07.698343       1 lease.go:263] Resetting endpoints for master service "kubernetes" to []

@dghubble
Contributor Author

dghubble commented Oct 20, 2023

Btw, I've preferred Cilium using in-cluster discovery (i.e. 10.3.0.1 via kube-proxy) in a Kubernetes distro because it's platform agnostic. Giving Cilium an IP hardcodes a value (and to support multi-master I'd need to create VIPs in a platform-agnostic way on behalf of users), and giving Cilium a DNS record pointing to the apiserver is hard to do in a platform-agnostic way (e.g. public vs private clusters use different FQDNs depending on AWS private endpoints, Azure Private Link, etc). Though that may be the way we go. The Cilium docs just say to provide the real apiserver address somehow.

In theory, Cilium could just read the kubernetes endpoints directly and do its own load balancing (since client-go doesn't load balance across multiple IPs), similar to what kube-proxy provides. Distros wouldn't have to decide how to give Cilium a "real" apiserver address.

@squeed
Contributor

squeed commented Oct 23, 2023

@dghubble makes perfect sense; the ultimate solution may be to add a flag disabling socket-lb for host-netns processes (or perhaps just the Cilium agent). Then Cilium would receive load balancing from kube-proxy, and pods would use socket-lb. WDYT?

@dghubble
Contributor Author

I suspect that would fix this situation. Did Cilium 1.13 with partial kube-proxy previously work this way? It's odd this became a problem so recently.

dghubble added a commit to poseidon/typhoon that referenced this issue Oct 29, 2023
* With Cilium v1.14, Cilium's kube-proxy partial mode changed to
either be enabled or disabled (not partial). This sometimes leaves
Cilium (and the host) unable to reach the kube-apiserver via the
in-cluster Kubernetes Service IP, until the host is rebooted
* As a workaround, configure Cilium to rely on external DNS resolvers
to find the IP address of the apiserver. This is less portable
and less "clean" than using in-cluster discovery, but also what
Cilium wants users to do. Revert this when the upstream issue
cilium/cilium#27982 is resolved
@Silvest89

@dghubble I've been running my cluster for more than a month now, haven't run into any issues. Cluster bootstrapped using k3s on Hetzner Cloud

@squeed
Contributor

squeed commented Nov 14, 2023

I believe 1.14 brings changes to the socket-lb, but that’s not my area of expertise. @aditighag, any pointers?

@n-able-consulting

n-able-consulting commented Dec 5, 2023

I was fixated on this issue for a day before I found this thread.
I am using k8s 1.28.4 and Cilium 1.14.4, with my own scripted provisioning. Suddenly it breaks when enabling OIDC...
I run on bare metal. I have multiple interfaces to different VLANs, so I can use the 'internal' interface in my configuration. Technically, it's no big deal.

I have a problem with what this means for a highly available setup, because I then have to address the 'outside' load balancer for in-cluster traffic. I do not like it. Also, I cannot imagine this is the way it is meant to be.

We build in PKI and an overlay to secure all communication, and then we break it open to talk over the 'outside' infrastructure network instead of using the built-in overlay Kubernetes service. I can hardly believe this.

@n-able-consulting

The issue is older: Cilium 1.13.9 also breaks when configuring OIDC on the api-server...

@n-able-consulting

I can confirm it is a 1.28 issue. Provisioning k8s 1.27, Cilium does not break after configuring OIDC. By breaking I refer to the familiar issue: Unable to contact k8s api-server / Forbidden 10.2.0.1:6443.
It works when setting k8sServiceHost in 1.28, which, as I stated above, in my opinion punches a hole in the k8s security plane... and as such is not a solution.
If not clear yet, I do not like this Cilium feature.


github-actions bot commented Feb 4, 2024

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 4, 2024

This issue has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 19, 2024
@dghubble
Contributor Author

I ultimately had to adapt our Kubernetes distro to tell Cilium the DNS name resolving to any of the apiservers. The approach and resolver varies based on the cloud provider.

It's a shame; Cilium used to support the in-cluster kubernetes ClusterIP, but now it effectively relies on an external resolver.
