ClusterwideNetworkPolicy causes intermittent connection problems to k8s API #32698
Thanks for the report. Not having a sysdump is going to make it harder for us to diagnose, unfortunately. Can you reproduce this if you upgrade to the latest point release?
Hey @lmb, thanks for your answer! As mentioned, I've got quite a few sysdumps; I just can't share them publicly. Should I share a link with you by mail? We will attempt the latest point release update soon, but have little hope, since this issue has persisted for quite some time now.
Sorry, but we can't take responsibility for your confidential data. If that is something you need, consider one of the enterprise vendors.
OK. I've attached one. Please let me know when you don't need it anymore; I will delete it afterwards. Edit: I've also been able to roll out the latest point release by now. It took only a couple of minutes for the first node to go into NotReady state. Seems like this didn't solve the bug.
This is a scary chicken-and-egg problem: the agent needs to be able to determine which IPs belong to the apiserver, and to do that, it needs access to the apiserver. I would suggest matching nodes via CIDR and then allowing access to the set of possible apiserver IPs.
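For the CIDR-based approach, the set of currently advertised apiserver IPs can be read from the `default/kubernetes` EndpointSlice (a sketch; on a managed cluster the provider's documentation is the authoritative source for the stable apiserver address range):

```shell
# Current apiserver endpoint IPs. These can change on failover,
# so the allowed CIDR should cover the whole possible range,
# not just what is listed right now:
kubectl get endpointslices -n default \
  -l kubernetes.io/service-name=kubernetes \
  -o jsonpath='{range .items[*].endpoints[*]}{.addresses[0]}{"\n"}{end}'
```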
@squeed Thanks for your answer! I think that would be an option for when the pods are restarting, but I'd like to understand why those nodes lose the …
That is indeed interesting. We should probably log enough information to catch this in debug mode. If you like, you can enable debug mode globally in your cluster (set …
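The suggestion above trails off in the copy; for reference, a sketch of enabling debug logging cluster-wide, assuming the standard Cilium Helm chart and ConfigMap layout:

```shell
# Option 1: via Helm (re-renders the cilium-config ConfigMap):
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set debug.enabled=true

# Option 2: patch the ConfigMap directly and restart the agents:
kubectl -n kube-system patch configmap cilium-config \
  --type merge -p '{"data":{"debug":"true"}}'
kubectl -n kube-system rollout restart ds/cilium
```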
@squeed Got some debug logs while the error showed up 🥳 I think the relevant parts are:

$ ggrep -B2 -P '(^((?!EndpointSelector).)*kube-apiserver|Kubernetes service definition changed)' cilium-debug.log
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Processing 0 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="EndpointSlice kubernetes has 0 backends" subsys=k8s
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints= k8sNamespace=default k8sSvcName=kubernetes old-endpoints="109.68.224.35:30153/TCP" old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
--
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Upserting IP into ipcache layer" identity="{16777231 custom-resource [] false true}" ipAddr=109.68.224.35/32 key=0 subsys=ipcache
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777231 custom-resource [] false true}" ipAddr="{109.68.224.35 ffffffff}" modification=Upsert subsys=datapath-ipcache
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="UpdateIdentities: Deleting identity" identity=16777339 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:104.0.0.0/5 cidr:108.0.0.0/6 cidr:108.0.0.0/7 cidr:109.0.0.0/8 cidr:109.0.0.0/9 cidr:109.64.0.0/10 cidr:109.64.0.0/11 cidr:109.64.0.0/12 cidr:109.64.0.0/13 cidr:109.68.0.0/14 cidr:109.68.0.0/15 cidr:109.68.0.0/16 cidr:109.68.128.0/17 cidr:109.68.192.0/18 cidr:109.68.224.0/19 cidr:109.68.224.0/20 cidr:109.68.224.0/21 cidr:109.68.224.0/22 cidr:109.68.224.0/23 cidr:109.68.224.0/24 cidr:109.68.224.0/25 cidr:109.68.224.0/26 cidr:109.68.224.32/27 cidr:109.68.224.32/28 cidr:109.68.224.32/29 cidr:109.68.224.32/30 cidr:109.68.224.34/31 cidr:109.68.224.35/32 cidr:64.0.0.0/2 cidr:96.0.0.0/3 cidr:96.0.0.0/4 reserved:kube-apiserver reserved:world]" subsys=policy
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Processing 1 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="EndpointSlice kubernetes has 1 backends" subsys=k8s
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints="109.68.224.35:30153/TCP" k8sNamespace=default k8sSvcName=kubernetes old-endpoints= old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Acquired service ID" backends="[109.68.224.35:30153]" l7LBFrontendPorts="[]" l7LBProxyPort=0 loadBalancerSourceRanges="[]" serviceID=145 serviceIP="{192.168.129.244 {TCP 30153} 0}" serviceName=kubernetes serviceNamespace=default sessionAffinity=false sessionAffinityTimeout=0 subsys=service svcExtTrafficPolicy=Cluster svcHealthCheckNodePort=0 svcIntTrafficPolicy=Cluster svcType=NodePort
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Deleting backends from session affinity match" backends="[]" serviceID=145 subsys=service
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Resolving identity" identityLabels="cidr:109.68.224.35/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="UpdateIdentities: Adding a new identity" identity=16777232 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:104.0.0.0/5 cidr:108.0.0.0/6 cidr:108.0.0.0/7 cidr:109.0.0.0/8 cidr:109.0.0.0/9 cidr:109.64.0.0/10 cidr:109.64.0.0/11 cidr:109.64.0.0/12 cidr:109.64.0.0/13 cidr:109.68.0.0/14 cidr:109.68.0.0/15 cidr:109.68.0.0/16 cidr:109.68.128.0/17 cidr:109.68.192.0/18 cidr:109.68.224.0/19 cidr:109.68.224.0/20 cidr:109.68.224.0/21 cidr:109.68.224.0/22 cidr:109.68.224.0/23 cidr:109.68.224.0/24 cidr:109.68.224.0/25 cidr:109.68.224.0/26 cidr:109.68.224.32/27 cidr:109.68.224.32/28 cidr:109.68.224.32/29 cidr:109.68.224.32/30 cidr:109.68.224.34/31 cidr:109.68.224.35/32 cidr:64.0.0.0/2 cidr:96.0.0.0/3 cidr:96.0.0.0/4 reserved:kube-apiserver reserved:world]" subsys=policy
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Short circuiting HTTP rules due to rule allowing all and no other rules needing attention" subsys=envoy-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="preparing new cache transaction: upserting 1 entries, deleting 0 entries" subsys=xds xdsCachedVersion=16 xdsTypeURL=type.googleapis.com/cilium.NetworkPolicy
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Resolving identity" identityLabels="cidr:109.68.224.35/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Waiting for proxy updates to complete..." subsys=endpoint-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Wait time for proxy updates: 56.608µs" subsys=endpoint-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Upserting IP into ipcache layer" identity="{16777232 kube-apiserver [] false true}" ipAddr=109.68.224.35/32 key=0 subsys=ipcache
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777232 kube-apiserver [] false true}" ipAddr="{109.68.224.35 ffffffff}" modification=Upsert subsys=datapath-ipcache
--
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="Processing 0 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="EndpointSlice kubernetes has 0 backends" subsys=k8s
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints= k8sNamespace=default k8sSvcName=kubernetes old-endpoints="109.68.224.35:30153/TCP" old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher

The endpoints of the kubernetes service are indeed flapping:

$ kubectl get endpoints kubernetes -n kube-system -w
NAME ENDPOINTS AGE
kubernetes 109.68.224.35:30153 94d
kubernetes <none> 94d
kubernetes 109.68.224.35:30153 94d
kubernetes <none> 94d
kubernetes 109.68.224.35:30153 94d
This is core Kubernetes functionality. It removes the endpoint from the …
Exactly right. There's not much we can do in this scenario. If you would like to restrict access in this manner, you will need to set up something like a load balancer to ensure the apiserver has a consistent IP.
@squeed That is a static IP provided by a load balancer in front of a managed Kubernetes cluster; the apiserver component is run separately by the cloud provider. The apiserver uses …
Indeed. That said, it seems that Cilium is doing the "right thing" -- or, at least, what it has been told. Access is restricted to IPs known to be apiserver IPs, and that service is empty. I could imagine a world in which we have some hysteresis for the apiserver to prevent these sorts of issues, but that would only paper over the real problem: it is too easy for a Host Firewall policy to lose access to the apiserver on upgrades or failovers. We should document this limitation.
Thank you for clarifying! If I interpret your answer correctly, this means: don't use … I'd currently consider dropping all … +1 for documenting this and/or implementing something like a cache TTL for the entity <-> CIDR mapping.
No, the kube-apiserver entity works well. The bigger issue is: be careful when using Host Firewall. Specifically, the way access to the kube-apiserver is defined must not rely on the kube-apiserver always being up.
ToFQDNs relies on layer 7 interception, which does not work with the host firewall. You really have to allow access to the apiserver by 100% static means right now.
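A sketch of the "100% static means" suggested here, using the apiserver address and port visible in the debug logs above (109.68.224.35:30153) in place of the kube-apiserver entity; the policy name and the exact CIDR/port are illustrative and must match your own load balancer:

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-apiserver-static   # illustrative name
spec:
  nodeSelector: {}               # all nodes; narrow as needed
  egress:
    - toCIDR:
        - 109.68.224.35/32       # static LB IP from this report
      toPorts:
        - ports:
            - port: "30153"      # apiserver port from this report
              protocol: TCP
```

Unlike the kube-apiserver entity, this rule does not depend on the Endpoints object staying populated, so a flap to `<none>` does not revoke access.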
Is there an existing issue for this?
What happened?
We're running Cilium with egress/ingress CiliumClusterwideNetworkPolicy/CiliumNetworkPolicy to ensure all traffic must be explicitly allowed:
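The policy itself did not survive the copy; a hypothetical reconstruction of the kind of host-firewall lockdown described in this report (all names and selectors are illustrative, inferred from the rest of the thread):

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-lockdown            # hypothetical name
spec:
  nodeSelector: {}               # applies to all nodes (host firewall)
  ingress:
    - fromEntities:
        - cluster                # allow cluster-internal traffic
  egress:
    - toEntities:
        - cluster
        - kube-apiserver         # the mapping that flaps in this issue
```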
This works fine - until it doesn't :)
From time to time, worker nodes go NotReady and stay there until manual intervention. Investigating deeper, it turned out that the worker nodes aren't able to reach the Kubernetes API, and thus kubelet/Cilium aren't functional anymore.

Kubelet Log:
Cilium Agent Log:
With the help of nerdctl I was able to manually start the Cilium Agent container. This allowed me to interact with Cilium directly, but it wasn't very fruitful, because the Cilium (and Monitor/...) socket isn't there yet - probably because the Kubernetes API still can't be reached:
But it gave some insights and showed that there isn't a matching Egress rule anymore for the IP of the Kubernetes API:
My guess (and I don't have much experience with Cilium/BPF or networking in general 🙈) at what is happening: Cilium loses its mapping from kube-apiserver to the IP and isn't able to recover from that.

Our temporary workaround is:
This deletes the iptables jump to the BPF firewalling (?) and thus allows access to the Kubernetes API again, letting Cilium fully restart and bootstrap itself. Afterwards everything seems to be back to normal.
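The exact workaround commands were not included in the report; a hypothetical sketch of the steps described, assuming Cilium's standard iptables chain names (verify against `iptables-save` on your own nodes before deleting anything):

```shell
# Inspect which Cilium jumps exist in the filter table:
iptables-save -t filter | grep -i cilium

# Remove the jump from OUTPUT into Cilium's chain (chain name assumed):
iptables -t filter -D OUTPUT -j CILIUM_OUTPUT

# With apiserver access restored, restart the agent on the affected node:
kubectl -n kube-system delete pod -l k8s-app=cilium \
  --field-selector spec.nodeName=<node-name>
```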
Cilium Version
1.15.2
Kernel Version
5.15.0-100-generic
Kubernetes Version
1.28.7
Regression
Not sure when it started, but we've been seeing it for a couple of months, and we usually update relatively soon after a release.
Sysdump
cilium-bugtool-20240524-131108.522+0000-UTC-3443698366.zip
Relevant log output
Anything else?
We have SSH access to broken nodes, logs in Loki and lots of metrics. If you need anything in particular to investigate this deeper, please let me know!
Also: It feels related to this issue -> #24502
Cilium Users Document
Code of Conduct