Cilium 1.14.0 AKS BYOCNI - Connections from pods to IMDS randomly blocked #27536
Comments
We tried installing Cilium 1.14.1 in one of our clusters, and the issue seems to be resolved. We suspect it might have something to do with the following bug that was fixed in 1.14.1: #27327. It would be nice if someone with a bit more insight could take a look and see whether that PR could be related to this issue.
#27327 typically triggers ~10m after startup and starts causing connectivity impact for traffic that is allowed by CIDR or ToFQDNs policy, so the policy you've pasted could be affected. It's mitigated by touching/updating that policy so that the agents pick it up again. It's a good sign that v1.14.1 is no longer exhibiting the problem. I'll close this for now, but if you do observe this behaviour or another issue in the future, feel free to comment so we can reopen this one, or file a new issue with the details.
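For anyone mitigating on v1.14.0 before upgrading, something along these lines forces the agents to re-process a policy (the policy name here is a placeholder):

```sh
# Any change to the CR counts as an update; bumping an annotation
# is a harmless way to make the agents pick the policy up again.
kubectl annotate ciliumclusterwidenetworkpolicy allow-imds-egress \
  touched-at="$(date +%s)" --overwrite
```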
We are experiencing the same issue with Cilium 1.14.1, though we are using chaining mode with AWS ENI. The clusterwide policy looks as follows:
The CiliumEndpoint looks as follows:
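(Pulled with something like the following; pod name and namespace are placeholders:)

```sh
# CiliumEndpoint objects are named after their pod and show the
# agent's view of identity, IPs, and policy enforcement state.
kubectl get ciliumendpoint <pod-name> -n <namespace> -o yaml
```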
The traffic going to … still gets dropped. Should I open a new issue, or will we use this one?
@opaetzel I'd suggest opening a fresh issue. At a glance, that might be more of a problem where the IP's scope (in-cluster vs. external) is being considered in a way that excludes it from CIDR policy.
Is there an existing issue for this?
What happened?
After both a fresh installation of Cilium 1.14.0 and an upgrade from 1.13.2, some pods in some deployments are blocked when trying to connect to the IMDS IP (169.254.169.254:80).
Some deployments with several replicas run fine, while in other deployments only one replica is able to connect to the IMDS and the rest of the replicas fail to connect.
Pods are evenly distributed across several nodes, and there is no apparent pattern to which nodes the working or failing pods land on. A node can simultaneously host pods that connect fine and pods that fail to reach the IMDS (both within the same deployment and across deployments).
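A quick per-pod check against the standard Azure IMDS endpoint looks roughly like this (the pod name is a placeholder; Azure IMDS requires the Metadata header):

```sh
# Returns a JSON instance document when IMDS is reachable;
# times out from the affected replicas.
kubectl exec -it <pod-name> -- \
  curl -s --max-time 5 -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
```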
We have a clusterwide policy in place which allows connections to the IMDS:
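For illustration, a minimal CiliumClusterwideNetworkPolicy allowing IMDS egress might look like the sketch below (the name and the catch-all endpointSelector are placeholders, not necessarily the exact policy in use):

```sh
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-imds-egress    # hypothetical name
spec:
  endpointSelector: {}       # selects all endpoints; a real policy would narrow this
  egress:
    - toCIDR:
        - 169.254.169.254/32
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
EOF
```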
We only use allow-policies, no deny-policies.
The following shows a deployment where two of three replicas are running OK and the third is failing (the failing replica is on the same node as one of the running ones):
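(Pod-to-node placement can be compared with something like the following; the label selector is a placeholder:)

```sh
# -o wide adds the NODE column so replicas can be matched to nodes.
kubectl get pods -o wide -l app=<deployment-name>
```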
The following is the output of hubble observe that shows the relevant traffic (different deployment, same issue):
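(Captured with a filter along these lines:)

```sh
# Show only dropped flows destined for the IMDS address.
hubble observe --verdict DROPPED --to-ip 169.254.169.254 --follow
```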
We have tried restarting all Cilium DaemonSets/Deployments to no avail. We have reproduced the issue in different AKS clusters.
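The restarts were along these lines (assuming the default kube-system install):

```sh
# Restart the Cilium agent DaemonSet and the operator Deployment.
kubectl -n kube-system rollout restart daemonset/cilium
kubectl -n kube-system rollout restart deployment/cilium-operator
```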
Cilium Version
1.14.0 - AKS BYO CNI
Kernel Version
5.15.0-1042-azure
(aks image version AKSUbuntu-2204gen2containerd-202307.27.0)
Kubernetes Version
1.27.3
Sysdump
Relevant log output
No response
Anything else?
No response
Code of Conduct