datapath: Fix ENI egress routing table for cilium_host IP #29335
/ci-eks
This commit extends the `Configure` method of `RoutingInfo` with a flag to skip the creation of the ingress rule. The ingress rule is needed for endpoints so that traffic to them is forwarded via the `main` routing table. But for the `cilium_host` (aka router) IP, we want to route it via the `local` table (which would be skipped by the ingress rule). Without a lookup in the `local` routing table, Linux will not consider `cilium_host` to be an address of the local host, and will, for example, not respond to ICMP requests.
Note that this commit does not yet use `RoutingInfo.Configure` to set up the `cilium_host` IP; this will be done in the next commit. This commit merely prepares the method for that and does not contain any functional changes by itself (which can be observed by the fact that all callers pass in `host=false`).
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
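To illustrate the effect of the `host` flag described above, here is a minimal sketch in Go. It is not the actual `RoutingInfo.Configure` implementation: the `Rule` type, the `configureRules` helper, the table names, the IP addresses and the priority values are all assumptions made up for this example. The only property it tries to show is that `host=true` drops the ingress rule while the source-based egress rule is still created.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for an `ip rule` entry.
type Rule struct {
	Priority int
	From     string // source selector, empty if unused
	To       string // destination selector, empty if unused
	Table    string // routing table that is consulted on a match
}

// Illustrative priorities (assumed values). The sketch only relies on the
// ingress rule being evaluated before the egress rule.
const (
	ingressPriority = 20
	egressPriority  = 110
)

// configureRules returns the rules a Configure-like helper would install
// for ip. eniTable names the per-ENI routing table (hypothetical naming).
// With host=true the ingress rule is skipped, so lookups for ip are not
// diverted to the main table and can still be answered by the local table.
func configureRules(ip, eniTable string, host bool) []Rule {
	var rules []Rule
	if !host {
		// Ingress rule: traffic *to* the endpoint IP is looked up in main.
		rules = append(rules, Rule{Priority: ingressPriority, To: ip + "/32", Table: "main"})
	}
	// Egress rule: traffic *from* the IP is looked up in the per-ENI table.
	rules = append(rules, Rule{Priority: egressPriority, From: ip + "/32", Table: eniTable})
	return rules
}

func main() {
	fmt.Println("endpoint (host=false):")
	for _, r := range configureRules("10.0.1.23", "eni-table-11", false) {
		fmt.Printf("  %+v\n", r)
	}
	fmt.Println("cilium_host (host=true):")
	for _, r := range configureRules("10.0.0.5", "eni-table-10", true) {
		fmt.Printf("  %+v\n", r)
	}
}
```

In this model an endpoint gets two rules (ingress and egress), while the `cilium_host` IP only gets the egress rule.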
On ENI, we install source-based egress routing rules that steer traffic from Cilium-managed IPs (i.e. pods, but also the health IP, ingress IP and router IP) to the correct egress interface.
For pod IPs, this is done in the CNI plugin: https://github.com/cilium/cilium/blob/7875f6acb5a2fd2b0e3e6c993c9995c0d322e55d/plugins/cilium-cni/interface.go#L59-L63
For ingress and health IP, this is done from cilium-agent: https://github.com/cilium/cilium/blob/ed20c8acde8c76d405d6c9fac3c9de44aa3bb403/daemon/cmd/ipam.go#L401-L405 https://github.com/cilium/cilium/blob/e49430286b5d63b00062758a10a2b37458f94525/cilium-health/launch/endpoint.go#L329-L333
For the `cilium_host` (aka router) IP however, this was done differently. Commit f34371c added a new `routing.SetupRules` function that duplicated parts of the `routing.RoutingInfo.Configure` logic, but missed a crucial part: namely the creation of the per-ENI routing table that the source-based egress rule points towards.
This means that if the `cilium_host` IP address was allocated from a different ENI than the pod, health and ingress IP addresses, the routing table for that ENI was never created. This led to connectivity issues, in particular in combination with IPsec.
This commit addresses that issue by having the `cilium_host` IP use the same code path as the other IP users: using `RoutingInfo.Configure`. This not only fixes the bug, but removes some code that was otherwise only used for the router IP.
There is one major difference between other users of `RoutingInfo.Configure` and the newly introduced use for the `cilium_host` IP: for the `cilium_host` IP, we skip the creation of the ingress rule (by passing in `host=true`), as otherwise the `cilium_host` IP would not be considered a local address of the host network namespace. This is consistent with the old `SetupRules` function, which also did not create such an ingress rule.
Long-term, it remains questionable if the setup of egress rules in ENI mode should be left to IPAM clients, as every client seems to do it slightly differently. Maybe this is better done by either the IPAM subsystem or a separate device manager.
Fixes: f34371c ("ipam: Add routes for cilium_host ENI address")
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
954eebc to f4b9743
I have locally tested this by installing Cilium with:
And then adding a node to the cluster which does not have any pods scheduled on it (by scaling the node group). Since the Before (running
After (upgrade):
/test
Net negative LOC 🚀
LGTM! 🚀
Appreciate the fix, and the detailed commit description. 🚀
My first thought after reading the commit description was about the case(s) where users would hit such an error condition, as `cilium_host` is one of the core interfaces that the agent manages.
I see that the fix is already marked for backports to older branches. 👍 Could you comment on how we could identify such issues? Was there a specific error message logged when the agent tried installing rules when the routing table wasn't created? (I suppose users could notice missing routing table(s) by running `ip route` commands.)
On a related note, how can we prevent such cases in the future for Cilium-managed devices? Creating a routing table in specific error conditions while setting up ENI rules is a potential option.
The bug that surfaced was that IPsec-encrypted traffic was leaving on the wrong interface, and not on the egress interface the router IP was attached to. This led to traffic being dropped by the AWS network (which performs source IP validation). The wrong egress interface was expected, since the egress rule was defunct (without a routing table, it can't do anything). There was no log message, just broken IPsec encryption. Hence also the backport.
The main reason why we did not discover this broken egress rule earlier is that it only surfaces under the following circumstances: the router IP must be allocated from a different ENI than the pod (workload) IPs. As soon as there is at least one pod IP sharing the same ENI as the router IP, the CNI ADD call (that led to the pod IP being allocated) would create the missing table. Because IP allocation is random, the bug basically only surfaces if one has lots of pre-allocated IPs (e.g. using
Yeah, the idea of an ENI specific device manager has been floated around. I have an open TODO on my plate to write down my thoughts on this.
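As a rough illustration of how the defunct egress rule discussed in this thread could be spotted, the following Go sketch checks a set of source-based rules against the routes present in each table. It works on plain in-memory data and does not talk to netlink or use any Cilium API. The `EgressRule` type, the `danglingRules` helper, the table IDs and the IP addresses are hypothetical and exist only for this example.

```go
package main

import "fmt"

// EgressRule is a simplified model of a source-based rule: traffic from
// SrcIP is looked up in the routing table with ID TableID.
type EgressRule struct {
	SrcIP   string
	TableID int
}

// danglingRules returns, in input order, the rules whose target table has
// no routes. routesPerTable maps a table ID to the number of routes in it;
// a table that was never created is simply absent from the map.
func danglingRules(rules []EgressRule, routesPerTable map[int]int) []EgressRule {
	var out []EgressRule
	for _, r := range rules {
		if routesPerTable[r.TableID] == 0 {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Assumed scenario: both pods got their IPs from the ENI behind table
	// 11, so the CNI ADD calls populated that table. The router IP came
	// from a different ENI (table 10), for which no table was created.
	rules := []EgressRule{
		{SrcIP: "10.0.1.23", TableID: 11}, // pod
		{SrcIP: "10.0.1.24", TableID: 11}, // pod
		{SrcIP: "10.0.0.5", TableID: 10},  // router (cilium_host) IP
	}
	routesPerTable := map[int]int{11: 2}

	for _, r := range danglingRules(rules, routesPerTable) {
		fmt.Printf("rule from %s points at empty table %d\n", r.SrcIP, r.TableID)
	}
}
```

On a real node the inputs would have to come from the kernel's rule and route state. The sketch only captures the condition itself: an egress rule whose target table is empty cannot steer traffic anywhere.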
On ENI, we install source-based egress routing rules that steer traffic from Cilium-managed IPs (i.e. pods, but also the health IP, ingress IP and router IP) to the correct egress interface.
For pod IPs, this is done in the CNI plugin:
cilium/plugins/cilium-cni/interface.go
Lines 59 to 63 in 7875f6a
For ingress and health IP, this is done from cilium-agent:
cilium/daemon/cmd/ipam.go
Lines 401 to 405 in ed20c8a
cilium/cilium-health/launch/endpoint.go
Lines 329 to 333 in e494302
For the router (aka `cilium_host`) IP however, this was done differently. Commit f34371c added a new `routing.SetupRules` function that duplicated parts of the `routing.RoutingInfo.Configure` logic, but missed a crucial part: namely the creation of the per-ENI routing table that the source-based egress rule points towards.
This means that if the `cilium_host` IP address was allocated from a different ENI than the pod, health and ingress IP addresses, the routing table for that ENI was never created. This led to connectivity issues, in particular in combination with IPsec.
This commit addresses that issue by having the router IP use the same code path as the other IP users: using `RoutingInfo.Configure`. This not only fixes the bug, but removes some code that was otherwise only used for the router IP.
There is one major difference between other users of `RoutingInfo.Configure` and the newly introduced use for the `cilium_host` IP: for the `cilium_host` IP, we skip the creation of the ingress rule (by passing in `host=true`), as otherwise the `cilium_host` IP would not be considered a local address of the host network namespace. This is consistent with the old `SetupRules` function, which also did not create such an ingress rule.
Long-term, it remains questionable if the setup of egress rules in ENI mode should be left to IPAM clients, as every client seems to do it slightly differently. Maybe this is better done by either the IPAM subsystem or a separate device manager.
Fixes: f34371c ("ipam: Add routes for cilium_host ENI address")