Node clusterpool Endpoint CIDRs used in AWS ENI mode erroneously #22273
Is this causing connectivity issues in your environment, or is it just the warning log messages?
The initial reason to start poking around was that IPv4 masquerading (iptables) wasn't working, and then the odd routes being inserted were noticed when looking through the logs (caused by `auto-direct-node-routes`). They happen not to align with any legitimate routes for now, but I don't fully understand what will happen when they hit ranges of the 10.0.0.0/8 space that we do use. The main direct breakage we've detected is that ENI mode* with iptables masquerading seems broken, because the NAT rules inserted reference these node ranges, which are totally fictional. As best we can tell, BPF masquerading does seem to work.

*ENI mode for us also means using separate pod and host subnets (same VPC), along with prefix delegation.
Oh, and here is an example of the broken NAT iptables rule:
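(The rule itself didn't survive the copy here; below is a sketch of the shape such a rule takes, with the match options illustrative rather than verbatim from our cluster:)

```sh
# Real inspection command; the commented rule beneath it is an illustrative
# reconstruction of what we saw, not verbatim output:
iptables -t nat -S CILIUM_POST
# -A CILIUM_POST -s 10.243.0.0/16 ! -d <snat-exclusion-cidr> ! -o cilium_host
#     -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE
```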
You can see the 10.243.0.0/16 source address, which is neither our host network (10.196.0.0/16) nor the pod network (10.10.0.0/16), but instead appears to be a vestigial bit of the cluster-pool endpoint CIDR range code.
Related to #9409
Quick triage on this: overall I agree that we suffer from the assumption that there is a single pod CIDR, which of course does not hold in AWS ENI mode. However, we also notably don't support certain features in ENI mode that rely on the "there is a pod CIDR" assumption, namely:
cilium/pkg/datapath/iptables/iptables.go Lines 1228 to 1237 in 11828af
cilium/pkg/datapath/iptables/iptables.go Lines 1196 to 1203 in 11828af
By default, in the iptables-based masquerading mode, Cilium will only masquerade traffic coming from the local pod CIDR (`allocRange` in `installMasqueradeRules`). However, many IPAM modes such as ENI or multi-pool IPAM do not have a single pod CIDR. Instead, those modes rely on the `egress-masquerade-interfaces` setting, which masquerades all traffic if it leaves one of the `egress-masquerade-interfaces` devices.

Therefore, the "exclude traffic to cluster nodes from masquerade" `CILIUM_POST_nat` rule should also respect the `egress-masquerade-interfaces` setting and not masquerade traffic regardless of the value of `allocRange` (which will not be valid in settings such as ENI mode).

This likely has not manifested in ENI mode as an issue, because in ENI mode we derive the native routing CIDR (`snatDstExclusionCIDR` in `installMasqueradeRules`) from the EC2 VPC CIDR, which usually contains the node IPs too. However, we should not rely on that, since we are adding additional non-podCIDR-based IPAM modes such as multi-pool where this will not be true.

Related: #22273

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
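For context, `egress-masquerade-interfaces` is typically supplied through Helm; a minimal sketch (the device pattern here is an example, not taken from this issue):

```sh
# Example only: masquerade based on the egress device instead of a single
# pod CIDR; eth+ matches the ENI-attached interfaces in a typical AWS setup
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set egressMasqueradeInterfaces=eth+
```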
What happened?
When using Cilium 1.12.4 in native AWS ENI mode (not chaining), if you log in to a Cilium pod you can see the following:
That endpoint CIDR is interesting, since it's not any range we've specified (10.196.x.x or 10.10.x.x). That range looks like it's from here:
cilium/pkg/defaults/node.go
Line 14 in 1f34277
But since we are using ENI mode, there aren't cluster pools to set, nor can they even be set via the ConfigMap (or Helm).
The problem is that this node data is used in various places; one of them is iptables masquerading:
cilium/pkg/datapath/iptables/iptables.go
Line 1435 in 1f34277
...and so this inserts IPs into the masquerade iptables rules that don't route and don't participate in any way. This would make sense when using cluster pools, but not in ENI mode (BPF masquerade mode seems fine in this case).
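For reference, a minimal sketch of how we enable BPF masquerading, the mode that appears unaffected in our testing (assuming the standard Helm chart option):

```sh
# Sketch: switch masquerading from iptables to BPF, which sidesteps the
# iptables rules built from the bogus allocation CIDR
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set bpf.masquerade=true
```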
Another place is here:
cilium/pkg/datapath/linux/node.go
Line 282 in 2fe3a92
In this case it doesn't break anything, but it tries to generate routes that shouldn't be generated, with those 10.x.0.0/16 ranges as the destination:
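(The route output didn't make it into this copy; an illustrative sketch of the kind of route being attempted, with all addresses made up:)

```sh
# Illustrative only: with auto-direct-node-routes enabled, the agent tries
# to install a route toward each remote node's fictional allocation CIDR
# via that node's IP, roughly equivalent to:
ip route replace 10.245.0.0/16 via 10.196.12.34 dev eth0
```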
There might be other code locations that I didn't find.
It seems there is a notable mismatch between the node IPAM pool of IP addresses in ClusterPool mode and ENI mode (where cluster-pool CIDR addresses are still assigned to this node data structure). So either these particular code paths need to be modernized to handle the AWS ENI use case, or maybe just feeding the node CIDR the right value will make things "just work".

I'd propose that in AWS ENI mode, the node endpoint CIDR should be set to each node's podCIDR (IPv4 and IPv6) instead of these made-up 10.x.0.0 addresses that end up having odd side effects in random places.
In our case I'd expect to see our actual pod ranges there, and if so I'd expect iptables masquerading to work (and those route errors to go away).
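A quick way to check the per-node podCIDR I mean (assuming the cluster allocates node podCIDRs at all):

```sh
# List the Kubernetes-assigned podCIDR for each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
```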
Cilium Version
1.12.4 (and 1.12.3)
Kernel Version
5.15.0-1022-aws
Kubernetes Version
1.22.15
Sysdump
cilium-sysdump-20221118-221157.zip
Used `--quick`, otherwise it wouldn't fit in 25 MB.
Relevant log output
Anything else?
No response