Node clusterpool Endpoint CIDRs used in AWS ENI mode erroneously #22273

Closed · jkulesa opened this issue Nov 19, 2022 · 11 comments
Labels
area/eni: Impacts ENI based IPAM.
kind/bug: This is a bug in the Cilium logic.
kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
needs/triage: This issue requires triaging to establish severity and next steps.
sig/agent: Cilium agent related.
sig/ipam: IP address management, including cloud IPAM.
stale: The stale bot thinks this issue is old. Add the "pinned" label to prevent this from becoming stale.

Comments

jkulesa commented Nov 19, 2022

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When using Cilium 1.12.4 in native AWS ENI mode (not chaining), if you log in to a Cilium pod you can see the following:

$ cilium node list
Name                                           IPv4 Address     Endpoint CIDR   IPv6 Address   Endpoint CIDR
ip-10-196-160-103.us-west-2.compute.internal   10.196.160.103   10.103.0.0/16
ip-10-196-161-128.us-west-2.compute.internal   10.196.161.128   10.128.0.0/16

That endpoint CIDR is interesting, since it's not any range we've specified (10.196.x.x or 10.10.x.x). That range looks like it comes from here:

DefaultIPv4Prefix = "10.%d.0.1"

But since we are using ENI mode, there are no cluster pools to configure, nor can they be set via the ConfigMap (or Helm).
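The pattern is visible in the output above: the last octet of each node's IPv4 address shows up as the second octet of the fabricated Endpoint CIDR (10.196.160.103 becomes 10.103.0.0/16). A minimal Go sketch of that observed mapping (my reconstruction from the output, not the actual Cilium code path):

package main

import (
    "fmt"
    "net"
)

// Mirrors the constant quoted above; filling %d with the node IP's last
// octet is an assumption based on the `cilium node list` output.
const DefaultIPv4Prefix = "10.%d.0.1"

func fabricatedEndpointCIDR(nodeIP net.IP) string {
    v4 := nodeIP.To4()
    addr := fmt.Sprintf(DefaultIPv4Prefix, v4[3])
    // The agent then appears to use a /16 allocation range around this address.
    _, cidr, _ := net.ParseCIDR(addr + "/16")
    return cidr.String()
}

func main() {
    // 10.196.160.103 -> 10.103.0.0/16, matching the node list above.
    fmt.Println(fabricatedEndpointCIDR(net.ParseIP("10.196.160.103")))
}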

The problem is that this node data is used in various places. One of them is iptables masquerading:

node.GetIPv4AllocRange().String(),

...and so this inserts ranges into the masquerade iptables rules that don't route or participate in any way. That would make sense when using cluster pools, but not in ENI mode (BPF masquerade mode seems fine in this case).

Another place is here:

if routeSpec, err := installDirectRoute(cidr, newIP); err != nil {

In this case it doesn't break anything, but it tries to install routes that shouldn't exist, with those 10.x.0.0/16 ranges as the destination:

level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.194.0.0/16 Src: <nil> Gw: 10.196.164.194 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.164.194 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.103.0.0/16 Src: <nil> Gw: 10.196.160.103 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.160.103 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.128.0.0/16 Src: <nil> Gw: 10.196.161.128 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.161.128 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.216.0.0/16 Src: <nil> Gw: 10.196.165.216 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.165.216 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
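For reference, the failing install can be reproduced outside Cilium with a small standalone program. This sketch assumes the vishvananda/netlink library and needs CAP_NET_ADMIN to actually reach the kernel:

package main

import (
    "fmt"
    "net"

    "github.com/vishvananda/netlink"
)

func main() {
    // Dst is the fabricated Endpoint CIDR; Gw is the peer node's IP. On the
    // affected nodes the gateway is itself only reachable via another hop
    // (10.196.162.1), i.e. not on-link, so the install fails; Cilium's own
    // reachability check produces the warnings quoted above.
    _, dst, _ := net.ParseCIDR("10.103.0.0/16")
    route := &netlink.Route{
        Dst: dst,
        Gw:  net.ParseIP("10.196.160.103"),
    }
    if err := netlink.RouteReplace(route); err != nil {
        fmt.Println("route install failed:", err)
    }
}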

There might be other code locations that I didn't find.

It seems there is a notable mismatch between the node IPAM pool of IP addresses in ClusterPool mode and ENI mode (where clusterpool CIDR addresses are still assigned to this node data structure). So either these particular code paths need to be updated to handle the AWS ENI use case, or the node CIDR should be fed something that will "just work".

I'd propose that in AWS ENI mode, the node Endpoint CIDR should be set to each node's podCIDR (IPv4 and IPv6) instead of these made-up 10.x.0.0 ranges that end up having odd side effects in random places.

In our case I'd expect to see the following, and if so, I'd expect iptables masquerading to work (and those route errors to go away):

$ cilium node list
Name                                           IPv4 Address     Endpoint CIDR   IPv6 Address   Endpoint CIDR
ip-10-196-160-103.us-west-2.compute.internal   10.196.160.103   10.10.0.0/16
ip-10-196-161-128.us-west-2.compute.internal   10.196.161.128   10.10.0.0/16

Cilium Version

1.12.4 (and 1.12.3)

Kernel Version

5.15.0-1022-aws

Kubernetes Version

1.22.15

Sysdump

cilium-sysdump-20221118-221157.zip

Used --quick, otherwise the sysdump wouldn't fit in 25 MB.

Relevant log output

level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.194.0.0/16 Src: <nil> Gw: 10.196.164.194 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.164.194 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.103.0.0/16 Src: <nil> Gw: 10.196.160.103 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.160.103 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.128.0.0/16 Src: <nil> Gw: 10.196.161.128 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.161.128 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath
level=warning msg="Unable to install direct node route {Ifindex: 0 Dst: 10.216.0.0/16 Src: <nil> Gw: 10.196.165.216 Flags: [] Table: 0 Realm: 0}" error="route to destination 10.196.165.216 contains gateway 10.196.162.1, must be directly reachable" subsys=linux-datapath

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
jkulesa added the kind/bug, kind/community-report, and needs/triage labels Nov 19, 2022
jkulesa changed the title from "Cilium Endpoint CIDRs used in AWS ENI mode when ipv4NativeRoutingCIDR should be used instead" to "Cilium Endpoint CIDRs used in AWS ENI mode when podCIDR should be used instead" Nov 19, 2022
jkulesa changed the title from "Cilium Endpoint CIDRs used in AWS ENI mode when podCIDR should be used instead" to "Node Endpoint CIDRs used in AWS ENI mode when podCIDR should be used instead" Nov 19, 2022
jkulesa changed the title from "Node Endpoint CIDRs used in AWS ENI mode when podCIDR should be used instead" to "Node clusterpool Endpoint CIDRs used in AWS ENI mode erroneously" Nov 19, 2022
christarazi added the area/eni, sig/ipam, and sig/agent labels Nov 21, 2022
christarazi (Member) commented:

Is this causing connectivity issues in your environment, or is it just the warning log messages?

jkulesa (Author) commented Nov 21, 2022

The initial reason to start poking around was that IPv4 (iptables) masquerading wasn't working; the odd routes being attempted were then noticed while looking through the logs (caused by autoDirectNodeRoutes). They happen not to align with any legitimate routes for now, but I don't fully understand what will happen when they collide with ranges of the 10.0.0.0/8 space that we do use. The main direct breakage we've detected is that ENI mode* with iptables masquerading seems broken, because the NAT rules inserted reference these node ranges, which are entirely fictional.

As best we can tell so far, BPF masquerading does seem to work.

*ENI mode for us also means using separate pod and host subnets (same VPC), along with prefix delegation.

jkulesa (Author) commented Nov 21, 2022

Oh, and here is an example of the broken NAT iptables chain:

Chain CILIUM_POST_nat (1 references)
target     prot opt source               destination         
ACCEPT     all  --  10.243.0.0/16        0.0.0.0/0            match-set cilium_node_set_v4 dst /* exclude traffic to cluster nodes from masquerade */
MASQUERADE  all  --  10.243.0.0/16       !10.196.0.0/16        /* cilium masquerade non-cluster */
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            mark match 0xa00/0xe00 /* exclude proxy return traffic from masquerade */
SNAT       all  --  127.0.0.1            0.0.0.0/0            /* cilium host->cluster from 127.0.0.1 masquerade */ to:10.10.78.66

You can see the 10.243.0.0/16 source address, which is neither our host network (10.196.0.0/16) nor our pod network (10.10.0.0/16), but instead appears to be a vestigial bit of the cluster pool endpoint CIDR range code.

$ cilium node list
Name                                           IPv4 Address     Endpoint CIDR   IPv6 Address   Endpoint CIDR
...        
ip-10-196-162-243.us-west-2.compute.internal   10.196.162.243   10.243.0.0/16                  
...

github-actions bot commented:

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

github-actions bot added the stale label Jan 21, 2023
christarazi removed the stale label Jan 23, 2023
github-actions bot added the stale label Mar 25, 2023
christarazi removed the stale label Mar 27, 2023
github-actions bot added the stale label May 27, 2023
aanm (Member) commented May 30, 2023

Related to #9409

christarazi removed the stale label May 30, 2023
mathpl assigned and unassigned learnitall May 30, 2023
gandro (Member) commented Jun 21, 2023

Quick triage on this:

Overall, I agree that we suffer from the assumption that there is a single pod CIDR, which of course does not hold in AWS ENI mode.

However, we also deliberately don't support certain features in ENI mode that rely on the "there is a pod CIDR" assumption, namely:

  • installDirectRoute should not be an issue here. It is only used with autoDirectNodeRoutes=true, which we never supported in ENI mode and which is also not needed (since the EC2 fabric does the routing for us).
  • With iptables-based masquerading, our documentation requires egressMasqueradeInterfaces to be set. This will make the code not use the pod CIDR for the cilium masquerade non-cluster rule:

if option.Config.EgressMasqueradeInterfaces != "" {
    progArgs = append(
        progArgs,
        "-o", option.Config.EgressMasqueradeInterfaces)
} else {
    progArgs = append(
        progArgs,
        "-s", allocRange,
        "!", "-o", "cilium_+")
}

  • But this is poorly documented at the moment, and I think we should make that limitation clear in the ENI concepts page as well. However, it seems that the "exclude traffic to cluster nodes from masquerade" rule that we install is actually buggy, since it unconditionally uses the pod CIDR to create the masquerading rule. We should probably respect egressMasqueradeInterfaces there as well:

if err := prog.runProg([]string{
    "-t", "nat",
    "-A", ciliumPostNatChain,
    "-s", allocRange,
    "-m", "set", "--match-set", prog.getIpset(), "dst",
    "-m", "comment", "--comment", "exclude traffic to cluster nodes from masquerade",
    "-j", "ACCEPT"}); err != nil {
    return err
}
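
For concreteness, here is a sketch of what respecting egressMasqueradeInterfaces there could look like, reusing the names from the snippets above (a hypothetical restructuring, not the actual patch):

// Build the node-set exclusion rule with the same branch as the
// masquerade rule: match on the egress device when configured,
// otherwise fall back to the pod CIDR.
excludeArgs := []string{"-t", "nat", "-A", ciliumPostNatChain}
if option.Config.EgressMasqueradeInterfaces != "" {
    // Match on the egress device instead of a possibly fictional pod CIDR.
    excludeArgs = append(excludeArgs,
        "-o", option.Config.EgressMasqueradeInterfaces)
} else {
    excludeArgs = append(excludeArgs, "-s", allocRange)
}
excludeArgs = append(excludeArgs,
    "-m", "set", "--match-set", prog.getIpset(), "dst",
    "-m", "comment", "--comment", "exclude traffic to cluster nodes from masquerade",
    "-j", "ACCEPT")
if err := prog.runProg(excludeArgs); err != nil {
    return err
}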

gandro added a commit to gandro/cilium that referenced this issue Jun 21, 2023
By default, in the iptables-based masquerading mode, Cilium will only
masquerade traffic coming from the local pod CIDR (`allocRange` in
`installMasqueradeRules`). However, many IPAM modes such as ENI or
multi-pool IPAM do not have a single pod CIDR. Instead, those modes rely
on the `egress-masquerade-interfaces` setting, which masquerades all
traffic if it leaves one of the `egress-masquerade-interfaces` devices.

Therefore, the "exclude traffic to cluster nodes from masquerade"
`CILIUM_POST_nat` rule should also respect the
`egress-masquerade-interfaces` setting and not masquerade traffic
regardless of the value of `allocRange` (which will not be valid in
settings such as ENI mode).

This likely has not manifested in ENI mode as an issue, because in ENI
mode we derive the native routing CIDR (`snatDstExclusionCIDR` in
`installMasqueradeRules`) from the EC2 VPC CIDR, which usually contains
the node IPs too. However, we should not rely on that, since we are
adding additional non-podCIDR based IPAM modes such as multi-pool where
this will not be true.

Related: cilium#22273

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
borkmann pushed a commit that referenced this issue Jun 27, 2023
github-actions bot added the stale label Aug 21, 2023
christarazi removed the stale label Aug 21, 2023
github-actions bot added the stale label Oct 21, 2023
github-actions bot commented Nov 4, 2023

This issue has not seen any activity since it was marked stale.
Closing.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Nov 4, 2023