
ipam: Fix invalid PodCIDR in CiliumNode in ENI/Azure/MultiPool mode #26663

Merged
merged 2 commits into cilium:main from pr/gandro/fix-cilium-node-pod-cidr on Aug 29, 2023

Conversation

@gandro (Member) commented Jul 6, 2023

This commit fixes an issue where Cilium-Agent would announce its internal auto-generated IPv4/IPv6AllocRange by writing it into the local CiliumNode resource.

Cilium-Agent's IPv4/IPv6AllocRange is Cilium's internal representation of the local node's PodCIDR. The concept of a single allocation CIDR, however, is outdated: in many IPAM modes (ENI, Azure, AlibabaCloud, MultiPool) there either is no PodCIDR to begin with (and pod IPs are allocated using a different scheme), or there are multiple PodCIDRs (in the case of MultiPool).

Before this commit, cilium-agent used to copy its internal PodCIDR into the CiliumNode resource, but this behavior is only correct in Kubernetes IPAM mode. In Kubernetes IPAM mode, the IPv4/IPv6AllocRange is taken from the Kubernetes Node resource, and thus it makes sense to announce it to other cilium-agent instances via the CiliumNode resource. This field is then used, for example, by auto-direct-node-routes to install routes for direct routing mode.

However, in all other IPAM modes that we have, copying the internal PodCIDR into the CiliumNode resource does not make sense:

  • In ClusterPool mode, the CiliumNode PodCIDR field is populated by cilium-operator. Therefore, we should not try to overwrite it ourselves. This behavior is unchanged by this commit.
  • In all other current IPAM modes, the CiliumNode PodCIDR field is not used. Instead we use fields specific to those IPAM modes (c.f. CiliumNode.Spec.IPAM). In those modes, the previous code therefore wrote an auto-generated but otherwise meaningless IPv4/IPv6AllocRange into the field. That auto-generated PodCIDR is not used for pod IP allocation in those modes, so it does not make sense to announce it either.

The announced auto-generated PodCIDR was causing issues in IPAM MultiPool mode, where auto-direct-node-routes installed routes for it. With this commit, we will no longer install invalid routes for the auto-generated PodCIDR. This was mostly harmless (since the auto-generated PodCIDR rarely overlapped with the real ones), but it was still a potential cause of confusion and bugs.

In addition, ENI-mode users have reported that they are confused by this "fake" PodCIDR, which is understandable, since that PodCIDR is meaningless in ENI mode (#9409).

Long term, we want to remove the concept of the IPv4/IPv6AllocCIDR completely (and remove the auto-generation code), but this is a first incremental step towards that goal.
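
To make the resulting behavior concrete, here is a minimal, hypothetical Go sketch of the decision described above (the helper name is illustrative and not part of the actual patch; the mode strings are assumed to match the --ipam configuration values):

    // Hedged sketch: shouldAnnounceAllocCIDRs is a hypothetical helper,
    // not the code introduced by this PR.
    func shouldAnnounceAllocCIDRs(ipamMode string) bool {
        switch ipamMode {
        case "kubernetes":
            // The PodCIDR comes from the Kubernetes Node resource, so announcing
            // it via CiliumNode is meaningful (e.g. for auto-direct-node-routes).
            return true
        case "cluster-pool":
            // cilium-operator owns the CiliumNode PodCIDR field; the agent must
            // not overwrite it (unchanged by this commit).
            return false
        default:
            // ENI, Azure, AlibabaCloud, MultiPool: the field is unused, so the
            // auto-generated AllocRange is no longer written into it.
            return false
        }
    }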

Fixes: #9409

Fix potential cross-node connectivity issue when IPsec is enabled with ENI or Azure IPAM modes.

@gandro added the release-note/bug (This PR fixes an issue in a previous release of Cilium.), sig/ipam (IP address management, including cloud IPAM), area/ipam (Impacts IP address management functionality.) and needs-backport/1.14 (This PR / issue needs backporting to the v1.14 branch) labels on Jul 6, 2023
@gandro requested a review from a team as a code owner on July 6, 2023 09:09
@gandro requested a review from thorn3r on July 6, 2023 09:09
@gandro force-pushed the pr/gandro/fix-cilium-node-pod-cidr branch from 9514070 to c60bf41 on July 6, 2023 09:13
@gandro (Member Author) commented Jul 6, 2023

/test

@gandro (Member Author) commented Jul 6, 2023

/ci-aks

@gandro mentioned this pull request on Jul 5, 2023
@gandro (Member Author) commented Jul 6, 2023

ci-eks keeps suspiciously failing with the same error. Moving back to draft until this has been sorted out.

The sysdump does not reveal much so far. What is interesting is that I don't see the failing flows in Hubble. In both cases, Cilium failed with IPsec enabled, and I do see xfrm errors (though these could be either cause or symptom):

  [118] The following Agents have xfrm errors:
	- kube-system/cilium-7bv5m (policy_blocked [outbound] 6.00)
	- kube-system/cilium-xcnsz (policy_blocked [outbound] 3.00)
        IpSec xfrm errors detected.
        CodeXfrmErrors

@gandro marked this pull request as draft on July 6, 2023 15:49
@christarazi added the area/daemon (Impacts operation of the Cilium daemon.) label on Jul 6, 2023
@thorn3r (Contributor) left a comment

thanks for the detailed commit msg, looks great 👍

@gandro (Member Author) commented Jul 10, 2023

The IPsec failures seem legitimate. It seems we do not set up IPsec if the node does not have a PodCIDR, even in ENI mode:

if !n.nodeConfig.EnableIPv4 || newNode.IPv4AllocCIDR == nil {
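
For reference, a simplified, hypothetical sketch of that guard (identifiers are illustrative, not the actual Cilium function): per-node IPsec setup bails out when the node has no IPv4 allocation CIDR, which, with this PR, is always the case for ENI/Azure nodes, so no IPsec state would be installed for them.

    // Hedged sketch of the guard quoted above; names are illustrative.
    // Assumes: import "net"
    func enableIPsecForNode(enableIPv4 bool, ipv4AllocCIDR *net.IPNet) bool {
        // Without an IPv4 alloc CIDR (e.g. ENI/Azure nodes once the fake PodCIDR
        // is no longer announced), IPsec setup for this node is skipped entirely.
        if !enableIPv4 || ipv4AllocCIDR == nil {
            return false
        }
        // ...otherwise XFRM policies/states would be installed for the CIDR...
        return true
    }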

@gandro force-pushed the pr/gandro/fix-cilium-node-pod-cidr branch from c60bf41 to 95b3512 on July 10, 2023 09:57
@gandro (Member Author) commented Jul 10, 2023

/test

@mathpl commented Jul 17, 2023

Does this relate to #22273 @gandro?

@gandro (Member Author) commented Jul 17, 2023

> Does this relate to #22273 @gandro?

Yes, that issue too (we have too many issues caused by the invalid PodCIDRs 😞)

@gandro (Member Author) commented Jul 17, 2023

This is currently in draft because it affects the IPsec code base, which is also undergoing a refactor. Paul asked me to hold off on this change until the dust from the IPsec refactor has settled.

@gandro added the dont-merge/blocked (Another PR must be merged before this one.) label on Jul 17, 2023
@maintainer-s-little-helper bot added this to Needs backport from main in 1.14.0 on Jul 17, 2023
@aanm added this to Needs backport from main in 1.14.1 on Jul 26, 2023
@aanm removed this from Needs backport from main in 1.14.0 on Jul 26, 2023
@nebril added this to Needs backport from main in 1.14.2 on Aug 10, 2023
@nebril removed this from Needs backport from main in 1.14.1 on Aug 10, 2023
@pchaigno (Member) commented:

> So I suspect this is just a flake where DNS somehow timed out, we've seen apparently similar flakes before: #26321

I agree it's unlikely caused by this PR. Nevertheless, the fact that it happens in both cases in the key rotation section which we've just added makes me wonder if it's not a new flake caused by the key rotation changes. Might be worth opening a flake issue, just so we can track it.

@gandro added the needs-backport/1.14 (This PR / issue needs backporting to the v1.14 branch) label on Aug 29, 2023
@maintainer-s-little-helper bot added this to Needs backport from main in 1.14.2 on Aug 29, 2023
@gandro (Member Author) commented Aug 29, 2023

The IPsec e2e test failed again, though this time with a different symptom. I'll investigate.

@gandro (Member Author) commented Aug 29, 2023

> So I suspect this is just a flake where DNS somehow timed out, we've seen apparently similar flakes before: #26321
>
> I agree it's unlikely caused by this PR. Nevertheless, the fact that it happens in both cases in the key rotation section which we've just added makes me wonder if it's not a new flake caused by the key rotation changes. Might be worth opening a flake issue, just so we can track it.

I've created an issue. Pretty sure it's unrelated since I also found scheduled runs with the same error. #27799


Triage of new failures:

Setup & Test (1, 4.19-20230810.091425, iptables, false, vxlan, ipsec, false)

   [.] Action [echo-ingress-knp/pod-to-pod/curl-ipv6-3: cilium-test/client2-646b88fb9b-g945g (fd00:10:244:3::9397) -> cilium-test/echo-same-node-59dc776cf5-qhn8t (fd00:10:244:3::7973:8080)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 http://[fd00:10:244:3::7973]:8080" failed: command terminated with exit code 28

There are some RSTs coming from-host with the host identity, which is weird:

Aug 29 11:49:03.669: cilium-test/client2-646b88fb9b-g945g:49480 (ID:4992) <> cilium-test/echo-same-node-59dc776cf5-qhn8t:8080 (ID:41938) from-endpoint FORWARDED (TCP Flags: SYN)
Aug 29 11:49:03.669: cilium-test/client2-646b88fb9b-g945g:49480 (ID:4992) -> cilium-test/echo-same-node-59dc776cf5-qhn8t:8080 (ID:41938) policy-verdict:L3-Only INGRESS ALLOWED (TCP Flags: SYN)
Aug 29 11:49:03.669: cilium-test/client2-646b88fb9b-g945g:49480 (ID:4992) -> cilium-test/echo-same-node-59dc776cf5-qhn8t:8080 (ID:41938) to-endpoint FORWARDED (TCP Flags: SYN)
Aug 29 11:49:03.669: cilium-test/echo-same-node-59dc776cf5-qhn8t:8080 (ID:41938) <> cilium-test/client2-646b88fb9b-g945g:49480 (ID:4992) from-endpoint FORWARDED (TCP Flags: SYN, ACK)
Aug 29 11:49:03.669: cilium-test/client2-646b88fb9b-g945g:49480 (host) <> cilium-test/echo-same-node-59dc776cf5-qhn8t:8080 (ID:41938) from-host FORWARDED (TCP Flags: RST)
Aug 29 11:49:03.669: cilium-test/client2-646b88fb9b-g945g:49480 (ID:4992) -> cilium-test/echo-same-node-59dc776cf5-qhn8t:8080 (ID:41938) to-endpoint FORWARDED (TCP Flags: RST)

Edit: This looks very much like #27759 (comment) which also has some "inexplicable RSTs"


Setup & Test (2, 5.4-20230810.091425, iptables, false, disabled, ipsec, false)

[.] Action [echo-ingress-mutual-auth-spiffe/pod-to-pod/curl-ipv4-2: cilium-test/client2-646b88fb9b-vgmc9 (10.244.2.188) -> cilium-test/echo-other-node-f884df9df-cf7mg (10.244.1.142:8080)]
  ❌ command "curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 http://10.244.1.142:8080/" failed: command terminated with exit code 28

There are no RSTs in this trace (at least not related to IP 10.244.2.188), but I see that the connection eventually succeeds, so I'm not sure why the test failed:

Aug 29 11:35:27.757: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) to-host FORWARDED (TCP Flags: SYN)
Aug 29 11:35:27.757: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:9) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) from-host FORWARDED (TCP Flags: SYN)
Aug 29 11:35:27.757: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) policy-verdict:L3-L4 INGRESS DENIED (TCP Flags: SYN; Auth: SPIRE)
Aug 29 11:35:27.757: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) Authentication required DROPPED (TCP Flags: SYN)
Aug 29 11:35:28.784: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) from-stack FORWARDED (TCP Flags: SYN)
Aug 29 11:35:28.784: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) to-host FORWARDED (TCP Flags: SYN)
Aug 29 11:35:28.784: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:9) <> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) from-host FORWARDED (TCP Flags: SYN)
Aug 29 11:35:28.784: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) -> cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) to-endpoint FORWARDED (TCP Flags: SYN)
Aug 29 11:35:28.784: cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) <> cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) from-endpoint FORWARDED (TCP Flags: SYN, ACK)
Aug 29 11:35:28.784: cilium-test/client2-646b88fb9b-vgmc9:33116 (ID:22938) <- cilium-test/echo-other-node-f884df9df-cf7mg:8080 (ID:10042) to-stack FORWARDED (TCP Flags: SYN, ACK)

Edit: This looks like #26351 (comment)

@gandro (Member Author) commented Aug 29, 2023

Given that I don't see anything wrong with the routes, and both symptoms have been observed before (see links), I'll restart again. I don't know how else to troubleshoot those failures further.

@gandro (Member Author) commented Aug 29, 2023

/ci-ipsec-e2e

@maintainer-s-little-helper bot added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on Aug 29, 2023
@aditighag merged commit 1568558 into cilium:main on Aug 29, 2023
60 checks passed
@jibi mentioned this pull request on Sep 4, 2023
@gandro added the backport/author (The backport will be carried out by the author of the PR.) label on Sep 4, 2023
@gandro added the backport-pending/1.14 (The backport for Cilium 1.14.x for this PR is in progress.) label and removed the needs-backport/1.14 (This PR / issue needs backporting to the v1.14 branch) label on Sep 4, 2023
@maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.2 on Sep 4, 2023
@michi-covalent added the backport-done/1.14 (The backport for Cilium 1.14.x for this PR is done.) label and removed the backport-pending/1.14 (The backport for Cilium 1.14.x for this PR is in progress.) label on Sep 9, 2023
@maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.2 on Sep 9, 2023
@pchaigno added the backport-pending/1.13 (The backport for Cilium 1.13.x for this PR is in progress.) and needs-backport/1.12 labels on Jan 9, 2024
@giorio94 added the backport-done/1.12 (The backport for Cilium 1.12.x for this PR is done.) and backport-done/1.13 (The backport for Cilium 1.13.x for this PR is done.) labels and removed the backport-pending/1.12 and backport-pending/1.13 (The backport for Cilium 1.13.x for this PR is in progress.) labels on Jan 29, 2024
Development

Successfully merging this pull request may close these issues.

ENI: Do not fill ciliumnodes with allocated podCIDRs in ENI mode
9 participants