Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws/eni: fix cilium operator crash on IPv6 ENI #22075

Merged

Conversation

bimmlerd
Copy link
Member

Cilium operator would crash when being brought up in an AWS region where there was a IPv6-only ENI and no subnet filters, because it would fail to parse the ENI (logs will show ENI has no IP address and Initial synchronization with instances API failed).

We work around this issue for the moment by filtering the network interfaces we fetch from AWS with 'private-ip-addresses=*', which includes all ENIs with any value in the PrivateIpAddress field. This is the field parseENI complains about otherwise.

In general, though, it seems that the ENI IPAM mode needs to learn to handle IPv6 ENIs. There are unsolved challenges (such as representing the IP pool as a list of addresses, which will not work for IPv6), however, which cannot be addressed quickly, so we work around it for now.

cc @gandro @chanieljdan @bmcustodio

Prevent cilium operator crash in AWS region with IPv6-only ENIs without subnet filters.

Cilium operator would crash when being brought up in an AWS region where
there was a IPv6-only ENI and no subnet filters, because it would fail
to parse the ENI (logs will show "ENI has no IP address" and "Initial
synchronization with instances API failed").

We work around this issue for the moment by filtering the network
interfaces we fetch from AWS with 'private-ip-addresses=*', which
includes all ENIs with any value in the PrivateIpAddress field. This is
the field `parseENI` complains about otherwise.

In general, though, it seems that the ENI IPAM mode needs to learn to
handle IPv6 ENIs. That will not be a small undertaking, so we fix the
obvious bug for now.

Co-authored-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/eni Impacts ENI based IPAM. sig/ipam IP address management, including cloud IPAM labels Nov 10, 2022
@bimmlerd bimmlerd marked this pull request as ready for review November 10, 2022 09:23
@bimmlerd bimmlerd requested a review from a team as a code owner November 10, 2022 09:23
@bimmlerd
Copy link
Member Author

Should I also create a ticket for this to make it more discoverable?

Also, do we want to backport this?

@gandro
Copy link
Member

gandro commented Nov 10, 2022

I don't think we necessarily need an issue here, the PR already describes it well enough.

Let's backport this to the latest stable, as it is a regular bug fix.

@gandro gandro added needs-backport/1.12 affects/v1.11 This issue affects v1.11 branch affects/v1.10 This issue affects v1.10 branch labels Nov 10, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.12.4 Nov 10, 2022
@gandro
Copy link
Member

gandro commented Nov 10, 2022

/ci-eks

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find and thanks for the fix. I'm wondering if it would also be worthwhile to update the docs to mention that Cilium only supports IPv4 ENIs.

@bimmlerd bimmlerd requested review from a team as code owners November 14, 2022 10:08
@bimmlerd
Copy link
Member Author

Good find and thanks for the fix. I'm wondering if it would also be worthwhile to update the docs to mention that Cilium only supports IPv4 ENIs.

I've added a sentence. I guess we track this in #18405, and #19251 is related. Do we link to issues in the docs, e.g. something like "work on implementing this is tracked in xyz?"

@christarazi
Copy link
Member

@bimmlerd I've seen it done before, so yes.

@joestringer
Copy link
Member

joestringer commented Nov 15, 2022

Do we link to issues in the docs, e.g. something like "work on implementing this is tracked in xyz?"

Yes, there is even built-in syntax for it: :gh-issue:`19251` would link to #19251, for instance.

@christarazi
Copy link
Member

christarazi commented Nov 15, 2022

Ah yes, thanks Joe. I remember we had done it but I forgot about this incantation.

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd force-pushed the pr/bimmlerd/fix-ipv6-eni-crashing-operator branch from 7b52acb to 5c622fe Compare November 15, 2022 08:37
@michi-covalent michi-covalent added this to Needs backport from master in 1.12.5 Nov 16, 2022
@michi-covalent michi-covalent removed this from Needs backport from master in 1.12.4 Nov 16, 2022
@joestringer
Copy link
Member

joestringer commented Nov 16, 2022

I've filed #22217 for the ConformanceKind1.19 failure. This seems likely to be broken/flaky on master somehow.

@gandro gandro assigned bimmlerd and unassigned bimmlerd Nov 21, 2022
@jrajahalme jrajahalme added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Nov 24, 2022
@joestringer joestringer moved this from Needs backport from master to Backport done to v1.12 in 1.12.5 Dec 15, 2022
@bimmlerd bimmlerd deleted the pr/bimmlerd/fix-ipv6-eni-crashing-operator branch April 5, 2023 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.10 This issue affects v1.10 branch affects/v1.11 This issue affects v1.11 branch area/eni Impacts ENI based IPAM. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/ipam IP address management, including cloud IPAM
Projects
No open projects
1.12.5
Backport done to v1.12
Development

Successfully merging this pull request may close these issues.

None yet

6 participants