Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipsec: Reinitialize IPsec for new devices in ENI mode #25744

Merged

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented May 29, 2023

In ENI mode new network devices are added as more IPs are requested. For IPsec support we
require the bpf_network program to be loaded on any device that receives the IPsec traffic
to mark the packets for decryption.

Since Cilium uses the IP of the cilium_host device for the IPsec traffic the ENI device hosting
the IP requires bpf_network to be loaded onto it and since the cilium_host IP can change, this
PR watches for new network devices and reinitializes IPsec to make sure the bpf_network program
is loaded onto all used non-virtual devices.

Fix bug affecting EKS installations with IPsec encryption enabled, where Cilium wouldn't attach its IPsec BPF program to new ENI interfaces, resulting in connectivity loss between pods on remote nodes.  

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 29, 2023
@joamaki joamaki force-pushed the pr/joamaki/fix-attach-ipsec-to-eni branch from d4bde38 to eee3171 Compare May 29, 2023 13:32
@joamaki joamaki added the release-note/bug This PR fixes an issue in a previous release of Cilium. label May 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 29, 2023
@joamaki
Copy link
Contributor Author

joamaki commented May 29, 2023

/test

@joamaki joamaki requested review from gandro and pchaigno May 29, 2023 14:07
@joamaki joamaki added needs-backport/1.11 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 May 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.10 May 29, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.11.18 May 29, 2023
@joamaki
Copy link
Contributor Author

joamaki commented May 29, 2023

This fix was tested to be working on a EKS cluster with IPsec and ENI mode. The new ensX interface added at runtime was verified to have the bpf_network program loaded onto it.

ens4(27) clsact/ingress bpf_network.o:[from-network] id 2576

@joamaki joamaki marked this pull request as ready for review May 29, 2023 14:22
@joamaki joamaki requested a review from a team as a code owner May 29, 2023 14:22
@joamaki joamaki requested a review from rgo3 May 29, 2023 14:22
@pchaigno pchaigno added area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. area/eni Impacts ENI based IPAM. labels May 29, 2023
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this is fairly independent from the rest of the codebase. Much easier to backport indeed!

I'm wondering if we could avoid having any wait time, but other than that looks good to me.

// settleDuration is the amount of time to wait for further link updates
// before proceeding with reinitialization. This avoids reinitializing
// twice when seeing NEWLINK followed by SETLINK.
const settleDuration = 5 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a fairly long time. Couldn't we instead filter out events we don't want to see (maybe u.Header.Type == unix.RTM_NEWLINK)?

Copy link
Contributor Author

@joamaki joamaki May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is also triggering compilation of the bpf_network program I wanted to err on the side of caution and not end up doing repeated work in a tight loop if the system has some weird flapping interfaces. It's a pretty rare occasion that new interfaces are added. We do need some delay to "coalesce" related updates.

Did also think about just reacting to NEWLINK, but then thought it'd be safer to also handle SETLINK (wasn't 100% sure if BPF prog loading requires interface to be up).

Doing just 1 second of "settling" would probably be fine so can drop it down to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug was initially reported to us as a race on agent startup, where the CiliumInternalIP is allocated to a new ENI interface, but we haven't attached our BPF program to it. We also have indications that it happens on node creation, node reboot, and on Cilium upgrades. If at any time we have CiliumInternalIP allocated from an interface where we didn't attach the BPF program, then that leads to IPsec ingress drops.

That's why I'd really like to get this to a minimum. We can't get to 0 because the loading of the BPF program itself takes time, but I think we can avoid waiting. If we are worried about the CPU utilization, then we could easily update reinitializeIPSec to only reload BPF programs when new interfaces actually showed up. If we can filter on NEWLINK, we also don't need to worry about flapping interfaces, right?

If we want to make this even faster, we could also avoid recompiling bpf_network.c every time we call reinitializeIPSec, but that's going to be more work.

Did also think about just reacting to NEWLINK, but then thought it'd be safer to also handle SETLINK (wasn't 100% sure if BPF prog loading requires interface to be up).

That I think we can easily confirm, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this now so that the delay/coalescing is after the first reinitialization so there won't be any delay. I've also moved the call to recompile bpf_network.c into Reinitialize so it won't be performed every time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also moved the call to recompile bpf_network.c into Reinitialize so it won't be performed every time.

I hadn't considered that. Nice!

Changed this now so that the delay/coalescing is after the first reinitialization so there won't be any delay.

Did you check that loading the BPF program right after the NEWLINK and without waiting for SETLINK works fine? I'm guessing if we don't see any errors/warnings when testing this code, it means it's okay, no?

pkg/datapath/loader/netlink.go Outdated Show resolved Hide resolved
@pchaigno
Copy link
Member

Fix bug affecting EKS installations with IPsec encryption enabled, where Cilium wouldn't attach its IPsec BPF program to new ENI interfaces, resulting in connectivity loss between pods on remote nodes.

Regarding above release note, small nit: this is not specific to EKS and can happen on EC2 as well, as long as you're using ENI mode.

@joamaki joamaki force-pushed the pr/joamaki/fix-attach-ipsec-to-eni branch from eee3171 to 8240411 Compare May 30, 2023 08:02
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this! Looks good to me overall! Left some minor feedback

pkg/datapath/loader/netlink.go Show resolved Hide resolved
pkg/datapath/loader/netlink.go Show resolved Hide resolved
pkg/datapath/loader/netlink.go Show resolved Hide resolved
reinitializeIPSec only runs the interface detection if EncryptInterface
is empty. Since it sets it after detecting interfaces, it will only run
the detection once.

Let's change that to run the detection even if the EncryptInterface list
isn't empty. That will allow us to rerun the detection when new ENI
devices are added on EKS.

One consequence of this change is that we will now attach to all
interfaces even if the user configured --encrypt-interface. That is fine
because --encrypt-interface shouldn't actually be used in ENI mode. In
ENI mode, we want to attach to all interfaces as we don't have a
guarantee on which interface the IPsec traffic will come in.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/fix-attach-ipsec-to-eni branch from 8240411 to ad0ee73 Compare May 31, 2023 07:59
If IPsec is enabled along with the ENI IPAM mode we
need to load the bpf_network program onto new ENI devices
when they're added at runtime.

To fix this, we subscribe to netlink link updates to detect when
new (non-veth) devices are added and reinitialize IPsec to load the BPF
program onto the devices. The compilation of the bpf_netowrk program
has been moved to Reinitialize() to avoid spurious recompilation on
reinitialize.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/fix-attach-ipsec-to-eni branch from ad0ee73 to 66f4383 Compare May 31, 2023 08:29
@joamaki joamaki requested review from pchaigno and gandro May 31, 2023 09:25
@pchaigno pchaigno added backport-pending/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.11 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.18 Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.18 Jun 5, 2023
@pchaigno pchaigno added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jun 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 Jun 9, 2023
@qmonnet qmonnet added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jun 14, 2023
@qmonnet qmonnet moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.18 Jun 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. area/eni Impacts ENI based IPAM. backport/author The backport will be carried out by the author of the PR. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.11 This issue will prevent the release of the next version of Cilium. release-blocker/1.12 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.11.18
Backport done to v1.11
1.12.10
Backport done to v1.12
1.13.3
Backport done to v1.13
Development

Successfully merging this pull request may close these issues.

None yet

6 participants