New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipsec: Reinitialize IPsec for new devices in ENI mode #25744
ipsec: Reinitialize IPsec for new devices in ENI mode #25744
Conversation
d4bde38
to
eee3171
Compare
/test |
This fix was tested to be working on a EKS cluster with IPsec and ENI mode. The new
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how this is fairly independent from the rest of the codebase. Much easier to backport indeed!
I'm wondering if we could avoid having any wait time, but other than that looks good to me.
pkg/datapath/loader/netlink.go
Outdated
// settleDuration is the amount of time to wait for further link updates | ||
// before proceeding with reinitialization. This avoids reinitializing | ||
// twice when seeing NEWLINK followed by SETLINK. | ||
const settleDuration = 5 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a fairly long time. Couldn't we instead filter out events we don't want to see (maybe u.Header.Type == unix.RTM_NEWLINK
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is also triggering compilation of the bpf_network program I wanted to err on the side of caution and not end up doing repeated work in a tight loop if the system has some weird flapping interfaces. It's a pretty rare occasion that new interfaces are added. We do need some delay to "coalesce" related updates.
Did also think about just reacting to NEWLINK, but then thought it'd be safer to also handle SETLINK (wasn't 100% sure if BPF prog loading requires interface to be up).
Doing just 1 second of "settling" would probably be fine so can drop it down to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug was initially reported to us as a race on agent startup, where the CiliumInternalIP is allocated to a new ENI interface, but we haven't attached our BPF program to it. We also have indications that it happens on node creation, node reboot, and on Cilium upgrades. If at any time we have CiliumInternalIP allocated from an interface where we didn't attach the BPF program, then that leads to IPsec ingress drops.
That's why I'd really like to get this to a minimum. We can't get to 0 because the loading of the BPF program itself takes time, but I think we can avoid waiting. If we are worried about the CPU utilization, then we could easily update reinitializeIPSec
to only reload BPF programs when new interfaces actually showed up. If we can filter on NEWLINK, we also don't need to worry about flapping interfaces, right?
If we want to make this even faster, we could also avoid recompiling bpf_network.c
every time we call reinitializeIPSec
, but that's going to be more work.
Did also think about just reacting to NEWLINK, but then thought it'd be safer to also handle SETLINK (wasn't 100% sure if BPF prog loading requires interface to be up).
That I think we can easily confirm, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this now so that the delay/coalescing is after the first reinitialization so there won't be any delay. I've also moved the call to recompile bpf_network.c
into Reinitialize
so it won't be performed every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also moved the call to recompile bpf_network.c into Reinitialize so it won't be performed every time.
I hadn't considered that. Nice!
Changed this now so that the delay/coalescing is after the first reinitialization so there won't be any delay.
Did you check that loading the BPF program right after the NEWLINK and without waiting for SETLINK works fine? I'm guessing if we don't see any errors/warnings when testing this code, it means it's okay, no?
Regarding above release note, small nit: this is not specific to EKS and can happen on EC2 as well, as long as you're using ENI mode. |
eee3171
to
8240411
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this! Looks good to me overall! Left some minor feedback
reinitializeIPSec only runs the interface detection if EncryptInterface is empty. Since it sets it after detecting interfaces, it will only run the detection once. Let's change that to run the detection even if the EncryptInterface list isn't empty. That will allow us to rerun the detection when new ENI devices are added on EKS. One consequence of this change is that we will now attach to all interfaces even if the user configured --encrypt-interface. That is fine because --encrypt-interface shouldn't actually be used in ENI mode. In ENI mode, we want to attach to all interfaces as we don't have a guarantee on which interface the IPsec traffic will come in. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
8240411
to
ad0ee73
Compare
If IPsec is enabled along with the ENI IPAM mode we need to load the bpf_network program onto new ENI devices when they're added at runtime. To fix this, we subscribe to netlink link updates to detect when new (non-veth) devices are added and reinitialize IPsec to load the BPF program onto the devices. The compilation of the bpf_netowrk program has been moved to Reinitialize() to avoid spurious recompilation on reinitialize. Signed-off-by: Jussi Maki <jussi@isovalent.com>
ad0ee73
to
66f4383
Compare
In ENI mode new network devices are added as more IPs are requested. For IPsec support we
require the bpf_network program to be loaded on any device that receives the IPsec traffic
to mark the packets for decryption.
Since Cilium uses the IP of the
cilium_host
device for the IPsec traffic the ENI device hostingthe IP requires bpf_network to be loaded onto it and since the
cilium_host
IP can change, thisPR watches for new network devices and reinitializes IPsec to make sure the
bpf_network
programis loaded onto all used non-virtual devices.