v1.15: IPsec Fixes #31610

pchaigno · 2024-03-26T13:50:46Z

No description provided.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

This small bit of refactoring will ease a subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

We need to ensure the (key used, sequence number) tuple for each encrypted packet is always unique on the network. Today that's not the case because the key is the same for all nodes and the sequence number starts at 0 on node reboot. To enable this, we will derive one key per node pair from a global key shared across all nodes. We need it per node pair and not per node because the first message emitted from A to B shouldn't be using the same key as the first message emitted from B to A, to satisfy the above requirement. To that end, for each node pair (A, B), we compute a key as follows: key = sha256(global_key + ip_of_a + ip_of_b) The sha256 sum is then truncated to the expected length. Once computed, we install the derived keys such that the key used for encryption on node A is the same as the key used for decryption on node B: Node A <> Node B XFRM IN: key(b+a) XFRM IN: key(a+b) XFRM OUT: key(a+b) XFRM OUT: key(b+a) Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

In the previous commit, we changed the way we compute the IPsec keys. We therefore need to replace the XFRM states to use the new keys. Our current update logic however doesn't take this case into account. It compares states based on IPs, marks, and SPIs, but it doesn't compare the keys. It would therefore assume that the correct states are already installed. This commit extends that logic to detect a difference in encryption keys and, if such a difference exist, remove the old states. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

It turns out that two XFRM IN states can't have the same mark and destination IP, even if they have different source IPs. That's an issue in our case because each node1->node2 pair will have its own IPsec key. Therefore, we need one XFRM state per origin node on input. Since we can't differentiate those XFRM states by their source IPs, we will have to differentiate using the marks. To do so, we need to convert the source IP into a packet mark before matching against XFRM states. We can write these packet marks in bpf_network, before going up the stack for decryption. And conveniently, we've just introduce a way to convert each cluster node into an ID, the node ID, which fits in the packet mark. This commit therefore performs an node ID map lookup to retrieve the node ID using the outer source IP address when packets are first processed in bpf_network. We clear the node ID from the packet mark after decryption using XFRM (output-mark). If no node ID is found for the outer source IP, we drop the packet. It seems preferable to drop it from BPF with all the contextual information rather than let it proceed to the XFRM layer where it will be dropped with only an error code incrementing. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

We want to have one IPsec key per node1->node2 (not including node2->node1 which will get a different key). We therefore need per-node XFRM states on the receive/decrypt side to carry each node's key. This commit implements that change. Thus, instead of creating a unique XFRM IN state when we receive the local node, we will create an XFRM IN state everytime we receive a remote node. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

This commit extends the logic from commit c0d9b8c ("ipsec: Allow old and new XFRM OUT states to coexist for upgrade") to have both the old and the new XFRM IN states in place. This is necessary to avoid packet drops during the upgrade. As with the XFRM OUT states, we can't add the new IN state while the old one is in place. We therefore need to first remove the old state, to then add the new one. See c0d9b8c ("ipsec: Allow old and new XFRM OUT states to coexist for upgrade") for details. Note this commit also removes the comparison of output-marks. Output-marks aren't actually used by the kernel to decide if two states conflict. And in the case of XFRM IN states, the output-marks changed a bit as well. Despite being different, the states still conflict. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

Read and export the local bootid via CiliumNode. We'll need it in a subsequent commit to generate new IPsec keys when a node reboots. This commit also collects the boot_id file as part of the bugtool report. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>

We need to ensure we never have two packets encrypted with the same key and sequence number. To that end, in previous commits, we introduced per-node-pair keys. That is however not sufficient. Since the sequence numbers can start from zero on node boot, if a node reboot, it will start sending traffic encrypted again with the same key and sequence number as it did before. To fix that, we need to ensure the per-node-pair keys change on node reboots. We achieve that by using the boot ID in the per-node-pair key calculation. For a pair of nodes A and B with IP addresses a and b and boot IDs x and y, we will therefore install two different keys: Node A <> Node B XFRM IN: key(b+a+y+x) XFRM IN: key(a+b+x+y) XFRM OUT: key(a+b+x+y) XFRM OUT: key(b+a+y+x) This is done such that, for each pair of nodes A, B, the key used for decryption on A (XFRM IN) is the same key used for encryption on B (XFRM OUT), and vice versa. Since we are now retrieving the local node's boot ID as part of the IPsec code, we need to initialize the mocked local node store in the unit tests. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

When we detect that a node's bootid has changed, we need to update the IPsec states. Unfortunately this is not as straightforward as it should be, because we may receive the new boot ID before a CiliumInternalIP is assign to the node. In such a case, we can't install the XFRM states yet because we don't have the CiliumInternalIP, but we need to remember that the boot ID changed and states should be replaced. We therefore record that information in a map, ipsecUpdateNeeded, which is later read to see if the boot ID changed. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

When a node reboots the key used to communicate with it is expected to change due to the new boot id generated. While the new key is being installed we may need to do it non-atomically (delete + insert), so packets to/from that node might be dropped which would cause increases in the XfrmNoStatesIn/Out. Add a note about it in the docs so users are not surprised. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

Now we can enable ESN anti-replay with window size of 1024. If a node reboots then everyone updates the related keys with the new one due to the different bootid, the node itself is already generating the keys with the new bootid. The window is used to allow for out-of-order packets, anti-replay still doesn't allow to replay any packet but keeps a bitmap and can accept out-of-order packets within window size range. For more information check section ""A2. Anti-Replay Window" of RFC 4303. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

The ESN bit in the IPsec secret will be used to indicate whether per-node-pair keys should be used or if the global key should remain in use. Specifically, it consist in a '+' sign after the SPI number in the secret. This ESN bit will be used to transition from a global key system to a per-node-pair system at runtime. We would typically rely on an agent flag for such a configuration. However, in this case, we need to perform a key rotation at the same time as we change the key system. Encoding the key system in the IPsec secret achieves that. By transition from the global to the per-node-pair keys via a key rotation, we ensure that the two can coexist during the transition. The old, global key will have XFRM rules with SPI n, whereas the new, per-node-pair keys will have XFRM rules with SPI n+1. Using a bit in the IPsec secret is also easier to test because we already have all the logic to test key rotation (whereas we would need new logic to test a flag change). The users therefore need to perform a key rotation from e.g.: 3 rfc4106(gcm(aes)) [...] 128 to: 4+ rfc4106(gcm(aes)) [...] 128 The key rotation test in CI is updated to cover a rotation from 3 to 4+ (meaning a rotation into the new per-node-pair key system). Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

The IPsec fixes will introduce a few XfrmInNoStates packet drops on up/downgrades due to non-atomic Linux APIs (can't replace XFRM states atomically). Those are limited to a very short time (time between two netlink syscalls). We however need to allowlist them in the CI. Since we're using the conn-disrupt GitHub action from main, we need to allowlist in main for the pull request's CI to pass. Note that despite the expected-xfrm-errors flag, the tests will still fail if we get 10 or more such drops. We don't expect so many XfrmInNoStates drops so we still want to fail in that case. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

[ backporter's notes: couple minor conflicts due to a switch from simple to double quote for parameters and the addition of "-c cilium-agent" in the kubectl exec commands. ] Since commit 4cf468b ("ipsec: Control use of per-node-pair keys from secret bit"), IPsec key rotations can be used to switch from the single-key system to the per-tunnel key system (also referred to as per-node-pair key system). Our key rotation test in CI was updated to cover such a switch. This commit extends it to also cover traditional key rotations, with both the new and old key systems. The switch back into a single-key system is also covered. These special key rotations are controlled with a single + sign. Adding it after the SPI in the IPsec Kubernetes secret is enough to switch to a per-tunnel key system. We thus simply need to cover all 4 cases of having or not having the + sign in the old and new secrets. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

A node update that doesn't contain a BootID will cause the creation of non-matching XFRM IN and OUT states across the cluster as the BootID is used to generate per-node key pairs. Non-matching XFRM states will result in XfrmInStateProtoError, causing packet drops. An empty BootID should thus be treated as an error, and Cilium should not attempt to derive per-node keys from it. Signed-off-by: Robin Gögge <r.goegge@gmail.com>

When adding the BootID field to the CiliumNode CRD, we forgot to bump the version, which is an issue when after an cilium upgrade the operator tries to update the CiliumNode objects to include the BootID field. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

This commit ensures that - each time we compute a per-node-pair-key we create an empty slice with the correct length first, and then append all the input data instead of appending to one of the input slices (`globalKey`) directly. - the IPs that are used as arguments in `computeNodeIPsecKey` are canonical, meaning IPv4 IPs consist of 4 bytes and IPv6 IPs consist of 16 bytes. This is necessary to always have the same inputs on all nodes when computing the per-node-pair-key. Without this IPs might not match on the byte level, e.g on one node the input is a v6 mapped v4 address (IPv4 address in 16 bytes) and on the other it isn't when used as input to the hash function. This will generate non-matching keys. Co-authored-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Robin Gögge <r.goegge@gmail.com>

We have very little logging of the boot IDs. Really fixing that will require a bit of work to not be too verbose, but in the meantime, we should at least log the local boot ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

pchaigno and others added 19 commits March 26, 2024 12:10

ipsec: New IPsec secret bit to indicate per-node-pair keys

2ae2095

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

ipsec: Move enableIPsecIPv{4,6} preconditions to caller

e815c0c

This small bit of refactoring will ease a subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

node: Log local boot ID

9a5ad8a

We have very little logging of the boot IDs. Really fixing that will require a bit of work to not be too verbose, but in the meantime, we should at least log the local boot ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

pchaigno added release-note/bug This PR fixes an issue in a previous release of Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. feature/ipsec Relates to Cilium's IPsec feature labels Mar 26, 2024

pchaigno requested review from a team as code owners March 26, 2024 13:50

pchaigno requested a review from brlbil March 26, 2024 13:50

maintainer-s-little-helper bot added backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Mar 26, 2024

pchaigno requested review from aanm and julianwiedmann March 26, 2024 13:51

pchaigno removed the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 26, 2024

aanm merged commit d3eb6ed into v1.15 Mar 26, 2024
37 checks passed

aanm deleted the pr/pchaigno/v1.15/rm-543 branch March 26, 2024 13:54

jrajahalme mentioned this pull request Mar 26, 2024

Prepare for release v1.15.3 #31621

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.15: IPsec Fixes #31610

v1.15: IPsec Fixes #31610

pchaigno commented Mar 26, 2024

v1.15: IPsec Fixes #31610

v1.15: IPsec Fixes #31610

Conversation

pchaigno commented Mar 26, 2024