Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.15: IPsec Fixes #31610

Merged
merged 19 commits into from Mar 26, 2024
Merged

v1.15: IPsec Fixes #31610

merged 19 commits into from Mar 26, 2024

Conversation

pchaigno
Copy link
Member

No description provided.

pchaigno and others added 19 commits March 26, 2024 12:10
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This small bit of refactoring will ease a subsequent commit.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
We need to ensure the (key used, sequence number) tuple for each
encrypted packet is always unique on the network. Today that's not the
case because the key is the same for all nodes and the sequence number
starts at 0 on node reboot.

To enable this, we will derive one key per node pair from a global key
shared across all nodes. We need it per node pair and not per node
because the first message emitted from A to B shouldn't be using the
same key as the first message emitted from B to A, to satisfy the above
requirement.

To that end, for each node pair (A, B), we compute a key as follows:

    key = sha256(global_key + ip_of_a + ip_of_b)

The sha256 sum is then truncated to the expected length.

Once computed, we install the derived keys such that the key used for
encryption on node A is the same as the key used for decryption on node
B:

    Node A               <> Node B
    XFRM IN:  key(b+a)      XFRM IN:  key(a+b)
    XFRM OUT: key(a+b)      XFRM OUT: key(b+a)

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
In the previous commit, we changed the way we compute the IPsec keys. We
therefore need to replace the XFRM states to use the new keys.

Our current update logic however doesn't take this case into account. It
compares states based on IPs, marks, and SPIs, but it doesn't compare
the keys. It would therefore assume that the correct states are already
installed.

This commit extends that logic to detect a difference in encryption keys
and, if such a difference exist, remove the old states.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
It turns out that two XFRM IN states can't have the same mark and
destination IP, even if they have different source IPs. That's an issue
in our case because each node1->node2 pair will have its own IPsec key.
Therefore, we need one XFRM state per origin node on input.

Since we can't differentiate those XFRM states by their source
IPs, we will have to differentiate using the marks. To do so, we need to
convert the source IP into a packet mark before matching against XFRM
states. We can write these packet marks in bpf_network, before going up
the stack for decryption. And conveniently, we've just introduce a way
to convert each cluster node into an ID, the node ID, which fits in the
packet mark.

This commit therefore performs an node ID map lookup to retrieve the
node ID using the outer source IP address when packets are first
processed in bpf_network.

We clear the node ID from the packet mark after decryption using XFRM
(output-mark).

If no node ID is found for the outer source IP, we drop the packet.
It seems preferable to drop it from BPF with all the contextual
information rather than let it proceed to the XFRM layer where it will
be dropped with only an error code incrementing.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
We want to have one IPsec key per node1->node2 (not including
node2->node1 which will get a different key). We therefore need per-node
XFRM states on the receive/decrypt side to carry each node's key.

This commit implements that change. Thus, instead of creating a unique
XFRM IN state when we receive the local node, we will create an XFRM IN
state everytime we receive a remote node.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This commit extends the logic from commit c0d9b8c ("ipsec: Allow old
and new XFRM OUT states to coexist for upgrade") to have both the old
and the new XFRM IN states in place. This is necessary to avoid packet
drops during the upgrade.

As with the XFRM OUT states, we can't add the new IN state while the old
one is in place. We therefore need to first remove the old state, to
then add the new one. See c0d9b8c ("ipsec: Allow old and new XFRM OUT
states to coexist for upgrade") for details.

Note this commit also removes the comparison of output-marks.
Output-marks aren't actually used by the kernel to decide if two states
conflict. And in the case of XFRM IN states, the output-marks changed a
bit as well. Despite being different, the states still conflict.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Read and export the local bootid via CiliumNode. We'll need it in a
subsequent commit to generate new IPsec keys when a node reboots.
This commit also collects the boot_id file as part of the bugtool
report.

Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
We need to ensure we never have two packets encrypted with the same key
and sequence number. To that end, in previous commits, we introduced
per-node-pair keys. That is however not sufficient. Since the sequence
numbers can start from zero on node boot, if a node reboot, it will
start sending traffic encrypted again with the same key and sequence
number as it did before.

To fix that, we need to ensure the per-node-pair keys change on node
reboots. We achieve that by using the boot ID in the per-node-pair key
calculation.

For a pair of nodes A and B with IP addresses a and b and boot IDs x
and y, we will therefore install two different keys:

    Node A               <> Node B
    XFRM IN:  key(b+a+y+x)  XFRM IN:  key(a+b+x+y)
    XFRM OUT: key(a+b+x+y)  XFRM OUT: key(b+a+y+x)

This is done such that, for each pair of nodes A, B, the key used for
decryption on A (XFRM IN) is the same key used for encryption on B (XFRM
OUT), and vice versa.

Since we are now retrieving the local node's boot ID as part of the
IPsec code, we need to initialize the mocked local node store in the
unit tests.

Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
When we detect that a node's bootid has changed, we need to update the
IPsec states.

Unfortunately this is not as straightforward as it should be, because we
may receive the new boot ID before a CiliumInternalIP is assign to the
node. In such a case, we can't install the XFRM states yet because we
don't have the CiliumInternalIP, but we need to remember that the boot
ID changed and states should be replaced.

We therefore record that information in a map, ipsecUpdateNeeded, which
is later read to see if the boot ID changed.

Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
When a node reboots the key used to communicate with it is expected to
change due to the new boot id generated. While the new key is being
installed we may need to do it non-atomically (delete + insert), so
packets to/from that node might be dropped which would cause increases
in the XfrmNoStatesIn/Out. Add a note about it in the docs so users are
not surprised.

Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Now we can enable ESN anti-replay with window size of 1024. If a node
reboots then everyone updates the related keys with the new one due to
the different bootid, the node itself is already generating the keys
with the new bootid. The window is used to allow for out-of-order
packets, anti-replay still doesn't allow to replay any packet but keeps
a bitmap and can accept out-of-order packets within window size range.
For more information check section ""A2. Anti-Replay Window" of RFC 4303.

Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
The ESN bit in the IPsec secret will be used to indicate whether
per-node-pair keys should be used or if the global key should remain in
use. Specifically, it consist in a '+' sign after the SPI number in the
secret.

This ESN bit will be used to transition from a global key system to a
per-node-pair system at runtime. We would typically rely on an agent
flag for such a configuration. However, in this case, we need to perform
a key rotation at the same time as we change the key system. Encoding
the key system in the IPsec secret achieves that.

By transition from the global to the per-node-pair keys via a key
rotation, we ensure that the two can coexist during the transition. The
old, global key will have XFRM rules with SPI n, whereas the new,
per-node-pair keys will have XFRM rules with SPI n+1.

Using a bit in the IPsec secret is also easier to test because we
already have all the logic to test key rotation (whereas we would need
new logic to test a flag change).

The users therefore need to perform a key rotation from e.g.:

    3 rfc4106(gcm(aes)) [...] 128

to:

    4+ rfc4106(gcm(aes)) [...] 128

The key rotation test in CI is updated to cover a rotation from 3 to 4+
(meaning a rotation into the new per-node-pair key system).

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
The IPsec fixes will introduce a few XfrmInNoStates packet drops on
up/downgrades due to non-atomic Linux APIs (can't replace XFRM states
atomically). Those are limited to a very short time (time between two
netlink syscalls).

We however need to allowlist them in the CI. Since we're using the
conn-disrupt GitHub action from main, we need to allowlist in main for
the pull request's CI to pass.

Note that despite the expected-xfrm-errors flag, the tests will still
fail if we get 10 or more such drops. We don't expect so many
XfrmInNoStates drops so we still want to fail in that case.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
[ backporter's notes: couple minor conflicts due to a switch from simple
  to double quote for parameters and the addition of "-c cilium-agent"
  in the kubectl exec commands. ]

Since commit 4cf468b ("ipsec: Control use of per-node-pair keys from
secret bit"), IPsec key rotations can be used to switch from the
single-key system to the per-tunnel key system (also referred to as
per-node-pair key system). Our key rotation test in CI was updated to
cover such a switch.

This commit extends it to also cover traditional key rotations, with
both the new and old key systems. The switch back into a single-key
system is also covered.

These special key rotations are controlled with a single + sign. Adding
it after the SPI in the IPsec Kubernetes secret is enough to switch to a
per-tunnel key system. We thus simply need to cover all 4 cases of
having or not having the + sign in the old and new secrets.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
A node update that doesn't contain a BootID will cause the creation
of non-matching XFRM IN and OUT states across the cluster as the
BootID is used to generate per-node key pairs. Non-matching XFRM
states will result in XfrmInStateProtoError, causing packet drops.
An empty BootID should thus be treated as an error, and Cilium
should not attempt to derive per-node keys from it.

Signed-off-by: Robin Gögge <r.goegge@gmail.com>
When adding the BootID field to the CiliumNode CRD, we forgot to bump
the version, which is an issue when after an cilium upgrade the
operator tries to update the CiliumNode objects to include the BootID
field.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
This commit ensures that

- each time we compute a per-node-pair-key we create an empty slice with
  the correct length first, and then append all the input data instead
  of appending to one of the input slices (`globalKey`) directly.
- the IPs that are used as arguments in `computeNodeIPsecKey` are
  canonical, meaning IPv4 IPs consist of 4 bytes and IPv6 IPs consist of
  16 bytes.

This is necessary to always have the same inputs on all nodes when
computing the per-node-pair-key. Without this IPs might not match on the
byte level, e.g on one node the input is a v6 mapped v4 address (IPv4
address in 16 bytes) and on the other it isn't when used as input to the
hash function. This will generate non-matching keys.

Co-authored-by: Zhichuan Liang <gray.liang@isovalent.com>
Signed-off-by: Robin Gögge <r.goegge@gmail.com>
We have very little logging of the boot IDs. Really fixing that will
require a bit of work to not be too verbose, but in the meantime, we
should at least log the local boot ID.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno added release-note/bug This PR fixes an issue in a previous release of Cilium. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. feature/ipsec Relates to Cilium's IPsec feature labels Mar 26, 2024
@pchaigno pchaigno requested review from a team as code owners March 26, 2024 13:50
@pchaigno pchaigno requested a review from brlbil March 26, 2024 13:50
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Mar 26, 2024
@pchaigno pchaigno removed the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 26, 2024
@aanm aanm merged commit d3eb6ed into v1.15 Mar 26, 2024
37 checks passed
@aanm aanm deleted the pr/pchaigno/v1.15/rm-543 branch March 26, 2024 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. feature/ipsec Relates to Cilium's IPsec feature kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants