Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix upgrade for IPsec with tunneling #26708

Merged
merged 4 commits into from Jul 12, 2023

Conversation

pchaigno
Copy link
Member

@pchaigno pchaigno commented Jul 7, 2023

Fix various packet drops impacting upgrades when running with IPsec and overlay mode. See commits for details.

Fix bug that caused transient IPsec packet drops on upgrades when tunneling is enabled. 

@pchaigno pchaigno added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. upgrade-impact This PR has potential upgrade or downgrade impact. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. needs-backport/1.11 backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.0 Jul 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.11.19 Jul 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.12 Jul 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.5 Jul 7, 2023
@pchaigno pchaigno force-pushed the fix-upgrade-ipsec-with-overlay branch 2 times, most recently from 0a4d608 to 24ae908 Compare July 7, 2023 17:53
Since commit 3b3e8d0 ("node: Don't encrypt traffic to
CiliumInternalIP"), traffic going to a CiliumInternalIP is not
encrypted. In practice, that means we don't have an SPI (IPsec key
reference) for the ipcache entries corresponding to CiliumInternalIPs.

This commit therefore updates the IPsec BPF unit test to remove
that SPI from the setup logic.

Fixes: dee0683 ("bpf: test IPSec datapath on from-host")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
On IPAM modes with one pod CIDR per node, the XFRM OUT policies look like below:

    src 10.0.1.0/24 dst 10.0.0.0/24
        dir out priority 0 ptype main
        mark 0x66d11e00/0xffffff00
        tmpl src 10.0.1.13 dst 10.0.0.214
            proto esp spi 0x00000001 reqid 1 mode tunnel

When sending traffic from the hostns, however, it may not match the
source CIDR above. Traffic from the hostns may indeed leave the node
with the NodeInternalIP as the source IP (vs. CiliumInternalIP which
would match).

In such cases, we don't match the XFRM OUT policy and fall back to the
catch-all default-drop rule, ending up with XfrmOutPolBlock packet
drops.

Why wasn't this an issue before? It was. Traffic would simply go in
plain-text (which is okay given we never intended to encrypt hostns
traffic in the first place). What changes is that we now have a
catch-all default-drop XFRM OUT policy to avoid leaking plain-text
traffic. So it now results in XfrmOutPolBlock errors.

In commit 5fe2b2d ("bpf: Don't encrypt on path hostns -> remote pod")
we removed encryption for the path hostns -> remote pod. Unfortunately,
that doesn't mean the issue is completely gone. On a new Cilium install,
we won't see this issue of XfrmOutPolBlock drops for hostns traffic
anymore. But on existing clusters, we will still see those drops during
the upgrade, after the default-drop rule is installed but before hostns
traffic encryption is removed.

None of this is an issue on AKS and ENI IPAM modes because there, the
XFRM OUT policies look like:

src 0.0.0.0/0 dst 10.0.0.0/16
    dir out priority 0 ptype main
    mark 0x66d11e00/0xffffff00
    tmpl src 10.0.1.13 dst 10.0.0.214
        proto esp spi 0x00000001 reqid 1 mode tunnel

Thus, hostns -> remote pod traffic is matched regardless of the source
IP being selected and packets are not dropped by the default-drop rule.

We can therefore avoid the upgrade drops by changing the XFRM OUT
policies to never match on the source IPs, as on AKS and ENI IPAM modes.

Fixes: 7d44f37 ("ipsec: Catch-default default drop policy for encryption")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Commits 3b3e8d0 ("node: Don't encrypt traffic to CiliumInternalIP")
and 5fe2b2d ("bpf: Don't encrypt on path hostns -> remote pod")
removed a path asymmetry on the paths hostns <> remote pod. They however
failed to remove workarounds that we have for this path asymmetry.

In particular, we would encrypt packets on the path pod -> remote node
(set SPI in the node manager) to try and avoid the path asymmetry, by
also encrypting the replies. And, as a result of this first change, we
would also need to handle a corner case in the datapath to appluy the
correct reverse DNAT for reply traffic.

All of that is unnecessary now that we fixed the path asymmetry.

Fixes: 3b3e8d0 ("node: Don't encrypt traffic to CiliumInternalIP")
Fixes: 5fe2b2d ("bpf: Don't encrypt on path hostns -> remote pod")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno force-pushed the fix-upgrade-ipsec-with-overlay branch from 24ae908 to 09bd8c0 Compare July 7, 2023 19:03
Commits 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") and 19a62da
("bpf: Lookup tunnel endpoint for IPsec rewrite") changed the way we
pass the tunnel endpoint. We used to pass it via skb->cb[4] and read it
in bpf_host to build the encapsulation. We changed that in the above
commits to pass the identity via skb->cb[4] instead.

Therefore, on upgrades, for a short while, we may end up with bpf_lxc
writing the identity into skb->cb[4] (new datapath version) and
bpf_host interpreting it as the tunnel endpoint (old version).

Reloading bpf_host before bpf_lxc is not enough to fix it because then,
for a short while, bpf_lxc would write the tunnel endpoint in skb->cb[4]
(old version) and bpf_host would interpret it as the security identity
(new version).

In addition to reloading bpf_host first, we also need to make sure that
it can handle both cases (skb->cb[4] has tunnel endpoint or identity).
To distinguish between those two cases and interpret skb->cb[4]
correctly, bpf_host will rely on the first byte: in the case of the
tunnel endpoint is can't be zero because that would mean the IP address
is within the special purpose range 0.0.0.0/8; in the case of the
identity, it has to be zero because identities are on 24-bits.

This commit implements those changes. Commit ca9c056 ("daemon:
Reload bpf_host first in case of IPsec upgrade") had already made the
agent reload bpf_host first for ENI and Azure IPAM modes, so we just
need to extend it to all IPAM modes.

Note that the above bug on upgrades doesn't cause an immediate packet
drop at the sender. Instead, it seems the packet is encrypted twice. The
(unverified) assumption here is that the encapsulation is skipped
because the tunnel endpoint IP address is invalid (being a security
identity on 24-bits). The encrypted packet is then sent again to
cilium_host where the encryption bit is reapplied (given the destination
IP address is a CiliumInternalIP). And it goes through the XFRM
encryption again.

On the receiver's side, the packet is decrypted once as expected. It is
then recirculated to bpf_overlay which removes the packet mark. Given it
is still an ESP (encrypted) packet, it goes back through the XFRM
decryption layer. But since the packet mark is now zero, it fails to
match any XFRM IN state. The packet is dropped with XfrmInNoStates. This
can be seen in the following trace:

    <- overlay encrypted  flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    -> stack encrypted  flow 0x6fc46fc4 , identity 134400->44 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    <- overlay encrypted  flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    -> host from flow 0x6fc46fc4 , identity unknown->43 state unknown ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    -> stack flow 0x6fc46fc4 , identity unknown->unknown state unknown ifindex cilium_host orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32

The packet comes from the overlay encrypted, is sent to the stack to be
decrypted, and comes back still encrypted.

Fixes: 4c7cce1 ("bpf: Remove IP_POOLS IPsec code")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno force-pushed the fix-upgrade-ipsec-with-overlay branch 3 times, most recently from e5a56ae to 2895ea3 Compare July 7, 2023 20:07
@pchaigno
Copy link
Member Author

pchaigno commented Jul 7, 2023

/test

@pchaigno pchaigno marked this pull request as ready for review July 7, 2023 23:03
@pchaigno pchaigno requested review from a team as code owners July 7, 2023 23:03
@pchaigno pchaigno added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jul 17, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.5 Jul 17, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Needs backport from main in 1.12.12 Jul 17, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport pending to v1.12 in 1.12.12 Jul 17, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.19 Jul 17, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.11 in 1.11.19 Jul 17, 2023
@pchaigno pchaigno removed the backport/author The backport will be carried out by the author of the PR. label Jul 18, 2023
@gandro gandro mentioned this pull request Jul 19, 2023
6 tasks
@gandro gandro added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 19, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.0 Jul 19, 2023
@gandro gandro added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jul 19, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.0 Jul 19, 2023
@julianwiedmann julianwiedmann added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Jul 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.12 Jul 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.19 Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
No open projects
1.11.19
Backport done to v1.11
1.12.12
Backport done to v1.12
1.13.5
Backport done to v1.13
1.14.0
Backport done to v1.14
Development

Successfully merging this pull request may close these issues.

None yet

6 participants