New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.14 Backports 2023-07-19 #26914
Merged
Merged
v1.14 Backports 2023-07-19 #26914
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
gandro
added
kind/backports
This PR provides functionality previously merged into master.
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
labels
Jul 19, 2023
[ upstream commit e5b2da5 ] Since commit 3b3e8d0 ("node: Don't encrypt traffic to CiliumInternalIP"), traffic going to a CiliumInternalIP is not encrypted. In practice, that means we don't have an SPI (IPsec key reference) for the ipcache entries corresponding to CiliumInternalIPs. This commit therefore updates the IPsec BPF unit test to remove that SPI from the setup logic. Fixes: dee0683 ("bpf: test IPSec datapath on from-host") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ebd02f1 ] On IPAM modes with one pod CIDR per node, the XFRM OUT policies look like below: src 10.0.1.0/24 dst 10.0.0.0/24 dir out priority 0 ptype main mark 0x66d11e00/0xffffff00 tmpl src 10.0.1.13 dst 10.0.0.214 proto esp spi 0x00000001 reqid 1 mode tunnel When sending traffic from the hostns, however, it may not match the source CIDR above. Traffic from the hostns may indeed leave the node with the NodeInternalIP as the source IP (vs. CiliumInternalIP which would match). In such cases, we don't match the XFRM OUT policy and fall back to the catch-all default-drop rule, ending up with XfrmOutPolBlock packet drops. Why wasn't this an issue before? It was. Traffic would simply go in plain-text (which is okay given we never intended to encrypt hostns traffic in the first place). What changes is that we now have a catch-all default-drop XFRM OUT policy to avoid leaking plain-text traffic. So it now results in XfrmOutPolBlock errors. In commit 5fe2b2d ("bpf: Don't encrypt on path hostns -> remote pod") we removed encryption for the path hostns -> remote pod. Unfortunately, that doesn't mean the issue is completely gone. On a new Cilium install, we won't see this issue of XfrmOutPolBlock drops for hostns traffic anymore. But on existing clusters, we will still see those drops during the upgrade, after the default-drop rule is installed but before hostns traffic encryption is removed. None of this is an issue on AKS and ENI IPAM modes because there, the XFRM OUT policies look like: src 0.0.0.0/0 dst 10.0.0.0/16 dir out priority 0 ptype main mark 0x66d11e00/0xffffff00 tmpl src 10.0.1.13 dst 10.0.0.214 proto esp spi 0x00000001 reqid 1 mode tunnel Thus, hostns -> remote pod traffic is matched regardless of the source IP being selected and packets are not dropped by the default-drop rule. We can therefore avoid the upgrade drops by changing the XFRM OUT policies to never match on the source IPs, as on AKS and ENI IPAM modes. Fixes: 7d44f37 ("ipsec: Catch-default default drop policy for encryption") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 0a8f2c4 ] Commits 3b3e8d0 ("node: Don't encrypt traffic to CiliumInternalIP") and 5fe2b2d ("bpf: Don't encrypt on path hostns -> remote pod") removed a path asymmetry on the paths hostns <> remote pod. They however failed to remove workarounds that we have for this path asymmetry. In particular, we would encrypt packets on the path pod -> remote node (set SPI in the node manager) to try and avoid the path asymmetry, by also encrypting the replies. And, as a result of this first change, we would also need to handle a corner case in the datapath to appluy the correct reverse DNAT for reply traffic. All of that is unnecessary now that we fixed the path asymmetry. Fixes: 3b3e8d0 ("node: Don't encrypt traffic to CiliumInternalIP") Fixes: 5fe2b2d ("bpf: Don't encrypt on path hostns -> remote pod") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 420d7fa ] Commits 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") and 19a62da ("bpf: Lookup tunnel endpoint for IPsec rewrite") changed the way we pass the tunnel endpoint. We used to pass it via skb->cb[4] and read it in bpf_host to build the encapsulation. We changed that in the above commits to pass the identity via skb->cb[4] instead. Therefore, on upgrades, for a short while, we may end up with bpf_lxc writing the identity into skb->cb[4] (new datapath version) and bpf_host interpreting it as the tunnel endpoint (old version). Reloading bpf_host before bpf_lxc is not enough to fix it because then, for a short while, bpf_lxc would write the tunnel endpoint in skb->cb[4] (old version) and bpf_host would interpret it as the security identity (new version). In addition to reloading bpf_host first, we also need to make sure that it can handle both cases (skb->cb[4] has tunnel endpoint or identity). To distinguish between those two cases and interpret skb->cb[4] correctly, bpf_host will rely on the first byte: in the case of the tunnel endpoint is can't be zero because that would mean the IP address is within the special purpose range 0.0.0.0/8; in the case of the identity, it has to be zero because identities are on 24-bits. This commit implements those changes. Commit ca9c056 ("daemon: Reload bpf_host first in case of IPsec upgrade") had already made the agent reload bpf_host first for ENI and Azure IPAM modes, so we just need to extend it to all IPAM modes. Note that the above bug on upgrades doesn't cause an immediate packet drop at the sender. Instead, it seems the packet is encrypted twice. The (unverified) assumption here is that the encapsulation is skipped because the tunnel endpoint IP address is invalid (being a security identity on 24-bits). The encrypted packet is then sent again to cilium_host where the encryption bit is reapplied (given the destination IP address is a CiliumInternalIP). And it goes through the XFRM encryption again. On the receiver's side, the packet is decrypted once as expected. It is then recirculated to bpf_overlay which removes the packet mark. Given it is still an ESP (encrypted) packet, it goes back through the XFRM decryption layer. But since the packet mark is now zero, it fails to match any XFRM IN state. The packet is dropped with XfrmInNoStates. This can be seen in the following trace: <- overlay encrypted flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> stack encrypted flow 0x6fc46fc4 , identity 134400->44 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 <- overlay encrypted flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> host from flow 0x6fc46fc4 , identity unknown->43 state unknown ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> stack flow 0x6fc46fc4 , identity unknown->unknown state unknown ifindex cilium_host orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 The packet comes from the overlay encrypted, is sent to the stack to be decrypted, and comes back still encrypted. Fixes: 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 57249b1 ] Signed-off-by: toVersus <toversus2357@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 755eae8 ] The L2-announcements feature has not yet been used and tested widely and lacks a few features such as metrics. So mark it as beta for now. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit b10d096 ] Otherwise, when the IPv6 is disabled, the check-log-errors fails with: Error while inserting service in LB map" error="Unable to upsert service [fd00:10:96::8f2f]:8080 as IPv6 is disabled" k8sNamespace=cilium-test k8sSvcName=echo-same-node Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 58ebdd2 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 17ade33 ] As with all other *-ci images, we should not use check the digest when pulling the image, as it is different for CI builds. On main this has not been an issue, because useDigest already defaults to false, but on release branches (such as v1.14), we do check the digest by default. This caused failing workflows on the v1.14 branch which should be fixed by this commit. Fixes: 98003d5 ("ci/github: Set `useDigest=false` for Hubble Relay") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
force-pushed
the
pr/v1.14-backport-2023-07-19
branch
from
July 19, 2023 08:07
6c688b9
to
b6f7262
Compare
/test-backport-1.14 |
pchaigno
approved these changes
Jul 19, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My PR looks good. Thanks Sebastian!
aanm
approved these changes
Jul 19, 2023
dylandreimerink
approved these changes
Jul 19, 2023
brb
approved these changes
Jul 19, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Reviews are in, CI is green. Merging! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
useDigest=false
for Hubble Relay #26890 (@gandro)PRs skipped due to conflicts:
hostns -> remote pod
#25440 (@pchaigno)Once this PR is merged, you can update the PR labels via:
or with