Skip to content

Commit

Permalink
bpf, daemon: Have bpf_host support both values for skb->cb[4]
Browse files Browse the repository at this point in the history
Commits 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") and 19a62da
("bpf: Lookup tunnel endpoint for IPsec rewrite") changed the way we
pass the tunnel endpoint. We used to pass it via skb->cb[4] and read it
in bpf_host to build the encapsulation. We changed that in the above
commits to pass the identity via skb->cb[4] instead.

Therefore, on upgrades, for a short while, we may end up with bpf_lxc
writing the identity into skb->cb[4] (new datapath version) and
bpf_host interpreting it as the tunnel endpoint (old version).

Reloading bpf_host before bpf_lxc is not enough to fix it because then,
for a short while, bpf_lxc would write the tunnel endpoint in skb->cb[4]
(old version) and bpf_host would interpret it as the security identity
(new version).

In addition to reloading bpf_host first, we also need to make sure that
it can handle both cases (skb->cb[4] has tunnel endpoint or identity).
To distinguish between those two cases and interpret skb->cb[4]
correctly, bpf_host will rely on the first byte: in the case of the
tunnel endpoint is can't be zero because that would mean the IP address
is within the special purpose range 0.0.0.0/8; in the case of the
identity, it has to be zero because identities are on 24-bits.

This commit implements those changes. Commit ca9c056 ("daemon:
Reload bpf_host first in case of IPsec upgrade") had already made the
agent reload bpf_host first for ENI and Azure IPAM modes, so we just
need to extend it to all IPAM modes.

Note that the above bug on upgrades doesn't cause an immediate packet
drop at the sender. Instead, it seems the packet is encrypted twice. The
(unverified) assumption here is that the encapsulation is skipped
because the tunnel endpoint IP address is invalid (being a security
identity on 24-bits). The encrypted packet is then sent again to
cilium_host where the encryption bit is reapplied (given the destination
IP address is a CiliumInternalIP). And it goes through the XFRM
encryption again.

On the receiver's side, the packet is decrypted once as expected. It is
then recirculated to bpf_overlay which removes the packet mark. Given it
is still an ESP (encrypted) packet, it goes back through the XFRM
decryption layer. But since the packet mark is now zero, it fails to
match any XFRM IN state. The packet is dropped with XfrmInNoStates. This
can be seen in the following trace:

    <- overlay encrypted  flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    -> stack encrypted  flow 0x6fc46fc4 , identity 134400->44 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    <- overlay encrypted  flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    -> host from flow 0x6fc46fc4 , identity unknown->43 state unknown ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32
    -> stack flow 0x6fc46fc4 , identity unknown->unknown state unknown ifindex cilium_host orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32

The packet comes from the overlay encrypted, is sent to the stack to be
decrypted, and comes back still encrypted.

Fixes: 4c7cce1 ("bpf: Remove IP_POOLS IPsec code")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
  • Loading branch information
pchaigno committed Jul 7, 2023
1 parent 09bd8c0 commit 2895ea3
Show file tree
Hide file tree
Showing 2 changed files with 20 additions and 5 deletions.
18 changes: 18 additions & 0 deletions bpf/lib/identity.h
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,24 @@ static __always_inline __u32 inherit_identity_from_host(struct __ctx_buff *ctx,
*identity = HOST_ID;
} else if (magic == MARK_MAGIC_ENCRYPT) {
*identity = ctx_load_meta(ctx, CB_ENCRYPT_IDENTITY);

/* Special case needed to handle upgrades. Can be removed in v1.15.
* Before the upgrade, bpf_lxc will write the tunnel endpoint in
* skb->cb[4]. After the upgrade, it will write the security identity.
* For the upgrade to happen without drops, bpf_host thus needs to
* handle both cases.
* We can distinguish between the two cases by looking at the first
* byte. Identities are on 24-bits so the first byte will be zero;
* conversely, tunnel endpoint addresses within the range 0.0.0.0/8
* (first byte is zero) are impossible because special purpose
* (RFC6890).
*/
if ((*identity & 0xFF000000) != 0) {
/* skb->cb[4] was actually carrying the tunnel endpoint and the
* security identity is in the mark.
*/
*identity = get_identity(ctx);
}
#if defined(ENABLE_L7_LB)
} else if (magic == MARK_MAGIC_PROXY_EGRESS_EPID) {
*identity = get_epid(ctx); /* endpoint identity, not security identity! */
Expand Down
7 changes: 2 additions & 5 deletions daemon/cmd/state.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@ import (
"github.com/cilium/cilium/pkg/controller"
"github.com/cilium/cilium/pkg/endpoint"
"github.com/cilium/cilium/pkg/ipam"
ipamOption "github.com/cilium/cilium/pkg/ipam/option"
slim_corev1 "github.com/cilium/cilium/pkg/k8s/slim/k8s/api/core/v1"
"github.com/cilium/cilium/pkg/k8s/watchers/resources"
"github.com/cilium/cilium/pkg/labels"
Expand Down Expand Up @@ -305,8 +304,7 @@ func (d *Daemon) regenerateRestoredEndpoints(state *endpointRestoreState) (resto
}
}

if option.Config.EnableIPSec &&
(option.Config.IPAM == ipamOption.IPAMENI || option.Config.IPAM == ipamOption.IPAMAzure) {
if option.Config.EnableIPSec {
// If IPsec is enabled on EKS or AKS, we need to restore the host
// endpoint before any other endpoint, to ensure a dropless upgrade.
// This code can be removed in v1.15.
Expand All @@ -333,8 +331,7 @@ func (d *Daemon) regenerateRestoredEndpoints(state *endpointRestoreState) (resto
}

for _, ep := range state.restored {
if ep.IsHost() && option.Config.EnableIPSec &&
(option.Config.IPAM == ipamOption.IPAMENI || option.Config.IPAM == ipamOption.IPAMAzure) {
if ep.IsHost() && option.Config.EnableIPSec {
// The host endpoint was handled above.
continue
}
Expand Down

0 comments on commit 2895ea3

Please sign in to comment.