Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
bpf, daemon: Have bpf_host support both values for skb->cb[4]
Commits 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") and 19a62da ("bpf: Lookup tunnel endpoint for IPsec rewrite") changed the way we pass the tunnel endpoint. We used to pass it via skb->cb[4] and read it in bpf_host to build the encapsulation. We changed that in the above commits to pass the identity via skb->cb[4] instead. Therefore, on upgrades, for a short while, we may end up with bpf_lxc writing the identity into skb->cb[4] (new datapath version) and bpf_host interpreting it as the tunnel endpoint (old version). Reloading bpf_host before bpf_lxc is not enough to fix it because then, for a short while, bpf_lxc would write the tunnel endpoint in skb->cb[4] (old version) and bpf_host would interpret it as the security identity (new version). In addition to reloading bpf_host first, we also need to make sure that it can handle both cases (skb->cb[4] has tunnel endpoint or identity). To distinguish between those two cases and interpret skb->cb[4] correctly, bpf_host will rely on the first byte: in the case of the tunnel endpoint is can't be zero because that would mean the IP address is within the special purpose range 0.0.0.0/8; in the case of the identity, it has to be zero because identities are on 24-bits. This commit implements those changes. Commit ca9c056 ("daemon: Reload bpf_host first in case of IPsec upgrade") had already made the agent reload bpf_host first for ENI and Azure IPAM modes, so we just need to extend it to all IPAM modes. Note that the above bug on upgrades doesn't cause an immediate packet drop at the sender. Instead, it seems the packet is encrypted twice. The (unverified) assumption here is that the encapsulation is skipped because the tunnel endpoint IP address is invalid (being a security identity on 24-bits). The encrypted packet is then sent again to cilium_host where the encryption bit is reapplied (given the destination IP address is a CiliumInternalIP). And it goes through the XFRM encryption again. On the receiver's side, the packet is decrypted once as expected. It is then recirculated to bpf_overlay which removes the packet mark. Given it is still an ESP (encrypted) packet, it goes back through the XFRM decryption layer. But since the packet mark is now zero, it fails to match any XFRM IN state. The packet is dropped with XfrmInNoStates. This can be seen in the following trace: <- overlay encrypted flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> stack encrypted flow 0x6fc46fc4 , identity 134400->44 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 <- overlay encrypted flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> host from flow 0x6fc46fc4 , identity unknown->43 state unknown ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> stack flow 0x6fc46fc4 , identity unknown->unknown state unknown ifindex cilium_host orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 The packet comes from the overlay encrypted, is sent to the stack to be decrypted, and comes back still encrypted. Fixes: 4c7cce1 ("bpf: Remove IP_POOLS IPsec code") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
- Loading branch information