[ upstream commit 0bd4e04 ]
With ENI IPAM mode and IPsec enabled, users were reporting cases where
connectivity to particular pods breaks, and correlated with those drops,
the following error msg:
```
Unable to delete the IPsec route OUT from the host routing table
```
In addition, it was also reported that the connectivity outage would
only last for a few minutes before resolving itself.
The issue turned out to be that upon node deletion, the logic to handle
the IPsec cleanup is asymmetric with the IPsec logic to handle a node
create / update.
Here's how:
* With ENI mode and IPsec, subnet encryption mode is enabled
implicitly.
* Background: Users can explicitly enable subnet encryption mode by
configuring `--ipv4-pod-subnets=[cidr1,cidr2,...]`.
* Background: ENIs are part of subnet(s).
* Cilium with ENI mode automatically appends the node's ENIs'
subnets' CIDRs to this slice.
* For example, node A has ENI E which is a part of subnet S with
CIDR C. Therefore, `--ipv4-pod-subnets=[C]`.
* This means that each node should have an IPsec OUT routes for each
pod subnet, i.e. each ENI's subnet, as shown by
(*linuxNodeHandler).nodeUpdate() which contains the IPsec logic on a
node create / update.
* Upon a node delete [(*linuxNodeHandler).nodeDelete()], we clean up
the "old" node. When it gets to the IPsec logic, it removes the
routes for the pod subnets as well, i.e. removes the route to the
ENI's subnet from the local node. From the example above, it'd
remove the route for CIDR C.
* This is problematic because in ENI mode, different nodes can share
the same ENI's subnet, meaning subnets are NOT exclusive to a node.
For example, a node B can also have ENI E with a subnet C attached
to it.
* As for how the nodes were fixing themselves, it turns out that
(*Manager).backgroundSync() runs on an interval which calls
NodeValidateImplementation() which calls down to
(*linuxNodeHandler).nodeUpdate() thereby running the IPsec logic of
a node create / update which reinstates the missing routes.
Therefore, we shouldn't be deleting these routes because pods might
still be relying on them. By comparing the IPsec delete logic with [1],
we see that they're asymmetric. This commit fixes this asymmetry.
[1]: Given subnetEncryption=true, notice how we only call
enableSubnetIPsec() if the node is local. That is not the case on
node delete.
```
func (n *linuxNodeHandler) nodeUpdate(oldNode, newNode *nodeTypes.Node, firstAddition bool) error {
...
if n.nodeConfig.EnableIPSec && !n.subnetEncryption() && !n.nodeConfig.EncryptNode {
n.enableIPsec(newNode)
newKey = newNode.EncryptionKey
}
...
if n.nodeConfig.EnableIPSec && !n.subnetEncryption() {
n.encryptNode(newNode)
}
if newNode.IsLocal() {
isLocalNode = true
...
if n.subnetEncryption() {
n.enableSubnetIPsec(n.nodeConfig.IPv4PodSubnets, n.nodeConfig.IPv6PodSubnets)
}
...
return nil
}
```
Fixes: 645de9d ("cilium: remove encryption route and rules if crypto
is disabled")
Co-authored-by: John Fastabend <john@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>