nodediscovery: Fix bug where CiliumInternalIP was flapping
[ upstream commit 263e689 ]

[ backporter's notes: Various conflicts.
  net.IPFamilyOfString(x) doesn't exist so it had to be replaced with ParseIPSloppy(x).To4().
  slices.DeleteFunc doesn't exist so it had to be defined in our own slices package as done on v1.14.
  mutateNodeResource's logic was significantly refactored so the new code to clean node IP addresses had to be moved before all appending of node IP addresses. ]

This fixes a bug in `UpdateCiliumNodeResource` where the `CiliumInternalIP` (aka `cilium_host` IP, aka router IP) was flapping in the node manager during restoration (i.e. during cilium-agent restarts).

In particular in `cluster-pool` mode, `UpdateCiliumNodeResource` is called before the `cilium_host` IP has been restored, as there are some circular dependencies: The restored IP can only be fully validated after the IPAM subsystem is ready, but that in turn can only happen if the `CiliumNode` object has been created. The `UpdateCiliumNodeResource` function however will only announce the `cilium_host` IP if it has been restored.

This commit attempts to break that cycle by not overwriting any already existing `CiliumInternalIP` in the CiliumNode resource.

Overall, this change is rather hacky, in particular it does not address the fact that other less crucial node information (like the health IP) also flaps. But since we want to backport this bugfix to older stable branches too, this change is intentionally kept as minimal as possible.

Example node event (as observed by other nodes) before this change:

```
2023-12-18T12:58:20.070330814Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"}],..." subsys=nodemanager
2023-12-18T12:58:20.208082226Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"},{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"}],..." subsys=nodemanager
```

After this change (note the `CiliumInternalIP` present in both events):

```
2023-12-18T15:38:23.695653876Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"},{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"}],..." subsys=nodemanager
2023-12-18T15:38:23.838604573Z level=debug msg="Received node update event from custom-resource" node="{\"Name\":\"kind-worker\",\"Cluster\":\"default\",\"IPAddresses\":[{\"Type\":\"InternalIP\",\"IP\":\"172.18.0.4\"},{\"Type\":\"InternalIP\",\"IP\":\"fc00:c111::4\"},{\"Type\":\"CiliumInternalIP\",\"IP\":\"10.0.1.245\"}],...}" subsys=nodemanager
```

Reported-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
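
For illustration only, the idea of keeping an already announced `CiliumInternalIP` until a restored one is available could be sketched roughly as below. This is not the code from the commit: `NodeAddress`, `deleteFunc`, `isIPv4` and `mergeAddresses` are hypothetical, simplified names, and the per-address-family handling is an assumption made for the sketch. It only uses the Go standard library.

```go
package main

import "net"

// NodeAddress is a simplified, hypothetical stand-in for an address entry
// in a CiliumNode resource. It is not the real Cilium type.
type NodeAddress struct {
	Type string
	IP   string
}

const ciliumInternalIP = "CiliumInternalIP"

// deleteFunc removes every element for which del returns true. It is a small
// local helper for toolchains without slices.DeleteFunc; it reuses the
// backing array of s.
func deleteFunc[T any](s []T, del func(T) bool) []T {
	kept := s[:0]
	for _, v := range s {
		if !del(v) {
			kept = append(kept, v)
		}
	}
	return kept
}

// isIPv4 reports whether s parses as an IPv4 address. Anything else,
// including unparsable input, is treated as "not IPv4" in this sketch.
func isIPv4(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.To4() != nil
}

// mergeAddresses builds the new address list from the addresses already
// stored in the resource (existing) and the ones currently known locally
// (fresh). An existing CiliumInternalIP is kept only while fresh carries no
// CiliumInternalIP of the same family; all other existing entries are
// dropped and replaced by fresh.
func mergeAddresses(existing, fresh []NodeAddress) []NodeAddress {
	var haveV4, haveV6 bool
	for _, a := range fresh {
		if a.Type != ciliumInternalIP {
			continue
		}
		if isIPv4(a.IP) {
			haveV4 = true
		} else {
			haveV6 = true
		}
	}

	// Copy first so the caller's slice is not modified in place.
	result := append([]NodeAddress(nil), existing...)

	// Clean the old addresses before appending any new ones.
	result = deleteFunc(result, func(a NodeAddress) bool {
		if a.Type != ciliumInternalIP {
			return true
		}
		if isIPv4(a.IP) {
			return haveV4
		}
		return haveV6
	})

	return append(result, fresh...)
}
```

With this sketch, calling `mergeAddresses` before the router IP is restored leaves the previously stored `CiliumInternalIP` at the front of the list, followed by the freshly appended `InternalIP` entries. Once the restored IP is part of `fresh`, the stored entry is dropped and the fresh one is used instead.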