Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
daemon, node: Fix faulty router IP restoration logic
[ upstream commit ff63b07 ] When running in ENI or Alibaba IPAM mode, or any CRD-backed IPAM mode ("crd") and upon Cilium restart, it was very likely that `cilium_host` was assigned an additional IP. Below is a case where Cilium was restarted 3 times, hence getting 3 additional router IPs: ``` 4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000 link/ether 66:03:3c:07:8c:47 brd ff:ff:ff:ff:ff:ff inet 192.168.35.9/32 scope link cilium_host valid_lft forever preferred_lft forever inet 192.168.34.37/32 scope link cilium_host valid_lft forever preferred_lft forever inet 192.168.57.107/32 scope link cilium_host valid_lft forever preferred_lft forever inet6 fe80::6403:3cff:fe07:8c47/64 scope link valid_lft forever preferred_lft forever ``` This was because in CRD-backed IPAM modes, we wait until we fully sync with K8s in order to derive the VPC CIDR, which becomes the pod CIDR on the node. Since the router IP restoration logic was using a different pod CIDR during the router IP validation check, it was erroneously discarding it. This was observed with: ``` 2021-06-25T13:59:47.816069937Z level=info msg="The router IP (192.168.135.3) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr=10.8.0.0/16 subsys=node ``` This is problematic because the extraneous router IPs could be also assigned to pods, which would break pod connectivity. The fix is to break up the router IP restoration process into 2 parts. The first is to attempt a restoration of the IP from the filesystem (`node_config.h`). We also fetch the router IPs from Kubernetes resources since they were already retrieved prior inside k8s.WaitForNodeInformation(). Then after the CRD-backed IPAM is initialized and started (*Daemon).startIPAM() is called, we attempt the second part. This includes evaluating which IPs (either from filesystem or from K8s) should be set as the router IPs. The IPs from the filesystem take precedence. In case the node was rebooted, the filesystem will be wiped so then we'd rely on the IPs from the K8s resources. At this point in the daemon initialization, we have the correct CIDR range as the pod CIDR range to validate the chosen IP. Fixes: beb8bde ("k8s, node: Restore router IPs (`cilium_host`) from K8s resource") Signed-off-by: Chris Tarazi <chris@isovalent.com>
- Loading branch information
1 parent
5fa82da
commit c349779
Showing
3 changed files
with
122 additions
and
82 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters