k8s, node: Restore router IPs (cilium_host) from K8s resource #16307
Conversation
I can't think of anything important offhand, but I guess there is the possibility that some kind of fast restart with a swapped NIC in a cloud provider could turn this into a race condition. However, that's very much an edge case.
What's going to happen if the podCIDR changes? Will Cilium still try to restore a potentially obsolete router IP?
@brb Do we support pod CIDRs changing anyway? Regardless of that, I think we need to determine the priority of the router IPs that we restore. I'm assuming that the local filesystem is the most up-to-date source of truth. One approach would be to load the router IPs from the K8s resource as this PR already does, then read the filesystem state. If the filesystem state also contains a router IP, compare it against the one from the K8s resource. If there's a mismatch, we should warn loudly, and we probably want to switch over to the filesystem IP anyway. With a system like the above in place, we are not actually changing Cilium's behavior by much; the only new case is where the router IP doesn't exist on the filesystem but does exist in the K8s resource, which occurs on a node reboot.

Edit: Implemented the above
User confirmed that this fixes their issue: #16279 (comment)
test-me-please

Edit: Legit failures, fixed in the push below.
test-me-please
Previously, after a node reboot, Cilium would allocate a new router IP and append it to the slice of node IPs. Since the node IPs had already been synced to the K8s resource, meaning IPs from the previous Cilium instance were already present, the new router IP ended up appended to the slice. Other parts of Cilium assume that the router IP is the first node IP (the first element of the slice). Since the new router IP was appended to the end, it was no longer where it was expected, i.e. no longer the first element. This caused a mismatch over which router IP was to be used. There should only ever be one router IP (one IPv4 or one IPv6).

On a node reboot, the router IPs cannot be restored because they are wiped away by the Cilium state dir being mounted as a tmpfs [1]. This commit fixes this by restoring the router IPs from the K8s resource (Node or CiliumNode) if they are present in the annotations. This prevents the possibility of having more than one router IP, as described above.

Note that router IPs from the K8s resource are only restored if no router IP was found on the filesystem, which is considered the source of truth; in other words, the filesystem takes precedence over the K8s resource. The user is warned in case of a mismatch between the two sources. We also check that the IP to be restored is within the pod / node CIDR range, otherwise we exclude it from restoration.

[1]: Linux distributions mount /run as tmpfs, and Cilium's default state directory is created under /run. (It's also common for /var/run to be symlinked to /run.)

Fixes: cilium#16279

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Pushed a small change that shouldn't require a re-run. CI has passed already.