
k8s, node: Restore router IPs (cilium_host) from K8s resource #16307

Merged

Conversation


@christarazi christarazi commented May 25, 2021

Previously, after a node reboot, Cilium would allocate a new router IP
and append it to the slice of node IPs. Since the node IPs have already
been synced to the K8s resource, meaning there are already IPs present
(from the previous Cilium instance), the router IP is appended to the
slice. Other parts of Cilium assume that the router IP is the first node
IP (the first element of the slice). Since the new router IP has been
appended to the end, it is no longer where it is expected, i.e. no
longer the first element. This causes a mismatch over which router IP is
to be used. There should only ever be one router IP (one IPv4 or one
IPv6).

In the case of a node reboot, the router IPs cannot be restored because
they are wiped away due to the Cilium state dir being mounted as a
tmpfs [1]. This commit fixes this by restoring the router IPs from the
K8s resource (Node or CiliumNode) if they are present in the
annotations. This prevents the possibility of having more than one
router IP, as described above. Note that router IPs from the K8s
resource are only restored if no router IP was found on the filesystem,
which is considered the source of truth. In other words, the filesystem
takes precedence over the K8s resource. The user is warned in cases of a
mismatch between the two sources. We also check that the IP to be
restored is within the pod / node CIDR range; otherwise it is ignored
during restoration.

[1]: Linux distributions mount /run as tmpfs and Cilium's default state
directory is created under /run. (It's worth mentioning that it's
also common for /var/run to be symlinked to /run.)

Fixes: #16279

Fix bug where Cilium allocates a new router (`cilium_host`) IP upon node reboot, breaking connectivity especially with IPsec

@christarazi christarazi added sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. kind/bug This is a bug in the Cilium logic. area/daemon Impacts operation of the Cilium daemon. needs-backport/1.10 labels May 25, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 25, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.8 May 25, 2021
@christarazi christarazi added the release-note/bug This PR fixes an issue in a previous release of Cilium. label May 25, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 25, 2021

@nathanjsweet nathanjsweet left a comment


I can't think of anything important offhand, but I guess there is the possibility that some kind of fast restart with a swapped NIC in a cloud provider could turn this into a race condition. However, that's very much an edge case.


@brb brb left a comment


What's going to happen if the podCIDR changes? Will it still try to restore a potentially obsolete router IP?

@christarazi

christarazi commented May 26, 2021

What's going to happen if the podCIDR changes? Will it still try to restore a potentially obsolete router IP?

@brb Do we support pod CIDRs changing anyway?

Regardless of that, I think we need to determine the priority of the router IPs that we restore. I'm assuming the local filesystem is the most up-to-date source of truth. One approach would be to load the router IPs from the K8s resource, as this PR already does, and then read the filesystem state. If the filesystem state also contains a router IP, compare it against the one from the K8s resource. On a mismatch, we should warn loudly, and we will probably want to switch over to the filesystem copy anyway.

With a system like the above in place, we aren't actually changing the behavior of Cilium by much. The only new behavior is the case where the router IP doesn't exist on the filesystem but does exist in the K8s resource, which occurs on a node reboot.

Edit: Implemented the above

@christarazi christarazi force-pushed the pr/christarazi/restore-router-ips-k8s branch 4 times, most recently from 5f6fd2b to 0dc6e98 Compare May 26, 2021 19:18
@christarazi christarazi marked this pull request as ready for review May 26, 2021 19:18
@christarazi christarazi requested a review from a team as a code owner May 26, 2021 19:18
@christarazi christarazi requested review from a team and nathanjsweet May 26, 2021 19:18
@christarazi christarazi changed the title k8s, node: Restore router IPs (cilium host) from K8s resource k8s, node: Restore router IPs (cilium_host) from K8s resource May 26, 2021
@christarazi

User confirmed that this fixes their issue: #16279 (comment)

@christarazi christarazi requested a review from a team May 26, 2021 19:19
@christarazi

christarazi commented May 26, 2021

test-me-please

Edit: Legit failures, fixed in the below push

@christarazi christarazi force-pushed the pr/christarazi/restore-router-ips-k8s branch 2 times, most recently from d2b80f6 to 77bf5af Compare May 27, 2021 18:39
@christarazi

test-me-please

@aanm aanm added this to Needs backport from master in 1.9.9 May 27, 2021
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/restore-router-ips-k8s branch from c9af154 to 68cae4c Compare June 1, 2021 23:54
@christarazi

christarazi commented Jun 1, 2021

Pushed a small change that shouldn't require a re-run. CI has passed already.

@aanm aanm merged commit beb8bde into cilium:master Jun 4, 2021
@christarazi christarazi deleted the pr/christarazi/restore-router-ips-k8s branch June 4, 2021 16:45
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.9 Jun 16, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.9 Jul 12, 2021
Labels
area/daemon Impacts operation of the Cilium daemon. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
Projects
No open projects
1.9.9
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

Connectivity issues with IPsec after node reboot
7 participants