Fix possible cross-cluster connection drops on agents restart when clustermesh is enabled #27575
Merged
borkmann
merged 7 commits into
cilium:main
from
giorio94:mio/fix-clustermesh-drops-on-restart
Aug 31, 2023
Conversation
giorio94
added
kind/bug
This is a bug in the Cilium logic.
release-note/bug
This PR fixes an issue in a previous release of Cilium.
area/clustermesh
Relates to multi-cluster routing functionality in Cilium.
labels
Aug 17, 2023
giorio94
changed the title
Fix possible connection drops on agents restart when clustermesh is enabled
Fix possible cross-cluster connection drops on agents restart when clustermesh is enabled
Aug 17, 2023
/test
giorio94
force-pushed
the
mio/fix-clustermesh-drops-on-restart
branch
from
August 22, 2023 13:11
28ed271
to
d4c61f7
Compare
/test
giorio94
added
the
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
label
Aug 22, 2023
Converting back to draft, as the remaining part of the fix for #26462 (when endpoints are selected by network policies) should be simple enough to be included here.
giorio94
force-pushed
the
mio/fix-clustermesh-drops-on-restart
branch
from
August 22, 2023 17:05
d4c61f7
to
67f8e99
Compare
/test
giorio94
force-pushed
the
mio/fix-clustermesh-drops-on-restart
branch
from
August 22, 2023 17:07
67f8e99
to
98f1dd5
Compare
/test
ti-mo
approved these changes
Aug 23, 2023
giorio94
force-pushed
the
mio/fix-clustermesh-drops-on-restart
branch
from
August 23, 2023 08:35
98f1dd5
to
0acd226
Compare
giorio94
added
the
backport/author
The backport will be carried out by the author of the PR.
label
Aug 23, 2023
Add a new IPIdentitiesSynced method to the clustermesh object to allow waiting until the initial list of ipcache entries and identities has been received from all known remote clusters. This mimics the ServicesSynced() method, and enables postponing disruptive operations to prevent dropping valid long-running connections on agent restart. Differently from the services case, we don't need to separately track the propagation of the single entries to the datapath, as it is performed synchronously. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Add a new NodesSynced method to the clustermesh object to allow waiting until the initial list of nodes has been received from all known remote clusters. This mimics the ServicesSynced() method, and enables postponing disruptive operations to prevent dropping valid long-running connections on agent restart. Differently from the services case, we don't need to separately track the propagation of the single entries to the datapath, as it is performed synchronously. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
A new ipcache map is recreated from scratch and populated by each cilium agent upon restart, and the existing endpoints are switched to use it during the regeneration process of the endpoint itself. Before performing this operation, the new ipcache map needs to have already been fully synchronized, otherwise we might drop long-running connections. Currently we wait for synchronization from k8s (when in CRD mode) and the kvstore (when in kvstore mode). Yet, when clustermesh is enabled, we don't wait for synchronization from all the remote clusters, possibly disrupting existing cross-cluster connections. The same also applies to identities in case network policies targeting the given endpoint are present. This commit introduces additional logic to wait for ipcache and identities synchronization from remote clusters before triggering endpoint regeneration. The wait period is bound by a user-configurable timeout, defaulting to one minute, the idea being that we don't block the local endpoints regeneration for an extended period of time if a given remote cluster is not ready. This also prevents possible interdependency problems, with the regeneration in one cluster being required to allow other clusters to successfully connect to it. Nonetheless, the timeout can be configured through a dedicated --clustermesh-ip-identities-sync-timeout flag, to select the desired trade-off between possible endpoints regeneration delay and likelihood of connection drops if a given remote cluster is not ready (e.g., because the corresponding clustermesh-apiserver is restarting). A special case is represented by the host endpoint if IPSec is enabled, because it is currently restored synchronously to ensure ordering. Given that blocking this operation prevents the agent from becoming ready, and in turn new pods from being scheduled, we forcefully cap the wait timeout to 5 seconds.
Empirically, this value has proved to be sufficient to retrieve a high number of entries if the remote kvstore is ready (e.g., the clustermesh-apiserver pod is running, and already synchronized with Kubernetes). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the WireGuard subsystem triggers the deletion of obsolete peers when the daemon restoration process has completed. Yet, at this point the information about nodes belonging to remote clusters might not yet have been received, causing the disruption of cross-cluster connections. Hence, let's first wait for the synchronization of all remote nodes before kicking off the removal process. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
giorio94
force-pushed
the
mio/fix-clustermesh-drops-on-restart
branch
from
August 30, 2023 17:39
3b69304
to
e0b482a
Compare
Rebased onto main to pick up CI changes
giorio94
removed
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Aug 30, 2023
/test
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Aug 31, 2023
giorio94
added
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
and removed
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
labels
Sep 1, 2023
maintainer-s-little-helper
bot
moved this from Needs backport from main
to Backport pending to v1.14
in 1.14.2
Sep 1, 2023
temporarily moving this back to
michi-covalent
added
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
and removed
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
labels
Sep 9, 2023
maintainer-s-little-helper
bot
moved this from Backport pending to v1.14
to Needs backport from main
in 1.14.2
Sep 9, 2023
michi-covalent
added
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
and removed
needs-backport/1.14
This PR / issue needs backporting to the v1.14 branch
labels
Sep 9, 2023
michi-covalent
moved this from Needs backport from main
to Backport pending to v1.14
in 1.14.3
Sep 9, 2023
giorio94
added
backport-done/1.14
The backport for Cilium 1.14.x for this PR is done.
and removed
backport-pending/1.14
The backport for Cilium 1.14.x for this PR is in progress.
labels
Sep 22, 2023
jrajahalme
moved this from Backport pending to v1.14
to Backport done to v1.14
in 1.14.3
Oct 18, 2023
This PR modifies the endpoint regeneration logic to additionally wait for remote clusters' ipcache synchronization before performing the datapath regeneration, to make sure that the new ipcache map starts to be used only once it has been completely populated.
A user-configurable timeout (defaulting to one minute) limits the maximum wait interval, to prevent blocking the local endpoints regeneration process for an extended period of time if a remote cluster is not ready (which could also lead to chicken-and-egg dependencies). A special case arises when IPSec is enabled, as it currently requires a synchronous regeneration of the host endpoint: here, we limit the timeout to at most 5 seconds, to not block the agent startup process.
In addition, when WireGuard is enabled, the deletion of obsolete peers is now performed after listing all nodes from remote clusters, again to prevent possible connectivity drops on agent restart.
Please refer to the individual commit messages for additional details.
Tested with #27232: https://github.com/cilium/cilium/actions/runs/5949436519
The downgrade test was based on the current tip of v1.14 with these changes on top; hence, I'm marking this PR for backport to v1.14 (and backport/author, given that I've already opened the PR for that).
Fixes: #26462