
Fix missed deletion events when reconnecting to/disconnecting from remote clusters #24924

Closed

Conversation

@giorio94 (Member) commented Apr 17, 2023

Currently, restarting the connection to remote clusters leaves a time window in which deletion events can be missed and never recovered once the new watchers are started. Similarly, the status is not properly cleaned up when removing a remote cluster from the clustermesh configuration. This PR fixes these two issues.

Please, refer to the individual commits for more details.

I've currently marked this PR for backport to the current stable version (i.e., v1.13). I've also marked it as backport/author due to the conflicts concerning the IPCache modifications from #22935, which is not present in v1.13 and earlier.

Fixes: #24740


@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. sig/kvstore Impacts the KVStore package interactions. labels Apr 17, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events branch 3 times, most recently from a482685 to 7c063a1 Compare April 18, 2023 07:58
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events branch 2 times, most recently from df0b6e6 to 9a687c5 Compare April 18, 2023 16:41
@giorio94 (Member Author):

/test

@giorio94 (Member Author) commented Apr 19, 2023

/test-1.16-4.19

This test did not run since the main branch was renamed at the same time.

@giorio94 (Member Author) commented Apr 19, 2023

/test-1.16-4.19

Failed during VM boot with different instances of:

W: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/focal-updates/... Cannot initiate the connection to us.archive.ubuntu.com:80 (2001:67c:1562::15). - connect (101: Network is unreachable) Cannot initiate the connection to us.archive.ubuntu.com:80 (2001:67c:1562::18). - connect (101: Network is unreachable) [IP: 91.189.91.39 80]

@giorio94 giorio94 marked this pull request as ready for review April 19, 2023 07:15
@giorio94 giorio94 requested review from a team as code owners April 19, 2023 07:15
@giorio94 giorio94 added backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 19, 2023
@giorio94 (Member Author):

k8s-1.16-kernel-4.19 hit known flake #24697

pkg/allocator/allocator.go (review thread, outdated, resolved)
pkg/allocator/allocator.go (review thread, resolved)
@aanm (Member) left a comment:

The commits "watcher cache: accept string rather than []byte" and "watcher: store scoped logger as part of the watcher" don't seem related to the PR


select {
case <-ctx.Done():
	scopedLog.Warning("Context canceled before remote KVStore watcher synchronization completed")
Reviewer (Member):

This is a warning message, is there anything the user can do about it?

@giorio94 (Member Author):

Changed to Info level: #24924 (comment)

pkg/allocator/allocator.go (review thread, outdated, resolved)
// are not present in the current cache (if any). This ensures we do not
// leak any stale identity, and at the same time we do not invalidate the
// current state.
rc.cache.drainIf(func(id idpool.ID) bool {
Reviewer (Member):

Will this cause a.backend.UpdateKey to be called?

@giorio94 (Member Author):

It doesn't, because MasterKeyProtection is not enabled for remote kvstores (as they are read-only). The only effective action is c.sendEvent(kvstore.EventTypeDelete, id, key), but I preferred to directly call onDeleteLocked for consistency with standard operations (let me know if you prefer I switch to that).
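For illustration, here is a minimal, self-contained sketch of the drainIf pattern discussed above (the types and names are hypothetical stand-ins, not the actual Cilium implementation): draining only emits local deletion events and never writes back to the backend.

```go
package main

import "fmt"

// ID stands in for idpool.ID; event mirrors a delete notification.
type ID int64

type event struct {
	typ string
	id  ID
	key string
}

// cache is a minimal stand-in for the allocator's remote cache.
type cache struct {
	entries map[ID]string
	events  chan event
}

// drainIf emits a local deletion event for every cached ID matching
// the predicate and removes it from the cache. No backend write
// (such as UpdateKey) is ever performed.
func (c *cache) drainIf(pred func(ID) bool) {
	for id, key := range c.entries {
		if pred(id) {
			delete(c.entries, id)
			// Notify subscribers of the deletion; the PR calls
			// onDeleteLocked here for consistency instead.
			c.events <- event{typ: "delete", id: id, key: key}
		}
	}
}

func main() {
	c := &cache{
		entries: map[ID]string{1: "a", 2: "b", 3: "c"},
		events:  make(chan event, 8),
	}
	stillValid := map[ID]bool{2: true} // IDs present in the new cache
	c.drainIf(func(id ID) bool { return !stillValid[id] })
	close(c.events)
	for ev := range c.events {
		fmt.Printf("%s id=%d key=%s\n", ev.typ, ev.id, ev.key)
	}
}
```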

// are added/removed in the meanwhile).
scopedLog.Info("Another KVStore watcher was already registered: deleting stale identities")
rc.cache.mutex.RLock()
old.cache.drainIf(func(id idpool.ID) bool {
Reviewer (Member):

Will this cause a.backend.UpdateKey to be called?

@giorio94 (Member Author):

It doesn't, because MasterKeyProtection is not enabled for remote kvstores (as they are read-only). The only effective action is c.sendEvent(kvstore.EventTypeDelete, id, key), but I preferred to directly call onDeleteLocked for consistency with standard operations (let me know if you prefer I switch to that).

@@ -200,6 +200,19 @@ func (c *cache) stop() {
	c.stopWatchWg.Wait()
}

// drain emits a deletion event for all known IDs. It must be called after the
// cache has been stopped, to ensure that no new events can be received afterwards.
func (c *cache) drain() {
Reviewer (Member):

What happens if the remote KVStore reconnects back while we are draining the identities?

@giorio94 (Member Author):

Before starting to drain the identities, we wait for all incoming events to be processed and for the watch stream to be interrupted (through the waitgroup), ensuring that no new events can be received afterwards. Additionally, all reconnection operations are performed through a per-cluster controller, so they are guaranteed to always be executed by the same goroutine.
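As a rough, self-contained sketch of this stop-then-drain ordering (hypothetical names, not the actual Cilium code): the watch goroutine is cancelled and awaited via a WaitGroup before any deletion events are emitted, so no new event can interleave with the drain.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// watcher tracks keys observed on the remote kvstore connection.
type watcher struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
	known  map[string]struct{}
}

func (w *watcher) start(ctx context.Context, updates <-chan string) {
	ctx, w.cancel = context.WithCancel(ctx)
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		for {
			select {
			case key := <-updates:
				w.known[key] = struct{}{}
			case <-ctx.Done():
				return // watch stream interrupted
			}
		}
	}()
}

// stopAndDrain first stops the watch loop and waits for it to exit;
// only then does it emit the deletion events, so a reconnection cannot
// deliver new events while the drain is in progress.
func (w *watcher) stopAndDrain() {
	w.cancel()
	w.wg.Wait()
	for key := range w.known {
		delete(w.known, key)
		fmt.Println("delete event for", key)
	}
}

func main() {
	w := &watcher{known: map[string]struct{}{
		"identity/1000": {}, "identity/1001": {},
	}}
	w.start(context.Background(), make(chan string))
	w.stopAndDrain()
}
```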

@@ -21,6 +24,8 @@ const (
	EventTypeDelete
	//EventTypeListDone signals that the initial list operation has completed
	EventTypeListDone
	// EventTypeDrainDone signals that the RestartableWatcher has been drained completely
	EventTypeDrainDone
Reviewer (Member):

Ah, I guess this is what will help to avoid draining identities on reconnects

@giorio94 (Member Author):

This event is used for the ipcache: if the ClusterID changes, we need to drain all old identities associated with the old ClusterID and insert the new ones with the new ClusterID. Hence, when triggering an explicit drain operation, the previous watcher is first stopped, all known entries are drained, and then the DrainDone event is emitted; from that point on, all events come from the next watcher. The DrainDone event is thus used to swap the ClusterID in use.

When the RestartableWatcher is just restarted normally (i.e., through Wrap), a deletion event for all stale entries is instead triggered immediately before the ListDone event, and no DrainDone event is emitted (since the restart is transparent in that case).
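A minimal sketch of how a consumer could swap the ClusterID on a DrainDone event, per the explanation above (the consumer logic is illustrative; only the event ordering mirrors the PR):

```go
package main

import "fmt"

type eventType int

const (
	eventUpsert eventType = iota
	eventDelete
	eventListDone
	eventDrainDone // previous watcher fully drained
)

type event struct {
	typ eventType
	key string
}

// consume applies the current ClusterID to each event and switches to
// the next ClusterID only once DrainDone guarantees that all entries
// of the previous watcher have been deleted.
func consume(events <-chan event, clusterIDs <-chan uint32) {
	clusterID := <-clusterIDs
	for ev := range events {
		switch ev.typ {
		case eventDrainDone:
			clusterID = <-clusterIDs
		case eventUpsert:
			fmt.Printf("upsert %s with ClusterID %d\n", ev.key, clusterID)
		case eventDelete:
			fmt.Printf("delete %s with ClusterID %d\n", ev.key, clusterID)
		}
	}
}

func main() {
	events := make(chan event, 8)
	ids := make(chan uint32, 2)
	ids <- 1 // ClusterID before the change
	ids <- 2 // ClusterID after the change
	events <- event{eventUpsert, "10.0.0.1"}
	events <- event{eventDelete, "10.0.0.1"} // old entries drained
	events <- event{eventDrainDone, ""}
	events <- event{eventUpsert, "10.0.0.1"} // re-observed with the new ID
	close(events)
	consume(events, ids)
}
```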

@giorio94 (Member Author):

> The commits "watcher cache: accept string rather than []byte" and "watcher: store scoped logger as part of the watcher" don't seem related to the PR

They are preparatory commits for "kvstore: implement the RestartableWatcher wrapper":

  • watcher cache: accept string rather than []byte: it reduces the number of type conversions that need to be performed, since the watcher cache is now also used by the RestartableWatcher, which already sees all keys as strings (see the sketch after this list). I can drop it at the cost of a few more casts from []byte to string and vice versa.
  • watcher: store scoped logger as part of the watcher: it is required to expose a scoped logger that is then used by the RestartableWatcher to provide more contextual debug messages (since it otherwise has no info about the backend connection or watched paths).
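A tiny illustration of the conversion argument (hypothetical, not the actual Cilium watcherCache): with string keys, the []byte received from the kvstore is converted once at the boundary, instead of at every map operation.

```go
package main

import "fmt"

// watcherCache keyed by string: lookups and inserts need no conversion.
type watcherCache map[string]struct{}

func (wc watcherCache) markSeen(key string) { wc[key] = struct{}{} }

func main() {
	wc := watcherCache{}
	raw := []byte("cilium/state/nodes/v1/foo") // key as received from the kvstore
	wc.markSeen(string(raw))                   // single conversion at the boundary
	fmt.Println(len(wc))                       // 1
}
```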

bfe7b88 ("clustermesh: correctly remove remoteCache on connection
disruption") modified the handling of remote identities upon kvstore
reconnection to fix a memory leak. Still, it didn't address the
possibility that identities may be deleted in the time window between
the termination of the previous connection and the establishment of a
new one. If this happens, the stale identities will be removed when the
map entry gets overwritten, but the corresponding deletion event will
never be propagated to the rest of the agent processing logic.

This commit fixes this issue by draining all stale identities once the
new cache for remote identities has been initialized correctly. As a
side effect, this makes `allocator.WatchRemoteIdentities` wait
synchronously until all objects are listed for the first time, or the
context is canceled (which also reflects the current behavior of
JoinSharedStore).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
bfe7b88 ("clustermesh: correctly remove remoteCache on
connection disruption") modified the handling of remote identities
when a remote cluster is disconnected, removing the corresponding
cache from the global allocator map. Still, it didn't emit a deletion
event to propagate this information to the rest of the agent processing
logic. This commit implements the missing logic to drain all remote
identities when a remote kvstore is unregistered.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The watcherCache is currently implemented using a map with keys of type
string, but the different methods accept []byte arguments. Given that
this cache will also be used in subsequent commits to implement the
RestartableWatcher, and that the keys are actually of type string, let's
switch the methods to accept strings, to reduce the number of type
conversions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's store the scoped logger (i.e., including the backend and watcher
information) as part of the watcher struct, so that it can also be used
by the RestartableWatcher wrapper introduced in the subsequent commits.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
RestartableWatcher implements a wrapper around a KVStore watcher,
automatically handling the generation of deletion events for stale keys
during reconnections. It will be leveraged in subsequent commits to
prevent deletion events from being missed when reconnecting to remote
clusters.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
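The core idea can be sketched as follows (a hypothetical, self-contained illustration of the technique, not the actual RestartableWatcher implementation): remember the keys seen by the previous watcher and, once the new watcher's initial list completes, emit a deletion event for every key that was not re-observed.

```go
package main

import "fmt"

// restartableWatcher remembers the keys seen by the previous watcher
// and compares them against the keys re-observed after a restart.
type restartableWatcher struct {
	known map[string]struct{} // keys from the previous connection
	seen  map[string]struct{} // keys re-observed since the restart
}

func (w *restartableWatcher) upsert(key string) {
	w.seen[key] = struct{}{}
	fmt.Println("upsert", key)
}

// listDone runs when the new watcher's initial list completes: any
// previously known key that was not re-listed was deleted while the
// connection was down, so a deletion event is emitted for it.
func (w *restartableWatcher) listDone() {
	for key := range w.known {
		if _, ok := w.seen[key]; !ok {
			fmt.Println("delete (stale)", key)
		}
	}
	w.known, w.seen = w.seen, map[string]struct{}{}
}

func main() {
	w := &restartableWatcher{
		known: map[string]struct{}{"nodes/a": {}, "nodes/b": {}},
		seen:  map[string]struct{}{},
	}
	// After reconnecting, only nodes/a is re-listed...
	w.upsert("nodes/a")
	// ...so listDone emits a deletion event for nodes/b.
	w.listDone()
}
```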
Currently, a new ipcache watcher is created for each connection to the
kvstore backend of a remote cluster, and the old one is dropped. Yet,
this approach misses the deletion events for entries that are removed
in the remote cluster during the reconnection process (added entries
would instead be detected by the initial "list" operation).

This commit fixes this issue by moving to a single ipcache watcher per
remote cluster, which is reused upon reconnection. Specifically, it now
leverages the newly introduced RestartableWatcher wrapper to
automatically trigger a deletion event for all previously known entries
that are no longer present after a reconnection. Special handling is
performed when the ClusterID associated with the watcher changes: all
previously known entries are drained as no longer valid (even if still
present, they need to be observed again with the new ClusterID).

Additionally, all ipcache entries are drained when disconnecting from a
remote cluster, to properly clean up the state without requiring an
agent restart.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the current SharedStore implementation, enabling
re-establishment of a connection to a different kvstore backend while
preserving the list of known local/shared keys, and transparently
handling the emission of a deletion event for the shared keys that have
been removed during the reconnection (leveraging the RestartableWatcher
wrapper). Additionally, the Release and Close operations are modified to
synchronously wait for the termination of the synchronization loop
before exiting.

These new functionalities will be used in the subsequent commit to properly
handle the deletion of stale nodes and services upon reconnection to the
kvstore backend of a remote cluster.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, a new shared store synchronizing node and service information
is created for each connection to the kvstore backend of a remote
cluster, and the old one is dropped. Yet, this approach misses the
deletion events for entries that are removed in the remote cluster
during the reconnection process (added entries would instead be detected
by the initial "list" operation).

This commit fixes this issue by using a fixed instance of the node and
service stores per remote cluster, which is reused upon reconnection,
while transparently handling the emission of the appropriate deletion
events for all keys no longer present.

Additionally, all keys are drained when disconnecting from a remote
cluster, to properly clean up the state without requiring an agent
restart.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events branch from 9a687c5 to 4a26d50 Compare April 20, 2023 14:58
@giorio94 (Member Author) commented May 4, 2023

The current implementation is affected by a possible issue: the completion of the relist operation alone is not enough to guarantee that we are up-to-date (indeed, the clustermesh-apiserver might not yet have completed the synchronization from k8s CRs to the kvstore). Converting back to draft while working on that.

@giorio94 giorio94 marked this pull request as draft May 4, 2023 13:17
@github-actions bot commented Jun 4, 2023

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jun 4, 2023
@giorio94 (Member Author) commented Jun 5, 2023

Closing in favor of #25499, #25675 and #25677

@giorio94 giorio94 closed this Jun 5, 2023
@giorio94 giorio94 removed backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 16, 2024
Successfully merging this pull request may close these issues: clustermesh: deletion events can be missed when restarting the connection to remote etcd