
Fix missed deletion events when reconnecting to/disconnecting from remote clusters #24924

Closed

Conversation

@giorio94 (Member) commented Apr 17, 2023

Currently, restarting the connection to remote clusters leaves a time window in which deletion events can be missed and never recovered once the new watchers are started. Similarly, the status is not properly cleaned up when removing a remote cluster from the clustermesh configuration. This PR fixes these two issues.

Please, refer to the individual commits for more details.

I've currently marked this PR for backport to the current stable version (i.e., v1.13). I've also marked it as backport/author due to the conflicts concerning the IPCache modifications from #22935, which is not present in v1.13 and earlier.

Fixes: #24740


@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. sig/kvstore Impacts the KVStore package interactions. labels Apr 17, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events branch 3 times, most recently from a482685 to 7c063a1 Compare April 18, 2023 07:58
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events branch 2 times, most recently from df0b6e6 to 9a687c5 Compare April 18, 2023 16:41
@giorio94 (Member Author):

/test

@giorio94 (Member Author) commented Apr 19, 2023

/test-1.16-4.19

This test did not run since the main branch was renamed at the same time.

@giorio94 (Member Author) commented Apr 19, 2023

/test-1.16-4.19

Failed during VM boot with different instances of:

W: Failed to fetch http://us.archive.ubuntu.com/ubuntu/dists/focal-updates/... Cannot initiate the connection to us.archive.ubuntu.com:80 (2001:67c:1562::15). - connect (101: Network is unreachable) Cannot initiate the connection to us.archive.ubuntu.com:80 (2001:67c:1562::18). - connect (101: Network is unreachable) [IP: 91.189.91.39 80]

@giorio94 giorio94 marked this pull request as ready for review April 19, 2023 07:15
@giorio94 giorio94 requested review from a team as code owners April 19, 2023 07:15
@giorio94 giorio94 added backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 19, 2023
@giorio94 (Member Author):

k8s-1.16-kernel-4.19 hit known flake #24697

pkg/allocator/allocator.go (review thread, outdated, resolved)
pkg/allocator/allocator.go (review thread, resolved)
@aanm (Member) left a comment:

The commits "watcher cache: accept string rather than []byte" and "watcher: store scoped logger as part of the watcher" don't seem related to the PR


select {
case <-ctx.Done():
	scopedLog.Warning("Context canceled before remote KVStore watcher synchronization completed")
Reviewer (Member):

This is a warning message, is there anything the user can do about it?

@giorio94 (Member Author):

Changed to Info level: #24924 (comment)

pkg/allocator/allocator.go (review thread, outdated, resolved)
// are not present in the current cache (if any). This ensures we do not
// leak any stale identity, and at the same time we do not invalidate the
// current state.
rc.cache.drainIf(func(id idpool.ID) bool {
Reviewer (Member):

Will this cause a.backend.UpdateKey to be called?

@giorio94 (Member Author):

It doesn't, because MasterKeyProtection is not enabled for remote kvstores (as they are read-only). The only effective action is c.sendEvent(kvstore.EventTypeDelete, id, key), but I preferred to directly call onDeleteLocked for consistency with standard operations (let me know if you prefer I switch to that).
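For illustration, here is a minimal, self-contained sketch of the drainIf pattern discussed above (the types and names are hypothetical stand-ins, not the actual Cilium implementation): draining only emits local deletion events and never writes back to the backend.

```go
package main

import "fmt"

// ID stands in for idpool.ID; event mirrors a delete notification.
type ID int64

type event struct {
	typ string
	id  ID
	key string
}

// cache is a minimal stand-in for the allocator's remote cache.
type cache struct {
	entries map[ID]string
	events  chan event
}

// drainIf emits a local deletion event for every cached ID matching
// the predicate and removes it from the cache. No backend write
// (such as UpdateKey) is ever performed.
func (c *cache) drainIf(pred func(ID) bool) {
	for id, key := range c.entries {
		if pred(id) {
			delete(c.entries, id)
			// Notify subscribers of the deletion; the PR calls
			// onDeleteLocked here for consistency instead.
			c.events <- event{typ: "delete", id: id, key: key}
		}
	}
}

func main() {
	c := &cache{
		entries: map[ID]string{1: "a", 2: "b", 3: "c"},
		events:  make(chan event, 8),
	}
	stillValid := map[ID]bool{2: true} // IDs present in the new cache
	c.drainIf(func(id ID) bool { return !stillValid[id] })
	close(c.events)
	for ev := range c.events {
		fmt.Printf("%s id=%d key=%s\n", ev.typ, ev.id, ev.key)
	}
}
```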

// are added/removed in the meanwhile).
scopedLog.Info("Another KVStore watcher was already registered: deleting stale identities")
rc.cache.mutex.RLock()
old.cache.drainIf(func(id idpool.ID) bool {
Reviewer (Member):

Will this cause a.backend.UpdateKey to be called?

@giorio94 (Member Author):

It doesn't, because MasterKeyProtection is not enabled for remote kvstores (as they are read-only). The only effective action is c.sendEvent(kvstore.EventTypeDelete, id, key), but I preferred to directly call onDeleteLocked for consistency with standard operations (let me know if you prefer I switch to that).

@@ -200,6 +200,19 @@ func (c *cache) stop() {
	c.stopWatchWg.Wait()
}

// drain emits a deletion event for all known IDs. It must be called after the
// cache has been stopped, to ensure that no new events can be received afterwards.
func (c *cache) drain() {
Reviewer (Member):

What happens if the remote KVStore reconnects back while we are draining the identities?

@giorio94 (Member Author):

Before starting to drain the identities, we wait for all incoming events to be processed and for the watch stream to be interrupted (through the waitgroup), ensuring that no new events can be received afterwards. Additionally, all reconnection operations are performed through a per-cluster controller, so they are guaranteed to always be executed by the same goroutine.
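As a rough, self-contained sketch of this stop-then-drain ordering (hypothetical names, not the actual Cilium code): the watch goroutine is cancelled and awaited via a WaitGroup before any deletion events are emitted, so no new event can interleave with the drain.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// watcher tracks keys observed on the remote kvstore connection.
type watcher struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
	known  map[string]struct{}
}

func (w *watcher) start(ctx context.Context, updates <-chan string) {
	ctx, w.cancel = context.WithCancel(ctx)
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		for {
			select {
			case key := <-updates:
				w.known[key] = struct{}{}
			case <-ctx.Done():
				return // watch stream interrupted
			}
		}
	}()
}

// stopAndDrain first stops the watch loop and waits for it to exit;
// only then does it emit the deletion events, so a reconnection cannot
// deliver new events while the drain is in progress.
func (w *watcher) stopAndDrain() {
	w.cancel()
	w.wg.Wait()
	for key := range w.known {
		delete(w.known, key)
		fmt.Println("delete event for", key)
	}
}

func main() {
	w := &watcher{known: map[string]struct{}{
		"identity/1000": {}, "identity/1001": {},
	}}
	w.start(context.Background(), make(chan string))
	w.stopAndDrain()
}
```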

@@ -21,6 +24,8 @@ const (
	EventTypeDelete
	//EventTypeListDone signals that the initial list operation has completed
	EventTypeListDone
	// EventTypeDrainDone signals that the RestartableWatcher has been drained completely
	EventTypeDrainDone
Reviewer (Member):

Ah, I guess this is what will help to avoid draining identities on reconnects

@giorio94 (Member Author):

This event is used for the ipcache: if the ClusterID changes, we need to drain all old identities associated with the old ClusterID and insert the new ones with the new ClusterID. Hence, when triggering an explicit drain operation, the previous watcher is first stopped, all known entries are drained, and then the DrainDone event is emitted; from that point on, all events come from the next watcher. The DrainDone event is thus used to swap the ClusterID in use.

When the RestartableWatcher is just restarted normally (i.e., through Wrap), a deletion event for all stale entries is instead triggered immediately before the ListDone event, and no DrainDone event is emitted (since the restart is transparent in that case).
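A minimal sketch of how a consumer could swap the ClusterID on a DrainDone event, per the explanation above (the consumer logic is illustrative; only the event ordering mirrors the PR):

```go
package main

import "fmt"

type eventType int

const (
	eventUpsert eventType = iota
	eventDelete
	eventListDone
	eventDrainDone // previous watcher fully drained
)

type event struct {
	typ eventType
	key string
}

// consume applies the current ClusterID to each event and switches to
// the next ClusterID only once DrainDone guarantees that all entries
// of the previous watcher have been deleted.
func consume(events <-chan event, clusterIDs <-chan uint32) {
	clusterID := <-clusterIDs
	for ev := range events {
		switch ev.typ {
		case eventDrainDone:
			clusterID = <-clusterIDs
		case eventUpsert:
			fmt.Printf("upsert %s with ClusterID %d\n", ev.key, clusterID)
		case eventDelete:
			fmt.Printf("delete %s with ClusterID %d\n", ev.key, clusterID)
		}
	}
}

func main() {
	events := make(chan event, 8)
	ids := make(chan uint32, 2)
	ids <- 1 // ClusterID before the change
	ids <- 2 // ClusterID after the change
	events <- event{eventUpsert, "10.0.0.1"}
	events <- event{eventDelete, "10.0.0.1"} // old entries drained
	events <- event{eventDrainDone, ""}
	events <- event{eventUpsert, "10.0.0.1"} // re-observed with the new ID
	close(events)
	consume(events, ids)
}
```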

@giorio94 (Member Author):

> The commits "watcher cache: accept string rather than []byte" and "watcher: store scoped logger as part of the watcher" don't seem related to the PR

They are preparatory commits for "kvstore: implement the RestartableWatcher wrapper":

  • watcher cache: accept string rather than []byte: it reduces the number of type conversions that need to be performed, since the watcher cache is now also used by the RestartableWatcher, which already sees all keys as strings (see the sketch after this list). I can drop it at the cost of a few more casts from []byte to string and vice versa.
  • watcher: store scoped logger as part of the watcher: it is required to expose a scoped logger that is then used by the RestartableWatcher to provide more contextual debug messages (since it otherwise has no info about the backend connection or watched paths).
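A tiny illustration of the conversion argument (hypothetical, not the actual Cilium watcherCache): with string keys, the []byte received from the kvstore is converted once at the boundary, instead of at every map operation.

```go
package main

import "fmt"

// watcherCache keyed by string: lookups and inserts need no conversion.
type watcherCache map[string]struct{}

func (wc watcherCache) markSeen(key string) { wc[key] = struct{}{} }

func main() {
	wc := watcherCache{}
	raw := []byte("cilium/state/nodes/v1/foo") // key as received from the kvstore
	wc.markSeen(string(raw))                   // single conversion at the boundary
	fmt.Println(len(wc))                       // 1
}
```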

bfe7b88 ("clustermesh: correctly remove remoteCache on connection
disruption") modified the handling of remote identities upon kvstore
reconnection to fix a memory leak. Still, it didn't address the
possibility that identities may be deleted in the time window between
the termination of the previous connection and the establishment of a
new one. If this happens, the stale identities will be removed when the
map entry gets overwritten, but the corresponding deletion event will
never be propagated to the rest of the agent processing logic.

This commit fixes this issue by draining all stale identities once the
new cache for remote identities has been initialized correctly. As a
side effect, this makes `allocator.WatchRemoteIdentities` wait
synchronously until all objects are listed for the first time, or the
context is canceled (which also reflects the current behavior of
JoinSharedStore).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
bfe7b88 ("clustermesh: correctly remove remoteCache on
connection disruption") modified the handling of remote identities
when a remote cluster is disconnected, removing the corresponding
cache from the global allocator map. Still, it didn't emit a deletion
event to propagate this information to the rest of the agent processing
logic. This commit implements the missing logic to drain all remote
identities when a remote kvstore is unregistered.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The watcherCache is currently implemented using a map with keys of type
string, but the different methods accept []byte arguments. Given that
this cache will also be used in subsequent commits to implement the
RestartableWatcher, and that the keys are actually of type string, let's
switch the methods to accept strings, to reduce the number of type
conversions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's store the scoped logger (i.e., including the backend and watcher
information) as part of the watcher struct, so that it can also be used
by the RestartableWatcher wrapper introduced in the subsequent commits.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
RestartableWatcher implements a wrapper around a KVStore watcher,
automatically handling the generation of deletion events for stale keys
during reconnections. It will be leveraged in subsequent commits to
prevent deletion events from being missed when reconnecting to remote
clusters.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
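The core idea can be sketched as follows (a hypothetical, self-contained illustration of the technique, not the actual RestartableWatcher implementation): remember the keys seen by the previous watcher and, once the new watcher's initial list completes, emit a deletion event for every key that was not re-observed.

```go
package main

import "fmt"

// restartableWatcher remembers the keys seen by the previous watcher
// and compares them against the keys re-observed after a restart.
type restartableWatcher struct {
	known map[string]struct{} // keys from the previous connection
	seen  map[string]struct{} // keys re-observed since the restart
}

func (w *restartableWatcher) upsert(key string) {
	w.seen[key] = struct{}{}
	fmt.Println("upsert", key)
}

// listDone runs when the new watcher's initial list completes: any
// previously known key that was not re-listed was deleted while the
// connection was down, so a deletion event is emitted for it.
func (w *restartableWatcher) listDone() {
	for key := range w.known {
		if _, ok := w.seen[key]; !ok {
			fmt.Println("delete (stale)", key)
		}
	}
	w.known, w.seen = w.seen, map[string]struct{}{}
}

func main() {
	w := &restartableWatcher{
		known: map[string]struct{}{"nodes/a": {}, "nodes/b": {}},
		seen:  map[string]struct{}{},
	}
	// After reconnecting, only nodes/a is re-listed...
	w.upsert("nodes/a")
	// ...so listDone emits a deletion event for nodes/b.
	w.listDone()
}
```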
Currently, a new ipcache watcher is created for each connection to the
kvstore backend of a remote cluster, and the old one is dropped. Yet,
this approach misses the deletion events for entries that are removed
in the remote cluster during the reconnection process (added entries
would instead be detected by the initial "list" operation).

This commit fixes this issue by moving to a single ipcache watcher per
remote cluster, which is reused upon reconnection. Specifically, it now
leverages the newly introduced RestartableWatcher wrapper to
automatically trigger a deletion event for all previously known entries
that are no longer present after a reconnection. Special handling is
performed when the ClusterID associated with the watcher changes: all
previously known entries are drained as no longer valid (even if still
present, they need to be observed again with the new ClusterID).

Additionally, all ipcache entries are drained when disconnecting from a
remote cluster, to properly clean up the state without requiring an
agent restart.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the current SharedStore implementation, enabling
re-establishment of a connection to a different kvstore backend while
preserving the list of known local/shared keys, and transparently
handling the emission of a deletion event for the shared keys that have
been removed during the reconnection (leveraging the RestartableWatcher
wrapper). Additionally, the Release and Close operations are modified to
synchronously wait for the termination of the synchronization loop
before exiting.

These new functionalities will be used in the subsequent commit to properly
handle the deletion of stale nodes and services upon reconnection to the
kvstore backend of a remote cluster.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, a new shared store synchronizing node and service information
is created for each connection to the kvstore backend of a remote
cluster, and the old one is dropped. Yet, this approach misses the
deletion events for entries that are removed in the remote cluster
during the reconnection process (added entries would instead be detected
by the initial "list" operation).

This commit fixes this issue by using a fixed instance of the node and
service stores per remote cluster, which is reused upon reconnection,
while transparently handling the emission of the appropriate deletion
events for all keys no longer present.

Additionally, all keys are drained when disconnecting from a remote
cluster, to properly clean up the state without requiring an agent
restart.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events branch from 9a687c5 to 4a26d50 Compare April 20, 2023 14:58
@giorio94 (Member Author) commented May 4, 2023

The current implementation is affected by a possible issue: the completion of the relist operation alone is not enough to guarantee that we are up-to-date (indeed, the clustermesh-apiserver might not yet have completed the synchronization from k8s CRs to the kvstore). Converting back to draft while working on that.

@giorio94 giorio94 marked this pull request as draft May 4, 2023 13:17
@github-actions bot commented Jun 4, 2023

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jun 4, 2023
@giorio94 (Member Author) commented Jun 5, 2023

Closing in favor of #25499, #25675 and #25677

@giorio94 giorio94 closed this Jun 5, 2023
@giorio94 giorio94 removed backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 16, 2024
Successfully merging this pull request may close these issues: clustermesh: deletion events can be missed when restarting the connection to remote etcd