Fix missed deletion events when reconnecting to/disconnecting from remote clusters #24924
Conversation
Force-pushed from a482685 to 7c063a1
Force-pushed from df0b6e6 to 9a687c5
/test
/test-1.16-4.19 This test did not run since the main branch was renamed at the same time.
/test-1.16-4.19 Failed during VM booting with different instances of:
k8s-1.16-kernel-4.19 hit known flake #24697
The commits "watcher cache: accept string rather than []byte" and "watcher: store scoped logger as part of the watcher" don't seem related to the PR.
pkg/allocator/allocator.go (outdated)

```
select {
case <-ctx.Done():
	scopedLog.Warning("Context canceled before remote KVStore watcher synchronization completed")
```
This is a warning message; is there anything the user can do about it?
Changed to Info level: #24924 (comment)
```
// are not present in the current cache (if any). This ensures we do not
// leak any stale identity, and at the same time we do not invalidate the
// current state.
rc.cache.drainIf(func(id idpool.ID) bool {
```
Will this cause a.backend.UpdateKey to be called?
It doesn't, because MasterKeyProtection is not enabled for remote kvstores (as they are read-only). The only effective action is c.sendEvent(kvstore.EventTypeDelete, id, key), but I preferred to call onDeleteLocked directly for consistency with standard operations (let me know if you prefer I switch to that).
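To make the above a bit more concrete, here is a minimal, self-contained sketch of the drainIf/onDeleteLocked interplay; apart from the names quoted in this thread (drainIf, onDeleteLocked, idpool.ID), the types, fields, and signatures are assumptions made purely for illustration:

```go
package main

import "fmt"

// ID stands in for idpool.ID from the Cilium tree (an integer identity).
type ID uint64

// cache is a pared-down stand-in for the remote identity cache discussed
// above; the nextCache field and the exact signatures are assumptions.
type cache struct {
	nextCache map[ID]string // identity ID -> key
}

// drainIf emits a deletion event for every cached identity matching the
// predicate, going through the same path as a regular deletion rather
// than sending a raw kvstore.EventTypeDelete.
func (c *cache) drainIf(predicate func(id ID) bool) {
	for id, key := range c.nextCache {
		if predicate(id) {
			c.onDeleteLocked(id, key)
		}
	}
}

func (c *cache) onDeleteLocked(id ID, key string) {
	delete(c.nextCache, id)
	fmt.Printf("deletion event: id=%d key=%q\n", id, key)
}

func main() {
	c := &cache{nextCache: map[ID]string{1: "stale-key", 2: "fresh-key"}}
	fresh := map[ID]bool{2: true}
	// Drain every identity that is not present in the freshly listed state.
	c.drainIf(func(id ID) bool { return !fresh[id] })
}
```

Routing the drain through the same deletion path as regular events keeps observers' bookkeeping consistent, whichever way an identity disappears.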
```
// are added/removed in the meanwhile).
scopedLog.Info("Another KVStore watcher was already registered: deleting stale identities")
rc.cache.mutex.RLock()
old.cache.drainIf(func(id idpool.ID) bool {
```
Will this cause a.backend.UpdateKey to be called?
It doesn't, because MasterKeyProtection is not enabled for remote kvstores (as they are read-only). The only effective action is c.sendEvent(kvstore.EventTypeDelete, id, key), but I preferred to call onDeleteLocked directly for consistency with standard operations (let me know if you prefer I switch to that).
```
@@ -200,6 +200,19 @@ func (c *cache) stop() {
	c.stopWatchWg.Wait()
}

// drain emits a deletion event for all known IDs. It must be called after the
// cache has been stopped, to ensure that no new events can be received afterwards.
func (c *cache) drain() {
```
What happens if the remote KVStore reconnects while we are draining the identities?
Before starting to drain the identities, we wait for all incoming events to have been processed and for the watch stream to be interrupted (through the waitgroup), to prevent any new events from being received afterwards. Additionally, all reconnection operations are performed through a per-cluster controller, hence we are guaranteed that they are always executed by the same goroutine.
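As a rough model of that ordering guarantee, the sketch below shows why a reconnection cannot interleave with the drain: stop() does not return until the event loop has exited. Only stopWatchWg mirrors a name from the diff above; everything else is invented for illustration:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// watcherCache is a toy model of the stop-then-drain ordering described
// above; only stopWatchWg mirrors a name from the actual diff.
type watcherCache struct {
	stopWatch   context.CancelFunc
	stopWatchWg sync.WaitGroup
	known       map[string]struct{}
}

func (w *watcherCache) start(ctx context.Context) {
	ctx, w.stopWatch = context.WithCancel(ctx)
	w.stopWatchWg.Add(1)
	go func() {
		defer w.stopWatchWg.Done()
		<-ctx.Done() // stand-in for the event-processing loop
	}()
}

// stop interrupts the watch stream and waits until the event loop has
// exited, guaranteeing that no further events can be delivered.
func (w *watcherCache) stop() {
	w.stopWatch()
	w.stopWatchWg.Wait()
}

// drain may only be called after stop(): the set of known keys is frozen
// by then, so a reconnection cannot race with the deletion events.
func (w *watcherCache) drain() {
	for key := range w.known {
		fmt.Println("deletion event for", key)
	}
}

func main() {
	w := &watcherCache{known: map[string]struct{}{"identity/1234": {}}}
	w.start(context.Background())
	w.stop()
	w.drain()
}
```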
```
@@ -21,6 +24,8 @@ const (
	EventTypeDelete
	// EventTypeListDone signals that the initial list operation has completed
	EventTypeListDone
	// EventTypeDrainDone signals that the RestartableWatcher has been drained completely
	EventTypeDrainDone
```
Ah, I guess this is what will help to avoid draining identities on reconnects
This event is used for the ipcache: in case the ClusterID changes, we need to drain all entries associated with the old ClusterID and insert the new ones with the new ClusterID. Hence, when triggering an explicit drain operation, the previous watcher is first stopped, all known entries are drained, and then the DrainDone event is emitted; at that point, all subsequent events will come from the next watcher. The DrainDone event is thus used to swap the ClusterID in use.
When just restarting the RestartableWatcher normally (i.e., through Wrap), instead, a deletion event for all stale entries is triggered immediately before the ListDone event, and no DrainDone event is emitted (since it is transparent in that case).
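A toy consumer illustrating that swap logic might look as follows; the event type names match the diff above, while everything else (the event struct, the ClusterID variables) is assumed for illustration:

```go
package main

import "fmt"

// Toy event model: the event type names mirror the diff above.
type EventType int

const (
	EventTypeUpsert EventType = iota
	EventTypeDelete
	EventTypeListDone
	EventTypeDrainDone
)

type Event struct {
	Typ EventType
	Key string
}

func main() {
	currentClusterID, nextClusterID := uint32(1), uint32(2)
	events := []Event{
		{EventTypeDelete, "10.0.0.1"}, // drained entry, old ClusterID
		{EventTypeDrainDone, ""},      // swap point
		{EventTypeUpsert, "10.0.0.1"}, // re-observed, new ClusterID
	}

	for _, ev := range events {
		switch ev.Typ {
		case EventTypeDrainDone:
			// All entries known under the old ClusterID have been
			// drained; subsequent events belong to the next watcher.
			currentClusterID = nextClusterID
		case EventTypeDelete:
			fmt.Printf("delete %s (ClusterID %d)\n", ev.Key, currentClusterID)
		default:
			fmt.Printf("upsert %s (ClusterID %d)\n", ev.Key, currentClusterID)
		}
	}
}
```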
They are preparatory commits for the RestartableWatcher wrapper introduced in the subsequent commits.
bfe7b88 ("clustermesh: correctly remove remoteCache on connection disruption") modified the handling of remote identities upon kvstore reconnection to fix a memory leak. Still, it didn't address the possibility that identities may be deleted in the time window between the termination of the previous connection and the establishment of a new one. If this happens, the stale identities will be removed when the map entry gets overwritten, but the corresponding deletion event will never be propagated to the rest of the agent processing logic. This commit fixes this issue draining all stale identities once the new cache for remote identities has initialized correctly. As a side effect, this makes `allocator.WatchRemoteIdentities` wait synchronously until all objects are listed for the first time, or the context is canceled (which also reflects the current behavior of JoinSharedStore). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
bfe7b88 ("clustermesh: correctly remove remoteCache on connection disruption") modified the handling of remote identities when a remote cluster is disconnected, removing the corresponding cache from the global allocator map. Still, it didn't emit a deletion event to propagate this information to the rest of the agent processing logic. This commit implements the missing logic to drain all remote identities when a remote kvstore is unregistered. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The watcherCache is currently implemented using a map with keys of type string, but the different methods accept []byte arguments. Given that this cache will also be used in the subsequent commits to implement the RestartableWatcher, and the keys are actually of type string, let's switch the arguments to string to reduce the number of type conversions. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's store the scoped logger (i.e., including the backend and watcher information) as part of the watcher struct, so that it can also be used by the RestartableWatcher wrapper introduced in the subsequent commits. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
RestartableWatcher implements a wrapper around a KVStore watcher, automatically handling the generation of deletion events for stale keys during reconnections. It will be leveraged in subsequent commits to prevent deletion events from being missed when reconnecting to remote clusters. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
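For a feel of what such a wrapper does on restart, here is a condensed, self-contained sketch under assumed names (only the RestartableWatcher concept itself comes from the commit): remember the keys seen before the restart, replay the fresh listing, and emit deletion events for whatever did not reappear:

```go
package main

import "fmt"

// restartableWatcher is a condensed stand-in for the wrapper described in
// the commit message; fields and methods are assumptions for illustration.
type restartableWatcher struct {
	known map[string]struct{} // keys observed through the previous watcher
}

// restart replays a fresh initial listing: keys that were known before but
// did not reappear are stale, and a deletion event is emitted for each of
// them before the list-done notification.
func (w *restartableWatcher) restart(freshList []string) {
	stale := make(map[string]struct{}, len(w.known))
	for k := range w.known {
		stale[k] = struct{}{}
	}

	for _, k := range freshList {
		delete(stale, k) // still present, hence not stale
		w.known[k] = struct{}{}
		fmt.Println("upsert event for", k)
	}

	for k := range stale {
		delete(w.known, k)
		fmt.Println("deletion event for stale key", k)
	}
	fmt.Println("list-done event")
}

func main() {
	w := &restartableWatcher{known: map[string]struct{}{
		"nodes/a": {}, "nodes/b": {},
	}}
	// "nodes/b" was deleted remotely while we were disconnected.
	w.restart([]string{"nodes/a", "nodes/c"})
}
```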
Currently, a new ipcache watcher is created for each connection to the kvstore backend of a remote cluster, and the old one is dropped. Yet, this approach leads to missing the deletion events for entries that may be removed in the remote cluster during the reconnection process (added entries would instead be detected by the initial "list" operation). This commit fixes this issue by moving to a single ipcache watcher per remote cluster, which is reused upon reconnection. Specifically, it now leverages the newly introduced RestartableWatcher wrapper to automatically trigger a deletion event for all previously known entries that are no longer present after a reconnection. Special handling is performed in case the ClusterID associated with the watcher changes, draining all previously known entries as no longer valid (even if still present, they need to be observed again with the new ClusterID). Additionally, all ipcache entries are drained when disconnecting from a remote cluster, to properly clean up the status without requiring an agent restart. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the current SharedStore implementation, enabling the possibility of re-establishing a new connection to a different kvstore backend while preserving the list of known local/shared keys, and transparently handling the emission of a deletion event for the shared keys that have been removed during reconnection (leveraging the RestartableWatcher wrapper). Additionally, the Release and Close operations are modified to synchronously wait for the termination of the synchronization loop before exiting. These new functionalities will be used in the subsequent commit to properly handle the deletion of stale nodes and services upon reconnection to the kvstore backend of a remote cluster. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, a new shared store synchronizing node and service information is created for each connection to the kvstore backend of a remote cluster, and the old one is dropped. Yet, this approach leads to missing the deletion events for entries that may be removed in the remote cluster during the reconnection process (added entries would instead be detected by the initial "list" operation). This commit fixes this issue by using a fixed instance of the node and service stores per remote cluster, which are reused upon reconnection, while transparently handling the emission of the appropriate deletion events for all keys no longer present. Additionally, all keys are drained when disconnecting from a remote cluster, to properly clean up the status without requiring an agent restart. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Force-pushed from 9a687c5 to 4a26d50
The current implementation is affected by a possible issue: the completion of the relist operation alone is not enough to make sure that we are up-to-date (indeed, the clustermesh-apiserver might not yet have completed the synchronization from the k8s CRs to the kvstore). Converting back to draft while working on that.
Currently, restarting the connection to remote clusters leaves a time window in which possible deletion events are missed, and never recovered once the new watchers are started. Similarly, the status is not properly cleaned-up when removing one remote cluster from the clustermesh configuration. This PR fixes these two issues.
Please, refer to the individual commits for more details.
I've currently marked this PR as backport for the current stable version (i.e., v1.13). I've also marked it as backport/author due to the conflicts concerning IPCache modifications with #22935, which was not present in v1.13 and earlier.

Fixes: #24740