-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
clustermesh: delete stale node/service entries on reconnect/disconnect
Currently, a new shared store synchronizing node and service information is created for each connection to the kvstore backend of a remote cluster, and the old one is dropped. Yet, this approach leads to missing the deletion event for possible entries that are removed in the remote cluster during the reconnection process (added entries would instead be detected by the initial "list" operation). This commit fixes this issue using a fixed instance of node and service stores per remote cluster, which are reused upon reconnection, while transparently handling the emission of the appropriate deletion events for all keys no longer present. To prevent reading an incomplete state when watching an ephemeral kvstore instance that has not yet been completely synchronized, the watch operation is started only once the sync canary for the given prefix is present, if support is enabled. In case the sync canary support is not enabled (e.g., because the remote cluster is running an older version of the clustermesh-apiserver), there is the possibility that non-stale keys are temporarily removed upon reconnection, causing brief connectivity disruptions. Yet, this is not different from what already happens today if the agent is restarted when the remote kvstore has not yet been fully synchronized. Additionally, this is not an issue when the remote kvstore is backed by persistent storage, since it is already synchronized. Alternatively, we might disable the deletion of stale entries if sync canaries are not supported, at the cost of leaking those entries until the agent is restarted. This behavior will be further detailed updating the release notes in a subsequent commit (it can be prevented upgrading first all clustermesh-apiservers and then the agents) after addressing the same problem affecting the ipcache and identity entries. Additionally, all keys are drained when disconnecting from a remote cluster, to properly clean-up the status without requiring to restart the agent. Differently, they are not drained when simply shutting down, to avoid breaking existing connections on restart. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
- Loading branch information
Showing
3 changed files
with
74 additions
and
61 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters