Fix missed deletion events when reconnecting to/disconnecting from remote clusters (nodes and services) #25499
Conversation
force-pushed from 6b78207 to 78f71b4
force-pushed from 11e39ee to e146c1e
/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed.
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/171/
If it is a flake and a GitHub issue doesn't already exist to track it, then please upload the Jenkins artifacts to that issue.
force-pushed from e146c1e to 9bb03d8
/test
force-pushed from 9bb03d8 to 0684e3b
// This ensures that the keys hosted under the given prefix have been
// successfully synchronized from the external source, even in case an
// ephemeral kvstore is used.
func NewWatchStoreManagerSync(backend WatchStoreBackend, clusterName string) WatchStoreManager {
Is a separate type necessary, or could you add the synchronization canary aspect in to the RestartableWatchStore?
I'm personally in favor of the separate type, which IMO provides better separation of concerns (it would transparently work with different implementations of the WatchStore interface as well) and makes it simpler to adopt alternative approaches than embedding the logic directly in the RestartableWatchStore. Additionally, it reduces the number of watchers compared to having each RestartableWatchStore wait for its own prefix.
d84b7f4
to
ef4d950
Compare
// when no watch operation is in progress.
func (rws *restartableWatchStore) Drain() {
	if rws.watching.Swap(true) {
		rws.log.Panic("Cannot drain the watch store while still running")
Small nit, should we panic? 🤔 Maybe yes...
The idea of being strict here was to easily catch possible misuses, since draining the store while actively watching a kvstore would lead to inconsistent behaviors (both due to the meaning of the operation itself and because the local state is not protected by a mutex, for the sake of simplicity).
The Key interface represents the basic operations to insert key-value pairs into a kvstore and retrieve them from it, mainly concerning marshalling and unmarshalling. Yet, unmarshalling was performed based on the content of the value only, which makes it impossible to reconstruct the full object when the key is not part of the value itself (e.g., for identity-label pairs). This commit extends the Unmarshal method to additionally take the key (stripped of the store prefix) as a parameter, and adapts the existing implementations to ignore it. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
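As a rough illustration of the extended signature (the type and key layout below are hypothetical, not Cilium's actual API), an Unmarshal that receives both the stripped key and the value can reconstruct an object whose identifier lives only in the key:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// IdentityLabels is an invented example of an object whose identifier
// (the numeric ID) is stored in the kvstore key, not in the value.
type IdentityLabels struct {
	ID     string
	Labels map[string]string
}

// Unmarshal rebuilds the full object from both the key (already stripped
// of the store prefix) and the value, mirroring the extended signature
// described above.
func (i *IdentityLabels) Unmarshal(key string, value []byte) error {
	if err := json.Unmarshal(value, &i.Labels); err != nil {
		return err
	}
	// The ID is only present in the key, e.g. "id/12345".
	i.ID = strings.TrimPrefix(key, "id/")
	return nil
}

func main() {
	var id IdentityLabels
	if err := id.Unmarshal("id/12345", []byte(`{"env":"prod"}`)); err != nil {
		panic(err)
	}
	fmt.Println(id.ID, id.Labels["env"]) // prints "12345 prod"
}
```

Existing implementations that do not need the key can simply accept and ignore the extra parameter.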
This commit introduces a new simple implementation of the LocalKey interface used to store plain key-value pairs and retrieve them from the kvstore. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit introduces the WatchStore abstraction, which models the operations required to synchronize key-value pairs from a kvstore, and represents a simplified "read-only" version of the SharedStore abstraction currently available. The implementation provided in this commit supports restarting the watch operation multiple times, automatically emitting deletion events for all stale entries. This allows correctly handling the reconnection procedure when the kvstore client needs to be recreated (e.g., due to a change in the parameters, or a temporary failure), avoiding leaking the entries that were removed during the connection outage. It additionally supports a Drain operation, which triggers the emission of a deletion event for all currently known entries (e.g., for cleanup purposes).

Notably, the RestartableWatchStore implementation conveys the synchronization status through the `kvstore_initial_sync_completed` metric, which is updated at every restart. It is labeled by `scope` (matching the format adopted for the other kvstore metrics), `source_cluster` (identifying the source of the information) and `action` (either write or read; the WatchStore always performs read operations). Additionally, it is possible to specify an extra metric which is automatically updated with the number of synchronized entries.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
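The stale-entry handling on restart can be sketched as follows. This is a minimal, self-contained illustration with invented names, not the actual RestartableWatchStore: on every restart, all known keys are marked stale, the fresh listing re-upserts the live ones, and whatever remains stale is reported as deleted.

```go
package main

import "fmt"

// Event mimics the upsert/delete notifications emitted by the store.
type Event struct {
	Kind string // "upsert" or "delete"
	Key  string
}

// watchStore tracks the keys observed during the current watch session.
type watchStore struct {
	state  map[string]bool // known key -> seen in current session
	events []Event
}

func newWatchStore() *watchStore {
	return &watchStore{state: map[string]bool{}}
}

// Restart replays a fresh listing from the kvstore: every listed key is
// upserted, and any previously known key no longer present is reported
// as deleted, covering removals that happened while disconnected.
func (ws *watchStore) Restart(listed []string) {
	for k := range ws.state {
		ws.state[k] = false // mark everything as potentially stale
	}
	for _, k := range listed {
		ws.state[k] = true
		ws.events = append(ws.events, Event{"upsert", k})
	}
	for k, seen := range ws.state {
		if !seen {
			delete(ws.state, k)
			ws.events = append(ws.events, Event{"delete", k})
		}
	}
}

// Drain emits a deletion event for all currently known entries.
func (ws *watchStore) Drain() {
	for k := range ws.state {
		delete(ws.state, k)
		ws.events = append(ws.events, Event{"delete", k})
	}
}

func main() {
	ws := newWatchStore()
	ws.Restart([]string{"nodes/a", "nodes/b"})
	// "nodes/b" was removed while the connection was down.
	ws.Restart([]string{"nodes/a"})
	fmt.Println(len(ws.events)) // 3 upserts + 1 delete for "nodes/b"
}
```

The real implementation additionally deals with marshalling, metrics, and concurrency guards, but the restart/drain bookkeeping follows this shape.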
This commit introduces the WatchStoreManager abstraction, which enables registering a set of functions to be asynchronously executed once the corresponding kvstore prefixes are synchronized. Two implementations are provided: the first leverages the sync canaries to detect that the entries under a given prefix have been correctly synchronized from the external source; the second is provided for backward compatibility and starts all registered functions immediately. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
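A hedged sketch of the canary-based variant (names and the canary-observation hook are invented for illustration; the real manager watches the canary prefix on the kvstore):

```go
package main

import "fmt"

// watchStoreManager runs registered functions once the corresponding
// kvstore prefix is reported as synchronized, e.g. via a sync canary.
type watchStoreManager struct {
	callbacks map[string]func()
}

func newWatchStoreManager() *watchStoreManager {
	return &watchStoreManager{callbacks: map[string]func(){}}
}

// Register associates a function with a prefix; it will run once the
// sync canary for that prefix is observed.
func (m *watchStoreManager) Register(prefix string, fn func()) {
	m.callbacks[prefix] = fn
}

// CanaryObserved simulates seeing the sync canary for a prefix, which
// unblocks (here: directly runs) the registered function exactly once.
func (m *watchStoreManager) CanaryObserved(prefix string) {
	if fn, ok := m.callbacks[prefix]; ok {
		delete(m.callbacks, prefix)
		fn()
	}
}

func main() {
	m := newWatchStoreManager()
	m.Register("cilium/state/nodes/v1", func() { fmt.Println("nodes synced") })
	m.CanaryObserved("cilium/state/nodes/v1") // prints "nodes synced"
}
```

The backward-compatible variant would simply invoke each function at Register time, without waiting for any canary.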
Currently, a new shared store synchronizing node and service information is created for each connection to the kvstore backend of a remote cluster, and the old one is dropped. Yet, this approach misses the deletion events for entries removed in the remote cluster during the reconnection process (added entries would instead be detected by the initial "list" operation). This commit fixes the issue by using a fixed instance of the node and service stores per remote cluster, which is reused upon reconnection, while transparently emitting the appropriate deletion events for all keys no longer present.

To prevent reading an incomplete state when watching an ephemeral kvstore instance that has not yet been completely synchronized, the watch operation is started only once the sync canary for the given prefix is present, if support is enabled. If sync canary support is not enabled (e.g., because the remote cluster is running an older version of the clustermesh-apiserver), non-stale keys may be temporarily removed upon reconnection, causing brief connectivity disruptions. Yet, this is no different from what already happens today if the agent is restarted while the remote kvstore has not yet been fully synchronized. Additionally, this is not an issue when the remote kvstore is backed by persistent storage, since it is already synchronized. Alternatively, we might disable the deletion of stale entries when sync canaries are not supported, at the cost of leaking those entries until the agent is restarted. This behavior will be further detailed in the release notes in a subsequent commit (it can be prevented by upgrading all clustermesh-apiservers first, and the agents afterwards), after addressing the same problem affecting the ipcache and identity entries.

Additionally, all keys are drained when disconnecting from a remote cluster, to properly clean up the state without requiring an agent restart. Conversely, they are not drained when simply shutting down, to avoid breaking existing connections on restart.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
force-pushed from ef4d950 to ff36250
Rebased onto main to fix a conflict.
@squeed PTAL
/test
/ci-eks
Cilium Runtime (privileged) hit #22373. Manually rerunning.
All checks are green, and team reviews are in. Marking as ready-to-merge.
👋 as this is
I personally think that backporting is quite risky, since it is a fairly large change which also depends on changes on the clustermesh-apiserver side, plus the follow-ups for identities and ipcaches. I'd be more comfortable not marking it as needs-backport for the moment, and re-evaluating once all pieces are in.
Currently, restarting the connection to remote clusters leaves a time window in which possible deletion events are missed, and never recovered once the new watchers are started. Similarly, the status is not properly cleaned up when removing a remote cluster from the clustermesh configuration.
This PR builds on top of #25388 and addresses this issue for nodes and services (identities and ipcache entries will be handled in a follow-up PR). When connecting to a remote cluster that supports synced canaries, the initial prefix listing is postponed until that prefix is known to be synchronized. This ensures that we observe an up-to-date state (which would not otherwise be guaranteed when the remote etcd instance is ephemeral).

When synced canaries are not supported (e.g., because the remote cluster is running an older version of the clustermesh-apiserver), we instead perform the listing immediately. In this case, non-stale keys may be temporarily removed upon reconnection, causing brief connectivity disruptions. Yet, this is no different from what already happens today if the agent is restarted while the remote kvstore has not yet been fully synchronized. Additionally, this is not an issue when the remote kvstore is backed by persistent storage, since it is already synchronized. Alternatively, we might disable the deletion of stale entries when sync canaries are not supported, at the cost of leaking those entries until the agent is restarted. This behavior (if agreed upon) will be further detailed in the release notes in a subsequent commit (it can be prevented by upgrading all clustermesh-apiservers first, and the agents afterwards), after addressing the same problem affecting the ipcache and identity entries.
Related: #25388
Related: #24740