Fix missed deletion events when reconnecting to/disconnecting from remote clusters (nodes and services) #25499

Conversation

@giorio94 (Member) commented May 17, 2023

Currently, restarting the connection to remote clusters leaves a time window in which deletion events may be missed and never recovered once the new watchers are started. Similarly, the status is not properly cleaned up when removing a remote cluster from the clustermesh configuration.

This PR builds on top of #25388 and addresses this issue for nodes and services (identities and ipcache entries will be handled in a follow-up PR). When connecting to a remote cluster that supports synced canaries, the initial list operation for each prefix is postponed until that prefix is known to have been synchronized. This ensures that we observe an up-to-date state, which would not otherwise be guaranteed when the remote etcd instance is ephemeral.

Conversely, when synced canaries are not supported (e.g., because the remote cluster is running an older version of the clustermesh-apiserver), the list operation is performed immediately. In this case, non-stale keys may be temporarily removed upon reconnection, causing brief connectivity disruptions. Yet, this is no different from what already happens today if the agent is restarted while the remote kvstore has not yet been fully synchronized, and it is not an issue at all when the remote kvstore is backed by persistent storage, since it is already synchronized. Alternatively, we might disable the deletion of stale entries when sync canaries are not supported, at the cost of leaking those entries until the agent is restarted. This behavior (if agreed upon) will be further detailed in the release notes in a subsequent commit (it can be prevented by upgrading all clustermesh-apiservers before the agents), after addressing the same problem for the ipcache and identity entries.
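
For illustration, here is a minimal Go sketch of the postponed-list logic described above. The names (`startWatcher`, `waitForCanary`, the canary-signaling channel) are hypothetical and not the actual Cilium implementation; only the overall idea — wait for the canary before listing, fall back to an immediate start otherwise — is taken from this PR.

```go
package main

import (
	"context"
	"fmt"
)

// watchPrefix stands in for the real list-and-watch routine for a kvstore prefix.
type watchPrefix func(ctx context.Context, prefix string)

// waitForCanary stands in for a watch on the remote "synced" canary for the prefix;
// the returned channel is closed once the canary has been observed.
type waitForCanary func(ctx context.Context, prefix string) <-chan struct{}

// startWatcher postpones the initial list until the prefix is known to be fully
// synchronized, so that stale-entry deletion cannot remove live keys.
func startWatcher(ctx context.Context, prefix string, canariesSupported bool,
	wait waitForCanary, watch watchPrefix) {
	if canariesSupported {
		select {
		case <-wait(ctx, prefix): // remote cluster finished syncing this prefix
		case <-ctx.Done():
			return
		}
	}
	// Safe to list now (or canaries are unsupported and the legacy behavior applies).
	watch(ctx, prefix)
}

func main() {
	ready := make(chan struct{})
	close(ready) // pretend the canary is already present
	startWatcher(context.Background(), "cilium/state/nodes/v1", true,
		func(ctx context.Context, prefix string) <-chan struct{} { return ready },
		func(ctx context.Context, prefix string) { fmt.Println("listing and watching", prefix) },
	)
}
```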

Related: #25388
Related: #24740


@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. sig/kvstore Impacts the KVStore package interactions. labels May 17, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-nodes-services branch 2 times, most recently from 6b78207 to 78f71b4 on May 22, 2023 12:42
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-nodes-services branch 2 times, most recently from 11e39ee to e146c1e on May 25, 2023 09:23
@giorio94 giorio94 requested a review from marseel May 25, 2023 09:25
@giorio94 giorio94 marked this pull request as ready for review May 25, 2023 09:25
@giorio94 giorio94 requested review from a team as code owners May 25, 2023 09:25
@giorio94 (Member Author) commented May 25, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Test Name: K8sDatapathServicesTest Checks N/S loadbalancing With host policy Tests NodePort

Failure Output: FAIL: Can not connect to service "http://[fd04::12]:31293" from outside cluster (1/10)

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/171/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@giorio94 (Member Author) commented:

/test

@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-nodes-services branch from 9bb03d8 to 0684e3b on May 26, 2023 08:26
// This ensures that the keys hosted under the given prefix have been
// successfully synchronized from the external source, even in case an
// ephemeral kvstore is used.
func NewWatchStoreManagerSync(backend WatchStoreBackend, clusterName string) WatchStoreManager {
A reviewer (Contributor) commented:

Is a separate type necessary, or could you add the synchronization canary aspect into the RestartableWatchStore?

@giorio94 (Member Author) replied:

I'm personally in favor of the separate type, which IMO provides better separation of concerns (it would transparently work with different implementations of the WatchStore interface as well) and makes it simpler to adopt alternative approaches than directly embedding the logic in the RestartableWatchStore. It also reduces the number of watchers compared to having each RestartableWatchStore wait for its own prefix.

@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-nodes-services branch from d84b7f4 to ef4d950 on May 30, 2023 11:46
@giorio94 giorio94 requested a review from squeed May 30, 2023 15:21
// Drain emits a deletion event for each known key. It shall be called only
// when no watch operation is in progress.
func (rws *restartableWatchStore) Drain() {
if rws.watching.Swap(true) {
rws.log.Panic("Cannot drain the watch store while still running")
A reviewer (Member) commented:

Small nit, should we panic? 🤔 Maybe yes...

@giorio94 (Member Author) replied:

The idea of being strict here was to easily catch possible misuses, since draining the store while actively watching a kvstore would lead to inconsistent behavior (both because of the meaning of the operation itself and because, for the sake of simplicity, the local state is not protected by a mutex).
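
For context, here is a minimal, hypothetical sketch of the guard being discussed (heavily simplified, not the actual restartableWatchStore code): both Watch and Drain flip the same atomic flag, so draining while a watch is still in progress panics immediately instead of silently corrupting the unprotected state.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type store struct {
	watching atomic.Bool
	state    map[string]struct{} // known keys; not mutex-protected by design
}

func (s *store) Watch() {
	if s.watching.Swap(true) {
		panic("Cannot start the watch store while already running")
	}
	defer s.watching.Store(false)
	// ... list and watch the kvstore prefix here, mutating s.state ...
}

// Drain emits a deletion event for every known key; the flag guards against
// running concurrently with Watch.
func (s *store) Drain() {
	if s.watching.Swap(true) {
		panic("Cannot drain the watch store while still running")
	}
	defer s.watching.Store(false)
	for key := range s.state {
		fmt.Println("emitting deletion event for", key)
		delete(s.state, key)
	}
}

func main() {
	s := &store{state: map[string]struct{}{"cilium/state/nodes/v1/node1": {}}}
	s.Drain() // fine: no watch operation in progress
}
```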

The Key interface represents the basic operations to enable inserting
key-value pairs into a kvstore and retrieving them from it, mainly
concerning marshalling and unmarshalling. Yet, unmarshalling was performed
based on the content of the value only, which made it impossible to
reconstruct the full object when the key is not part of the value itself
(e.g., for identity-label pairs). This commit extends the Unmarshal method
to additionally take the key (stripped of the store prefix) as a parameter,
and adapts the existing implementations to ignore it.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
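
As an illustration of the extended contract (the names and signatures below are assumptions, not the exact Cilium definitions), an existing implementation can keep unmarshalling from the value only and simply ignore the newly added key parameter:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Key is a reduced, illustrative version of the interface discussed above.
type Key interface {
	GetKeyName() string
	Unmarshal(key string, data []byte) error // the key parameter is the new addition
}

type service struct {
	Name string `json:"name"`
}

var _ Key = (*service)(nil)

func (s *service) GetKeyName() string { return s.Name }

// Existing implementations keep unmarshalling from the value only and ignore
// the key, preserving their previous behavior.
func (s *service) Unmarshal(_ string, data []byte) error {
	return json.Unmarshal(data, s)
}

func main() {
	var s service
	_ = s.Unmarshal("default/frontend", []byte(`{"name": "frontend"}`))
	fmt.Println(s.GetKeyName())
}
```
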
This commit introduces a new simple implementation of the LocalKey interface
used to store plain key-value pairs and retrieve them from the kvstore.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
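
A minimal sketch in the spirit of this commit (hypothetical shape, not the actual Cilium type): a plain key-value pair relies on the key parameter introduced by the previous commit to rebuild the full object.

```go
package main

import "fmt"

// kvPair is an illustrative plain key-value pair: its identity lives entirely
// in the key, so the key must be available at unmarshal time.
type kvPair struct {
	Key   string
	Value []byte
}

func (kv *kvPair) GetKeyName() string       { return kv.Key }
func (kv *kvPair) Marshal() ([]byte, error) { return kv.Value, nil }

// Unmarshal stores both the key (stripped of the store prefix) and the raw
// value, which is all a plain pair needs.
func (kv *kvPair) Unmarshal(key string, value []byte) error {
	kv.Key, kv.Value = key, value
	return nil
}

func main() {
	var pair kvPair
	_ = pair.Unmarshal("id/12345", []byte("k8s:app=frontend"))
	fmt.Printf("%s => %s\n", pair.GetKeyName(), pair.Value)
}
```
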
This commit introduces the WatchStore abstraction, which models the
operations required to synchronize key-value pairs from a kvstore, and
represents a simplified "read-only" version of the currently available
SharedStore abstraction.

The implementation provided in this commit supports restarting the watch
operation multiple times, automatically handling the emission of deletion
events for all stale entries. This allows the reconnection procedure to be
handled correctly when the kvstore client needs to be recreated (e.g., due
to a change in the parameters, or a temporary failure), without leaking the
entries that were removed during the connection outage. It additionally
supports a Drain operation, to trigger the emission of a deletion event for
all currently known entries (e.g., for cleanup purposes).

Notably, the RestartableWatchStore implementation conveys the
synchronization status through the `kvstore_initial_sync_completed`
metric, which is updated at every restart. It is labeled by `scope`
(matching the format adopted by the other kvstore metrics),
`source_cluster` (identifying the source of the information) and
`action` (either write or read; the WatchStore always performs read
operations). It is also possible to specify an extra metric which is
automatically updated with the number of synchronized entries.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
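
The following is a deliberately simplified, hypothetical sketch of the restart behavior described above (not the actual RestartableWatchStore code): on every (re)start, the freshly listed keys are diffed against the previously known state, and deletion events are emitted for entries that disappeared while the connection was down.

```go
package main

import "fmt"

type observer interface {
	OnUpdate(key string, value []byte)
	OnDelete(key string)
}

type watchStore struct {
	obs   observer
	state map[string]struct{} // keys known from the previous run
}

// restart replays a freshly listed snapshot, emitting updates for the current
// keys and deletions for the stale ones left over from the previous connection.
func (ws *watchStore) restart(listed map[string][]byte) {
	current := make(map[string]struct{}, len(listed))
	for key, value := range listed {
		current[key] = struct{}{}
		ws.obs.OnUpdate(key, value)
	}
	for key := range ws.state {
		if _, ok := current[key]; !ok {
			ws.obs.OnDelete(key) // removed during the connection outage
		}
	}
	ws.state = current
}

type printer struct{}

func (printer) OnUpdate(key string, _ []byte) { fmt.Println("upsert", key) }
func (printer) OnDelete(key string)           { fmt.Println("delete", key) }

func main() {
	ws := &watchStore{obs: printer{}, state: map[string]struct{}{}}
	ws.restart(map[string][]byte{"node1": nil, "node2": nil})
	// node2 was deleted remotely while reconnecting: its deletion is recovered.
	ws.restart(map[string][]byte{"node1": nil})
}
```
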
This commit introduces the WatchStoreManager abstraction, which allows
registering a set of functions to be asynchronously executed when the
corresponding kvstore prefixes are synchronized.

Two different implementations are provided. The first leverages the sync
canaries to detect that the entries under a given prefix have been correctly
synchronized from an external source. The second one, instead, is provided
for backward compatibility and immediately starts all registered functions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
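
As a rough illustration (the interface shape and names below are assumptions, not the exact Cilium API), the manager registers one function per prefix; the backward-compatibility variant sketched here starts them all immediately, while the sync-canary variant would instead wait for the corresponding canary key before invoking each function.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type watchStoreManager interface {
	Register(prefix string, fn func(ctx context.Context))
	Run(ctx context.Context)
}

// immediateManager provides backward compatibility: it starts all registered
// functions as soon as Run is invoked, without waiting for any canary.
type immediateManager struct {
	fns map[string]func(ctx context.Context)
}

var _ watchStoreManager = (*immediateManager)(nil)

func (m *immediateManager) Register(prefix string, fn func(ctx context.Context)) {
	m.fns[prefix] = fn
}

func (m *immediateManager) Run(ctx context.Context) {
	var wg sync.WaitGroup
	for prefix, fn := range m.fns {
		wg.Add(1)
		go func(prefix string, fn func(ctx context.Context)) {
			defer wg.Done()
			fn(ctx) // a canary-aware manager would first wait for the prefix canary
		}(prefix, fn)
	}
	wg.Wait()
}

func main() {
	mgr := &immediateManager{fns: map[string]func(ctx context.Context){}}
	mgr.Register("cilium/state/nodes/v1", func(ctx context.Context) {
		fmt.Println("watching nodes")
	})
	mgr.Run(context.Background())
}
```
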
Currently, a new shared store synchronizing node and service information
is created for each connection to the kvstore backend of a remote
cluster, and the old one is dropped. Yet, this approach leads to missing
the deletion events for entries that are removed in the remote cluster
during the reconnection process (added entries would instead be detected
by the initial "list" operation).

This commit fixes this issue by using a fixed instance of the node and
service stores per remote cluster, which is reused upon reconnection,
while transparently handling the emission of the appropriate deletion
events for all keys no longer present. To prevent reading an incomplete
state when watching an ephemeral kvstore instance that has not yet been
completely synchronized, the watch operation is started only once the
sync canary for the given prefix is present, if support is enabled.

In case sync canary support is not enabled (e.g., because the remote
cluster is running an older version of the clustermesh-apiserver), there
is the possibility that non-stale keys are temporarily removed upon
reconnection, causing brief connectivity disruptions. Yet, this is no
different from what already happens today if the agent is restarted
while the remote kvstore has not yet been fully synchronized.
Additionally, this is not an issue when the remote kvstore is backed by
persistent storage, since it is already synchronized. Alternatively, we
might disable the deletion of stale entries if sync canaries are not
supported, at the cost of leaking those entries until the agent is
restarted. This behavior will be further detailed in the release notes
in a subsequent commit (it can be prevented by upgrading all
clustermesh-apiservers before the agents), after addressing the same
problem affecting the ipcache and identity entries.

Additionally, all keys are drained when disconnecting from a remote
cluster, to properly clean up the state without requiring an agent
restart. Conversely, they are not drained when the agent simply shuts
down, to avoid breaking existing connections across restarts.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
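
To summarize the lifecycle described above, here is a hypothetical, simplified sketch (illustrative names, not the actual Cilium types): the same store instances are reused across reconnections, and drained only when the remote cluster is removed from the mesh.

```go
package main

import (
	"context"
	"fmt"
)

type watchStore interface {
	Watch(ctx context.Context, prefix string)
	Drain()
}

// noopStore stands in for the real restartable watch store.
type noopStore struct{ prefix string }

func (s *noopStore) Watch(ctx context.Context, prefix string) { s.prefix = prefix }
func (s *noopStore) Drain()                                   { fmt.Println("draining", s.prefix) }

type remoteCluster struct {
	name         string
	nodeStore    watchStore // fixed instances, reused across reconnections
	serviceStore watchStore
}

// onReconnect restarts the watch on the existing stores: the fresh list is
// diffed against the previous state, so deletions missed during the outage
// are emitted instead of being lost with a brand-new store.
func (rc *remoteCluster) onReconnect(ctx context.Context) {
	rc.nodeStore.Watch(ctx, "cilium/state/nodes/v1")
	rc.serviceStore.Watch(ctx, "cilium/state/services/v1")
}

// onRemove drains all known keys, cleaning up the state without an agent
// restart. It is intentionally not invoked on plain shutdown, to avoid
// breaking existing connections across agent restarts.
func (rc *remoteCluster) onRemove() {
	rc.nodeStore.Drain()
	rc.serviceStore.Drain()
}

func main() {
	rc := &remoteCluster{name: "cluster-2", nodeStore: &noopStore{}, serviceStore: &noopStore{}}
	rc.onReconnect(context.Background())
	rc.onRemove()
}
```
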
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-nodes-services branch from ef4d950 to ff36250 on May 31, 2023 14:37
@giorio94 (Member Author) commented:

Rebased onto main to fix a conflict.

@giorio94 (Member Author) commented:

@squeed PTAL

@giorio94 (Member Author) commented:

/test

@giorio94 (Member Author) commented Jun 1, 2023

/ci-eks

@giorio94 (Member Author) commented Jun 1, 2023

Cilium Runtime (privileged) hit #22373. Manually rerunning.

@giorio94 (Member Author) commented Jun 1, 2023

All checks are green, and team reviews are in. Marking as ready-to-merge.

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 1, 2023
@julianwiedmann (Member) commented:

👋 as this is release-note/bug, do we need backports?

@giorio94 (Member Author) commented Jun 1, 2023

> 👋 as this is release-note/bug, do we need backports?

I personally think that backporting is quite risky, since this is a fairly large change which also depends on changes on the clustermesh-apiserver side, plus the follow-ups for identities and ipcache entries. I'd be more comfortable not marking it as needs-backport for the moment, and re-evaluating once all the pieces are in.
