Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missed deletion events when reconnecting to/disconnecting from remote clusters (ipcache entries) #25675

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented May 25, 2023

Follow up of #25499 targeting ipcache entries synchronization. Please refer to the above PR and the commit descriptions for additional information.

Related: #24740
Related: #25499

Fix missed deletion events when reconnecting to/disconnecting from remote clusters (ipcache entries) 

@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. sig/kvstore Impacts the KVStore package interactions. labels May 25, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-ipcache branch 3 times, most recently from 400a3d9 to e5d986f Compare May 26, 2023 12:24
This commit extends the IPIdentityPair struct implementing the store.Key
interface, to allow reading/writing it through the store abstractions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, a new ipcache watcher is created for each connection to the
kvstore backend of a remote cluster, and the old one is dropped. Yet,
this approach leads to missing the deletion event for possible entries
that are removed in the remote cluster during the reconnection process
(added entries would instead be detected by the initial "list" operation).

This commit fixes this issue moving to a single ipcache watcher per
remote cluster, which is reused upon reconnection. Specifically, it now
leverages the newly introduced RestartableWatchStore to automatically
trigger a deletion event for all previously known entries that are no
longer present after a reconnection. Special handling is performed in
case the ClusterID associated with the watcher is changed, draining all
previously known entries as no longer valid (even if still present, they
need to be observed again with the new ClusterID).

Additionally, all ipcache entries are drained when disconnecting from a
remote cluster, to properly clean-up the status without requiring to
restart the agent. Differently, they are not drained when simply shutting
down, to avoid breaking existing connections on restart.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-reconnect-deletion-events-ipcache branch from e5d986f to 47c93e5 Compare June 1, 2023 10:40
@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

/test

@giorio94 giorio94 marked this pull request as ready for review June 1, 2023 10:41
@giorio94 giorio94 requested review from a team as code owners June 1, 2023 10:41
@giorio94 giorio94 requested a review from jrajahalme June 1, 2023 10:41
@giorio94 giorio94 added the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label Jun 1, 2023
@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

/ci-l4lb

Failed due to a flake which seems related to #24728 (comment)

@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

One of the Cilium Conformance E2E tests failed with #25823. Manually restarting

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite rusty on this code, but LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/kvstore Impacts the KVStore package interactions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants