docs: add upgrade note about deletion of stale entries in clustermesh #26067

Merged
merged 1 commit into from Jun 15, 2023

Conversation

giorio94
Member

@giorio94 giorio94 commented Jun 9, 2023

#25499, along with the follow-ups targeting ipcache entries (#25675) and identities (#25677), modified the Cilium agents' behavior to automatically clean up stale information after reconnecting to a given remote kvstore. This was needed to fix the issue described in #24740.

The behavior differs, however, depending on the version of the remote clustermesh-apiserver. Newer versions support "sync canaries" to signal that the synchronization from k8s to the kvstore has completed, while older ones do not. When sync canaries are not supported, the agents trigger the deletion of stale entries as soon as the corresponding etcd list operation has completed. This might remove valid entries that had not yet been synchronized from k8s to the kvstore, causing a temporary connectivity disruption until they are synchronized and propagated to the agents again. This commit extends the upgrade notes to detail this behavior and its implications.
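
To make the difference concrete, here is a minimal sketch of the decision described above. It is not the Cilium implementation: the `remoteState` type, its fields, and the `safeToPrune` and `staleKeys` helpers are hypothetical names introduced only for illustration, and the sketch assumes the agent already knows whether the remote side supports sync canaries.

```go
package main

import (
	"fmt"
	"sort"
)

// remoteState is a hypothetical summary of what an agent knows about one
// remote kvstore prefix after reconnecting. None of these names are Cilium's.
type remoteState struct {
	listDone        bool // the etcd list operation for the prefix has completed
	canarySupported bool // assumption: the remote clustermesh-apiserver supports sync canaries
	canarySeen      bool // the sync canary for the prefix has been observed
}

// safeToPrune reports whether stale entries may be deleted.
// With sync canaries, pruning waits until the canary is seen.
// Without them, pruning starts right after the list completes, which may
// drop valid entries that were not yet synchronized from k8s to the kvstore.
func safeToPrune(s remoteState) bool {
	if !s.listDone {
		return false
	}
	if s.canarySupported {
		return s.canarySeen
	}
	return true
}

// staleKeys returns, in sorted order, the cached keys missing from the listing.
func staleKeys(cached, listed []string) []string {
	present := make(map[string]struct{}, len(listed))
	for _, k := range listed {
		present[k] = struct{}{}
	}
	var stale []string
	for _, k := range cached {
		if _, ok := present[k]; !ok {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	cached := []string{"node-a", "node-b", "node-c"}
	listed := []string{"node-a"} // remote kvstore only partially synchronized so far

	older := remoteState{listDone: true}
	newer := remoteState{listDone: true, canarySupported: true}

	fmt.Println("candidates for deletion:", staleKeys(cached, listed))
	fmt.Println("older remote prunes now:", safeToPrune(older))
	fmt.Println("newer remote prunes now:", safeToPrune(newer))
}
```

In this sketch the older remote prunes `node-b` and `node-c` even though they may simply not have been synchronized yet, which is the temporary disruption the upgrade note warns about.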

@giorio94 giorio94 added the area/documentation, area/clustermesh, and release-note/misc labels Jun 9, 2023
@giorio94 giorio94 requested a review from a team June 9, 2023 09:23
@giorio94 giorio94 requested a review from a team as a code owner June 9, 2023 09:23
@giorio94 giorio94 requested a review from qmonnet June 9, 2023 09:23
Member

@qmonnet qmonnet left a comment

Looks good from my side, thanks!

Please pull someone from the Clustermesh team to do another review, it looks like GH didn't assign anyone for some reason 🤔

@giorio94 giorio94 requested review from a team and YutaroHayakawa and removed request for a team June 9, 2023 15:58
@giorio94
Member Author

giorio94 commented Jun 9, 2023

> Please pull someone from the Clustermesh team to do another review, it looks like GH didn't assign anyone for some reason 🤔

Thanks for the heads-up, I hadn't noticed that it didn't work. I've removed the review assignment and added it back; this time Yutaro got assigned.

@giorio94
Member Author

/test

@giorio94
Member Author

giorio94 commented Jun 12, 2023

/test-runtime

Hit unrelated flake: #23997

@giorio94
Member Author

Rebased onto main to pick the fixes for conformance-ginkgo

@giorio94
Member Author

/test

@giorio94 giorio94 added the release-blocker/1.14 label Jun 13, 2023
@giorio94
Member Author

giorio94 commented Jun 13, 2023

/test-1.26-net-next

Hit #24964

150de13 ("clustermesh: delete stale node/service entries on
reconnect/disconnect"), along with the follow-up commits targeting
ipcache entries and identities, modified the cilium agents' behavior to
automatically clean up stale information after reconnecting to a given
remote kvstore. This was needed to fix the issue described in cilium#24740.

The behavior differs, however, depending on the version of the remote
clustermesh-apiserver. Newer versions support "sync canaries" to signal
that the synchronization from k8s to the kvstore has completed, while
older ones do not. When sync canaries are not supported, the agents
trigger the deletion of stale entries as soon as the corresponding etcd
list operation has completed. This might remove valid entries that had
not yet been synchronized from k8s to the kvstore, causing a temporary
connectivity disruption until they are synchronized and propagated to
the agents again. This commit extends the upgrade notes to detail this
behavior and its implications.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Member Author

Rebased onto main to fix conflicts

@giorio94
Member Author

/test

@giorio94
Member Author

giorio94 commented Jun 15, 2023

/ci-aks

Hit #26075

@YutaroHayakawa YutaroHayakawa requested review from YutaroHayakawa and removed request for YutaroHayakawa June 15, 2023 12:27
@giorio94
Member Author

Reviews are in. Marking as ready to merge given that this is a docs-only PR, and the missing tests were skipped.

@giorio94 giorio94 added the ready-to-merge label Jun 15, 2023
@joestringer joestringer merged commit a6720f1 into cilium:main Jun 15, 2023
48 checks passed
Labels
area/clustermesh: Relates to multi-cluster routing functionality in Cilium.
area/documentation: Impacts the documentation, including textual changes, sphinx, or other doc generation code.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-blocker/1.14: This issue will prevent the release of the next version of Cilium.
release-note/misc: This PR makes changes that have no direct user impact.
4 participants