docs: add upgrade note about deletion of stale entries in clustermesh #26067
9556b33 to b7ae588
Looks good from my side, thanks!
Please pull someone from the Clustermesh team to do another review, it looks like GH didn't assign anyone for some reason 🤔
Thanks for the heads up, I didn't notice that it hadn't worked. I've removed the review assignment and added it back; this time Yutaro got assigned.
/test
/test-runtime
Hit unrelated flake: #23997
b7ae588 to df94c99
Rebased onto main to pick up the fixes for conformance-ginkgo
/test
/test-1.26-net-next
Hit #24964
150de13 ("clustermesh: delete stale node/service entries on reconnect/disconnect"), along with the follow-up commits targeting ipcache entries and identities, modified the cilium agents' behavior to automatically clean up stale information after reconnecting to a given remote kvstore. This was needed to fix the issue described in cilium#24740.

The behavior differs depending on the version of the remote clustermesh-apiserver, though. Newer versions support "sync canaries" to convey that the synchronization from k8s to the kvstore has completed, while older ones don't. When sync canaries are not supported, the agents trigger the deletion of stale entries as soon as the corresponding etcd list operation completes. This might lead to the removal of valid entries if that information has not yet been synchronized from k8s to the kvstore, causing a temporary connectivity disruption (until it is synchronized and propagated to the agents again).

This commit extends the upgrade notes to detail this behavior and its implications.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
df94c99 to c5890df
Rebased onto main to fix conflicts
/test
/ci-aks
Hit #26075
Reviews are in. Marking as ready to merge given that this is a docs-only PR, and the missing tests were skipped.
#25499, along with the follow-ups targeting ipcache entries (#25675) and identities (#25677), modified the cilium agents' behavior to automatically clean up stale information after reconnecting to a given remote kvstore. This was needed to fix the issue described in #24740.
The behavior differs depending on the version of the remote clustermesh-apiserver, though. Newer versions support "sync canaries" to convey that the synchronization from k8s to the kvstore has completed, while older ones don't. When sync canaries are not supported, the agents trigger the deletion of stale entries as soon as the corresponding etcd list operation completes. This might lead to the removal of valid entries if that information has not yet been synchronized from k8s to the kvstore, causing a temporary connectivity disruption (until it is synchronized and propagated to the agents again). This commit extends the upgrade notes to detail this behavior and its implications.
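To make the difference between the two cases concrete, here is a minimal Go sketch of the decision described above. It is only an illustration under stated assumptions: `staleKeys` and its parameters are hypothetical names and do not reflect the actual Cilium implementation. It models the cached entries as plain string keys and the result of the etcd list operation as a set.

```go
package main

import (
	"fmt"
	"sort"
)

// staleKeys returns the cached keys that would be deleted after reconnecting
// to a remote kvstore. All names are hypothetical; this is not Cilium's API.
//
//	cached:        keys known to the agent before the reconnection
//	listed:        keys returned by the initial etcd list operation
//	canarySupport: whether the remote clustermesh-apiserver supports sync canaries
//	canarySeen:    whether the sync canary has already been observed
func staleKeys(cached []string, listed map[string]bool, canarySupport, canarySeen bool) []string {
	// With sync canaries, hold off until the remote signals that the
	// synchronization from k8s to the kvstore has completed.
	if canarySupport && !canarySeen {
		return nil
	}

	// Otherwise, anything cached but missing from the list is treated as stale.
	var stale []string
	for _, k := range cached {
		if !listed[k] {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	cached := []string{"node-a", "node-b"}
	// node-b is still valid, but has not yet been synchronized to the kvstore.
	partial := map[string]bool{"node-a": true}
	full := map[string]bool{"node-a": true, "node-b": true}

	// Older remote: the valid entry node-b is deleted right after the list.
	fmt.Println(staleKeys(cached, partial, false, false)) // [node-b]
	// Newer remote, canary not yet seen: nothing is deleted.
	fmt.Println(staleKeys(cached, partial, true, false)) // []
	// Newer remote, canary seen after full synchronization: nothing is stale.
	fmt.Println(staleKeys(cached, full, true, true)) // []
}
```

The first call shows the case the upgrade note warns about: without sync canaries, an entry that is merely late to appear in the kvstore is indistinguishable from a stale one.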