Commit
docs: add upgrade note about deletion of stale entries in clustermesh
150de13 ("clustermesh: delete stale node/service entries on
reconnect/disconnect"), along with the follow-up commits targeting
ipcache entries and identities, modified the Cilium agents' behavior to
automatically clean up stale information after reconnecting to a given
remote kvstore. This was needed to fix the issue described in #24740.

The behavior differs depending on the version of the remote
clustermesh-apiserver, though. Newer versions support "sync canaries"
to convey that the synchronization from k8s to the kvstore has
completed, while older ones do not. When sync canaries are not supported,
the agents trigger the deletion of stale entries as soon as the
corresponding etcd list operation completes: this might lead to the
removal of valid entries if that information has not yet been
synchronized from k8s to the kvstore, causing a temporary connectivity
disruption (until the information is synchronized and propagated again to
the agents). This commit extends the upgrade notes to detail this
behavior and its implications.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
giorio94 authored and joestringer committed Jun 15, 2023
1 parent 9b18e1c commit a6720f1
Showing 2 changed files with 10 additions and 0 deletions.
9 changes: 9 additions & 0 deletions Documentation/operations/upgrade.rst
@@ -333,6 +333,15 @@ Annotations:
Egress rules in CiliumNetworkPolicy CRD. The old attribute name is no longer supported,
please update your CiliumNetworkPolicy CRD accordingly. Also applicable values for this
attribute are changed to ``disabled``, ``required`` and ``test-always-fail``.
* Cilium agents now automatically clean up possible stale information about meshed
clusters after reconnecting to the corresponding remote kvstores (see :gh-issue:`24740`
for the rationale behind this change). This might lead to brief connectivity disruptions
towards remote pods and global services when v1.14 Cilium agents connect to older
versions of the *clustermesh-apiserver*, and the *clustermesh-apiserver* is restarted.
Please upgrade the *clustermesh-apiserver* in all clusters before the Cilium agents

joestringer (Member) commented on Jul 21, 2023:

@giorio94 How does a user upgrade clustermesh-apiserver in all clusters before the Cilium agents? Is that a matter of changing the existing Cilium 1.13 Helm values to pull in the 1.14 clustermesh-apiserver image, ensuring they are all rolled out and successfully meshing, and then proceeding with a regular Cilium upgrade?

giorio94 (Author, Member) replied on Jul 21, 2023:

Yes, it is enough to first bump the clustermesh-apiserver image version in all clusters (either manually or through Helm), and then upgrade Cilium to v1.14 in all clusters. It is still worth mentioning, though, that there might be brief connection drops affecting cross-cluster traffic which are unrelated to this specific aspect: #26462
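The two-step procedure described above can be sketched with Helm. This is a hypothetical example, not an endorsed upgrade path: the release name (`cilium`), namespace, target version, and the `clustermesh.apiserver.image.tag` value key are assumptions that depend on how Cilium was installed and on the chart version in use, so check the chart's values before running anything like this.

```shell
# Step 1 (assumed values): in every meshed cluster, bump only the
# clustermesh-apiserver image while keeping the existing 1.13 chart
# and values in place.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set clustermesh.apiserver.image.tag=v1.14.0

# Wait for the rollout to complete and confirm the clusters are still
# meshing correctly before moving on.
kubectl --namespace kube-system rollout status deployment/clustermesh-apiserver

# Step 2: only after all clusters run the new clustermesh-apiserver,
# perform the regular Cilium upgrade to v1.14 in each cluster.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.14.0 \
  --reuse-values
```

The point of the split is ordering: the v1.14 agents' stale-entry cleanup is only safe once the remote clustermesh-apiserver instances support sync canaries, so the apiserver image must be rolled out everywhere first.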

to prevent the possibility of connectivity disruptions. Note: this issue does not
affect setups using a persistent etcd cluster instead of the ephemeral one bundled
with the *clustermesh-apiserver*.

Removed Options
~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions Documentation/spelling_wordlist.txt
@@ -665,6 +665,7 @@ kubernetes
kubespray
kvstore
kvstoremesh
kvstores
labelsContext
latencies
lbExternalClusterIP
