docs: add upgrade note about deletion of stale entries in clustermesh
150de13 ("clustermesh: delete stale node/service entries on
reconnect/disconnect"), along with the follow-up commits targeting
ipcache entries and identities, modified the Cilium agents' behavior to
automatically clean up stale information after reconnecting to a given
remote kvstore. This was needed to fix the issue described in #24740.

The behavior differs based on the version of the remote
clustermesh-apiserver, though. Newer versions support "sync
canaries" to convey that the synchronization from k8s to the kvstore
has completed, while older ones don't. When sync canaries are not
supported, the agents trigger the deletion of stale entries as soon as
the corresponding etcd list operation completes. This might lead to the
removal of valid entries if that information has not yet been
synchronized from k8s to the kvstore, causing a temporary connectivity
disruption (until it is synchronized and propagated again to the
agents). This commit extends the upgrade notes to detail this behavior
and its implications.
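
The difference between the two cases can be illustrated with a minimal
sketch. This is not the agent's actual code: the function name, its
parameters and the use of plain sets of entry names are all assumptions
made for illustration only.

```python
def entries_after_reconnect(cached, listed, canary_supported, canary_seen):
    """Return the entries an agent would keep after reconnecting.

    Hypothetical model (not Cilium's implementation):
      cached           -- entries known before the reconnection
      listed           -- entries returned by the etcd list operation
      canary_supported -- whether the remote side publishes sync canaries
      canary_seen      -- whether the sync canary has been observed
    """
    if canary_supported and not canary_seen:
        # Synchronization from k8s to the kvstore is not known to be
        # complete yet: keep the cached entries and merge in new ones.
        return set(cached) | set(listed)
    # Either the canary was observed, or canaries are not supported and
    # the end of the list operation is the only available signal: treat
    # everything missing from the list as stale and drop it.
    return set(listed)


# Remote clustermesh-apiserver just restarted: only node-a is synced so far.
cached = {"node-a", "node-b"}
partial_list = {"node-a"}

# Older remote (no canaries): node-b is dropped although it is still valid.
old_remote = entries_after_reconnect(cached, partial_list, False, False)

# Newer remote, canary not yet observed: node-b is retained.
new_remote = entries_after_reconnect(cached, partial_list, True, False)
```

In this model, the older remote causes `node-b` to be removed until it is
synchronized again, which corresponds to the temporary disruption
described above.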

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
giorio94 committed Jun 15, 2023
1 parent f014717 commit c5890df
Showing 2 changed files with 10 additions and 0 deletions.
9 changes: 9 additions & 0 deletions Documentation/operations/upgrade.rst
@@ -333,6 +333,15 @@ Annotations:
Egress rules in CiliumNetworkPolicy CRD. The old attribute name is no longer supported,
please update your CiliumNetworkPolicy CRD accordingly. Also applicable values for this
attribute are changed to ``disabled``, ``required`` and ``test-always-fail``.
* Cilium agents now automatically clean up possible stale information about meshed
clusters after reconnecting to the corresponding remote kvstores (see :gh-issue:`24740`
for the rationale behind this change). This might lead to brief connectivity disruptions
towards remote pods and global services when v1.14 Cilium agents connect to older
versions of the *clustermesh-apiserver*, and the *clustermesh-apiserver* is restarted.
Please upgrade the *clustermesh-apiserver* in all clusters before the Cilium agents
to prevent the possibility of connectivity disruptions. Note: this issue does not
affect setups using a persistent etcd cluster instead of the ephemeral one bundled
with the *clustermesh-apiserver*.

Removed Options
~~~~~~~~~~~~~~~
1 change: 1 addition & 0 deletions Documentation/spelling_wordlist.txt
@@ -664,6 +664,7 @@ kubernetes
kubespray
kvstore
kvstoremesh
kvstores
labelsContext
latencies
lbExternalClusterIP