
clustermesh: allow waiting for the CiliumClusterConfig to appear when required #25671

Merged
merged 4 commits into from May 31, 2023

Conversation

giorio94
Member

Reporting the description of the first commit for convenience:

857060bbcf07 ("clustermesh: Introduce CiliumClusterConfig") introduced
the CiliumClusterConfig, which the clustermesh-apiserver stores into the
kvstore during startup, and the remote cilium agents read when connecting
to new clusters. Cilium agents tolerate the possible lack of the cluster
configuration, to account for backward compatibility.

This process currently suffers from a possible race condition, because
the agents might connect to the kvstore before the cluster configuration
has actually been written. Although this does not raise particular
concerns as of today, given that the configuration is only used for
sanity checks, the issue will become more serious once the information
it contains is used to tune the behavior. One such case concerns the
upcoming kvstoremesh approach (caching the information of remote
clusters in the local kvstore), which leverages different prefixes for
separation (hence requiring the cluster config to be present for
disambiguation) and widens the race condition window (since the cluster
configuration of new clusters would be written at runtime rather than
on startup).

This commit introduces a new well-known key (`cilium/.has-cluster-config`)
that, when present, conveys that the cluster configuration will
eventually be created, and that remote clusters shall wait until it is
present rather than falling back to the legacy approach. This key will
be created by the clustermesh-apiserver init container, ensuring that it
is always present when etcd is exposed to the remote agents.

Alternative approaches that were considered, but discarded because they
would solve the problem for vanilla clustermesh but not for
kvstoremesh, include:
* Setting the CiliumClusterConfig in the clustermesh-apiserver init
  container rather than during the initialization phase.
* Leveraging a readiness probe to mark the clustermesh-apiserver as
  ready only once the initialization phase has completed, and the
  configuration has been written.

Another possibility might involve being strict and making the cluster configuration mandatory starting from v1.14. This looks like a no-go to me though, because:

  • It would prevent a cluster running Cilium v1.14 from meshing with another running Cilium v1.12 (I couldn't find any official documentation about the maximum supported skew in the clustermesh scenario to say whether this is acceptable or not).
  • AFAIK, we are not currently setting the cluster configuration when the clustermesh-apiserver is not used (i.e., Cilium is run with the kvstore backend). In this scenario, even a cluster running Cilium v1.13 would be incompatible (unless the cluster config is created manually before upgrading the other clusters).

/cc @YutaroHayakawa, @marseel, @aanm, @joestringer for opinions.

clustermesh: allow waiting for the CiliumClusterConfig to appear when required

@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels May 25, 2023
@giorio94 giorio94 requested review from a team as code owners May 25, 2023 11:55
@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from 2e34c46 to e77c108 Compare May 26, 2023 06:44
@giorio94
Member Author

/test

@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from e77c108 to c6887f2 Compare May 26, 2023 11:51
@giorio94
Member Author

Force-pushed to rebase onto main and fix a conflict.

@giorio94
Member Author

/test

Member

@sayboras sayboras left a comment

The explanation in the commit message is just awesome 💯, the overall changes make sense to me ✔️

Member

@YutaroHayakawa YutaroHayakawa left a comment

Now looks good to me. Thanks!

@giorio94
Member Author

/ci-eks

Member

@aanm aanm left a comment

Approving because my comments are not blocking, but it would be nice to understand the reason behind the choice of wait.PollImmediateInfiniteWithContext.

// controller backoff period, to avoid recreating a new connection to the remote
// kvstore every time, which would introduce unnecessary overhead. Still, we do
// return in case of consecutive failures, to ensure that we do not retry
// forever if something strange happened.
err = wait.PollImmediateInfiniteWithContext(ctx, clusterConfigPollInterval, func(ctx context.Context) (done bool, err error) {
Member

Any reason why we don't use our controllers for this? They have the benefit of showing up in cilium status if something fails. With wait.PollImmediateInfiniteWithContext the failures will get hidden AFAICT.

Member Author

Essentially, the remote cluster logic is already wrapped by a controller. The idea here was to have a retry mechanism that doesn't cause that controller to restart (as simply returning the error there would), because a restart would also re-establish the etcd connection, introducing unnecessary overhead. But yeah, it makes sense to use another controller for this as well, rather than wait.PollImmediateInfiniteWithContext.

Member Author

I've updated the PR to use the controller. @aanm PTAL

@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels May 30, 2023
@giorio94 giorio94 removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@giorio94 giorio94 added dont-merge/bad-bot To prevent MLH from marking ready-to-merge. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels May 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@giorio94
Member Author

Removing the ready-to-merge label while addressing the above suggestion.

@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from c6887f2 to c888195 Compare May 31, 2023 07:57
@giorio94 giorio94 removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
857060b ("clustermesh: Introduce CiliumClusterConfig") introduced
the CiliumClusterConfig, which the clustermesh-apiserver stores into the
kvstore during startup, and the remote cilium agents read when connecting
to new clusters. Cilium agents tolerate the possible lack of the cluster
configuration, to account for backward compatibility.

This process currently suffers from a possible race condition, because
the agents might connect to the kvstore before the cluster configuration
has actually been written. Although this does not raise particular
concerns as of today, given that the configuration is only used for
sanity checks, the issue will become more serious once the information
it contains is used to tune the behavior. One such case concerns the
upcoming kvstoremesh approach (caching the information of remote
clusters in the local kvstore), which leverages different prefixes for
separation (hence requiring the cluster config to be present for
disambiguation) and widens the race condition window (since the cluster
configuration of new clusters would be written at runtime rather than
on startup).

This commit introduces a new well-known key (`cilium/.has-cluster-config`)
that, when present, conveys that the cluster configuration will
eventually be created, and that remote clusters shall wait until it is
present rather than falling back to the legacy approach. This key will
be created by the clustermesh-apiserver init container, ensuring that it
is always present when etcd is exposed to the remote agents.

Alternative approaches that were considered, but discarded because they
would solve the problem for vanilla clustermesh but not for
kvstoremesh, include:
* Setting the CiliumClusterConfig in the clustermesh-apiserver init
  container rather than during the initialization phase.
* Leveraging a readiness probe to mark the clustermesh-apiserver as
  ready only once the initialization phase has completed, and the
  configuration has been written.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Create the `cilium/.has-cluster-config` key through the
clustermesh-apiserver init container, to ensure that it is always
present when the remote agents connect to etcd.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Leverage the newly introduced `.has-cluster-config` key to retrigger the
retrieval of the cluster configuration in case it was not found, but it
is known that it will eventually be present. This removes the
possibility of race conditions if an agent connects to the kvstore
before the corresponding cluster configuration has been written.
Additionally, the cluster configuration can also be forced as mandatory
through a dedicated function parameter, when it is known agent-side that
an enabled feature requires it to always be present.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's propagate the context from the caller to these helper functions,
rather than using context.Background().

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from c888195 to 28a7cf6 Compare May 31, 2023 08:29
@giorio94
Member Author

Rebased onto main to pick up the latest changes and allow the Cilium Runtime checks to run successfully.

@giorio94
Member Author

/test

@giorio94 giorio94 removed the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label May 31, 2023
@giorio94
Member Author

giorio94 commented May 31, 2023

/test-1.16-4.19

Rerunning, as it failed with a timeout.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@julianwiedmann julianwiedmann merged commit b46e154 into cilium:main May 31, 2023
61 checks passed
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

5 participants