
clustermesh: allow waiting for the CiliumClusterConfig to appear when required #25671

Merged
merged 4 commits into from May 31, 2023

Conversation

giorio94
Member

Reporting the description of the first commit for convenience:

857060bbcf07 ("clustermesh: Introduce CiliumClusterConfig") introduced
the CiliumClusterConfig, which the clustermesh-apiserver stores into the
kvstore during startup, and the remote cilium agents read when connecting
to new clusters. Cilium agents tolerate the possible lack of the cluster
configuration, to account for backward compatibility.

This process currently suffers from a possible race condition, because
the agents might connect to the kvstore before the cluster configuration
has actually been written. Although this does not raise particular
concerns as of today, given that the configuration is only used for
sanity checks, the issue will become more serious once the information
it contains is used to tune the behavior. One such case concerns the
upcoming kvstoremesh approach (caching the information of remote
clusters in the local kvstore), which leverages different prefixes for
separation (hence requiring the cluster config to be present for
disambiguation) and widens the race condition window (since the cluster
configuration of new clusters would be written at runtime rather than
on startup).

This commit introduces a new well-known key (`cilium/.has-cluster-config`)
that, when present, conveys that the cluster configuration will
eventually be created, and that remote clusters shall wait until it is
present rather than falling back to the legacy approach. This key will
be created by the clustermesh-apiserver init container, ensuring that it
is always present when etcd is exposed to the remote agents.

Alternative approaches that were considered, but discarded because they
would solve the problem for vanilla clustermesh but not for
kvstoremesh, include:
* Setting the CiliumClusterConfig in the clustermesh-apiserver init
  container rather than during the initialization phase.
* Leveraging a readiness probe to mark the clustermesh-apiserver as
  ready only once the initialization phase has completed, and the
  configuration has been written.

Another possibility might involve being strict and making the cluster configuration mandatory starting from v1.14. This looks like a no-go to me though, because:

  • It would prevent a cluster running Cilium v1.14 from meshing with another running Cilium v1.12 (I couldn't find any official documentation about the maximum supported skew in the clustermesh scenario to say whether this is acceptable or not).
  • AFAIK, we are not currently setting the cluster configuration when the clustermesh-apiserver is not used (i.e., Cilium is run with the kvstore backend). In this scenario, even a cluster running Cilium v1.13 would be incompatible (unless the cluster config is created manually before upgrading the other clusters).

/cc @YutaroHayakawa, @marseel, @aanm, @joestringer for opinions.

clustermesh: allow waiting for the CiliumClusterConfig to appear when required

@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels May 25, 2023
@giorio94 giorio94 requested review from a team as code owners May 25, 2023 11:55
@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from 2e34c46 to e77c108 Compare May 26, 2023 06:44
@giorio94
Member Author

/test

@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from e77c108 to c6887f2 Compare May 26, 2023 11:51
@giorio94
Member Author

Force-pushed to rebase onto main and fix a conflict.

@giorio94
Member Author

/test

Member

@sayboras sayboras left a comment

The explanation in the commit message is just awesome 💯, the overall changes make sense to me ✔️

Member

@YutaroHayakawa YutaroHayakawa left a comment

Now looks good to me. Thanks!

@giorio94
Member Author

/ci-eks

Member

@aanm aanm left a comment

Approving because my comments are not blocking, but it would be nice to understand the reason behind the choice of wait.PollImmediateInfiniteWithContext.

// controller backoff period, to avoid recreating a new connection to the remote
// kvstore every time, which would introduce unnecessary overhead. Still, we do
// return in case of consecutive failures, to ensure that we do not retry
// forever if something strange happened.
err = wait.PollImmediateInfiniteWithContext(ctx, clusterConfigPollInterval, func(ctx context.Context) (done bool, err error) {
Member

Any reason why we don't use our controllers for this? They have the benefit of showing up in cilium status if something fails. With wait.PollImmediateInfiniteWithContext the failures will get hidden AFAICT.

Member Author

Essentially, the remote cluster logic is already wrapped by a controller. The idea here was to have a retry mechanism that doesn't cause that controller to restart (as simply returning the error there would), because a restart would also re-establish the etcd connection, introducing unnecessary overhead. But yeah, it makes sense to use another controller for this as well, rather than wait.PollImmediateInfiniteWithContext.

Member Author

I've updated the PR to use the controller. @aanm PTAL

@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels May 30, 2023
@giorio94 giorio94 removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@giorio94 giorio94 added dont-merge/bad-bot To prevent MLH from marking ready-to-merge. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels May 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@giorio94
Member Author

Removing the ready-to-merge label while addressing the above suggestion.

@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from c6887f2 to c888195 Compare May 31, 2023 07:57
@giorio94 giorio94 removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
857060b ("clustermesh: Introduce CiliumClusterConfig") introduced
the CiliumClusterConfig, which the clustermesh-apiserver stores into the
kvstore during startup, and the remote cilium agents read when connecting
to new clusters. Cilium agents tolerate the possible lack of the cluster
configuration, to account for backward compatibility.

This process currently suffers from a possible race condition, because
the agents might connect to the kvstore before the cluster configuration
has actually been written. Although this does not raise particular
concerns as of today, given that the configuration is only used for
sanity checks, the issue will become more serious once the information
it contains is used to tune the behavior. One such case concerns the
upcoming kvstoremesh approach (caching the information of remote
clusters in the local kvstore), which leverages different prefixes for
separation (hence requiring the cluster config to be present for
disambiguation) and widens the race condition window (since the cluster
configuration of new clusters would be written at runtime rather than
on startup).

This commit introduces a new well-known key (`cilium/.has-cluster-config`)
that, when present, conveys that the cluster configuration will
eventually be created, and that remote clusters shall wait until it is
present rather than falling back to the legacy approach. This key will
be created by the clustermesh-apiserver init container, ensuring that it
is always present when etcd is exposed to the remote agents.

Alternative approaches that were considered, but discarded because they
would solve the problem for vanilla clustermesh but not for
kvstoremesh, include:
* Setting the CiliumClusterConfig in the clustermesh-apiserver init
  container rather than during the initialization phase.
* Leveraging a readiness probe to mark the clustermesh-apiserver as
  ready only once the initialization phase has completed, and the
  configuration has been written.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Create the `cilium/.has-cluster-config` key through the
clustermesh-apiserver init container, to ensure that it is always
present when the remote agents connect to etcd.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Leverage the newly introduced `.has-cluster-config` key to retrigger the
retrieval of the cluster configuration in case it was not found, but it
is known that it will eventually be present. This removes the
possibility of race conditions if an agent connects to the kvstore
before the corresponding cluster configuration has been written.
Additionally, the cluster configuration can also be forced as mandatory
through a dedicated function parameter, when it is known agent-side that
an enabled feature requires it to always be present.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's propagate the context from the caller to these helper functions,
rather than using context.Background().

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/wait-for-cluster-config branch from c888195 to 28a7cf6 Compare May 31, 2023 08:29
@giorio94
Member Author

Rebased onto main to pick up the latest changes and allow the Cilium Runtime checks to run successfully.

@giorio94
Member Author

/test

@giorio94 giorio94 removed the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label May 31, 2023
@giorio94
Member Author

giorio94 commented May 31, 2023

/test-1.16-4.19

Rerunning, as it failed with a timeout.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@julianwiedmann julianwiedmann merged commit b46e154 into cilium:main May 31, 2023
61 checks passed
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

5 participants