Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clustermesh: Introduce ClusterID reservation mechanism #26124

Merged
merged 1 commit into from Jun 16, 2023

Conversation

marseel
Copy link
Contributor

@marseel marseel commented Jun 12, 2023

Improve the lifecycle of cluster config by introducing the ClusterID reservation mechanism. More details in commit message

@marseel marseel requested a review from a team as a code owner June 12, 2023 13:53
@marseel marseel requested a review from giorio94 June 12, 2023 13:53
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 12, 2023
@marseel marseel added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jun 12, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 12, 2023
@giorio94 giorio94 added the area/clustermesh Relates to multi-cluster routing functionality in Cilium. label Jun 13, 2023
pkg/clustermesh/types/types.go Outdated Show resolved Hide resolved
pkg/clustermesh/remote_cluster.go Show resolved Hide resolved
pkg/clustermesh/remote_cluster.go Outdated Show resolved Hide resolved
pkg/clustermesh/remote_cluster.go Outdated Show resolved Hide resolved
pkg/clustermesh/remote_cluster.go Outdated Show resolved Hide resolved
pkg/clustermesh/clustermesh.go Outdated Show resolved Hide resolved
pkg/clustermesh/clustermesh.go Outdated Show resolved Hide resolved
pkg/clustermesh/types/types.go Show resolved Hide resolved
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly ready to me, just a couple of comments inline.

pkg/clustermesh/clustermesh.go Outdated Show resolved Hide resolved
pkg/clustermesh/remote_cluster.go Outdated Show resolved Hide resolved
pkg/clustermesh/types/types.go Show resolved Hide resolved
@marseel marseel force-pushed the various_clustermesh_improvements branch 2 times, most recently from 9f40c4e to a97d1c0 Compare June 15, 2023 17:12
@marseel
Copy link
Contributor Author

marseel commented Jun 16, 2023

/test

@marseel marseel force-pushed the various_clustermesh_improvements branch 5 times, most recently from 91e9ebf to 03641a9 Compare June 16, 2023 12:21
@marseel marseel changed the title Improve the lifecycle of cluster config by introducing the ClusterID reservation mechanism clustermesh: Introduce ClusterID reservation mechanism Jun 16, 2023
@marseel
Copy link
Contributor Author

marseel commented Jun 16, 2023

/test

@marseel
Copy link
Contributor Author

marseel commented Jun 16, 2023

/ci-multicluster

Similar flake to #25816

Currently, the ClusterIDs for each remoteClusters are managed by
each remote cluster controllers with rc.config. This makes very
hard to control the access to the ClusterIDs. For example, when
we have a new remote cluster connection and receive a new cluster
config, we need to ensure the new ClusterID is not used by other
remote cluster controller. To ensure that, we need to iterate
over all remoteCluster objects and also access to the rc.config
which may be changed over time depending on each remote cluster's
connection state.

For every time the remoteCluster controller start to use a new ClusterID,
it "reserves" the ClusterID from central registry. By correctly
performing mutex for this reservation, we can guarantee that no one else
uses the reserved ClusterID. So that after the reservation, each
remoteCluster controller can exclusively access to the corresponding
CT/SNAT per-cluster map slots.

This can also replace the complicated canConnect() validation with
ClusterID reservation. Instead of iterate over all clusters and check
ClusterID uniqueness, we can simply try to reserve the ID and if it
fails, reject a new connection.

Once the remote cluster controller finish using the ClusterID, it
cleanups any resources bounded to the ClusterID (e.g. per-cluster maps)
and "releases" the ClusterID.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
@marseel marseel force-pushed the various_clustermesh_improvements branch from 03641a9 to 743b9d0 Compare June 16, 2023 13:56
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@marseel
Copy link
Contributor Author

marseel commented Jun 16, 2023

/test

@marseel marseel added release-note/misc This PR makes changes that have no direct user impact. and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Jun 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 16, 2023
@joestringer
Copy link
Member

I'm going to merge this, but I will note that "reservation system" to me implies that there should be full lifecycle tests that allocate reservations and also ensure that reservations are properly garbage collected and so on. At a glance, I didn't see that sort of testing, so if we don't have any such testing then I would encourage you to think about building additional testing for those cases.

@joestringer joestringer merged commit 766e62b into cilium:main Jun 16, 2023
62 checks passed
@marseel
Copy link
Contributor Author

marseel commented Jun 16, 2023

This was mostly refactoring of existing code and not really introducing any new functionality.
The integration test now covers a case where we reserve multiple IDs, change the ID of the existing cluster, and also free IDs when removing clusters, but I agree it could cover more cases around the error handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants