Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clustermesh: enable per-cluster RBAC in etcd server #24284

Merged

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Mar 10, 2023

This PR reworks how the cilium agents and the cluster-mesh API server are authorized against the clustermesh etcd server, configuring per-cluster RBAC policies, to improve the overall security and provide better isolation.

In particular:

  • The clustermesh API server is switched to use an ad-hoc user rather than root.
  • The agents on each cluster use a different user against the remote etcd servers. A new logic is introduced in the clustermesh API server, and it is responsible for the creation of the etcd users, based on the configuration provided. This per-cluster users operating mode is disabled by default, and it can be enabled through a dedicated helm values flag. An intermediate configuration is additionally provided to allow switching from one mode to another with no disruption.

Additional details are provided in the respective commits.

@reviewers: this is the first time I use the Hive framework. Feel free to let me know if anything should be implemented differently.

clustermesh: enable per-cluster RBAC in etcd server

@giorio94 giorio94 added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. sig/kvstore Impacts the KVStore package interactions. labels Mar 10, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-per-cluster-users branch from 261a078 to b3458a9 Compare March 10, 2023 09:34
@giorio94 giorio94 changed the title clustermesh: configure per-cluster RBAC in etcd server clustermesh: enable per-cluster RBAC in etcd server Mar 10, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-per-cluster-users branch 2 times, most recently from fd449de to 514a0c7 Compare March 10, 2023 12:31
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 marked this pull request as ready for review March 10, 2023 13:32
@giorio94 giorio94 requested review from a team as code owners March 10, 2023 13:32
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't spend the time to look at the code in details - Documentation/ changes are good.

By the way, should the feature get documented?

@giorio94
Copy link
Member Author

By the way, should the feature get documented?

I would personally say that the documentation is blocked until the cilium CLI is migrated to use helm. The documentation is based on the CLI, but it currently uses hard-coded manifests, which make impossible to enable this feature.

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we add new clusters to the clustermesh with the auth mode cluster? Seems all clusters need to have the (new) user of the new cluster added, how does that work?

@giorio94
Copy link
Member Author

How do we add new clusters to the clustermesh with the auth mode cluster? Seems all clusters need to have the (new) user of the new cluster added, how does that work?

I agree that all clusters need to have the new user added, but IMHO that is not much different compared to now, since the new etcd configuration needs anyhow to be set on all clusters. In this case, users are automatically synchronized in etcd by the clustermesh API server based on the content of a ConfigMap, which is updated through helm when a new cluster is listed. There is a transient period in which the new user might not be yet present (depending on the ordering of the operations), but the connection will eventually be established once it gets created.

@giorio94
Copy link
Member Author

The ConformanceIngress test failed due to #24347. Retriggered

pkg/kvstore/client.go Outdated Show resolved Hide resolved
Copy link
Contributor

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! Just a few questions, and a chance for maybe one more cleanup, but looks really good.

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One minor nit regarding the file in which the GlobalUserMgmtClientPromiseCell is set up. Probably not worth a respin just for that. May can be addressed in a follow-up PR.

pkg/kvstore/client.go Outdated Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 28, 2023
@julianwiedmann julianwiedmann added dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 28, 2023
@tklauser tklauser added dont-merge/bad-bot To prevent MLH from marking ready-to-merge. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 28, 2023
@julianwiedmann
Copy link
Member

contrary to what mlh thinks 😄, we do need a rebase before merging.

This commits modifies the clustermesh etcd users creation script to
create password-less users. This slightly simplifies the script and
reduces the attack surface, enabling only certificate-based auth.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit switches the user leveraged by the clustermesh-apiserver to
interact with etcd from root to admin-<cluster-name>. While also the
latter is granted the root role permissions, this improves the security
in case the CA is shared among all clusters (which is the suggested
approach), since the key/cert pair available locally is only valid
against the local etcd instance, and not against the remote ones (as it
would be in case of root).

This modification does not introduce issues in case of upgrades and
downgrades, since:
* it impacts only the local cluster, as remote ones shall use a
  different username with lower permissions;
* certificates are re-generated with the new username if automatic TLS
  certificates generation is enabled;
* the root user is still present, since it cannot be neither removed nor
  de-privileged; hence, the change is backward compatible in case users
  provided their own certificates.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit introduces two new kvstore client operations handling the
creation and the removal of users (in an idempotent manner). They will
be used in the context of the clustermesh-apiserver in a subsequent
commit, to handle the management of users for the remote cluster.

This functionality is implemented only in case of etcd, since (1) it is
the kvstore backend used for clustermesh and (2) consul support has been
deprecated.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-per-cluster-users branch from 514a0c7 to 8389e48 Compare March 29, 2023 06:37
This commit exposes the kvstore client to perform user management tasks
through a promise, to easily integrate it in other hive modules. This is
required since the kvstore client initialization logic has not yet been
refactored according to the hive paradigm.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit introduces a new logic in the clustermesh-apiserver to
handle the creation/removal of users in etcd for each remote cluster.
More in detail, the list of users to be enforced is provided as a yaml
formatted configuration file, and the controller is triggered any time
that file changes. This logic is a building block for enabling each
cluster to authenticate with a different etcd users, improving the
overall clustermesh security posture.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit exposes the configuration of the authentication mode for
clustermesh through helm, supporting three alternatives:
* legacy: all clusters use the same username (i.e., remote);
* migration: provided to upgrade from legacy to cluster (and vice
  versa) with no disruption, enables the creation of the per-cluster
  usernames in etcd, while still using the common one for auth;
* cluster: each cluster uses a different user (i.e.,
  remote-<cluster-name>) to authenticate against remote etcd instances.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-per-cluster-users branch from 8389e48 to ea0a7e9 Compare March 29, 2023 06:40
@giorio94
Copy link
Member Author

The first force-push rebased onto master to fix the conflict, while the second one moved the promise logic to a separate pkg/kvstore/cell.go file as per Tobias's suggestion.

@giorio94 giorio94 removed the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Mar 29, 2023
@giorio94
Copy link
Member Author

/test

@julianwiedmann julianwiedmann removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. labels Mar 29, 2023
@giorio94
Copy link
Member Author

ConformanceGatewayAPI seems to have hit a new flake: #24621

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 29, 2023
@julianwiedmann
Copy link
Member

The ci-gke fail is #22368.

@julianwiedmann julianwiedmann merged commit fd9ad00 into cilium:master Mar 29, 2023
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/kvstore Impacts the KVStore package interactions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants