
docs: add multi-tenancy support discussions #1175

Merged: 6 commits into ratify-project:main on Jan 2, 2024

Conversation

@binbin-li (Collaborator) commented Nov 15, 2023

Description

What this PR does / why we need it:

This is not a final design for multi-tenancy, but it includes two options with a comparison for discussion.

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Investigating: #743

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Helm Chart Change (any edit/addition/update that is necessary for changes merged to the main branch)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Checklist:

  • Does the affected code have corresponding tests?
  • Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
  • Does this introduce breaking changes that would require an announcement or bumping the major version?
  • Do all new files have appropriate license header?

Post Merge Requirements

  • MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates Helm Chart Change


codecov bot commented Nov 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b6db2ee) 61.43% compared to head (442493c) 61.43%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1175   +/-   ##
=======================================
  Coverage   61.43%   61.43%           
=======================================
  Files          97       97           
  Lines        6179     6179           
=======================================
  Hits         3796     3796           
  Misses       2051     2051           
  Partials      332      332           


5. As Ratify pods are isolated by namespace, each team can only access its own logs.


#### Cons
@akashsinghal (Collaborator) commented Nov 15, 2023

Another consideration is how we handle which ED provider is invoked in the constraint template. Will there be a new ED provider per namespace? Will there be a namespace-based naming convention for the providers that must be implemented in the constraint template to call the correct ED provider?

Or maybe the assumption is that each namespace will have its own dedicated Constraint Template + Constraint.

@binbin-li (Collaborator, Author):

For each Ratify deployment, we should have a Provider CR applied. I checked the GK code base, and it looks like GK iterates through all providers for each admission review request to gather feedback. Also, the constraint template is a cluster-scoped resource, so we cannot have a namespaced one.
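For reference, here is a minimal sketch of what such a per-deployment Provider CR could look like, assuming a Gatekeeper external data `Provider` pointing at the Ratify service in one tenant namespace; the name, URL, and timeout are illustrative placeholders, not values from the actual charts:

```yaml
apiVersion: externaldata.gatekeeper.sh/v1beta1
kind: Provider
metadata:
  name: ratify-provider-team-a      # hypothetical name; one Provider per Ratify deployment
spec:
  # hypothetical in-cluster endpoint of the Ratify service deployed in namespace team-a
  url: https://ratify.team-a:6001/ratify/gatekeeper/v1/verify
  timeout: 7                        # seconds; illustrative value
  caBundle: <base64-encoded CA bundle for the team-a Ratify service>
```

This also ties back to the question above: the constraint template would still need some way to reference the correct provider name for each namespace.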

### Deployment per Cluster
#### Pros
1. More resource efficient. Since there is only one Ratify deployment in the cluster, all tenants share the same Ratify pods and dependency resources. If the traffic load increases, the admin can easily scale out the Ratify service accordingly.
2. GK keeps working the way it does today, sending one ED request to Ratify for each admission review.
Collaborator:

How does mutation work? For example, one namespace may enable it while another does not. This would require scoping the mutation Assign resources to the exact namespaces that have it enabled, and thus updating the Assign resources every time a new tenant is added (maybe this is OK, but it is definitely overhead).
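To make that overhead concrete, here is a sketch of how a mutation `Assign` could be scoped to tenant namespaces, assuming Gatekeeper's external data mutation is used for image resolution; the resource name and provider name are hypothetical, and the `match.namespaces` list is the part that would need editing for every new tenant:

```yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: ratify-mutate-images              # hypothetical name
spec:
  applyTo:
    - groups: [""]
      versions: ["v1"]
      kinds: ["Pod"]
  match:
    scope: Namespaced
    kinds:
      - apiGroups: ["*"]
        kinds: ["Pod"]
    namespaces: ["team-a", "team-b"]      # must be edited whenever a tenant enables or disables mutation
  location: "spec.containers[name:*].image"
  parameters:
    assign:
      externalData:
        provider: ratify-mutation-provider   # hypothetical provider name
        dataSource: ValueAtLocation
        failurePolicy: Fail
```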

3. Users only need to deploy Ratify once when enabling it in the cluster. Afterwards, admins can configure CRs in their own namespaces.
4. Since GK and Ratify can be deployed to the same namespace, we can enable GK's `cert-controller` to set up mTLS automatically.

#### Cons
Collaborator:

Another con: how do we provide more granular control over configs defined at the Ratify service level, such as cache type, size, TTL, and log level? All of these are currently configured at Ratify startup, not as CRs.
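To illustrate the granularity problem, here is a purely hypothetical sketch of such service-level settings written as Helm-style values; the keys are illustrative only, not the Ratify chart's actual values, and the point is that they apply to the whole deployment and therefore cannot differ per tenant namespace:

```yaml
# Hypothetical deployment-wide settings (illustrative keys, not the real chart values).
# Every tenant sharing this single Ratify deployment gets the same cache and logging behavior.
cache:
  type: redis       # shared cache backend for all tenants
  maxSize: 256Mi
  ttl: 10m
logger:
  level: info       # one log level for all namespaces
```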

Ratify's first stable production version has been officially released and is seeing increased adoption as an external data provider among a growing number of customers. In production environments, many customers adopt a multi-tenant architecture to improve resource efficiency within their clusters. Supporting this scenario requires some essential refactoring in Ratify. This document presents several proposed design approaches and the modifications each would require, for further review and discussion.

# Multi-tenancy category
First, multi-tenancy within a Kubernetes cluster can be broadly divided into two categories: multi-team and multi-customer.
Collaborator:

Would it be better to add this link as a reference for concept details? https://kubernetes.io/docs/concepts/security/multi-tenancy/

3. No need to add a namespace field to the external data request from GK to Ratify; we can keep the same request struct.
4. Users can easily set up ResourceQuota and LimitRange objects to assign appropriate resources (see the sketch after this list).
5. As Ratify pods are isolated by namespace, each team can only access its own logs.
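A minimal sketch of the ResourceQuota and LimitRange mentioned in item 4, assuming a tenant namespace such as `team-a` that hosts its own Ratify deployment (names, namespace, and sizes are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ratify-quota              # hypothetical name
  namespace: team-a               # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: ratify-limit-range        # hypothetical name
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:             # applied when a container omits resource requests
        cpu: 250m
        memory: 256Mi
      default:                    # applied when a container omits resource limits
        cpu: 500m
        memory: 512Mi
```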

Collaborator:

I have many concerns about Deployment per Namespace. Users really care about resource usage and maintenance effort; that is to say, users prefer a lightweight, low-maintenance solution.

In addition, since Ratify is an admission controller, users may be concerned about a single point of failure in the Ratify instance. Users would need to maintain multiple Ratify instances in different namespaces, which would be a pain point.

binbin-li changed the title from "docs: add multi-tenancy support discussions" to "[WIP] docs: add multi-tenancy support discussions" on Nov 20, 2023
@akashsinghal (Collaborator) commented:

@binbin-li I have updated the required e2e tests based on the k8s and GK version bumps. Please merge main when you have a chance.

binbin-li changed the title from "[WIP] docs: add multi-tenancy support discussions" to "docs: add multi-tenancy support discussions" on Nov 28, 2023
From the perspective of user experience, there is a significant difference between the two deployment strategies.

### Deployment per Namespace
1. The cluster admin or team admin must deploy a Ratify service in each required namespace.
Contributor:

How do we ensure the "must" part? Will we have one Ratify controller to make sure each namespace has a Ratify deployment, or will it be a manual operation by the cluster admin?
If it is automatic, will it cause an unexpected spike in node memory/CPU usage when a new namespace is created? The default Ratify CPU/memory request is not small.

@binbin-li (Collaborator, Author):

It's a valid concern. A new namespace would require a new Ratify deployment with related resources, and it might get worse when multiple namespaces are created at the same time.


Note: a Ratify service includes all dependency services, such as Redis.

An overview of deployment per namespace:
Contributor:

Do we have a detailed design for the CRD behavior change? I guess most of them will become namespace-scoped?

@binbin-li (Collaborator, Author):

This is under discussion, and I'll have a detailed design soon.

@binbin-li (Collaborator, Author):

All CRs would be updated to be namespace-scoped. We probably also need some refactoring of the cosign verifier and certStore to avoid mounting keys.
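As an illustration of that direction, here is a sketch of what a namespace-scoped verifier CR could look like; the apiVersion, kind, and fields loosely follow Ratify's existing cluster-scoped CRDs and should be read as assumptions rather than a final schema:

```yaml
apiVersion: config.ratify.deislabs.io/v1beta1   # assumed group/version; may change with the refactoring
kind: Verifier
metadata:
  name: verifier-notation
  namespace: team-a                 # the key change: the CR is scoped to a tenant namespace
spec:
  name: notation
  artifactTypes: application/vnd.cncf.notary.signature
  parameters:
    verificationCertStores: {}      # placeholder; cert/key handling is part of the open refactoring question
```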

@susanshi (Collaborator) left a comment:

Thank you for all the info, Binbin. I left some suggestions to help finalize the design doc.

* One solution is adding a namespace field to all Ratify logs. Although this option cannot prevent cross-tenant access without disabling `kubectl logs`, it can be used to filter logs in Kibana.
* The other solution is to have Ratify partition logs by path. For example, logs from namespaceA are saved to `/var/log/namespaceA` and logs from namespaceB are saved to `/var/log/namespaceB`. This solution can prevent cross-tenant access without disabling `kubectl logs`, but it requires Ratify to configure logrus output paths for each team explicitly and to dynamically manage paths for all namespaces, including creating and deleting directories.

#### File system/PVC
Collaborator:

We could prioritize the areas that require multi-tenancy support. The file system is probably a higher-priority item and should be listed towards the top of the doc.

@binbin-li (Collaborator, Author):

My initial plan was to have another doc for implementation details once we decide on the model.

@binbin-li (Collaborator, Author):

Here is a brief work breakdown: https://hackmd.io/@H7a8_rG4SuaKwzu4NLT-9Q/By-jVTBST

binbin-li merged commit a25c791 into ratify-project:main on Jan 2, 2024
21 checks passed