From dc4985b85bc201b2c2b2e28e4c312090bab27e21 Mon Sep 17 00:00:00 2001 From: Erik Godding Boye Date: Wed, 26 Jul 2023 23:40:07 +0200 Subject: [PATCH] CEL expressions approver design Signed-off-by: Erik Godding Boye Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com> --- design/20230726-cel-policy.md | 369 ++++++++++++++++++++++++++++++++++ design/template.md | 238 ++++++++++++++++++++++ 2 files changed, 607 insertions(+) create mode 100644 design/20230726-cel-policy.md create mode 100644 design/template.md diff --git a/design/20230726-cel-policy.md b/design/20230726-cel-policy.md new file mode 100644 index 00000000..1dac2fc2 --- /dev/null +++ b/design/20230726-cel-policy.md @@ -0,0 +1,369 @@ + + +# CEL expressions to approve or deny CertificateRequest + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories](#user-stories-optional) + - [X.509 DNS SAN in namespace domain](#x509-dns-san-in-namespace-domain) + - [SPIFFE ID identifying namespace](#spiffe-id-identifying-namespace) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Supported Versions](#supported-versions) +- [Production Readiness](#production-readiness) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + - [CEL expressions with full decoded CSR variable](#cel-expressions-with-full-decoded-csr-variable) + + +## Release Signoff Checklist + +This checklist contains actions which must be completed before a PR implementing this design can be merged. + + +- [ ] This design doc has been discussed and approved +- [ ] Test plan has been agreed upon and the tests implemented +- [ ] Feature gate status has been agreed upon (whether the new functionality will be placed behind a feature gate or not) +- [ ] Graduation criteria is in place if required (if the new functionality is placed behind a feature gate, how will it graduate between stages) +- [ ] User-facing documentation has been PR-ed against the release branch in [cert-manager/website] + + +## Summary + + + +While approver-policy allows policies to include namespace selectors, this does not scale when attempting to +enforce CSR attributes that must be a function of the CertificateRequest namespace (or other CertificateRequest fields). +This proposal suggests an extension to the CertificateRequestPolicy API allowing policy authors to specify approval rules using +[CEL expressions](https://github.com/google/cel-spec) containing variables with information from the CertificateRequest. + +## Motivation + + + +### Goals + + + +- allow policy authors to express `allowed` CSR attribute value(s) to be a function of CertificateRequest fields using CEL expressions +- include variables with information from the CertificateRequest and CSR attribute(s) in the CEL context provided to expressions +- validate CEL expressions in CertificateRequestPolicy on admission +- extensions to the CertificateRequestPolicy API should be backwards compatible + +### Non-Goals + + + +- CEL support in CertificateRequestPolicy `constraints` + +## Proposal + + + +The current CertificateRequestPolicy API allows a policy author to specify `allowed` CSR attributes in the +CertificateRequest, and will deny the request if it contains an attribute which is not present in the `allowed` block. +To configure valid CSR attribute values, the API provides fields for allowed +`values` (array; any element matching the attribute is sufficient), +`value` (must match the attribute) or an array of literals/enums (in `usages`) depending on the request attribute. +Allowed string fields accept wildcards `"*"` within its value(s). + +This proposal suggests to add support for specifying allowed string fields using CEL expressions, similar to the existing +wildcard support. While CEL expressions could eventually replace the current CSR attribute validation API - including the +wildcard support, the main improvement would be support for policies that specifies an allowed attribute value that depends +on CertificateRequest fields e.g. `namespace`. If implemented, this new feature would close +[#62](https://github.com/cert-manager/approver-policy/issues/62). + +### User Stories (Optional) + + + +#### X.509 DNS SAN in namespace domain + +As a cluster admin in a multi-tenant cluster, I want to provide a cert-manager ClusterIssuer to our users allowing +self-provisioning of server certificates. Since the cluster is multi-tenant, I want to ensure that tenants do not +"trip on each other's toes" and enforce that the issuer only signs certificates where the X.509 DNS SAN attribute +ends with `.`. + +A variant of this user story is exemplified in [#62](https://github.com/cert-manager/approver-policy/issues/62), +and I think it's one of the most common user stories the proposed feature will solve. + +#### SPIFFE ID identifying namespace + +As a cluster admin in a multi-tenant cluster, I want to provide a cert-manager ClusterIssuer to our users allowing +self-provisioning of [X.509 SVID certificates](https://github.com/spiffe/spiffe/blob/main/standards/X509-SVID.md). +The [SPIFFE ID](https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md) (X.509 URI SAN) should identify +the namespace (and service account), and must start with `spiffe:///ns//sa/`. + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +## Design Details + + + +An [experimental approver-policy plugin](https://github.com/erikgb/cel-approver-policy-plugin) providing some of the +requested features exists, and it's proposed to base the design on this plugin. +The main downside of using a plugin for this, in addition increased maintenance/complexity, +is that the approver-policy rules are still in play. +Which means that approver rules are somehow duplicated in the `allowed` block and in the plugin configuration and +should be avoided. + +The proposal is to add sibling properties to `values` and `value` properties inside the `allowed` block. The new +fields should contain CEL expressions used to validate CSR attributes with the same +[behavior](https://cert-manager.io/docs/projects/approver-policy/#allowed) as approver-policy has +today, but with the following extension: If any value **OR** expression matches the corresponding CSR attribute, +the attribute will be allowed. + +For the CEL expressions, we need to decide if they should either: + +1. Emit a string that will be used as values do. Example: `'*.%s.svc'.format(cr.namespace)` +2. Return a bool indicating if the expression allows the value. Example: `self.endsWith(cr.namespace + '.svc')` (preferred) + +To use expressions in CertificateRequestPolicy the API may look like this (example): + +```yaml +spec: + allowed: + dnsNames: + expressions: + - "self.endsWith(cr.namespace + '.svc')" + - "self.endsWith(cr.namespace + '.svc.cluster.local')" +``` + +The CEL context variable `self` is inspired by +[CEL expressions in Kubernetes](https://kubernetes.io/docs/reference/using-api/cel/), +and should appear familiar to our users. CEL expressions returning a bool will resemble Kubernetes CEL validation +expressions and is preferred for this reason (IMO). + +It should be allowed to mix `values` and `expressions`, so the following policy should also be acceptable: + +```yaml +spec: + allowed: + dnsNames: + expressions: + - "self.endsWith(cr.namespace + '.apps.my-cluster.com')" + values: + - "*.sub.domain.com" +``` + +As stated in the [goals](#goals), the CEL expressions should be validated on CertificateRequestPolicy admission. +Since the `cel.Program` generated at the end of parse and check is stateless, thread-safe, and cacheable it should +be cached in approver-policy to speed up things. + +### Test Plan + + + + + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + + + +### Supported Versions + + + +## Production Readiness + + + +### How can this feature be enabled / disabled for an existing cert-manager installation? + + + +### Does this feature depend on any specific services running in the cluster? + + + +### Will enabling / using this feature result in new API calls (i.e to Kubernetes apiserver or external services)? + + +### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +### Will enabling / using this feature result in significant increase of resource usage? (CPU, RAM...) + + + +## Drawbacks + + + +## Alternatives + + + +### CEL expressions with full decoded CSR variable + +This alternative was suggested by @inteon on Slack. + +The idea here is to provide the policy author with a CEL context variable containing the decoded CSR replacing the +`self` variable in the proposed design. Since the expressions now get access to the full CSR, the expressions will have +to be "promoted" in the CertificateRequestPolicy API. The expression will have to return a bool indicating approved or +denied. An example policy with this alternative API might look like (example); + +```yaml +spec: + expressions: + - >- + has(csr.dnsNames) && + csr.dnsNames.all(dnsName, + dnsName.endsWith(cr.namespace + '.svc') || + dnsName.endsWith(cr.namespace + '.svc.cluster.local')) +``` + +Pros: + +- Extremely flexible. I.e. allows for CSR cross-attribute validation. +- Some would find this API more expressive and better describe the intentions. + +Cons: + +- More complex expressions that will require more in-depth knowledge of CEL. +- Will create a "split" API, since you would probably use either `expressions` or `allowed`/`constraints`. + This could increase the maintenance burden and source to subtle bugs. Breaking the API could be an option to mend this. +- Will probably require custom CEL functions to fulfill the requirements in approver-policy. This is the main reason for + recommending this design, and will be elaborated in the following. + +Let me start with a quote from the +[approver-policy documentation](https://cert-manager.io/docs/projects/approver-policy/#allowed): + +> Allowed is the block that defines attributes that match against the corresponding attribute in the request. +> A request is permitted by the policy if the request omits an allowed attribute, but will deny the request if +> it contains an attribute which is not present in the allowed block. + +The last sentence describes the requirement that could need custom CEL function(s) to allow the user to specify allowed +CSR attributes. We cannot expect the user to include validations for **all** CSR attributes in the expression(s). So I +think we at least need a function to list allowed CSR attribute usage. \ No newline at end of file diff --git a/design/template.md b/design/template.md new file mode 100644 index 00000000..f9625ea2 --- /dev/null +++ b/design/template.md @@ -0,0 +1,238 @@ + + +# CHANGEME: Title + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Supported Versions](#supported-versions) +- [Production Readiness](#production-readiness) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + + +## Release Signoff Checklist + +This checklist contains actions which must be completed before a PR implementing this design can be merged. + + +- [ ] This design doc has been discussed and approved +- [ ] Test plan has been agreed upon and the tests implemented +- [ ] Feature gate status has been agreed upon (whether the new functionality will be placed behind a feature gate or not) +- [ ] Graduation criteria is in place if required (if the new functionality is placed behind a feature gate, how will it graduate between stages) +- [ ] User-facing documentation has been PR-ed against the release branch in [cert-manager/website] + + +## Summary + + + +## Motivation + + + +### Goals + + + +### Non-Goals + + + +## Proposal + + + +### User Stories (Optional) + + + +#### Story 1 + +#### Story 2 + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +## Design Details + + + +### Test Plan + + + + + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + + + +### Supported Versions + + + +## Production Readiness + + + +### How can this feature be enabled / disabled for an existing cert-manager installation? + + + +### Does this feature depend on any specific services running in the cluster? + + + +### Will enabling / using this feature result in new API calls (i.e to Kubernetes apiserver or external services)? + + +### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +### Will enabling / using this feature result in significant increase of resource usage? (CPU, RAM...) + + + +## Drawbacks + + + +## Alternatives + +