Skip to content

Commit

Permalink
CEL expressions approver design
Browse files Browse the repository at this point in the history
Signed-off-by: Erik Godding Boye <egboye@gmail.com>

Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
  • Loading branch information
erikgb and jsoref committed Aug 15, 2023
1 parent ce8aeca commit dc4985b
Show file tree
Hide file tree
Showing 2 changed files with 607 additions and 0 deletions.
369 changes: 369 additions & 0 deletions design/20230726-cel-policy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,369 @@
<!--
This template is adapted from Kubernetes Enchancements KEP template https://raw.githubusercontent.com/kubernetes/enhancements/a86942e8ba802d0035ec7d4a9c992f03bca7dce9/keps/NNNN-kep-template/README.md
-->

# CEL expressions to approve or deny CertificateRequest

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories](#user-stories-optional)
- [X.509 DNS SAN in namespace domain](#x509-dns-san-in-namespace-domain)
- [SPIFFE ID identifying namespace](#spiffe-id-identifying-namespace)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Supported Versions](#supported-versions)
- [Production Readiness](#production-readiness)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [CEL expressions with full decoded CSR variable](#cel-expressions-with-full-decoded-csr-variable)
<!-- /toc -->

## Release Signoff Checklist

This checklist contains actions which must be completed before a PR implementing this design can be merged.


- [ ] This design doc has been discussed and approved
- [ ] Test plan has been agreed upon and the tests implemented
- [ ] Feature gate status has been agreed upon (whether the new functionality will be placed behind a feature gate or not)
- [ ] Graduation criteria is in place if required (if the new functionality is placed behind a feature gate, how will it graduate between stages)
- [ ] User-facing documentation has been PR-ed against the release branch in [cert-manager/website]


## Summary

<!--
This section is important for producing high-quality, user-focused
documentation such as release notes.
A good summary is probably around a paragraph in length.
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

While approver-policy allows policies to include namespace selectors, this does not scale when attempting to
enforce CSR attributes that must be a function of the CertificateRequest namespace (or other CertificateRequest fields).
This proposal suggests an extension to the CertificateRequestPolicy API allowing policy authors to specify approval rules using
[CEL expressions](https://github.com/google/cel-spec) containing variables with information from the CertificateRequest.

## Motivation

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
the proposed enhancement. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to
demonstrate the interest in this functionality amongst the community.
-->

### Goals

<!--
List specific goals. What is this proposal trying to achieve? How will we
know that this has succeeded?
-->

- allow policy authors to express `allowed` CSR attribute value(s) to be a function of CertificateRequest fields using CEL expressions
- include variables with information from the CertificateRequest and CSR attribute(s) in the CEL context provided to expressions
- validate CEL expressions in CertificateRequestPolicy on admission
- extensions to the CertificateRequestPolicy API should be backwards compatible

### Non-Goals

<!--
What is out of scope for this proposal? Listing non-goals helps to focus discussion
and make progress.
-->

- CEL support in CertificateRequestPolicy `constraints`

## Proposal

<!--
This is where we get down to the specifics of what the proposal actually is.
What is the desired outcome and how do we measure success?
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation- those should go into "Design Details" below.
-->

The current CertificateRequestPolicy API allows a policy author to specify `allowed` CSR attributes in the
CertificateRequest, and will deny the request if it contains an attribute which is not present in the `allowed` block.
To configure valid CSR attribute values, the API provides fields for allowed
`values` (array; any element matching the attribute is sufficient),
`value` (must match the attribute) or an array of literals/enums (in `usages`) depending on the request attribute.
Allowed string fields accept wildcards `"*"` within its value(s).

This proposal suggests to add support for specifying allowed string fields using CEL expressions, similar to the existing
wildcard support. While CEL expressions could eventually replace the current CSR attribute validation API - including the
wildcard support, the main improvement would be support for policies that specifies an allowed attribute value that depends
on CertificateRequest fields e.g. `namespace`. If implemented, this new feature would close
[#62](https://github.com/cert-manager/approver-policy/issues/62).

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this proposal gets implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### X.509 DNS SAN in namespace domain

As a cluster admin in a multi-tenant cluster, I want to provide a cert-manager ClusterIssuer to our users allowing
self-provisioning of server certificates. Since the cluster is multi-tenant, I want to ensure that tenants do not
"trip on each other's toes" and enforce that the issuer only signs certificates where the X.509 DNS SAN attribute
ends with `<namespace>.<domain>`.

A variant of this user story is exemplified in [#62](https://github.com/cert-manager/approver-policy/issues/62),
and I think it's one of the most common user stories the proposed feature will solve.

#### SPIFFE ID identifying namespace

As a cluster admin in a multi-tenant cluster, I want to provide a cert-manager ClusterIssuer to our users allowing
self-provisioning of [X.509 SVID certificates](https://github.com/spiffe/spiffe/blob/main/standards/X509-SVID.md).
The [SPIFFE ID](https://github.com/spiffe/spiffe/blob/main/standards/SPIFFE-ID.md) (X.509 URI SAN) should identify
the namespace (and service account), and must start with `spiffe://<trust-domain>/ns/<namespace>/sa/`.

### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go into as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations

<!--
What are the risks of this proposal, and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Kubernetes/PKI ecosystem.
-->

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

An [experimental approver-policy plugin](https://github.com/erikgb/cel-approver-policy-plugin) providing some of the
requested features exists, and it's proposed to base the design on this plugin.
The main downside of using a plugin for this, in addition increased maintenance/complexity,
is that the approver-policy rules are still in play.
Which means that approver rules are somehow duplicated in the `allowed` block and in the plugin configuration and
should be avoided.

The proposal is to add sibling properties to `values` and `value` properties inside the `allowed` block. The new
fields should contain CEL expressions used to validate CSR attributes with the same
[behavior](https://cert-manager.io/docs/projects/approver-policy/#allowed) as approver-policy has
today, but with the following extension: If any value **OR** expression matches the corresponding CSR attribute,
the attribute will be allowed.

For the CEL expressions, we need to decide if they should either:

1. Emit a string that will be used as values do. Example: `'*.%s.svc'.format(cr.namespace)`
2. Return a bool indicating if the expression allows the value. Example: `self.endsWith(cr.namespace + '.svc')` (preferred)

To use expressions in CertificateRequestPolicy the API may look like this (example):

```yaml
spec:
allowed:
dnsNames:
expressions:
- "self.endsWith(cr.namespace + '.svc')"
- "self.endsWith(cr.namespace + '.svc.cluster.local')"
```

The CEL context variable `self` is inspired by
[CEL expressions in Kubernetes](https://kubernetes.io/docs/reference/using-api/cel/),
and should appear familiar to our users. CEL expressions returning a bool will resemble Kubernetes CEL validation
expressions and is preferred for this reason (IMO).

It should be allowed to mix `values` and `expressions`, so the following policy should also be acceptable:

```yaml
spec:
allowed:
dnsNames:
expressions:
- "self.endsWith(cr.namespace + '.apps.my-cluster.com')"
values:
- "*.sub.domain.com"
```

As stated in the [goals](#goals), the CEL expressions should be validated on CertificateRequestPolicy admission.
Since the `cel.Program` generated at the end of parse and check is stateless, thread-safe, and cacheable it should
be cached in approver-policy to speed up things.

### Test Plan

<!---
Describe how the new functionality will be tested (unit tests, integration tests (if applicable), e2e tests)
-->



### Graduation Criteria

<!--
Describe whether the proposed functionality will be feature gated and why (or why not).
Define graduation milestones and criteria if it should be feature-gated.
Feature maturity is defined using stages alpha, beta, GA.
Feature-gated functionality starts off at alpha and graduates through stages following the defined graduation criteria.
A feature that is in alpha or beta must be opt-in.
Example graduation criteria:
Alpha:
- Feature implemented behind a feature flag
- It is clearly defined which Kubernetes versions this feature supports
- CI tests pass for all supported Kubernetes versions
Beta:
- Gather user feedback
GA:
- N examples of real-world usage
- N installs
- Allowing time for feedback
- Works on all versions of Kubernetes supported by the version of cert-manager at which this feature becomes GA
References in Kubernetes documentation:
[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
-->

### Upgrade / Downgrade Strategy

<!--
Will this feature affect upgrade/downgrade of cert-manager?
-->

### Supported Versions

<!--
What versions of Kubernetes (and other external services if applicable) will this feature support?
-->

## Production Readiness
<!--
This section should confirm that the feature can be safely operated in production environment and can be disabled or rolled back in case it is found to increase failures.
-->


### How can this feature be enabled / disabled for an existing cert-manager installation?

<!--
Can the feature be disabled after having been enabled?
Consider whether any additional steps will need to be taken to start/stop using this feature, i.e change existing resources that have had new field added for the feature before disabling it.
Do the test cases cover both the feature being enabled and it being disabled (where relevant)?
-->

### Does this feature depend on any specific services running in the cluster?

<!--
For example, are external dependencies such as ingress controllers, third party CRDs etc required for this feature to function?
-->

### Will enabling / using this feature result in new API calls (i.e to Kubernetes apiserver or external services)?
<!--
We should ensure that cert-manager does not hammer any external services with excessive calls.
Consider whether there will be sufficient backoff if any external calls fail and need to be retried.
-->

### Will enabling / using this feature result in increasing size or count of the existing API objects?

<!--
For example, will cert-manager `CustomResourceDefinition`s increase in size, will there be more `Secret`s or `CertificateRequest`s created?
-->

### Will enabling / using this feature result in significant increase of resource usage? (CPU, RAM...)

<!--
For example, will implementing this feature result in more objects being cache thus increasing memory consumption?
-->

## Drawbacks

<!--
Why should this proposal _not_ be implemented?
-->

## Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

### CEL expressions with full decoded CSR variable

This alternative was suggested by @inteon on Slack.

The idea here is to provide the policy author with a CEL context variable containing the decoded CSR replacing the
`self` variable in the proposed design. Since the expressions now get access to the full CSR, the expressions will have
to be "promoted" in the CertificateRequestPolicy API. The expression will have to return a bool indicating approved or
denied. An example policy with this alternative API might look like (example);

```yaml
spec:
expressions:
- >-
has(csr.dnsNames) &&
csr.dnsNames.all(dnsName,
dnsName.endsWith(cr.namespace + '.svc') ||
dnsName.endsWith(cr.namespace + '.svc.cluster.local'))
```

Pros:

- Extremely flexible. I.e. allows for CSR cross-attribute validation.
- Some would find this API more expressive and better describe the intentions.

Cons:

- More complex expressions that will require more in-depth knowledge of CEL.
- Will create a "split" API, since you would probably use either `expressions` or `allowed`/`constraints`.
This could increase the maintenance burden and source to subtle bugs. Breaking the API could be an option to mend this.
- Will probably require custom CEL functions to fulfill the requirements in approver-policy. This is the main reason for
recommending this design, and will be elaborated in the following.

Let me start with a quote from the
[approver-policy documentation](https://cert-manager.io/docs/projects/approver-policy/#allowed):

> Allowed is the block that defines attributes that match against the corresponding attribute in the request.
> A request is permitted by the policy if the request omits an allowed attribute, but will deny the request if
> it contains an attribute which is not present in the allowed block.
The last sentence describes the requirement that could need custom CEL function(s) to allow the user to specify allowed
CSR attributes. We cannot expect the user to include validations for **all** CSR attributes in the expression(s). So I
think we at least need a function to list allowed CSR attribute usage.

0 comments on commit dc4985b

Please sign in to comment.