Memory consumption reduction

Release Signoff Checklist

This checklist contains actions which must be completed before a PR implementing this design can be merged.

  • This design doc has been discussed and approved
  • Test plan has been agreed upon and the tests implemented
  • Feature gate status has been agreed upon (whether the new functionality will be placed behind a feature gate or not)
  • Graduation criteria is in place if required (if the new functionality is placed behind a feature gate, how will it graduate between stages)
  • User-facing documentation has been PR-ed against the release branch in [cert-manager/website]

Summary

cert-manager's controller watches and caches all Secret resources in cluster. This causes high memory consumption for cert-manager controller pods in clusters which contain many large Secrets such as Helm release Secrets.

This proposal suggests a mechanism for avoiding the caching of Secret data that is unrelated to cert-manager.

Motivation

Goals

  • make cert-manager installation more reliable (no OOM kills caused by events against large cert-manager unrelated Secrets)

  • reduce the cost of running a cert-manager installation (less memory needs to be allocated)

  • make it easier to predict how much memory needs to be allocated to cert-manager controller

Non-Goals

  • memory improvements related to caching objects other than Secrets

  • memory improvements related to caching cert-manager related Secrets

  • rewrite cert-manager controllers as controller-runtime controllers

Nice to have

  • have this mechanism eventually be on by default (users shouldn't have to discover a feature flag to avoid caching unrelated Secrets)

  • use the same mechanism to improve memory consumption by cainjector. This proposal focuses on the controller only as it is the more complex part; however, we need to fix this problem in cainjector too and it would be nice to be consistent

Must not

  • make our controllers less reliable (i.e. by introducing edge cases where a cert-manager related event does not trigger a reconcile). Given the wide usage of cert-manager and the various different usage scenarios, any such edge case would be likely to occur for some users

  • make our issuance flow harder to reason about or less intuitive

  • break any existing installation/issuance flows (i.e. where some resources, such as issuer Secrets, are created after the issuer and the flow relies on the Secret creation event to trigger the issuer reconcile)

  • significantly slow down issuance

Proposal

The current Secrets informer will have a filter to watch only Secrets that are known to be cert-manager related (using a label selector). A new informer will be added that knows how to watch PartialMetadata for Secrets. This informer will have a filter to watch only Secrets that don't have a known cert-manager label. This will ensure that for each Secret either the full data is cached in the typed informer's cache or metadata only is cached in the metadata informer's cache. Cert-manager will label certificate.spec.secretName and temporary private key Secrets; these are the most frequently accessed Secret resources. Users could also optionally apply the label to other Secrets that cert-manager controller needs to watch, to ensure that those get cached.

This will reduce the excessive memory consumption caused by caching full contents of cert-manager unrelated Secrets whilst still ensuring that most of the Secrets that cert-manager needs frequently are retrieved from cache and cert-manager relevant events are not missed.

Background

The excessive memory consumption comes from the amount of cluster objects being stored in the shared informers' caches, mostly from Secrets. cert-manager uses client-go's informer factory to create informers for core types. We have auto-generated informers for cert-manager.io types. These informers do not directly expose the cache or the ListerWatcher, which is responsible for listing and setting up watches for objects. When cert-manager controller starts, all Secrets are listed and processed, which causes a memory spike. When there is a change to Secrets, the cache gets resynced, which can also cause a memory spike. For the rest of the time, Secrets remain in the controller's cache.

cert-manager needs to watch all Secrets in the cluster because some user created Secrets, for example issuer credentials, might not be labelled and we do want to trigger issuer reconciles when those Secrets change because:

  • in cases where an issuer gets created and is unready because its credential has not yet been applied/is incorrect and a user at some point applies or corrects it, it is a better user experience that the creation/update event triggers an immediate reconcile instead of the user having to wait for the failed issuer to be reconciled again after the backoff period (max wait can be 5 minutes for the issuers workqueue)

  • in cases where an issuer credential change should trigger an issuer status update (i.e. a Venafi credentials Secret gets updated with incorrect credentials) it is a better user experience if the update event causes a reconcile and the issuer status is changed to unready, instead of failing at issuance time

  • in some cases a missing Secret does not cause issuer reconcile (such as a missing ACME EAB key where we explicitly rely on Secret events to retry issuer setup). In this case, it is more efficient as well as a better user experience to reconcile on Secret creation event as that way we avoid wasting CPU cycles whilst waiting for the user to create the Secret and when the Secret does get created, the issuer will be reconciled immediately.

The caching mechanism is required for ensuring quick issuance and not taking up too much of kube apiserver's resources. Secrets with the issued X.509 certificates and with temporary private keys get retrieved a number of times during issuance and all the control loops involved in issuance need full Secret data. Currently the Secrets are retrieved from the informers cache. Retrieving them from kube apiserver would mean a large number of additional calls to kube apiserver, which is undesirable. The default cert-manager installation uses a rate-limited client (20 QPS with a burst of 50). There is also the server-side API Priority and Fairness system that prevents rogue clients from overwhelming kube apiserver. Both these mechanisms mean that the result of a large number of additional calls would be slower issuance, as cert-manager would get rate limited (either client-side or server-side). The rate limiting can be modified to allow higher throughput for cert-manager, but this would have an impact on kube apiserver's availability for other tenants - so in either case additional API calls would have a cost for the user.
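For illustration only, here is a minimal sketch of how these client-side limits are expressed with client-go (this is not cert-manager's actual wiring; restConfig and the helper name are illustrative):

import (
  ...
  "k8s.io/client-go/kubernetes"
  "k8s.io/client-go/rest"
  ...
)

// newRateLimitedClient builds a client whose requests are rate limited
// client-side: with the defaults described above, at most 20 requests per
// second on average, with bursts of up to 50.
func newRateLimitedClient(restConfig *rest.Config, qps float32, burst int) kubernetes.Interface {
	restConfig.QPS = qps     // e.g. 20
	restConfig.Burst = burst // e.g. 50
	return kubernetes.NewForConfigOrDie(restConfig)
}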

User Stories

Story 1

User has a cluster with 4 cert-manager Certificates and 30k other (cert-manager unrelated) Secrets. They observe memory consumption that is unreasonably high in proportion to the number of cert-manager resources.

See issue description here #4722

Risks and Mitigations

  • Risk of slowing down issuance in cases where cert-manager needs to retrieve unlabelled Secrets, such as CA issuer's Secret. Users could mitigate this by labelling the Secrets.

  • Risk of unintentionally or intentionally overwhelming kube apiserver with the additional requests. A default cert-manager installation uses client-side rate limiting (default 20 QPS with a burst of 50). This should be sufficient to ensure that, in the case of a large number of additional requests from cert-manager controller, the kube apiserver is not slowed down. Cert-manager controller allows configuring the rate limiting QPS and burst (there is no upper limit). Since 1.20, Kubernetes by default uses API Priority and Fairness for fine grained server side rate limiting, which should prevent clients that don't sufficiently rate limit themselves from overwhelming the kube apiserver. In a cluster where API Priority and Fairness is disabled and cert-manager's rate limiter has been configured with a very high QPS and burst, it might be possible to overwhelm kube apiserver. However, this is already possible today if a user has the rights to configure the cert-manager installation, e.g. by creating a large number of cert-manager resources in a tight loop. To limit the possibility of overwhelming the kube apiserver:

    • we should ensure that control loops that access secrets do not unnecessarily retry on errors (e.g. if a secret is not found or has invalid data). This should already be the case today, but it is worth reading through all possible paths
    • we could store initialized clients for all issuers as we already do for ACME issuer instead of retrieving credential secrets every time a certificate request needs to be signed
    • recommend that users label Secret resources
    • start with a non-GA implementation (this design suggests that the implementation starts as an alpha feature) to catch any potential edge cases and gate GA on user feedback from larger installations

Design details

Implementation

Ensure that the certificate.Spec.SecretName Secret as well as the Secret with the temporary private key are labelled with a controller.cert-manager.io/fao: true 1 label. The temporary private key Secret is short lived, so it should be okay to only label it on creation. The certificate.Spec.SecretName Secret should be checked for the label value on every reconcile of the owning Certificate, the same as with the secret template labels and annotations, see here.
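As an illustration only, a hypothetical helper of the kind that the existing secret-template reconciliation code could call to keep the label in place (the function and constant names are made up for this sketch):

import (
  ...
  corev1 "k8s.io/api/core/v1"
  ...
)

const certificateSecretLabelKey = "controller.cert-manager.io/fao"

// ensureFAOLabel adds the "for attention of" label to the given Secret if it
// is missing or has the wrong value; it reports whether an update is needed.
func ensureFAOLabel(secret *corev1.Secret) bool {
	if secret.Labels == nil {
		secret.Labels = map[string]string{}
	}
	if secret.Labels[certificateSecretLabelKey] == "true" {
		return false
	}
	secret.Labels[certificateSecretLabelKey] = "true"
	return true
}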

Add a partial metadata informer factory, set up with a client-go client that knows how to make GET/LIST/WATCH requests for PartialMetadata. Add a filter to ensure that any informers created from this factory will list only resources that are not labelled with a known cert-manager label.

import (
  ...
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/apimachinery/pkg/labels"
  "k8s.io/apimachinery/pkg/selection"
  "k8s.io/client-go/metadata"
  "k8s.io/client-go/metadata/metadatainformer"
  ...
)
metadataOnlyClient := metadata.NewForConfigOrDie(restConfig)

metadataLabelSelector, _ := notKnownCertManagerSecretLabelSelector()

metadataSharedInformerFactory := metadatainformer.NewFilteredSharedInformerFactory(metadataOnlyClient, resyncPeriod, opts.Namespace, func(listOptions *metav1.ListOptions) {
	// select only objects that do not have a known cert-manager label
	listOptions.LabelSelector = metadataLabelSelector
})

func notKnownCertManagerSecretLabelSelector() (string, error) {
	// matches objects that do NOT carry the controller.cert-manager.io/fao label
	r, _ := labels.NewRequirement("controller.cert-manager.io/fao", selection.DoesNotExist, make([]string, 0))
	sel := labels.NewSelector().Add(*r)
	return sel.String(), nil
}

Create a partial metadata informer that watches events for the Secret GVK:

  metadataSecretsInformer := metadataSharedInformerFactory.ForResource(corev1.SchemeGroupVersion.WithResource("secrets"))
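
A sketch of how the metadata factory would then be started and synced alongside the existing typed factory (variable names as in the snippets above; ctx and error handling are assumed to follow the surrounding controller setup):

// Start the metadata informer factory and wait for the Secrets metadata cache
// to sync before running the controllers.
metadataSharedInformerFactory.Start(ctx.Done())
if !cache.WaitForCacheSync(ctx.Done(), metadataSecretsInformer.Informer().HasSynced) {
	return fmt.Errorf("failed to wait for metadata Secrets informer cache to sync")
}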

Add a label selector to the existing Secrets informer created from the typed informers factory to ensure that only Secrets that do have a known cert-manager label are watched:

import (
  ...
  corev1 "k8s.io/api/core/v1"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/apimachinery/pkg/labels"
  "k8s.io/apimachinery/pkg/selection"
  coreinformers "k8s.io/client-go/informers/core/v1"
  kubeinternalinterfaces "k8s.io/client-go/informers/internalinterfaces"
  "k8s.io/client-go/kubernetes"
  corelisters "k8s.io/client-go/listers/core/v1"
  "k8s.io/client-go/tools/cache"
  ...
)
concreteSecretsInformer := NewFilteredSecretsInformer(factory, kubeClient) // factory is the existing typed informers factory

func NewFilteredSecretsInformer(factory kubeinternalinterfaces.SharedInformerFactory, client kubernetes.Interface) coreinformers.SecretInformer {
	return &filteredSecretsInformer{
		factory:     factory,
		client:      client,
		newInformer: newFilteredSecretsInformer,
	}
}

type filteredSecretsInformer struct {
	factory     kubeinternalinterfaces.SharedInformerFactory
	client      kubernetes.Interface
	newInformer kubeinternalinterfaces.NewInformerFunc
	namespace   string
}

func (f *filteredSecretsInformer) Informer() cache.SharedIndexInformer {
	return f.factory.InformerFor(&corev1.Secret{}, f.newInformer)
}

func (f *filteredSecretsInformer) Lister() corelisters.SecretLister {
	return corelisters.NewSecretLister(f.Informer().GetIndexer())
}

func newFilteredSecretsInformer(client kubernetes.Interface, resyncPeriod time.Duration) cache.SharedIndexInformer {
	secretLabelSelector, _ := knownCertManagerSecretLabelSelector()
	return coreinformers.NewFilteredSecretInformer(client, "", resyncPeriod, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}, func(listOptions *metav1.ListOptions) {
		listOptions.LabelSelector = secretLabelSelector
	})
}

func knownCertManagerSecretLabelSelector() (string, error) {
	// matches objects that DO carry the controller.cert-manager.io/fao label
	r, _ := labels.NewRequirement("controller.cert-manager.io/fao", selection.Exists, make([]string, 0))
	sel := labels.NewSelector().Add(*r)
	return sel.String(), nil
}
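
To make sure no Secret event is missed regardless of which cache a Secret lands in, both informers would register the same event handlers. A sketch, where enqueueSecret stands in for whatever handler the relevant controller already uses:

secretEventHandler := cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { enqueueSecret(obj) },
	UpdateFunc: func(_, newObj interface{}) { enqueueSecret(newObj) },
	DeleteFunc: func(obj interface{}) { enqueueSecret(obj) },
}
// Labelled Secrets (full objects) and unlabelled Secrets (metadata only) both
// feed the same work queue via the same handler.
concreteSecretsInformer.Informer().AddEventHandler(secretEventHandler)
metadataSecretsInformer.Informer().AddEventHandler(secretEventHandler)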

Create a new Secrets getter function. The function will check for the Secret in both the typed cache and the PartialMetadata cache.

  • If the object is found in both caches, it assumes that one of the caches must be stale and gets the Secret from the kube apiserver2
  • If the object is found in the PartialMetadata cache only, it gets it from the kube apiserver
  • If the object is found in the typed cache only, it returns it from there
  • If the object is not found in either cache, it returns a NotFound error

func SecretGetter(ctx context.Context, liveSecretsClient typedcorev1.SecretsGetter, cacheSecretsClient corelisters.SecretLister, partialMetadataClient cache.GenericLister, name string, namespace string) (*corev1.Secret, error) {
	var secretFoundInTypedCache, secretFoundInMetadataCache bool
	secret, err := cacheSecretsClient.Secrets(namespace).Get(name)
	if err == nil {
		secretFoundInTypedCache = true
	}

	if err != nil && !apierrors.IsNotFound(err) {
		return nil, fmt.Errorf("error retrieving secret from the typed cache: %w", err)
	}
	_, partialMetadataGetErr := partialMetadataClient.ByNamespace(namespace).Get(name)
	if partialMetadataGetErr == nil {
		secretFoundInMetadataCache = true
	}

	if partialMetadataGetErr != nil && !apierrors.IsNotFound(partialMetadataGetErr) {
		return nil, fmt.Errorf("error retrieving object from partial object metadata cache: %w", err)
	}

	if secretFoundInMetadataCache && secretFoundInTypedCache {
		return liveSecretsClient.Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
	}

	if secretFoundInTypedCache {
		return secret, nil
	}

	if secretFoundInMetadataCache {
		return liveSecretsClient.Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
	}

	return nil, partialMetadataGetErr
}

Use the new Secrets getter in all control loops that need to get any Secret:

  ...
	// Fetch and parse the 'next private key secret'
	nextPrivateKeySecret, err := SecretGetter(ctx, c.secretLiveClient, c.secretLister, c.metadataSecretLister, *crt.Status.NextPrivateKeySecretName, crt.Namespace)
  ...

Metrics

The following metrics are based on a prototype implementation of this design. The tests were run on a kind cluster.

Cluster with large cert-manager unrelated secrets

Test the memory spike caused by the initial LIST-ing of Secrets, the size of the cache after the initial LIST has been processed, and the spike caused by changes to Secret resources.

cert-manager v1.11

Create 300 cert-manager unrelated Secrets of size ~1Mb:

[screenshot]

Install cert-manager from latest master with client-go metrics enabled.

Wait for cert-manager to start and populate the caches.

Apply a label to all Secrets to initiate cache resync:

[screenshot]

Observe that memory consumption spikes on controller startup when all Secrets are initially listed, that there is a second, smaller spike around the time the Secrets get labelled, and that memory consumption remains high:

[screenshot]

partial metadata prototype

Create 300 cert-manager unrelated Secrets of size ~1Mb:

[screenshot]

Deploy cert-manager from the partial metadata prototype.

Wait for cert-manager to start and populate the caches.

Apply a label to all Secrets to initiate cache resync:

[screenshot]

Observe that the memory consumption is significantly lower:

[screenshot]

Issuance of a large number of Certificates

This scenario tests issuing 500 certificates from 10 cert-manager CA issuers. The CA issuers have been set up with CA certificates that do not have known cert-manager labels.

Here is a script that sets up the issuers, creates the Certificates, waits for them to become ready and outputs the total time taken https://gist.github.com/irbekrm/bc56a917a164b1a3a097bda483def0b8.

latest cert-manager

This test was run against a version of cert-manager corresponding to v1.11.0-alpha.2 with some added client-go metrics (https://github.com/irbekrm/cert-manager/tree/client_go_metrics). Run a script to set up 10 CA issuers, create 500 certificates and observe the time taken for all certs to be issued: [screenshot]

Observe resource consumption, request rate and latency for cert-manager controller: [screenshot]

Observe resource consumption and rate of requests for Secret resources for kube apiserver: [screenshot]

partial metadata

Run a script to set up 10 CA issuers, create 500 certificates and observe the time taken for all certs to be issued: [screenshot]

Observe resource consumption, request rate and latency for cert-manager controller: [screenshot]

Observe resource consumption and rate of requests for Secret resources for kube apiserver: [screenshot]

The issuance is slightly slowed down because on each issuance cert-manager needs to get the unlabelled CA Secret directly from kube apiserver. Users could mitigate this by adding cert-manager labels to the CA Secrets. Run a modified version of the same script, but with CA Secrets labelled:

[screenshot]

For CA issuers, normally a Secret will be retrieved once per issuer reconcile and once per certificate request signing. In some cases, two Secrets might be retrieved during certificate request signing (see Secrets for [Cluster]Issuers above). We could look into improving this by initializing a client with credentials and sharing it with the certificate request controllers, similarly to how it's currently done with ACME clients.
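A very rough sketch of that idea, assuming a registry keyed by issuer (all names here are illustrative, not existing cert-manager code):

import (
  ...
  "sync"
  "k8s.io/apimachinery/pkg/types"
  ...
)

// caMaterialRegistry caches the parsed CA signing material per issuer so the
// CA Secret does not need to be fetched on every certificate request signing.
type caMaterialRegistry struct {
	mu       sync.RWMutex
	material map[types.NamespacedName][]byte // e.g. PEM-encoded CA key/cert bundle
}

func (r *caMaterialRegistry) Get(issuer types.NamespacedName) ([]byte, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	m, ok := r.material[issuer]
	return m, ok
}

func (r *caMaterialRegistry) Set(issuer types.NamespacedName, m []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.material == nil {
		r.material = map[types.NamespacedName][]byte{}
	}
	r.material[issuer] = m
}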

Pros

  • In most setups, in the majority of cases where a control loop needs a Secret it would still be retrieved from the cache (it is certificate Secrets that get parsed most frequently and those will be labelled in practically all cases)

  • Memory consumption improvements appear quite significant

  • Once graduated to GA, this would work for all installations without users needing to discover a flag to set

Cons

  • All cluster Secrets are still listed

  • Slower issuance in cases where cert-manager needs to retrieve unlabelled Secrets

Test Plan

Unit and e2e tests (largely updating our existing e2e tests and writing unit tests for any new functions).

We do not currently have any automated tests that observe resource consumption/do load testing.

See Metrics for how to test resource consumption/issuance speed manually.

Graduation Criteria

Alpha (cert-manager 1.12):

  • feature implemented behind a feature flag

  • CI tests pass for all supported Kubernetes versions

  • this design discussed and merged

Beta:

User feedback:

  • does this solve the target use case (memory consumption reduction for clusters with large number of cert-manager unrelated Secrets)?
  • does this work in cases where a large number of Certificates need to be issued around the same time (i.e. is the slight slowdown of issuance acceptable)?

GA:

  • TODO: define criteria which should be a certain number of working installations

Upgrade / Downgrade Strategy

Recommend that users upgrade to cert-manager v1.11 first to ensure that all Certificate Secrets are labelled, to avoid a spike in apiserver calls on controller startup.

Supported Versions

This feature will work with all versions of Kubernetes currently supported by cert-manager.

PartialMetadata support by kube apiserver has been GA since Kubernetes 1.15. The oldest Kubernetes version supported by cert-manager 1.12 will be 1.22.

Notes

Current state

This section lists all Secrets that need to be watched by cert-manager controller's reconcile loops.

Secrets for Certificates
  • certificate.spec.secretName Secrets (that contain the issued certs). These can be created by cert-manager or pre-created by users or external tools (e.g. an ingress controller). If created by cert-manager, they will have a number of cert-manager.io annotations. Secrets without annotations will cause re-issuance (see https://cert-manager.io/docs/faq/#when-do-certs-get-re-issued) and upon successful issuance cert-manager.io annotations will be added.

  • The temporary Secrets that get created for each issuance and contain the private key that the certificate request is signed with. These can only be created by cert-manager controller and are all labelled with the cert-manager.io/next-private-key: true label.

Secrets for [Cluster]Issuers

The issuers and clusterissuers controllers set up watches for all events on all secrets, but have a filter to determine whether an event should cause a reconcile.

ACME issuer

  • the secret referenced by issuer.spec.acme.privateKeySecretRef. This can be created by user (for an already existing ACME account) or by cert-manager. Cert-manager does not currently add any labels or annotations to this secret.

A number of optional secrets that will always be created by users with no labelling enforced:

  • the secret referenced in issuer.spec.acme.externalAccountBinding.

  • the secret referenced by issuer.spec.acme.solvers.dns01.acmeDNS.accountSecretRef.

  • the secret referenced in issuer.spec.acme.solvers.dns01.akamai.clientSecretSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.akamai.accessTokenSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.azureDNS.clientSecretSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.cloudDNS.serviceAccountSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.cloudflare.apiTokenSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.cloudflare.apiKeySecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.digitalocean.tokenSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.rfc2136.tsigSecretSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.route53.accessKeyIDSecretRef

  • the secret referenced in issuer.spec.acme.solvers.dns01.route53.secretAccessKeySecretRef

The ACME account key secret and, if configured, the secret with the EAB key will be retrieved once per issuer reconcile (on events against the issuer or against the account key or EAB key secret). The ACME client initialized with the credentials is then stored in a registry shared with the orders controller, so the secrets are not retrieved again when a certificate request for the issuer needs to be signed. For a DNS-01 challenge, one call (possibly two in the case of AWS) will be made during issuance to retrieve the relevant credentials secret.

CA

  • the secret referenced by issuer.spec.ca.secretName. This will always be created by user. No labelling is currently enforced.

This will be retrieved twice when the issuer is reconciled (when an event occurs against the issuer or its secret) and once when a certificate request for the issuer is being signed.

Vault

  • the optional secret referenced by issuers.spec.vault.caBundleSecretRef. Always created by user with no labelling enforced

One of the following credentials secrets:

  • secret referenced by issuers.spec.vault.auth.appRole.secretRef. Always created by user with no labelling enforced

  • secret referenced by issuers.spec.vault.auth.kubernetes.secretRef. Always created by user with no labelling enforced

  • secret referenced by issuers.spec.vault.auth.tokenSecretRef. Always created by user with no labelling enforced

The configured credentials Secrets and, if configured, CA bundle Secret will be retrieved every time the issuer is reconciled (on events against the issuer and either of the Secrets) and every time a certificate request needs to be signed.

Venafi

One of:

  • the secret referenced by issuers.spec.venafi.tpp.secretRef. Always created by user with no labelling enforced

  • the secret referenced by issuers.spec.venafi.cloud.secretRef. Always created by user with no labelling enforced

The configured Secret will be retrieved when the issuer is reconciled (events against issuer and its secret) and when a certificate request is signed.

Upstream mechanisms

There are a number of existing upstream mechanisms for limiting what gets stored in the cache. This section focuses on what is available for the client-go informers that we use in cert-manager controllers, but there is a controller-runtime wrapper available for each of these mechanisms that should make them usable in cainjector as well.

Filtering

Filtering which objects get watched using label or field selectors. These selectors allow filtering which resources are retrieved during the initial list call and the watch calls made to kube apiserver by the informer's ListerWatcher component (and therefore which resources end up in the cache). client-go's informer factory allows configuring individual informers with list options that will be used for list and watch calls. This mechanism is used by other projects that use client-go controllers, for example Istio. The same filtering mechanism is also available for cert-manager.io resources; we shouldn't need to filter which cert-manager.io resources we watch though. This mechanism seems the most straightforward to use, but currently we don't have a way to identify all the resources (Secrets) we need to watch using a label or field selector, see the Current state section above.
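For example, factory-level filtering with client-go looks roughly like this (a sketch only; kubeClient and resyncPeriod are assumed, and the label is the one proposed in this document):

import (
  ...
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/client-go/informers"
  ...
)

// Every informer created from this factory will list/watch only objects that
// carry the known cert-manager label.
factory := informers.NewSharedInformerFactoryWithOptions(
	kubeClient,
	resyncPeriod,
	informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
		opts.LabelSelector = "controller.cert-manager.io/fao"
	}),
)
secretsInformer := factory.Core().V1().Secrets()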

Partial object metadata

Caching only metadata for a given object. This mechanism relies on making list and watch calls against kube apiserver with a PartialObjectMetadata header. The apiserver then returns PartialObjectMetadata instead of an object of a concrete type such as a Secret. The PartialObjectMetadata only contains the metadata and type information of the object. To use this mechanism to ensure that only metadata is cached for a particular resource type that triggers a reconcile, the ListerWatcher of the informer for that type needs to use a client that knows how to make calls with the PartialObjectMetadata header; the reconcile loop can then also only retrieve PartialObjectMetadata types from the cache.

client-go has a metadata only client that can be used to get, list and watch with PartialObjectMetadata. client-go also has a metadata informer that uses the metadata only client to list and watch resources. This informer implements the same SharedIndexInformer interface as the core and cert-manager.io informers that we use currently, so it would fit our existing controller setup.

The downside to having metadata only in the cache is that if the reconcile loop needs the whole object, it needs to make another call to the kube apiserver to get the actual object. We have a number of reconcile loops that retrieve and parse Secret data numerous times, for example the readiness controller retrieves and parses the spec.secretName Secret for a Certificate on any event associated with the Certificate, any of its CertificateRequests or the spec.secretName Secret. TODO: add which projects have adopted metadata-only watches, especially with client-go informers
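A minimal sketch of the metadata-only client in use (restConfig, ctx and the names here are illustrative; the returned object is a PartialObjectMetadata, so Secret data is never transferred or cached):

import (
  ...
  corev1 "k8s.io/api/core/v1"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/client-go/metadata"
  ...
)

metaClient := metadata.NewForConfigOrDie(restConfig)
secretsGVR := corev1.SchemeGroupVersion.WithResource("secrets")
// Only the object's metadata (name, namespace, labels, annotations, ...) is returned.
partial, err := metaClient.Resource(secretsGVR).Namespace("default").Get(ctx, "some-secret", metav1.GetOptions{})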

Transform functions

Transforming the object before it gets placed into the cache. client-go allows configuring core informers with transform functions. These functions get called with the object as an argument before the object is placed into the cache. The transform function needs to convert the object to a concrete or metadata type if it wants to access its fields. This is a lesser used functionality in comparison with metadata only caching. A couple of usage examples:

  • support for transform functions was added in controller-runtime controller-runtime#1805 with the goal of allowing users to remove managed fields and annotations
  • Istio's pilot controller uses this mechanism to configure their client-go informers to remove managed fields before putting objects into the cache

I haven't seen any usage examples where non-metadata fields are modified using this mechanism. However, I cannot see a reason why new fields (i.e. a label that signals that a transform was applied) could not be added, as well as fields being removed.

Future changes

There is an open KEP for replacing initial LIST with a WATCH kubernetes/enhancements#3667

Perhaps this would also reduce the memory spike on controller startup.

Production Readiness

How can this feature be enabled / disabled for an existing cert-manager installation?

The functionality will initially be placed behind an alpha feature gate on the cert-manager controller (see Graduation Criteria), so it can be enabled or disabled via the controller's feature gates flag.

Does this feature depend on any specific services running in the cluster?

No

Will enabling / using this feature result in new API calls (i.e to Kubernetes apiserver or external services)?

There will be additional calls to kube apiserver to retrieve unlabelled Secrets.

See Metrics and Risks and Mitigation

Will enabling / using this feature result in increasing size or count of the existing API objects?

No new objects will be created

Will enabling / using this feature result in significant increase of resource usage? (CPU, RAM...)

No, see Metrics

Alternatives

Use transform functions to remove data for non-labelled Secrets before adding them to informers cache

Watch all Secrets as before. Use client-go's transform functions mechanism to remove the data field for a Secret that does not have a known cert-manager label before it gets placed in informer's cache. In the same transform function add a custom cert-manager.io/metadata-only label to all Secrets whose data got removed (this label will only exist on the cached object). In reconcilers, use a custom Secrets getter that can get the Secret either from kube apiserver or cache, depending on whether it has the cert-manager.io/metadata-only label that suggests that the Secret's data has been removed. Additionally, ensure that as many Secrets as we can (ACME registry account keys) get labelled. Users would be encouraged to add a cert-manager label to all Secrets they create to reduce extra calls to kube apiserver.
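A sketch of such a transform function, assuming the labels proposed in this document (the function would be registered on the Secrets informer via SharedIndexInformer.SetTransform; names are illustrative):

import (
  ...
  corev1 "k8s.io/api/core/v1"
  ...
)

// stripUnlabelledSecretData removes the data of Secrets that do not carry a
// known cert-manager label before they are placed into the informer cache,
// and marks them so reconcilers know to fetch them from the apiserver instead.
func stripUnlabelledSecretData(obj interface{}) (interface{}, error) {
	secret, ok := obj.(*corev1.Secret)
	if !ok {
		return obj, nil
	}
	if _, found := secret.Labels["controller.cert-manager.io/fao"]; found {
		return secret, nil
	}
	secret = secret.DeepCopy()
	secret.Data = nil
	secret.Annotations = nil
	secret.ManagedFields = nil
	if secret.Labels == nil {
		secret.Labels = map[string]string{}
	}
	// This label only exists on the cached copy, never on the cluster object.
	secret.Labels["cert-manager.io/metadata-only"] = "true"
	return secret, nil
}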

In practice:

  • cert-manager would cache the full Secret object for all certificate.spec.secretName Secrets and all Secrets containing temporary private keys in almost all cases and would retrieve these Secrets from cache in almost all cases (see the section about Secrets for Certificates)

  • cert-manager would cache the full Secret object for all labelled user created Secrets (issuer credentials)

  • cert-manager would cache metadata only for user created unlabelled Secrets that are used by issuers/cluster-issuers and would call kube apiserver directly to retrieve Secret data for those Secrets

  • cert-manager would cache metadata for all other unrelated cluster Secrets

This would need to start as an alpha feature and would require alpha/beta testing by actual users for us to be able to measure the gain in memory reduction in concrete cluster setup.

Here is a prototype of this solution. In the prototype, the Secrets transformer function is the transform that gets applied to all Secrets before they are cached. If a Secret does not have any known cert-manager labels or annotations, it removes data, metadata.managedFields and metadata.annotations and applies a cert-manager.io/metadata-only label. SecretGetter is used by any control loop that needs to GET a Secret. It retrieves it from kube apiserver or the cache depending on whether the cert-manager.io/metadata-only label was found.

Drawbacks

  • All cluster Secrets are still listed

  • The transform functions only get run before the object is placed into the informer's cache. The full object will be in the controller's memory for a period of time before that (in the DeltaFIFO store (?)). So users will still see memory spikes when events related to cert-manager unrelated cluster Secrets occur. See the performance of the prototype:

Create 300 cert-manager unrelated Secrets of size ~1Mb:

[screenshot]

Deploy cert-manager from https://github.com/irbekrm/cert-manager/tree/experimental_transform_funcs

Wait for cert-manager caches to sync, then run a command to label all Secrets to make the caches resync:

[screenshot]

Observe that although overall memory consumption remains quite low, there is a spike corresponding to the initial listing of Secrets:

[screenshot]

Use PartialMetadata only

We could cache PartialMetadata only for Secret objects. This would mean having just one, metadata, informer for Secrets and always GETting the Secrets directly from kube apiserver.

Drawbacks

Large number of additional requests to kube apiserver. For a default cert-manager installation this would mean slow issuance as client-go rate limiting would kick in. The limits can be modified via cert-manager controller flags, however this would then mean less availability of kube apiserver for other cluster tenants. Additionally, the Secrets that we actually need to cache are not likely to be large in size, so there would be less value from a memory savings perspective.

Here is a branch that implements a very experimental version of using partial metadata only https://github.com/irbekrm/cert-manager/tree/just_partial.

The following metrics are approximate as the prototype could probably be optimized. Compare with metrics section of this proposal for an approximate idea of the increase in kube apiserver calls during issuance.

Deploy cert-manager from https://github.com/irbekrm/cert-manager/tree/just_partial

Run a script to set up 10 CA issuers, create 500 certificates and observe that the time taken is significantly higher than for the latest version of cert-manager: [screenshot]

Observe high request latency for cert-manager: [screenshot]

Observe a large number of additional requests to kube apiserver: [screenshot]

Use paging to limit the memory spike when controller starts up

LIST calls to kube apiserver can be paginated. Perhaps not getting all objects at once on the initial LIST would limit the spike in memory when cert-manager controller starts up.

However, currently it is not possible to paginate the initial LISTs made by client-go informers. Although it is possible to set a page limit when creating a client-go informer factory or an individual informer, in practice this will not be used for the initial LIST. LIST requests can be served either from etcd or from the kube apiserver watch cache. The watch cache does not support pagination, so if a request is forwarded to the cache, the response will contain the full list. client-go makes the initial LIST request with resource version 0 for performance reasons (to ensure that the watch cache is used) and this results in the response being served from the kube apiserver watch cache.

There is currently an open PR to implement pagination from watch cache kubernetes/kubernetes#108392.

Filter the Secrets to watch with a label

Only watch Secrets with known cert-manager.io labels. Ensure that the label gets applied to all Secrets we manage (such as the spec.secretName Secret for a Certificate). We already ensure that all spec.secretName Secrets get annotated when synced; we can use the same mechanism to apply a label. Users will have to ensure that Secrets they create are labelled. We can help them discover which Secrets currently deployed to the cluster need labelling with a cmctl command. In terms of resource consumption and calls to the apiserver, this would be the most efficient solution (only relevant Secrets are listed/watched/cached and all relevant Secrets are cached in full).

Drawbacks

  • Bad user experience - a breaking change to adopt, and it introduces a potential footgun after adoption: even if users labelled all relevant Secrets in the cluster at the time of adoption, there would likely be no visible warning if an unlabelled Secret for an issuer got created at some point in the future and things would just silently not work (i.e. Secret data updates would not trigger issuer reconciles etc).

  • This feature would likely need to be opt-in 'forever' as else it would be a major breaking change when adopting and a potential footgun after adoption

  • Maintenance cost of the cmctl command: if a new user created Secret needs to be watched in a reconcile loop, the cmctl command would also need to be updated, which could be easily forgotten

Allow users to pass a custom filter

Add a flag that allows users to pass a custom selector (a label or field filter)

See an example flag implementation for cainjector in #5174 thanks to @aubm for working on this.

It might work well for cases where 'known' selectors that we could even document need to be passed, such as type!=helm.sh/release.v1.
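A sketch of how such a flag value could be validated and applied (assuming a hypothetical --secrets-field-selector style flag; fields.ParseSelector is the upstream parser):

import (
  ...
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  "k8s.io/apimachinery/pkg/fields"
  ...
)

// Parse the user-supplied selector up front so invalid values fail fast.
fieldSelector, err := fields.ParseSelector("type!=helm.sh/release.v1")
if err != nil {
	return fmt.Errorf("invalid secrets field selector: %w", err)
}
tweakListOptions := func(opts *metav1.ListOptions) {
	opts.FieldSelector = fieldSelector.String()
}
// tweakListOptions would then be passed to the Secrets informer, as in the
// snippets in the Design details section.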

Drawbacks

  • bad user experience - there is no straightforward way to tell whether the selector actually does what was expected, and it is an easy footgun, especially when users attempt to specify which Secrets should (rather than shouldn't) be watched

  • users should aim to use 'negative' selectors, but that may be complicated if there is a large number of random Secrets in the cluster that don't have a unifying selector

Use a standalone typed cache populated from different sources

As suggested by @sftim https://kubernetes.slack.com/archives/C0EG7JC6T/p1671478591357519

We could have a standalone cache for typed Secrets that gets populated by a standard watch for labelled Secrets as well as by Secrets that were retrieved in reconciler loops. A metadata only cache would also be maintained. This should ensure that a Secret that our control loop needs but is not labelled only gets retrieved from kube apiserver once. So it should provide the same memory improvements as the main design, but should avoid additional kube apiserver calls in cases where users have unlabelled cert-manager related Secrets in the cluster.
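A rough sketch of the shape such a cache could take, using client-go's generic store (everything here is illustrative, not a concrete design):

import (
  ...
  "k8s.io/client-go/tools/cache"
  ...
)

// A single typed store for Secrets, populated from two sources: the watch on
// labelled Secrets and reconcilers that had to fetch an unlabelled Secret
// directly from the apiserver.
secretStore := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{
	cache.NamespaceIndex: cache.MetaNamespaceIndexFunc,
})

// Watch path: the labelled-Secrets informer's event handlers add/update/delete
// entries in secretStore.
// Reconciler path: after a direct GET of an unlabelled Secret, cache it too so
// it is only fetched from the apiserver once.
_ = secretStore.Add(fetchedSecret)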

Drawbacks

  • complexity of implementation and maintenance of a custom caching mechanism

Footnotes

  1. fao = 'for attention of'

  2. We thought this might happen when the known cert-manager label gets added to or removed from a Secret. There is a mechanism for removing such a Secret from a cache that should no longer have it (see this Slack conversation), and when experimenting with the prototype implementation I have not observed a stale cache when adding/removing labels.