- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness
- Drawbacks
- Alternatives
This checklist contains actions which must be completed before a PR implementing this design can be merged.
- This design doc has been discussed and approved
- Test plan has been agreed upon and the tests implemented
- Feature gate status has been agreed upon (whether the new functionality will be placed behind a feature gate or not)
- Graduation criteria is in place if required (if the new functionality is placed behind a feature gate, how will it graduate between stages)
- User-facing documentation has been PR-ed against the release branch in [cert-manager/website]
cert-manager's controller watches and caches all Secret resources in the cluster.
This causes high memory consumption for cert-manager controller pods in clusters which contain many large Secrets, such as Helm release Secrets.
This proposal suggests a mechanism to avoid caching the data of Secrets that are unrelated to cert-manager.
- make cert-manager installation more reliable (no OOM kills caused by events against large cert-manager unrelated Secrets)
- reduce the cost of running a cert-manager installation (less memory needs to be allocated)
- make it easier to predict how much memory needs to be allocated to the cert-manager controller
- memory improvements related to caching objects other than Secrets
- memory improvements related to caching cert-manager related Secrets
- rewrite cert-manager controllers as controller-runtime controllers
- have this mechanism eventually be on by default (users shouldn't have to discover a feature flag in order to not cache unrelated Secrets)
- use the same mechanism to improve memory consumption by cainjector. This proposal focuses on the controller only as it is the more complex part; however, we need to fix this problem in cainjector too, and it would be nice to be consistent
- make our controllers less reliable (i.e. by introducing edge cases where a cert-manager related event does not trigger a reconcile). Given the wide usage of cert-manager and the various different usage scenarios, any such edge case would be likely to occur for some users
- make our issuance flow harder to reason about or less intuitive
- break any existing installation/issuance flows (i.e. where some resources, such as issuer Secrets, are created after the issuer and the flow relies on the Secret creation event to trigger the issuer reconcile)
- significantly slow down issuance
The current Secrets informer will have a filter to watch only Secrets that are known to be cert-manager related (using a label selector).
A new informer will be added that knows how to watch PartialMetadata for Secrets. This informer will have a filter to watch only Secrets that don't have a known cert-manager label. This ensures that for each Secret either the full data is cached in the typed informer's cache or metadata only is cached in the metadata informer's cache.
cert-manager will label `cert.spec.secretName` Secrets and temporary private key Secrets. These are the most frequently accessed Secret resources. Users could also optionally apply the label to other Secrets that the cert-manager controller needs to watch, to ensure that those get cached.
This will reduce the excessive memory consumption caused by caching the full contents of cert-manager unrelated Secrets, whilst still ensuring that most of the Secrets that cert-manager needs frequently are retrieved from cache and cert-manager relevant events are not missed.
The excessive memory consumption comes from the amount of cluster objects being stored in the shared informers' caches, mostly from Secrets.
cert-manager uses client-go's informer factory to create informers for core types. We have auto-generated informers for cert-manager.io types. These informers do not directly expose the cache or the ListerWatcher which is responsible for listing and setting up watches for objects.
When the cert-manager controller starts, all Secrets are listed and processed, which causes a memory spike.
When there is a change to Secrets, the cache gets resynced, which can also cause a memory spike.
For the rest of the time, Secrets remain in the controller's cache.
cert-manager needs to watch all Secrets in the cluster because some user-created Secrets, for example issuer credentials, might not be labelled, and we do want to trigger issuer reconciles when those Secrets change because:
- in cases where an issuer gets created and is unready because its credential has not yet been applied/is incorrect, and a user at some point applies or corrects it, it is a better user experience for the creation/update event to trigger an immediate reconcile instead of the user having to wait for the failed issuer to be reconciled again after the backoff period (the max wait can be 5 minutes for the issuers workqueue)
- in cases where an issuer credential change should trigger an issuer status update (i.e. a Venafi credentials Secret gets updated with incorrect credentials), it is a better user experience if the update event causes a reconcile and the issuer status is changed to unready, instead of failing at issuance time
- in some cases a missing Secret does not cause an issuer reconcile (such as a missing ACME EAB key, where we explicitly rely on Secret events to retry issuer setup). In this case, it is more efficient as well as a better user experience to reconcile on the Secret creation event: that way we avoid wasting CPU cycles whilst waiting for the user to create the Secret, and when the Secret does get created, the issuer will be reconciled immediately.
The caching mechanism is required to ensure quick issuance and to avoid taking up too much of kube apiserver's resources. Secrets with the issued X.509 certificates and with temporary private keys get retrieved a number of times during issuance, and all the control loops involved in issuance need the full Secret data. Currently the Secrets are retrieved from the informer's cache. Retrieving them from kube apiserver would mean a large number of additional calls to kube apiserver, which is undesirable. The default cert-manager installation uses a rate-limited client (20 QPS with a burst of 50). There is also the server-side API Priority and Fairness system that prevents rogue clients from overwhelming kube apiserver. Both these mechanisms mean that the result of a large number of additional calls would be slower issuance, as cert-manager would get rate limited (either client-side or server-side). The rate limiting can be modified to allow higher throughput for cert-manager, but this would have an impact on kube apiserver's availability for other tenants - so in either case additional API calls would have a cost for the user.
A user has a cluster with 4 cert-manager Certificates and 30k other (cert-manager unrelated) Secrets.
They observe unreasonably high memory consumption in proportion to the amount of cert-manager resources.
See the issue description in #4722
- Risk of slowing down issuance in cases where cert-manager needs to retrieve unlabelled Secrets, such as a CA issuer's Secret. Users could mitigate this by labelling the Secrets.
- Risk of unintentionally or intentionally overwhelming kube apiserver with the additional requests. A default cert-manager installation uses rate limiting (default 20 QPS with a burst of 50). This should be sufficient to ensure that in case of a large number of additional requests from the cert-manager controller, the kube apiserver is not slowed down. The cert-manager controller allows configuring the rate limiting QPS and burst (there is no upper limit). Since 1.20, Kubernetes by default uses API Priority and Fairness for fine-grained server-side rate limiting, which should prevent clients that don't sufficiently rate limit themselves from overwhelming the kube apiserver. In a cluster where API Priority and Fairness is disabled and cert-manager's rate limiter has been configured with a very high QPS and burst, it might be possible to overwhelm kube apiserver. However, this is already possible today if a user has the rights to configure the cert-manager installation, i.e. by creating a large number of cert-manager resources in a tight loop. To limit the possibility of overwhelming the kube apiserver:
- we should ensure that control loops that access secrets do not unnecessarily retry on errors (i.e. if a secret is not found or has invalid data). This should already be the case today, but it is worth reading through all possible paths
- we could store initialized clients for all issuers, as we already do for the ACME issuer, instead of retrieving credential secrets every time a certificate request needs to be signed
- recommend that users label Secret resources
- start with a non-GA implementation (this design suggests that the implementation starts as an alpha feature) to catch any potential edge cases, and gate GA on user feedback from larger installations
Ensure that the `certificate.Spec.SecretName` Secret as well as the Secret with the temporary private key are labelled with a `controller.cert-manager.io/fao: true` [1] label.
The temporary private key Secret is short-lived, so it should be okay to only label it on creation.
The `certificate.Spec.SecretName` Secret should be checked for the label value on every reconcile of the owning Certificate, same as with the secret template labels and annotations, see here.
Add a partial metadata informers factory, set up with a client-go client that knows how to make GET/LIST/WATCH requests for PartialMetadata.
Add a filter to ensure that any informers created from this factory will list only resources that are not labelled with a known cert-manager label.
```go
import (
	...
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/metadata/metadatainformer"
	...
)

metadataOnlyClient := metadata.NewForConfigOrDie(restConfig)
metadataLabelSelector, _ := notKnownCertManagerSecretLabelSelector()
metadataSharedInformerFactory := metadatainformer.NewFilteredSharedInformerFactory(metadataOnlyClient, resyncPeriod, opts.Namespace, func(listOptions *metav1.ListOptions) {
	// select only objects that do not have a known cert-manager label
	listOptions.LabelSelector = metadataLabelSelector
})

func notKnownCertManagerSecretLabelSelector() (string, error) {
	// matches objects that do NOT carry the cert-manager label
	r, _ := labels.NewRequirement("controller.cert-manager.io/fao", selection.DoesNotExist, make([]string, 0))
	sel := labels.NewSelector().Add(*r)
	return sel.String(), nil
}
```
Create a partial metadata informer that watches events for the Secret GVK:

```go
metadataSecretsInformer := metadataSharedInformerFactory.ForResource(corev1.SchemeGroupVersion.WithResource("secrets"))
```
Add a label selector to the existing Secrets informer created from the typed informers factory, to ensure that only Secrets that do have a known cert-manager label are watched:
```go
import (
	...
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
	coreinformers "k8s.io/client-go/informers/core/v1"
	kubeinternalinterfaces "k8s.io/client-go/informers/internalinterfaces"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
	...
)

concreteSecretsInformer := NewFilteredSecretsInformer(factory, kubeClient) // factory is the existing typed informers factory

func NewFilteredSecretsInformer(factory kubeinternalinterfaces.SharedInformerFactory, client kubernetes.Interface) coreinformers.SecretInformer {
	return &filteredSecretsInformer{
		factory:     factory,
		client:      client,
		newInformer: newFilteredSecretsInformer,
	}
}

type filteredSecretsInformer struct {
	factory     kubeinternalinterfaces.SharedInformerFactory
	client      kubernetes.Interface
	newInformer kubeinternalinterfaces.NewInformerFunc
	namespace   string
}

func (f *filteredSecretsInformer) Informer() cache.SharedIndexInformer {
	return f.factory.InformerFor(&corev1.Secret{}, f.newInformer)
}

func (f *filteredSecretsInformer) Lister() corelisters.SecretLister {
	return corelisters.NewSecretLister(f.Informer().GetIndexer())
}

func newFilteredSecretsInformer(client kubernetes.Interface, resyncPeriod time.Duration) cache.SharedIndexInformer {
	secretLabelSelector, _ := knownCertManagerSecretLabelSelector()
	return coreinformers.NewFilteredSecretInformer(client, "", resyncPeriod, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}, func(listOptions *metav1.ListOptions) {
		listOptions.LabelSelector = secretLabelSelector
	})
}

func knownCertManagerSecretLabelSelector() (string, error) {
	// matches only objects that DO carry the cert-manager label
	r, _ := labels.NewRequirement("controller.cert-manager.io/fao", selection.Exists, make([]string, 0))
	sel := labels.NewSelector().Add(*r)
	return sel.String(), nil
}
```
Create a new Secrets getter function. The function will check for the Secret in both the typed and the PartialMetadata cache.
- If the object is found in both caches, it assumes that one of the caches must be stale and gets the Secret from kube apiserver [2]
- If the object is found in the PartialMetadata cache only, it will get it from kube apiserver
- If the object is found in the typed cache only, it will get it from there
- If the object is not found in either cache, it will return a NotFound error
```go
func SecretGetter(ctx context.Context, liveSecretsClient typedcorev1.SecretsGetter, cacheSecretsClient corelisters.SecretLister, partialMetadataClient cache.GenericLister, name string, namespace string) (*corev1.Secret, error) {
	var secretFoundInTypedCache, secretFoundInMetadataCache bool
	secret, err := cacheSecretsClient.Secrets(namespace).Get(name)
	if err == nil {
		secretFoundInTypedCache = true
	}
	if err != nil && !apierrors.IsNotFound(err) {
		return nil, fmt.Errorf("error retrieving secret from the typed cache: %w", err)
	}
	_, partialMetadataGetErr := partialMetadataClient.ByNamespace(namespace).Get(name)
	if partialMetadataGetErr == nil {
		secretFoundInMetadataCache = true
	}
	if partialMetadataGetErr != nil && !apierrors.IsNotFound(partialMetadataGetErr) {
		return nil, fmt.Errorf("error retrieving object from partial object metadata cache: %w", partialMetadataGetErr)
	}
	if secretFoundInMetadataCache && secretFoundInTypedCache {
		// both caches claim the Secret: one of them must be stale, so
		// fall through to a live call against kube apiserver
		return liveSecretsClient.Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
	}
	if secretFoundInTypedCache {
		return secret, nil
	}
	if secretFoundInMetadataCache {
		// only metadata is cached: fetch the full Secret from kube apiserver
		return liveSecretsClient.Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
	}
	// not found in either cache
	return nil, partialMetadataGetErr
}
```
Use the new Secrets getter in all control loops that need to get any Secret:

```go
...
// Fetch and parse the 'next private key secret'
nextPrivateKeySecret, err := SecretGetter(ctx, c.secretLiveClient, c.secretLister, c.metadataSecretLister, *crt.Status.NextPrivateKeySecretName, crt.Namespace)
...
```
The following metrics are based on a prototype implementation of this design. The tests were run on a kind cluster.
Test the memory spike caused by the initial LIST-ing of Secrets, the size of the cache after the initial LIST has been processed, and a spike caused by changes to Secret resources.
Create 300 cert-manager unrelated Secrets of size ~1Mb:
Install cert-manager from latest master with client-go metrics enabled.
Wait for cert-manager to start and populate the caches.
Apply a label to all Secrets to initiate a cache resync:
Observe that memory consumption spikes on controller startup when all Secrets are initially listed, that there is a second smaller spike around the time the Secrets got labelled, and that memory consumption remains high:
Create 300 cert-manager unrelated Secrets of size ~1Mb:
Deploy cert-manager from the partial metadata prototype.
Wait for cert-manager to start and populate the caches.
Apply a label to all Secrets to initiate a cache resync:
Observe that the memory consumption is significantly lower:
This scenario tests issuing 500 certificates from 10 cert-manager CA issuers. The CA issuers have been set up with CA certificates that do not have known cert-manager labels.
Here is a script that sets up the issuers, creates the Certificates, waits for them to become ready and outputs the total time taken: https://gist.github.com/irbekrm/bc56a917a164b1a3a097bda483def0b8.
This test was run against a version of cert-manager that corresponds to v1.11.0-alpha.2 with some added client-go metrics https://github.com/irbekrm/cert-manager/tree/client_go_metrics.
Run a script to set up 10 CA issuers, create 500 certificates and observe the time taken for all certs to be issued:
Observe resource consumption, request rate and latency for cert-manager controller:
Observe resource consumption and the rate of requests for Secret resources for kube apiserver:
Run a script to set up 10 CA issuers, create 500 certificates and observe the time taken for all certs to be issued:
Observe resource consumption, request rate and latency for cert-manager controller:
Observe resource consumption and the rate of requests for Secret resources for kube apiserver:
The issuance is slightly slowed down, because on each issuance cert-manager needs to get the unlabelled CA Secret directly from kube apiserver.
Users could mitigate this by adding cert-manager labels to the CA Secrets.
Run a modified version of the same script, but with the CA Secrets labelled:
For CA issuers, normally a Secret will be retrieved once per issuer reconcile and once per certificate request signing. In some cases, two Secrets might be retrieved during certificate request signing, see secrets for issuers. We could look into improving this by initializing a client with credentials and sharing it with certificate request controllers, similarly to how it's currently done with ACME clients.
- In most setups, in the majority of cases where a control loop needs a Secret it would still be retrieved from cache (as it is certificate secrets that get parsed most frequently, and those will be labelled in practically all cases)
- Memory consumption improvements appear quite significant
- Once graduated to GA, this would work for all installations without needing to discover a flag to set

- All cluster Secrets are still listed
- Slower issuance in cases where cert-manager needs to retrieve unlabelled Secrets
Unit and e2e tests (largely updating our existing e2e tests and writing unit tests for any new functions).
We do not currently have any automated tests that observe resource consumption/do load testing.
See Metrics for how to test resource consumption/issuance speed manually.
Alpha (cert-manager 1.12):
- feature implemented behind a feature flag
- CI tests pass for all supported Kubernetes versions
- this design discussed and merged
Beta:
User feedback:
- does this solve the target use case (memory consumption reduction for clusters with a large number of cert-manager unrelated Secrets)?
- does this work in cases where a large number of Certificates need to be issued around the same time (i.e. is the slight slowdown of issuance acceptable)?
GA:
- TODO: define criteria, which should be a certain number of working installations
Recommend that users upgrade to cert-manager v1.11 first, to ensure that all Certificate Secrets are labelled and to avoid a spike in apiserver calls on controller startup.
This feature will work with all versions of Kubernetes currently supported by cert-manager.
PartialMetadata support in kube apiserver has been GA since Kubernetes 1.15.
The oldest Kubernetes version supported by cert-manager 1.12 will be 1.22.
This section lists all Secrets that need to be watched by the cert-manager controller's reconcile loops.
- `certificate.spec.secretName` Secrets (that contain the issued certs). These can be created by cert-manager or pre-created by users or external tools (i.e. an ingress controller). If created by cert-manager, they will have a number of `cert-manager.io` annotations. Secrets without annotations will cause re-issuance (see https://cert-manager.io/docs/faq/#when-do-certs-get-re-issued) and upon successful issuance `cert-manager.io` annotations will be added.
- The temporary Secrets that get created for each issuance and contain the private key that the certificate request is signed with. These can only be created by the cert-manager controller and are all labelled with a `cert-manager.io/next-private-key: true` label.
The issuers and clusterissuers controllers set up watches for all events on all secrets, but have a filter to determine whether an event should cause a reconcile.
ACME issuer
- the secret referenced by `issuer.spec.acme.privateKeySecretRef`. This can be created by the user (for an already existing ACME account) or by cert-manager. Cert-manager does not currently add any labels or annotations to this secret.
A number of optional secrets that will always be created by users, with no labelling enforced:
- the secret referenced in `issuer.spec.acme.externalAccountBinding`
- the secret referenced by `issuer.spec.acme.solvers.dns01.acmeDNS.accountSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.akamai.clientSecretSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.akamai.accessTokenSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.azureDNS.clientSecretSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.cloudDNS.serviceAccountSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.cloudflare.apiTokenSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.cloudflare.apiKeySecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.digitalocean.tokenSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.rfc2136.tsigSecretSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.route53.accessKeyIDSecretRef`
- the secret referenced in `issuer.spec.acme.solvers.dns01.route53.secretAccessKeySecretRef`
The ACME account key secret and, if configured, the secret with the EAB key will be retrieved once per issuer reconcile (on events against the issuer, the account key or the EAB key secret). The ACME client initialized with the credentials is then stored in a registry shared with the orders controller, so the secrets are not retrieved again when a certificate request for the issuer needs to be signed. For a DNS-01 challenge, one (possibly two, in the case of AWS) call for secrets will be made during issuance to retrieve the relevant credentials secret.
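The "initialize once, share via a registry" pattern used for ACME clients (and suggested earlier as a mitigation for other issuer types) can be sketched with a mutex-guarded map. The types and names here are illustrative stand-ins, not cert-manager's actual registry API:

```go
package main

import (
	"fmt"
	"sync"
)

// registry caches an initialized client per issuer, keyed by
// namespace/name, so a credentials Secret is read once at issuer setup
// rather than on every certificate request signing.
type registry struct {
	mu      sync.Mutex
	clients map[string]string // issuer key -> stand-in for an initialized client
}

// getOrInit returns the cached client for key, calling init only if no
// client has been stored yet.
func (r *registry) getOrInit(key string, init func() string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if c, ok := r.clients[key]; ok {
		return c
	}
	c := init()
	r.clients[key] = c
	return c
}

func main() {
	r := &registry{clients: map[string]string{}}
	calls := 0
	init := func() string { calls++; return "acme-client" }
	r.getOrInit("default/my-issuer", init)
	r.getOrInit("default/my-issuer", init) // served from the registry
	fmt.Println(calls) // 1: the credentials Secret is only read once
}
```

A real registry would also need invalidation when the credentials Secret changes, which is exactly what the issuer's Secret watch provides.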
CA
- the secret referenced by `issuer.spec.ca.secretName`. This will always be created by the user. No labelling is currently enforced.
This will be retrieved twice when the issuer is reconciled (when an event occurs against the issuer or its secret) and once when a certificate request for the issuer is being signed.
Vault
- the optional secret referenced by `issuers.spec.vault.caBundleSecretRef`. Always created by the user, with no labelling enforced.
One of the following credentials secrets:
- the secret referenced by `issuers.spec.vault.auth.appRole.secretRef`. Always created by the user, with no labelling enforced
- the secret referenced by `issuers.spec.vault.auth.kubernetes.secretRef`. Always created by the user, with no labelling enforced
- the secret referenced by `issuers.spec.vault.auth.tokenSecretRef`. Always created by the user, with no labelling enforced
The configured credentials Secret and, if configured, the CA bundle Secret will be retrieved every time the issuer is reconciled (on events against the issuer and either of the Secrets) and every time a certificate request needs to be signed.
Venafi
One of:
- the secret referenced by `issuers.spec.venafi.tpp.secretRef`. Always created by the user, with no labelling enforced
- the secret referenced by `issuers.spec.venafi.cloud.secretRef`. Always created by the user, with no labelling enforced
The configured Secret will be retrieved when the issuer is reconciled (on events against the issuer and its secret) and when a certificate request is signed.
There are a number of existing upstream mechanisms for limiting what gets stored in the cache. This section focuses on what is available for the client-go informers which we use in cert-manager controllers, but there is a controller-runtime wrapper available for each of these mechanisms that should make them usable in cainjector as well.
Filtering which objects get watched using label or field selectors. These selectors make it possible to filter which resources are retrieved by the informer's ListerWatcher component during the initial list call and the watch calls to kube apiserver (and therefore which end up in the cache). The client-go informer factory allows configuring individual informers with list options that will be used for list and watch calls.
This mechanism is used by other projects that use client-go controllers, for example istio.
The same filtering mechanism is also available for cert-manager.io resources. We shouldn't need to filter which cert-manager.io resources we watch though.
This mechanism seems the most straightforward to use, but currently we don't have a way to identify all the resources (secrets) we need to watch using a label or field selector, see the Secrets section above.
Caching only metadata for a given object. This mechanism relies on making list and watch calls against kube apiserver with a PartialObjectMetadata header. The apiserver then returns PartialObjectMetadata instead of an object of a concrete type such as a Secret. The PartialObjectMetadata only contains the metadata and type information of the object.
To use this mechanism to ensure that only metadata is cached for a particular resource type that triggers a reconcile, the ListerWatcher of the informer for that type needs to use a client that knows how to make calls with the PartialObjectMetadata header. This also means that the reconcile loop can only retrieve PartialObjectMetadata types from that cache.
client-go has a metadata-only client that can be used to get, list and watch with PartialObjectMetadata. client-go also has a metadata informer that uses the metadata-only client to list and watch resources. This informer implements the same SharedIndexInformer interface as the core and cert-manager.io informers that we use currently, so it would fit our existing controller setup.
The downside to having metadata only in the cache is that if the reconcile loop needs the whole object, it needs to make another call to the kube apiserver to get the actual object. We have a number of reconcile loops that retrieve and parse secret data numerous times, for example the readiness controller retrieves and parses the `spec.secretName` secret for a Certificate on any event associated with the Certificate, any of its CertificateRequests, or the `spec.secretName` secret.
TODO: add which projects have adopted metadata-only watches, especially with client-go informers
Transforming the object before it gets placed into the cache. client-go allows configuring core informers with transform functions. These functions get called with the object as an argument before the object is placed into the cache. The transformer needs to convert the object to a concrete or metadata type if it wants to retrieve its fields. This is a lesser-used functionality in comparison with metadata-only caching. A couple of usage examples:
- support for transform functions was added in controller-runtime controller-runtime#1805 with the goal of allowing users to remove managed fields and annotations
- Istio's pilot controller uses this mechanism to configure their client-go informers to remove managed fields before putting objects into the cache
I haven't seen any usage examples where non-metadata fields are modified using this mechanism, but I cannot see a reason why new fields could not be added (i.e. a label that signals that a transform was applied) as well as fields removed.
There is an open KEP for replacing initial LIST with a WATCH kubernetes/enhancements#3667
Perhaps this would also reduce the memory spike on controller startup.
No
Will enabling / using this feature result in new API calls (i.e. to the Kubernetes apiserver or external services)?
There will be additional calls to kube apiserver to retrieve unlabelled Secrets.
See Metrics and Risks and Mitigation
No new objects will be created
No, see Metrics
Use transform functions to remove `data` for non-labelled Secrets before adding them to the informers cache

Watch all Secrets as before. Use client-go's transform functions mechanism to remove the `data` field of any Secret that does not have a known cert-manager label before it gets placed in the informer's cache. In the same transform function, add a custom `cert-manager.io/metadata-only` label to all Secrets whose `data` got removed (this label will only exist on the cached object).
In reconcilers, use a custom Secrets getter that can get the Secret either from kube apiserver or from the cache, depending on whether it has the `cert-manager.io/metadata-only` label that suggests that the Secret's `data` has been removed.
Additionally, ensure that as many Secrets as we can (i.e. ACME registry account keys) get labelled.
Users would be encouraged to add a cert-manager label to all Secrets they create, to reduce extra calls to kube apiserver.
In practice:
- cert-manager would cache the full Secret object for all `certificate.spec.secretName` Secrets and all Secrets containing temporary private keys in almost all cases, and would retrieve these Secrets from cache in almost all cases (see the section about Secrets for Certificates)
- cert-manager would cache the full Secret object for all labelled user-created Secrets (issuer credentials)
- cert-manager would cache metadata only for user-created unlabelled Secrets that are used by issuers/cluster-issuers, and would call kube apiserver directly to retrieve the Secret data for those Secrets
- cert-manager would cache metadata for all other unrelated cluster Secrets
This would need to start as an alpha feature and would require alpha/beta testing by actual users for us to be able to measure the gain in memory reduction in concrete cluster setup.
Here is a prototype of this solution.
In the prototype, the Secrets transformer function is the transform that gets applied to all Secrets before they are cached. If a Secret does not have any known cert-manager labels or annotations, it removes `data`, `metadata.managedFields` and `metadata.annotations` and applies a `cert-manager.io/metadata-only` label.
The SecretGetter is used by any control loop that needs to GET a Secret. It retrieves it from kube apiserver or from the cache, depending on whether the `cert-manager.io/metadata-only` label was found.
- All cluster Secrets are still listed
- The transform functions only get run before the object is placed into the informer's cache. The full object will be in the controller's memory for a period of time before that (in the DeltaFIFO store (?)). So users will still see memory spikes when events related to cert-manager unrelated cluster Secrets occur. See the performance of the prototype:
Create 300 cert-manager unrelated Secrets of size ~1Mb:
Deploy cert-manager from https://github.com/irbekrm/cert-manager/tree/experimental_transform_funcs
Wait for the cert-manager caches to sync, then run a command to label all Secrets to make the caches resync:
Observe that although overall memory consumption remains quite low, there is a spike corresponding to the initial listing of Secrets:
We could cache PartialMetadata only for Secret objects. This would mean having just one (metadata) informer for Secrets and always GETting the Secrets directly from kube apiserver.
Large number of additional requests to kube apiserver. For a default cert-manager installation this would mean slow issuance, as client-go rate limiting would kick in. The limits can be modified via cert-manager controller flags, but this would then mean less availability of kube apiserver for other cluster tenants.
Additionally, the Secrets that we actually need to cache are not likely to be large in size, so there would be less value from a memory savings perspective.
Here is a branch that implements a very experimental version of using partial metadata only https://github.com/irbekrm/cert-manager/tree/just_partial.
The following metrics are approximate as the prototype could probably be optimized. Compare with metrics section of this proposal for an approximate idea of the increase in kube apiserver calls during issuance.
Deploy cert-manager from https://github.com/irbekrm/cert-manager/tree/just_partial
Run a script to set up 10 CA issuers, create 500 certificates and observe that the time taken is significantly higher than for latest version of cert-manager:
Observe high request latency for cert-manager:
Observe a large number of additional requests to kube apiserver:
LIST calls to the kube apiserver can be paginated. Perhaps not retrieving all objects at once on the initial LIST would limit the spike in memory when the cert-manager controller starts up.
However, it is currently not possible to paginate the initial LISTs made by client-go informers. Although it is possible to set a page limit when creating a client-go informer factory or an individual informer, in practice this will not be applied to the initial LIST. LIST requests can be served either from etcd or from the kube apiserver watch cache. The watch cache does not support pagination, so if a request is forwarded to the cache, the response will contain the full list. Client-go makes the initial LIST request with resource version 0 for performance reasons (to ensure that the watch cache is used), and this results in the response being served from the kube apiserver watch cache.
There is currently an open PR to implement pagination from watch cache kubernetes/kubernetes#108392.
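For illustration, here is a stdlib-only sketch of Limit/Continue-style chunking (the real semantics are `metav1.ListOptions.Limit` plus an opaque `continue` token; here the token is simply an index). The point is that peak resident objects track the page size rather than the total list size:

```go
package main

import "fmt"

// listPage mimics the semantics of a paginated LIST: the caller passes
// a limit and an opaque continue token (here just an index) and gets
// back one page plus the token for the next page (-1 when exhausted).
func listPage(all []string, limit, cont int) (page []string, next int) {
	end := cont + limit
	if end > len(all) {
		end = len(all)
	}
	page = all[cont:end]
	if end < len(all) {
		return page, end
	}
	return page, -1 // no more pages
}

func main() {
	secrets := []string{"s1", "s2", "s3", "s4", "s5"}
	peak := 0
	for cont := 0; cont != -1; {
		var page []string
		page, cont = listPage(secrets, 2, cont)
		if len(page) > peak {
			peak = len(page) // only one page resident at a time
		}
	}
	fmt.Println("peak resident objects:", peak)
}
```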
Only watch `Secret`s with known `cert-manager.io` labels, and ensure that the label gets applied to all `Secret`s we manage (such as the `spec.secretName` `Secret` for a `Certificate`).
We already ensure that all `spec.secretName` `Secret`s get annotated when synced; we can use the same mechanism to apply a label.
Users will have to ensure that the `Secret`s they create are labelled.
We can help them discover which `Secret`s currently deployed to the cluster need labelling with a `cmctl` command.
In terms of resource consumption and calls to the apiserver, this would be the most efficient solution (only relevant `Secret`s are listed/watched/cached, and all relevant `Secret`s are cached in full).
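A stdlib-only sketch of the idea (in client-go this would be a label selector set via `informers.WithTweakListOptions`; the label name below is an assumption used for illustration), which also shows the footgun discussed next, an unlabelled but relevant `Secret` simply never entering the cache:

```go
package main

import "fmt"

type Secret struct {
	Name   string
	Labels map[string]string
}

// admit models a label-selector watch: only Secrets carrying the known
// cert-manager label are ever delivered to the informer, so the cache
// (and memory use) is bounded by the Secrets we actually manage.
func admit(s Secret, label string) bool {
	_, ok := s.Labels[label]
	return ok
}

func main() {
	const faoLabel = "controller.cert-manager.io/fao" // assumed label name
	cache := map[string]Secret{}
	for _, s := range []Secret{
		{Name: "helm-release"}, // never cached, never triggers reconciles
		{Name: "tls-cert", Labels: map[string]string{faoLabel: "true"}},
		{Name: "user-ca"}, // footgun: cert-manager related but unlabelled
	} {
		if admit(s, faoLabel) {
			cache[s.Name] = s
		}
	}
	// "user-ca" is silently invisible to the controller.
	fmt.Println(len(cache), cache["user-ca"].Name == "")
}
```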
- Bad user experience: this is a breaking change to adopt and introduces a potential footgun after adoption. Even if users labelled all relevant `Secret`s in the cluster at adoption time, there would likely be no visible warning if an unlabelled `Secret` for an issuer got created at some point in the future, and things would silently stop working (e.g. `Secret` data updates would not trigger an issuer reconcile).
- This feature would likely need to remain opt-in 'forever', as otherwise it would be a major breaking change when adopting and a potential footgun after adoption.
- Maintenance cost of the `cmctl` command: if a new user-created `Secret` needs to be watched in a reconcile loop, the `cmctl` command would also need to be updated, which could easily be forgotten.
Add a flag that allows users to pass a custom selector (a label or field filter).
See an example flag implementation for cainjector in #5174; thanks to @aubm for working on this.
It might work well for cases where 'known' selectors need to be passed that we could even document, such as `type!=helm.sh/release.v1`.
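As a rough illustration of how such a flag could behave, here is a toy single-expression matcher for `key=value` / `key!=value` selectors (a real implementation would parse the flag with the `labels`/`fields` packages from `k8s.io/apimachinery` rather than hand-rolling this; the matcher below ignores the key and only compares values, which is enough to show the 'negative selector' shape):

```go
package main

import (
	"fmt"
	"strings"
)

// matchSelector evaluates a single "key=value" or "key!=value"
// expression against a Secret's type field, roughly in the spirit of a
// field selector such as type!=helm.sh/release.v1. The "!=" case must
// be checked first, since "!=" also contains "=".
func matchSelector(selector, value string) (bool, error) {
	if parts := strings.SplitN(selector, "!=", 2); len(parts) == 2 {
		return value != parts[1], nil
	}
	if parts := strings.SplitN(selector, "=", 2); len(parts) == 2 {
		return value == parts[1], nil
	}
	return false, fmt.Errorf("unparseable selector %q", selector)
}

func main() {
	sel := "type!=helm.sh/release.v1" // a 'negative' selector we could document
	for _, typ := range []string{"helm.sh/release.v1", "kubernetes.io/tls"} {
		ok, _ := matchSelector(sel, typ)
		fmt.Printf("%s watched: %v\n", typ, ok)
	}
}
```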
- Bad user experience: there is no straightforward way to tell whether the selector actually does what was expected, and it is an easy footgun, especially when users attempt to specify which `Secret`s should (rather than shouldn't) be watched.
- Users should aim to use 'negative' selectors, but that can be complicated if there is a large number of miscellaneous `Secret`s in the cluster that don't share a unifying selector.
As suggested by @sftim in https://kubernetes.slack.com/archives/C0EG7JC6T/p1671478591357519:
We could have a standalone cache for typed `Secret`s that gets populated by a standard watch for labelled `Secret`s as well as by `Secret`s retrieved in reconciler loops. A metadata-only cache would also be maintained.
This should ensure that a `Secret` that our control loop needs but that is not labelled only gets retrieved from the kube apiserver once. It should therefore provide the same memory improvements as the main design while avoiding additional kube apiserver calls in cases where users have unlabelled cert-manager related `Secret`s in the cluster.
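The lookup path could be sketched as below (a hypothetical `hybridCache`, stdlib only; cache invalidation when labels change is deliberately left out of the sketch, though it is exactly the part that makes a real implementation hard):

```go
package main

import "fmt"

// hybridCache sketches the suggestion above: a typed Secret cache fed
// both by a label-selector watch and lazily by reconciler GETs, so an
// unlabelled but cert-manager related Secret costs one live GET total.
type hybridCache struct {
	store map[string][]byte
	fetch func(key string) []byte // stands in for a live GET to the apiserver
	gets  int
}

func (c *hybridCache) get(key string) []byte {
	if v, ok := c.store[key]; ok {
		return v // served from cache (watch- or GET-populated)
	}
	c.gets++
	v := c.fetch(key)
	c.store[key] = v // remember it so the next reconcile is free
	return v
}

func main() {
	c := &hybridCache{
		store: map[string][]byte{"ns/labelled": []byte("from-watch")},
		fetch: func(key string) []byte { return []byte("from-apiserver") },
	}
	for i := 0; i < 3; i++ {
		_ = c.get("ns/labelled")   // watch-populated, no live GET
		_ = c.get("ns/unlabelled") // fetched once, cached thereafter
	}
	fmt.Println("live GETs:", c.gets)
}
```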
- Complexity of implementing and maintaining a custom caching mechanism
Footnotes
- fao = 'for attention of' ↩
- We thought this might happen when the known cert-manager label gets added to or removed from a `Secret`. There is a mechanism for removing such a `Secret` from a cache that should no longer contain it (see this Slack conversation), and when experimenting with the prototype implementation I have not observed a stale cache when adding or removing labels. ↩