Limit ingested Secrets to those actually used #2868

rainest · 2022-08-24T21:53:46Z

Is there an existing issue for this?

I have searched the existing issues

Problem Statement

Currently, KIC ingests all Secrets, places them in the store, and filters which make it into Kong configuration in the parser based on the presence of a konghq.com/ca-cert: true label or references from Ingresses, Gateways, KongConsumers, or Kong(Cluster)Plugins. Because Secrets are a generic resource, many Secrets in a cluster are not relevant to KIC (e.g. we do not need to ingest service account tokens, they will never be used in Kong configuration)

Although these Secrets are unused, we still ingest them, they consume (potentially a sizeable amount of) memory. Users cannot reduce the amount of memory used by any means other than simply using fewer Secrets, which isn't a viable option if you're actually using them.

Aside from CA certificate Secrets, we cannot determine if a Secret will be needed later based on the Secret alone. We need to know if there is a reference to the Secret from some other object, our standard generated controller cannot find those references.

Proposed Solution

controller-runtime provides a standard example pattern for tracking resources referenced by another. My initial read is that this should let us avoid reading Secrets we will not need.

We use a dedicated Secret controller that only ingests Secrets with the CA certificate label.
We add code to store Secret references in indices for resources that use them.
We add code to reconcilers that retrieves Secrets and upserts them to the store.
We add a new kind=Secret watch to controllers whose resources reference Secrets. These watches use new (*whateverReconciler) findObjectForSecret() map function handlers and the resource version changed predicate.
The new findObjectForSecret() handler lists and queues requests for any resource handled by that reconciler that includes a reference to that Secret. FWIW It doesn't look like this uses the new index directly; I'm guessing controller-runtime uses it as an under the hood optimization.
The queued requests update the store copy of the Secret via the get/upsert we added.

Additional information

This should not require changes to the parser or store. We'll still have Secrets in the store as usual, and the parser will still be able to find them when needed, but the store will only contain Secrets that the parser actually needs.
This is a significant enough split in the way that the controllers function that I expect we'll likely need to remove these controllers from the template system. We should consider removing it entirely, or possibly somehow making it more modular so that we inject individual boilerplate sections into separate controllers. Not sure if templating makes that easy.
We can't really test the end goal of "use less memory" in integration tests. We should conduct ad hoc testing along the lines of loading 1000 1MB garbage Secrets and 2 referenced TLS secrets and comparing memory usage before/after the change.

Acceptance Criteria

Controllers that reference Secrets retrieve those Secrets and add them to the store.
When a referenced Secret changes, controllers that reference them queue reconciles and update the Secret in the store.
The standalone Secret controller handles CA certificate Secrets only.
Existing user-facing functionality should remain unchanged. If that's not the case, we may need to revise this plan.

The text was updated successfully, but these errors were encountered:

rainest · 2022-09-01T17:16:47Z

Tentatively sticking this in 2.7: this is large enough that it may slip, but as a known performance limitation and general annoyance (we don't want to scan and generate logs for a bunch of irrelevant Secrets) I'd like to try and get it fixed sooner if possible.

randmonkey · 2022-10-12T08:48:02Z

Do we allow referencing a secret that does not exist when we create the referencing object(ingress/kongplugin/...)? I think we should allow it.

rainest · 2022-10-12T21:31:51Z

I think it has to be determined on a per-reference basis. There are some cases where you can potentially create partial configuration and some where you cannot.

In the proposed solution, for example, the reconciler for an Ingress is now responsible for adding both the Ingress and Secret to the store. If it processes an Ingress successfully, it won't process it again until the Ingress is modified. If the Secret isn't available at the time that it processes that reconcile event, it will not add the Secret to the store. You need to requeue the reconcile in order to eventually add the Secret if it becomes available.

While we do need to requeue the reconcile events if we can't fetch a referent, nothing stops us from adding the referrer to the store if some referent is unavailable. This may or may not be desirable or consistent with current behavior. Following the examples below, maintaining status quo would mean that Ingresses with missing cert Secrets are added to the store and do result in routes, whereas plugins with Secret configuration are not.

At present: kinda? It depends on where you're looking and the specific reference. Following are some examples for the two main areas that handle this.

Some checks in the admission webhook require referenced Secrets:

Consumer/Credential validation checks referenced Secrets to ensure, among other things, that consumers are not trying to reuse credentials. This check cannot return an affirmative answer if the Secret is not available, and so it fails altogether and rejects the proposed resource.
Kong(Cluster)Plugins that reference Secrets use them to provide configuration, and validating that configuration does not violate the schema is the main point of that validator, and it's impossible if the config is unavailable, so that similarly rejects the resource.

This is a general problem with cross-resource validation, and doesn't play nice with eventual consistency. However, absent a feedback mechanism by which we can indicate these resources are invalid after admission (the issues linked to FTI-4350), we can reasonably consider the consequences of allowing these through (no updates at all until they're fixed) worse than the annoyance of needing to create the Secrets first.

Elsewhere, in the parser, we may allow these or not depending on the specifics:

Ingresses referencing non-existent Secrets are admitted and the parser will process them into routes and mark that it should add a Kong certificate with the Ingress hostname later. Later, in getCerts(), the parser will see that the requested Secret is not in the store, and skip adding it. You'll get some configuration from the resource but not all.
KongPlugins that reference a Secret will simply return an empty plugin and an error if their Secret is available. There's no reasonable way to construct partial configuration from them.

The simplest way to implement this is to allow upserts without valid references in the modified controllers. AFAIK, none of the current controllers perform these checks because we've historically placed this responsibility primarily in the parser and have not had the controllers handle any checks outside their main resource (with the exception of some IngressClass-related checks). So long as you do requeue to ensure those resources do end up in the store, you can ignore broken references. This will preserve the status quo: the existing parser-level checks will continue functioning, and they are generally written with the expectation that references won't always work.

Whether this responsibility should live in the parser or whether we should allow partial configuration is something of an open question. My overall impression of controller-runtime is that it does expect you to place more validation in the controllers than we currently do, but that's subjective. Discussions with the Gateway API working group have generally leaned against allowing partial/best effort configuration and toward instead invalidating the top-level resource altogether until issues are fixed. The latter approach is usually less complicated to implement and avoids inferring a potentially incorrect interpretation of user intent, but is risky insofar as it can make typos catastrophic.

pmalek · 2022-10-13T10:15:08Z

I may not be able to fully comment on this issue but one note I'd like to make is that

Discussions with the Gateway API working group have generally leaned against allowing partial/best effort configuration and toward instead invalidating the top-level resource altogether until issues are fixed. The latter approach is usually less complicated to implement and avoids inferring a potentially incorrect interpretation of user intent, but is risky insofar as it can make typos catastrophic.

I'd say having an outright failure and returning a meaningful error to the user makes sense to me. It's better to have a consistent, expected failed state rather than a quasi-OK state where some settings/objects are applied and some are not.

Having said that, ingesting the referrer (e.g. Ingress) without having the Secret ready in the store sounds OK to me as long as we do not apply whatever the Ingress and its config entail (a route? only?) until the secret that's necessary to fulfil the "Ingress contract" (of applying a working route) is ingested by the store as well. I imaging that whatever is handling the secret (in secrets controller?) would be able to figure out somehow that there is an Ingress (in that particular scenario) waiting for it so the its config can be applied.

rainest · 2022-10-13T23:54:16Z

https://danielmangum.com/posts/controller-runtime-client-go-rate-limiting/ covers the "are requeues a problem if they never encounter an acceptable resource?" question quite extensively. controller-runtime does throttle requeues using exponential backoff. We can even tune the throttling behavior if that's not working well.

If we're in the expected race condition, where someone creates an Ingress and Secret in short succession, and the first reconcile runs before the Secret is visible, controller-runtime will requeue quickly and eventually successfully process the resource.

If someone instead forgets to create the referent Secret entirely, the requeues will never succeed and never end, but will eventually become very infrequent.

rainest added area/feature New feature or request area/perf Performance Related Issues size/large labels Aug 24, 2022

rainest added this to the KIC v2.7.0 milestone Sep 1, 2022

pmalek modified the milestones: KIC v2.7.0, KIC v2.8.0 Sep 27, 2022

randmonkey self-assigned this Sep 28, 2022

randmonkey mentioned this issue Oct 13, 2022

feat: store only secrets that are possible to be used in object cache of dataplane #3047

Merged

1 task

rainest mentioned this issue Oct 13, 2022

Provide parent name to parser service builder #2566

Closed

1 task

randmonkey closed this as completed in #3047 Oct 26, 2022

rainest mentioned this issue Jul 12, 2023

Change kongCredType to a label #2502

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit ingested Secrets to those actually used #2868

Limit ingested Secrets to those actually used #2868

rainest commented Aug 24, 2022

rainest commented Sep 1, 2022

randmonkey commented Oct 12, 2022

rainest commented Oct 12, 2022

pmalek commented Oct 13, 2022

rainest commented Oct 13, 2022

Limit ingested Secrets to those actually used #2868

Limit ingested Secrets to those actually used #2868

Comments

rainest commented Aug 24, 2022

Is there an existing issue for this?

Problem Statement

Proposed Solution

Additional information

Acceptance Criteria

rainest commented Sep 1, 2022

randmonkey commented Oct 12, 2022

rainest commented Oct 12, 2022

pmalek commented Oct 13, 2022

rainest commented Oct 13, 2022