
Validate uniqueness of prefix field in KongVault resources #5395

Closed
1 of 2 tasks
randmonkey opened this issue Jan 4, 2024 · 9 comments · Fixed by #5412

Comments

@randmonkey
Contributor

randmonkey commented Jan 4, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Problem Statement

Split from #5333

The prefix of a Kong vault must be unique, but we do not have a CEL rule to check the uniqueness of the prefix field across KongVault resources. To prevent duplicate prefixes from generating invalid Kong configuration and leaving the Kong gateway stuck, we should check the uniqueness of KongVault prefixes in the admission webhook.

Proposed Solution

  • List all "managed" KongVaults and reject the create/update if the prefix of the changed KongVault appears in another instance.
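The proposed check could look roughly like this. A minimal Go sketch with hypothetical, simplified types (the real KongVault type and listing mechanism live in the controller's API and store packages):

```go
package main

import "fmt"

// KongVault is a stand-in for the real CRD type; only the fields relevant
// to the check are modeled here.
type KongVault struct {
	Name   string
	Prefix string
}

// validatePrefixUnique mirrors the proposed webhook check: list all managed
// KongVaults and reject the create/update if another resource already uses
// the same prefix. Comparing names lets updates to the same resource pass.
func validatePrefixUnique(incoming KongVault, managed []KongVault) error {
	for _, v := range managed {
		if v.Name != incoming.Name && v.Prefix == incoming.Prefix {
			return fmt.Errorf("spec.prefix %q is already used by KongVault %q",
				incoming.Prefix, v.Name)
		}
	}
	return nil
}

func main() {
	managed := []KongVault{{Name: "vault-a", Prefix: "aws-vault"}}
	fmt.Println(validatePrefixUnique(KongVault{Name: "vault-b", Prefix: "aws-vault"}, managed))
	fmt.Println(validatePrefixUnique(KongVault{Name: "vault-b", Prefix: "gcp-vault"}, managed))
}
```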

Additional information

No response

Acceptance Criteria

  • Creating or updating a KongVault with a duplicated prefix will be rejected by the webhook
@randmonkey randmonkey self-assigned this Jan 4, 2024
@randmonkey randmonkey added area/admission area/feature New feature or request labels Jan 4, 2024
@randmonkey randmonkey added this to the KIC v3.1.x milestone Jan 4, 2024
@rainest
Contributor

rainest commented Jan 8, 2024

Copied from related PR, webhook checks that rely on resources other than the single resource being evaluated are dicey.

Checking against the current Kong state may be less bad than trying to check other resources available in the Kubernetes API, but it's still imperfect. We can at least fetch a single prefix from the Kong API, since it is an API identifier: https://docs.konghq.com/gateway/3.3.x/admin-api/#retrieve-vault

@randmonkey
Contributor Author

randmonkey commented Jan 10, 2024

Copied from related PR, webhook checks that rely on resources other than the single resource being evaluated are dicey.

Checking against the current Kong state may be less bad than trying to check other resources available in the Kubernetes API, but it's still imperfect. We can at least fetch a single prefix from the Kong API, since it is an API identifier: https://docs.konghq.com/gateway/3.3.x/admin-api/#retrieve-vault

Neither solution is perfect; both may judge against incorrect or inconsistent state. However, I have a different opinion on which to choose.
When we fetch other resources from the controller's cache (NOT directly from the Kubernetes API), we may see inconsistent state if other objects have not been synced into the cache yet. But if we check against the Kong gateway's API, consistency is not reached until at least one round of configuration sync to Kong has completed, so the window of inconsistency is probably longer with the Kong API check. Also, calling the Kong admin API is an external network call, which consumes more resources and adds latency.

What makes checking against the Kong API "less bad" than the other option? @rainest

@rainest
Contributor

rainest commented Jan 11, 2024

AFAIK the two variants result in functionally equivalent race conditions. While either approach avoids duplicates when there's sufficient time between resource creation, neither avoids duplicates when both are created at roughly the same time.

Cache inserts necessarily precede store inserts and config rendering, so I think the admin API approach actually exacerbates the problem: there's an additional 5s/tick period delay before the webhook will be able to detect a duplicate. Its advantage is more that we can check only the resource that matters, since we don't maintain a prefix index on the cache (we could add one though, and it's probably better to do so than listing and checking every KongVault).

Startup behavior is worth considering as well: a fresh controller start essentially creates all existing resources at the same time from the controller cache's perspective, and furthermore does not involve the webhook. Failure policy can prevent resource creation during webhook downtime, but we don't expose it per resource hook, and we default it to Ignore because of the hooks we create for Secrets and other non-custom resources ☹️

@rainest
Contributor

rainest commented Jan 11, 2024

As an alternative, can we check at store insert time and add a conflict condition, and fail/requeue the reconcile?

Rejecting and retrying in the controller reconcile isn't a universally available option, but I think it's possible here:

  • KongVault resources have class information, so we do know that they're intended for our instance.
  • The prefix field does not undergo any complicated translation. We dump the value from the KongVault CR as-is into a unique field on the vault entity.

If we add an index on prefix, we can then fail adds that would create a conflict and, in the reconciler, requeue. We don't currently do this, but review of the reconciler->store path suggests we can.

The main limitation here is that there's no guarantee which resource will win if both are present when the controller starts. If you create a conflicting resource, ignore the failure conditions on it, and leave it in place, you may find that your configuration changes after a restart.

I'm actually not sure why we aren't doing this for other resources. It makes sense for KongConsumer usernames and custom IDs also. Given how simple it is, I have a nagging feeling we did consider it and found some reason to reject it, but can't recall when/what that was if so.

Question to the controller-runtime community on whether there's some way to handle post-restart state rebuilds well: https://kubernetes.slack.com/archives/C02MRBMN00Z/p1704934745286119

They note that you can use creationTime to decide precedence. Although we've been over the problems with using this to judge the "correct" resource in the past, it does provide a reasonable enough tiebreaker that'd allow consistent order through restarts.

That would lead to an edge case where modifying an existing older resource could clobber configuration, but consistency across restarts feels more valuable than avoiding that. IMO it's reasonable to make the prefix field immutable to avoid this. That field shouldn't change often (changing it would break all existing loose references in configuration using the vault), and needing to delete/recreate the CR if you typo it initially feels acceptable given the other benefits.

@czeslavo
Contributor

If we add an index on prefix, we can then fail adds that would create a conflict and, in the reconciler, requeue. We don't currently do this, but review of the reconciler->store path suggests we can.

The main limitation here is that there's no guarantee which resource will win if both are present when the controller starts. If you create a conflicting resource, ignore the failure conditions on it, and leave it in place, you may find that your configuration changes after a restart.

That sounds like an interesting idea. I'm wondering what we do in the following case (all mentioned vaults are duplicates):

  1. Vault A is created, reconciled and added to the store.
  2. Vault B is created, reconciled and rejected from being added to the store.
  3. Controller restart happens.
  4. Vault B is reconciled first, it's added to the store (as there's no conflict yet).
  5. Vault A is reconciled, (?)

What would be the desired behavior for the question mark? I understand that we'd like to put the tiebreaker logic into the store. Would that mean it would remove the newer Vault B and add older Vault A (based on their creation timestamp), returning no error? What about propagating the conflict condition to Vault A in this case? I imagine we could trigger the reconciliation of Vault A by pushing an event via a channel and there we'd get an error from the store which would allow us to populate the condition properly. 🤔


Anyway, it's effectively the same outcome we'd get by breaking ties in the translator itself; we just shift where the logic lives. Maybe we could have a separate spike issue to verify whether adding such logic to the store and controller would help us with other resources, just so we don't block the KongVault work?

@randmonkey
Contributor Author

randmonkey commented Jan 11, 2024

I have created 2 issues to track these:

@rainest
Contributor

rainest commented Jan 11, 2024

As a much simpler alternative: prefix is effectively the same unique name field we have for routes/services/etc. (because the vault entity schema decided name was instead a reference to the vault implementation). We could reasonably just choose it for you from namespace+name, since that combination must be unique and there's a 1:1 relationship between KongVault and vault.

There is a constraint on the allowed values, though I'm not good at reading Lua patterns and am not quite sure how to express it as a regex instead. https://gitspartv.github.io/lua-patterns/?pattern=%5B%5E%5Ba-z%5D%5Ba-z%25d-%5D-%5Ba-z%25d%5D%2B%24%5D sorta helps.

    -- note: prefix must be valid in a host part of vault reference uri:
    -- {vault://<vault-prefix>/<secret-id>[/<secret-key]}
    { prefix = { description = "The unique prefix (or identifier) for this Vault configuration.", type = "string", required = true, unique = true, unique_across_ws = true,
      match = [[^[a-z][a-z%d-]-[a-z%d]+$]], not_one_of = VAULTS, indexed = true } },

I'm guessing this is approximately the hostname segment regex used for namespaces and names ([a-z]([-a-z0-9]*[a-z0-9])?), so we shouldn't have issues with disallowed characters, but do have to use a separator character (-, presumably) that could also appear in a namespace or name string.
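An assumed translation of the Lua pattern `^[a-z][a-z%d-]-[a-z%d]+$` into a Go regexp, for illustration: the lazy `[a-z%d-]-` followed by `[a-z%d]+` collapses to "middle characters may include hyphens, the last character may not", and the prefix needs at least two characters:

```go
package main

import (
	"fmt"
	"regexp"
)

// prefixPattern is a best-effort regex equivalent of Kong's Lua pattern for
// the vault prefix: lowercase start, lowercase/digit/hyphen middle,
// lowercase/digit end, minimum length two.
var prefixPattern = regexp.MustCompile(`^[a-z][a-z0-9-]*[a-z0-9]$`)

func main() {
	for _, p := range []string{"my-vault-1", "ab", "a", "-bad", "bad-", "Bad"} {
		fmt.Printf("%-14q %v\n", p, prefixPattern.MatchString(p))
	}
}
```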

@rainest
Contributor

rainest commented Jan 11, 2024

I'm wondering what we do in the following case (all mentioned vaults are duplicates):

1. Vault A is created, reconciled and added to the store.

2. Vault B is created, reconciled and rejected from being added to the store.

3. Controller restart happens.

4. Vault B is reconciled first, it's added to the store (as there's no conflict yet).

5. Vault A is reconciled, (?)

What would be the desired behavior for the question mark?

I understand that we'd like to put the tiebreaker logic into the store. Would that mean it would remove the newer Vault B and add older Vault A (based on their creation timestamp), returning no error? What about propagating the conflict condition to Vault A in this case? I imagine we could trigger the reconciliation of Vault A by pushing an event via a channel and there we'd get an error from the store which would allow us to populate the condition properly.

The initial store add attempt fails due to a conflict. If the attempted resource (A) wins the tiebreaker, the error handler deletes the existing conflicting resource (B) from the store, adds the proposed resource (A), and returns a successful reconcile for (A), yeah. The error handler logic would be split: one half handling tiebreaker winners, updating the store, and returning success; the other half adding a conflict condition to the reconciled resource and returning failure to requeue it.

I think the above's second question meant propagating the condition to B, since it's the one being evicted. I'd failed to consider that originally, but you're right: we need to mark it conflicted and start retrying it. The tiebreaker-won half of the error handler would need to create a reconcile event via a channel source, and the following reconcile would add the condition, since B will lose the tiebreaker.

Anyway, it's effectively a very similar effect that we'd get by breaking ties in the translator itself, we just shift where we keep the logic doing this. Maybe we could have a separate spike issue to verify if adding such logic to the store and controller would help us with other resources, just to not block the KongVault work?

The main benefit is that we do things "correctly" as far as controller-runtime is concerned. The translator can't do this as-is because the reconcile is already complete; the translator cannot return an error to the controller-runtime event processor and thus can't trigger a requeue or indicate a problem in the affected resource's status.

Theoretically you could maybe rig up something to share error metadata from the translator, trigger channel events from the translator, and have the reconciler evict/update status of resources that have a translator error until their version increments, but it seems backwards and overcomplicated to do so if you can handle it entirely within the reconciler. This could maybe be a way to achieve eviction of resources causing lockups without redesigning the entire reconciler/store/translator/Kong error handler though 🤔 (famous last words).

Other user-provided values that populate unique fields could also use the simpler all-reconciler approach, but I figured it'd make sense to build one and expand. Generating prefixes from namespace+name is the much simpler way to unblock this.

@randmonkey
Contributor Author

As a much simpler alternative: prefix is effectively the same unique name field we have for routes/services/etc. (because the vault entity schema decided name was instead a reference to the vault implementation). We could reasonably just choose it for you from namespace+name, since that combination must be unique and there's a 1:1 relationship between KongVault and vault.

There is a constraint on the allowed values, though I'm not good at reading Lua patterns and am not quite sure how to express it as a regex instead. https://gitspartv.github.io/lua-patterns/?pattern=%5B%5E%5Ba-z%5D%5Ba-z%25d-%5D-%5Ba-z%25d%5D%2B%24%5D sorta helps.

    -- note: prefix must be valid in a host part of vault reference uri:
    -- {vault://<vault-prefix>/<secret-id>[/<secret-key]}
    { prefix = { description = "The unique prefix (or identifier) for this Vault configuration.", type = "string", required = true, unique = true, unique_across_ws = true,
      match = [[^[a-z][a-z%d-]-[a-z%d]+$]], not_one_of = VAULTS, indexed = true } },

I'm guessing this is approximately the hostname segment regex used for namespaces and names ([a-z]([-a-z0-9]*[a-z0-9])?), so we shouldn't have issues with disallowed characters, but do have to use a separator character (-, presumably) that could also appear in a namespace or name string.

Sounds like a reasonable solution for uniqueness. KongVault is a cluster-scoped resource in the current definition (no namespace), so we would not need a separator character, though this requires some redesign of the KongVault resource.

3 participants