[bug]: Webhook installation is not idempotent #525

@anekdoti

Describe the bug

Webhooks are typically installed in the init container of the operator pod. The webhook installation can fail, e.g., due to a connection loss to the Kubernetes API. In such cases, the init container is restarted until it succeeds.

However, if the ValidatingWebhookConfiguration or MutatingWebhookConfiguration has already been created, the init container throws an exception (see below) hinting at a conflict with an already existing resource in the cluster.

IMHO this is because the method KubernetesClient.Save (https://github.com/buehler/dotnet-operator-sdk/blob/master/src/KubeOps.KubernetesClient/KubernetesClient.cs#L116) is not implemented correctly: to decide whether to create or update a resource in the cluster, it checks whether the uid of the resource passed as its argument is null. In the webhook installation (and similar places in the framework), the resources passed to Save are always freshly created, so the uid is always null, regardless of whether the resource already exists in the cluster. A proper implementation of Save would check whether the resource exists in the cluster instead.
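To make the difference concrete, here is a minimal, self-contained sketch that contrasts the two decision strategies. Everything in it (Resource, FakeCluster, ConflictException, SaveByLocalUid, SaveByLookup) is a hypothetical stand-in written for this issue, not KubeOps or Kubernetes client code; the in-memory dictionary only models the "create fails if the name is taken" behavior of the API server. A real fix would likely also have to carry over further server-side metadata (for example the resource version) before updating.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are not KubeOps types.
public sealed class Resource
{
    public string Name { get; set; } = "";
    public string? Uid { get; set; }          // assigned by the "server" on create
    public string Payload { get; set; } = "";
}

public sealed class ConflictException : Exception
{
    public ConflictException(string name) : base($"'{name}' already exists") { }
}

// In-memory model of the cluster: Create fails with a conflict if the name is taken.
public sealed class FakeCluster
{
    private readonly Dictionary<string, Resource> _store = new();

    public Resource? Get(string name) =>
        _store.TryGetValue(name, out var r) ? r : null;

    public Resource Create(Resource r)
    {
        if (_store.ContainsKey(r.Name)) throw new ConflictException(r.Name);
        r.Uid = Guid.NewGuid().ToString();
        _store[r.Name] = r;
        return r;
    }

    public Resource Update(Resource r)
    {
        if (!_store.ContainsKey(r.Name)) throw new KeyNotFoundException(r.Name);
        _store[r.Name] = r;
        return r;
    }
}

public static class Saver
{
    // Decides from the local object only: a freshly built object always has Uid == null,
    // so this always calls Create, even if the resource already exists.
    public static Resource SaveByLocalUid(FakeCluster c, Resource r) =>
        r.Uid == null ? c.Create(r) : c.Update(r);

    // Decides from what actually exists in the cluster.
    public static Resource SaveByLookup(FakeCluster c, Resource r)
    {
        var existing = c.Get(r.Name);
        if (existing == null) return c.Create(r);
        r.Uid = existing.Uid;   // carry over the server-assigned identity
        return c.Update(r);
    }
}

public static class Program
{
    public static void Main()
    {
        var cluster = new FakeCluster();

        // First installation attempt, then a restarted init container that
        // builds a fresh object (Uid == null) with the same name.
        Saver.SaveByLookup(cluster, new Resource { Name = "validators", Payload = "v1" });
        Saver.SaveByLookup(cluster, new Resource { Name = "validators", Payload = "v2" });
        Console.WriteLine(cluster.Get("validators")!.Payload);   // prints "v2"

        try
        {
            // The uid-based decision tries to create again and hits the conflict.
            Saver.SaveByLocalUid(cluster, new Resource { Name = "validators" });
        }
        catch (ConflictException e)
        {
            Console.WriteLine($"conflict: {e.Message}");
        }
    }
}
```

The lookup costs one extra read per save, but it makes repeated installation attempts converge instead of failing on the second run.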

Another option would be to use the same pattern for the WebhookConfigurations as for the service, i.e., delete the already existing resource in the cluster before creating it again. I am not sure whether there is a reason why Save was used instead.
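The delete-then-create alternative could look roughly like the sketch below. It is again a self-contained model with hypothetical names (Cluster, Create, Delete, Install) rather than the framework's code, and it assumes that deleting a missing resource is harmless and that deletion completes before the create call, which may not hold for every resource in a real cluster.

```csharp
using System;
using System.Collections.Generic;

public static class DeleteThenCreateSketch
{
    // Hypothetical in-memory stand-in for the cluster: resource name -> payload.
    private static readonly Dictionary<string, string> Cluster = new();

    // Models a create call that is rejected when the name is already taken.
    private static void Create(string name, string payload)
    {
        if (!Cluster.TryAdd(name, payload))
            throw new InvalidOperationException($"Conflict: '{name}' already exists");
    }

    // Deleting a resource that does not exist is treated as a no-op.
    private static void Delete(string name) => Cluster.Remove(name);

    // Idempotent installation: remove any leftover from a previous attempt first.
    private static void Install(string name, string payload)
    {
        Delete(name);
        Create(name, payload);
    }

    public static void Main()
    {
        Install("validators", "attempt-1");
        Install("validators", "attempt-2");   // a retry no longer conflicts
        Console.WriteLine(Cluster["validators"]);   // prints "attempt-2"
    }
}
```

One trade-off of this pattern is the short window between delete and create in which the webhook configuration does not exist.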

To reproduce

  1. Deploy a KubeOps operator and ensure that the webhook installation fails (e.g., by creating a MutatingWebhookConfiguration of the right name beforehand).
  2. Remove the obstacle.
  3. Observe that further webhook installation attempts fail (since the ValidatingWebhookConfiguration has already been created).

Expected behavior

The webhook installation should succeed eventually.

Screenshots

The exception thrown by the webhook installer:

Create validator definition.
k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict'
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.CreateClusterCustomObjectWithHttpMessagesAsync(Object body, String group, String version, String plural, String dryRun, String fieldManager, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.GenericClient.CreateAsync[T](T obj, CancellationToken cancel)
   at KubeOps.KubernetesClient.KubernetesClient.Create[TResource](TResource resource)
   at KubeOps.Operator.Commands.Management.Webhooks.Install.OnExecuteAsync(CommandLineApplication app)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.InvokeAsync(MethodInfo method, Object instance, Object[] arguments)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.OnExecute(ConventionContext context, CancellationToken cancellationToken)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.<>c__DisplayClass0_0.<<Apply>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in /build/Controller/Program.cs:line 105
Unhandled exception. k8s.Autorest.HttpOperationException: Operation returned an invalid status code 'Conflict'
   at k8s.Kubernetes.SendRequestRaw(String requestContent, HttpRequestMessage httpRequest, CancellationToken cancellationToken)
   at k8s.AbstractKubernetes.k8s.ICustomObjectsOperations.CreateClusterCustomObjectWithHttpMessagesAsync(Object body, String group, String version, String plural, String dryRun, String fieldManager, Nullable`1 pretty, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken)
   at k8s.GenericClient.CreateAsync[T](T obj, CancellationToken cancel)
   at KubeOps.KubernetesClient.KubernetesClient.Create[TResource](TResource resource)
   at KubeOps.Operator.Commands.Management.Webhooks.Install.OnExecuteAsync(CommandLineApplication app)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.InvokeAsync(MethodInfo method, Object instance, Object[] arguments)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.OnExecute(ConventionContext context, CancellationToken cancellationToken)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.<>c__DisplayClass0_0.<<Apply>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken)
   at Program.<Main>$(String[] args) in /build/Controller/Program.cs:line 105
   at Program.<Main>(String[] args)

Additional Context

Kubernetes: v1.23
KubeOps: 7.0.6
