
Attempt to update status.conditions denied by cert-manager webhook #297

Closed · erikgb opened this issue Oct 31, 2023 · 16 comments · Fixed by #303

@erikgb (Contributor) commented Oct 31, 2023

We have been running cert-manager for a long time in multi-tenant clusters, with the default built-in approver. Yesterday we promoted approver-policy to one of our busiest clusters (TEST). While everything seems to work as it should, and new certificate requests are approved by approver-policy, we notice significant log noise in the approver-policy logs. Here is an example:

E1031 08:40:15.331193 1 controller.go:329] controller-manager "msg"="Reconciler error" "error"="failed to apply connection patch: admission webhook \"webhook.cert-manager.io\" denied the request: status.conditions: Forbidden: 'Approved' condition may not be modified once set" "CertificateRequest"={"name":"switchstatus-adapter-tls-53","namespace":"int-switchstatus"} "controller"="certificaterequest" "controllerGroup"="cert-manager.io" "controllerKind"="CertificateRequest" "name"="switchstatus-adapter-tls-53" "namespace"="int-switchstatus" "reconcileID"="00c940e8-87d9-4ae9-ae46-5c258203e7af"

CertificateRequest (after the error is logged):

apiVersion: cert-manager.io/v1
kind: CertificateRequest
metadata:
  annotations:
    cert-manager.io/certificate-name: switchstatus-adapter-tls
    cert-manager.io/certificate-revision: '53'
    cert-manager.io/private-key-secret-name: switchstatus-adapter-tls-z7w74
  resourceVersion: '3060778645'
  name: switchstatus-adapter-tls-53
  uid: b93e2a35-89bb-4186-9332-3c5e971eba0a
  creationTimestamp: '2023-10-30T13:44:46Z'
  generation: 1
  managedFields:
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:conditions':
            'k:{"type":"Approved"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
      manager: approver-policy
      operation: Apply
      subresource: status
      time: '2023-10-30T13:45:09Z'
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:ca': {}
          'f:certificate': {}
          'f:conditions':
            'k:{"type":"Approved"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Ready"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
      manager: cert-manager-certificaterequests-issuer-vault
      operation: Apply
      subresource: status
      time: '2023-10-30T13:45:09Z'
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:cert-manager.io/certificate-name': {}
            'f:cert-manager.io/certificate-revision': {}
            'f:cert-manager.io/private-key-secret-name': {}
          'f:labels':
            .: {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
          'f:ownerReferences':
            .: {}
            'k:{"uid":"286e32ea-6d28-49fa-85c0-eff3666537c1"}': {}
        'f:spec':
          .: {}
          'f:duration': {}
          'f:issuerRef':
            .: {}
            'f:group': {}
            'f:kind': {}
            'f:name': {}
          'f:request': {}
          'f:usages': {}
      manager: cert-manager-certificates-request-manager
      operation: Update
      time: '2023-10-30T13:44:46Z'
  namespace: int-switchstatus
  ownerReferences:
    - apiVersion: cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Certificate
      name: switchstatus-adapter-tls
      uid: 286e32ea-6d28-49fa-85c0-eff3666537c1
  labels:
    app.kubernetes.io/managed-by: application-operator
    app.kubernetes.io/name: switchstatus-adapter
spec:
  duration: 336h0m0s
  extra:
    authentication.kubernetes.io/pod-name:
      - cert-manager-849dff9745-k5s6h
    authentication.kubernetes.io/pod-uid:
      - 075d6fef-7a32-457a-a5a9-9d3756680d72
  groups:
    - 'system:serviceaccounts'
    - 'system:serviceaccounts:cert-manager'
    - 'system:authenticated'
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: vault-spiffe-issuer
  request: >-
    <REDACTED>
  uid: 256a8935-1197-4f9a-915d-5dfd38d5f0c6
  usages:
    - digital signature
    - key encipherment
  username: 'system:serviceaccount:cert-manager:cert-manager'
status:
  ca: >-
    <REDACTED>
  certificate: >-
    <REDACTED>
  conditions:
    - lastTransitionTime: '2023-10-30T13:45:09Z'
      message: 'Approved by CertificateRequestPolicy: "vault-tls-cert"'
      reason: policy.cert-manager.io
      status: 'True'
      type: Approved
    - lastTransitionTime: '2023-10-30T13:45:09Z'
      message: Certificate fetched from issuer successfully
      reason: Issued
      status: 'True'
      type: Ready

I suspect this is a transient problem that will disappear once all the certificates in our cluster have been renewed, possibly related to approver-policy attempting to modify conditions previously added by the default built-in approver. As everything seems to work as it should, I will leave the setup as it is for now. Since most of our certificates are only valid for a relatively short period (at the moment 6 days), we should see whether the problems disappear by the end of this week.

But I still think this error logging is annoying and should be avoided. Please let me know if more details about our setup are needed.

@SgtCoDFish (Member):

See also this slack thread

@erikgb (Contributor, Author) commented Nov 5, 2023

Based on our certificate renewal policy, the frequency of this error should have decreased by now if this were a transient problem. That is not the case, so I suspect this is a real bug.

It seems like cert-manager/cert-manager#3735 introduced this strict validation in cert-manager: https://github.com/cert-manager/cert-manager/blob/5141dddf2c0c5e10c5d4452c99a2e260c6eb2983/internal/apis/certmanager/validation/certificaterequest.go#L202-L214. I wonder if the validation was intentionally made this strict (DeepEqual), or if it could be loosened up a bit? I am pretty sure it's not vital to validate that conditions do not change at all, is it?
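
To illustrate the strictness, here is a minimal, self-contained sketch, not the actual cert-manager code and with a simplified condition struct, of why a DeepEqual-style comparison rejects an update whose only difference is the timestamp:

package main

import (
	"fmt"
	"reflect"
)

// condition is a simplified stand-in for a CertificateRequest condition.
type condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime string
}

// approvedUnchanged reports whether an already-set Approved condition is
// byte-for-byte identical in the incoming update (DeepEqual semantics).
func approvedUnchanged(oldConds, newConds []condition) bool {
	for _, oc := range oldConds {
		if oc.Type != "Approved" {
			continue
		}
		for _, nc := range newConds {
			if nc.Type == "Approved" && !reflect.DeepEqual(oc, nc) {
				return false
			}
		}
	}
	return true
}

func main() {
	oldConds := []condition{{Type: "Approved", Status: "True", LastTransitionTime: "2023-10-30T13:45:09Z"}}
	newConds := []condition{{Type: "Approved", Status: "True", LastTransitionTime: "2023-10-30T13:45:10Z"}}
	// Prints false: only lastTransitionTime differs, yet the update would be rejected.
	fmt.Println(approvedUnchanged(oldConds, newConds))
}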

@erikgb (Contributor, Author) commented Nov 5, 2023

Looking closer at the managedFields in the example request, I wonder if this might be a bug in cert-manager:

  managedFields:
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:conditions':
            'k:{"type":"Approved"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
      manager: approver-policy
      operation: Apply
      subresource: status
      time: '2023-10-30T13:45:09Z'
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:ca': {}
          'f:certificate': {}
          'f:conditions':
            'k:{"type":"Approved"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Ready"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
      manager: cert-manager-certificaterequests-issuer-vault
      operation: Apply
      subresource: status
      time: '2023-10-30T13:45:09Z'

I do not expect the cert-manager Vault issuer to approve certificate requests, but from the managed fields it seems like it (cert-manager-certificaterequests-issuer-vault) did.
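
For anyone wanting to check this on other requests, here is a small hedged sketch (not existing tooling; the helper and its string-matching shortcut are assumptions) that lists the field managers claiming ownership of the Approved condition:

import (
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// approvedOwners is a sketch, not existing tooling: it returns the field
// managers that claim (co-)ownership of the Approved condition, based on a
// simple string match against the raw FieldsV1 JSON of each entry.
func approvedOwners(managedFields []metav1.ManagedFieldsEntry) []string {
	var owners []string
	for _, mf := range managedFields {
		if mf.FieldsV1 != nil && strings.Contains(string(mf.FieldsV1.Raw), `k:{"type":"Approved"}`) {
			owners = append(owners, fmt.Sprintf("%s (%s, subresource=%s)", mf.Manager, mf.Operation, mf.Subresource))
		}
	}
	return owners
}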

@erikgb (Contributor, Author) commented Nov 5, 2023

Checking if cert-manager is configured correctly, since I am not expecting cert-manager to approve any requests:

I1103 11:05:17.741662 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-vault"
I1103 11:05:17.742502 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-acme"
I1103 11:05:17.742515 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="ingress-shim"
I1103 11:05:17.742521 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificaterequests-approver"
I1103 11:05:17.743122 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-metrics"
I1103 11:05:17.743150 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-trigger"
I1103 11:05:17.743166 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-key-manager"
I1103 11:05:17.743181 1 controller.go:215] "cert-manager/controller: starting controller" controller="issuers"
I1103 11:05:17.743197 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-revision-manager"
I1103 11:05:17.743214 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-issuing"
I1103 11:05:17.743319 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-request-manager"
I1103 11:05:17.746431 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-ca"
I1103 11:05:17.746998 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-selfsigned"
I1103 11:05:17.747009 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="certificatesigningrequests-issuer-venafi"
I1103 11:05:17.747549 1 controller.go:192] "cert-manager/controller: not starting controller as it's disabled" controller="gateway-shim"
I1103 11:05:17.748065 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-selfsigned"
I1103 11:05:17.748113 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-vault"
I1103 11:05:17.748131 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-venafi"
I1103 11:05:17.748147 1 controller.go:215] "cert-manager/controller: starting controller" controller="clusterissuers"
I1103 11:05:17.748159 1 controller.go:215] "cert-manager/controller: starting controller" controller="challenges"
I1103 11:05:17.748174 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificates-readiness"
I1103 11:05:17.748190 1 controller.go:215] "cert-manager/controller: starting controller" controller="orders"
I1103 11:05:17.748632 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-acme"
I1103 11:05:17.748656 1 controller.go:215] "cert-manager/controller: starting controller" controller="certificaterequests-issuer-ca"

@erikgb (Contributor, Author) commented Nov 5, 2023

Here is a newer request from today, which I think proves that there is a bug in cert-manager: the cert-manager Vault issuer sets/owns the Approved condition, with a timestamp after the approver-policy approval. No idea why the error ends up in approver-policy, though. I cannot find any related errors/warnings in the cert-manager logs.

apiVersion: cert-manager.io/v1
kind: CertificateRequest
metadata:
  annotations:
    cert-manager.io/certificate-name: prefixverification-forecast-selector
    cert-manager.io/certificate-revision: '7'
    cert-manager.io/private-key-secret-name: prefixverification-forecast-selector-9qvxg
  resourceVersion: '3086765553'
  name: prefixverification-forecast-selector-7
  uid: 05ae7f2b-8672-4185-87d5-bc5e6ba1cf4e
  creationTimestamp: '2023-11-05T09:57:44Z'
  generation: 1
  managedFields:
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:conditions':
            'k:{"type":"Approved"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
      manager: approver-policy
      operation: Apply
      subresource: status
      time: '2023-11-05T09:57:44Z'
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:ca': {}
          'f:certificate': {}
          'f:conditions':
            'k:{"type":"Approved"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Ready"}':
              .: {}
              'f:lastTransitionTime': {}
              'f:message': {}
              'f:reason': {}
              'f:status': {}
              'f:type': {}
      manager: cert-manager-certificaterequests-issuer-vault
      operation: Apply
      subresource: status
      time: '2023-11-05T09:57:45Z'
    - apiVersion: cert-manager.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:cert-manager.io/certificate-name': {}
            'f:cert-manager.io/certificate-revision': {}
            'f:cert-manager.io/private-key-secret-name': {}
          'f:labels':
            .: {}
            'f:app.kubernetes.io/managed-by': {}
            'f:app.kubernetes.io/name': {}
          'f:ownerReferences':
            .: {}
            'k:{"uid":"fc04c6eb-d67e-466c-93f2-0b0fa6a12f9f"}': {}
        'f:spec':
          .: {}
          'f:duration': {}
          'f:issuerRef':
            .: {}
            'f:group': {}
            'f:kind': {}
            'f:name': {}
          'f:request': {}
          'f:usages': {}
      manager: cert-manager-certificates-request-manager
      operation: Update
      time: '2023-11-05T09:57:44Z'
  namespace: preprod-fifty-for-imbalance-forecast-postprocessing
  ownerReferences:
    - apiVersion: cert-manager.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Certificate
      name: prefixverification-forecast-selector
      uid: fc04c6eb-d67e-466c-93f2-0b0fa6a12f9f
  labels:
    app.kubernetes.io/managed-by: identity-provider
    app.kubernetes.io/name: prefixverification-forecast-selector
spec:
  duration: 144h0m0s
  extra:
    authentication.kubernetes.io/pod-name:
      - cert-manager-5f86f8b489-6k548
    authentication.kubernetes.io/pod-uid:
      - f5d36458-dfb1-458e-8c79-31c78c87a0ea
  groups:
    - 'system:serviceaccounts'
    - 'system:serviceaccounts:cert-manager'
    - 'system:authenticated'
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: vault-spiffe-issuer
  request: >-
    <REDACTED>
  uid: 256a8935-1197-4f9a-915d-5dfd38d5f0c6
  usages:
    - client auth
  username: 'system:serviceaccount:cert-manager:cert-manager'
status:
  ca: >-
    <REDACTED>
  certificate: >-
    <REDACTED>
  conditions:
    - lastTransitionTime: '2023-11-05T09:57:44Z'
      message: 'Approved by CertificateRequestPolicy: "vault-spiffe"'
      reason: policy.cert-manager.io
      status: 'True'
      type: Approved
    - lastTransitionTime: '2023-11-05T09:57:45Z'
      message: Certificate fetched from issuer successfully
      reason: Issued
      status: 'True'
      type: Ready

@wallrj (Member) commented Nov 7, 2023

@erikgb I looked at the code and my theory is that there are two problems:

  1. approver-policy sometimes attempts to approve an already approved CertificateRequest

    I'm hopeless at describing these sorts of interactions, but bear with me:

    a. cert-manager creates CR v0
    b. cert-manager and approver-policy both reconcile CR v0 in parallel:

    1. approver-policy begins reconciling CR v0 and performs policy checks, meanwhile
    2. cert-manager reconciles CR v0 and adds the Ready condition with status False,
    3. approver-policy cache receives the updated CR v1 with Ready=False and re-queues CR v1 to be re-reconciled
    4. approver-policy completes policy checks and sets Approved condition.

    c. approver-policy reconciles CR v1

    1. approver-policy completes policy checks and again attempts to set Approved condition
    2. cert-manager webhook denies the request and approver-policy logs the error

    d. approver-policy cache receives the updated CR v2 with Approved=True, Ready=False and re-queues CR v2 to be re-reconciled
    e. approver-policy ignores CR v2 because it already has an Approved condition.

One solution would be to modify approver-policy to expect this situation and ignore the error (let's call it ApprovedConditionAlreadySet), in much the same way that we might use IgnoreNotFound or IgnoreAlreadyExists for other errors.

  2. cert-manager server-side apply is setting (but not changing) the Approved condition and, in so doing, taking joint ownership of that field.

    I haven't looked into this, but I would guess that cert-manager is creating a server-side apply patch which contains the existing conditions from the CR, rather than only the Ready condition.
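
To make that guess concrete, here is a hedged sketch of the distinction using controller-runtime's server-side apply support; it is not cert-manager's actual issuer code, and the function, wiring and field-owner string are assumptions. The apply patch should contain only the Ready condition this manager intends to own, rather than the full conditions slice read back from the cache:

import (
	"context"

	cmapi "github.com/cert-manager/cert-manager/pkg/apis/certmanager/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyReadyCondition is a sketch, not cert-manager code: the patch carries
// only the Ready condition, so this field manager never claims (joint)
// ownership of the Approved condition.
func applyReadyCondition(ctx context.Context, c client.Client, cr *cmapi.CertificateRequest, ready cmapi.CertificateRequestCondition) error {
	patch := &cmapi.CertificateRequest{
		ObjectMeta: metav1.ObjectMeta{Name: cr.Name, Namespace: cr.Namespace},
		Status: cmapi.CertificateRequestStatus{
			// Only the condition this manager owns; copying cr.Status.Conditions
			// from the cache would also include Approved and take joint ownership.
			Conditions: []cmapi.CertificateRequestCondition{ready},
		},
	}
	// Server-side apply patches must carry apiVersion/kind.
	patch.SetGroupVersionKind(cmapi.SchemeGroupVersion.WithKind("CertificateRequest"))
	return c.Status().Patch(ctx, patch, client.Apply,
		client.FieldOwner("cert-manager-certificaterequests-issuer-vault"),
		client.ForceOwnership)
}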

@inteon (Member) commented Nov 8, 2023

I don't think the scenario you are describing is possible because each resource can only be reconciled one at a time

@wallrj (Member) commented Nov 8, 2023

I don't think the scenario you are describing is possible because each resource can only be reconciled one at a time

Yes, and that is how it is in the scenario I described.

approver-policy is re-reconciling the CR at v1 before its cache has been updated with the result of its previous reconcile.
I.e. the approver-policy cache contains the CR with the condition Ready=False, but not yet Approved=True/False.

@inteon (Member) commented Nov 8, 2023

I don't think the scenario you are describing is possible because each resource can only be reconciled one at a time

Yes, and that is how it is in the scenario I described.

approver-policy is re-reconciling the CR at v1 before its cache has been updated with the result of its previous reconcile.
I.e. the approver-policy cache contains the CR with the condition Ready=False, but not yet Approved=True/False.

Thanks for explaining. I would expect the patch to be identical, so the webhook shouldn't complain about the condition changing, right?

@erikgb (Contributor, Author) commented Nov 8, 2023

Thanks for explaining. I would expect the patch to be identical, so the webhook shouldn't complain about the condition changing, right?

The cache is only updated from watch push events, ref. kubernetes-sigs/controller-runtime#1622, so if the cache has not yet been updated with the result of the previous round of reconciliation, the patch will not be identical. The lastTransitionTime field is the culprit.

Thinking a bit more about the suggested solution from @wallrj, I am more in favor of using a non-caching client when reconciling certificaterequests in approver-policy. Ignoring a custom error message will never be clean, and it is quite fragile: if anyone changes the literal error message in the cert-manager webhook, this will break again. The HTTP status code is just Forbidden, which could mean many things.

@wallrj (Member) commented Nov 9, 2023

I am more in favor of using a non-caching client when reconciling certificaterequests in approver-policy.

Yes, I agree that would solve the problem.
You'd GET the CertificateRequest using a non-caching client, here:

cr := new(cmapi.CertificateRequest)
if err := c.lister.Get(ctx, req.NamespacedName, cr); err != nil {
	return ctrl.Result{}, nil, client.IgnoreNotFound(err)
}
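
For comparison, a minimal sketch of what that could look like; here c.directClient is a hypothetical field, assumed to hold mgr.GetAPIReader(), the manager's non-caching client.Reader:

// Hypothetical sketch: read the CertificateRequest straight from the API
// server instead of from the informer cache, so an Approved condition set by
// a previous reconcile is always visible.
cr := new(cmapi.CertificateRequest)
if err := c.directClient.Get(ctx, req.NamespacedName, cr); err != nil {
	return ctrl.Result{}, nil, client.IgnoreNotFound(err)
}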

Ignoring a custom error message will never be clean, and quite fragile. If anyone changes the literal error message in the cert-manager webhook, this will break again. The HTTP status code is just Forbidden - which could mean many things.

The full error message is:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "admission webhook \"webhook.cert-manager.io\" denied the request: status.conditions: Forbidden: 'Approved' condition may not be modified once set",
  "reason": "NotAcceptable",
  "code": 406
}

Given the following patch.json, applied with the curl command below:

{
    "apiVersion": "cert-manager.io/v1",
    "kind": "CertificateRequest",
    "status": {
        "conditions": [
            {
                "message": "new message",
                "type": "Approved"
            }
        ]
    }
}

curl 'https://127.0.0.1:44121/apis/cert-manager.io/v1/namespaces/default/certificaterequests/foo/status?fieldManager=cmctl&force=true' \
     --silent \
     --insecure \
     --header "Accept: application/json" \
     --header "Authorization: Bearer $(kubectl create token -n cert-manager cert-manager)" \
     --header "Content-Type: application/apply-patch+yaml" \
     --request PATCH \
     --data @patch.json

So that error response might be unique and stable enough to reliably distinguish it from other errors.
I'd probably choose this over adding a non-caching client because the code solving the problem would be easy to understand and close to the site of the problem.
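
As a rough illustration, here is a hedged sketch of such a check; the helper name and the exact matching strategy are assumptions on my part, not an agreed API:

import (
	"errors"
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// ignoreApprovedAlreadySet is a hypothetical helper in the spirit of
// client.IgnoreNotFound: it swallows the webhook's "'Approved' condition may
// not be modified once set" denial and returns any other error unchanged.
func ignoreApprovedAlreadySet(err error) error {
	var statusErr *apierrors.StatusError
	if errors.As(err, &statusErr) &&
		strings.Contains(statusErr.ErrStatus.Message, "'Approved' condition may not be modified once set") {
		return nil
	}
	return err
}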

The configuration of the non-caching client would be here, I suppose, which is further away from the site of the problem:

mgr, err := ctrl.NewManager(opts.RestConfig, ctrl.Options{
	Scheme:                        policyapi.GlobalScheme,
	LeaderElection:                true,
	LeaderElectionID:              "policy.cert-manager.io",
	LeaderElectionReleaseOnCancel: true,
	LeaderElectionResourceLock:    "leases",
	LeaderElectionNamespace:       opts.LeaderElectionNamespace,
	ReadinessEndpointName:         "/readyz",
	HealthProbeBindAddress:        opts.ReadyzAddress,
	Metrics: server.Options{
		BindAddress: opts.MetricsAddress,
	},
	WebhookServer: ctrlwebhook.NewServer(ctrlwebhook.Options{
		Port: opts.Webhook.Port,
		Host: opts.Webhook.Host,
		TLSOpts: []func(*tls.Config){
			func(cfg *tls.Config) {
				cfg.GetCertificate = certificateSource.GetCertificate
			},
		},
	}),
	Logger: mlog,
})

Another option might be to use the cached CR resourceVersion in the patch request, which would cause an HTTP conflict error if the cached CR had since been modified by cert-manager or by an external issuer.
approver-policy could handle the conflict either by returning the error and relying on the error backoff mechanism, or by explicitly requeuing the CR.
Some future reconcile attempt would eventually succeed once the approver-policy cache had the latest version of the CR.

E.g. the following json patch applied using the previous curl command:

{
    "apiVersion": "cert-manager.io/v1",
    "kind": "CertificateRequest",
    "metadata": {
        "resourceVersion": "999"
    },
    "status": {
        "conditions": [
            {
                "message": "new message",
                "type": "Approved"
            }
        ]
    }
}

Gives:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Operation cannot be fulfilled on certificaterequests.cert-manager.io \"foo\": the object has been modified; please apply your changes to the latest version and try again",
  "reason": "Conflict",
  "details": {
    "name": "foo",
    "group": "cert-manager.io",
    "kind": "certificaterequests"
  },
  "code": 409
}
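
In controller-runtime terms, a hedged sketch of that option; the surrounding reconcile shape, the c.client field and the requeue choice are assumptions, not existing approver-policy code:

// Carry the cached resourceVersion into the apply patch so the API server
// rejects it with 409 Conflict if the object has moved on since the cache was
// last updated, then requeue instead of running into the webhook denial.
patch := &cmapi.CertificateRequest{
	ObjectMeta: metav1.ObjectMeta{
		Name:            cr.Name,
		Namespace:       cr.Namespace,
		ResourceVersion: cr.ResourceVersion, // from the (possibly stale) cache
	},
	Status: cmapi.CertificateRequestStatus{
		Conditions: []cmapi.CertificateRequestCondition{approvedCondition},
	},
}
patch.SetGroupVersionKind(cmapi.SchemeGroupVersion.WithKind("CertificateRequest"))
err := c.client.Status().Patch(ctx, patch, client.Apply,
	client.FieldOwner("approver-policy"), client.ForceOwnership)
if apierrors.IsConflict(err) {
	// The cache was behind; retry once the watch delivers the newer version.
	return ctrl.Result{Requeue: true}, nil
}
if err != nil {
	return ctrl.Result{}, err
}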

@erikgb (Contributor, Author) commented Nov 9, 2023

Wow! 👏 Thanks for your deep analysis and suggestions, @wallrj! I didn't know you could use resourceVersion with SSA as with CSA. The difference is probably that it's required with CSA, but optional with SSA. I am more in favor of your last suggestion now. It will address the root cause of this issue. Handling the error from cert-manager is just a workaround IMO. WDYT? @inteon

@inteon (Member) commented Nov 12, 2023

I'm confused, does #303 not solve this issue? Do we still get these errors after merging that PR?

@erikgb (Contributor, Author) commented Nov 12, 2023

Confused? Why? I think it might solve this issue, but we won't know for sure. I do not intend to build and provision a snapshot version in our busiest cluster.

@wallrj (Member) commented Nov 14, 2023

@inteon Yeah, I realise that I was overcomplicating things... I hadn't looked carefully at the code. I still think the situation I described can happen. It will be interesting to see whether @erikgb still observes these errors (albeit fewer of them) after he deploys this fix.

And I still think that there is a bug in the cert-manager server-side apply code which causes it to take joint field ownership of the approval conditions.

@erikgb (Contributor, Author) commented Nov 15, 2023

To all following this issue, I can report that it seems to work a lot better now! 🎉 I had a feeling the fix would not matter, but it does have an effect...
