cert-manager created multiple CertificateRequest objects with the same certificate-revision #4956

jayme-github · 2022-03-17T16:44:06Z

Describe the bug:
At Wikimedia we run cert-manager with our own issuer and a cfssl PKI: https://gerrit.wikimedia.org/g/operations/software/cfssl-issuer

We've got a staging cluster with short-lived certificates (24h) where I noticed one not being refreshed.
From the data in API objects is seems as if the certificate was triggered for renewal 2022-02-06T17:00:03Z which led to the creation of CertificateRequest/toolhub-l8xjm first (2022-02-06T17:01:59Z) and CertificateRequest/toolhub-wvz2q second (2022-02-06T17:04:19Z), both sharing the same cert-manager.io/certificate-revision: 49 and cert-manager.io/private-key-secret-name: toolhub-rrvtj . See kubernetes_objects.yaml

During that time we had some pretty elevated latency on the kubernetes apiserver. Mainly CREATE and UPDATE on cert-manager.io/certificaterequest resources and apparently some connectivity issues with the kubernetes apiserver, as seen in cert-manager.log below.

I0206 17:00:03.399283       1 conditions.go:201] Setting lastTransitionTime for Certificate "toolhub" condition "Issuing" to 2022-02-06 17:00:03.399274083 +0000 UTC m=+10735.693974692
I0206 17:00:03.399216       1 trigger_controller.go:181] cert-manager/controller/certificates-trigger "msg"="Certificate must be re-issued" "key"="istio-system/toolhub" "message"="Renewing certificate as renewal was scheduled at 2022-02-06 17:00:00 +0000 UTC" "reason"="Renewing"
E0206 17:00:20.630289       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=10s: context deadline exceeded" "key"="istio-system/toolhub" 
E0206 17:00:31.650166       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: context deadline exceeded" "key"="istio-system/toolhub" 
E0206 17:00:43.668839       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: context deadline exceeded" "key"="istio-system/toolhub" 
E0206 17:00:48.729312       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: dial tcp 10.192.76.185:443: connect: connection refused" "key"="istio-system/toolhub" 
E0206 17:00:57.816629       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: dial tcp 10.192.76.185:443: connect: connection refused" "key"="istio-system/toolhub" 
E0206 17:02:20.402566       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="failed whilst waiting for CertificateRequest to exist - this may indicate an apiserver running slowly. Request will be retried" "key"="istio-system/toolhub" 
E0206 17:06:26.899074       1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="failed whilst waiting for CertificateRequest to exist - this may indicate an apiserver running slowly. Request will be retried" "key"="istio-system/toolhub" 
I0206 17:07:34.907870       1 trace.go:205] Trace[861803746]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.21.3/tools/cache/reflector.go:167 (06-Feb-2022 17:07:21.603) (total time: 13304ms):
Trace[861803746]: [13.30407288s] [13.30407288s] END
Trace[861803746]: ---"Objects listed" 13303ms (17:07:34.906)
I0206 17:07:57.199752       1 trace.go:205] Trace[782326899]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.21.3/tools/cache/reflector.go:167 (06-Feb-2022 17:07:35.007) (total time: 22100ms):
Trace[782326899]: [22.100469212s] [22.100469212s] END
Trace[782326899]: ---"Objects listed" 22095ms (17:07:57.102)
I0206 17:08:05.299156       1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "toolhub-l8xjm" condition "Approved" to 2022-02-06 17:08:04.500992384 +0000 UTC m=+44.594675153
I0206 17:08:05.199563       1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "toolhub-wvz2q" condition "Approved" to 2022-02-06 17:08:03.500823508 +0000 UTC m=+43.594506243
I0206 17:08:05.904575       1 trace.go:205] Trace[850423915]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.21.3/tools/cache/reflector.go:167 (06-Feb-2022 17:07:35.006) (total time: 30897ms):
Trace[850423915]: ---"Objects listed" 30895ms (17:08:05.902)
Trace[850423915]: [30.897705171s] [30.897705171s] END
I0206 17:08:08.699125       1 requestmanager_controller.go:210] cert-manager/controller/certificates-request-manager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="istio-system/toolhub" 
I0206 17:08:09.201178       1 requestmanager_controller.go:210] cert-manager/controller/certificates-request-manager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="istio-system/toolhub" 
I0206 17:08:09.300472       1 requestmanager_controller.go:210] cert-manager/controller/certificates-request-manager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="istio-system/toolhub" 
E0206 17:08:10.599180       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="multiple CertificateRequests were found for the 'next' revision 49, issuance is skipped until there are no more duplicates" "key"="istio-system/toolhub" 
E0206 17:08:13.400912       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="multiple CertificateRequests were found for the 'next' revision 49, issuance is skipped until there are no more duplicates" "key"="istio-system/toolhub"
...this message keeps repeating ever since

2022-02-06T17:02:00.026484+00:00 controller.certificaterequest istio-system toolhub-l8xjm CertificateRequest has not been approved yet. Ignoring.
2022-02-06T17:02:00.423743+00:00 controller.certificaterequest istio-system toolhub-l8xjm CertificateRequest has not been approved yet. Ignoring.
2022-02-06T17:04:19.921125+00:00 controller.certificaterequest istio-system toolhub-wvz2q CertificateRequest has not been approved yet. Ignoring.
2022-02-06T17:04:20.280381+00:00 controller.certificaterequest istio-system toolhub-wvz2q CertificateRequest has not been approved yet. Ignoring.
2022-02-06T17:08:06.328312+00:00 controller.certificaterequest istio-system toolhub-wvz2q Initialising Ready condition
2022-02-06T17:08:07.376063+00:00 controller.certificaterequest istio-system toolhub-l8xjm Initialising Ready condition
2022-02-06T17:08:07.441243+00:00 controller.certificaterequest istio-system toolhub-wvz2q Signing cert with k8s_staging discovery true
2022-02-06T17:08:08.047227+00:00 controller.certificaterequest istio-system toolhub-l8xjm Signing cert with k8s_staging discovery true
2022-02-06T17:08:08.536874+00:00 controller.certificaterequest istio-system toolhub-l8xjm CertificateRequest is Ready. Ignoring.
2022-02-06T17:08:08.536832+00:00 controller.certificaterequest istio-system toolhub-wvz2q CertificateRequest is Ready. Ignoring.

Expected behaviour:
I'd have expected that even if two CertificateRequests would have been created they would not be allowed to share the same cert-manager.io/certificate-revision. Also I would have expected some kind of error metric telling me that something was wrong and/or the certmanager_certificate_ready_status being False or Unknown.

Steps to reproduce the bug:
Unfortunately I don't know how to reproduce.

Anything else we need to know?:

Environment details::

Kubernetes version: 1.16
Cloud-provider/provisioner: n/a
cert-manager version: 1.5.4
Install method: helm

/kind bug

The text was updated successfully, but these errors were encountered:

irbekrm · 2022-03-23T17:00:45Z

Hi @jayme-github thanks for creating the issue.

I'd have expected that even if two CertificateRequests would have been created they would not be allowed to share the same cert-manager.io/certificate-revision.

There shouldn't be two CertificateRequests with the same revision that have both succeeded. If a CertificateRequest fails, a new one will be created with the same revision in most cases. But this isn't relevant to what you are reporting as I see that both CRs have eventually succeeded.

E0206 17:02:20.402566 1 controller.go:163] cert-manager/controller/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="failed whilst waiting for CertificateRequest to exist - this may indicate an apiserver running slowly. Request will be retried" "key"="istio-system/toolhub"

It looks like what may have happened is that one CertificateRequest got created, cert-manager timed out waiting for it to be available due to the apiserver being slow, the reconciler was triggered again and, I guess due to the same apiserver slowness, cert-manager did not get the first CertificateRequest from the apiserver here and as it saw no CertificateRequests with that revision it created another one.

After that, once the connectivity stabilized, the two CertificateRequests would be reconciled and fulfilled seperately by another controller which isn't responsible for checking the number of CertificateRequests for revision, but just for issuing certs.

At the moment I am not sure if there is any way around this kind of issue.

Also I would have expected some kind of error metric telling me that something was wrong and/or the certmanager_certificate_ready_status being False or Unknown.

Had the certificate actually expired? The ready status reflects the status of the issued certificate, not whether any issuance succeeded or failed.
The main risk when this kind of issue happens is that too many certificate requests are made to an external issuer. It isn't really a failure state of the Certificate as such. I am not really sure how we could warn about this beyond what is already being logged.

jayme-github · 2022-03-24T09:57:11Z

Hi @irbekrm thanks for taking the time.

After that, once the connectivity stabilized, the two CertificateRequests would be reconciled and fulfilled seperately by another controller which isn't responsible for checking the number of CertificateRequests for revision, but just for issuing certs.

AIUI this would have meant that the result of one CertificateRequest (e.g. the signed certificate) should have ended up in a kubernetes.io/tls secret, right? As far as I can tell that never happened.

Had the certificate actually expired? The ready status reflects the status of the issued certificate, not whether any issuance succeeded or failed. The main risk when this kind of issue happens is that too many certificate requests are made to an external issuer. It isn't really a failure state of the Certificate as such. I am not really sure how we could warn about this beyond what is already being logged.

Yes, it did expire and certmanager_certificate_expiration_timestamp_seconds was reflecting that. But the certificate never entered an not ready state (metrics wise).
As I see it there where no new CertificateRequests created (apart from those two in existence). So further calls to the issuer should not have been made in this case.

From my naive point of view and looking at the logs it might be an option to add a new metric that counts events that end up being logged as re-queuing item due to error processing as elevated rates of those errors (for longer period of time maybe) might point towards an underlying issue.
The re-queuing event issuance is skipped until there are no more duplicates seems kind of special still, as cert-manager is unable to get out of this situation without human intervention. While I do appreciate that being logged so clearly, I would still like to be able to alert on a corresponding metric to call out for said human. :) I have not looked at the code at all, but could it be an option to just count "hard errors" like that making cleat it's not going to go away just by retrying?

irbekrm · 2022-03-24T19:30:19Z

Hi, thanks for the extra information, this is all very useful.

AIUI this would have meant that the result of one CertificateRequest (e.g. the signed certificate) should have ended up in a kubernetes.io/tls secret, right? As far as I can tell that never happened.

So the cert in toolhub-tls-certificate was never updated?
I had another look at the code and I see that this is possible that the controller responsible for writing the cert to the secret hit this case here which is pretty unhelpful as there is actually no error, it would just silently return nil and yes in that case the cert never gets written to the secret.
So that needs improving. Ideally I'd like for there to be no human intervention needed at this point, I'll see if it might make sense to actually read the cert from one of the CertificateRequests at that point anyway or else to delete the other CertificateRequest(s) in requestmanager

Yes, it did expire and certmanager_certificate_expiration_timestamp_seconds was reflecting that. But the certificate never entered an not ready state (metrics wise).

Do you know whether the Ready status of the Certificate was eventually set to false and if not, do you know for how long after the cert expired the Ready status remained true?
There is a controller responsible for updating the Ready condition of the certificate on basis of the contents of the Secret. It is possible that there was no event at the time when the cert expired that would have caused a reconcile of that controller, however we do re-sync all objects every 10 hours that, I think, should have resulted in the controller updating the Ready condition. (This also makes me realize that 10 hours is too long if we are relying on this to catch expired short lived certs in edge cases where the reconciler is not being triggered due to events)

From my naive point of view and looking at the logs it might be an option to add a new metric that counts events that end up being logged as re-queuing item due to error processing as elevated rates of those errors (for longer period of time maybe) might point towards an underlying issue.
The re-queuing event issuance is skipped until there are no more duplicates seems kind of special still, as cert-manager is unable to get out of this situation without human intervention. While I do appreciate that being logged so clearly, I would still like to be able to alert on a corresponding metric to call out for said human. :) I have not looked at the code at all, but could it be an option to just count "hard errors" like that making cleat it's not going to go away just by retrying?

Thank you, this is all useful insight- we don't manage production cert-manager installations, so for metrics in particular we rely on what users tell us they would find useful. For this particular case, I would like to get rid of the need of human intervention.
A metric for hard errors though could be useful, is this something that you have and use for other software?

jayme-github · 2022-03-25T13:58:17Z

So the cert in toolhub-tls-certificate was never updated?

Exactly, yes.

Ideally I'd like for there to be no human intervention needed at this point, I'll see if it might make sense to actually read the cert from one of the CertificateRequests at that point anyway or else to delete the other CertificateRequest(s) in requestmanager

That would be awesome!

Do you know whether the Ready status of the Certificate was eventually set to false and if not, do you know for how long after the cert expired the Ready status remained true?

The Certificate object that I attached was dumped ~30 days after the actual certificate had expired. So as far as I can tell Ready status never changed to false (and the certmanager_certificate_ready_status stayed True as well the whole time for that certificate). So that should have definitely been catched by the resync...

Not sure if it makes as difference in all of this but as it derives from default config let me point out that all our Certificate objects have revisionHistoryLimit: 2 set.

Thank you, this is all useful insight- we don't manage production cert-manager installations, so for metrics in particular we rely on what users tell us they would find useful. For this particular case, I would like to get rid of the need of human intervention. A metric for hard errors though could be useful, is this something that you have and use for other software?

It would be really nice if cert-manager would be able to resolve this situation on it's own in the future.
Having thought a bit more about a possible metric that would have exposed this I think (and that is what other software does as well) having a counter for re-queuing events would have helped in this case as well.
We would have seen a steep increase in the rate of those events, leading to someone investigating (and finding the root cause nicely printed in the logs). As this would also make other issues visible I guess it would be a good signal to have.

Introducing a new metric controller_requeue_count counting the number of re-queuing events issued per controller and reason. Current reasons can be either "optimistic-locking" (logged as INFO) or "processing-error" (logged as ERROR). This adds more visibility to potential issues randing from things like connection problems to the API or webhooks to possible hard errors. For context, please see cert-manager#4956

Introducing a new metric controller_requeue_count counting the number of re-queuing events issued per controller and reason. Current reasons can be either "optimistic-locking" (logged as INFO) or "processing-error" (logged as ERROR). This adds more visibility to potential issues randing from things like connection problems to the API or webhooks to possible hard errors. For context, please see cert-manager#4956 Signed-off-by: Janis Meybohm <jmeybohm@wikimedia.org>

Introducing a new metric controller_requeue_count counting the number of re-queuing events issued per controller and reason. Current reasons can be either "optimistic-locking" (logged as INFO) or "processing-error" (logged as ERROR). This adds more visibility to potential issues ranging from things like connection problems to the API or webhooks to possible hard errors. For context, please see cert-manager#4956 Signed-off-by: Janis Meybohm <jmeybohm@wikimedia.org>

Introducing a new metric controller_sync_error_count counting the number of errors during sync() of a controller. This adds more visibility to potential issues ranging from things like connection problems to the API or webhooks to possible hard errors. For context, please see cert-manager#4956 Signed-off-by: Janis Meybohm <jmeybohm@wikimedia.org>

smlx · 2022-06-09T03:11:33Z

Hi Just noting that I've also seen this issue, and also on a cluster with k8s API latency/availability issues.

The workaround was to delete the two certificaterequest objects with duplicate revisions and allow cert-manager to generate a new request successfully.

For one certificate we actually had three certificaterequests with duplicate revisions.

nathan-c · 2022-07-26T16:02:07Z

We also see this issue. We use Regional GKE clusters so don't have much control over k8s API latency. We use some fairly short lived (1hr) client certificates generated by namespaced cert-manager CA Issuer's. When this problem occurs we only have a limited time before our services start failing because of expired certificates. If we don't delete the duplicate CertificateRequest objects in this time then we can experience outages. I can see how this is low-priority for long lived certs but its critical for short lived ones.

We use the new metric to set up an alertmanager alert so we do at least have a chance of preventing an outage but it would be better if cert-manager was able to de-dupe these CertificateRequest objects itself (or never create them in the first place)

ptsk5 · 2022-08-01T10:06:08Z

I want to report the same (bad) behaviour. Are there any plans to fix this, please?

I0801 10:01:36.376261       1 requestmanager_controller.go:210] cert-manager/certificates-request-manager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="my-ns/my-cert"

AcidLeroy · 2022-10-03T18:12:11Z

Hey @irbekrm, we are seeing this fairly frequently in our code base. At this point, we have seen it in versions v1.5.3 and in 1.8.0. For example, we see:

I0928 19:24:21.113681       1 requestmanager_controller.go:210] cert-manager/controller/certificates-request-manager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="vmware-system-nsop/vmware-system-nsop-serving-cert"
E0928 19:24:21.125432       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="multiple CertificateRequests were found for the 'next' revision 1, issuance is skipped until there are no more duplicates" "key"="vmware-system-nsop/vmware-system-nsop-serving-cert"
E0928 19:24:22.126395       1 controller.go:163] cert-manager/controller/certificates-readiness "msg"="re-queuing item due to error processing" "error"="multiple CertificateRequests were found for the 'next' revision 1, issuance is skipped until there are no more duplicates" "key"="vmware-system-nsop/vmware-system-nsop-serving-cert"

In addition, when we look at the certificates, we see:

root@422ae4b32e9127ab363a62c945891ae0 [ ~ ]# kubectl -n vmware-system-nsop get certificaterequests.cert-manager.io
NAME                                    APPROVED   DENIED   READY   ISSUER                                 REQUESTOR                                                       AGE
vmware-system-nsop-serving-cert-q82rg   True                True    vmware-system-nsop-selfsigned-issuer   system:serviceaccount:vmware-system-cert-manager:cert-manager   4h9m
vmware-system-nsop-serving-cert-tkbkr   True                True    vmware-system-nsop-selfsigned-issuer   system:serviceaccount:vmware-system-cert-manager:cert-manager   4h10m

In our particular case, we are using self-signed issuers, so we don't care if we have to delete the duplicates. We have not been able to consistently get this error, however. It is very much intermittent and happens to different certificates within the system. Part of our problem here is that we cannot rely on manually deleting these requests. We need to get to the root cause of this, but I could imagine a workaround where we specify in the YAML specification of the certificate to "auto-resolve" the issue by deleting the duplicate certificate.

Environment details:

cert-manager: v1.5.3
Kubernetes version: v1.22.6

yannizhang2019 · 2022-10-04T17:19:44Z

We are seeing the same issue as well. We are using version 1.8.0.
There were numerous repeating error message -

E1004 15:35:00.237428       1 controller.go:163] cert-manager/certificates-readiness "msg"="re-queuing item due to error
 processing" "error"="multiple CertificateRequests were found for the 'next' revision 1, issuance is skipped until there are 
no more duplicates" "key"="cp4i/common-web-ui-ca-cert"

There were multiple CertificateRequests created by the cert manager for common-web-ui-ca-cert with the same revision. We manually deleted them. After that, the certificate creation was successful.

Is there any plan to fix the root cause?

AcidLeroy · 2022-10-05T17:06:22Z

@munnerz @irbekrm per our discussion today in the bi-weekly, I was able to dig up some logs that did indeed contain the log statement,

 1 controller.go:166] cert-manager/certificates-request-manager "msg"="re-queuing item due to error processing" "error"="failed whilst waiting for CertificateRequest to exist - this may indicate an apiserver running slowly. Request will be retried" "key"="vmware-system-tkg/tkr-resolver-cluster-webhook-serving-cert"

Then in a few lines below that, I see:

1 requestmanager_controller.go:217] cert-manager/certificates-request-manager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="vmware-system-tkg/tkr-resolver-cluster-webhook-serving-cert"

So, it does appear that the apiserver could play a role in this issue.

jetstack-bot · 2023-01-03T17:49:47Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

ksauzz · 2023-01-04T01:42:42Z

/remove-lifecycle stale

sathyanarays · 2023-01-04T09:28:38Z

@ksauzz , this has been fixed and the fix has been disabled by default. It can be enabled using the feature flag StableCertificateRequestName. Please let us know if this works!

ksauzz · 2023-01-05T01:30:01Z

@sathyanarays Thank you for the fix! I didn't notice that. Let me test it.

We had issues with cert-manager crating multiple CertificateRequest objects with the same certificate-revision in the past, see: * https://phabricator.wikimedia.org/T304092 * cert-manager/cert-manager#4956 Upstream introduced a fix that ensures CertificateRequest objects are created with predictable names, so no duplicates are possible: * cert-manager/cert-manager#5487 This fix is hidden behind a feature gate which this change opens for wikikube staging clusters. Bug: T304092 Change-Id: Ibb063cc653fc24dc306282154892c6a6b25f705e

jetstack-bot · 2023-04-05T02:23:30Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

ksauzz · 2023-05-02T06:31:32Z

We have used StableCertificateRequestName for 3 months, and it works to us nicely so far. Thank you!

jetstack-bot · 2023-06-01T07:20:06Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale

jetstack-bot · 2023-07-01T07:20:43Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

jetstack-bot · 2023-07-01T07:20:45Z

@jetstack-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ffilippopoulos · 2023-11-09T10:24:32Z

We've seen this again running cert-manager v1.13.1 which has the StableCertificateRequestName feature gate enabled by default.

E1024 15:15:56.304905       1 controller.go:167] "cert-manager/certificates-readiness: re-queuing item due to error processing" err="multiple CertificateRequests were found for the 'next' revision 117, issuance is skipped until there are no more duplicates" key="otel/otel-collector-kafka-client"

Reading through the comments in the code: https://github.com/cert-manager/cert-manager/blob/master/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L424-L426 looks like disabling the flag will actually help us prevent this issue.
@sathyanarays Are we missing something here, or shall I try disabling the feature?

atsang36 · 2024-02-28T18:15:50Z

Hi all,

Like @ffilippopoulos I'm still seeing this issue in v1.13.3 as well. Does disabling the flag work?

@ksauzz @sathyanarays @irbekrm

jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 17, 2022

jayme-github mentioned this issue Mar 25, 2022

Add controller_sync_error_count metric #4987

Merged

ptsk5 mentioned this issue Aug 1, 2022

requestmanager_controller got stuck in a loop and stopped generating new certificates afterward #3565

Closed

jroper mentioned this issue Sep 9, 2022

6 certificaterequests are creating for one single certificate and causing daily limit to exceed #4696

Closed

inteon mentioned this issue Oct 5, 2022

log more information on why the get CertificateRequest request failed #5484

Merged

AcidLeroy mentioned this issue Oct 5, 2022

Increase timeout waitForCertificateRequestToExist #5485

Closed

sathyanarays mentioned this issue Oct 6, 2022

Generate Certificate Request with predictable name #5487

Merged

jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2023

jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2023

jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2023

jetstack-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 1, 2023

jetstack-bot closed this as completed Jul 1, 2023

misalcedo mentioned this issue Mar 7, 2024

Duplicate CertificateRequests for next revision require manual intervention #6837

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cert-manager created multiple CertificateRequest objects with the same certificate-revision #4956

cert-manager created multiple CertificateRequest objects with the same certificate-revision #4956

jayme-github commented Mar 17, 2022

irbekrm commented Mar 23, 2022

jayme-github commented Mar 24, 2022

irbekrm commented Mar 24, 2022

jayme-github commented Mar 25, 2022

smlx commented Jun 9, 2022 •

edited

Loading

nathan-c commented Jul 26, 2022

ptsk5 commented Aug 1, 2022 •

edited

Loading

AcidLeroy commented Oct 3, 2022

yannizhang2019 commented Oct 4, 2022

AcidLeroy commented Oct 5, 2022

jetstack-bot commented Jan 3, 2023

ksauzz commented Jan 4, 2023

sathyanarays commented Jan 4, 2023

ksauzz commented Jan 5, 2023

jetstack-bot commented Apr 5, 2023

ksauzz commented May 2, 2023

jetstack-bot commented Jun 1, 2023

jetstack-bot commented Jul 1, 2023

jetstack-bot commented Jul 1, 2023

ffilippopoulos commented Nov 9, 2023

atsang36 commented Feb 28, 2024 •

edited

Loading

cert-manager created multiple CertificateRequest objects with the same certificate-revision #4956

cert-manager created multiple CertificateRequest objects with the same certificate-revision #4956

Comments

jayme-github commented Mar 17, 2022

irbekrm commented Mar 23, 2022

jayme-github commented Mar 24, 2022

irbekrm commented Mar 24, 2022

jayme-github commented Mar 25, 2022

smlx commented Jun 9, 2022 • edited Loading

nathan-c commented Jul 26, 2022

ptsk5 commented Aug 1, 2022 • edited Loading

AcidLeroy commented Oct 3, 2022

yannizhang2019 commented Oct 4, 2022

AcidLeroy commented Oct 5, 2022

jetstack-bot commented Jan 3, 2023

ksauzz commented Jan 4, 2023

sathyanarays commented Jan 4, 2023

ksauzz commented Jan 5, 2023

jetstack-bot commented Apr 5, 2023

ksauzz commented May 2, 2023

jetstack-bot commented Jun 1, 2023

jetstack-bot commented Jul 1, 2023

jetstack-bot commented Jul 1, 2023

ffilippopoulos commented Nov 9, 2023

atsang36 commented Feb 28, 2024 • edited Loading

smlx commented Jun 9, 2022 •

edited

Loading

ptsk5 commented Aug 1, 2022 •

edited

Loading

atsang36 commented Feb 28, 2024 •

edited

Loading