requestmanager_controller got stuck in a loop and stopped generating new certificates afterward #3565
We recently observed a similar issue where multiple CertificateRequests were created by cert-manager (not cainjector in our case) for the same Certificate. First,
Then a retry:
And another CertificateRequest:
After which there are repeated errors during processing of both requests:
Eventually both requests seemingly succeeded:
After which they were considered Ready:
However, no secret was created:
A couple of other points:
As the problem keeps recurring for us, I started digging into the possible root cause and a solution, to see if I can be of any help here. The issue seems to occur when a failure of API server communication is followed shortly by a certificate issuance. At that moment there can be a delay between creating an object on the API server and the subsequent local cache update from the shared informer. Creating a new CertificateRequest in the CertificateRequestManager can therefore return an error because the local cache was not updated within the specified timeout. As a result, subsequent retries of processing (i.e. ProcessItem) won't find the previous CertificateRequests (with identical revisions) in the cache and will keep creating new ones, even though the original CertificateRequest objects exist on the API server. The consequence of this behaviour is that the certificate issuing controller never moves forward and renewal of the given Certificate stops forever. The relevant piece of code is here:
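Roughly, the create-then-wait step has the following shape. This is a self-contained Go simulation with illustrative names, timings, and messages, not the actual cert-manager source:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// store is used twice below: once as a stand-in for the API server
// (writes land there immediately) and once as a stand-in for the shared
// informer's local cache, which is updated asynchronously and can lag.
type store struct {
	mu    sync.Mutex
	items map[string]bool
}

func (s *store) add(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[name] = true
}

func (s *store) has(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.items[name]
}

// createAndWait creates a request on the "API server", then polls the
// local "cache" until the object shows up or the timeout expires --
// the same shape as the requestmanager's create-then-wait step.
func createAndWait(api, cache *store, name string, cacheLag, timeout time.Duration) error {
	api.add(name)
	// Simulate the informer delivering the watch event after some lag.
	go func() {
		time.Sleep(cacheLag)
		cache.add(name)
	}()
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if cache.has(name) {
			return nil
		}
		time.Sleep(100 * time.Millisecond)
	}
	return errors.New("timed out waiting for local cache (this may indicate an apiserver running slowly)")
}

func main() {
	api := &store{items: map[string]bool{}}
	cache := &store{items: map[string]bool{}}

	// A cache lag longer than the wait timeout reproduces the loop:
	// every retry fails to find the previous request in the cache and
	// creates another one, even though all of them exist on the API server.
	for attempt := 1; attempt <= 3; attempt++ {
		name := fmt.Sprintf("stan-client-tls-%d", attempt)
		if err := createAndWait(api, cache, name, 10*time.Second, 5*time.Second); err != nil {
			fmt.Printf("attempt %d: %v -- creating a new CertificateRequest\n", attempt, err)
		}
	}
	fmt.Printf("CertificateRequests on the API server: %d (duplicates piled up)\n", len(api.items))
}
```

Because the cache lag exceeds the wait timeout, every attempt times out and a fresh request is created, which matches the pile-up of CertificateRequests with identical revisions described above.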
What do you think? Thanks.
I'm facing the same issue on OVH. Multiple CertificateRequests are created, and there are multiple entries like this in the cert-manager pod log:
Same issue for me: infinite creation of CertificateRequests, one every 30-40 seconds; no Orders and no Challenges created. This is a sample log from cert-manager:
cainjector is not involved; there is nothing unusual in its logs.
Same issue on the OVH cloud provider: the cert-manager controller keeps spawning new CertificateRequest objects without ever detecting them. Is there any progress?
Exact same issue here too, on the same OVH cloud provider. Any ideas? Thank you in advance for any response; I'm still investigating!
No Order is created from the CertificateRequest.
+1
Hello everyone. I'm having the same problem with the OVH provider. Has anyone found a way to solve it?
Hi! I was not able to reproduce the issue (yet). I created a cluster using the OVHcloud Managed Kubernetes offering; I then created a lot of certificates (5000), hoping to trigger the message "this may indicate an apiserver running slowly"; instead, I hit a quota limit ("The OVHcloud storage quota has been reached"). My guess is that (somehow) the apiserver, or the etcd instance, runs slowly, leading to the informer cache being updated too late (more than 5 seconds). I don't think the issue comes from cert-manager itself, but rather that 5 seconds might be too short for some Kubernetes clusters. Related:
For the record, for other OVH users having this issue and looking for a workaround/quick fix:
I'm seeing this using a CA issuer. We've only got one Certificate on this cluster, so I doubt it's a throughput problem, and we shouldn't be reaching out to any external CAs. We're running on Azure AKS, Kubernetes v1.20.9. Logs attached; the interesting bits, with my comments, are here:
To fix it I had to delete both existing CertificateRequests. I first tried deleting just the oldest one, and a new CertificateRequest was immediately created, so I deleted the other old one as well and the Certificate was renewed.
Having the same issue with OVH as well. @pdesgarets' solution fixed the infinite loop (thx 💪🏼), but this issue needs to be addressed properly; a sync error shouldn't result in... that.
Hi, I have hit the same issue with OVH for the second time...
Issues go stale after 90d of inactivity.
/remove-lifecycle stale
Same problem here. I gave up; I'll use Traefik.
Is there any progress?
I want to report the same (bad) behaviour. Are there any plans to fix this, please? (the same question posted here)
Checking the k8s API server logs clarified things further. For me, this happened because the cert-manager-webhook pod was not up.
Issues go stale after 90d of inactivity.
Stale issues rot after 30d of inactivity.
Rotten issues close after 30d of inactivity.
@jetstack-bot: Closing this issue.
Describe the bug:
At some point, it seems that communication between cert-manager-cainjector and the API server stopped working (we received a few EOF logs and subsequently "Successfully Reconciled" logs in cert-manager-cainjector). After communication was restored, we started receiving:
After another while (around 10s), the controller moved further along in processing items, but output this log for all of the previous attempts:
Afterward, the generation of this certificate stopped altogether.
In the Kubernetes environment, we could see that multiple CertificateRequest objects had been generated for the stan-client-tls Certificate with the same revision number. So the client interface (https://github.com/jetstack/cert-manager/blob/cdc53b65cbd344dbef64f0c5c22e6070e79c5b5c/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L339) was probably fully working and creating new instances, while the certificateRequestLister was unable to see the proper current state (https://github.com/jetstack/cert-manager/blob/cdc53b65cbd344dbef64f0c5c22e6070e79c5b5c/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L165).
Expected behaviour:
The controller should probably delete the unused CertificateRequest objects and continue creating new ones until one of them succeeds.
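For illustration, a minimal Go sketch of what that recovery could look like: if several CertificateRequests for the same revision already exist, adopt the newest one and delete the rest instead of creating yet another. The types and names here are made up for the example; this is not a patch against the real controller:

```go
package main

import (
	"fmt"
	"sort"
)

// certificateRequest is a pared-down stand-in for the real resource;
// only the fields relevant to the dedup logic are modeled.
type certificateRequest struct {
	name     string
	revision int
	created  int64 // creation timestamp (unix seconds)
}

// reconcileDuplicates sketches the suggested recovery: if the API server
// already holds several CertificateRequests for the wanted revision,
// adopt the newest one and delete the rest, rather than creating yet
// another. It returns the adopted request and the names to delete.
func reconcileDuplicates(existing []certificateRequest, wantRevision int) (*certificateRequest, []string) {
	var matching []certificateRequest
	for _, cr := range existing {
		if cr.revision == wantRevision {
			matching = append(matching, cr)
		}
	}
	if len(matching) == 0 {
		return nil, nil // nothing to adopt; the controller would create one
	}
	// Sort newest first, so index 0 is the request we keep.
	sort.Slice(matching, func(i, j int) bool { return matching[i].created > matching[j].created })
	var toDelete []string
	for _, cr := range matching[1:] {
		toDelete = append(toDelete, cr.name)
	}
	return &matching[0], toDelete
}

func main() {
	existing := []certificateRequest{
		{name: "stan-client-tls-abc", revision: 7, created: 100},
		{name: "stan-client-tls-def", revision: 7, created: 200},
		{name: "stan-client-tls-ghi", revision: 7, created: 300},
	}
	adopted, toDelete := reconcileDuplicates(existing, 7)
	fmt.Printf("adopt %s, delete %v\n", adopted.name, toDelete)
}
```

Adopting an existing request would also sidestep the stale-cache window, because the retry path would no longer depend on the newly created object being visible in the informer cache.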
Environment details:
/kind bug