Fixes a bug where a previous failed CertificateRequest was picked up during next issuance #4688
…oller Issuing controller should only look at 'current' CertificateRequests
Signed-off-by: irbekrm <irbekrm@gmail.com>
…are re-issuing for the same revision
Signed-off-by: irbekrm <irbekrm@gmail.com>
/test pull-cert-manager-venafi
/test pull-cert-manager-issuers-venafi-tpp
/milestone v1.7
pkg/controller/certificates/requestmanager/requestmanager_controller.go
Signed-off-by: irbekrm <irbekrm@gmail.com>
Thanks for this, looks like a gap in the state machine!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: irbekrm, jakexks.
Hm, I am digging into why a failed certificate request is re-created with the same private key as the previous issuance attempt despite
// revision. Leave it to the certificate-requests controller to delete the
// CertificateRequest and create a new one.
if req.Status.FailureTime != nil &&
	req.Status.FailureTime.Before(certIssuingCond.LastTransitionTime) && crReadyCond.Reason == cmapi.CertificateRequestReasonFailed {
I'm pretty nervous about this time comparison. I don't think these fields are robust enough to use as part of the state machine. Why can't we inspect the actual revision on the failed CertificateRequest (stored as an annotation) to determine whether it's from a previous issuance?
As an example of how this could trip up, if two requests fail in a row then the LastTransitionTime won't be updated, as the condition's state has not transitioned.
Would utilising the actual revision annotation work for this?
What this PR does / why we need it:
This PR fixes a bug where, after a certificate request fails and the next one succeeds, the Certificate's 'Ready' condition remains false and the contents of the Secret are not updated. See #4642 for a detailed bug description and instructions on how to reproduce it.
Which issue this PR fixes:
fixes #4642
Special notes for your reviewer:
This bug was caused by a race condition: during an issuance that follows a failed one, the failed CertificateRequest gets picked up by the issuing controller and used to update the Certificate's status. This is fixed in ff67b2a by making the issuing controller compare the timestamp of the 'Issuing' condition on the Certificate with the failure time on the CertificateRequest, and ignore the CertificateRequest if it failed before the 'Issuing' condition was set on the Certificate, as this means that it belongs to a previous issuance.

Additionally, this PR changes how the certificates-request-manager controller determines whether it should delete a failed CertificateRequest for this revision (see #4130 for context). Previously, the check was that the failure happened at least 1 hour (the default backoff period) ago. This meant that if a user ran the cmctl renew command before the hour was up, the failed CertificateRequest would not get deleted, so I've changed this to check that the CertificateRequest failed before the 'Issuing' condition was applied to the Certificate.

Looking at timestamps is not the most reliable way to determine whether an action should be performed, and resource statuses can be lost, e.g. after a backup and restore. However, in this case, the first step would always be to apply the 'Issuing' condition to the Certificate in the trigger controller, which is done after looking at the contents of the Secret. I cannot think of other edge cases.

Verify the fix (with the Venafi TPP issuer):
CertificateRequest and Certificate's Ready condition set to true (see #4642 for how to reproduce the bug)
Alternative approaches:
The same alternative approaches as those in the description of #4130 apply, and they were not chosen for the same reasons.
Related PRs:
#4130
Release note:
/kind bug