Recognise finalized ACME Orders and gracefully recover by updating the Order's status when they are already in a "valid" state #2765

devfans · 2020-03-30T12:11:34Z

Describe the bug:
I got the following errors:

E0330 11:50:35.237661       1 sync.go:447] cert-manager/controller/orders "msg"="failed to finalize Order resource due to bad request, marking Order as failed" "error"="403 urn:ietf:params:acme:error:orderNotReady: Order's status (\"valid\") is not acceptable for finalization" "resource_kind"="Order" "resource_name"="domain-com-1755579778-2867024102" "resource_namespace"="xx"
I0330 11:50:35.237704       1 sync.go:56] cert-manager/controller/orders "msg"="updating Order resource status" "resource_kind"="Order" "resource_name"="domain-com-1755579778-2867024102" "resource_namespace"="xx" 
E0330 11:50:35.249918       1 sync.go:59] cert-manager/controller/orders "msg"="failed to update status" "error"=null "resource_kind"="Order" "resource_name"="domain-com-c-1755579778-2867024102" "resource_namespace"="xx" 
E0330 11:50:35.249945       1 controller.go:140] cert-manager/controller/orders "msg"="re-queuing item  due to error processing" "error"="Operation cannot be fulfilled on orders.acme.cert-manager.io \"domain-com-1755579778-2867024102\": the object has been modified; please apply your changes to the latest version and try again" "key"="xx/domain-com-1755579778-2867024102"

Looks similar to an old bug: #758
Expected behaviour:
Order should be completed

Steps to reproduce the bug:

After installing cert-manager 0.14.1 with helm, create a cluster issuer with the dns01 resolver using route53, then generate a cert for *.domain.com.

Anything else we need to know?:
It works well for a single domain name such as a.domain.com. I specified dns01 with route53 as the only resolver in the cluster issuer.

Environment details:

Kubernetes version (e.g. v1.10.2): v1.15.x
Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): eks on aws
cert-manager version (e.g. v0.4.0): 0.14.1
Install method (e.g. helm or static manifests): helm

/kind bug


gigaSproule · 2020-04-16T11:03:15Z

I'm also seeing this same error for the http01 resolver and using a single domain. https://community.letsencrypt.org/t/orders-status-valid-is-not-acceptable-for-finalization/103419 suggests that everything is fine, but cert-manager isn't handling it properly.

munnerz · 2020-04-23T12:52:00Z

This should only happen if cert-manager has a cached status value of Ready when the Order is actually in a Valid (i.e. already finalized) state: https://github.com/jetstack/cert-manager/blob/49e1a7a51c71307718d29c4275bf3916906011c2/pkg/controller/acmeorders/sync.go#L442-L452

We could definitely look at trying to handle this better, e.g. by updating the Order's status to reflect the current state of the Order to allow for a later reconciliation to fetch the already issued certificate.
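To make the idea concrete, here is a minimal sketch of that recovery path. It is not cert-manager's actual code: `acmeClient`, `fakeClient`, `syncOrder`, and `errOrderNotReady` are hypothetical names invented for illustration, and the fake client only simulates a server whose Order is already "valid" while the controller's cached state is still "ready".

```go
package main

import (
	"errors"
	"fmt"
)

// OrderState holds an ACME order state name as defined in RFC 8555.
type OrderState string

const (
	StateReady OrderState = "ready"
	StateValid OrderState = "valid"
)

// errOrderNotReady is a hypothetical stand-in for the ACME orderNotReady problem type.
var errOrderNotReady = errors.New("urn:ietf:params:acme:error:orderNotReady")

// acmeClient is a hypothetical, minimal view of an ACME client (assumption).
type acmeClient interface {
	Finalize(orderURL string) error
	GetOrder(orderURL string) (OrderState, error)
}

// syncOrder tries to finalize an Order that the cache believes is "ready".
// If the server rejects finalization with orderNotReady, it re-reads the Order
// and adopts the server's state when that state is "valid", instead of failing.
func syncOrder(c acmeClient, orderURL string, cached OrderState) (OrderState, error) {
	if cached != StateReady {
		return cached, nil
	}
	err := c.Finalize(orderURL)
	if err != nil && !errors.Is(err, errOrderNotReady) {
		return cached, err
	}
	// Either finalization succeeded or the Order was not in "ready":
	// in both cases, ask the server for the current state.
	actual, getErr := c.GetOrder(orderURL)
	if getErr != nil {
		return cached, getErr
	}
	if err != nil && actual != StateValid {
		return cached, err // genuinely not ready; surface the original error
	}
	return actual, nil
}

// fakeClient simulates an ACME server for this example only.
type fakeClient struct{ serverState OrderState }

func (f *fakeClient) Finalize(string) error {
	if f.serverState != StateReady {
		return fmt.Errorf("403 %w: Order's status (%q) is not acceptable for finalization",
			errOrderNotReady, f.serverState)
	}
	f.serverState = StateValid
	return nil
}

func (f *fakeClient) GetOrder(string) (OrderState, error) { return f.serverState, nil }

func main() {
	c := &fakeClient{serverState: StateValid} // server already finalized the Order
	state, err := syncOrder(c, "https://example.invalid/order/1", StateReady)
	fmt.Println(state, err) // prints "valid <nil>"
}
```

With the status corrected to "valid", a later reconciliation could go on to fetch the already issued certificate rather than marking the Order as failed.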

I'm interested to understand what has caused this case to come up in the first place however - it shouldn't really occur unless a previous attempt to finalize the Order succeeded but updating the Order's status to reflect this change failed. We could also consider using a PATCH instead of an UPDATE operation on the Order resource to try and avoid this case.
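As a rough illustration of why an UPDATE can fail where a PATCH would not, the sketch below simulates optimistic concurrency with an in-memory store. Everything in it (`order`, `store`, `Update`, `PatchState`, `errConflict`) is a hypothetical stand-in, not the Kubernetes API; it only assumes that an update carries the resource version it was read at, while a patch of a single field carries no such precondition.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict mimics the "object has been modified" error seen in the logs.
var errConflict = errors.New("the object has been modified")

// order is a hypothetical, stripped-down stand-in for an Order resource.
type order struct {
	ResourceVersion int
	State           string
}

// store simulates an API server holding a single Order.
type store struct{ current order }

func (s *store) Get() order { return s.current }

// Update rejects writes that were built from a stale ResourceVersion.
func (s *store) Update(o order) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

// PatchState changes only the State field and has no version precondition.
func (s *store) PatchState(state string) {
	s.current.State = state
	s.current.ResourceVersion++
}

func main() {
	s := &store{current: order{ResourceVersion: 1, State: "ready"}}

	stale := s.Get() // copy cached by the controller

	// Another writer updates the object, bumping its version.
	if err := s.Update(s.Get()); err != nil {
		panic(err)
	}

	// The controller now tries to record the new state from its stale copy.
	stale.State = "valid"
	fmt.Println("update:", s.Update(stale))

	// A patch of just the state field is not affected by the stale version.
	s.PatchState("valid")
	fmt.Println("state after patch:", s.Get().State)
}
```

Re-reading the latest object and retrying the update is the other common way to handle such conflicts; in client-go that pattern is available as `retry.RetryOnConflict`.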

Can you see any logs that might indicate this has happened, and if so, any ideas what could cause it? Are you running multiple instances of cert-manager at once, or anything like that?

Nuru · 2020-05-08T04:45:41Z

@munnerz I have seen this, too. I believe it is due to #2741.

It appears that cert-manager uses some kind of hash of the SANs to name the CertificateRequest and Order; in any case, if you delete and re-create a certificate, the re-created CertificateRequest, Order, and Challenge all get the same names they had before.
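A small sketch of what I mean by deterministic naming follows. I have not checked how cert-manager actually derives these names, so `childName` and its FNV-based hash are purely assumptions for illustration; the point is only that a name computed from the SANs comes out the same after a delete and re-create.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// childName is a hypothetical naming function (not cert-manager's real one):
// it appends a hash of the sorted DNS names to the Certificate name.
func childName(certName string, dnsNames []string) string {
	sorted := append([]string(nil), dnsNames...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	h := fnv.New32a()
	h.Write([]byte(strings.Join(sorted, ",")))
	return fmt.Sprintf("%s-%d", certName, h.Sum32())
}

func main() {
	first := childName("domain-com", []string{"*.domain.com", "domain.com"})
	// Deleting and re-creating the Certificate with the same SANs
	// (even in a different order) yields the same child resource name.
	second := childName("domain-com", []string{"domain.com", "*.domain.com"})
	fmt.Println(first == second) // prints "true"
}
```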

I do not have the sequence exactly nailed down, but it goes something like this:

1. You fiddle around long enough creating and deleting certificates that you hit the Duplicate Certificate limit of 5 per week.
2. You create another duplicate. Because of #2741 (Certificate creation is stuck with Order having no state and no challenges are created for a specific URL), that request just lingers in the background.
3. You delete your Certificate because it was not issued and you are trying to goose it. This does not cancel the pending order (POST to new-order API) because it is stuck in a retry loop.
4. While the certificate is deleted, the rate limit window passes, and the order is accepted.
5. You create another Certificate, trying to get it to work. Now you have the same order in your system twice: the abandoned order from earlier and the new one. Because the order is very recent, Let's Encrypt will return the same order with the same authorizations as before, and you end up finalizing it twice.

That does not seem exactly right to me, as I do not know all the details about how cert-manager processing works, but I think it is something like that. The key being that you can easily create duplicate orders.

jetstack-bot · 2021-09-15T23:25:08Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

Nuru · 2021-09-16T00:08:32Z

/remove-lifecycle stale

PSanetra · 2021-11-29T15:08:22Z

We observed exactly the same issue with v1.6.1 when we tried to renew the certificate with kubectl cert-manager -n my-namespace renew my-cert. It worked in most cases but failed with this issue in some.

We have recovered from this by deleting the certificate resource and issuing the renew command again.

jetstack-bot · 2022-02-27T17:20:23Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

irbekrm · 2022-03-11T10:37:04Z

This should have been fixed in cert-manager v1.7, see #4697

jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 30, 2020

munnerz added area/acme Indicates a PR directly modifies the ACME Issuer code priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Apr 23, 2020

munnerz changed the title ~~Route53 resolver shows Order's status (\"valid\") is not acceptable for finalization~~ Recognised finalized ACME Orders and gracefully recover by updating the Order's status when "already finalized" errors occur Apr 23, 2020

munnerz changed the title ~~Recognised finalized ACME Orders and gracefully recover by updating the Order's status when "already finalized" errors occur~~ Recognised finalized ACME Orders and gracefully recover by updating the Order's status when they are already in a "valid" state Apr 23, 2020

munnerz changed the title ~~Recognised finalized ACME Orders and gracefully recover by updating the Order's status when they are already in a "valid" state~~ Recognise finalized ACME Orders and gracefully recover by updating the Order's status when they are already in a "valid" state Apr 23, 2020

krezovic mentioned this issue Nov 8, 2020

ACME certificate obtain failure due to completed order caddyserver/caddy#3857

Closed

JoshVanL added triage/support Indicates an issue that is a support question. and removed triage/support Indicates an issue that is a support question. labels Jan 28, 2021

jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 15, 2021

jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2021

PSanetra mentioned this issue Nov 29, 2021

Order stuck in errored state #4441

Closed

jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2022

irbekrm closed this as completed Mar 11, 2022
