HTTP response status 503 must be handled as retry after a sleep #4530

Closed
mhoffrog opened this issue Mar 1, 2023 · 8 comments

Comments

@mhoffrog
Contributor

mhoffrog commented Mar 1, 2023

Currently acme.sh fails on HTTP status 503.
According to https://community.letsencrypt.org/t/new-service-busy-responses-beginning-during-high-load/184174, this response should be handled by re-sending the current request after a period of sleep.
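
For illustration, the requested behaviour could be sketched as below. This is not the code from the PR: `retry_on_503`, `MAX_RETRIES` and `RETRY_SLEEP` are hypothetical names, and the command passed in is assumed to print the HTTP status code of the request it sends.

```shell
#!/bin/sh
# Hypothetical sketch: re-send a request while the server answers 503.
# Both settings below are assumptions, not acme.sh options.
MAX_RETRIES="${MAX_RETRIES:-5}"   # give up after this many 503 responses
RETRY_SLEEP="${RETRY_SLEEP:-5}"   # seconds to pause between attempts

# Usage: retry_on_503 <command that prints an HTTP status code>
# Prints the last status code; returns 1 if the server was still busy.
retry_on_503() {
  _attempt=0
  while :; do
    _code="$("$@")"
    if [ "$_code" != "503" ]; then
      echo "$_code"
      return 0
    fi
    _attempt=$((_attempt + 1))
    if [ "$_attempt" -ge "$MAX_RETRIES" ]; then
      echo "$_code"
      return 1
    fi
    sleep "$RETRY_SLEEP"
  done
}
```

The retry budget keeps the loop bounded, so an unattended run cannot wait forever on a server that stays busy.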

Steps to reproduce

This issue can only be reproduced if the Let's Encrypt (LE) server is temporarily overloaded, which most likely happens at the beginning of a month.
As you can see in the debug log, this situation currently ends with a Challenge error: ....

Debug log

...
[Wed Mar  1 21:31:13 UTC 2023] The txt record is added: Success.
[Wed Mar  1 21:31:13 UTC 2023] Sleep 800 seconds for the txt records to take effect
[Wed Mar  1 21:44:33 UTC 2023] ok, let's start to verify
[Wed Mar  1 21:44:33 UTC 2023] Verifying: my-domain.de
[Wed Mar  1 21:44:33 UTC 2023] d='my-domain.de'
[Wed Mar  1 21:44:33 UTC 2023] keyauthorization='6aDkv949NQt6XLaDMeb2BdqyjAdvHRxid2L-GbZ5d2M.kiChmmDdQVZ_qKAGwE8q-fIK0HMUF9VwVPPaVECfuFk'
[Wed Mar  1 21:44:33 UTC 2023] uri='https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/5565958713/UQFNQA'
[Wed Mar  1 21:44:33 UTC 2023] _currentRoot='dns_netcup'
[Wed Mar  1 21:44:33 UTC 2023] url='https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/5565958713/UQFNQA'
[Wed Mar  1 21:44:33 UTC 2023] payload='{}'
[Wed Mar  1 21:44:33 UTC 2023] POST
[Wed Mar  1 21:44:33 UTC 2023] _post_url='https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/5565958713/UQFNQA'
[Wed Mar  1 21:44:33 UTC 2023] _CURL='curl --silent --dump-header /home/runner/work/nc_wildcerts/nc_wildcerts/acme.sh/config/http.header  -L  -g '
[Wed Mar  1 21:44:33 UTC 2023] _ret='0'
[Wed Mar  1 21:44:33 UTC 2023] code='503'
[Wed Mar  1 21:44:33 UTC 2023] my-domain.de:Challenge error: {"type": "urn:ietf:params:acme:error:rateLimited", "detail": "Service busy; retry later."}
[Wed Mar  1 21:44:33 UTC 2023] Skip for removelevel:
[Wed Mar  1 21:44:33 UTC 2023] pid
[Wed Mar  1 21:44:33 UTC 2023] No need to restore nginx, skip.
[Wed Mar  1 21:44:33 UTC 2023] _clearupdns
[Wed Mar  1 21:44:34 UTC 2023] dns_entries='my-domain.de,_acme-challenge.my-domain.de,,dns_netcup,SxyZ3mAffFg2Ze7sRE3wstLRoVqRQ9GLb46ok49gRyE,/home/runner/work/nc_wildcerts/nc_wildcerts/acme.sh/dnsapi/dns_netcup.sh
my-domain.de,_acme-challenge.my-domain.de,,dns_netcup,u2oyCG6Pm8fdcGdVxlLtFwGf1s-gqflwW-2XFpjnaRk,/home/runner/work/nc_wildcerts/nc_wildcerts/acme.sh/dnsapi/dns_netcup.sh'
[Wed Mar  1 21:44:34 UTC 2023] Removing DNS records.
...

Potential fix

This issue is simple to fix.
PR #4531 will fix this issue.

@github-actions

github-actions bot commented Mar 1, 2023

Please upgrade to the latest code and try again first; maybe it's already fixed: acme.sh --upgrade. If it's still not working, please provide the log with --debug 2; otherwise, nobody can help you.

@mhoffrog
Contributor Author

mhoffrog commented Mar 2, 2023

Please upgrade to the latest code and try again first; maybe it's already fixed: acme.sh --upgrade. If it's still not working, please provide the log with --debug 2; otherwise, nobody can help you.

The logs are provided and this issue can be solved by merging PR #4531.

Neilpang added a commit that referenced this issue Mar 2, 2023
Neilpang added a commit that referenced this issue Mar 2, 2023
@Th0masL

Th0masL commented Mar 15, 2023

Hello,

The change in PR #4531 is a breaking change for most people who use acme.sh in automation mode.

I'm renewing a lot of certs, and it seems that when you hit the limit on the number of certificates you can generate for a specific domain, the CA responds with an error and the script is then stuck.

The change implemented in the PR above basically makes my script wait for hours for the CA to accept the renewal of my cert, when it should have thrown an error and continued with the next certificate.

[Wed Mar 15 23:08:31 EET 2023] response='{
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: mycert.domain.com, retry after 2023-03-17T01:40:49Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/",
  "status": 429
}'
[Wed Mar 15 23:08:31 EET 2023] It seems the CA server is currently overloaded, let's wait and retry. Sleeping 102737 seconds.

Can we re-open this issue and fix it differently?

Thanks

Thomas

@Th0masL

Th0masL commented Mar 16, 2023

Actually, the problem is this line:

if [ "$code" = '503' ] || [ "$_retryafter" ]; then

If I look at the git blame, it points to this PR, so I'm not sure what/who changed the line, but that's a problem.
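
To make the effect of that line concrete, here is a small sketch that wraps the quoted condition in a hypothetical function, `should_retry_as_quoted`. The wrapper and its arguments are assumptions for illustration; only the `if` condition is taken from the thread, and the value 102737 is assumed to match the parsed Retry-After because it is the sleep time shown in the log.

```shell
#!/bin/sh
# Hypothetical wrapper around the condition quoted above.
# $1 = HTTP status code, $2 = parsed Retry-After value (may be empty).
should_retry_as_quoted() {
  code="$1"
  _retryafter="$2"
  if [ "$code" = '503' ] || [ "$_retryafter" ]; then
    return 0   # sleep and retry
  fi
  return 1     # report the error
}

# A 429 rate-limit response that carries a Retry-After value is also
# retried, even though the wait can be more than a day.
if should_retry_as_quoted 429 102737; then
  echo "429 with Retry-After 102737: would sleep and retry"
fi
```

Because the second test only checks that a Retry-After value is present, any status code with that header takes the retry path, not just 503.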

@mhoffrog
Contributor Author

I'm not sure what/who changed the line, but that's a problem

@Th0masL This was changed by @Neilpang after my PR #4531 - see commit cb8b341.
So this is exactly what I'd like to get fixed by this PR.

@Neilpang
Member

I will fix the code like this: if the sleep time is too large (e.g. larger than 600 seconds), we will not sleep and will just fall back to an error, as before.

Do you think this is ok?
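
A minimal sketch of that proposal, under stated assumptions: `retry_sleep_seconds`, `MAX_RETRY_AFTER` and the 5-second default are hypothetical, and only the 600-second threshold comes from the comment above. It is not the committed fix.

```shell
#!/bin/sh
# Hypothetical sketch of "do not sleep if the wait is too long".
MAX_RETRY_AFTER="${MAX_RETRY_AFTER:-600}"   # threshold from the comment above

# $1 = HTTP status code, $2 = Retry-After in seconds (may be empty).
# Prints the seconds to sleep and returns 0 when a retry is reasonable;
# returns 1 when the caller should fail with an error as before.
retry_sleep_seconds() {
  _code="$1"
  _retryafter="$2"
  _default_sleep=5   # assumed pause for a 503 without Retry-After

  # Nothing suggests a retry: not a 503 and no Retry-After value.
  if [ "$_code" != "503" ] && [ -z "$_retryafter" ]; then
    return 1
  fi

  _sleep="${_retryafter:-$_default_sleep}"
  case "$_sleep" in
    '' | *[!0-9]*) return 1 ;;   # non-numeric values are not handled here
  esac

  # Too long to wait in an automated run: fall back to an error.
  if [ "$_sleep" -gt "$MAX_RETRY_AFTER" ]; then
    return 1
  fi
  echo "$_sleep"
}
```

With this shape, the 429 case from the log (102737 seconds) fails immediately, while a short "service busy" pause is still honoured.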

@Th0masL

Th0masL commented Mar 17, 2023

In my case I think it would work and fix my problem, yes

@mhoffrog
Contributor Author

I will fix the code like this: if the sleep time is too large (e.g. larger than 600 seconds), we will not sleep and will just fall back to an error, as before.

Do you think this is ok?

This is fine with me as well. I would just like the cause returned by LE to be logged for information, along with the status code and the sleep time (which is already logged). Then we always know exactly what is going on. That would be perfect!
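
As a rough sketch of the requested logging: `log_retry` is a hypothetical helper, not an acme.sh function, and the `sed` expression assumes the `detail` field sits on a single line of the response, as it does in the logs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: log status code, CA-provided cause and sleep time.
# $1 = HTTP status code, $2 = raw JSON response body, $3 = sleep seconds.
log_retry() {
  _code="$1"
  _response="$2"
  _sleep="$3"
  # Pull the "detail" string out of the problem document (single line only).
  _detail="$(printf '%s\n' "$_response" |
    sed -n 's/.*"detail"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')"
  printf '[%s] CA responded with HTTP %s (%s); sleeping %s seconds before retrying.\n' \
    "$(date -u)" "$_code" "${_detail:-no detail given}" "$_sleep"
}
```

Called with the 503 response from the debug log, this would report the status code, the "Service busy; retry later." detail and the sleep time in a single line.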

Neilpang added a commit that referenced this issue Mar 17, 2023
youngmohsen added a commit to youngmohsen/acme.sh that referenced this issue Jul 10, 2023