Frequent "badNonce" errors #627

Closed
ppaeps opened this issue Feb 16, 2017 · 15 comments

@ppaeps

ppaeps commented Feb 16, 2017

Similar issues seem to have been reported in other ACME clients and the suggestion appears to be "just retry a couple of times when you get a badNonce". Maybe acme.sh should just retry a few times too?

Steps to reproduce

acme.sh --issue -d example.com --dns dns_custom --dnssleep 600

I have verified that my dns_custom script correctly adds and removes the correct records from the DNS and that I can query the added records from the internet.

Debug log

[Thu Feb 16 16:03:45 CET 2017] Sleep 600 seconds for the txt records to take effect
[Thu Feb 16 16:14:59 CET 2017] ok, let's start to verify
[Thu Feb 16 16:14:59 CET 2017] Verifying:example.com
[Thu Feb 16 16:14:59 CET 2017] d='example.com'
[Thu Feb 16 16:14:59 CET 2017] keyauthorization='xxx'
[Thu Feb 16 16:14:59 CET 2017] uri='https://acme-v01.api.letsencrypt.org/acme/challenge/xxx/yyy'
[Thu Feb 16 16:14:59 CET 2017] _currentRoot='dns_custom'
[Thu Feb 16 16:14:59 CET 2017] url='https://acme-v01.api.letsencrypt.org/acme/challenge/xxx/yyy'
[Thu Feb 16 16:14:59 CET 2017] payload='{"resource": "challenge", "keyAuthorization": "xxx"}'
[Thu Feb 16 16:14:59 CET 2017] POST
[Thu Feb 16 16:14:59 CET 2017] url='https://acme-v01.api.letsencrypt.org/acme/challenge/xxx/yyy'
[Thu Feb 16 16:14:59 CET 2017] _CURL='curl -L --silent --dump-header /home/certbot/.acme.sh/http.header '
[Thu Feb 16 16:14:59 CET 2017] _ret='0'
[Thu Feb 16 16:14:59 CET 2017] code='400'
[Thu Feb 16 16:14:59 CET 2017] example.com:Challenge error: {"type":"urn:acme:error:badNonce","detail":"JWS has invalid anti-replay nonce xxx","status": 400
}
[Thu Feb 16 16:14:59 CET 2017] Skip for removelevel:
[Thu Feb 16 16:14:59 CET 2017] pid
[Thu Feb 16 16:14:59 CET 2017] No need to restore nginx, skip.
[Thu Feb 16 16:14:59 CET 2017] _clearupdns
[Thu Feb 16 16:15:00 CET 2017] txt='xxx'
@ppaeps
Author

ppaeps commented Feb 16, 2017

More debugging output:

[Thu Feb 16 16:45:19 CET 2017] Sleep 600 seconds for the txt records to take effect
[Thu Feb 16 16:55:42 CET 2017] ok, let's start to verify
[Thu Feb 16 16:55:42 CET 2017] Verifying:example.com
[Thu Feb 16 16:55:42 CET 2017] d='example.com'
[Thu Feb 16 16:55:42 CET 2017] keyauthorization='xxx'
[Thu Feb 16 16:55:42 CET 2017] uri='https://acme-v01.api.letsencrypt.org/acme/challenge/xxx/yyy'
[Thu Feb 16 16:55:42 CET 2017] _currentRoot='dns_custom'
[Thu Feb 16 16:55:42 CET 2017] url='https://acme-v01.api.letsencrypt.org/acme/challenge/xxx/yyy'
[Thu Feb 16 16:55:42 CET 2017] payload='{"resource": "challenge", "keyAuthorization": "xxx"}'
[Thu Feb 16 16:55:42 CET 2017] Use cached jwk for file: /home/certbot/.acme.sh/ca/acme-v01.api.letsencrypt.org/account.key
[Thu Feb 16 16:55:42 CET 2017] Use _CACHED_NONCE='JzUc_BMcKN-mzCHSJH-j3rAjR4h_ya0QZbIc9iLOlhw'
[Thu Feb 16 16:55:42 CET 2017] nonce='JzUc_BMcKN-mzCHSJH-j3rAjR4h_ya0QZbIc9iLOlhw'
[Thu Feb 16 16:55:42 CET 2017] POST
[Thu Feb 16 16:55:42 CET 2017] url='https://acme-v01.api.letsencrypt.org/acme/challenge/xxx/yyy'
[Thu Feb 16 16:55:42 CET 2017] body='{"header": {"alg": "RS256", "jwk": {"e": "AQAB", "kty": "RSA", "n": "xxx"}}, "protected": "xxx", "payload": "xxx"}'
[Thu Feb 16 16:55:42 CET 2017] _CURL='curl -L --silent --dump-header /home/certbot/.acme.sh/http.header  --trace-ascii /tmp/tmp.1xjkTGBh '
[Thu Feb 16 16:55:43 CET 2017] _ret='0'
[Thu Feb 16 16:55:43 CET 2017] original='{
  "type": "urn:acme:error:badNonce",
  "detail": "JWS has invalid anti-replay nonce JzUc_BMcKN-mzCHSJH-j3rAjR4h_ya0QZbIc9iLOlhw",
  "status": 400
}'
[Thu Feb 16 16:55:43 CET 2017] responseHeaders='HTTP/1.1 100 Continue
Expires: Thu, 16 Feb 2017 15:55:43 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache

HTTP/1.1 400 Bad Request
Server: nginx
Content-Type: application/problem+json
Content-Length: 149
Boulder-Request-Id: 949vUHw3EeBZ5TxOArzJ4ckQ8ekI6SJ6znEk-VXQHAs
Boulder-Requester: 9670546
Replay-Nonce: 0oK7lHHc4s3NTJZXc8s5Gd5z5OyF1djqL2J0zjO_xms
Expires: Thu, 16 Feb 2017 15:55:43 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Thu, 16 Feb 2017 15:55:43 GMT
Connection: close
'
[Thu Feb 16 16:55:43 CET 2017] response='{"type":"urn:acme:error:badNonce","detail":"JWS has invalid anti-replay nonce JzUc_BMcKN-mzCHSJH-j3rAjR4h_ya0QZbIc9iLOlhw","status": 400}'
[Thu Feb 16 16:55:43 CET 2017] code='400'
[Thu Feb 16 16:55:43 CET 2017] example.com:Challenge error: {"type":"urn:acme:error:badNonce","detail":"JWS has invalid anti-replay nonce JzUc_BMcKN-mzCHSJH-j3rAjR4h_ya0QZbIc9iLOlhw","status": 400
}
[Thu Feb 16 16:55:43 CET 2017] Skip for removelevel:
[Thu Feb 16 16:55:43 CET 2017] pid
[Thu Feb 16 16:55:43 CET 2017] No need to restore nginx, skip.
[Thu Feb 16 16:55:43 CET 2017] _clearupdns

I think checking for code=400 and retrying with the new nonce if you get badNonce is worth a try?

@cpu

cpu commented Feb 16, 2017

I think checking for code=400 and retrying with the new nonce if you get badNonce is worth a try?

You'd probably want to check explicitly for the ACME problem with type urn:acme:error:badNonce instead - a 400 could occur for other reasons. Oops, I reread your comment and I think that's what you were suggesting, apologies for restating!

Retrying on badNonce errors is definitely a best practice I would recommend, especially for clients that have hooks for updating DNS, 👍 to the suggestion.

Under the load Let's Encrypt experiences these days, a nonce given to a client can be rotated out (expired) by the time the client tries to use it, if enough time has passed in between (e.g. while waiting for secondary nameservers to synchronize). It should be treated as a non-fatal error because it's easy to get a fresh nonce :-)
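
As a rough illustration of the retry idea discussed above, here is a minimal shell sketch. It is not the code from acme.sh: `send_signed_request`, `is_bad_nonce`, `post_with_retry` and `MAX_RETRIES` are hypothetical names, and the server is simulated by a stub that rejects a stale nonce and hands back a fresh one. The sketch retries only when the problem type is urn:acme:error:badNonce, not on every HTTP 400.

```shell
#!/bin/sh
# Sketch only: all function and variable names here are hypothetical.

MAX_RETRIES=3
_nonce="stale-nonce"   # pretend this was cached before a long DNS sleep

# Stub standing in for the real signed POST. It accepts only "fresh-nonce".
# Like the real server, it returns a new nonce with every response,
# including error responses.
send_signed_request() {
  if [ "$1" = "fresh-nonce" ]; then
    response='{"status":"pending"}'
  else
    response='{"type":"urn:acme:error:badNonce","detail":"JWS has invalid anti-replay nonce","status": 400}'
  fi
  replay_nonce="fresh-nonce"
}

# Check the ACME problem type rather than the bare HTTP status code.
is_bad_nonce() {
  case "$1" in
    *urn:acme:error:badNonce*) return 0 ;;
    *) return 1 ;;
  esac
}

# One initial attempt plus up to MAX_RETRIES retries on badNonce.
post_with_retry() {
  attempt=0
  while [ "$attempt" -le "$MAX_RETRIES" ]; do
    send_signed_request "$_nonce"
    if is_bad_nonce "$response"; then
      _nonce="$replay_nonce"        # retry with the nonce from the error response
      attempt=$((attempt + 1))
      continue
    fi
    return 0                        # success, or an error that is not badNonce
  done
  return 1                          # still badNonce after all retries
}

post_with_retry && echo "done after $attempt retries: $response"
```

Capping the number of retries keeps a persistently failing request from looping forever, while a single stale nonce costs only one extra round-trip.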

@ppaeps
Author

ppaeps commented Feb 16, 2017

I think checking for code=400 and retrying with the new nonce if you get badNonce is worth a try?

You'd probably want to check explicitly for the ACME problem with type urn:acme:error:badNonce instead - a 400 could occur for other reasons. Oops, I reread your comment and I think that's what you were suggesting, apologies for restating!

Yes: this is what I meant.

Retrying on badNonce errors is definitely a best practice I would recommend, especially for clients that have hooks for updating DNS, 👍 to the suggestion.

As an optimisation, would it make sense to simply not try the _CACHED_NONCE if --dnssleep is longer than some value? Saves a round-trip.

For testing, I tried forcibly setting _CACHED_NONCE="" before the DNS verification and that seems to work. Obviously, a retry mechanism would be better. It's not immediately obvious to me where to add that though. Someone more familiar with the code will probably know.

@cpu

cpu commented Feb 16, 2017

As an optimisation, would it make sense to simply not try the _CACHED_NONCE if --dnssleep is longer than some value? Saves a round-trip.

I'd be wary of this since deciding on a value that works reliably might be tricky and could change over time.

@Neilpang
Member

Hi, @cpu
Is there any timeout for the "nonce" header on the response ?

Thanks.

@Neilpang
Member

@cpu Does the Expires: Thu, 16 Feb 2017 15:55:43 GMT header work ?

@Neilpang
Member

@ppaeps
Could you please provide a full --debug 2 log? I need to examine this further.

@Neilpang
Member

Neilpang commented Feb 17, 2017

@ppaeps
Please try the new nonce branch. If it works for you, I will merge it into master:

export BRANCH=nonce
acme.sh --upgrade

Then you can issue a cert. Please make sure you see the error message.

@ppaeps
Author

ppaeps commented Feb 17, 2017

Thanks @Neilpang! The nonce branch appears to work well for me. I will do some more testing but unless you hear me screaming later today, I think you can safely merge this into master.

@Neilpang
Member

@ppaeps
Did you see the "Invalid nonce" error ?

@ppaeps
Author

ppaeps commented Feb 17, 2017

Did you see the "Invalid nonce" error ?

Yes I did. But I am seeing them less frequently today than yesterday, so perhaps the servers have become less busy. The requests succeeded after the first retries.

@Neilpang
Member

@ppaeps Cool. I'm going to merge now.

Thanks.

@cpu

cpu commented Feb 17, 2017

Hi @Neilpang,

@cpu Is there any timeout for the "nonce" header on the response ?

There isn't - the Boulder nonce implementation has a fixed size bucket of nonces. As new nonces are required the old ones will fall out and expire but the timing is based on the overall schedule of nonce requests and isn't a fixed timeout.
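
To make the "fixed size bucket" behaviour concrete, here is a toy shell model. It is not Boulder's implementation, which is not shown in this thread: `BUCKET_SIZE`, `issue_nonce` and `is_valid` are made-up names, and the bucket is tiny so the effect is visible. It only illustrates that a nonce's lifetime depends on how many newer nonces have been issued, not on elapsed time.

```shell
#!/bin/sh
# Toy model only: names and the bucket size are assumptions for illustration.

BUCKET_SIZE=3
bucket=""    # space-separated list of live nonces, oldest first
count=0

# Issue a new nonce and drop the oldest ones once the bucket is over capacity.
issue_nonce() {
  count=$((count + 1))
  bucket="$bucket nonce$count"
  set -- $bucket                 # split the list into positional parameters
  while [ "$#" -gt "$BUCKET_SIZE" ]; do
    shift                        # the oldest nonce falls out
  done
  bucket="$*"
}

# A nonce is valid only while it is still in the bucket.
is_valid() {
  case " $bucket " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

issue_nonce                      # our client receives a nonce
mine="nonce$count"
is_valid "$mine" && echo "$mine is valid right after being issued"

# Other clients request nonces while we wait for DNS to propagate.
issue_nonce; issue_nonce; issue_nonce
is_valid "$mine" || echo "$mine was pushed out by newer nonces"
```

In this model the same wait can be harmless on a quiet server and fatal on a busy one, which is why a fixed --dnssleep threshold for discarding the cached nonce would be hard to choose.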

@cpu Does the Expires: Thu, 16 Feb 2017 15:55:43 GMT header work ?

The Expires header will be for the response body and not for the nonce header so unfortunately it doesn't help here.
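
For a retry, the relevant header is Replay-Nonce, which is present on the 400 response in the debug log earlier in this thread. The following sketch shows one way a shell client might pull it out of dumped headers. The `extract_nonce` helper is a hypothetical name, and the header text is a trimmed copy of that log. Because the dump can contain more than one response (here a 100 Continue followed by the 400), the sketch takes the last matching header.

```shell
#!/bin/sh
# Sketch only: extract_nonce is a hypothetical helper, not an acme.sh function.

# Trimmed copy of the response headers from the debug log in this thread.
headers='HTTP/1.1 100 Continue
Expires: Thu, 16 Feb 2017 15:55:43 GMT

HTTP/1.1 400 Bad Request
Content-Type: application/problem+json
Replay-Nonce: 0oK7lHHc4s3NTJZXc8s5Gd5z5OyF1djqL2J0zjO_xms
Expires: Thu, 16 Feb 2017 15:55:43 GMT
Connection: close'

# Print the value of the last Replay-Nonce header in the given header text.
extract_nonce() {
  printf '%s\n' "$1" \
    | tr -d '\r' \
    | grep -i '^Replay-Nonce:' \
    | tail -n 1 \
    | sed 's/^[^:]*:[[:space:]]*//'
}

next_nonce=$(extract_nonce "$headers")
echo "nonce for the retry: $next_nonce"
```

The `tr -d '\r'` step is there because real HTTP headers end in CRLF, and a stray carriage return in the nonce would make the next request fail.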

Hope that helps clear things up,

@Neilpang
Member

Hi @cpu

Thank you. I understand now.

We have added the retry logic.

It seems to be working now.

Thanks.

@cpu

cpu commented Feb 17, 2017

Great! Glad to hear it. Thanks @Neilpang
