nixos/modules/security/acme: unnecessary lego run when lego renew fails #86184

Closed
datafoo opened this issue Apr 28, 2020 · 5 comments · Fixed by #91121

datafoo commented Apr 28, 2020

Describe the bug

A timeout when running lego renew automatically triggers a lego run.

This is unnecessary when a valid certificate already exists. Worse, it is detrimental: lego run issues a brand new certificate, which brings the number of issuances closer to the Duplicate Certificate limit of 5 per week.

Here is the timeout I am talking about: net/http: TLS handshake timeout

Apr 27 10:24:23 fas systemd[1]: Starting Renew ACME Certificate for example.com...
Apr 27 10:24:38 fas uepfyeqskx3sibpug7wryot64ie4vgys-acme-start[1569]: 2020/04/27 10:24:38 Could not create client: get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get "https://acme-v02.api.letsencrypt.org/directory": net/http: TLS handshake timeout
Apr 27 10:24:39 fas uepfyeqskx3sibpug7wryot64ie4vgys-acme-start[1569]: 2020/04/27 10:24:39 [INFO] [example.com] acme: Obtaining bundled SAN certificate
Apr 27 10:24:39 fas uepfyeqskx3sibpug7wryot64ie4vgys-acme-start[1569]: 2020/04/27 10:24:39 [INFO] [example.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/1111111111
Apr 27 10:24:39 fas uepfyeqskx3sibpug7wryot64ie4vgys-acme-start[1569]: 2020/04/27 10:24:39 [INFO] [example.com] acme: authorization already valid; skipping challenge
Apr 27 10:24:39 fas uepfyeqskx3sibpug7wryot64ie4vgys-acme-start[1569]: 2020/04/27 10:24:39 [INFO] [example.com] acme: Validations succeeded; requesting certificates
Apr 27 10:24:42 fas uepfyeqskx3sibpug7wryot64ie4vgys-acme-start[1569]: 2020/04/27 10:24:42 [INFO] [example.com] Server responded with a certificate.
Apr 27 10:24:42 fas systemd[1]: Started Renew ACME Certificate for example.com.
Apr 27 18:08:45 fas systemd[1]: acme-example.com.service: Succeeded.
Apr 27 18:08:45 fas systemd[1]: Stopped Renew ACME Certificate for example.com.
Apr 27 18:08:45 fas systemd[1]: acme-example.com.service: Consumed 0 CPU time, received 12.8K IP traffic, sent 6.0K IP traffic.

To Reproduce
N/A

Expected behavior
A timeout when lego renew runs should not lead to lego run.
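
One way to tell whether an existing certificate is still usable is to ask OpenSSL directly. The following is a minimal sketch, not what the module does: `cert_valid_for` is a hypothetical helper name, and the certificate path in the usage comment is an assumption about the on-disk layout.

```shell
# cert_valid_for CERT_FILE SECONDS
# Succeeds (exit 0) if CERT_FILE exists, is non-empty, and will not expire
# within the next SECONDS seconds. Relies on `openssl x509 -checkend`.
cert_valid_for() {
  [ -s "$1" ] && openssl x509 -checkend "$2" -noout -in "$1" >/dev/null 2>&1
}

# Hypothetical usage (the path is an assumption, adjust to your setup):
#   if cert_valid_for /var/lib/acme/example.com/fullchain.pem 86400; then
#     echo "certificate is good for at least another day"
#   fi
```

With a check like this, a failed renewal on a still-valid certificate can simply be retried at the next timer run instead of requesting a new certificate.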

Notify maintainers
@abbradar @fpletz @globin @m1cr0man

Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 5.4.28, NixOS, 20.09pre-git (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.3`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

Maintainer information:

# a list of nixos modules affected by the problem
module:
  - acme

m1cr0man commented

Can probably fix this by checking for the existence of the cert file. Will add it to the big list of changes needed for the rewrite.
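
A minimal sketch of that idea, assuming the certificate file path is known to the start script. `issue_or_renew` is a hypothetical helper and is not taken from the module; the only lego features it relies on are the `run` and `renew` subcommands, with any global flags passed through unchanged.

```shell
# issue_or_renew CERT_FILE [LEGO_GLOBAL_ARGS...]
# If a certificate file already exists, only attempt `lego renew` and let a
# failure propagate to the caller. Fall back to `lego run` only when there is
# no certificate on disk yet.
issue_or_renew() {
  cert_file="$1"
  shift
  if [ -s "$cert_file" ]; then
    lego "$@" renew
  else
    lego "$@" run
  fi
}
```

With this shape, a transient error such as the TLS handshake timeout makes the service fail and be retried later, rather than triggering a fresh issuance.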

m1cr0man self-assigned this Jun 6, 2020

datafoo commented Jun 17, 2020

I have found another scenario to reproduce this problem.

On my server, the networking configuration is set statically, which causes network-online.target to be reached quickly, even before the network is actually operational (read here to understand why).

So in my case, at every boot, I have an ACME service that starts before the network is operational. That translates into:

[root@myhost:~]# journalctl -u acme-example.com.service -b
-- Logs begin at Fri 2020-01-31 09:26:25 UTC, end at Wed 2020-06-17 14:45:49 UTC. --
Jun 17 14:40:28 myhost systemd[1]: Starting Renew ACME Certificate for example.com...
Jun 17 14:40:45 myhost acme-example.com[1746]: 2020/06/17 14:40:45 Could not create client: get directory at 'https://acme-staging-v02.api.letsencrypt.org/directory': Get "https://acme-staging-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-staging-v02.api.letsencrypt.org: device or resource busy
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] acme: Obtaining bundled SAN certificate
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/123
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] acme: authorization already valid; skipping challenge
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] acme: Validations succeeded; requesting certificates
Jun 17 14:40:52 myhost acme-example.com[2016]: 2020/06/17 14:40:52 [INFO] [example.com] Server responded with a certificate.
Jun 17 14:40:52 myhost systemd[1]: acme-example.com.service: Succeeded.
Jun 17 14:40:52 myhost systemd[1]: Finished Renew ACME Certificate for example.com.
Jun 17 14:40:52 myhost systemd[1]: acme-example.com.service: Consumed 334ms CPU time, received 13.9K IP traffic, sent 6.0K IP traffic.

So at every boot, the lego renew fails:

Jun 17 14:40:45 myhost acme-example.com[1746]: 2020/06/17 14:40:45 Could not create client: get directory at 'https://acme-staging-v02.api.letsencrypt.org/directory': Get "https://acme-staging-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-staging-v02.api.letsencrypt.org: device or resource busy

… and then the lego run runs:

Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] acme: Obtaining bundled SAN certificate
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/123
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] acme: authorization already valid; skipping challenge
Jun 17 14:40:51 myhost acme-example.com[2016]: 2020/06/17 14:40:51 [INFO] [example.com] acme: Validations succeeded; requesting certificates
Jun 17 14:40:52 myhost acme-example.com[2016]: 2020/06/17 14:40:52 [INFO] [example.com] Server responded with a certificate.

Needless to say, I reach the Duplicate Certificate limit of 5 per week very quickly.
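
As a stopgap for the boot-time case, one could wait until name resolution works before contacting the ACME server. This is only a sketch of a workaround and does not fix the fallback itself: `retry_until_ok` is a hypothetical helper, and probing with `getent hosts` is an assumption about what "network is operational" should mean here.

```shell
# retry_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between tries. Returns 1 if every attempt failed.
retry_until_ok() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Hypothetical usage: give DNS up to 10 tries, 3 seconds apart.
#   retry_until_ok 10 3 getent hosts acme-v02.api.letsencrypt.org
```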

datafoo commented Jun 17, 2020

> Can probably fix this by checking for the existence of the cert file.

That would sure help.

> Will add it to the big list of changes needed for the rewrite.

  • Where can I consult this list?
  • Is anyone working on the rewrite you mentioned?
  • Why is a rewrite needed anyway?

datafoo commented Jun 17, 2020

datafoo commented Jun 18, 2020
