New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACME needs to check revocation for unexpired certificates. #129838
Comments
This behavior was introduced in #114752 to work around network problems during or perhaps shortly after NixOS activation, causing deployment failures for no good reason. Ideally we improve the activation (switch) logic but this is a hard problem, as pointed out by @arianvp #85794 (comment). |
That sounds reasonable enough to me. It seems that it ideally would be some sort of failure still, but I guess you would get an email from the CA anyways so silently failing isn't a major issue. I agree that it shouldn't block deploys so this is probably the best we can do at the moment. For now I think that checking an OSCP response if available every 24h is fine. We can revisit and use the CA-recommended values if the ACME Renewal Information (ARI) API goes anywhere. |
Ideally, we don't want to be implementing API calls to CAs ourselves in the acme renew service script. It is complicated enough as is, and finding ways to solve issues like these without regressing continues to be a difficult task. At some point, we should probably start opening PRs to lego to get features we need implemented (Something I'm considering doing myself, at least). For now, we can probably leave checking OCSP to lego (this would explain why it even needs an internet connection when certs are not expired), but we would still need to manually check before hand if the internet connection is working and the cert is not expired. Relying on systemd service ordering is not sufficient as explained in #85794 (comment) Here's a little truth table I've come up with:
As you can see, there's only one scenario here where we want to fail the service. We should continue to ignore OCSP if there is no active internet connection to avoid rebuild bugs as @roberth explained. However if the cert is expired we should run lego anyway so that users are informed of the issue. |
I wonder if we could call Lego, and ignore failures if the cert still has expiry time left. That isn't optimal but maybe is good enough? It may be worth reaching out to the Lego devs and seeing if there is some sort of different error code for transient conditions like this. I'll also look more into the service ordering. It seems like there should be a way to strongly bind to the DNS service. But maybe this isn't possible through a target and it would still result in the acme service being killed which would need to be handled somewhat gracefully. |
Honestly I'm quite a fan of this idea! Why worry about cert failures pre-validMinDays? But yes it would be nice if lego could do this itself (via a flag) or returned a different error code.
If you find a solution to that one please update #106862 :) my last comment there has some thoughts on the matter. I did explore the dependency options quite thoroughly at the time but couldn't find a good match for our requirements. |
Describe the bug
Right now the acme service never contacts the outside world if the certificate expires more than
validMinDays
into the future. This behaviour is incorrect as the certificate may be revoked, or the CA may plan to revoke the certificate at any point before the actual expiry date. Therefore the CA must be contacted regularly so that the certificate can be renewed if required. This ensures that the certificate can be refreshed without downtime for a planned revocation or for minimal downtime upon unplanned revocation.Additional context
https://github.com/NixOS/nixpkgs/blob/nixos-unstable/nixos/modules/security/acme.nix#L283-305
Planned revocation isn't yet specced out or implemented. But checking for revoked certs should be done.
https://mailarchive.ietf.org/arch/msg/acme/b-RddSX8TdGYvO3f9c7Lzg6I2I4/#
Notify maintainers
@roberth
@aanderse @andrew-d @arianvp @Emily @flokli @m1cr0man
Maintainer information:
The text was updated successfully, but these errors were encountered: