ACME renew fails if there is a local DNS server configured as the system resolver #106862
Comments
A quick but hacky solution, for whoever may arrive here, is to simply add a dependency on your DNS server:
systemd.services."acme-fixperms".wants = [ "bind.service" ];
systemd.services."acme-fixperms".after = [ "bind.service" ];
This works because all cert renewals already depend on |
This is another case where our nixos-rebuild switch abstraction breaks. cc @NixOS/systemd
We really need to start taking indirect dependencies into account, I think. These bugs keep popping up, as this is just another variant of #105354 and #106336.
We already fixed this during bootup with #99901 but
Switching to a socket-activated DNS server can fix this problem, because for socket-activated units dependencies are automagic. E.g. #101218 was introduced to fix the issue mentioned here.
I would really like us to come up with a bit more generic solution. |
@m1cr0man The fix you suggested doesn't work on my system :( |
I don't believe a fix like that can actually work, since when you do (Which basically repeats what Arian said in #106862 (comment)) |
I can confirm this issue. Here's what the failure looks like when the NixOS Matrix's
The system recovers shortly after when the systemd |
@m1cr0man Is that still the case? I just got this problem and see:
So |
It is still a dependency of all renewals, as shown by acme-m1cr0man.com.service
....
● ├─acme-fixperms.service
However, @mweinelt was right that what I suggested doesn't work. acme-fixperms.service is a oneshot with RemainAfterExit set to true. If you make a config change that restarts your DNS server and triggers a renewal, as in your example, acme-fixperms is not started again, so the renewal won't wait for the DNS service. You would need to add the dependencies to each renewal service instead. |
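For a single certificate, adding the dependency directly to its renewal service could look roughly like the sketch below. The certificate name `example.com` is a placeholder, and `bind.service` assumes bind is the local resolver; substitute your own cert name and DNS unit.

```nix
{ ... }:
{
  # "example.com" is a hypothetical cert name; renewal units are named acme-<cert>.service.
  systemd.services."acme-example.com" = {
    # Pull in the DNS server and order this renewal after it.
    wants = [ "bind.service" ];
    after = [ "bind.service" ];
  };
}
```

This uses `wants` rather than `requires`, so a failed DNS server start does not by itself block the renewal unit; whether that is the better trade-off depends on the setup. |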
Do you know what's the best way (nix expression) to do that programmatically? |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/make-acme-renew-systemd-service-depend-on-dns-nss-lookup/7412/12 |
Sorry, I thought I had replied already! Yeah, you can do this quite easily with a bit of mapping. Put this in a file e.g.
{ config, lib, ... }:
{
  systemd.services = let
    dependency = ["bind.service"];
  in lib.mapAttrs' (name: _: lib.nameValuePair "acme-${name}" {
    requires = dependency;
    after = dependency;
  }) config.security.acme.certs;
} |
Seems to work! I added
{
  # Module that fixes LetsEncrypt renewals failing in a startup race with `bind`.
  # From: https://github.com/NixOS/nixpkgs/issues/106862#issuecomment-860192745
  acmeBindRaceFixModule = { config, lib, ... }: lib.mkIf config.services.bind.enable {
    systemd.services =
      let
        dependency = [ "bind.service" ];
      in
      lib.mapAttrs'
        (name: _: lib.nameValuePair "acme-${name}" {
          requires = dependency;
          after = dependency;
        })
        config.security.acme.certs;
  };
and then using it:
{
  imports = [
    acmeBindRaceFixModule
  ];
}
The race and errors are gone that way. @m1cr0man Should we just do that by default in the ACME modules, for all name server modules in nixpkgs? |
Isn't there the same problem with unbound, dnsmasq, and other DNS servers? |
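If the same race does affect other DNS servers, the mapping above could be extended along these lines. This is only a sketch: it assumes the unit names `unbound.service` and `dnsmasq.service` match what the respective NixOS modules define, and it treats every enabled DNS server as something to wait on, which may not be what you want on every host.

```nix
{ config, lib, ... }:
let
  # Assumed unit names for each DNS module; adjust if your setup differs.
  dnsUnits =
    lib.optional config.services.bind.enable "bind.service"
    ++ lib.optional config.services.unbound.enable "unbound.service"
    ++ lib.optional config.services.dnsmasq.enable "dnsmasq.service";
in
{
  # Order every acme-<cert> renewal service after whichever DNS servers are enabled.
  systemd.services = lib.mapAttrs'
    (name: _: lib.nameValuePair "acme-${name}" {
      wants = dnsUnits;
      after = dnsUnits;
    })
    config.security.acme.certs;
}
```

When none of the three services is enabled, `dnsUnits` is empty and the renewal units are left unchanged. |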
@sbourdeauducq Most likely yes, with
I meant that we'd write a |
Also, a DNS server may not be the system resolver, or there can be several DNS servers. For example, a wifi/LAN internet gateway can have both unbound and dnsmasq running, with dnsmasq using unbound as the upstream resolver. Maybe it is better to let the user define which DNS server to wait on. |
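Letting the user pick the unit to wait on could be sketched as a small module with its own option. The option path `local.acmeWaitFor` is hypothetical and not part of nixpkgs; it only illustrates the idea.

```nix
{ config, lib, ... }:
let
  # Hypothetical option, not defined anywhere in nixpkgs.
  cfg = config.local.acmeWaitFor;
in
{
  options.local.acmeWaitFor = lib.mkOption {
    type = lib.types.listOf lib.types.str;
    default = [ ];
    example = [ "unbound.service" ];
    description = "Units that every ACME renewal service should wait for.";
  };

  # Apply the user-chosen units to each acme-<cert> renewal service.
  config.systemd.services = lib.mapAttrs'
    (name: _: lib.nameValuePair "acme-${name}" {
      wants = cfg;
      after = cfg;
    })
    config.security.acme.certs;
}
```

On the gateway described above, one would then set `local.acmeWaitFor = [ "dnsmasq.service" ];` or whichever unit actually serves as the system resolver. |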
The intention of this ticket was to find some way to do so. Adding the dependency to the renew services isn't the problem, it is knowing what the system resolver is as @sbourdeauducq points out.
I don't want the burden of maintaining and tracking that on ACME, nor do I want to transparently create a dependency which DNS server module maintainers might not be aware of. We also aren't the only thing that suffers from this issue. Ideally, I want to put the solution in the DNS modules. The simplest solution I've been able to think of is a
I don't know how @arianvp's suggestion of a socket-activated DNS server would work, mostly because I don't know enough about systemd sockets to understand how that could be used to wait on a local resolver, nor how much work it would be to implement per DNS server. It does sound like the best solution in principle. |
Describe the bug
During a nixos-rebuild, ACME renewal can fail because the enabled local DNS server (confirmed with bind and dnsmasq) is not ready to serve requests.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The acme renew service should not fail.
The solution here would require a generalised way to reliably wait on the DNS server to be online. Changes need to be made in the mentioned DNS server modules more than the ACME module, despite it appearing as an ACME failure.
Screenshots
See this renew service output
Notify maintainers
@nixos/acme
Metadata
Maintainer information: