Wait for Nginx restart to complete when using Nginx plugin #7740
Hi @mustanggb, I remember having had some code that depended on |
The history of our removing Is there any other way to do this without reintroducing this dependency? Perhaps we could use |
@schoen thanks for commenting, I did try to look for why I deliberately didn't squash my commits so people could have a look at the approaches I tried.
However the issue I had was the tests complaining that Also I didn't want to spend too much time on it if the overall approach wasn't going to be acceptable. If people are happy with the approach a cross-platform method of achieving |
Thanks! I realize this is a thorny problem but so far I haven't come across a consensus about what to do about it. It does seem like the maintainers don't want to re-add the |
Hello! If you want, I can search for Windows native alternatives for If there is a solution, we will need to implement it in the |
@adferrand, I think that would probably be helpful. A related question is whether there is a portable Unix equivalent that works on non-Linux Unix systems (which I think was a problem with @bmw, do you have a thought on an appropriate solution to #7422 or what characteristics such a solution ought to have? |
OK. My feedback so far after some digging. I think that without On Windows specifically, we can have a decent solution thanks to I would like to propose studying another approach, one that avoids manipulating PIDs and therefore does not require deep knowledge of the platform running Nginx. Why not try making HTTP requests to the local Nginx instance? If the results of these requests can tell us when Nginx has actually restarted, I think this solution will be far more maintainable in the long term. |
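To make the HTTP-polling idea concrete, here is a minimal Python sketch. It is not Certbot code: the function names, the URL, and the 30-second deadline are assumptions chosen for illustration. It also only shows that something answers HTTP at the URL, which on its own may not prove that a reload has finished.

```python
import http.client
import time
import urllib.error
import urllib.request


def nginx_answers(url, timeout=2.0):
    """Return True if an HTTP server answers at url (any status code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx status still proves that a server handled the request.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, malformed reply, and so on.
        return False


def wait_for_nginx(url, deadline_seconds=30.0, interval=0.5, probe=nginx_answers):
    """Poll probe(url) until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_seconds
    while True:
        if probe(url):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval)
```

A caller would run something like `wait_for_nginx("http://127.0.0.1:80/")` right after asking nginx to restart, and decide separately what to do when it returns `False`.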
Appreciate the replies.
Mainly just because I thought the original method would be a more robust solution, but I've updated the pull request to do it this way instead, please let me know your feedback. |
I think this is tricky and I'm not sure there is a solution that will work 100% of the time. While it's probably rare, the server might not be listening on all interfaces and will not be accessible through localhost. As for the
If we thought it was significantly more reliable than connecting to localhost, I'd be potentially open to adding the If we cannot do this 100% reliably, I personally think the most important thing here in general is to make sure we have good behavior when this is not going to work. A couple suggestions I have related to this are:
|
I'm not familiar with such a setup. Do you mean a non-default port, listening on a socket (I don't think Nginx has this feature), or something else? Could you elaborate? If it's just a different port, could we add an argument to set it when it needs to be changed?
Agree. |
Most people's listen directives look like:
This configures nginx to listen to port 80 on all interfaces. Using the output of a tool like
I wasn't thinking of a problem of using a non-default port, but we should probably make that work too. The nginx plugin should currently respect the value of I also wasn't thinking of listening on a socket, although that is also supported by nginx: https://nginx.org/en/docs/http/ngx_http_core_module.html#listen |
Just a shower thought: you don't need to verify that the challenge path is accessible (and FWIW I think this is a lot harder than it seems due to all the weird and wonderful things you can do in vhosts), you only need to verify that nginx reloaded. To that end, one could just get the configurator to add a sentinel vhost as part of the challenge solver, such as:

    server {
        listen unix:/var/lib/letsencrypt/nginx-sentinel.sock;
        location / { return 200 "<random sentinel value>"; }
    }

This might dodge some of the issues that @bmw is worried about. |
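A rough Python sketch of how a client could use such a sentinel vhost follows. None of this is taken from the pull request: the helper names are hypothetical, the token format is an assumption, and `socket.AF_UNIX` is assumed to be available, which is not the case on every platform.

```python
import secrets
import socket


def new_token():
    """Generate a fresh random sentinel value for one reload."""
    return secrets.token_hex(16)


def render_sentinel_block(sock_path, token):
    """Build the sentinel server block for the given socket path and token."""
    return (
        "server {\n"
        f"    listen unix:{sock_path};\n"
        f'    location / {{ return 200 "{token}"; }}\n'
        "}\n"
    )


def read_sentinel(sock_path, timeout=2.0):
    """Send a minimal HTTP/1.0 request over the Unix socket; return the body."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: the server closes when it is done
                break
            chunks.append(data)
    _, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return body.decode("utf-8", errors="replace")


def reload_confirmed(sock_path, token):
    """True only if the socket answers with exactly the expected token."""
    try:
        return read_sentinel(sock_path) == token
    except OSError:
        # Socket missing, connection refused, or timed out: not reloaded yet.
        return False
```

Because the token changes on every run, a reply from a configuration loaded before the reload would carry a different value and would not be mistaken for the new one.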
Bingo! This seems like the best of both worlds. I like this idea better as well, since it's a closer check of an actual reload: like you say, a vhost could be giving a 200 response even before a reload has completed. EDIT: Actually, will a new server/listen directive be picked up on reload, or does it require a full restart? That would be a deal breaker, I'd say. |
Well it seems a reload does work fine. Updated with the sentinel approach; not expecting tests to pass yet. |
Okay, tests did better than expected, however I have no idea why the |
Any further input from maintainers, possible to have a milestone? |
@bmw, could you opine on this? It seems that @alexzorin has had a clever idea and @mustanggb has implemented it. (One thing I wonder about is whether this could reduce our Windows compatibility, since I didn't think Windows supported Unix domain sockets.) |
That is a clever idea from alexzorin. In addition to making Windows support harder though, I'm a little concerned about file system permissions or mandatory access control issues like those described in #4716. We used the directory more in the past, but we currently don't configure nginx to use our working directory at all, with one exception: when the config previously wasn't listening on the necessary port for the challenge, we conditionally reference the nonexistent directory http_01_nonexistent. I'm sorry I keep popping in here trying to poke holes in the proposed approaches, but we currently have nearly half a million nginx users for whom what we currently have is working. That's a lot of nginx configurations doing who knows what and I don't want to break their ability to automatically renew certificates. While it'd be great for us to also support the few configurations that have reported hitting #7422, landing anything here with any significant risk of regressions personally makes me a little nervous. I don't really care what approach is taken here if we're confident it'll work, but out of what's been proposed so far, I personally think I like using localhost the best with the changes I described at #7740 (comment). Two other ideas are:
|
I would like to lay out my reasoning here, which tries to avoid further nginx configuration and to cover Windows compatibility and various other situations. Do not hesitate to explain to me why I am wrong. So, given the port used for the HTTP-01 challenge, as defined by the I suppose that we could test the availability of the socket (and so determine whether nginx is currently running) by trying to bind from certbot to Now, the process to see that nginx restarted. We can be in the situation where nginx was listening to this port before the restart or not.
In theory, we avoid any further Nginx configuration, such as the Unix socket, which improves compatibility with Windows. Considering that the current situation (the plugin does not wait) is mostly working, I think the whole process should be wrapped in a short global timeout, typically 30s. If the timeout is reached, the plugin assumes nginx has restarted and continues. |
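The bind-probe idea above could look roughly like the Python sketch below. It is an illustration under assumptions, not the proposed implementation: the function names are hypothetical, it only covers waiting for the port to become occupied, and it does not try to handle the case where nginx was already listening before the restart. Behavior across platforms and the privileges needed to bind low ports are not addressed.

```python
import errno
import socket
import time


def port_is_taken(port, host="0.0.0.0"):
    """Return True if binding host:port fails because the address is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission denied: not an answer to our question
    # The bind succeeded, so nothing held the port; the socket is closed here.
    return False


def wait_until_taken(port, timeout=30.0, interval=0.25, is_taken=port_is_taken):
    """Poll until the port is occupied or the global timeout expires.

    Returns False on timeout; per the comment above, the caller would then
    assume nginx has restarted and carry on anyway.
    """
    end = time.monotonic() + timeout
    while True:
        if is_taken(port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval)
```

The probe socket is never put into listening mode and is closed immediately, so it should not hold the port long enough to block nginx from binding it.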
@bmw If the directory permissions are a concern could always put it somewhere else, |
This is available since Windows 10 1803. I could not find information about the Server flavor of Windows 10, which is Windows Server 2019. And we need to cover all non-EOL versions of Windows, so down to Windows Server 2012. |
@adferrand, how would you propose that your recent suggestion be implemented in a platform independent way and why do you like it better than your previous suggestion of trying to connect to nginx on localhost? |
Well, as for trying to connect to Nginx on localhost, it may not work depending on how Nginx is configured: it may simply not listen on the loopback interface (very unlikely in a standard configuration, but possible). Given VirtualHost features, the behavior of an HTTP request could also be very different depending on the actual However, I may be wrong, but I would expect that trying to bind a port on all interfaces will fail if any interface is already bound for this port: this failure would be a good indication that Nginx has actually restarted, independently of the config, whatever the interface and the server name set up for the associated virtual host. About how to implement that, the Python |
I also could be mistaken, but the two problems I see there are:
|
Won't fix isn't an option, because as it stands the nginx plugin is useless. So what solution would you like to use? |
@mustanggb, please keep your comments civil and constructive if you are going to continue posting to this repo. Calling an open source project that has been worked on by hundreds of people and whose nginx plugin is successfully being used by hundreds of thousands of people useless is not appropriate. As for a path forward, I'm personally just leaning towards an If other people who have offered suggestions in this thread have differing opinions, I'd be interested to hear them. |
As described in certbot#7422, reloading nginx is an asynchronous process and Certbot does not know when it is complete. In an environment where this reload takes a long time, the nginx plugin suffers from an issue where it responds to and fails the ACME challenge before the nginx server is ready to serve it. Following the discussion in a previous PR certbot#7740, this commit introduces a new flag, --nginx-sleep-seconds, which may be used to increase the duration that Certbot will wait for nginx to reload, from its previously hard-coded value of 1s. Fixes certbot#7422
* nginx: add --nginx-sleep-seconds

  As described in #7422, reloading nginx is an asynchronous process and Certbot does not know when it is complete. In an environment where this reload takes a long time, the nginx plugin suffers from an issue where it responds to and fails the ACME challenge before the nginx server is ready to serve it.

  Following the discussion in a previous PR #7740, this commit introduces a new flag, --nginx-sleep-seconds, which may be used to increase the duration that Certbot will wait for nginx to reload, from its previously hard-coded value of 1s.

  Fixes #7422
* update CHANGELOG
* nginx: update docstring for nginx_restart
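For readers unfamiliar with the pattern, here is a small standalone Python sketch of a configurable post-reload sleep. It is not the Certbot implementation: only the flag name and the 1-second default come from the commit message, while the parser, the integer type, and the helper names are assumptions.

```python
import argparse
import time


def build_parser():
    """Standalone parser exposing a flag named like the one in the commit."""
    parser = argparse.ArgumentParser(prog="reload-sketch")
    parser.add_argument(
        "--nginx-sleep-seconds",
        type=int,  # assumption: whole seconds
        default=1,  # the previously hard-coded wait, per the commit message
        help="seconds to wait after asking nginx to reload",
    )
    return parser


def wait_after_reload(seconds, sleep=time.sleep):
    """Pause for the configured duration; sleep is injectable for testing."""
    sleep(seconds)


if __name__ == "__main__":
    args = build_parser().parse_args()
    wait_after_reload(args.nginx_sleep_seconds)
```

A fixed sleep cannot tell whether the reload really completed; it only lets slow environments choose a longer wait than the default.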
Thanks for getting things started here and triggering the discussion around this issue. |
Fixes #7422.