Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manual delay hook #6607

Closed
wants to merge 2 commits into from
Closed

Manual delay hook #6607

wants to merge 2 commits into from

Conversation

LeviticusMB
Copy link

DNS verification with the manual plugin can be painfully slow if multiple domains or hostnames are to be included in the certificate.

I see that various suggestions have been made to improve this, like #4628, #5484 and #5805.

This PR proposes a new hook that can be used to wait for the DNS propagation. The auth hooks can now just trigger the DNS updates in the background and when all updates have been performed, the delay hooks wait for the updates to propagate before the actual verification continues.

@LeviticusMB
Copy link
Author

Note that this is just a POC. If there’s interest in adding such a hook, I could fix up a proper implementation.

@adferrand
Copy link
Collaborator

adferrand commented Dec 16, 2018

I think your proposition is very interesting. As a contributor in Lexicon, that backs a lot of Certbot DNS plug-in, and as the maintainer of a Docker that is handling DNS challenges from Certbot with Lexicon, it is not the first time I am confronted to this problematic.

Since wildcard certificates are out there, DNS challenges became a biggest part of the challenges that need to be handled. But on the contrary to HTTP challenges, that can be treated as synchronous operations as it is basically put locally a file in a specific path, DNS challenges require most of the time asynchronous calls to Rest APIs, making this process highly asynchronous, and heavily slow if processed in a synchronous way.

Your idea is focused on the main need, that is to wait DNS records modifications to be correctly propagated by the DNS provider, before asking the ACME CA server to trigger the challenge verification. I think your asynchronous approach can be generalized.

Conceptually, the point is to have four logical blocks that are synchronously executed, but where every operation in a given block can be executed in the order you want, including as massively in parallel. This is first, challenges committing, then challenges propagation validation, then challenges check by ACME CA, and finally challenges cleanup. In theory, what you say about DNS challenges could be also applied to HTTP challenges, even if it is less critical: after the files have been written in the HTTP server, you could want to check first that theses files are effectively served before asking the ACME CA to launch its logic.

@adferrand
Copy link
Collaborator

So if I recall correctly, currently the provided authenticator hook (and same for clean up hook), is executed in a blocking loop for each domain described in the SANs of the current certificate.

To take fully advantage of your delay hook (that I would call more check propagation hook), theses authenticator/cleanup hooks should be executed in a non blocking loop, certainly as processes, and in the middle, the check propagation hook would run, it also as non blocking, to trigger the ACME CA call when all processes returned green.

@adferrand
Copy link
Collaborator

adferrand commented Dec 16, 2018

However in this case, an independent check propagation hook may not be necessary. If we consider that an authenticator hook async process returning green means "challenge is written AND propagation has been checked", then everything can be done in the authenticator hook.

This does not allow a specific and global logic to be applied for checking the propagation though. An example I have in my mind, is that for DNS challenges for domains all in the same DNS zone, you do not need to check each individual domain. You can just poll the entire TXT records of the zone, and wait for all expected records to appear in it.

@adferrand
Copy link
Collaborator

And of course, retro-compatibility needs to be taken care of. What about existing hooks that effectively relies on synchronous operations?

Here it would be another argument for a separated class of hooks. Without any check propagation hook specified, the synchronous process is triggered. With it specified, then the asynchronous process is triggered.

OK, finished to put my thinking here ^^

@LeviticusMB
Copy link
Author

So currently, all auth hooks are executed sequentially, then the DNS is checked by the servers and finally the cleanup hooks can be used to clean up whatever the auth hooks did, if desired.

The problem is, of course, that for this to be reliable, each auth hook needs to wait for propagation to complete, which is very very slow (at least a couple of minutes per hook invocation.

Changing certbot to run the hooks in parallel might work (I'm not really familiar enough with the code), but it would at least be a major change, since then you have to ensure that concurrent hooks do not overwrite each others modifications. Also not that even for a trivial wildcard certificate, the hook will run twice (once for example.com and once for *.example.com).

What this delay hook proposal allows you to do is to skip the propagation check in the auth hook. Just update your zone records and trigger a refresh/reload. This can be done fairly quick, like in a couple of seconds rather than minutes. Note that the auth hook will still execute at least twice (for example.com and *.example.com).

Then, once all domains have been processed, the delay hook kicks in and waits until all domain propagation is complete.

It would be a minor addition to the code and fully backward compatible too. Use your current auth hoops as-is, to split them into a separate auth and delay hook if desired.

@bmw
Copy link
Member

bmw commented Oct 11, 2019

Instead of this design, what do people think of an environment variable like CERTBOT_REMAINING_CHALLENGES that is set for both --manual-auth-hook and --manual-cleanup-hook. The value would be the number of additional times Certbot will call the hook for the current certificate. When the hook is called for the last challenge, CERTBOT_REMAINING_CHALLENGES would be 0.

The reason I personally prefer this design is it avoids adding a 7th type of hook to Certbot to potentially confuse people on what they should use and while I admit I do not know what a user would want to do with this functionality, it's a little more flexible to know where you are in the list of challenges you're performing than getting called only on the last challenge.

Inspiration for this proposal came from #5805.

@adferrand
Copy link
Collaborator

adferrand commented Oct 11, 2019

So if I understand correctly, with that design, every auth hook (and similarly clean up hook) would make non blocking requests to give hand back to certbot immediately, except the last one that could make a blocking loop to check correct DNS propagation?

It could work, if a given hook has access to the list of all domains that are challenged, to be able to check DNS propagation for all tokens.

@bmw
Copy link
Member

bmw commented Oct 11, 2019

To make sure we're talking about the same kind of blocking, I'm not proposing letting Certbot run hooks in parallel as was discussed earlier in this thread. The change I'm proposing here is quite small. We simply would add a decrementing counter in an environment variable. Each hook would be executed sequentially and Certbot will block on the hook to finish execution before continuing like it does now.

As for what the hook itself does though, yes, it wouldn't have to wait separately for each DNS change to propagate and it could only do after making the last one. I suspect most people would just have the script sleep for a long time. That's what all of Certbot's DNS plugins do.

Alternatively, they could in theory check each record. By the time CERTBOT_REMAINING_CHALLENGES is 0, the script will have been called once with each domain and token value. It would be up to the script to save these values somewhere to be reused during the last execution though.

@adferrand
Copy link
Collaborator

Yes, with your design I had not a parallel execution in mind. Just a sequential execution that is fast, because the hook does not wait for anything except the last one.

It can work. I am OK with it. However, even if it not blocking with that design, I think it is beneficial in general to get the list of all domains in an env variable.

@bmw
Copy link
Member

bmw commented Oct 11, 2019

I think there is value to a variable like that, but I think most people won't use it, it's another variable to learn about and understand when writing hook scripts, and you can get the same functionality by writing a little code yourself.

You OK leaving that out until a request for that feature arises from multiple users?

@bmw
Copy link
Member

bmw commented Oct 24, 2019

Adrien and I talked about this out of band and he's OK with my design which I wrote up in #5484 (comment). @LeviticusMB, if you're interested in implementing this, I'll review the PR.

@bmw
Copy link
Member

bmw commented Dec 10, 2019

I'm closing this PR because we'd like a slightly different design for this and this PR hasn't seen any response in a little over a month. If you're still interested in implementing this, feel free to open a PR with the new design at #5484 (comment). If you have any questions/concerns about the design, feel free to bring them up in that issue.

@bmw bmw closed this Dec 10, 2019
@varun-da
Copy link

varun-da commented Mar 4, 2020

@bmw @adferrand I would live CERTBOT_REMAINING_CHALLENGES, this would speed up cert generation for us. Currently with manual-auth-hook for 28 subdomain wildcard SAN, 1 root domain wildcard SAN, 1 root domain, we are at 45 - 50 mins to generate a cert.

A implementation like this will greatly help us. Can you please consider opening this issue?

https://community.letsencrypt.org/t/parallel-run-of-certbot-manual-auth-hook-manual-cleanup-hook-for-validation-of-multidomain-cert/115352

@varun-da
Copy link

varun-da commented Mar 4, 2020

moving the comment to #5484.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants