Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NixOS 24.05 creates a new letsencrypt account and breaks CAA #316608

Closed
sbourdeauducq opened this issue Jun 2, 2024 · 17 comments · Fixed by #317257
Closed

NixOS 24.05 creates a new letsencrypt account and breaks CAA #316608

sbourdeauducq opened this issue Jun 2, 2024 · 17 comments · Fixed by #317257

Comments

@sbourdeauducq
Copy link
Contributor

After updating to 24.05, it appears that the ACME client (with the default infrastructure: security.acme/services.nginx.xxx.enableACME etc.) registers a new account for obtaining letsencrypt certificates. If you had CAA with account binding enabled on your domains, the certificate renewals will now fail.

If you are affected by the problem, here is how you can figure out the new account URI in order to update your CAA entries:

  1. ls -l /var/lib/acme/.lego/accounts/ and take the most recent directory
  2. cat /var/lib/acme/.lego/accounts/xxxxx/acme-v02.api.letsencrypt.org/email@example.com/account.json

This should probably be added to the release notes.

@sbourdeauducq
Copy link
Contributor Author

And you will, of course, very rapidly hit the ultra-frustrating letsencrypt rate limits again. They really ought to revise those policies as it makes relatively minor bugs like this a much bigger pain in the ass than they should be..

@flokli
Copy link
Contributor

flokli commented Jun 3, 2024

cc @NixOS/acme

@arianvp
Copy link
Member

arianvp commented Jun 3, 2024

Did you update from 23.11?

@sbourdeauducq
Copy link
Contributor Author

Yes.

@stephank
Copy link
Contributor

stephank commented Jun 3, 2024

Suspect 4494fca, because the two different directories I have now match the SHA256 of "https://acme-v02.api.letsencrypt.org/directory ec256 <email>" and " ec256 <email>". (Note the space at the start, which is the toString null of the old default for the server option.)

@arianvp
Copy link
Member

arianvp commented Jun 3, 2024

I honestly have no idea why this happened. The module hasn't had any significant changes for the past few years and nothing stands out in the recent history to me.

@arianvp
Copy link
Member

arianvp commented Jun 3, 2024

Wow @stephank good catch. Yes sounds like that is the issue

@flokli
Copy link
Contributor

flokli commented Jun 3, 2024

Do what's the plan forward? IMHO we should fix this whitespace and backport it to stable, assuming there's more people who didn't yet upgrade to the latest release vs that did.

Also, there might be a chance people who already did the upgrade still have their old accounts available, no?

@arianvp
Copy link
Member

arianvp commented Jun 3, 2024

Apparently this bug was brought up during review: #270221 (comment) :(

@sbourdeauducq
Copy link
Contributor Author

Is this a solid assumption? With a maintainance window of just one month, there is quite some pressure to update NixOS.

Also, there might be a chance people who already did the upgrade still have their old accounts available, no?

The issue is having to update the CAA records in the DNS, being able to still use the old account does not really matter.

@stephank
Copy link
Contributor

stephank commented Jun 3, 2024

Maybe we can keep the current method, but if the correct accounts directory doesn't exist, check if one exists under the old hash and move it?

One thing I'm unclear about: only the account hash should've changed, I believe? So why is it renewing all domains as well? Is there a link between domains and account ID somewhere, I'm not seeing? (EDIT: looks like server is part of the certDir hash as well, my bad.)

@flokli
Copy link
Contributor

flokli commented Jun 3, 2024

What's really bad is neither @NixOS/acme got highlighted on the issue, nor did the release note update happen. This currently is a tire fire for everyone switching to it, either having them run into rate limits or having to update CAA records (after they notice, at all).

@arianvp
Copy link
Member

arianvp commented Jun 3, 2024

We could create a patch that migrates the directory like @stephank proposes. Suggestion in the NixOS ACME/LetsEncrypt channel was similar. Something along the lines of:

if [ -d $oldHash ]; then 
  if [ ! -d $newHash ]; then
   mv $oldHash $newHash
  else
   echo "You are dedge please fix"
   exit 1
  fi
fi

In any case we still should also add a note to the release notes as this migration doesn't help people who have already upgraded.

@SuperSandro2000
Copy link
Member

SuperSandro2000 commented Jun 3, 2024

This currently is a tire fire for everyone switching to it

I've updated probably 25 VMs all with certs and didn't run into an problems and already forgot that this was a problem 4 months ago. I assume this is only a problem if you have the account ID in the caa record or if you have way to many http challenge certs with different accounts.

IMHO we should fix this whitespace and backport it to stable, assuming there's more people who didn't yet upgrade to the latest release vs that did.

We cannot just do that, as that would cause the opposite direction for anyone on unstable for the last 4 months, that already migrated or that recently installed 24.05.
We probably need some state convergence.

@arianvp
Copy link
Member

arianvp commented Jun 3, 2024

Since #106857 landed the rate-limit issue is probably less of an issue in practise. I think we really only should care about breaking people's CAA setup.

In that case I think we can get away with just an entry in the release notes and state convergence can be seen as a nice to have but not a must have.

@osnyx
Copy link
Contributor

osnyx commented Jun 4, 2024

the rate-limit issue is probably less of an issue in practise.

At the Flying Circus, there's a high likelihood we're going to be affected by this nonetheless, despite the head-of-line registration blocker setup you mentioned, @arianvp. We have several hundred machines that'll request a new account when rolling out 24.05.

You can create a maximum of 10 Accounts per IP Address per 3 hours. You can create a maximum of 500 Accounts per IP Range within an IPv6 /48 per 3 hours.

https://letsencrypt.org/docs/rate-limits/

When having to fix this downstream, I'd probably adopt a state convergence activation script approach.
Alternatively, @stephank had proposed to re-allow null as the account URL again, which would allow getting back the previous account hash string. If we adopt this here in NixOS upstream, that's a decent solution as well.

@arianvp
Copy link
Member

arianvp commented Jun 4, 2024

I'll work on a patch to re-allow null tonigh. Will post here for review in a bit!

arianvp added a commit to arianvp/nixpkgs that referenced this issue Jun 4, 2024
…p old accounts directory

The accounts directory is based on the hash of the settings.

NixOS#270221 changed the  default of
security.acme.defaults.server from null to the default letsencrypt URL
however as an unwanted side effect this means the accounts directory
changes and the ACME module will create a new a new account.

This can cause issues with people using CAA records that pin the
account ID or people who have datacenter-scale NixOS deployments

We allow setting this option to null again for people who want
to keep the old account and migrate at their own leisure.

Fixes NixOS#316608

Co-authored-by: Arian van Putten <arian.vanputten@gmail.com>
github-actions bot pushed a commit that referenced this issue Jun 4, 2024
…p old accounts directory

The accounts directory is based on the hash of the settings.

#270221 changed the  default of
security.acme.defaults.server from null to the default letsencrypt URL
however as an unwanted side effect this means the accounts directory
changes and the ACME module will create a new a new account.

This can cause issues with people using CAA records that pin the
account ID or people who have datacenter-scale NixOS deployments

We allow setting this option to null again for people who want
to keep the old account and migrate at their own leisure.

Fixes #316608

Co-authored-by: Arian van Putten <arian.vanputten@gmail.com>
(cherry picked from commit d1f07e6)
@mweinelt mweinelt unpinned this issue Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants