Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caddytls: Support custom GetCertificate modules (like Tailscale) #4541

Merged
merged 10 commits into from
Feb 17, 2022
Merged

Conversation

mholt
Copy link
Member

@mholt mholt commented Jan 22, 2022

Sometimes there are external tools or services that are managing certificates where we just need to call GetCertificate() during a handshake. The implementer of that function takes care of caching, renewal logic, etc. This adds support for CertificateGetter modules, in the tls.get_certificate namespace.

These differ from Issuer modules in that Issue() takes a CSR, whereas GetCertificate() takes a *tls.ClientHelloInfo. Also, using an Issuer implies that Caddy/CertMagic also generate and manage the private key, AND store the certificate and key in storage, since we're the ones maintaining the cert. In the case of GetCertificate, we don't want to be bothered about private keys or persisting in storage; we just want a cert. So basically, we use Issuers when we claim that we're managing the certificate. And we use CustomGetCertificate only to get a certificate from something we know is managing it for us.

The return value of our GetCertificate() is optional. If it returns nil+nil, it is passed by as if it was not specified at all; thus could fall back to traditional Issuer management for a name.

Specifying a certificate getter module implicitly enables On-Demand TLS for that policy (as if on_demand: true was set) because certificates given by GetCertificate() have to be called during a handshake.

Our GetCertificate() signature is slightly different from tls.Config.GetCertificate in that it adds a bool middle return value, which if true, tells CertMagic to add the certificate to its cache. Once cached, the module's GetCertificate() will only be called again once the cert nears expiration. Modules should return false for this value if they want to be called for every single TLS handshake.

This initial implementation integrates natively with Tailscale, although it requires being run as root or as a regular user that has access to the Tailscale socket. I'm still learning about this.

Here's an example automation policy for using Tailscale certs:

{
	"subjects": ["yourhost.your-alias.ts.net"],
	"get_certificate": { "via": "tailscale" }
}

Or as a Caddyfile:

tls {
    get_certificate tailscale
}

A follow-up commit may also add support for an HTTP getter, to download the cert over HTTP.

This feature will remain experimental (and subject to change) even after being released, as we learn about it from experience.

Will be adding Caddyfile support soon, too.

Huge thanks to Tailscale for making this feature possible! Excited to release this and share more about it. /cc @bradfitz

TODO:

  • Figure out why go.sum blew up (why is smallstep adding kms now, on this branch? 🤔 )

Builds on caddyserver/certmagic#163

@mholt mholt added feature ⚙️ New feature or request in progress 🏃‍♂️ Being actively worked on labels Jan 22, 2022
@mholt mholt added this to the v2.5.0 milestone Jan 22, 2022
@bradfitz
Copy link
Contributor

Figure out why go.sum blew up

Because of the dep to the tailscale.com module. That's why I'd said you might want us to pull that out separately to something smaller. Some people are offended by go.sum spam, even though it's rarely an actual problem.

@mholt mholt self-assigned this Jan 23, 2022
@francislavoie
Copy link
Member

FYI, linter failure:

  level=warning msg="
[runner] Can't run linter goanalysis_metalinter: bodyclose: failed prerequisites: 
[buildssa@github.com/caddyserver/caddy/v2/modules/caddytls: analysis skipped: errors in package:
[/home/runner/work/caddy/caddy/modules/caddytls/certgetters.go:7:2: could not import tailscale.com/client/tailscale (
/home/runner/go/pkg/mod/tailscale.com@v1.20.1/client/tailscale/tailscale.go:28:2: could not import tailscale.com/client/tailscale/apitype (
/home/runner/go/pkg/mod/tailscale.com@v1.20.1/client/tailscale/apitype/apitype.go:8:8: could not import tailscale.com/tailcfg (
/home/runner/go/pkg/mod/tailscale.com@v1.20.1/tailcfg/tailcfg.go:19:2: could not import tailscale.com/types/key (
/home/runner/go/pkg/mod/tailscale.com@v1.20.1/types/key/disco.go:177:23: cannot convert ciphertext (variable of type []byte) to *[24]byte)
)))]]"

Also fix AP provisioning in case of empty subject list (persist loaded
module on struct, much like Issuers, to surive reprovisioning).

And implement start of HTTP cert getter, still WIP.
@mholt
Copy link
Member Author

mholt commented Jan 25, 2022

Ah, all the tests for Go 1.16 are failing; I wonder if Go 1.17 is required.

Added Caddyfile support and most of the HTTP certificate getter.

This is pretty much done now, except for a few things:

  • Exact behavior of Tailscale's GetCertificate if it doesn't have a cert for the subject
  • Tailscale socket permissions
  • Finishing the HTTP cert getter: what should the response body be?
  • Dependency bloat

modules/caddytls/automation.go Outdated Show resolved Hide resolved
modules/caddytls/certgetters.go Outdated Show resolved Hide resolved
Co-authored-by: Francis Lavoie <lavofr@gmail.com>
bradfitz added a commit to tailscale/tailscale that referenced this pull request Jan 25, 2022
… access

So you can run Caddy etc as a non-root user and let it have access to
get certs.

Updates caddyserver/caddy#4541

Change-Id: Iecc5922274530e2b00ba107d4b536580f374109b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this pull request Jan 25, 2022
… access

So you can run Caddy etc as a non-root user and let it have access to
get certs.

Updates caddyserver/caddy#4541

Change-Id: Iecc5922274530e2b00ba107d4b536580f374109b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@bradfitz
Copy link
Contributor

Ah, all the tests for Go 1.16 are failing; I wonder if Go 1.17 is required.

Tailscale only supports the most recent version of Go. So, yes, ...

Error: /home/runner/go/pkg/mod/tailscale.com@v1.20.1/types/key/disco.go:177:22: cannot convert ciphertext (type []byte) to type *[24]byte
Error: /home/runner/go/pkg/mod/tailscale.com@v1.20.1/types/key/node.go:137:22: cannot convert ciphertext (type []byte) to type *[24]byte

We assume we can do that conversion now.

If I made a separate Go module/package for fetching certs, I can make sure it works with older Go versions.

@francislavoie
Copy link
Member

FWIW, we generally go with "current - 1" for Go versions for Caddy, mainly because our RPM packaging maintainers need to rely on the Go version they have available to them.

Thanks for splitting it out @bradfitz 😀

bradfitz added a commit to tailscale/tailscale that referenced this pull request Jan 25, 2022
… access

So you can run Caddy etc as a non-root user and let it have access to
get certs.

Updates caddyserver/caddy#4541

Change-Id: Iecc5922274530e2b00ba107d4b536580f374109b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
bradfitz added a commit to tailscale/tailscale that referenced this pull request Jan 25, 2022
… access

So you can run Caddy etc as a non-root user and let it have access to
get certs.

Updates caddyserver/caddy#4541

Change-Id: Iecc5922274530e2b00ba107d4b536580f374109b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@bradfitz
Copy link
Contributor

bradfitz commented Jan 25, 2022

And I've pushed tailscale/tailscale#3809 to add support for a TS_PERMIT_CERT_UID env variable that users can add to /etc/default/tailscaled to grant the caddy user's UID access to fetch certs.

@bradfitz
Copy link
Contributor

bradfitz commented Jan 25, 2022

Tailscale client code is now split out into https://github.com/tailscale/tscert

Tested back to Go 1.15, but it probably supports earlier.

@bradfitz
Copy link
Contributor

BTW, the tscert.GetStatus func can also tell you which (0, 1, or more) subjects that the local tailscaled can handle.

@mholt
Copy link
Member Author

mholt commented Jan 25, 2022

Brilliant! Thanks Brad.

I pushed a commit which adds context to the GetCertificate call and checks the Status to see if the domain in the handshake is in the list. If it's not a cert provisioned by Tailscale, false+nil is returned to CertMagic so that CertMagic knows to continue with its own management logic. I also changed to using the tscert package (oops, I said tsclient in my commit message; will try to remember to fix when squashing) which is much lighter. 💯

We'll see how the tests do now.

I'll probably figure out some simple response body for the HTTP getter just as a cherry on top, though I don't think we'll need it at this time.

@mholt
Copy link
Member Author

mholt commented Jan 25, 2022

You know, it just occurred to me, that we could probably make this a default, implicit behavior. If Tailscale is running AND happens to have a certificate for the hostname, just use it. If not, then continue with our own cert.

Returning nil, false, nil from GetCertificate() causes CertMagic to carry on normally. So it already supports that.

If we can guarantee the latency of checking the Tailscale status is very minimal (maybe we cache the results for a certain amount of time? or set a context with a timeout of ~5ms?), maybe this is safe to do. We could even restrict any of this implicit behavior to activate only on names ending in .ts.net.

Of course, if get_certificate tailscale was explicitly configured, we could ease some of these restrictions and actually return errors when errors happen (like the current implementation), since if it's explicitly configured then we expect it to work.

I'm just thinking, maybe it doesn't hurt to try Tailscale implicitly even if not configured, then it really will "just work". Caddy +Tailscale, ladies and gentlemen. 🤵‍♂️

@bradfitz
Copy link
Contributor

We could even restrict any of this implicit behavior to activate only on names ending in .ts.net.

That's a safe heuristic. SGTM. We have no plans to offer other domain name suffixes. But we may permit people to BYODomain in the future, so if you make it automatic on *.ts.net, just let people explicitly enable it as well. Also, https://github.com/juanfont/headscale will likely add such support in the future and each user using that would necessarily have their own domain name as well.

And use reuse CertMagic's PEM functions for private keys.
@mholt
Copy link
Member Author

mholt commented Jan 30, 2022

The HTTP cert getter should be working now. It expects PEM blocks and it will decode them into a TLS certificate and private key pair.

I think we're ready to merge this, and I'll try to implement implicit Tailscale cert-getting as a cherry on top in another PR, if I can make it work. (It wasn't a requirement, I just think it'd be cool.)

@mholt mholt removed the in progress 🏃‍♂️ Being actively worked on label Jan 30, 2022
Tailscale does its own caching and we don't need the added complexity...
for now, at least.
@mholt
Copy link
Member Author

mholt commented Feb 16, 2022

What happened to the CI checks... (Edit: they're working now; were stuck for like 3 days)

- Option to disable cert automation in auto HTTPS
- Support multiple cert managers
- Remove cache feature from cert manager modules
- Minor improvements to auto HTTPS logging
Only for domains ending in .ts.net.

I think this is really cool!
@BioEvo
Copy link

BioEvo commented Mar 3, 2023

The HTTP cert getter should be working now. It expects PEM blocks and it will decode them into a TLS certificate and private key pair.

I think we're ready to merge this, and I'll try to implement implicit Tailscale cert-getting as a cherry on top in another PR, if I can make it work. (It wasn't a requirement, I just think it'd be cool.)

It seems like HTTP cert gatter is ignored on caddy v2.6.4. I set the get_certificate section in caddyfile, caddy verified all parameters and directly goes to HTTP-01 challenge. Log of cert.mysite shows no http request from web.mysite.

https://web.mysite {
    tls {
        get_certificate http http://cert.mysite/cert
    }
    ...
}

@mholt
Copy link
Member Author

mholt commented Mar 3, 2023

@BioEvo Could you open a new issue please? Be sure to enable debug logging and include all the logs. (And please don't redact the domain names, we'll need the actual values to troubleshoot.) Also the last Caddy version it does work on. The more specific you are, the more specific we can be. Thank you!

@BioEvo
Copy link

BioEvo commented Mar 3, 2023

@BioEvo Could you open a new issue please? Be sure to enable debug logging and include all the logs. (And please don't redact the domain names, we'll need the actual values to troubleshoot.) Also the last Caddy version it does work on. The more specific you are, the more specific we can be. Thank you!

Thank you! I try to reproduce this in docker: #5415 (comment) .

I'm not sure how will caddy openration with HTTP CertManager, but I think if the config is like:

https://web.mysite {
    tls {
        get_certificate http http://cert.mysite/cert
    }
    ...
}

then caddy might try to request http://cert.mysite/cert?server_name=web.mysite&signature_schemes=...&cipher_suites=...
then certmgr might responds something like below with code 200:

-----BEGIN CERTIFICATE-----
[SOME_BASE64]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[SOME_BASE64]
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
[SOME_BASE64]
-----END CERTIFICATE-----
-----BEGIN EC PRIVATE KEY-----
[SOME_BASE64]
-----END EC PRIVATE KEY-----

#5415 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants