caddytls: Evict internal certs from cache based on issuer #6266

Merged
mholt merged 7 commits into master from tls-cert-cache-internal-reload on Apr 30, 2024

mholt
Member

@mholt mholt commented Apr 23, 2024

During a config reload, we would keep certs in the cache if they were used by the next config. If one config uses InternalIssuer and the other uses a public CA, this behavior is problematic and unintuitive, because there is a big difference between private and public CAs: the new config would just keep using the same certs from before the reload, which is unexpected in this case.

This change should ensure that internal issuers are considered when deciding whether to keep or evict certs from the cache during a reload, by making them distinct from each other and from certs issued by public CAs.
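
To illustrate the keep-or-evict decision described above, here is a minimal sketch in Go. It is not the code from this PR: the `cachedCert` and `wantedIssuers` types, the `partition` function, and the `internal/local` issuer key are all hypothetical names made up for the example. Only the Let's Encrypt issuer key is taken from the log output later in this thread. The idea is that a cached cert survives a reload only when the new config still manages the same name with the same issuer.

```go
package main

import "fmt"

// cachedCert is a hypothetical stand-in for an entry in the in-memory
// certificate cache. It is not a type from Caddy or CertMagic.
type cachedCert struct {
	Subject   string // name the cert covers, e.g. "example.com"
	IssuerKey string // assumed identifier of the issuer that produced it
}

// wantedIssuers maps each subject managed by the incoming config to the
// issuer key that config would use for it (also an assumption).
type wantedIssuers map[string]string

// partition splits the cache into certs that can be carried over to the new
// config and certs that should be evicted. A cert is kept only if the new
// config still manages its subject AND would use the same issuer.
func partition(cache []cachedCert, next wantedIssuers) (keep, evict []cachedCert) {
	for _, c := range cache {
		if want, ok := next[c.Subject]; ok && want == c.IssuerKey {
			keep = append(keep, c)
		} else {
			evict = append(evict, c)
		}
	}
	return keep, evict
}

func main() {
	cache := []cachedCert{
		{Subject: "example.com", IssuerKey: "internal/local"},
		{Subject: "other.example", IssuerKey: "acme-v02.api.letsencrypt.org-directory"},
	}
	// The reloaded config drops "tls internal" for example.com.
	next := wantedIssuers{
		"example.com":   "acme-v02.api.letsencrypt.org-directory",
		"other.example": "acme-v02.api.letsencrypt.org-directory",
	}
	keep, evict := partition(cache, next)
	fmt.Println("keep: ", keep)
	fmt.Println("evict:", evict)
}
```

In this sketch the internally issued cert for example.com lands in the evict list because the next config wants a different issuer, while the cert for other.example is carried over.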
@mholt mholt added the bug 🐞 Something isn't working label Apr 23, 2024
@mholt mholt added this to the v2.8.0 milestone Apr 23, 2024
@emilylange
Member

hmm I don't think this works yet.

I tested by commenting tls internal in and out, with Caddy started using caddy run --watch.

On master, after a reload, Caddy continues to serve whatever certificate was loaded with its initial config.

Now, Caddy fails to serve that vhost at all, and does not try to issue a certificate against the new issuer.

❯ cat Caddyfile
example.com
tls internal
2024/04/26 16:58:52.492	INFO	using config from file	{"file": "Caddyfile"}
2024/04/26 16:58:52.493	INFO	adapted config to JSON	{"adapter": "caddyfile"}
2024/04/26 16:58:52.494	INFO	admin	admin endpoint started	{"address": "localhost:2019", "enforce_origin": false, "origins": ["//localhost:2019", "//[::1]:2019", "//127.0.0.1:2019"]}
2024/04/26 16:58:52.494	INFO	tls.cache.maintenance	started background certificate maintenance	{"cache": "0xc0005f7500"}
2024/04/26 16:58:52.494	INFO	http.auto_https	server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS	{"server_name": "srv0", "https_port": 443}
2024/04/26 16:58:52.494	INFO	http.auto_https	enabling automatic HTTP->HTTPS redirects	{"server_name": "srv0"}
2024/04/26 16:58:52.494	INFO	http.log	server running	{"name": "remaining_auto_https_redirects", "protocols": ["h1", "h2", "h3"]}
2024/04/26 16:58:52.495	INFO	http	enabling HTTP/3 listener	{"addr": ":443"}
2024/04/26 16:58:52.495	INFO	failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
2024/04/26 16:58:52.495	INFO	http.log	server running	{"name": "srv0", "protocols": ["h1", "h2", "h3"]}
2024/04/26 16:58:52.495	INFO	http	enabling automatic TLS certificate management	{"domains": ["example.com"]}
2024/04/26 16:58:52.496	WARN	tls	stapling OCSP	{"error": "no OCSP stapling for [example.com]: no OCSP server specified in certificate", "identifiers": ["example.com"]}
2024/04/26 16:58:52.496	INFO	pki.ca.local	root certificate is already trusted by system	{"path": "storage:pki/authorities/local/root.crt"}
2024/04/26 16:58:52.496	INFO	autosaved config (load with --resume flag)	{"file": "/home/me/.config/caddy/autosave.json"}
2024/04/26 16:58:52.496	INFO	serving initial configuration
2024/04/26 16:58:52.496	INFO	watcher	watching config file for changes	{"config_file": "Caddyfile"}
2024/04/26 16:58:52.499	INFO	tls	storage cleaning happened too recently; skipping for now	{"storage": "FileStorage:/home/me/.local/share/caddy", "instance": "a599c91f-e919-4c92-aaa2-8827da162957", "try_again": "2024/04/27 16:58:52.499", "try_again_in": 86399.999999557}
2024/04/26 16:58:52.499	INFO	tls	finished cleaning storage units
❯ curl --resolve example.com:443:127.0.0.1 https://example.com -ki
HTTP/2 200 
alt-svc: h3=":443"; ma=2592000
server: Caddy
content-length: 0
date: Fri, 26 Apr 2024 16:58:56 GMT
❯ cat Caddyfile
example.com
#tls internal
2024/04/26 16:59:05.498	INFO	watcher	config file changed; reloading	{"config_file": "Caddyfile"}
2024/04/26 16:59:05.498	INFO	admin	admin endpoint started	{"address": "localhost:2019", "enforce_origin": false, "origins": ["//[::1]:2019", "//127.0.0.1:2019", "//localhost:2019"]}
2024/04/26 16:59:05.499	INFO	http.auto_https	server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS	{"server_name": "srv0", "https_port": 443}
2024/04/26 16:59:05.499	INFO	http.auto_https	enabling automatic HTTP->HTTPS redirects	{"server_name": "srv0"}
2024/04/26 16:59:05.499	INFO	http	enabling HTTP/3 listener	{"addr": ":443"}
2024/04/26 16:59:05.499	INFO	http.log	server running	{"name": "srv0", "protocols": ["h1", "h2", "h3"]}
2024/04/26 16:59:05.499	INFO	admin	stopped previous server	{"address": "localhost:2019"}
2024/04/26 16:59:05.499	INFO	http.log	server running	{"name": "remaining_auto_https_redirects", "protocols": ["h1", "h2", "h3"]}
2024/04/26 16:59:05.499	INFO	http	enabling automatic TLS certificate management	{"domains": ["example.com"]}
2024/04/26 16:59:05.499	INFO	http	servers shutting down with eternal grace period
2024/04/26 16:59:05.499	INFO	autosaved config (load with --resume flag)	{"file": "/home/me/.config/caddy/autosave.json"}
❯ curl --resolve example.com:443:127.0.0.1 https://example.com -ki
curl: (35) quictls/3.1.4: error:0A000438:SSL routines::tlsv1 alert internal error

On the current master, curl is presented with the tls internal certificate before and after the reload.

❯ curl --resolve example.com:443:127.0.0.1 https://example.com -ki
HTTP/2 200 
alt-svc: h3=":443"; ma=2592000
server: Caddy
content-length: 0
date: Fri, 26 Apr 2024 17:06:21 GMT
 
❯ curl --resolve example.com:443:127.0.0.1 https://example.com -ki
HTTP/2 200 
alt-svc: h3=":443"; ma=2592000
server: Caddy
content-length: 0
date: Fri, 26 Apr 2024 17:06:26 GMT
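
Because curl -ki prints only the response headers, the output above does not show which issuer signed the served certificate. A small Go program along the following lines could be used to check that directly. This is a sketch under assumptions: the `servedIssuer` helper is a made-up name, and the address and server name simply mirror the curl --resolve arguments used in this thread.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"time"
)

// servedIssuer connects to addr, sends serverName via SNI, and returns the
// issuer of the leaf certificate the server presents.
func servedIssuer(addr, serverName string) (string, error) {
	dialer := &net.Dialer{Timeout: 3 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{
		ServerName: serverName,
		// Like curl -k: skip verification, since we only want to look at
		// the certificate, not trust it. Do not use this for real traffic.
		InsecureSkipVerify: true,
	})
	if err != nil {
		return "", err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", fmt.Errorf("server presented no certificate")
	}
	return certs[0].Issuer.String(), nil
}

func main() {
	// Same target as: curl --resolve example.com:443:127.0.0.1 https://example.com
	issuer, err := servedIssuer("127.0.0.1:443", "example.com")
	if err != nil {
		fmt.Fprintln(os.Stderr, "handshake failed:", err)
		return
	}
	fmt.Println("issuer:", issuer)
}
```

Running it before and after each reload shows whether the issuer of the served certificate changed, and a handshake failure corresponds to the curl (35) error shown earlier.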

@mholt
Member Author

mholt commented Apr 26, 2024

Thanks for the reproducer, currently investigating!

@mholt
Member Author

mholt commented Apr 26, 2024

@emilylange I was able to see the behavior with those instructions and pushed a fix that worked for me. Please verify it works for your use case and then we'll merge this! 😃

@emilylange
Member

Now the following chain of configs will leave the vhost without any certificate at all 🫣

example.com
tls internal

example.com
# tls internal

Certificate cannot be obtained from LE, because

2024/04/26 22:01:50.567	ERROR	tls.obtain	could not get certificate from issuer	{"identifier": "example.com", "issuer": "acme-v02.api.letsencrypt.org-directory", "error": "HTTP 400 urn:ietf:params:acme:error:rejectedIdentifier - Error creating new order :: Cannot issue for \"example.com\": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy"}
2024/04/26 22:01:50.567	ERROR	tls.obtain	will retry	{"error": "[example.com] Obtain: [example.com] creating new order: attempt 1: https://acme-v02.api.letsencrypt.org/acme/new-order: HTTP 400 urn:ietf:params:acme:error:rejectedIdentifier - Error creating new order :: Cannot issue for \"example.com\": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy (ca=https://acme-v02.api.letsencrypt.org/directory)", "attempt": 1, "retrying_in": 60, "elapsed": 1.111494555, "max_duration": 2592000}

(which is to be expected)

*Manually rolling back to tls internal*

example.com
tls internal

now throws

2024/04/26 22:02:03.451	ERROR	tls	job failed	{"error": "example.com: obtaining certificate: context canceled"}

and checking against it with curl yields:

❯ curl --resolve example.com:443:127.0.0.1 https://example.com -ki
curl: (35) quictls/3.1.4: error:0A000438:SSL routines::tlsv1 alert internal error

@mholt
Member Author

mholt commented Apr 26, 2024

@emilylange Ah, indeed.

However, it only seems to do that if getting the cert from the production CA (in that second config) fails. If it succeeds, going back to tls internal works for me. (Can you confirm?)

Dinner time, but I'll try to figure out why soon.

@mholt
Member Author

mholt commented Apr 27, 2024

@emilylange Ok, I got it working in the case where the first reload (so, the second config) doesn't hit an error getting or loading the certificate. But I'm still stumped by the context canceled error that appears when there is a cert error. I need to keep working on that, but I'm out of time for tonight.

@mholt
Member Author

mholt commented Apr 27, 2024

@emilylange Actually, the context canceled error is a red herring -- that is the previous config saying it's giving up trying to get a production cert for example.com because a new config was loaded in its place. When I connect to Caddy after the rollback, it still responds with the internal version of the cert.

So I think things are working and ready for your final check!

@mholt
Member Author

mholt commented Apr 29, 2024

Going to merge this soon so it gets into beta 1.

@mholt mholt merged commit d129ae6 into master Apr 30, 2024
23 checks passed
@mholt mholt deleted the tls-cert-cache-internal-reload branch April 30, 2024 22:15