-
Notifications
You must be signed in to change notification settings - Fork 25
Add SSL support via ACME / LetsEncrypt #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bin/dehydrated
Outdated
| @@ -0,0 +1,1116 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rather than copying this or pulling it down in the Dockerfile, we could make this a git submodule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible, but all we need is the script from that module, and we will want to pin it to a version. So the build will either curl -O a url (with version) or cloning a submodule, checking out the right version there, and copying the script from there to out bin path.
etc/dehydrated/hook.sh
Outdated
| @@ -0,0 +1,45 @@ | |||
| #!/usr/bin/env bash | |||
|
|
|||
| CONSUL_ROOT="http://localhost:8500/v1/kv/nginx" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ContainerPilot configuration supports non-localhost Consul paths depending on the value of CONSUL_AGENT variable. We should probably try to translate the same configuration to bash here:
"consul": "{{ if .CONSUL_AGENT }}localhost{{ else }}{{ .CONSUL }}{{ end }}:8500",
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. I'll do this. This PR is very WIP, so some of what you see if just a result of me testing this out locally, outside of containers.
etc/dehydrated/hook.sh
Outdated
| @@ -0,0 +1,45 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A TODO as you refine this: given how many HTTP calls we have to make in this script, we might want to consider setting set -e here to fail fast or have a trap on the error so that we get some nice noisy logging or problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think set -e worked that way for bash function calls. I could be mistaken. This script defines a handful of functions. Then dehydrated sources this file and invokes those functions at the proper time. If set -e would work in this context, I'm not sure what impact it would have on dehydrated. Perhaps the better approach here would be to && the curl commands? Something like:
curl blah1 &&
curl blah2 &&
curl blah3Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think you're right on that. Chaining the calls with && might be better.
etc/dehydrated/hook.sh
Outdated
| function deploy_challenge { | ||
| local DOMAIN="${1}" TOKEN_FILENAME="${2}" TOKEN_VALUE="${3}" | ||
|
|
||
| curl -sX PUT -d "${TOKEN_FILENAME}" ${CONSUL_ROOT}/domains/${DOMAIN}/acme/challenge/token-filename > /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest avoiding piping these to /dev/null particularly during the development process. Consul isn't very chatty as it is and it'd be good to know that something went wrong. Maybe we pipe them to a log function somewhere in this same script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I actually put those there during development because they were spamming true to STDOUT and cluttering up other stuff I was trying to read. You'd hope alarms would go to STDERR, but I know that's not always the case.
…main for now), return from hook funcs early on failure, limit retries on verifying challenge token has been distributed
…ponsible for renewing certs)
…eaning certificates (if leader). Also added a co-process responsible for writing acme challenge tokens as well as certs provided by the leader (leader uses this too)
…s (default to staging so as not to exceed limits in prod)
etc/containerpilot.json
Outdated
| "--domain", "{{ .ACME_DOMAIN }}", | ||
| "--hook", "/etc/dehydrated/hook.sh", | ||
| "--config", "/etc/dehydrated/config"], | ||
| "--config", "/etc/dehydrated/config.{{ .ACME_ENV }}"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good way to default this to staging in the case it's omitted from docker-compose.yml? cc @tgross
…pting ACME challenge
…on. Add init command to be used by health check to insure certs are installed before ssl health check is a success. This may change in the future, see: TritonDataCenter/containerpilot#227
…sed on existance of keys
…init, for startup reasons
etc/nginx/nginx-ssl.conf.ctmpl
Outdated
| alias /var/www/acme/challenge; | ||
| } | ||
| {{ end }} | ||
| listen 443 ssl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be under separate server blocks? With this setup we're accepting requests for the default (_) hostname on port 80 and our secured domain on port 443 together. I think we want to have the port 80 server block only serve the items we absolutely need it to and then redirect to the port 443 server block for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch. Will do that.
| service: nginx | ||
| build: . | ||
| mem_limit: 128g | ||
| mem_limit: 128m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that one. 😊
README.md
Outdated
|
|
||
| ### Configuring LetsEncrypt (ACME) | ||
|
|
||
| Setting the `ACME_DOMAIN` environment variable will enable LetsEncrypt within |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to hand-wrap markdown at 80 cols.
README.md
Outdated
| `ACME_ENV` environment variable to `production`. | ||
|
|
||
| You must ensure the domain resolves to your Nginx containers so that they can | ||
| respond to the ACME http challenges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should link to a doc showing that you can CNAME to your Triton CNS service name, for Triton users?
etc/acme/dehydrated/hook.sh
Outdated
| local DOMAIN="${1}" TOKEN_FILENAME="${2}" TOKEN_VALUE="${3}" | ||
|
|
||
| curl -sX DELETE ${CONSUL_KEY_ROOT}/acme/challenge/token-filename | log | ||
| test ${PIPESTATUS[0]} -eq 0 || (log "${FUNCNAME} failed" && return) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the result of the discussion we had here, right? This makes this kinda hard to find the logic amongst all the error handling. I think we could clean this up if we use set -o pipefail and a trap in this script for errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. No objections to changing it to pipefail. Will be cleaner. For the same reason in the last discussion though, I'm not sure about using a trap due to the fact that this script is sourced by dehydrated. I don't want to exit on errors in this script. I want dehydrated to capture the error and do whatever it needs to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but doesn't dehydrated itself have set -euo pipefail though? It looks to me like if we're returning anything other than 0 from the function then we're just bailing out.
Edit for clarity: the only reason I'm suggesting a trap at all is to massage the error messages. ContainerPilot will log things with the appropriate timestamps and ERROR block, etc. so if you're happy with that then so am I.
…ocation. This prevents empty cert/key files from being written and simplifies logic pertaining to their existence
|
LGTM. |
This PR leverages dehydrated to request and renew certs. The
dehydratedscript offers a hook system which is used to send challenge tokens, certs, and keys to consul. Consul Template is used to generate the tokens, certs, and keys on disk from Consul Agent.Only one Nginx container will be responsible for renewing certificates and writing to Consul. A simple consul lock acquisition is used to determine which container is the leader. All nginx containers, including the leader, watch consul for changes and generate the corresponding files as changes occur. After certs/keys are generated, Nginx is instructed to reload the config.
In order to activate this functionality, a single environment variable,
ACME_DOMAINmust be passed to the Nginx container at runtime who's value is the domain the container(s) should request a certificate for. It's up to the user to have the domain resolve to the containers so that ACME challenges can be responded to.By default, the LetsEncrypt staging endpoint will be used, which allows the developer to iterate and get this working without hitting the API limits imposed on the production endpoints. Once the user is ready to move to production, and get real certificates, a 2nd environment variable must be passed:
ACME_ENV=production(this is consistent withNODE_ENVand others).Further implementation details:
consul-templatedoes not support writing to a dynamic file name which is required for acme challenge responses, so we write to a temporary fileconsul-template, then spawn a process that generates the properly named token.reload.shscript was modified to choose the correct one based on whether certs/keys are found on disk.docker-compose.yml.acmescript was added (to/usr/local/bin) which contains container specific log for ACME operations. This works in conjunction with thedehydratedscript and hooks (located at/etc/acme/dehydrated/hook.sh).acmescript.acme-checkin- Renews consul session and attempts to become leader. Runs every 5m.acme-renew-certs- Acquires or renews certificate forACME_DOMAIN. Runs every 12h.clean-unused-certs- Cleans old certs. Runs once per day.Other miscellaneous improvements:
jqwas added to the container for more reliable JSON parsing inbash