Skip to content

Conversation

@jasonpincin
Copy link
Contributor

@jasonpincin jasonpincin commented Sep 30, 2016

This PR leverages dehydrated to request and renew certs. The dehydrated script offers a hook system which is used to send challenge tokens, certs, and keys to consul. Consul Template is used to generate the tokens, certs, and keys on disk from Consul Agent.

Only one Nginx container will be responsible for renewing certificates and writing to Consul. A simple consul lock acquisition is used to determine which container is the leader. All nginx containers, including the leader, watch consul for changes and generate the corresponding files as changes occur. After certs/keys are generated, Nginx is instructed to reload the config.

In order to activate this functionality, a single environment variable, ACME_DOMAIN must be passed to the Nginx container at runtime who's value is the domain the container(s) should request a certificate for. It's up to the user to have the domain resolve to the containers so that ACME challenges can be responded to.

By default, the LetsEncrypt staging endpoint will be used, which allows the developer to iterate and get this working without hitting the API limits imposed on the production endpoints. Once the user is ready to move to production, and get real certificates, a 2nd environment variable must be passed: ACME_ENV=production (this is consistent with NODE_ENV and others).

Further implementation details:

  • Unfortunately, consul-template does not support writing to a dynamic file name which is required for acme challenge responses, so we write to a temporary file consul-template, then spawn a process that generates the properly named token.
  • For startup purposes, there's now two nginx config templates. One with SSL directives and one without. The reload.sh script was modified to choose the correct one based on whether certs/keys are found on disk.
  • Port 443 is being exposed by default in docker-compose.yml.
  • An acme script was added (to /usr/local/bin) which contains container specific log for ACME operations. This works in conjunction with the dehydrated script and hooks (located at /etc/acme/dehydrated/hook.sh).
  • Three Container Pilot tasks are added when ACME is enabled. The logic for these tasks is contained within the acme script.
    • acme-checkin - Renews consul session and attempts to become leader. Runs every 5m.
    • acme-renew-certs - Acquires or renews certificate for ACME_DOMAIN. Runs every 12h.
    • clean-unused-certs - Cleans old certs. Runs once per day.

Other miscellaneous improvements:

  • in order to reduce logging noise health check 200 responses are no longer logged
  • corrected memory limit in local-compose.yml
  • Upgraded to Consul v0.7.0 in Nginx, Consul, and example backend containers
  • jq was added to the container for more reliable JSON parsing in bash

bin/dehydrated Outdated
@@ -0,0 +1,1116 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rather than copying this or pulling it down in the Dockerfile, we could make this a git submodule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible, but all we need is the script from that module, and we will want to pin it to a version. So the build will either curl -O a url (with version) or cloning a submodule, checking out the right version there, and copying the script from there to out bin path.

@@ -0,0 +1,45 @@
#!/usr/bin/env bash

CONSUL_ROOT="http://localhost:8500/v1/kv/nginx"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ContainerPilot configuration supports non-localhost Consul paths depending on the value of CONSUL_AGENT variable. We should probably try to translate the same configuration to bash here:

"consul": "{{ if .CONSUL_AGENT }}localhost{{ else }}{{ .CONSUL }}{{ end }}:8500",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. I'll do this. This PR is very WIP, so some of what you see if just a result of me testing this out locally, outside of containers.

@@ -0,0 +1,45 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TODO as you refine this: given how many HTTP calls we have to make in this script, we might want to consider setting set -e here to fail fast or have a trap on the error so that we get some nice noisy logging or problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think set -e worked that way for bash function calls. I could be mistaken. This script defines a handful of functions. Then dehydrated sources this file and invokes those functions at the proper time. If set -e would work in this context, I'm not sure what impact it would have on dehydrated. Perhaps the better approach here would be to && the curl commands? Something like:

curl blah1 &&
curl blah2 &&
curl blah3

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you're right on that. Chaining the calls with && might be better.

function deploy_challenge {
local DOMAIN="${1}" TOKEN_FILENAME="${2}" TOKEN_VALUE="${3}"

curl -sX PUT -d "${TOKEN_FILENAME}" ${CONSUL_ROOT}/domains/${DOMAIN}/acme/challenge/token-filename > /dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest avoiding piping these to /dev/null particularly during the development process. Consul isn't very chatty as it is and it'd be good to know that something went wrong. Maybe we pipe them to a log function somewhere in this same script?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I actually put those there during development because they were spamming true to STDOUT and cluttering up other stuff I was trying to read. You'd hope alarms would go to STDERR, but I know that's not always the case.

"--domain", "{{ .ACME_DOMAIN }}",
"--hook", "/etc/dehydrated/hook.sh",
"--config", "/etc/dehydrated/config"],
"--config", "/etc/dehydrated/config.{{ .ACME_ENV }}"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good way to default this to staging in the case it's omitted from docker-compose.yml? cc @tgross

@jasonpincin jasonpincin changed the title [WIP] Add SSL support via ACME / LetsEncrypt Add SSL support via ACME / LetsEncrypt Oct 6, 2016
alias /var/www/acme/challenge;
}
{{ end }}
listen 443 ssl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be under separate server blocks? With this setup we're accepting requests for the default (_) hostname on port 80 and our secured domain on port 443 together. I think we want to have the port 80 server block only serve the items we absolutely need it to and then redirect to the port 443 server block for everything else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good catch. Will do that.

service: nginx
build: .
mem_limit: 128g
mem_limit: 128m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that one. 😊

README.md Outdated

### Configuring LetsEncrypt (ACME)

Setting the `ACME_DOMAIN` environment variable will enable LetsEncrypt within
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to hand-wrap markdown at 80 cols.

README.md Outdated
`ACME_ENV` environment variable to `production`.

You must ensure the domain resolves to your Nginx containers so that they can
respond to the ACME http challenges.
Copy link
Contributor

@tgross tgross Oct 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should link to a doc showing that you can CNAME to your Triton CNS service name, for Triton users?

local DOMAIN="${1}" TOKEN_FILENAME="${2}" TOKEN_VALUE="${3}"

curl -sX DELETE ${CONSUL_KEY_ROOT}/acme/challenge/token-filename | log
test ${PIPESTATUS[0]} -eq 0 || (log "${FUNCNAME} failed" && return)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the result of the discussion we had here, right? This makes this kinda hard to find the logic amongst all the error handling. I think we could clean this up if we use set -o pipefail and a trap in this script for errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. No objections to changing it to pipefail. Will be cleaner. For the same reason in the last discussion though, I'm not sure about using a trap due to the fact that this script is sourced by dehydrated. I don't want to exit on errors in this script. I want dehydrated to capture the error and do whatever it needs to do.

Copy link
Contributor

@tgross tgross Oct 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but doesn't dehydrated itself have set -euo pipefail though? It looks to me like if we're returning anything other than 0 from the function then we're just bailing out.

Edit for clarity: the only reason I'm suggesting a trap at all is to massage the error messages. ContainerPilot will log things with the appropriate timestamps and ERROR block, etc. so if you're happy with that then so am I.

@tgross
Copy link
Contributor

tgross commented Oct 7, 2016

LGTM.

@jasonpincin jasonpincin merged commit e132eff into autopilotpattern:master Oct 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants