Commit
Merge pull request #248 from cljdoc/blue-green
cljdoc ops 2.0
Showing 44 changed files with 942 additions and 558 deletions.
@@ -0,0 +1,65 @@
# Use Nomad For Deployment

## Status

Accepted

## Context

cljdoc's deployment story has been simplistic but effective. To recap:

- During CI, a zip file containing all files needed to run the application is pushed to S3.
- On the live server there is a systemd service that downloads an archive and runs it. The
  version of the downloaded archive is specified via a file on the server.

Updating simply required updating a file on the server and restarting the service.

The issue with this approach, however, is that every time a new release was pushed to the server,
the restart of the systemd service would incur up to a minute of downtime. While this generally
isn't a huge deal, it discourages certain development practices that may be desirable, such as
Continuous Deployment.

Our existing deployment setup (and the tools used by it) is poorly equipped to handle this kind of
deployment scenario. A large number of bash scripts would be required to start a new cljdoc server,
wait for it to become available, update nginx's upstream port, and kill old cljdoc server instances
in a repeatable, automated manner. These bash scripts would likely be error-prone and turn into
something nobody likes to touch.

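The sequence such scripts would have to implement can be sketched as follows. This is a minimal illustration in Python, not code from cljdoc: `blue_green_deploy` and all of its callback parameters are hypothetical names, and the 180-second default merely mirrors the `HealthyDeadline` used in the job specification of this commit.

```python
import time
from typing import Callable


def blue_green_deploy(
    start_new: Callable[[], int],
    is_healthy: Callable[[int], bool],
    switch_upstream: Callable[[int], None],
    stop_old: Callable[[], None],
    stop_new: Callable[[int], None],
    deadline_s: float = 180.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run one deploy cycle; return True if traffic was switched to the new server."""
    port = start_new()              # 1. start the new server next to the old one
    waited = 0.0
    while not is_healthy(port):     # 2. wait for it to become available
        if waited >= deadline_s:
            stop_new(port)          # give up; the old server keeps serving traffic
            return False
        sleep(poll_s)
        waited += poll_s
    switch_upstream(port)           # 3. point the proxy at the new port
    stop_old()                      # 4. only now stop the old server
    return True
```

Passing the individual steps in as callbacks keeps the ordering logic testable without touching a real server. Every failure mode (timeouts, partial switches, cleanup) still has to be handled by hand, which is the maintenance burden the paragraph above describes.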
## Decision

Implement a *canary deploy* mechanism for the cljdoc server application using
[Nomad](https://nomadproject.io) and [Traefik](https://traefik.io).

While both of these tools are probably aimed at much more complex workloads, they provide the
following benefits over the existing systemd/Nginx setup:

- Automatic SSL certificates via Let's Encrypt
- Declarative specification of jobs and their desired update semantics
- APIs to schedule new jobs and cycle old/new deployments
- Health checks to verify new deployments work as expected
- Machine images become much simpler since they only need Nomad, Consul, and Docker

This simplifies a lot of cljdoc's operational tasks while also enabling Continuous Deployment.

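As an illustration of the API-driven workflow, the sketch below builds (but does not send) the two HTTP requests involved in a canary cycle: registering a job and promoting its deployment. It assumes Nomad's HTTP API at the default local agent address `http://127.0.0.1:4646`. The helper names and the example job are hypothetical, and this is not the cljdoc deploy tool's implementation.

```python
import json
import urllib.request

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: default local Nomad agent address


def _post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request; the caller decides whether to send it."""
    return urllib.request.Request(
        NOMAD_ADDR + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_job_request(job: dict) -> urllib.request.Request:
    """Request that registers (or updates) a job; the API expects it wrapped in "Job"."""
    return _post("/v1/jobs", {"Job": job})


def promote_request(deployment_id: str) -> urllib.request.Request:
    """Request that promotes all canaries of a deployment once they are healthy."""
    return _post(
        f"/v1/deployment/promote/{deployment_id}",
        {"DeploymentID": deployment_id, "All": True},
    )


# Hypothetical, heavily abbreviated job used only to show the request shape.
request = register_job_request({"ID": "cljdoc", "Name": "cljdoc", "Type": "service"})
print(request.get_method(), request.full_url)
```

Sending either request is a matter of passing it to `urllib.request.urlopen`. Between the two calls, a deploy tool would wait until Nomad reports the canary allocation as healthy.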
## Consequences

#### Inefficient Resource Allocation

The way Nomad [handles canary deployments](https://www.nomadproject.io/guides/operating-a-job/update-strategies/blue-green-and-canary-deployments.html) requires that there are sufficient resources available to run two sets of tasks side by side.

This results in instances operating at only half their capacity. In practice the cljdoc
server never ran into resource constraints, so this likely won't cause any actual problems,
but it's an imperfection nonetheless.

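To make the overhead concrete, the following sketch adds up the reservations declared in the job specification of this commit. It assumes that only the `cljdoc` task group gets a canary (the `lb` group has no `:Update` block) and it ignores whatever Nomad, Consul, and Docker themselves consume. The function name `reserved` is hypothetical.

```python
# Reservations copied from the job specification in this commit.
BACKEND = {"CPU": 800, "MemoryMB": 848}  # task "backend", group "cljdoc"
TRAEFIK = {"CPU": 100, "MemoryMB": 128}  # task "traefik", group "lb"
COUNT = 1     # :Count of the "cljdoc" group
CANARIES = 1  # :Canary in that group's :Update block


def reserved(resource: str, during_deploy: bool) -> int:
    """Total reservation for one resource, with or without a running canary."""
    backends = COUNT + (CANARIES if during_deploy else 0)
    return backends * BACKEND[resource] + TRAEFIK[resource]


for resource in ("CPU", "MemoryMB"):
    print(resource, reserved(resource, False), reserved(resource, True))
```

Under these assumptions the node needs room for 1700 MHz of CPU and 1824 MB of memory during a deployment, compared with 900 MHz and 976 MB in steady state.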
> **Note:** The scaling plan for cljdoc has always been to put a CDN in front of the backend.

#### More Complexity

Nomad and Consul are complex tools designed for multi-instance orchestration. Compared to
shell scripts, this also forces additional complexity on developers who want to work with these
tools on their own machines.

#### Atypical Nomad Usage

Running a single-node "cluster" is also an atypical usage scenario and thus may receive limited
support or improvements in the future.
@@ -0,0 +1,3 @@
# cljdoc deploy

Please see [`ops/README.adoc`](/ops/README.adoc) for details.
@@ -0,0 +1,8 @@
{:paths ["src" "resources"]
 :deps {clj-http {:mvn/version "3.9.1"}
        cli-matic {:mvn/version "0.2.10"}
        aero {:mvn/version "1.1.3"}
        cheshire/cheshire {:mvn/version "5.8.1"}
        org.clojure/tools.logging {:mvn/version "0.4.1"}
        spootnik/unilog {:mvn/version "0.7.22"},
        com.jcraft/jsch {:mvn/version "0.1.55"}}}
@@ -0,0 +1,69 @@
{:Job
 {:Datacenters ["dc1"],
  :ID "cljdoc",
  :Name "cljdoc",
  :TaskGroups
  [{:Count 1,
    :Name "cljdoc",
    :RestartPolicy {:Attempts 2,
                    :Delay 15000000000,
                    :Interval 1800000000000,
                    :Mode "fail"},
    :Tasks
    [{:Artifacts nil,
      :Config {:image #join ["cljdoc/cljdoc:" #cljdoc.deploy/opt :docker-tag]
               :port_map [{:http 8000}],
               :volumes ["secrets:/etc/cljdoc"
                         "/data/cljdoc:/var/cljdoc"]},
      :Driver "docker",
      :Env {:CLJDOC_SECRETS "/etc/cljdoc/secrets.edn"
            :CLJDOC_PROFILE "prod"},
      :KillSignal "",
      :Name "backend",
      :Resources
      {:CPU 800,
       :MemoryMB 848,
       :Networks [{:DynamicPorts [{:Label "http", :Value 0}],
                   :MBits 10}]},
      :Services [{:Name "cljdoc",
                  :PortLabel "http",
                  :Checks [{:Name "alive"
                            :PortLabel "http"
                            :Interval #nomad/seconds 10
                            :Timeout #nomad/seconds 2
                            :Type "tcp"}]
                  :Tags
                  ["traefik.tags=cljdoc"
                   "traefik.frontends.blue.rule=PathPrefix:/"
                   "traefik.frontends.blue.entryPoints=http,https"]}],
      :Templates [{:DestPath "secrets/secrets.edn",
                   :EmbeddedTmpl "{{key \"config/cljdoc/secrets-edn\"}}"}]}],
    :Update {:Canary 1
             :MaxParallel 1
             :HealthyDeadline #nomad/seconds 180
             :MinHealthyTime #nomad/seconds 10}}
   {:Name "lb",
    :Tasks
    [{:Config {:image "traefik:1.7.4-alpine"
               :network_mode "host"
               :port_map [{:api 8080, :http 80, :https 443}]
               :volumes ["local:/etc/traefik"
                         "/data/traefik:/data"]}
      :Driver "docker"
      :Name "traefik"
      :Resources
      {:CPU 100
       :MemoryMB 128
       :Networks [{:MBits 10
                   :ReservedPorts [{:Label "api", :Value 8080}
                                   {:Label "http", :Value 80}
                                   {:Label "https", :Value 443}]}]}
      :Services [{:Name "traefik"
                  :PortLabel "http"
                  :Checks [{:Name "alive"
                            :Interval #nomad/seconds 10
                            :Timeout #nomad/seconds 2
                            :Type "tcp"}]}]
      :Templates [{:DestPath "local/traefik.toml"
                   :EmbeddedTmpl "{{key \"config/traefik-toml\"}}"}]}]}]
  :Type "service"}}
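Nomad's JSON job format expresses durations as integer nanoseconds, which explains the large literals for `:Delay` and `:Interval` in the restart policy. The sketch below assumes that the `#nomad/seconds` reader tag, whose definition is not part of this excerpt, performs the same seconds-to-nanoseconds conversion. `nomad_seconds` is a hypothetical Python stand-in for it.

```python
NANOS_PER_SECOND = 1_000_000_000


def nomad_seconds(seconds: int) -> int:
    """Convert whole seconds to the nanosecond integers used in the job JSON."""
    return seconds * NANOS_PER_SECOND


restart_delay = nomad_seconds(15)          # matches :Delay in :RestartPolicy
restart_interval = nomad_seconds(30 * 60)  # matches :Interval (30 minutes)
print(restart_delay, restart_interval)
```

Read this way, the restart policy allows 2 restart attempts within 30 minutes, with 15 seconds between attempts.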
@@ -0,0 +1,5 @@
{:circle-ci {:api-token #env! CIRCLE_API_TOKEN
             :builder-project #env! CIRCLE_BUILDER_PROJECT}
 :sentry {:dsn #env! SENTRY_DSN}
 :telegram {:bot-token #env! TELEGRAM_BOT_TOKEN
            :chat-id #env! TELEGRAM_CHAT_ID}}
@@ -0,0 +1,40 @@
[accessLog]
filePath = "/data/access.log"

[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
compress = true
address = ":443"
[entryPoints.https.tls]
[entryPoints.https.redirect]
regex = "^https://(.*)cljdoc.xyz/(.*)"
replacement = "https://${1}cljdoc.org/$2"
permanent = true

[acme]
email = "martinklepsch@googlemail.com"
storage = "/data/acme.json"
entryPoint = "https"
onHostRule = true
[acme.tlsChallenge]

[[acme.domains]]
main = "test2.cljdoc.org"
[[acme.domains]]
main = "test2.cljdoc.xyz"
[[acme.domains]]
main = "cljdoc.org"
[[acme.domains]]
main = "cljdoc.xyz"

[api]
[ping]

[consulCatalog]
prefix = "traefik"
constraints = ["tag==cljdoc"]
watch = true