Merge pull request #248 from cljdoc/blue-green
cljdoc ops 2.0
martinklepsch committed Dec 12, 2018
2 parents a8c7148 + 93bc066 commit 64faec3
Showing 44 changed files with 942 additions and 558 deletions.
21 changes: 16 additions & 5 deletions .circleci/config.yml
@@ -21,6 +21,7 @@ jobs:
- v1-npm-dependencies- # fallback if cache not found

- run: npm ci
- run: ./script/package

- restore_cache:
keys:
@@ -30,14 +31,9 @@ jobs:
- run: clojure -Spath

- run: ./script/cljdoc ingest --project bidi --version 2.1.3
- run: ./script/analyze.sh bidi 2.1.3 ~/.m2/repository/bidi/bidi/2.1.3/bidi-2.1.3.jar ~/.m2/repository/bidi/bidi/2.1.3/bidi-2.1.3.pom
- run: ./.circleci/run_if_changed.sh modules/analysis-runner/ "cd modules/analysis-runner/; clojure extended-test.clj"

- run: clojure -A:test
- store_test_results:
path: target/

- run: ./script/package

- persist_to_workspace:
root: .
@@ -85,12 +81,27 @@ jobs:
name: Deploy to S3
command: aws s3 sync workspace/target s3://$RELEASES_BUCKET_NAME/build-$CIRCLE_SHA1/ --delete

docker-deploy:
machine: true
steps:
- checkout
- attach_workspace:
at: .
- run: docker login -u $DOCKER_USER -p $DOCKER_PASS
# because target/ has been put into place `make image` can be
# run without running ./script/package (which would require npm)
- run: cd ops/docker && make image
- run: docker push cljdoc/cljdoc

workflows:
version: 2
build-and-deploy:
jobs:
- build
- prettier
- docker-deploy:
requires:
- build
- deploy:
requires:
- build
5 changes: 3 additions & 2 deletions .gitignore
@@ -3,9 +3,10 @@ target/
docker/docker-target
log/
data/
ops/.terraform
ops/terraform.tfstate*
ops/infrastructure/.terraform
ops/infrastructure/terraform.tfstate*
ops/image/image-id
ops/image/nomad-image-id
ops/prod-backup/
.idea
*.iml
65 changes: 65 additions & 0 deletions doc/adr/0017-use-nomad-for-deployment.md
@@ -0,0 +1,65 @@
# Use Nomad For Deployment

## Status

Accepted

## Context

cljdoc's deployment story has been simplistic but effective. To recap:

- During CI a zip file is pushed to S3 that contains all files to run the application
- On the live server there is a systemd service that will download an archive and run it. The
version of the downloaded archive is specified via a file on the server.

Updating simply required updating a file on the server and restarting the service.

The issue with this approach, however, is that every time a new release was pushed to the server
the restart of the systemd service would incur up to a minute of downtime. While this generally
isn't a huge deal, it discourages certain development practices that may be desirable, such as
Continuous Deployment.

Our existing deployment setup (and the tools used by it) is poorly equipped to handle this kind of
deployment scenario. A large number of bash scripts would be required to start a new cljdoc server,
wait for it to become available, update nginx's upstream port, and kill old cljdoc server instances
in a repeatable, automated manner. These bash scripts would likely be error-prone and turn into
something nobody likes to touch.
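
To make the sequence above concrete, here is a minimal sketch of the four steps (start, wait, switch, stop) in Python. It is illustrative only and not part of cljdoc: the function name `blue_green_cutover` and the four callables are hypothetical stand-ins for whatever would start a server, probe it, rewrite the proxy upstream, and stop the old instance.

```python
import time
from typing import Callable


def blue_green_cutover(
    start_new: Callable[[], int],
    is_healthy: Callable[[int], bool],
    switch_upstream: Callable[[int], None],
    stop_old: Callable[[], None],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run one cutover; return True if traffic was switched to the new server.

    All four callables are hypothetical hooks supplied by the caller.
    """
    port = start_new()                    # 1. start the new server, get its port
    deadline = time.monotonic() + timeout_s
    while not is_healthy(port):           # 2. wait for it to become available
        if time.monotonic() >= deadline:
            # Give up: the old server keeps serving. A real script would
            # also have to clean up the half-started new server here.
            return False
        sleep(poll_s)
    switch_upstream(port)                 # 3. point the proxy at the new port
    stop_old()                            # 4. only now stop the old instance
    return True
```

Even this small version needs a timeout, an ordering guarantee, and a failure path, which hints at how quickly hand-written scripts for this would grow.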

## Decision

Implement a *canary deploy* mechanism for the cljdoc server application using
[Nomad](https://nomadproject.io) and [Traefik](https://traefik.io).

While both of these tools are probably aimed at much more complex workloads, they provide the
following benefits over the existing systemd/Nginx setup:

- Automatic SSL certificates via Let's Encrypt
- Declarative specification of jobs and their desired update semantics
- APIs to schedule new jobs and cycle old/new deployments
- Health checks to verify new deployments work as expected
- Machine images become much simpler since they only need Nomad, Consul, Docker

This simplifies a lot of cljdoc's operational tasks while also enabling Continuous Deployment.

## Consequences

#### Inefficient Resource Allocation

The way Nomad [handles canary deployments](https://www.nomadproject.io/guides/operating-a-job/update-strategies/blue-green-and-canary-deployments.html) requires that there are sufficient resources available to run two sets of tasks side by side.

This results in instances only operating at half their capacity. In practice, the cljdoc
server never ran into resource constraints, so this likely won't cause any actual problems,
but it's an imperfection nonetheless.
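
As a rough illustration of that headroom, the sketch below multiplies a task's resource request by the number of allocations that run side by side during a canary deployment. The helper `peak_allocation` is hypothetical; the numbers are the `backend` task's `CPU` and `MemoryMB` values and the `Count` and `Canary` settings from the job specification added in this commit.

```python
def peak_allocation(count: int, canaries: int, cpu_mhz: int, memory_mb: int) -> dict:
    """Resources needed while old and canary allocations run side by side."""
    running = count + canaries  # old allocations plus canaries
    return {
        "tasks": running,
        "cpu_mhz": running * cpu_mhz,
        "memory_mb": running * memory_mb,
    }


# Count 1, Canary 1, CPU 800, MemoryMB 848
print(peak_allocation(count=1, canaries=1, cpu_mhz=800, memory_mb=848))
# → {'tasks': 2, 'cpu_mhz': 1600, 'memory_mb': 1696}
```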

> **Note:** The scaling plan for cljdoc has always been to put a CDN in front of the backend.

#### More Complexity

Nomad and Consul are complex tools, designed for multi-instance orchestration. Compared to
shell scripts, this also forces additional complexity upon developers trying to work with these
tools on their own machines.

#### Atypical Nomad Usage

Running a single-node "cluster" is also an atypical usage scenario and thus may receive limited
support or improvements in the future.
3 changes: 3 additions & 0 deletions modules/deploy/README.md
@@ -0,0 +1,3 @@
# cljdoc deploy

Please see [`ops/README.adoc`](/ops/README.adoc) for details.
8 changes: 8 additions & 0 deletions modules/deploy/deps.edn
@@ -0,0 +1,8 @@
{:paths ["src" "resources"]
:deps {clj-http {:mvn/version "3.9.1"}
cli-matic {:mvn/version "0.2.10"}
aero {:mvn/version "1.1.3"}
cheshire/cheshire {:mvn/version "5.8.1"}
org.clojure/tools.logging {:mvn/version "0.4.1"}
spootnik/unilog {:mvn/version "0.7.22"},
com.jcraft/jsch {:mvn/version "0.1.55"}}}
69 changes: 69 additions & 0 deletions modules/deploy/resources/cljdoc.jobspec.edn
@@ -0,0 +1,69 @@
{:Job
{:Datacenters ["dc1"],
:ID "cljdoc",
:Name "cljdoc",
:TaskGroups
[{:Count 1,
:Name "cljdoc",
:RestartPolicy {:Attempts 2,
:Delay 15000000000,
:Interval 1800000000000,
:Mode "fail"},
:Tasks
[{:Artifacts nil,
:Config {:image #join ["cljdoc/cljdoc:" #cljdoc.deploy/opt :docker-tag]
:port_map [{:http 8000}],
:volumes ["secrets:/etc/cljdoc"
"/data/cljdoc:/var/cljdoc"]},
:Driver "docker",
:Env {:CLJDOC_SECRETS "/etc/cljdoc/secrets.edn"
:CLJDOC_PROFILE "prod"},
:KillSignal "",
:Name "backend",
:Resources
{:CPU 800,
:MemoryMB 848,
:Networks [{:DynamicPorts [{:Label "http", :Value 0}],
:MBits 10}]},
:Services [{:Name "cljdoc",
:PortLabel "http",
:Checks [{:Name "alive"
:PortLabel "http"
:Interval #nomad/seconds 10
:Timeout #nomad/seconds 2
:Type "tcp"}]
:Tags
["traefik.tags=cljdoc"
"traefik.frontends.blue.rule=PathPrefix:/"
"traefik.frontends.blue.entryPoints=http,https"]}],
:Templates [{:DestPath "secrets/secrets.edn",
:EmbeddedTmpl "{{key \"config/cljdoc/secrets-edn\"}}"}]}],
:Update {:Canary 1
:MaxParallel 1
:HealthyDeadline #nomad/seconds 180
:MinHealthyTime #nomad/seconds 10}}
{:Name "lb",
:Tasks
[{:Config {:image "traefik:1.7.4-alpine"
:network_mode "host"
:port_map [{:api 8080, :http 80, :https 443}]
:volumes ["local:/etc/traefik"
"/data/traefik:/data"]}
:Driver "docker"
:Name "traefik"
:Resources
{:CPU 100
:MemoryMB 128
:Networks [{:MBits 10
:ReservedPorts [{:Label "api", :Value 8080}
{:Label "http", :Value 80}
{:Label "https", :Value 443}]}]}
:Services [{:Name "traefik"
:PortLabel "http"
:Checks [{:Name "alive"
:Interval #nomad/seconds 10
:Timeout #nomad/seconds 2
:Type "tcp"}]}]
:Templates [{:DestPath "local/traefik.toml"
:EmbeddedTmpl "{{key \"config/traefik-toml\"}}"}]}]}]
:Type "service"}}
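
A note on the durations in the job specification above: Nomad's JSON job API expresses durations as integer nanoseconds, which is why `:Delay` and `:Interval` under `:RestartPolicy` are such large numbers. The `#nomad/seconds` tag presumably performs the same conversion at read time; that is an assumption, since the reader's implementation is not shown in this part of the diff. A minimal Python sketch of the arithmetic, with hypothetical helper names:

```python
NANOS_PER_SECOND = 1_000_000_000


def seconds_to_nanos(seconds: int) -> int:
    """Convert whole seconds to the nanosecond integers used in the jobspec."""
    return seconds * NANOS_PER_SECOND


def nanos_to_seconds(nanos: int) -> float:
    """Convert a nanosecond duration back to seconds for readability."""
    return nanos / NANOS_PER_SECOND


print(nanos_to_seconds(15000000000))    # RestartPolicy :Delay    → 15.0
print(nanos_to_seconds(1800000000000))  # RestartPolicy :Interval → 1800.0
print(seconds_to_nanos(180))            # HealthyDeadline         → 180000000000
```

Read this way, the restart policy allows 2 attempts per 30 minutes with a 15 second delay between them.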
5 changes: 5 additions & 0 deletions modules/deploy/resources/secrets.edn
@@ -0,0 +1,5 @@
{:circle-ci {:api-token #env! CIRCLE_API_TOKEN
:builder-project #env! CIRCLE_BUILDER_PROJECT}
:sentry {:dsn #env! SENTRY_DSN}
:telegram {:bot-token #env! TELEGRAM_BOT_TOKEN
:chat-id #env! TELEGRAM_CHAT_ID}}
40 changes: 40 additions & 0 deletions modules/deploy/resources/traefik.toml
@@ -0,0 +1,40 @@
[accessLog]
filePath = "/data/access.log"

[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
compress = true
address = ":443"
[entryPoints.https.tls]
[entryPoints.https.redirect]
regex = "^https://.*cljdoc.xyz/(.*)"
replacement = "https://.*cljdoc.org/$1"
permanent = true

[acme]
email = "martinklepsch@googlemail.com"
storage = "/data/acme.json"
entryPoint = "https"
onHostRule = true
[acme.tlsChallenge]

[[acme.domains]]
main = "test2.cljdoc.org"
[[acme.domains]]
main = "test2.cljdoc.xyz"
[[acme.domains]]
main = "cljdoc.org"
[[acme.domains]]
main = "cljdoc.xyz"

[api]
[ping]

[consulCatalog]
prefix = "traefik"
constraints = ["tag==cljdoc"]
watch = true
