cljdoc ops 2.0 #248

Merged
merged 51 commits into from Dec 12, 2018

martinklepsch
Member

Initial groundwork for #184 and general revisit of cljdoc's devops tooling.

In short this PR:

  • Defines a new CentOS based image with Nomad, Consul and Docker.
  • Adds Clojure code to deploy cljdoc to Nomad via its HTTP API
  • Uses Traefik as reverse proxy and for service discovery via Consul
  • Implements a zero-downtime swap when deploying new versions of cljdoc

Take a look at the new readme ops/README.adoc, the code responsible for deploying to Nomad and the decision record for this change.
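
To make the zero-downtime swap concrete, here is a minimal sketch in Python of how a blue/green rollout could be sequenced. It is not the code in this PR (that is Clojure talking to Nomad's HTTP API). The `next_color` and `plan_swap` helpers, the step names, and the `cljdoc-<color>` job naming are all hypothetical and only illustrate the ordering.

```python
from dataclasses import dataclass
from typing import List

COLORS = ("blue", "green")


def next_color(live: str) -> str:
    """Return the idle color, i.e. the one that is not serving traffic."""
    if live not in COLORS:
        raise ValueError(f"unknown color: {live!r}")
    return "green" if live == "blue" else "blue"


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical step name, not a Nomad API call
    target: str  # hypothetical job name of the form "cljdoc-<color>"


def plan_swap(live: str) -> List[Step]:
    """Order the steps so the old instance stops only after the new one is healthy."""
    idle = next_color(live)
    return [
        Step("start", f"cljdoc-{idle}"),
        Step("wait-until-healthy", f"cljdoc-{idle}"),
        Step("route-traffic-to", f"cljdoc-{idle}"),
        Step("stop", f"cljdoc-{live}"),
    ]
```

The point of the ordering is that traffic is only switched once the new instance reports healthy, so there is never a moment without a running backend.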

["traefik.tags=cljdoc"
"traefik.frontends.blue.rule=Host:test.cljdoc.org"
"traefik.frontends.blue.rule=Host:test.cljdoc.xyz"
"traefik.frontends.blue.rule=PathPrefix:/"
Member Author

Adjustments needed here for production.
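
One way the production adjustment could look is to derive the tags from a host name instead of hard-coding the test domains. The sketch below is Python rather than the Clojure used in the PR, and `traefik_tags` is a hypothetical helper. It only reproduces the tag shapes visible in the snippet above and does not verify how Traefik treats repeated `rule` keys.

```python
from typing import List


def traefik_tags(color: str, host: str) -> List[str]:
    """Build tag strings in the same shape as the snippet above, for a single host."""
    frontend = f"traefik.frontends.{color}"
    return [
        "traefik.tags=cljdoc",
        f"{frontend}.rule=Host:{host}",
        f"{frontend}.rule=PathPrefix:/",
    ]
```

With this, a test deployment would pass `test.cljdoc.org` and production would pass its own host, leaving the rest of the job definition unchanged.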

@danielcompton
Contributor

The ADR explains the motivations well. I do worry a little bit about how much extra complexity this will add. The nginx + bash scripts route has the advantage of being easier up front as it is made up of things people understand well, though you make a good point about bit-rot.

Most of cljdoc is read-only. If there were a caching CDN (or even just nginx) in front that could serve URLs out of the cache, that would seem to take away most of the downsides of more frequent deployments, though it wouldn't be as complete a solution as what you're describing here.

workflows:
  version: 2
  build-and-deploy:
    jobs:
      - build
      - prettier
      - docker-deploy:
          requires:
            - build
Contributor

This needs a branch filter like deploy I think?

Member Author

The Docker tag we use contains the branch name if it isn't master, but yeah, we probably just shouldn't push to Docker unless we're on master.
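
A rough sketch of the two rules discussed here, in Python for illustration only. The actual tagging lives in the CI configuration, and the `<version>-<branch>` tag format and both function names are assumptions, not what this PR implements.

```python
def docker_tag(version: str, branch: str) -> str:
    """Append the branch name to the tag unless building master (assumed format)."""
    if branch == "master":
        return version
    # Docker tags may not contain "/", so replace it in branch names.
    safe_branch = branch.replace("/", "-")
    return f"{version}-{safe_branch}"


def should_push(branch: str) -> bool:
    """Only master builds are pushed to the registry."""
    return branch == "master"
```

Keeping the push decision separate from the tag name means a branch build can still produce a uniquely tagged local image without ever publishing it.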

@@ -65,6 +78,15 @@ resource "aws_route53_record" "dokku" {
records = ["167.99.133.5"]
}

resource "aws_route53_record" "test_xyz" {
Contributor

We probably don't need to add a record for test.cljdoc.xyz, if it's never been used before?

Member Author

That's right, I used this mostly for testing and will remove it again eventually.

@danielcompton
Contributor

I mostly just clicked the request changes button to see if they'd changed the icon, as the UI looked a little different :). This seems pretty reasonable after looking over the code and seeing the explanation. It's a little bit terrifying seeing all of the extra things coming along for the ride (Nomad, Consul, Traefik), but I think the end goal of automatically deploying to production is a worthy one.

One other thought I had is about the database. It has been a bit awkward to do this kind of work because we have a stateful sqlite db on the application server. If we had Postgres on a separate server then we could easily recycle app servers without losing any state, though that adds its own set of complexity... This seems pretty good to me.

@martinklepsch
Member Author

It's a little bit terrifying seeing all of the extra things coming along for the ride (Nomad, Consul, Traefik)

I was equally terrified when I started going down this road, but Nomad and Consul in particular have proven very predictable and are exceptionally well documented. I'm still a little worried about Traefik, as it seems to have various minor bugs, but it's working for now. Various people mentioned fabio, but it doesn't handle SSL certs automatically, so we'd lose some of the niceties of Traefik in that regard.

One other thought I had is about the database.

This crossed my mind a few times while working on this too. I still very much like the single-instance setup that we have now (i.e. avoiding distributed systems problems), but if we later decide we want a separate instance with a database, the Nomad/Consul setup will only make this easier.

The nginx + bash scripts route has the advantage of being easier up front as it is made up of things people understand well, though you make a good point about bit-rot.

Besides bit-rot I also found this setup much easier to test. Basically you just add another server_instance:

module "test_server" {
  source     = "./server_instance"
  # ...
}

And then run cljdoc.deploy against the IP of this instance. This is what I did with the test. instance and it allows devs to spin up a new environment in a production setting (SSL and all) in a matter of minutes.


PS. I realize that a lot of what I'm writing is fairly argumentative despite not having to convince anybody. As you might guess, I'm fairly invested in this particular approach just by the sheer amount of time I spent on it. If you (not just Daniel) think this is going in the wrong direction, please make yourself heard. I'm also available on Slack as @martinklepsch.

@martinklepsch martinklepsch force-pushed the blue-green branch 2 times, most recently from 1e122f3 to 1048ff3 Compare December 6, 2018 16:41
@danielcompton
Contributor

PS. I realize that a lot of what I'm writing is fairly argumentative despite not having to convince anybody. As you might guess, I'm fairly invested in this particular approach just by the sheer amount of time I spent on it.

Just saw this now, I missed your earlier comment. I didn't take your comments to be argumentative, just explaining clearly why you went this route. I think it's a good choice, I'm excited to see it get into prod!

@martinklepsch martinklepsch deleted the blue-green branch March 22, 2019 15:01