
Better late than never, right? As we get ready to upgrade our servers I thought it’d be a good time to upgrade our deployment process. Currently pushing out a new version of GitHub takes upwards of 15 minutes. Ouch. My goal: one-minute deploys (excluding server restart time).

We currently use Capistrano with a 400 line deploy.rb file. Engine Yard provides a handful of useful Cap tasks (in gem form) that we use along with many of the built-in features. We also use the fast_remote_cache deployment strategy and have written a handful (400 lines or so) of our own tasks to manage things like our service hooks or SVN importer.

As you may know, Capistrano keeps a releases directory where it creates timestamped versions of your app. All your daemons and processes then assume your app lives under a directory called current which is actually a symlink to the latest timestamped version of your app in releases. When you deploy a new version of your app, it’s put into a new timestamped directory under releases. After all the heavy lifting is done the current symlink is switched to it.
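To make that layout concrete, here is a minimal local sketch of the release-then-symlink pattern using only Ruby's standard library. The `deploy_release` helper, the timestamps, and the `app.txt` file are hypothetical illustrations, not Capistrano's actual implementation.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: create a timestamped release directory under
# <root>/releases, then point <root>/current at it.
def deploy_release(root, timestamp, contents)
  release = File.join(root, "releases", timestamp)
  FileUtils.mkdir_p(release)
  File.write(File.join(release, "app.txt"), contents)

  current = File.join(root, "current")
  # Drop the old symlink (if any) before pointing at the new release.
  File.unlink(current) if File.symlink?(current)
  File.symlink(release, current)
  release
end

# Deploy two made-up releases into a scratch directory and report
# what ends up on disk.
def demo_releases
  Dir.mktmpdir do |root|
    deploy_release(root, "20090301120000", "v1")
    deploy_release(root, "20090302120000", "v2")
    {
      releases: Dir.children(File.join(root, "releases")).sort,
      current:  File.read(File.join(root, "current", "app.txt"))
    }
  end
end
```

Both releases stay on disk, and only the symlink moves. That is what makes rollbacks cheap in this model, and also what makes every deploy copy a full tree.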

Which was really great. Before Git. So I went digging.

First I investigated Vlad the Deployer, the Capistrano alternative in Ruby. I like that it’s built on Rake but it seems to make the same assumptions as Capistrano. Basically both of these tools are modular and built in such a way that they work the same whether you’re using Subversion, Perforce, or Git. Which is great if you’re using SVN but unfortunate if you’re using Git.

For example, this is from Vlad’s included Git deployment strategy:

When you deploy a new copy of your app, Vlad removes the existing copy and does a full clone to get a new version. Capistrano does something similar by default but has a bundled “remote_cache” strategy that is a bit smarter: it caches the Git repo and does a fetch then a reset. It still has to then copy the updated version of your app into a timestamped directory and switch the symlink, but it’s able to cut down on time spent pulling redundant objects. It even knows about the depth option.

The next thing I looked at was Heroku’s rush. It lets you drive servers (even clusters of them) using Ruby over SSH, which looked very promising. Maybe I’d write a little git-deploy script based on it.

Unfortunately for me rush needs to be installed on every server you’re managing. It also needs a running instance of rushd. Which makes sense – it’s a super powerful library – but that wouldn’t work for deploying GitHub.

Fabric is a library I first heard about back in February. It’s like Capistrano or Vlad but with more emphasis on being a framework/tool for remote management of servers. Easy deployment scripts are just a side effect of that mentality.

It’s very powerful and after playing with it for a while I was extremely pleased. I’ll definitely be using it in all my Python projects. However, I wasn’t looking forward to porting all our custom Capistrano tasks to Python. Also, though I love Python, we’re mostly a Ruby shop and everyone needs to be able to add, debug, and modify our deploy scripts with ease.

Playing with Fabric did inspire me, though. Capistrano is basically a tool for remote server management, too, if you think about it. We may have outgrown its ideas about deployment but I can always write my own deployment code using Capistrano’s SSH and clustering capabilities. So I did.

It turned out to be pretty easy. First I created a config/deploy directory and started splitting up the deploy.rb into smaller chunks:

$ ls -1 config/deploy
gem_eval.rb
import.rb
notify.rb
queue.rb
services.rb
settings.rb
sudo_everywhere.rb
symlinks.rb

Then I pulled them in. Careful here: Capistrano overrides both load and require so it’s probably best to just use load.
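A minimal sketch of what pulling the files in could look like, assuming the config/deploy layout listed above. The `load_deploy_files` helper is hypothetical, and outside of Capistrano the `load` it calls is plain `Kernel#load` rather than Capistrano's overridden version.

```ruby
# Hypothetical helper: load every Ruby file in the deploy directory in
# alphabetical order and return the list of files that were loaded.
def load_deploy_files(dir = "config/deploy")
  files = Dir.glob(File.join(dir, "*.rb")).sort
  # Expand to absolute paths so the lookup never depends on $LOAD_PATH.
  files.each { |file| load File.expand_path(file) }
  files
end
```

Sorting keeps the load order stable from one machine to the next, which matters if one file depends on settings defined in another.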

This separation kept the deploy.rb and each specific file small and focused.

Next I thought about how I’d do Git-based deployment. Not too different from Capistrano’s remote_cache, really. Just get rid of all the timestamp directories and have the current directory contain our clone of the Git repo. Do a fetch then reset to deploy. Rollback? No problem.
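Here is a rough sketch of the shell commands that idea boils down to, written as plain Ruby string builders. The helper names, the `origin` remote, and the `origin/master` default are assumptions for illustration. In a Capistrano task, strings like these would be handed to `run`.

```ruby
# Hypothetical command builders; remote and branch names are
# illustrative assumptions, not taken from the real deploy.rb.

# One-time setup: the `current` directory is itself the clone.
def setup_command(repository, current_path)
  "git clone #{repository} #{current_path}"
end

# Deploy: fetch new objects, then hard-reset the working tree to the
# target ref. No copying, no timestamped directories.
def update_command(current_path, ref = "origin/master")
  "cd #{current_path} && git fetch origin && git reset --hard #{ref}"
end

# Rollback: the same kind of reset, aimed one commit behind HEAD.
def rollback_command(current_path)
  "cd #{current_path} && git reset --hard HEAD^"
end
```

Since a deploy is just a reset to a ref, a rollback is the same operation pointed at an older ref.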

The best part is that because Engine Yard’s gemified tasks and our own code both call standard Capistrano tasks like deploy and deploy:update, we can just replace them and not change the dependent code.

Here’s what our new deploy.rb looks like. Well, the meat of it at least:

Great. I like this – very Gitty and simple. But copying and removing directories wasn’t the only slow part of our deploy process.

Every Capistrano task you run adds a bit of overhead. I don’t know exactly why, but I imagine each task opens a fresh SSH connection to the necessary servers. Maybe. Either way, the fewer tasks you run the better.

We were running about eight symlink related tasks during each deploy. Config files and cache directories that only live on the server need to be symlinked into the app’s directory structure after the reset. Cutting these actions down to a single task made everything much, much faster.
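As a sketch of the idea, assuming a shared directory that holds the server-only files, the individual symlink steps can be folded into one shell command that a single task runs. The `symlink_command` helper and the example paths are hypothetical.

```ruby
# Hypothetical helper: build one shell command that recreates every
# shared symlink, so a single remote invocation replaces many tasks.
def symlink_command(shared_path, current_path, paths)
  paths.map { |path|
    # Remove whatever the reset left behind, then link to the shared copy.
    "rm -rf #{current_path}/#{path} && " \
      "ln -s #{shared_path}/#{path} #{current_path}/#{path}"
  }.join(" && ")
end
```

Chaining with `&&` means the command stops at the first failure instead of leaving the app half-linked.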

Here’s our symlinks.rb:

Finally, bundling CSS and JavaScript. I’d like to move us to Sprockets but we’re not using it yet and this adventure is all about speeding up our existing setup.

Since the early days we’ve been using Uladzislau Latynski’s jsmin.rb to minimize our JavaScript. Our Cap task looked something like this:

Spot the problem? We’re minimizing the JS locally, on every deploy, then uploading it to each server individually. We also do this same process for Gist’s JavaScript and the CSS (using YUI’s CSS compressor). So with N servers, this is basically happening 3N times on each deploy. Yowza.

Solution? Do the minimizing and bundling on the servers. The beefy, beefy servers:

As long as the bundle Rake tasks don’t need to load the Rails environment (which ours don’t), this is much faster.
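A small sketch of the difference, under stated assumptions: the `bundle:all` Rake task name and both helpers are hypothetical, and the bundle count of three comes from the GitHub JS, Gist JS, and CSS bundles mentioned above.

```ruby
# Hypothetical helper: one command, run once on each server, that
# builds the bundles in place. The Rake task name is an assumption.
def bundle_command(current_path, rake_task = "bundle:all")
  "cd #{current_path} && rake #{rake_task}"
end

# For comparison: bundling locally means one upload per bundle per
# server on every deploy.
def local_upload_count(server_count, bundle_count = 3)
  server_count * bundle_count
end
```

With the remote approach nothing is uploaded at all. Each server builds its own bundles from the code it just fetched.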

Conclusion

We moved to a more Git-like deployment setup, cut down the number of tasks we run, and moved bundling and minimizing JS and CSS from our localhost to the server. Did it help?

As I said before, a GitHub deploy can take 15 minutes (not counting server restarts). My goal was to drop it down to 1 minute. How’d we do?

$ time cap production deploy
  * executing `production'
  * executing `deploy'
    triggering before callbacks for `deploy:update'
  * executing `notify:campfire'
  * executing `deploy:update'
  * executing `deploy:update_code'
    triggering after callbacks for `deploy:update_code'
  * executing `symlinks:make'
  * executing `deploy:bundle'
  * executing `deploy:restart'
  * executing `mongrel:restart'
  * executing `deploy:cleanup'

real	0m14.361s
user	0m2.049s
sys	0m0.560s

15 minutes down to 14 seconds. Not bad.
