Yesterday we moved to a new queue, Shopify’s delayed_job (or Dj).
After trying a few different solutions in the early days, we settled on Ara Howard’s Bj. It was fine for quite a while, but some of the design decisions haven’t been working out for us lately. Bj allows you to spawn exactly one worker per machine – we want a machine dedicated to workers. Bj loads a new Rails environment for every job submitted – we want to load a new Rails environment one time only. Both of these decisions carry performance implications.

If we were to run one Bj per machine, we’d only have four workers running, as GitHub consists of four ultra-beefy app slices. Unlike most contemporary web apps, the fewer slices we have the better – it means fewer machines connected to our networked file system, and fewer machines create less network chatter and lock contention. As some of the jobs take a while to run (60+ seconds), four workers is a very low number. We want something like 20, but we’d settle for as few as 8.
We did hack Bj to allow multiple instances to run on a machine, but that ended up being counterproductive due to design decision #2: loading a new Rails environment for each job.
See, Rails takes a while to start up. Not only do you have to load all the associated libraries, but each require statement needs to look through the entire, massive load path – a load path that includes the Rails app, RubyGems, the Rails source code, and all of our plugins. Doing this over and over, multiple times a minute, burns a lot of CPU and takes a lot of time. In some cases, the Rails load time is 99% of the entire background job’s lifetime. Spawning a whole bunch of Bjs on a single machine meant we effectively DoS’d the poor CPU.
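To make the load path cost concrete, here is a deliberately simplified model of how `require` finds a file: check every load path entry in order until one contains the file. This is a sketch, not Ruby's real implementation (which also tries other extensions and skips features already in `$LOADED_FEATURES`). The fake file system, the directory names, the `acts_as_fast` plugin, and `resolve_feature` are all hypothetical.

```ruby
require "set"

# Hypothetical, simplified model of require resolution: probe each load path
# entry in order until "<name>.rb" is found, counting how many probes it took.
def resolve_feature(name, load_path, files)
  probes = 0
  load_path.each do |dir|
    probes += 1
    candidate = File.join(dir, "#{name}.rb")
    return [candidate, probes] if files.include?(candidate)
  end
  [nil, probes] # not found anywhere: every entry was probed
end

# Fake file system: the app's lib dir first, 50 gem dirs, a plugin last.
load_path = ["app/lib"] + (1..50).map { |i| "gems/gem#{i}/lib" } +
            ["vendor/plugins/acts_as_fast/lib"]
files = Set["app/lib/user.rb", "vendor/plugins/acts_as_fast/lib/acts_as_fast.rb"]

path, probes = resolve_feature("acts_as_fast", load_path, files)
puts "#{path} found after #{probes} probes"
# prints "vendor/plugins/acts_as_fast/lib/acts_as_fast.rb found after 52 probes"
```

A plugin at the end of a 52-entry load path costs 52 probes. A fresh Rails boot repeats that kind of search for every file it requires, and a worker that boots Rails per job repeats the whole boot every time.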
I started working on a solution, but it was at this point we realized we were doing something wrong. These are not flaws in Bj, they are design decisions – these two ideas make Bj a pleasure to work with and perfect for simple sites. It’s working great on FamSpam. We had simply outgrown it, and hacking Bj would have been error-prone and time-consuming. Luckily, we had seen people praising Dj in the past and a solid recommendation from technoweenie was all we needed.
The transition took about an hour and a half – from installing the plugin to successfully running Dj on the production site, complete with local and staging trial runs (and bug fixes). Because we had changed queues so many times in the past, we were already using a simple interface for submitting jobs.
RockQueue meant we didn’t have to change any application code, just infrastructure. I highly recommend an abstraction like this for vendor-specific APIs that would normally be littered all throughout your app, as changing vendors can become a major pain.
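The post doesn't show RockQueue's interface, so the following is only a guess at the general pattern it describes: application code talks to one small facade, and each queue vendor sits behind an adapter. `JobQueue`, `MemoryBackend`, `InlineBackend`, and `GreetJob` are hypothetical names, not RockQueue's API.

```ruby
# Hypothetical queue-agnostic facade in the spirit of RockQueue.
# None of these names come from RockQueue itself; they are illustrative.
module JobQueue
  class << self
    attr_accessor :backend

    # Application code only ever calls JobQueue.push(JobClass, *args).
    def push(job_class, *args)
      backend.enqueue(job_class, args)
    end
  end

  # Runs the job immediately; handy in tests or development.
  class InlineBackend
    def enqueue(job_class, args)
      job_class.perform(*args)
    end
  end

  # Stores jobs in an array; stands in for a real vendor adapter.
  class MemoryBackend
    attr_reader :jobs

    def initialize
      @jobs = []
    end

    def enqueue(job_class, args)
      @jobs << [job_class, args]
    end
  end
end

# Example job: any class with a class-level perform method.
class GreetJob
  def self.perform(name)
    "hello, #{name}"
  end
end

JobQueue.backend = JobQueue::MemoryBackend.new
JobQueue.push(GreetJob, "octocat")
```

Switching vendors then means writing one more adapter and changing the `JobQueue.backend =` line, while every `JobQueue.push` call in the app stays put. A delayed_job adapter, for instance, might wrap `Delayed::Job.enqueue`, which accepts an object that responds to `perform`.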
Anyway, Dj lets us spawn as many workers on a machine as we want. They’re just rake tasks running a loop, after all. It deals with locking and retries in a simple way, and works much like Bj. The queue is much faster now that we don’t have to pay the Rails startup tax.
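The worker shape described above (claim a job under a lock, run it, retry on failure) can be modeled in plain Ruby. This is a sketch under loud assumptions, not delayed_job's code: an in-memory array guarded by a `Mutex` stands in for Dj's database table, threads stand in for separate rake processes, and `MemoryQueue`, `run_worker`, and `MAX_ATTEMPTS = 3` are hypothetical.

```ruby
# Hypothetical in-memory model of a Dj-style worker loop. A real Dj worker
# keeps job state in the database and polls forever; this one stops when
# the queue is drained so the example terminates.
MAX_ATTEMPTS = 3 # assumed retry limit

Job = Struct.new(:id, :payload, :attempts, :locked_by)

class MemoryQueue
  attr_reader :done, :dead

  def initialize(jobs)
    @jobs = jobs
    @done = []
    @dead = []
    @mutex = Mutex.new
  end

  # Atomically claim the first unlocked job so no two workers run it at once.
  def claim(worker)
    @mutex.synchronize do
      job = @jobs.find { |j| j.locked_by.nil? }
      job.locked_by = worker if job
      job
    end
  end

  def empty?
    @mutex.synchronize { @jobs.empty? }
  end

  def finish(job)
    @mutex.synchronize do
      @jobs.delete(job)
      @done << job
    end
  end

  # Unlock a failed job for another try, or give up after MAX_ATTEMPTS.
  def release_failed(job)
    @mutex.synchronize do
      job.attempts += 1
      job.locked_by = nil
      if job.attempts >= MAX_ATTEMPTS
        @jobs.delete(job)
        @dead << job
      end
    end
  end
end

def run_worker(queue, worker)
  loop do
    job = queue.claim(worker)
    if job.nil?
      break if queue.empty? # nothing left anywhere
      sleep 0.01            # other workers hold the remaining jobs
      next
    end
    begin
      job.payload.call
      queue.finish(job)
    rescue StandardError
      queue.release_failed(job)
    end
  end
end

jobs = (1..5).map { |i| Job.new(i, -> { i * i }, 0, nil) }
jobs << Job.new(99, -> { raise "boom" }, 0, nil) # always fails
queue = MemoryQueue.new(jobs)

workers = (1..4).map { |n| Thread.new { run_worker(queue, "worker.#{n}") } }
workers.each(&:join)
```

Because the loop lives inside one long-lived process, anything loaded before entering it (Rails, in Dj's case) is paid for once rather than per job, which is the startup tax mentioned above.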
We now have a single machine dedicated to running background tasks. We’re running 20 Dj workers on it with great success. There is no science behind this number.
Since people have already started asking “why didn’t you use queue X” or “you should use queue Y,” it seems reasonable to address that: we were very happy with Bj and wanted a similar system, albeit with a few different opinions. Dj is that system. It is simple, required no research beyond the short README, works wonderfully with Rails, is fast, is hackable, solves both the queue and the worker problems, and has no external dependencies. Also, it’s hosted on GitHub!
Dj is our 5th queue. In the past we’ve used SQS, ActiveMQ, Starling, and Bj. Dj is so far my favorite.
In a future post I’ll discuss the ways in which we use (and abuse) our queue. Count on it.


Awesome, love to see good solutions to interesting problems.
I’d be interested to see what your thoughts and critiques of the other queues were (SQS, ActiveMQ, and Bj). I know I saw a tweet saying ActiveMQ specifically wasn’t a complete system because it left the workers entirely up to you… if I recall correctly.
Great stuff, thanks for the write-up. I used a combination of the Ruby daemons gem and god for worker queues on Exceptional and Qwitter, will definitely have to look into Dj.
Oh… no Starling? Got half a minute to elaborate on why you moved away from it?
It’s sort of on topic. I’ve been poking around with http://memcachedb.org/memcacheq/ for about half a day.
It’s nice to have a queue that speaks memcache, though it is a little rough around the edges; it could have taken more than an hour and a half to get going, depending on how things went.
I’ve gotten really excited about queues recently, I’ll have to give Dj a look. Thanks for pointing it out.
I’m interested to see your comments on SQS and ActiveMQ. RabbitMQ is also interesting (and using nanite with it).
Great post! Definitely going to watch Dj if I ever need to use it.
I still strongly disagree that memcached support in a job queue is a good idea.
What github’s done that is really good for them is abstracted away from the API. That makes the technology a bit less important. dj works well for them now, but if they decide it doesn’t, they can evaluate something else without being afraid of the impact of implementing it everywhere.
Did you give Beanstalk a try as well? I’ve heard nothing but praise about it…
What don’t you guys like about Starling? I’ve been using it and it seems to work pretty well.
Great, always interesting to see more talk about queues. I too would love to see your comments on all of the previous queue systems you have tried.
Did you guys write RockQueue? If so, can you release the source?