The goal of this comparison is to find out how modern Ruby application servers perform when serving a simple Rack application.
The Ruby application computes the sum of a range of prime numbers, fetched by the Prime standard library.
The number of primes to sum is configurable via an HTTP parameter, so that the computation time can be stretched.
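The application described above could look roughly like the following. This is a minimal sketch, not the code used for the benchmarks: the class name `PrimeSumApp`, the helper `prime_sum` and the default count are assumptions, and only the `count` parameter name comes from the benchmark URL. The query string is parsed with the standard library so the sketch does not depend on any gem.

```ruby
require 'prime'
require 'uri'

# Hypothetical Rack application: responds with the sum of the first `count` primes.
class PrimeSumApp
  # Assumed default, used when the `count` parameter is missing.
  DEFAULT_COUNT = 100

  # Rack entry point: receives the environment hash and returns
  # the [status, headers, body] triplet.
  def call(env)
    params = URI.decode_www_form(env['QUERY_STRING'].to_s).to_h
    count  = [params.fetch('count', DEFAULT_COUNT).to_i, 0].max
    body   = prime_sum(count).to_s

    headers = {
      'content-type'   => 'text/plain',
      'content-length' => body.bytesize.to_s
    }
    [200, headers, [body]]
  end

  # Sums the first `count` primes; `reduce` is used instead of `Array#sum`
  # to stay compatible with Ruby 2.3.
  def prime_sum(count)
    Prime.first(count).reduce(0, :+)
  end
end
```

With a `config.ru` containing `run PrimeSumApp.new`, a request to `/?count=10` would return `129`, the sum of the first ten primes.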
Ruby 2.3 is used for all of the tests.
JRuby 9.1.2.0 is used to test the Puma application server, in order to compare the thread-pool model with the pre-forking one.
I focused only on standalone Ruby server solutions: no external balancers or reverse proxies.
For this reason I removed Thin from the pack, since it does not include a balancer for the spawned processes.
The pack includes:
Puma is a concurrent application server crafted by Evan Phoenix.
It extends the original idea behind the Mongrel HTTP parser to make it compatible with the Rack era.
Puma offers both the thread-pool and the pre-forking models to provide parallelism on MRI and JRuby.
bundle exec puma -w 7 --preload
jruby --server -S bundle exec puma
Phusion Passenger is the only Ruby application server in the pack that is also available as a commercial solution (the Enterprise version).
Passenger supports both the pre-forking and thread-pool models; the latter is available only in the commercial version (not tested).
In pre-forking mode it automatically spawns new processes on demand (no need to specify the number of workers).
bundle exec passenger start -p 9292
Unicorn is an application server using the pre-forking process model to elegantly delegate most of the load balancing to the underlying operating system.
It has proven to be a reliable deployment option for large Rails applications (e.g. GitHub).
bundle exec unicorn -c config/unicorn.rb
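The contents of `config/unicorn.rb` are not shown here. As a rough idea of what such a file can contain, here is a minimal sketch using Unicorn's configuration DSL. Every value is an assumption: the worker count simply mirrors the `-w 7` passed to Puma, and the port matches the one targeted by the benchmark command.

```ruby
# config/unicorn.rb (hypothetical sketch, not the file used for the benchmarks)

# Number of pre-forked worker processes.
# Assumption: mirrors the 7 workers passed to Puma.
worker_processes 7

# TCP port to listen on; 9292 is the port hit by the benchmark command.
listen 9292

# Load the application in the master process before forking the workers.
preload_app true

# Kill workers that stay busy for longer than this many seconds (assumed value).
timeout 30
```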
I ran these benchmarks on a 15-inch MacBook Pro (late 2011) with these specs:
- OS X El Capitan
- 2.2 GHz Intel Core i7 (4 cores)
- 8 GB 1333 MHz DDR3
I measured memory peak consumption by using Xcode's Instruments.
I used wrk as the load-testing tool.
I measured each application server three times, picking the best run.
The following command is used:
wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?count=1000
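Picking the best of three runs can be automated. The sketch below assumes that the text printed by each wrk run has been captured as a string, and that "best" means the highest value on the `Requests/sec:` line of wrk's summary. The helper names `requests_per_sec` and `best_run` are hypothetical.

```ruby
# Extracts the throughput from the summary wrk prints at the end of a run.
# Returns a Float, or nil when no "Requests/sec:" line is found.
def requests_per_sec(wrk_output)
  match = wrk_output.match(%r{^Requests/sec:\s+([\d.]+)})
  match && match[1].to_f
end

# Returns the captured output with the highest throughput.
# Runs without a parsable throughput are ranked last.
def best_run(outputs)
  outputs.max_by { |output| requests_per_sec(output) || 0.0 }
end

# Possible usage, assuming wrk is installed and the server is listening:
#
#   command = 'wrk -t 4 -c 100 -d 30s --timeout 2000 http://127.0.0.1:9292/?count=1000'
#   outputs = Array.new(3) { `#{command}` }
#   puts best_run(outputs)
```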
App server | Throughput (req/s) | Latency in ms (avg/stdev/max) | RAM peak (MB) |
---|---|---|---|
Unicorn | 548.71 | 41.66/24.76/207.39 | ~183 |
Passenger | 10036.23 | 9.95/1.35/36.67 | ~138 |
Puma (MRI) | 27442.68 | 3.43/1.82/73.06 | ~226 |
Puma (JVM) | 30372.77 | 0.51/0.11/9.83 | 531.69 |
No crashes were recorded during the benchmarks.
When HTTP pipelining is enabled, Puma outperforms the other application servers by a large margin.
Passenger was simply not able to perform on par with Puma, although its latency is more stable than that of Puma on MRI (lower standard deviation and maximum).
Unicorn does not seem to support HTTP keep-alive in standalone mode, which would explain why its throughput is so disappointing.
Memory consumption seems to rise with throughput: Passenger and Unicorn, the two slowest servers, are the least memory-hungry, followed by Puma on MRI and, with a large gap, Puma on the JVM.
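One way to put throughput and memory side by side is to divide the measured requests per second by the RAM peak. The sketch below does that with the figures from the table. The approximate RAM values are taken at face value, and the ratio itself is only an illustrative metric, not something measured during the benchmarks.

```ruby
# Figures copied from the results table (RAM peaks marked "~" are approximate).
RESULTS = {
  'Unicorn'    => { throughput: 548.71,   ram_mb: 183.0 },
  'Passenger'  => { throughput: 10036.23, ram_mb: 138.0 },
  'Puma (MRI)' => { throughput: 27442.68, ram_mb: 226.0 },
  'Puma (JVM)' => { throughput: 30372.77, ram_mb: 531.69 }
}.freeze

# Returns a hash mapping each server to its requests per second per MB
# of peak RAM, rounded to one decimal place.
def requests_per_mb(results)
  results.map do |server, figures|
    [server, (figures[:throughput] / figures[:ram_mb]).round(1)]
  end.to_h
end

# Print the servers from the most to the least memory-efficient.
requests_per_mb(RESULTS).sort_by { |_, ratio| -ratio }.each do |server, ratio|
  puts format('%-12s %6.1f req/s per MB', server, ratio)
end
```

By this measure Puma on MRI comes out ahead (about 121 req/s per MB), followed by Passenger (about 73), Puma on the JVM (about 57) and Unicorn (about 3).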
All of the application servers except Unicorn depend on the Rack gem.
Puma and Passenger have no other runtime dependencies.
Passenger could run in production without any particular changes. Integration with both Nginx and Apache is a breeze thanks to the installation wizard.
Passenger provides commands to start and stop the server, while Puma relies on a separate executable (pumactl).
Unicorn's configuration is the most demanding of the bunch: it explicitly requires a configuration file, while the rest of the pack can be configured directly from the command line.
Puma on the JVM proved to be the fastest of the tested solutions, although the MRI implementation is also very close in throughput.
JRuby's latency is also better than MRI's, although the JVM confirmed itself to be a memory-hungry piece of software.