Created by Stephen McDonald
virtualboxing is a set of utilities for comparing timings on bulk operations across various databases in a distributed environment. It consists of routines for controlling multiple VirtualBox instances and implementations of some initial operations for MongoDB and Riak.
The benchmarking performed by virtualboxing is in no way scientific or conclusive. It was created to explore distributed database setups and their libraries, and to compare operations across them.
Getting everything set up involves creating a VirtualBox VM, configuring it, and then cloning it for as many instances as you'd like to use. Some manual configuration of each instance is also required. Here are the steps involved, with Ubuntu used as the VM's OS:
- Create an initial VirtualBox VM
- Install Ubuntu
- Install VirtualBox Guest Additions
- Configure key-based SSH authentication
- Install MongoDB
- Install Riak
Once your base VM is set up, clone its hard disk once for each extra VM you'd like to use, then create a new VM from each cloned disk.
$ cd ~/.VirtualBox/HardDisks/
$ VBoxManage clonehd original.vdi another.vdi
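When several clones are needed, the clonehd step can be scripted. The following is a minimal Ruby sketch and is not part of virtualboxing itself: the helper name clone_commands and the clone file names are hypothetical, and the commands are only built as strings, not executed.

```ruby
require "shellwords"

# Hypothetical helper: build one `VBoxManage clonehd` command per clone.
# Nothing is executed here; the commands are returned as strings.
def clone_commands(original, clones)
  clones.map do |clone|
    ["VBoxManage", "clonehd", original, clone].shelljoin
  end
end

# The clone file names below are illustrative only.
commands = clone_commands("original.vdi", ["red.vdi", "blue.vdi"])
puts commands
```

Each resulting string can be pasted into a shell from the HardDisks directory, or passed to system once you're happy with it.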
You'll then need to configure the host names for Riak on each VM. This should be as simple as adding the host name as the node name in /etc/riak/vm.args. For example, if the host's IP is 192.168.1.80:
-name riak@192.168.1.80
Also set it as the host for HTTP and Protocol Buffers in /etc/riak/app.config:
{http, [ {"192.168.1.80", 8098 } ]},
...
{pb_ip, "192.168.1.80" }
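Since the same three settings are repeated on every VM with only the IP changing, it can help to generate them. This is a rough Ruby sketch under that assumption; the helper riak_settings is hypothetical, and it only renders the lines shown above rather than editing any file.

```ruby
# Hypothetical helper: render the Riak settings shown above for one IP.
# It returns the lines to place in each file; it does not touch the files.
def riak_settings(ip)
  {
    "/etc/riak/vm.args" => ["-name riak@#{ip}"],
    "/etc/riak/app.config" => [
      %({http, [ {"#{ip}", 8098 } ]},),
      %({pb_ip, "#{ip}" })
    ]
  }
end

riak_settings("192.168.1.81").each do |path, lines|
  puts "# #{path}"
  puts lines
end
```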
It's also recommended to SSH into each VM once from your host machine, so that you're prompted to add the VM to your known hosts.
All configuration is provided via config.yml. The main configuration required is the name of each VM (as entered when creating the VMs in VirtualBox Manager) and its IP address. Here's the example config.yml provided:
# VirtualBox VM names mapped to their IP addresses.
vms:
  Ubuntu Green: 192.168.1.80
  Ubuntu Red: 192.168.1.81
  Ubuntu Blue: 192.168.1.70
# SSH username for each of the VMs. SSH key authentication is assumed.
ssh-user: steve
# Number of processes to fork when benchmarking.
processes: 10
# Number of records to create PER PROCESS when filling with data.
records: 10000
# Database name to use for Mongo.
mongo-db-name: steve
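To illustrate how these settings fit together, here is a small Ruby sketch that parses an inline copy of the example configuration with the standard yaml library. It is not taken from run.rb, whose actual loading code may differ; the variable names are assumptions made for the example.

```ruby
require "yaml"

# Inline copy of the example config.yml, used only for illustration.
config = YAML.safe_load(<<~YML)
  vms:
    Ubuntu Green: 192.168.1.80
    Ubuntu Red: 192.168.1.81
    Ubuntu Blue: 192.168.1.70
  ssh-user: steve
  processes: 10
  records: 10000
  mongo-db-name: steve
YML

# records is a per-process count, so the total is the product of the two.
total_records = config["processes"] * config["records"]

# One SSH target per VM, built from the shared user name and each IP.
user = config["ssh-user"]
ssh_targets = config["vms"].values.map { |ip| "#{user}@#{ip}" }

puts "Total records: #{total_records}"
puts ssh_targets
```

With the example values, 10 processes each creating 10000 records gives 100000 records in total.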
Ensure you have the required Ruby libraries installed using Bundler:
$ bundle install
The benchmarks themselves are run via the run.rb script:
$ ./run.rb
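For a feel of what the processes and records settings mean in practice, the sketch below times a bulk operation spread across forked processes. It is an assumption about the general approach, not run.rb's actual implementation: fake_bulk_insert is a hypothetical stand-in for a real database call, and timed_fork is a name invented for this example.

```ruby
require "benchmark"

# Hypothetical stand-in for a bulk database operation on `records` items.
def fake_bulk_insert(records)
  records.times.reduce(0) { |sum, i| sum + i }
end

# Fork `processes` children, have each handle `records` items,
# and return the wall-clock time until every child has exited.
def timed_fork(processes, records)
  Benchmark.realtime do
    pids = Array.new(processes) { fork { fake_bulk_insert(records) } }
    pids.each { |pid| Process.wait(pid) }
  end
end

elapsed = timed_fork(4, 1_000)
puts format("4 processes x 1000 records: %.3fs", elapsed)
```

Because fork is used, this sketch only runs on platforms that support it, such as Linux and macOS.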
While it runs, the riak-admin member_status command on the first node is of particular interest. Combined with watch, it lets you monitor the Riak cluster as it forms and is torn down:
$ ssh steve@192.168.1.80
$ watch -n .2 riak-admin member_status
The following items are suggested for further exploration:
- Trial different database configurations, particularly for MongoDB, which is renowned for fast yet unsafe defaults
- Use a load balancer as the entry point for connecting to the Riak cluster
- Set up MongoDB Replica Sets
- Benchmark different storage backends for Riak
- Benchmark the existing tests with indexes created
- Benchmark searching
- Benchmark deleting