This document is a guide for people who want to run Riak. It discusses downloading, installing, configuring, and running Riak, as well as basic client interaction.
As always, questions are welcome on the Riak mailing list, riak-users@basho.com. The latest version of this document, and other Riak information is available at http://wiki.basho.com/.
Riak can be downloaded as a pre-built, binary release for many popular platforms, or as source code, ready to build on most platforms supporting Erlang releases R14B02 and later.
Riak can be downloaded in a binary form negating the need to to install the latest Erlang and compile from source. The binary distribution of Riak doesn’t include innostore. Innostore must be compiled and installed manually. You can find out more about innostore at http://wiki.basho.com/Setting-Up-Innostore.html.
Currently Riak is offered in binary form for Solaris/OpenSolaris, Linux systems based on RPM and Deb, and as a compiled tarball for OS X.
Riak can be installed in binary form on many RPM based systems, the current target platforms are RHEL and CentOS. Riak can be download in both 64bit and 32bit binaries in rpm format here: http://downloads.basho.com/riak/CURRENT/
Note: On 32bit systems you may need to disable SE Linux if it is enabled.
Riak can be installed in binary form on many systems that use the Debian packaging system, the current target platforms are Ubuntu and Debian. Riak can be download in both 64bit and 32bit binaries in deb format here: http://downloads.basho.com/riak/CURRENT/
Riak can be downloaded and compiled from a source tarball for OS X. You can download the tarball here: http://downloads.basho.com/riak/CURRENT/
To build Riak from source, you will need Erlang/OTP version R14B02 or later installed. Erlang is available at http://erlang.org/.
The source for tagged Riak releases is available as tarballs at https://github.com/basho/riak/tarball/riak-0.14.2. Once downloaded, unpack the tarball in a convenient location.
If you prefer to follow the latest Riak development, use git to clone the public repository:
git clone https://github.com/basho/riak
Building Riak should be as simple as running make rel
in the top
level of the source directory.
$ make rel ./rebar compile generate ==> mochiweb (compile) Compiled src/mochifmt_records.erl Compiled src/mochifmt_std.erl Compiled src/mochihex.erl ...snipped.. ==> riak (compile) Compiling src/riakserver.proto Compiled src/gen_nb_server.erl Compiled src/gen_server2.erl ...snipped... ==> rel (generate) $
(The ...snipped...
lines represent several lines of similar
output, removed for display in this document.)
If no errors are printed, Riak was built successfully. See the Troubleshooting section of this document for help with build errors.
This process created an Erlang “embedded node” in rel/riak. You should now be able to run Riak from that directory, or copy the directory to any other path on this machine, or to any other machine with matching architecture. See the Installation section for more details.
If you downloaded a pre-built, binary release of Riak, or if you have made it through building the release from source, you should have an Erlang embedded node ready to run Riak in-place. No further installation is needed.
To run Riak on other machines, simply copy the entire embedded node directory to those machines. See the Configuration section for details about altering configurations for each machine.
Riak will run on a wide variety of hardware. Determining the shape of hardware your use case requires is best done by benchmarking a development system with sample data. However, a few guidelines may help narrow the playing field.
In order to provide fault tolerance, Riak stores multiple copies of each piece of data in the cluster. The number of copies is determined by the “N-value” with which each piece of data is stored. So, as a minimum bound, the cluster, as a whole, must have at least N times the average size of a piece of data (plus a little overhead) bytes of available disk space. For example, 1 million objects, average size 1 kilobyte, stored with the default N-value of 3, would require a minimum of 3 to 4 gigabytes of disk space in the cluster.
Every node in the cluster is considered equal, so data is spread across the cluster fairly evenly. This means that each node in the cluster can expect to hold an amount of data equal to the total amount of data in the cluster (N*average object size) divided by the number of nodes in the cluster. So, in the same example as before, in a cluster storing 4 gigabytes of data total, each of four nodes could expect to store 1 gigabyte of data.
Riak is not, as yet, aware of multiple disks per machine except when using the Innostore backend. When using Innostore, or when doing any other logging on a Riak node, storing the log on a different disk than the data will dramatically increase performance, as the multiple processes accessing the disk will not compete for seek time.
In general, it’s a good idea to use smaller disks, operating at higher RPM, instead of larger, slower disks. This will help minimize seek time when disk access is necessary.
More memory is always better, as usual. Plan 1 gigabyte of memory for general Riak operation. Beyond that, the ultimate goal should be to fit as much of a node’s local data in RAM as possible, because doing so will reduce the number of disk-seeks necessary to fulfill read operations. So, in the earlier example, where each node was storing 1 gigabyte of data, planning for the node to have at least 2 gigabytes of RAM for Riak to use would be a good idea.
Riak needs access to the ncurses
, libgcc
, and openssl
libraries. These are likely already in your path, but if not, you
will want to add them to the LD_LIBRARY_PATH
environment
variable.
Read performance in Riak can be improved by disabling file access time (atime) tracking for the disk or directory on which Riak’s data is stored.
On Linux, use the ‘noatime’ mount parameter in /etc/fstab
.
On Solaris using ZFS, use pfexec zfs set atime=off <pool>
.
Especially when using the Innostore backend, it may be necessary to increase the maximum allowed number of open file descriptors. Innostore currently uses one file descriptor per bucket, per partition. If you have 4 nodes in your cluster, and your cluster has 64 partitions, then each node is responsible for 16 partitions. If this same cluster stores data in 10 buckets, Innostore will use 160 (=16*10) file descriptors per node.
To set the file descriptor count on Linux or Solaris, use the
command ulimit -n <COUNT>
, where <COUNT>
is the new maximum
number of file descriptors.
To run a distributed cluster, Riak needs access to many network ports. Make sure that the following ports are open, if you are running a firewall on any of the machines in the cluster:
- 4369: Erlang Port Mapper Daemon’s (EPMD) port, for connecting Riak nodes.
- 5001-6024: Port range for the listener socket of a distributed
Erlang node. This range can be modified by specifying values for
inet_dist_listen_min
andinet_dist_listen_max
in thekernel
section of Riak’sapp.config
. The network interface on which these ports are exposed can be configured by specifying a value forinet_dist_use_interface
(see example below). - 8098: Default HTTP interface port. This port can be modified by
specifying a value for
web_port
in theriak_core
section of Riak’sapp.config
(see example below). The network interface on which this port is exposed can be configured by specifying a value forweb_ip
in theriak_core
section of Riak’sapp.config
. - 8097: Default Protocol Buffers interface port. This port can be
modified by specifying a value for
pb_port
in theriak_kv
section of Riak’sapp.config
. The network interface on which this port is exposed can be configured by specifying a value forpb_ip
in the same section (see example below).
As noted, several of these ports are configurable. The default
configuration in app.config
would look something like this:
[
{riak_core, [
{web_ip, "127.0.0.1"},
{web_port, 8098},
%% More Riak Core settings...
]},
{riak_kv, [
{pb_ip, "127.0.0.1"},
{pb_port, 8087},
%% More Riak KV settings...
]},
{kernel, [
{inet_dist_use_interface, {0, 0, 0, 0}},
{inet_dist_listen_min, 5001},
{inet_dist_listen_max, 6024}
]}
%% Other application configurations...
].
The parameters web_ip
and pb_ip
accept a string representation
of an IP address. To specify that Riak should listen on all
interfaces, use 0.0.0.0
.
The value of the inet_dist_use_interface
is a tuple of four
bytes. The example above says “listen on every interface” by
specifying an IP of 0.0.0.0
. If you wanted to specify an IP like
10.0.12.3
, you would replace the {0, 0, 0, 0}
with
{10, 0, 12, 3}
.
Parameters for the Erlang node on which Riak runs are set in the
vm.args
file in the etc
directory of the embedded Erlang node.
Most of these settings can be left at their defaults until you are
ready to tune performance.
Two settings you may be interested in right away, though, are
-name
and -setcookie
. These control the Erlang node names
(possibly host-specific), and Erlang inter-node communication
access (cluster-specific), respectively.
The format of the file is fairly loose: all lines that do not begin
with the #
character are concatentated, and passed to the erl
on the command line, as is.
More details about each of these settings can be found in the
Erlang documentation for the erl
Erlang emulator.
- -name
- the name of the Erlang node (default:
riak@127.0.0.1
)The default value,
riak@127.0.0.1
will work for running Riak locally, but for distributed (multi-node) use, the value after the@
should be changed to the IP address of the machine on which the node is running.If you have properly-configured DNS, the short-form of this name can be used (for example:
riak
). The name of the node will then beriak@Host.Domain
. - -setcookie
- the cookie of the Erlang node (default:
riak
)Erlang nodes grant or deny access based on the sharing of a previously-shared cookie. You should use the same cookie for every node in your Riak cluster, but it should be a not-easily-guessed string unique to your deployment, to prevent non-authorized access.
- -heart
- enable
heart
node monitoring (default: disabled)Heart will restart nodes automatically, should they crash. However, heart is so good at restarting nodes that it can be difficult to prevent it from doing so. Enable heart once you are sure that you wish to have the node restarted automatically on failure.
- +K
- enable kernel polling (default: true)
- +A
- number of threads in the async thread pool (default: 5)
- -env
- set host environment variables for Erlang
Riak and the Erlang applications it depends on are configured by
settings in the app.config
file in the etc
directory of the
embedded Erlang node. The format of the file is similar to
Erlang’s “.app” files:
[
{riak_kv, [
{storage_backend, riak_kv_dets_backend},
{riak_kv_dets_backend_root, "data/dets"}
%% More Riak KV settings...
]}
%% Other application configurations...
].
That is, the file starts with [
, and ends with ].
. Inside the
square brackets are comma-separated application sections of the
form {ApplictionName, [Setting1, Setting2, ...]}
. Each setting
is a 2-tuple of the form {SettingName, SettingValue}
.
#+COMMENT TODO figure out verbatim escaping: =”blah”=
ring_state_dir
- the directory on-disk in which to store the
ring state (default: =”data/dets”=)
Riak’s ring state is stored on-disk by each node, such that each node may be restarted at any time (purposely, or via automatic failover) and know what its place in the cluster was before it terminated, without needing immediate access to the rest of the cluster.
ring_creation_size
- the number of partitions to divide the
hash space into (default: 64)
By default, each Riak node will own (
ring_creation_size
)/(number of nodes in the cluster) partitions. It is generally a good idea to specify aring_creation_size
a few times the number of nodes in your cluster (e.g. specify 64-256 partitions for a 4-node cluster). This gives you room to expand the number of nodes in the cluster, without worrying about underuse due to owning too few partitions. web_ip
- the ip address on which Riak’s HTTP interface
should listen (default: =”127.0.0.1”=)
Riak’s HTTP interface will not be started if this setting is not defined.
web_port
- the port on which Riak’s HTTP interface should
listen (default:
8098
)Riak’s HTTP interface will not be started if this setting is not defined.
default_bucket_props
- properties to give each bucket, by
default
Properties in this list will override the hardcoded defaults in riak_core_bucket:defaults/0. This setting is the best way to set things like:
- the default N-value for Riak objects (
n_val
) - whether or not siblings are allowed (
allow_mult
) - the function for extracting links from objects (
linkfun
)
- the default N-value for Riak objects (
handoff_port
- TCP port number for the handoff listener (default: 8099)
handoff_concurrency
- Number of vnode, per physical node, allowed to perform handoff at once.
raw_name
- the base of the path in the URL exposing Riak’s
HTTP interface (default: =”riak”=)
The default value will expose data at
/riak/Bucket/Key
. For example, changing this setting to =”bar”= would expose the interface at/bar/Bucket/Key
. pb_ip
- the IP address that the Riak Protocol Buffers interface
will bind to (default:
127.0.0.1
)If this is undefined, the interface will not run.
pb_port
- the TCP port that the Riak Protocol Buffers interface
will bind to (default:
8087
) storage_backend
- module name of the storage backend that
Riak should use (default:
riak_kv_bitcask_backend
)The storage format Riak uses is configurable. Riak will refuse to start if no storage backend is specified.
Available backends, and their additional configuration options are:
riak_kv_dets_backend
- data is stored in DETS files
riak_kv_dets_backend_root
- root directory where DETS files are stored (default: “data/dets”)
riak_kv_ets_backend
- data is stored in ETS tables (in-memory)
riak_kv_gb_trees_backend
- data is stored in Erlang gb_trees (in-memory)
riak_kv_fs_backend
- data is stored in binary files on the
filesystem
riak_kv_fs_backend_root
- root directory where files are stored
riak_kv_multi_backend
- enables storing data for different
buckets in different backends
Specify the backend to use for a bucket with
riak_core_bucket:set_bucket(BucketName, [{backend, BackendName}])
multi_backend_default
- default backend to use if
none is specified for a bucket (one of the
BackendName
atoms specified in themulti_backend
setting) multi_backend
- list of backends to provide
Format of each backend specification is
{BackendName, BackendModule, BackendConfig}
, whereBackendName
is any atom,BackendModule
is the name of the Erlang module implementing the backend (the same values you would provide asstorage_backend
settings), andBackendConfig
is a parameter that will be passed to thestart/2
function of the backend module.
riak_kv_cache_backend
- a backend that behaves as an
LRU-with-timed-expiry cache
riak_kv_cache_backend_memory
- maximum amount of memory to allocate, in megabytes (default: 100)
riak_kv_cache_backend_ttl
- amount by which to extend an object’s expiry lease on each access, in seconds (default: 600)
riak_kv_cache_backend_max_ttl
- maximum allowed lease time (default: 3600)
add_paths
- a list of paths to add to the Erlang code path
This setting is especially useful for allowing Riak to use external modules during map/reduce queries.
riak_kv_stat
- enable the statistics-aggregator (default: true)
mapred_name
- the URL to submit map/reduce requests to
(default:
mapred
) js_vm_count
- the number of Javascript virtual machines to start (default: 8)
js_max_vm_mem
- the maximum amount of memory, in megabytes, allocated to the Javascript VMs (default: 8)
js_thread_stack
- the maximum amount of thread stack, in megabyes, allocated to the Javascript VMs (default: 16)
js_source_dir
- location from which Riak will load user defined JavaScript source files (default: unset)
If you are going to be rebuilding Riak often, you will want to edit
the vm.args
and app.config
files in the rel/files
directory. The copies of those files in the release (embedded
node) directory will be overwritten by the files in the files
directory when a make rel
or rebar generate
command is issued.
Riak is controlled using the riak
and riak-admin
scripts in the
bin
directory of the release.
This script is the primary interface for starting and stopping the Riak server. It takes one parameter, the command to execute:
$ bin/riak COMMAND
Available commands are:
- console
- start a Riak node in the foreground, which the console/Erlang shell attached
- start
- start a Riak node in the background (daemonized)
Running
start
will print an warning if the Riak node is already running. - restart
- restart the Riak node
- attach
- attach a console to a daemonized Riak node
- ping
- check whether or not the Riak node is alive
The script should print out
pong
if it finds a live Riak node, or an error about not responding to pings if it does not. - stop
- stop a running Riak node
If you have a shell connected to the node, you can also use the
q()
command.(riak@example.com)1> q().
This script provides access to general administration of the Riak server. The Riak node should be running before using the riak-admin script.
Much like the riak
script, riak-admin
expects a command, plus
options on the command line.
$ bin/riak-admin COMMAND [OPTIONS]
Available commands are:
- test
- writes and reads a Riak object, to test basic
functionality
The code for the test is in
riak:client_test/1
, if you would like to evaluate it. - join
- join a running Riak cluster
This command requires one option: the node in the running cluster to which to connect. Example:
$ bin/riak-admin join riak2@example.com
- leave
- leave a running Riak cluster
This command will remove the node from a running cluster and force handoff of the partitions the node claims. Example:
$ bin/riak-admin leave
- backup
- backup the data in the cluster to a file
This command requires three options: the node in the running cluster to which to connect , the Erlang cookie for that node, and the filename to store the backup under. Example:
$ bin/riak-admin backup riak2@example.com riak backup.dets
- restore
- restore data into a cluster from a backup file
This command expects the same parameters as
backup
. - js\_reload
- reload all Javascript virtual machines
This command will reload the Javascript virtual machines on the node where the command is executed.
$ bin/riak-admin js_reload
- wait-for-service
- check a Riak service for its current state
This command will check that a given Riak service is running and prepared to receive queries.
$ bin/riak-admin wait-for-service riak_kv riak@127.0.0.1
- ringready
- check that all Riak nodes agree on partition assignments
This command will check that all nodes in a cluster agree on the assignment of paritions. This command is useful when operating larger clusters where it may take several seconds to gossip when adding or removing nodes.
$ bin/riak-admin ringready
- transfers
- list any pending partition transfers
Provides a list of nodes with pending partition transfers (i.e. any secondary vnodes) and lists any owned vnodes that are not running. This restarts the handoff timers and should be used infrequently.
$ bin/riak-admin transfers
To start a Riak node, simply install riak (by either copying the
rel/riak directory from an existing build, or compiling with make
rel
on the new machine), and then run bin/riak start
.
The node will start in the background. To attach to the running
node’s Erlang console, run bin/riak attach
. Use Control-D to
exit the console, but leave the node running.
A single node is its own cluster. To add new nodes to a cluster,
first start a new node, just as you would for solitary operation:
bin/riak start
.
Once the new node is up, ask it to connect to the existing cluster
by running bin/riak-admin join NODE@HOST
, where NODE@HOST
is
the name of a node in the existing cluster (from the -name
argument in vm.args). You should see a message of the form “Sent
join request to NODE@HOST” (see Startup Errors if you don’t).
After the node has joined the cluster, you can verify that it has claimed partitions by attaching to a console in the cluster and requesting a copy of the claim ring:
1> {ok, R} = riak_core_ring_manager:get_my_ring(). {ok,{chstate,... 2> riak_core_ring:all_members(R). ['riak@10.0.0.1','riak@10.0.0.2','riak@10.0.0.3']
You should see the names of all the nodes in your cluster in the list returned from that command.
Re-starting a node, after it has been shut down is even easier. As
long as you haven’t removed the on-disk ringfile, you should only
need to run bin/riak start
. The startup will read the ringfile,
and automatically connect to the cluster it was part of when it
shut down.
A simple way to verify a running Riak installation is with
bin/riak-admin test
:
$ bin/riak-admin test
:
=INFO REPORT==== 25-Jan-2010::14:09:08 === Successfully completed 1 read/write cycle to 'riak@127.0.0.1'
The script attempts to write a value, and then read it back. If all goes well, you should see output similar to the example above. See Client Errors for help with error messages from this script.
Stopping a Riak node can be done at any time, simply by running
bin/riak stop
:
$ bin/riak stop ok
This halts the Riak node, but it does not change any claims on the ring. That is, the rest of the cluster still believes that the node that just shut down is still responsible for storing some slice of the cluster’s data.
To change ring ownership, such that a node is no longer responsible for storing any data, run the following on the node you want to remove:
$ bin/riak leave
The node that is leaving will start transferring data to the nodes in the cluster that have claimed the partitions it previously owned.
Read and write requests immediately start hitting the nodes that have claimed the partitions that the exiting node just gave up. This means that some reads where R=N will fail for a time, until data exists everywhere. Data will exist everywhere after handoff finishes, after a successful read repair, or after a write to all partitions responsible for a value.
Riak supports only Erlang/OTP version R14B02 and later. To check your installed OTP version:
$ erl Erlang... Eshell Vx.x.x (abort with ^G) 1> erlang:system_info(otp_release). "R14B02"
If the version printed is earlier than R14B02 (for example R13B04, or R12B), you will need to upgrade your Erlang installation before being able to build Riak from source.
If you have previously generated a release, or installed a release to the OTP library path, you will receive this error if you attempt to generate a new release of the same version. Possible resolutions are:
make relclean
to clean out therel
directoryrm -rf $ERLANG_ROOT/lib/XXX
- change the application version in the
.app
andrel/reltool.config
files
If the bin/riak
or bin/riak
refuse to connect to a node, one
of several things may be going awry:
- The node may not be running. Use
ps
to check for running instances ofbeam
, the Erlang virtual machine. The arguments to that VM should include paths to your Riak installation. - If
etc/vm.args
is using the short-form of Erlang node names (without...@IP-OR-HOSTNAME
), then DNS on the machine may be configured incorrectly. The easiest fix is to explicitly set the hostname using the long form-name NODE@IP-OR-HOSTNAME
inetc/vm.args
. - Erlang distribution cookies may not match. If you started the
Riak node with
bin/riak console
, or you are able to open a console withbin/riak attach
, check the node’s cookie witherlang:get_cookie/0
:
(riak@127.0.0.1)1> erlang:get_cookie(). riak
That cookie should match the cookie in etc/vm.args
. The
bin/riak
and bin/riak-admin
scripts should be using the same
cookie.
The startup or admin scripts may be tricked into using different
cookies by specifying an ERL_FLAGS
environment variable. If
you have specified such a variable in your shell, unset it, and
move those settings to etc/vm.args
.