Riak Administration Guide

Overview

This document is a guide for people who want to run Riak. It discusses downloading, installing, configuring, and running Riak, as well as basic client interaction.

As always, questions are welcome on the Riak mailing list, riak-users@basho.com. The latest version of this document and other Riak information are available at http://wiki.basho.com/.

Downloading Riak

Riak can be downloaded as a pre-built, binary release for many popular platforms, or as source code, ready to build on most platforms supporting Erlang releases R14B02 and later.

Binary Releases

Overview

Riak can be downloaded in binary form, removing the need to install the latest Erlang and compile from source. The binary distribution of Riak does not include Innostore, which must be compiled and installed manually. You can find out more about Innostore at http://wiki.basho.com/Setting-Up-Innostore.html.

Currently Riak is offered in binary form for Solaris/OpenSolaris, Linux systems based on RPM and Deb, and as a compiled tarball for OS X.

RPM Based Linux Distributions

Riak can be installed in binary form on many RPM-based systems; the current target platforms are RHEL and CentOS. Riak can be downloaded as both 64-bit and 32-bit binaries in rpm format here: http://downloads.basho.com/riak/CURRENT/

Note: On 32-bit systems you may need to disable SELinux if it is enabled.

Deb Based Linux Distributions

Riak can be installed in binary form on many systems that use the Debian packaging system; the current target platforms are Ubuntu and Debian. Riak can be downloaded as both 64-bit and 32-bit binaries in deb format here: http://downloads.basho.com/riak/CURRENT/

Mac OS X

Riak can be downloaded and compiled from a source tarball for OS X. You can download the tarball here: http://downloads.basho.com/riak/CURRENT/

Source Releases

Prerequisites

To build Riak from source, you will need Erlang/OTP version R14B02 or later installed. Erlang is available at http://erlang.org/.

Obtaining Riak Source

The source for tagged Riak releases is available as tarballs at https://github.com/basho/riak/tarball/riak-0.14.2. Once downloaded, unpack the tarball in a convenient location.

If you prefer to follow the latest Riak development, use git to clone the public repository:

git clone https://github.com/basho/riak

Building Riak from Source

Building Riak should be as simple as running make rel in the top level of the source directory.

$ make rel
./rebar compile generate 
==> mochiweb (compile)
Compiled src/mochifmt_records.erl
Compiled src/mochifmt_std.erl
Compiled src/mochihex.erl
...snipped...
==> riak (compile)
Compiling src/riakserver.proto
Compiled src/gen_nb_server.erl
Compiled src/gen_server2.erl
...snipped...
==> rel (generate)
$

(The ...snipped... lines represent several lines of similar output, removed for display in this document.)

If no errors are printed, Riak was built successfully. See the Troubleshooting section of this document for help with build errors.

This process created an Erlang “embedded node” in rel/riak. You should now be able to run Riak from that directory, or copy the directory to any other path on this machine, or to any other machine with matching architecture. See the Installation section for more details.

Installation

If you downloaded a pre-built, binary release of Riak, or if you have made it through building the release from source, you should have an Erlang embedded node ready to run Riak in-place. No further installation is needed.

To run Riak on other machines, simply copy the entire embedded node directory to those machines. See the Configuration section for details about altering configurations for each machine.

Configuration

Capacity Planning

Riak will run on a wide variety of hardware. Determining the shape of hardware your use case requires is best done by benchmarking a development system with sample data. However, a few guidelines may help narrow the playing field.

Disk Space

In order to provide fault tolerance, Riak stores multiple copies of each piece of data in the cluster. The number of copies is determined by the “N-value” with which each piece of data is stored. So, as a lower bound, the cluster as a whole must have available disk space of at least N times the total size of the stored data (the number of objects times their average size), plus a little overhead. For example, 1 million objects with an average size of 1 kilobyte, stored with the default N-value of 3, would require a minimum of 3 to 4 gigabytes of disk space in the cluster.

Every node in the cluster is considered equal, so data is spread across the cluster fairly evenly. This means that each node in the cluster can expect to hold an amount of data equal to the total amount of data in the cluster (N times the number of objects times the average object size) divided by the number of nodes in the cluster. So, continuing the example, in a cluster storing 4 gigabytes of data in total, each of four nodes could expect to store 1 gigabyte.
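The disk-space arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the overhead parameter are assumptions made for this example (the guide says only “a little overhead”), and real usage depends on the storage backend.

```python
def cluster_disk_bytes(num_objects, avg_object_bytes, n_val=3, overhead=0.0):
    """Lower bound on disk space needed across the whole cluster, in bytes.

    `overhead` is a hypothetical fudge factor (0.25 means 25% extra).
    """
    return num_objects * avg_object_bytes * n_val * (1.0 + overhead)


def per_node_disk_bytes(total_bytes, num_nodes):
    """Data is spread fairly evenly, so each node holds about an equal share."""
    return total_bytes / num_nodes


GB = 1024 ** 3

# 1 million objects of 1 kilobyte each, stored with the default N-value of 3.
raw = cluster_disk_bytes(1_000_000, 1024)
print("cluster minimum: %.2f GB" % (raw / GB))

# A cluster holding 4 GB in total, spread over four nodes.
print("per node: %.2f GB" % (per_node_disk_bytes(4 * GB, 4) / GB))
```

Benchmarking with sample data remains the better guide; a calculation like this only gives a floor.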

Riak is not, as yet, aware of multiple disks per machine except when using the Innostore backend. When using Innostore, or when doing any other logging on a Riak node, storing the log on a different disk than the data will dramatically increase performance, as the multiple processes accessing the disk will not compete for seek time.

In general, it’s a good idea to use smaller disks, operating at higher RPM, instead of larger, slower disks. This will help minimize seek time when disk access is necessary.

Memory (RAM)

More memory is always better, as usual. Plan 1 gigabyte of memory for general Riak operation. Beyond that, the ultimate goal should be to fit as much of a node’s local data in RAM as possible, because doing so will reduce the number of disk-seeks necessary to fulfill read operations. So, in the earlier example, where each node was storing 1 gigabyte of data, planning for the node to have at least 2 gigabytes of RAM for Riak to use would be a good idea.

Operating System Configuration

Library Paths

Riak needs access to the ncurses, libgcc, and openssl libraries. These are likely already in your path, but if not, you will want to add them to the LD_LIBRARY_PATH environment variable.

noatime

Read performance in Riak can be improved by disabling file access time (atime) tracking for the disk or directory on which Riak’s data is stored.

On Linux, use the ‘noatime’ mount parameter in /etc/fstab.

On Solaris using ZFS, use pfexec zfs set atime=off <pool>.

Max File Descriptors

Especially when using the Innostore backend, it may be necessary to increase the maximum allowed number of open file descriptors. Innostore currently uses one file descriptor per bucket, per partition. If you have 4 nodes in your cluster, and your cluster has 64 partitions, then each node is responsible for 16 partitions. If this same cluster stores data in 10 buckets, Innostore will use 160 (=16*10) file descriptors per node.
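The same calculation, written as a small Python helper. The function name is an assumption made for this sketch, and rounding up for uneven partition counts is a conservative choice rather than something the guide specifies.

```python
def innostore_fds_per_node(ring_partitions, num_nodes, num_buckets):
    """Estimate Innostore file descriptors per node.

    One descriptor per bucket, per partition, as described above.
    Partitions per node are rounded up when they do not divide evenly.
    """
    partitions_per_node = -(-ring_partitions // num_nodes)  # ceiling division
    return partitions_per_node * num_buckets


# 64 partitions, 4 nodes, 10 buckets: 16 partitions per node, 10 buckets each.
print(innostore_fds_per_node(64, 4, 10))
```

Treat the result as a floor when choosing a limit, since the node also needs descriptors for sockets and other files.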

To set the file descriptor count on Linux or Solaris, use the command ulimit -n <COUNT>, where <COUNT> is the new maximum number of file descriptors.

Network Ports

To run a distributed cluster, Riak needs access to many network ports. Make sure that the following ports are open, if you are running a firewall on any of the machines in the cluster:

  • 4369: Erlang Port Mapper Daemon’s (EPMD) port, for connecting Riak nodes.
  • 5001-6024: Port range for the listener socket of a distributed Erlang node. This range can be modified by specifying values for inet_dist_listen_min and inet_dist_listen_max in the kernel section of Riak’s app.config. The network interface on which these ports are exposed can be configured by specifying a value for inet_dist_use_interface (see example below).
  • 8098: Default HTTP interface port. This port can be modified by specifying a value for web_port in the riak_core section of Riak’s app.config (see example below). The network interface on which this port is exposed can be configured by specifying a value for web_ip in the riak_core section of Riak’s app.config.
  • 8087: Default Protocol Buffers interface port. This port can be modified by specifying a value for pb_port in the riak_kv section of Riak’s app.config. The network interface on which this port is exposed can be configured by specifying a value for pb_ip in the same section (see example below).

As noted, several of these ports are configurable. The default configuration in app.config would look something like this:

[
 {riak_core, [
         {web_ip, "127.0.0.1"},
         {web_port, 8098},

         %% More Riak Core settings...
        ]},
 {riak_kv, [
         {pb_ip, "127.0.0.1"},
         {pb_port, 8087},

         %% More Riak KV settings...
         ]},
 {kernel, [
           {inet_dist_use_interface, {0, 0, 0, 0}},
           {inet_dist_listen_min, 5001},
           {inet_dist_listen_max, 6024}
          ]}
 %% Other application configurations...
].

The parameters web_ip and pb_ip accept a string representation of an IP address. To specify that Riak should listen on all interfaces, use 0.0.0.0.

The value of the inet_dist_use_interface is a tuple of four bytes. The example above says “listen on every interface” by specifying an IP of 0.0.0.0. If you wanted to specify an IP like 10.0.12.3, you would replace the {0, 0, 0, 0} with {10, 0, 12, 3}.
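If you generate configuration files with a script, the conversion from a dotted IP string to the tuple form can be sketched as follows. The helper name is hypothetical; it uses only Python's standard ipaddress module and handles IPv4 addresses only.

```python
import ipaddress


def erlang_ip_tuple(ip):
    """Render a dotted IPv4 string as an Erlang 4-tuple literal.

    Raises ValueError if `ip` is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip)
    # addr.packed is the 4-byte form of the address; iterating yields ints.
    return "{" + ", ".join(str(byte) for byte in addr.packed) + "}"


print(erlang_ip_tuple("0.0.0.0"))
print(erlang_ip_tuple("10.0.12.3"))
```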

vm.args

Parameters for the Erlang node on which Riak runs are set in the vm.args file in the etc directory of the embedded Erlang node. Most of these settings can be left at their defaults until you are ready to tune performance.

Two settings you may be interested in right away, though, are -name and -setcookie. These control the Erlang node names (possibly host-specific), and Erlang inter-node communication access (cluster-specific), respectively.

The format of the file is fairly loose: all lines that do not begin with the # character are concatenated and passed to erl on the command line, as is.
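To make the concatenation rule concrete, here is a rough Python sketch of it. This is not the code the riak script uses; the function name is made up for illustration, and skipping blank lines and joining with single spaces are assumptions.

```python
def erl_args_from_vm_args(text):
    """Join every line that does not begin with '#' into one argument string."""
    kept = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return " ".join(kept)


sample = """## Name of the node
-name riak@127.0.0.1

## Cookie for distributed Erlang
-setcookie riak
+K true
+A 5
"""

print(erl_args_from_vm_args(sample))
```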

More details about each of these settings can be found in the Erlang documentation for the erl Erlang emulator.

Erlang Runtime Configuration Options

-name
the name of the Erlang node (default: riak@127.0.0.1)

The default value, riak@127.0.0.1, will work for running Riak locally, but for distributed (multi-node) use, the value after the @ should be changed to the IP address of the machine on which the node is running.

If you have properly-configured DNS, the short-form of this name can be used (for example: riak). The name of the node will then be riak@Host.Domain.

-setcookie
the cookie of the Erlang node (default: riak)

Erlang nodes grant or deny access to one another based on a shared cookie. You should use the same cookie for every node in your Riak cluster, but it should be a not-easily-guessed string unique to your deployment, to prevent unauthorized access.
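One way to produce a hard-to-guess cookie is to generate a random token, for example with Python's standard secrets module. The helper below is only a sketch: its name and the 16-byte default are arbitrary choices for this example, and any sufficiently random string shared by all nodes would serve.

```python
import secrets


def new_cookie(num_bytes=16):
    """Return a random hexadecimal string to use after -setcookie."""
    return secrets.token_hex(num_bytes)


# Generate the value once, then place the same line in vm.args on every node.
print("-setcookie " + new_cookie())
```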

-heart
enable heart node monitoring (default: disabled)

Heart will restart nodes automatically, should they crash. However, heart is so good at restarting nodes that it can be difficult to prevent it from doing so. Enable heart once you are sure that you wish to have the node restarted automatically on failure.

+K
enable kernel polling (default: true)
+A
number of threads in the async thread pool (default: 5)
-env
set host environment variables for Erlang

app.config

Riak and the Erlang applications it depends on are configured by settings in the app.config file in the etc directory of the embedded Erlang node. The format of the file is similar to Erlang’s “.app” files:

[
 {riak_kv, [
         {storage_backend, riak_kv_dets_backend},
         {riak_kv_dets_backend_root, "data/dets"}

         %% More Riak KV settings...
        ]}
 %% Other application configurations...
].

That is, the file starts with [ and ends with ]. (a closing bracket followed by a period). Inside the square brackets are comma-separated application sections of the form {ApplicationName, [Setting1, Setting2, ...]}. Each setting is a 2-tuple of the form {SettingName, SettingValue}.

List of Riak Core Configuration Variables

ring_state_dir
the directory on-disk in which to store the ring state (default: “data/dets”)

Riak’s ring state is stored on-disk by each node, such that each node may be restarted at any time (purposely, or via automatic failover) and know what its place in the cluster was before it terminated, without needing immediate access to the rest of the cluster.

ring_creation_size
the number of partitions to divide the hash space into (default: 64)

By default, each Riak node will own (ring_creation_size)/(number of nodes in the cluster) partitions. It is generally a good idea to specify a ring_creation_size a few times the number of nodes in your cluster (e.g. specify 64-256 partitions for a 4-node cluster). This gives you room to expand the number of nodes in the cluster, without worrying about underuse due to owning too few partitions.

web_ip
the IP address on which Riak’s HTTP interface should listen (default: “127.0.0.1”)

Riak’s HTTP interface will not be started if this setting is not defined.

web_port
the port on which Riak’s HTTP interface should listen (default: 8098)

Riak’s HTTP interface will not be started if this setting is not defined.

default_bucket_props
properties to give each bucket, by default

Properties in this list will override the hardcoded defaults in riak_core_bucket:defaults/0. This setting is the best way to set things like:

  • the default N-value for Riak objects (n_val)
  • whether or not siblings are allowed (allow_mult)
  • the function for extracting links from objects (linkfun)

handoff_port
TCP port number for the handoff listener (default: 8099)
handoff_concurrency
number of vnodes, per physical node, allowed to perform handoff at once

List of Riak KV Configuration Variables

raw_name
the base of the path in the URL exposing Riak’s HTTP interface (default: “riak”)

The default value will expose data at /riak/Bucket/Key. For example, changing this setting to “bar” would expose the interface at /bar/Bucket/Key.
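A client script might assemble these URLs as in the following sketch. The function name is hypothetical, the host and port defaults simply mirror the web_ip and web_port defaults described in this guide, and percent-encoding the path segments is an assumption about what a careful client would do.

```python
from urllib.parse import quote


def object_url(bucket, key, host="127.0.0.1", port=8098, raw_name="riak"):
    """Build the HTTP URL for an object: /<raw_name>/<bucket>/<key>."""
    # Encode each segment separately so a "/" inside a key does not add a segment.
    path = "/".join(quote(part, safe="") for part in (raw_name, bucket, key))
    return "http://%s:%d/%s" % (host, port, path)


print(object_url("Bucket", "Key"))
print(object_url("Bucket", "Key", raw_name="bar"))
```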

pb_ip
the IP address that the Riak Protocol Buffers interface will bind to (default: 127.0.0.1)

If this is undefined, the interface will not run.

pb_port
the TCP port that the Riak Protocol Buffers interface will bind to (default: 8087)
storage_backend
module name of the storage backend that Riak should use (default: riak_kv_bitcask_backend)

The storage format Riak uses is configurable. Riak will refuse to start if no storage backend is specified.

Available backends and their additional configuration options are:

riak_kv_dets_backend
data is stored in DETS files
riak_kv_dets_backend_root
root directory where DETS files are stored (default: “data/dets”)
riak_kv_ets_backend
data is stored in ETS tables (in-memory)
riak_kv_gb_trees_backend
data is stored in Erlang gb_trees (in-memory)
riak_kv_fs_backend
data is stored in binary files on the filesystem
riak_kv_fs_backend_root
root directory where files are stored
riak_kv_multi_backend
enables storing data for different buckets in different backends

Specify the backend to use for a bucket with riak_core_bucket:set_bucket(BucketName, [{backend, BackendName}])

multi_backend_default
default backend to use if none is specified for a bucket (one of the BackendName atoms specified in the multi_backend setting)
multi_backend
list of backends to provide

Format of each backend specification is {BackendName, BackendModule, BackendConfig}, where BackendName is any atom, BackendModule is the name of the Erlang module implementing the backend (the same values you would provide as storage_backend settings), and BackendConfig is a parameter that will be passed to the start/2 function of the backend module.

riak_kv_cache_backend
a backend that behaves as an LRU-with-timed-expiry cache
riak_kv_cache_backend_memory
maximum amount of memory to allocate, in megabytes (default: 100)
riak_kv_cache_backend_ttl
amount by which to extend an object’s expiry lease on each access, in seconds (default: 600)
riak_kv_cache_backend_max_ttl
maximum allowed lease time (default: 3600)

add_paths
a list of paths to add to the Erlang code path

This setting is especially useful for allowing Riak to use external modules during map/reduce queries.

riak_kv_stat
enable the statistics-aggregator (default: true)
mapred_name
the URL to submit map/reduce requests to (default: mapred)
js_vm_count
the number of JavaScript virtual machines to start (default: 8)
js_max_vm_mem
the maximum amount of memory, in megabytes, allocated to the JavaScript VMs (default: 8)
js_thread_stack
the maximum amount of thread stack, in megabytes, allocated to the JavaScript VMs (default: 16)
js_source_dir
location from which Riak will load user defined JavaScript source files (default: unset)

Rebar Overlays

If you are going to be rebuilding Riak often, you will want to edit the vm.args and app.config files in the rel/files directory. The copies of those files in the release (embedded node) directory are overwritten by the versions in rel/files whenever a make rel or rebar generate command is issued.

Running Riak

Riak is controlled using the riak and riak-admin scripts in the bin directory of the release.

The riak script

This script is the primary interface for starting and stopping the Riak server. It takes one parameter, the command to execute:

$ bin/riak COMMAND

Available commands are:

console
start a Riak node in the foreground, with the console/Erlang shell attached
start
start a Riak node in the background (daemonized)

Running start will print a warning if the Riak node is already running.

restart
restart the Riak node
attach
attach a console to a daemonized Riak node
ping
check whether or not the Riak node is alive

The script should print out pong if it finds a live Riak node, or an error about not responding to pings if it does not.

stop
stop a running Riak node

If you have a shell connected to the node, you can also use the q() command.

(riak@example.com)1> q().
    

The riak-admin script

This script provides access to general administration of the Riak server. The Riak node should be running before using the riak-admin script.

Much like the riak script, riak-admin expects a command, plus options on the command line.

$ bin/riak-admin COMMAND [OPTIONS]

Available commands are:

test
writes and reads a Riak object, to test basic functionality

The code for the test is in riak:client_test/1, if you would like to evaluate it.

join
join a running Riak cluster

This command requires one option: the node in the running cluster to which to connect. Example:

$ bin/riak-admin join riak2@example.com
    
leave
leave a running Riak cluster

This command will remove the node from a running cluster and force handoff of the partitions the node claims. Example:

$ bin/riak-admin leave
    
backup
backup the data in the cluster to a file

This command requires three options: the node in the running cluster to which to connect, the Erlang cookie for that node, and the filename to store the backup under. Example:

$ bin/riak-admin backup riak2@example.com riak backup.dets
    
restore
restore data into a cluster from a backup file

This command expects the same parameters as backup.

js_reload
reload all JavaScript virtual machines

This command will reload the JavaScript virtual machines on the node where the command is executed.

$ bin/riak-admin js_reload
    
wait-for-service
check a Riak service for its current state

This command will check that a given Riak service is running and prepared to receive queries.

$ bin/riak-admin wait-for-service riak_kv riak@127.0.0.1
    
ringready
check that all Riak nodes agree on partition assignments

This command will check that all nodes in a cluster agree on the assignment of partitions. This command is useful when operating larger clusters where it may take several seconds to gossip when adding or removing nodes.

$ bin/riak-admin ringready
    
transfers
list any pending partition transfers

Provides a list of nodes with pending partition transfers (i.e. any secondary vnodes) and lists any owned vnodes that are not running. This restarts the handoff timers and should be used infrequently.

$ bin/riak-admin transfers
    

Simple startup

To start a Riak node, simply install Riak (by either copying the rel/riak directory from an existing build, or compiling with make rel on the new machine), and then run bin/riak start.

The node will start in the background. To attach to the running node’s Erlang console, run bin/riak attach. Use Control-D to exit the console, but leave the node running.

Cluster startup

A single node is its own cluster. To add new nodes to a cluster, first start a new node, just as you would for solitary operation: bin/riak start.

Once the new node is up, ask it to connect to the existing cluster by running bin/riak-admin join NODE@HOST, where NODE@HOST is the name of a node in the existing cluster (from the -name argument in vm.args). You should see a message of the form “Sent join request to NODE@HOST” (see Startup Errors if you don’t).

After the node has joined the cluster, you can verify that it has claimed partitions by attaching to a console in the cluster and requesting a copy of the claim ring:

1> {ok, R} = riak_core_ring_manager:get_my_ring().
{ok,{chstate,...
2> riak_core_ring:all_members(R).
['riak@10.0.0.1','riak@10.0.0.2','riak@10.0.0.3']

You should see the names of all the nodes in your cluster in the list returned from that command.

Restarting a node after it has been shut down is even easier. As long as you haven’t removed the on-disk ringfile, you should only need to run bin/riak start. The startup will read the ringfile, and automatically connect to the cluster it was part of when it shut down.

Verifying your Installation

A simple way to verify a running Riak installation is with bin/riak-admin test:

$ bin/riak-admin test

=INFO REPORT==== 25-Jan-2010::14:09:08 ===
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'

The script attempts to write a value, and then read it back. If all goes well, you should see output similar to the example above. See Client Errors for help with error messages from this script.

Shutting down a node

Stopping a Riak node can be done at any time, simply by running bin/riak stop:

$ bin/riak stop
ok

This halts the Riak node, but it does not change any claims on the ring. That is, the rest of the cluster still believes that the node that just shut down is still responsible for storing some slice of the cluster’s data.

To change ring ownership, such that a node is no longer responsible for storing any data, run the following on the node you want to remove:

$ bin/riak-admin leave

The node that is leaving will start transferring data to the nodes in the cluster that have claimed the partitions it previously owned.

Read and write requests immediately start hitting the nodes that have claimed the partitions that the exiting node just gave up. This means that some reads where R=N will fail for a time, until data exists everywhere. Data will exist everywhere after handoff finishes, after a successful read repair, or after a write to all partitions responsible for a value.
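The effect on R=N reads can be illustrated with a deliberately simplified model. The sketch below assumes a read succeeds only when at least R of the N partitions responsible for a key already hold the value; the function name and this success rule are illustrative assumptions, not Riak's actual request logic.

```python
def read_succeeds(partitions_with_value, r):
    """Simplified model (an assumption): a read needs R partitions holding the value."""
    return partitions_with_value >= r


N = 3

# One of the three responsible partitions was just claimed by another node
# and has not received its data yet, so only two partitions hold the value.
holding = N - 1

print("R=N during handoff:", read_succeeds(holding, r=N))
print("R=2 during handoff:", read_succeeds(holding, r=2))
print("R=N after handoff: ", read_succeeds(N, r=N))
```

Under this model only the R=N read is affected while handoff is in progress, which matches the behavior described above.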

Client Interaction

Client Libraries

HTTP Interface

Using an HTTP Cache

Troubleshooting

Build Errors

Rebar requires at least erts 5.7.5; …

Riak supports only Erlang/OTP version R14B02 and later. To check your installed OTP version:

$ erl
Erlang...

Eshell Vx.x.x (abort with ^G)
1> erlang:system_info(otp_release).
"R14B02"

If the version printed is earlier than R14B02 (for example R13B04, or R12B), you will need to upgrade your Erlang installation before being able to build Riak from source.
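If you need to make this comparison in a build script, the release string can be parsed and compared as in the sketch below. The function names are made up for this example, and the pattern covers only R-style release names such as those shown in this document.

```python
import re


def parse_otp_release(release):
    """Split a name like "R14B02" into a comparable (major, letter, patch) tuple."""
    match = re.fullmatch(r"R(\d+)([AB])(\d*)", release)
    if match is None:
        raise ValueError("unrecognized OTP release string: %r" % release)
    major, letter, patch = match.groups()
    # A release such as "R12B" has no patch digits; treat that as patch 0.
    return (int(major), letter, int(patch) if patch else 0)


def new_enough(release, minimum="R14B02"):
    """True if `release` is the same as or later than `minimum`."""
    return parse_otp_release(release) >= parse_otp_release(minimum)


for name in ("R12B", "R13B04", "R14B02"):
    print(name, new_enough(name))
```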

ERROR: Release target directory “XXX” already exists!

If you have previously generated a release, or installed a release to the OTP library path, you will receive this error if you attempt to generate a new release of the same version. Possible resolutions are:

  • make relclean to clean out the rel directory
  • rm -rf $ERLANG_ROOT/lib/XXX
  • change the application version in the .app and rel/reltool.config files

Startup Errors

Node is not running! / Node ‘XXX’ not responding to pings.

If the bin/riak or bin/riak-admin scripts refuse to connect to a node, one of several things may be going awry:

  • The node may not be running. Use ps to check for running instances of beam, the Erlang virtual machine. The arguments to that VM should include paths to your Riak installation.
  • If etc/vm.args is using the short-form of Erlang node names (without ...@IP-OR-HOSTNAME), then DNS on the machine may be configured incorrectly. The easiest fix is to explicitly set the hostname using the long form -name NODE@IP-OR-HOSTNAME in etc/vm.args.
  • Erlang distribution cookies may not match. If you started the Riak node with bin/riak console, or you are able to open a console with bin/riak attach, check the node’s cookie with erlang:get_cookie/0:
(riak@127.0.0.1)1> erlang:get_cookie().
riak

That cookie should match the cookie in etc/vm.args. The bin/riak and bin/riak-admin scripts should be using the same cookie.

The startup or admin scripts may be tricked into using different cookies by specifying an ERL_FLAGS environment variable. If you have specified such a variable in your shell, unset it, and move those settings to etc/vm.args.
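A quick way to see whether ERL_FLAGS carries its own cookie is to inspect the environment, as in this sketch. The helper name is hypothetical, and it assumes the cookie is given as a -setcookie flag followed by its value.

```python
import os
import shlex


def erl_flags_cookie(environ=os.environ):
    """Return the cookie set via ERL_FLAGS, or None if there is none."""
    tokens = shlex.split(environ.get("ERL_FLAGS", ""))
    # Look at every token except the last, since the value must follow the flag.
    for index, token in enumerate(tokens[:-1]):
        if token == "-setcookie":
            return tokens[index + 1]
    return None


cookie = erl_flags_cookie()
if cookie is not None:
    print("ERL_FLAGS overrides the cookie; move this setting to etc/vm.args")
else:
    print("ERL_FLAGS does not set a cookie")
```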

riak-admin join errors

Client Errors

riak-admin test errors

FAQ