Ringo - Distributed key/value storage for immutable data

Ringo is an experimental, distributed, replicating key-value store based on consistent hashing and immutable data. Unlike many general-purpose databases, Ringo is designed for a specific use case: archiving small (less than 4KB) or medium-size (less than 100MB) data items in real time, without any downtime, so that the data can survive K - 1 disk failures, where K is the desired number of replicas, in a manner that scales to terabytes of data. In addition to storing data, Ringo should be able to retrieve individual data items or small sets of them with low latency (<10ms) and provide a convenient on-disk format for bulk data access.

WARNING: Ringo should not yet be used as primary storage for critical data. Because Ringo treats all data as immutable, data corruption or loss should be improbable. However, Ringo needs more testing in real-world settings before we can be reasonably sure that it works as advertised. To make this happen, feel free to try it out, adapt it to your needs, and report your experiences!

If you find Ringo interesting, you might also want to check out a paper about Amazon's Dynamo:

http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

and another Dynamo-like system, Scalaris, which is also implemented in Erlang: http://code.google.com/p/scalaris/

Requirements

Backend system (required):

  • Erlang R12B or newer
  • C toolchain and autotools (for building bfile)
  • Lighttpd or other web server that supports SCGI

Test harness / Python interface (optional):

  • pkill command (Debian package procps)
  • Python
  • Python module pycurl (Debian package python-pycurl)
  • Python module cjson (Debian package python-cjson)

Directories

doc/            Documentation

ring/           Ringo backend
ring/bfile      High-performance replacement for Erlang's standard file module
ring/src        Backend sources
ring/test       Test harness

ringogw/        Ringo web frontend
ringogw/src     Frontend sources
ringogw/web     Web interface
ringogw/py      Python interfaces for the Ringo frontend and for the Disco map/reduce framework

Compiling

cd ringo
./compile.sh

Starting Ringo

First you need to initialize a number of virtual nodes - at least one is required. A virtual node is defined by an empty directory whose name is a random 128-bit integer in hexadecimal notation.
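The naming rule above can be sketched in Python. This is only an illustration: it assumes the name is written as 32 lowercase hexadecimal digits with zero padding, which may differ from what create_node.sh produces, and new_node_id is a hypothetical helper that is not part of Ringo.

```python
import secrets


def new_node_id() -> str:
    """Return a random 128-bit integer in hexadecimal notation.

    Assumption: 32 lowercase hex digits, zero-padded. The exact
    formatting that Ringo expects is not specified in this README.
    """
    return format(secrets.randbits(128), "032x")


if __name__ == "__main__":
    # Print one candidate directory name for a virtual node.
    print(new_node_id())
```

In practice the create_node.sh script described below takes care of this step.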

A convenience script, create_node.sh, is provided that can be used to set up a virtual node. For instance,

ringo/create_node.sh trurl /data/ringo

sets up a new virtual node on the host trurl in the directory /data/ringo. The script uses ssh to log in to the host. It asks for a password unless key-based ssh authentication is properly set up (which is recommended). You may run this script on different nodes as many times as you like.

You need to list all hostnames that may possibly host a Ringo node in a file called ~/.hosts.erlang. For further information about this file, see the Erlang manual page "man 3erl net_adm". For instance, the following command creates the required file for the virtual node that was initialized above:

echo "'trurl'." > ~/.hosts.erlang
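With several hosts, the file needs one entry per host in the same form as above: a quoted atom followed by a period. A minimal Python sketch that produces this content follows. The helper hosts_erlang and the second hostname klapaucius are made up for illustration; only trurl comes from the example above.

```python
def hosts_erlang(hostnames):
    """Format hostnames for ~/.hosts.erlang.

    Each host becomes one line holding a quoted Erlang atom that is
    terminated by a period, e.g. 'trurl'.
    """
    return "".join("'%s'.\n" % name for name in hostnames)


if __name__ == "__main__":
    # 'klapaucius' is a hypothetical second host used only as an example.
    # Redirect the output to ~/.hosts.erlang to create the file.
    print(hosts_erlang(["trurl", "klapaucius"]), end="")
```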

After a number of virtual nodes have been initialized, Ringo may be started. Again, a simple script is provided that starts up all virtual nodes on a specified host. For instance,

ringo/start_nodes.sh trurl /data/ringo

starts up all nodes on the host trurl. After a while, the ring should be up and running.

The web frontend provides a convenient way to monitor the status of the system. An example configuration file for the Lighttpd web server, which communicates with the frontend process over SCGI, is provided at ringo/ringogw/lighttpd.conf. The following script starts up the web server and the frontend process:

ringo/ringogw/start_ringogw.sh

Now you should see the status page at http://localhost:15000. On the status page, you can click nodes in the leftmost panel to see the domains that they contain. You can click a domain in the middle panel to see its replicas. By clicking a replica, you can see its status. You might need to wait for 10 seconds or so, and reload the page, to see new nodes and domains appear.

Usage

You can create domains, put keys, and get keys from Ringo using simple HTTP requests. Assuming that you have started a ring as instructed above, you can create a new domain called "foobar" with the following POST request. Here curl is used to make the request, but any other HTTP client would work as well:

curl -d "" http://localhost:15000/mon/data/foobar?create

You can put a new key-value pair to the domain with the following POST request:

curl -d "testvalue" http://localhost:15000/mon/data/foobar/testkey

and retrieve the value given the key with a GET request as follows:

curl http://localhost:15000/mon/data/foobar/testkey

This returns all values associated with the key "testkey". If only one value is required, the parameter ?single can be used:

curl http://localhost:15000/mon/data/foobar/testkey?single
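The same requests can be made from any HTTP client. The following is a minimal sketch using only the Python standard library. It is not the Ringo class from ringogw.py: the function names are hypothetical, domain and key names are assumed to be URL-safe, and responses are returned as raw bytes because their format is not described here.

```python
import urllib.request

# Address of the web frontend, as used in the curl examples above.
BASE = "http://localhost:15000/mon/data"


def domain_url(domain, base=BASE):
    """URL of a domain."""
    return "%s/%s" % (base, domain)


def key_url(domain, key, single=False, base=BASE):
    """URL of a key; with single=True the ?single parameter is appended."""
    url = "%s/%s/%s" % (base, domain, key)
    return url + "?single" if single else url


def _post(url, body):
    # POST the given bytes, like `curl -d`.
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def create_domain(domain):
    """Create a domain with an empty POST to <domain>?create."""
    return _post(domain_url(domain) + "?create", b"")


def put(domain, key, value):
    """Store value (bytes) under key in the given domain."""
    return _post(key_url(domain, key), value)


def get(domain, key, single=False):
    """Fetch the values stored under key, or one value if single=True."""
    with urllib.request.urlopen(key_url(domain, key, single)) as resp:
        return resp.read()
```

With a ring running, create_domain("foobar"), put("foobar", "testkey", b"testvalue") and get("foobar", "testkey", single=True) mirror the curl commands above.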

A Python class Ringo is provided at ringo/ringogw/py/ringogw.py that encapsulates the above HTTP requests in Python function calls.

An experimental interface for Disco, an open-source implementation of the Map/Reduce framework (http://discoproject.org), can be found at ringo/ringogw/py/ringodisco.py. This function, which implements Disco's map reader interface, makes it possible to use data stored in Ringo as input to a Disco job. The function provides a particularly efficient way of accessing data directly from Ringo's live data files.

Running the Test Harness

Ringo comes with a set of tests that cover all main features of the system. You need to start the ringogw web frontend process to run the tests, as shown above. After the frontend has started, you can run the tests with the following command:

cd ringo/ring/; ./test-system.sh

Contact information

Bug reports, patches, comments etc. are welcome! The contact person is Ville Tuulos, who can be reached at

ville.h.tuulos -a- nokia.com

or on the IRC channel #discoproject at Freenode.
