Writing your own client for Voldemort

Here is some information about creating a new client for Voldemort. Writing a client for a master-free distributed system is definitely more difficult than writing one that talks only to a centralized server. Nonetheless, we have done what we can to keep it from being too hard. A complete pure-Python implementation is only about 300 lines of code, most of which goes to XML parsing and protocol buffer object creation (see src/python/voldemort.py for the Python client).

Note that there are two options. One is to write a client from scratch in the language you want to support. The other is to embed an existing client. The Java client is mature and embeds naturally in any of the JVM-based languages (JRuby, Groovy, Scala, Jython, etc.). There is a new C++ client which should be reasonably easy to embed in any of the C-based interpreters (Python, PHP, Ruby, Perl, etc.), though it is still very new and may have rough edges.

The rest of this discussion assumes you are writing a client from scratch.

First, it is worth establishing some terminology:

Routing is what we call the process of delivering a request to the appropriate Voldemort nodes. Routing can be done either on the client side or the server side. If routing is done on the client side, the client must calculate the list of servers on which to store or retrieve a given key and then deliver the value to, or request it from, those nodes. If routing is done on the server side, the client just makes the request to a random node and that node handles the routing of the request. Each request comes with a should_route flag telling the server whether or not it is expected to route the request.
Protocols. Voldemort supports two protocols: HTTP and bare TCP/IP. Both are used in a very simple manner to pass bytes back and forth. The TCP/IP format consists of size-delimited messages, and the HTTP format of serialized bytes POSTed to the server. The server support for both protocols is good, but be aware that the HTTP client libraries in most languages perform poorly and will limit the performance of an HTTP-based implementation.
Request Formats. The actual bytes sent back and forth between the client and Voldemort are called a RequestFormat, and the format is pluggable. There are currently two formats in place: (1) a very simple and fast custom binary format, and (2) a protocol buffers format. To add a new format you just need to implement an interface that explains how to write and read a request. The appropriate interfaces in Voldemort are RequestHandler and ClientRequestFormat. Each new protocol, in addition to using these, should use AbstractRequestFormatTest to test that its ClientRequestFormat and RequestHandler can talk to each other. Currently the server is statically configured to use only one protocol at a time. The plan is to move as soon as possible to a handshake model in which the protocol is established when the connection is created; this will allow simultaneous support of multiple protocols. Use the protocol buffers format if at all possible, since then you can auto-generate the appropriate code and keep your client in sync with any protocol changes without much work or debugging.

The HTTP protocol consists of a series of POST requests to the server from the client.

The socket protocol looks like this:

1. The client connects to the server by opening a socket.

2. The client sends a 3-byte protocol proposal.

3. The server responds “ok” if it knows that protocol, otherwise “no”.

4. The client sends: <size><request_bytes>

5. The client receives: <size><response_bytes>

6. Repeat steps 4 and 5 for as many requests as needed.

7. The client disconnects from the server.

Here is a list of the protocol codes:

Code	Description
vp0	Version 0 of the Voldemort native protocol. This is what we shipped with and is retained for backwards compatibility.
vp1	Version 1 of the Voldemort native protocol. This added support for server-side routing.
pb0	Protocol buffers based protocol.
ad0	Protocol for the admin client. This client exposes a set of administrative APIs.
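The opening handshake of the socket protocol can be sketched in a few lines of Python. This is only an illustration: the function name negotiate is made up here, and it assumes the proposal and the “ok”/“no” reply travel as plain ASCII bytes.

```python
def negotiate(sock, protocol=b"pb0"):
    """Send a 3-byte protocol proposal and check the server's 2-byte reply.

    `sock` is an already connected socket. The reply is assumed to be the
    ASCII bytes b"ok" or b"no".
    """
    if len(protocol) != 3:
        raise ValueError("protocol code must be exactly 3 bytes")
    sock.sendall(protocol)

    # recv() may return fewer bytes than requested, so loop until we have 2.
    reply = b""
    while len(reply) < 2:
        chunk = sock.recv(2 - len(reply))
        if not chunk:
            raise ConnectionError("server closed the connection during handshake")
        reply += chunk

    if reply != b"ok":
        raise ConnectionError(
            "server rejected protocol %r (reply %r)" % (protocol, reply))
```

A client would call negotiate(sock, b"pb0") once, right after connecting and before sending the first request.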

Because the routing layer in Voldemort is not trivial (it involves computing a consistent hash, doing parallel requests, tracking node state, etc.), writing a client-side routed client is not the easiest way to go. For anything other than C it is likely to underperform server-side routing anyway, so it isn’t worth it. What follows assumes you are using server-side routing.

Client Responsibilities

A minimal but complete client is responsible for the following things:

Bootstrapping a view of the cluster (optional)

Making requests

Resolving version conflicts

Handling node failures

Bootstrapping the client

Bootstrapping is the process by which the client finds out which servers are available. If the client is going to be responsible for balancing requests over the servers, then it will need to know which servers are available.

Getting the cluster metadata is easy: do a GET request to a Voldemort node against the store ‘metadata’, requesting the key ‘cluster.xml’. This will give you back the XML cluster description (you can see examples in the config/ directory of the project). This XML can be parsed to find the list of all servers.

Since any particular node might not be available when your client bootstraps, it is a good idea to have the client take a list of bootstrap URLs to try in case the first fails.
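As a rough sketch, bootstrapping could look like the following in Python. The element names (server, id, host, socket-port) are an assumption modeled on the sample files in config/ and should be checked against your own cluster.xml. fetch_cluster_xml is a hypothetical callable that you would implement on top of your chosen protocol.

```python
import xml.etree.ElementTree as ET


def parse_cluster_xml(xml_text):
    """Return a list of (node_id, host, socket_port) tuples.

    Assumes each node is described by a <server> element with <id>,
    <host> and <socket-port> children.
    """
    root = ET.fromstring(xml_text)
    nodes = []
    for server in root.iter("server"):
        nodes.append((
            int(server.findtext("id")),
            server.findtext("host"),
            int(server.findtext("socket-port")),
        ))
    return nodes


def bootstrap(bootstrap_urls, fetch_cluster_xml):
    """Try each bootstrap URL in order until one yields cluster.xml.

    fetch_cluster_xml(url) is a hypothetical helper: it should GET the key
    'cluster.xml' from the store 'metadata' on that node and return the XML
    text, raising OSError if the node cannot be reached.
    """
    last_error = None
    for url in bootstrap_urls:
        try:
            return parse_cluster_xml(fetch_cluster_xml(url))
        except OSError as exc:
            last_error = exc  # remember the failure and try the next URL
    raise ConnectionError("no bootstrap URL reachable: %r" % (last_error,))
```

Passing the fetch function in keeps the parsing and fallback logic independent of whether the client speaks HTTP or the socket protocol.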

Making requests

If you use the protocol buffers support this is really easy. You just generate code from the protocol buffer .proto file in src/proto. There is one class for each request type and each response type. You always send a VoldemortRequest and always receive the response type for the request you issued.

With the socket protocol, you use a TCP/IP connection to send the serialized bytes of the protocol buffers object to the server, where they are processed. You then get a protocol buffers response back. Both the request and the response are preceded by a 4-byte big-endian size giving the number of bytes in the request or response.

Handling versions

Vector clock version objects are sent with requests. Basically, a vector clock is a list of versions, one for each node that has coordinated a write. The client is responsible for incrementing the version on the clock for the node it is communicating with when a successful put occurs. If the node has never coordinated a write before, its version will be missing from the list and should be added as 1.
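The increment rule is small enough to show directly. In this sketch the clock is held as a Python dict from node id to version, which is only an illustrative in-memory representation and not the wire format.

```python
def increment_clock(clock, node_id):
    """Return a copy of `clock` with the version for `node_id` bumped by one.

    `clock` is a dict mapping node id -> version (an assumed representation).
    A node that has never coordinated a write is missing from the dict and
    is added with version 1.
    """
    updated = dict(clock)  # copy so the caller's clock is left untouched
    updated[node_id] = updated.get(node_id, 0) + 1
    return updated
```

For example, increment_clock({1: 3}, 2) yields a clock with node 1 still at version 3 and node 2 newly added at version 1.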

The server will use the versions to attempt to eliminate any duplicate values present in a read. In the case of a non-trivial version conflict, the server will return multiple versions to the client for a get() request. The client can determine the best way for these versions to be reconciled or given back to the user: it can come with a pluggable reconciliation strategy that the user passes in, it can throw an exception that carries the multiple versions back to the caller, or it can simply return a list of values, one for each version returned.
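One way a client might inspect the versions it gets back is to compare their clocks pairwise and drop any that are strictly older than another. The sketch below uses the standard vector clock ordering and again represents a clock as a dict from node id to version; both function names are made up for illustration.

```python
def compare_clocks(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent' for clock a vs b.

    A node missing from a clock is treated as version 0.
    """
    a_bigger = any(v > b.get(node, 0) for node, v in a.items())
    b_bigger = any(v > a.get(node, 0) for node, v in b.items())
    if a_bigger and b_bigger:
        return "concurrent"  # a genuine conflict: neither supersedes the other
    if a_bigger:
        return "after"
    if b_bigger:
        return "before"
    return "equal"


def prune_obsolete(versioned):
    """Keep only (clock, value) pairs not superseded by another pair."""
    survivors = []
    for clock, value in versioned:
        if any(compare_clocks(clock, other) == "before" for other, _ in versioned):
            continue  # a strictly newer version exists
        if any(compare_clocks(clock, kept) == "equal" for kept, _ in survivors):
            continue  # exact duplicate of a version we already kept
        survivors.append((clock, value))
    return survivors
```

If more than one pair survives, the versions are concurrent, and the client can hand them to a user-supplied strategy, raise, or return them all, as described above.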

Handling failures

If the server disconnects, fails, or the socket times out, it is the duty of the client to reconnect to a new server transparently to complete the request. The client should print a warning message but not throw any error unless either (1) all servers are tried and fail in this way, or (2) some maximum time limit passes. In other words a normal server failure should be transparent to the client.
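A minimal sketch of this failover behavior, assuming a hypothetical do_request callable that performs one request against one server and raises OSError (which in Python also covers socket timeouts and connection errors) when that server fails. The max_seconds default and the logger name are arbitrary choices for the example.

```python
import logging
import time

log = logging.getLogger("voldemort.client")


def request_with_failover(servers, do_request, max_seconds=5.0,
                          clock=time.monotonic):
    """Try do_request(server) on each server until one succeeds.

    Raises TimeoutError once max_seconds has passed, or ConnectionError
    once every server has been tried and has failed. A single server
    failure only produces a warning.
    """
    deadline = clock() + max_seconds
    last_error = None
    for server in servers:
        if clock() > deadline:
            raise TimeoutError("gave up after %.1f seconds" % max_seconds)
        try:
            return do_request(server)
        except OSError as exc:
            log.warning("request to %s failed: %s; trying next server",
                        server, exc)
            last_error = exc
    raise ConnectionError("all servers failed; last error: %r" % (last_error,))
```

The clock parameter exists only so the time limit can be exercised in tests without sleeping.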

Some decisions to make

There are a few things to consider when doing the implementation:

Is your client going to be multi-threaded or thread safe? The Java client is multi-threaded (i.e. the client itself maintains a thread pool that makes parallel requests) as well as thread safe (usable from multiple threads at the same time). It is intended to be used in a model where a single client is shared among all users in a client machine. However, threading is not as common in non-Java environments, and doesn’t really work properly at all in some (e.g. Python). In these environments it may make more sense for each client to be single threaded, and for the client program to hold many of these, one for each process or thread doing requests. There is no real downside to this model aside from limiting multithreaded use, and it simplifies the implementation.

How is your client going to balance load over server nodes? One model is for each client to connect to a random node in the cluster and make all requests to that node. This can lead to uneven distribution of load, and the server with the most clients may get far more load than the server with the least, so this is not advised. Another model would be to maintain a connection for each Voldemort node and round-robin requests over servers. This gives the best load distribution, but requires the client to manage potentially many connections, and increases the total number of incoming connections to the servers. An alternative model is for each client to connect to a single server at a time, make N requests, and then connect to another server. This model seems to be the simplest to implement, and may give reasonable load balancing.
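The last model (stay on one server for N requests, then move on) might be sketched as follows. The class name, the default of 100 requests, and the choice to start at a random server and then advance in order are all assumptions made for the example.

```python
import random


class RotatingServerPicker:
    """Stick to one server for `requests_per_server` requests, then move on."""

    def __init__(self, servers, requests_per_server=100, rng=random):
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._limit = requests_per_server
        # Start at a random server so that many clients spread out.
        self._index = rng.randrange(len(self._servers))
        self._used = 0

    def next_server(self):
        """Return the server to use for the next request."""
        if self._used >= self._limit:
            # Quota used up: advance to the next server in the list.
            self._index = (self._index + 1) % len(self._servers)
            self._used = 0
        self._used += 1
        return self._servers[self._index]
```

The caller is expected to close the old connection and open a new one whenever next_server() returns a different server than the previous call.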
