Key Value Operations

seancribbs edited this page Apr 24, 2012 · 3 revisions

Key-Value Operations

At the end of this guide, you should be familiar with:

  • Working with buckets and bucket properties
  • Listing buckets and keys
  • Fetching, storing, and deleting values
  • Using value metadata

This and the other guides in this wiki assume you have Riak installed locally. If you don't have Riak already, please read and follow how to install Riak and then come back to this guide. If you haven't yet installed the client library, please do so before starting this guide.

This guide also assumes you know how to connect the client to Riak. All examples assume a local variable of client which is an instance of Riak::Client and points at some Riak node you want to work with.

Key-Value

Riak is a near-pure key-value store, which means that you have no tables, collections, or databases. Each key stands on its own, independent of all the others. Keys are namespaced in buckets, which also hold properties that are common to all keys in the namespace (for example, the replication factor n_val). If you think of Riak as a big distributed Hash, it might look like this:

{
    (bucket + key) => value
}

What's missing from that picture:

  • Each pair is given a specific spot within a cluster of Riak nodes, so any node in the cluster can find it without having to ask. This is also known as "consistent hashing".
  • There are 3 copies of the pair by default.
  • The value has metadata that you can manipulate as well as the raw value.
  • A key may have multiple values as a result of race conditions and network errors. (Don't worry about this for now, but do read Resolving Conflicts when you're ready.)
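The first two points can be sketched in a few lines of plain Ruby. This is a simplified illustration, not Riak's implementation: the "bucket/key" string encoding, the partition count of 64, and the helper names ring_position and preference_list are assumptions made for the example. It hashes a bucket/key pair onto a 160-bit ring with SHA-1, finds the partition that owns that position, and takes the next partitions clockwise to hold the remaining copies.

```ruby
require 'digest/sha1'

RING_SIZE  = 2**160   # SHA-1 produces 160-bit values
PARTITIONS = 64       # assumed partition count for this sketch
N_VAL      = 3        # three copies, matching the default mentioned above

# Hash the bucket/key pair to a position on the ring.
# (Hypothetical encoding; the real encoding differs.)
def ring_position(bucket, key)
  Digest::SHA1.hexdigest("#{bucket}/#{key}").to_i(16)
end

# The partition that owns the position, plus the next n_val - 1
# partitions walking clockwise, hold the copies.
def preference_list(bucket, key, n_val = N_VAL)
  first = ring_position(bucket, key) * PARTITIONS / RING_SIZE
  (0...n_val).map { |i| (first + i) % PARTITIONS }
end

p preference_list('guides', 'key-value')
```

Because the result depends only on the bucket and key, every node that runs the same calculation arrives at the same partitions, which is why no lookup table or coordinator is needed.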

Enough with the exposition, let's look at some data!

Buckets

Since our keys are all grouped into buckets, in the Ruby client, we get a Riak::Bucket object before doing any key-value operations. Here's how to get one:

client.bucket('guides')
# or
client['guides']
# => #<Riak::Bucket {guides}>

This gives a Riak::Bucket object with the name 'guides' that is linked to the Riak::Client instance.

"But wait", you say, "doesn't that bucket need to exist in Riak already? How do we know which bucket to request?" Buckets are virtual namespaces as we mentioned above; they have no schema, no manifest other than any properties you set on them (see below) and so you don't need to explicitly create them. In fact, the above code doesn't even talk to Riak! Generally, we pick bucket names that have meaning to our application, or are chosen for us automatically by another framework like Ripple or Risky.

Listing buckets

If you are just starting out and don't know which buckets have data in them, you can use the "list buckets" feature. Note that this will give you a warning about its usage with a backtrace. You shouldn't run this operation in your application.

client.buckets
# Riak::Client#buckets is an expensive operation that should not be used in production.
#    (irb):5:in `irb_binding'
# ... (snipped)
# 
#  => []

Looks like we don't have any buckets stored. Why? We haven't stored any data yet! Riak gets the list of buckets by examining all the keys for unique bucket names.

Listing keys

You can also list the keys in a bucket to see what's there. Like listing buckets, this operation is for experimentation only and performs terribly in production. Don't do it.

client['guides'].keys
# Riak::Bucket#keys is an expensive operation that should not be used in production.
#    (irb):10:in `irb_binding'
# ... (snipped)
# 
# => []

You can "stream" keys through a block (where the block will be passed an Array of keys as the server sends them in chunks), which is slightly more efficient for large key lists, but we'll skip that for now. Check out the API docs for more information.
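For a feel of the streaming form, here is a minimal sketch that runs without a Riak node. FakeBucket is a hypothetical stand-in that mimics only the behaviour described above (the block receives one Array of keys per chunk); with the real client you would pass the same kind of block to client['guides'].keys.

```ruby
# Stand-in for a bucket so this runs without a Riak node. It mimics only
# the behaviour described above: the block receives Arrays of keys, one
# Array per chunk.
class FakeBucket
  def initialize(chunks)
    @chunks = chunks
  end

  def keys
    @chunks.each { |chunk| yield chunk }
  end
end

# Count keys chunk by chunk instead of building one large Array.
def count_keys(bucket)
  count = 0
  bucket.keys { |chunk| count += chunk.size }
  count
end

bucket = FakeBucket.new([%w[a b], %w[c], %w[d e f]])
puts count_keys(bucket)
```

The point of the block form is that each chunk can be processed and discarded as it arrives, so memory use stays proportional to the chunk size rather than to the whole key list.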

Bucket properties

Earlier we alluded to bucket properties. If you want to grab the properties from a bucket, call the props method (which is also aliased to properties).

pp client['guides'].props

{"name"=>"guides",
 "allow_mult"=>false,
 "basic_quorum"=>false,
 "big_vclock"=>50,
 "chash_keyfun"=>{"mod"=>"riak_core_util", "fun"=>"chash_std_keyfun"},
 "dw"=>"quorum",
 "last_write_wins"=>false,
 "linkfun"=>{"mod"=>"riak_kv_wm_link_walker", "fun"=>"mapreduce_linkfun"},
 "n_val"=>3,
 "notfound_ok"=>true,
 "old_vclock"=>86400,
 "postcommit"=>[],
 "pr"=>0,
 "precommit"=>[],
 "pw"=>0,
 "r"=>"quorum",
 "rw"=>"quorum",
 "small_vclock"=>50,
 "w"=>"quorum",
 "young_vclock"=>20}

There are a lot of things in this Hash that we don't need to care about. The most commonly-used properties are detailed on the Riak wiki. Let's set the replication factor, n_val.

client['guides'].props = {"n_val" => 5}
# or
client['guides'].n_val = 5

A number of the most common properties are exposed directly on the Bucket object, as shown above. Note that you can pass an incomplete Hash of properties, and only the properties that are part of the Hash will be changed.

pp client['guides'].props

{"name"=>"guides",
 "allow_mult"=>false,
 "basic_quorum"=>false,
 "big_vclock"=>50,
 "chash_keyfun"=>{"mod"=>"riak_core_util", "fun"=>"chash_std_keyfun"},
 "dw"=>"quorum",
 "last_write_wins"=>false,
 "linkfun"=>{"mod"=>"riak_kv_wm_link_walker", "fun"=>"mapreduce_linkfun"},
 "n_val"=>5,   # Here's the one we changed above
 "notfound_ok"=>true,
 "old_vclock"=>86400,
 "postcommit"=>[],
 "pr"=>0,
 "precommit"=>[],
 "pw"=>0,
 "r"=>"quorum",
 "rw"=>"quorum",
 "small_vclock"=>50,
 "w"=>"quorum",
 "young_vclock"=>20}
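The effect of passing an incomplete Hash can be pictured as a Hash merge. The sketch below uses plain Ruby Hashes and a hypothetical helper named apply_props; it illustrates the observable behaviour only and is not how the client or Riak implements it.

```ruby
# Hypothetical helper: properties named in `changes` win, everything
# else keeps its current value.
def apply_props(current, changes)
  current.merge(changes)
end

current = { "allow_mult" => false, "n_val" => 3, "r" => "quorum" }
updated = apply_props(current, { "n_val" => 5 })

p updated
```

Only n_val differs between current and updated; allow_mult and r carry over untouched.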

The other bucket property we might care about is allow_mult, which allows your application to detect and resolve conflicting writes. It is also exposed directly:

client['guides'].allow_mult = true

Fetching and storing values

Now let's fetch a key from our bucket:

client['guides'].get('key-value')
# or
client['guides']['key-value']

Depending on which protocol and backend you chose when connecting, you'll get an exception like this one:

Riak::HTTPFailedRequest: Expected [200, 300] from Riak but received 404. not found

This means that the object does not exist. If you rescue the Riak::FailedRequest exception, its not_found? method will return true, telling you that the error represents a missing key. If you want to avoid the error for a key you're not sure about, you can check for its existence explicitly:

client['guides'].exist?('key-value')
# => false
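If you would rather rescue the error than check first, the pattern looks roughly like the sketch below. To keep it runnable without a Riak node, StandInFailedRequest and StandInBucket are hypothetical stand-ins that model only the not_found? behaviour described above, and fetch_or_nil is a helper invented for this example. With the real client you would pass client['guides'] and Riak::FailedRequest instead.

```ruby
# Stand-in for the client's exception; only not_found? is modelled.
class StandInFailedRequest < StandardError
  def initialize(code)
    @code = code
    super("Expected [200, 300] from Riak but received #{code}.")
  end

  def not_found?
    @code == 404
  end
end

# Stand-in bucket that raises the stand-in error for unknown keys.
class StandInBucket
  def initialize(data)
    @data = data
  end

  def get(key)
    @data.fetch(key) { raise StandInFailedRequest.new(404) }
  end
end

# Return nil for a missing key, but re-raise any other failure.
def fetch_or_nil(bucket, key, error_class)
  bucket.get(key)
rescue error_class => e
  raise unless e.not_found?
  nil
end

bucket = StandInBucket.new({ 'present' => [1, 2, 3] })
p fetch_or_nil(bucket, 'present', StandInFailedRequest)
p fetch_or_nil(bucket, 'key-value', StandInFailedRequest)
```

Re-raising when not_found? is false matters: a timeout or server error should not be silently treated as a missing key.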

If you don't care whether the key exists yet or not, but want to start working with the value so you can store it, use get_or_new:

kv_guide = client['guides'].get_or_new('key-value')
# => #<Riak::RObject {guides,key-value} [application/json]:nil>

This gives us a new Riak::RObject (http://rdoc.info/gems/riak-client/Riak/RObject) to work with, which is a container for the value. In Riak's terminology, the combination of bucket, key, metadata and value is called an "object"; please do not confuse this with Ruby's concept of an Object. All "Riak objects" are wrapped by the Riak::RObject class. Since this is a new object, the client assumes we want to store Ruby data as JSON in Riak and so sets the content-type for us to "application/json", which we can see in the inspect output. The default value of the object is nil. Let's set the data to something useful:

kv_guide.data = [1,2,3,4,5]
# => [1, 2, 3, 4, 5]

Now we can persist that object to Riak using the store method.

kv_guide.store
# => #<Riak::RObject {guides,key-value} [application/json]:[1, 2, 3, 4, 5]> 

If we list the keys again, we can see that the key is now part of the bucket (this time we use the bucket accessor on the object instead of going from the client object):

kv_guide.bucket.keys
# => ["key-value"]

Now let's fetch our object again:

client['guides']['key-value']
# => #<Riak::RObject {guides,key-value} [application/json]:[1, 2, 3, 4, 5]> 

Assuming we're done with the object, we can delete it:

kv_guide.delete
# => #<Riak::RObject {guides,key-value} [application/json]:[1, 2, 3, 4, 5]>
client['guides'].exist?('key-value')
# => false 

Note: Deleting an RObject will freeze the object, making modifications to it impossible.

Working with metadata

We mentioned before that every value in Riak also has metadata, and the Riak::RObject lets you manipulate it. The only one we've really seen so far is the content type metadata, so let's examine that more closely.

Content type

For the sake of interoperability and ease of working with your data, Riak requires every value to have a content-type. Let's look at our previous object's content type:

# We deleted the object earlier, so let's store in Riak again. Skip
# this step if you didn't do the deletion.
kv_guide = kv_guide.dup.store

# Now, onto the content type.
kv_guide.content_type
# => "application/json" 

Under the covers, the Ruby client will automatically convert that to and from JSON when storing and retrieving the value. If we wanted to serialize our Ruby data as a different type, we can just change the content-type:

kv_guide.content_type = "application/x-yaml"

Now our object will be serialized to YAML. The Ruby client automatically supports JSON, YAML, Marshal, and plain-text serialization. (If you want to add your own, check out the Serializers guide.)
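To see what changing the content type means for the stored bytes, here is a small sketch using only Ruby's standard library. The serialize and deserialize helpers are hypothetical and written for this example; they are not the client's serializer API, and they cover only the two content types used above.

```ruby
require 'json'
require 'yaml'

# Hypothetical helper: turn Ruby data into the body that would be stored.
def serialize(data, content_type)
  case content_type
  when 'application/json'   then JSON.generate(data)
  when 'application/x-yaml' then YAML.dump(data)
  else raise ArgumentError, "no serializer for #{content_type}"
  end
end

# Hypothetical helper: turn a stored body back into Ruby data.
def deserialize(body, content_type)
  case content_type
  when 'application/json'   then JSON.parse(body)
  when 'application/x-yaml' then YAML.load(body)
  else raise ArgumentError, "no serializer for #{content_type}"
  end
end

data = [1, 2, 3, 4, 5]
puts serialize(data, 'application/json')
puts serialize(data, 'application/x-yaml')
```

The Ruby value is the same in both cases; only the bytes on the wire differ, which is why other clients reading the key need the content type to decode it.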

But what if the data we want to store is not a Ruby data type, but some binary chunk of information that comes from another system? Not to worry, you can bypass serializers altogether using the raw_data accessors. Let's say I want to store a PNG image that I have on my desktop. I could do it like so:

client['images'].new('riak.png').tap do |robject|
  robject.content_type = 'image/png'
  # expand_path resolves the ~, and binread reads the bytes untranslated
  robject.raw_data = File.binread(File.expand_path('~/Desktop/riak.png'))
  robject.store
end
# => #<Riak::RObject {images,riak.png} [image/png]:(100294 bytes)>

When the client doesn't know how to deserialize the content type, it will simply display the byte size on inspection. Now here's a fun part: since I just stored an image, I can open it with my browser:

node = client.nodes.first
# Use open on Mac OS X; your system may have something else, like `xdg-open`
system "open http://#{node.host}:#{node.http_port}/buckets/images/keys/riak.png"

User metadata

You can also specify a bunch of free-form metadata on an RObject using the meta accessor, which is simply a Hash. For example, if we wanted to credit the PNG image we stored above to a specific person, we could add that and it would not affect the value of the object:

png = client['images']['riak.png']
png.meta['creator'] = "Bryan Fink"
png.meta['tool'] = 'Inkscape'
png.store

Now the next time we fetch the object, we'll get back that metadata too:

client['images']['riak.png'].meta
# => {"tool"=>["Inkscape"], "creator"=>["Bryan Fink"]} 

The values come back as Arrays because HTTP allows multiple values per header, and user metadata is sent as HTTP headers.

Vector clock

The vector clock is Riak's means of internal accounting; that is, tracking different versions of your data and automatically updating them where appropriate. You don't usually need to worry about the vector clock, but it is accessible on the RObject as well:

png.vclock
# => "a85hYGBgzGDKBVIcypz/fvpP21GawZTInMfK4NHAfZIvCwA="

That vector clock will automatically be threaded through any operations you perform directly on the RObject (like store, delete, and reload) so that you don't have to worry about it.

Last-Modified time and ETag

The last_modified and etag values are useful, especially if you're using the HTTP interface. When reloading your object, they will be used to prevent full fetches when the object hasn't changed in Riak. They can also be used as a form of optimistic concurrency control (with very weak guarantees, mind you) by setting the prevent_stale_writes flag:

png # previously loaded
png.etag
# => "\"3z8BJENgFyE9nYtWaLNcQR\"" 
png.last_modified
# => 2012-04-24 15:31:57 -0400

png2 = client['images']['riak.png']
png2.meta['date'] = Time.now.to_s
png2.store
# => #<Riak::RObject {images,riak.png} [image/png]:(100294 bytes)>
png.meta['date'] = Time.now.to_s
png.prevent_stale_writes = true
png.store
# Riak::HTTPFailedRequest: Expected [200, 204, 300] from Riak but received 412.

Riak prevented the stale write by sending a 412 Precondition Failed response over HTTP.
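The idea behind that check can be modelled in a few lines. The following is a toy in-memory sketch, not Riak's or the client's implementation: ToyStore, StaleWrite, and the MD5-based ETag are all assumptions made for illustration. A write that carries an ETag is accepted only if the ETag still matches what is stored.

```ruby
require 'digest/md5'

class StaleWrite < StandardError; end

# Toy in-memory store: each value has an ETag derived from its content.
class ToyStore
  def initialize
    @values = {}
  end

  def etag(key)
    Digest::MD5.hexdigest(@values.fetch(key))
  end

  # Returns the value together with its current ETag.
  def read(key)
    [@values.fetch(key), etag(key)]
  end

  # Write only if the caller's ETag still matches what is stored.
  def write(key, value, expected_etag = nil)
    if expected_etag && @values.key?(key) && etag(key) != expected_etag
      raise StaleWrite, '412 Precondition Failed'
    end
    @values[key] = value
  end
end

store = ToyStore.new
store.write('riak.png', 'v1')
_, etag_a = store.read('riak.png')      # first reader, like png
_, etag_b = store.read('riak.png')      # second reader, like png2

store.write('riak.png', 'v2', etag_b)   # second reader stores first
begin
  store.write('riak.png', 'v3', etag_a) # first reader's copy is now stale
rescue StaleWrite => e
  puts "rejected: #{e.message}"
end
```

In this single-process toy the check and the write cannot be interleaved with another writer. A distributed store offers no such assurance, which is one way to read the "very weak guarantees" caveat above.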

Secondary Indexes and Links

You can also access the Secondary Indexes and Links directly from the RObject, but we won't cover those here.

What to do next

Congratulations, you finished the "Key-Value Operations" guide! From here, you can move on to more advanced querying methods, or take advantage of extended features of the client. Secondary Indexes are a very popular feature, and as all good Rubyists have thorough test suites, the Test Server is also a good next step.