Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

A simple way to store a PN-Counter in a riak_object #536

Merged
merged 25 commits into from

4 participants

@russelldb
Owner

Adds an API endpoint for PB and HTTP for incrementing (and getting the value of) a counter.

The counter is implemented as a PN-Counter. In fact the PN-Counter and G-Counter modules from riak_dt are copied into riak_kv (along with their quickcheck tests).

There is a wrapper module riak_kv_counter that mediates between riak_object and riak_kv_pn_counter.

The regular get/put fsms are used. Some small change is made to the vnode to allow counter siblings to merge to a single value before being written to disk.

The wm and pb endpoints also use the riak_kv_counter merge function to ensure a single value is returned to the user.

If (by accident, I hope) a bucket/key is written too as a regular riak_object write, the conflicting riak_object sibling is maintained.

To use store counters the bucket must be {allow_mult, true}. I recommend that you don't mix counter and non-counter objects in the same bucket.


Examples:

curl -v -X POST localhost:8098/buckets/my_counters/counters/c1 -d"1"
Counters require bucket property 'allow_mult=true'*
> curl -X PUT localhost:8098/buckets/my_counters/props -H"Content-Type: application/json" -d "{\"props\" : {\"allow_mult\": true}}"

curl -X POST localhost:8098/buckets/my_counters/counters/c1 -d"1"
curl localhost:8098/buckets/my_counters/counters/c2
1
> curl -X POST localhost:8098/buckets/my_counters/counters/c1 -d"100"
101
> curl -X POST localhost:8098/buckets/my_counters/counters/c1 -d"-1"
100

Etc.


Depends on:
https://github.com/basho/riak_pb/tree/rdb-kv-counter

Riak test at
https://github.com/basho/riak_test/tree/rdb-kv-counters

PB support at
https://github.com/basho/riak-erlang-client/tree/rdb-kv-counter


Obviously I'll create a squashed branch after the discussion / review.

@seancribbs
Owner

FWIW I used this branch for the demo at NoSQL Matters Cologne. wfm

@russelldb russelldb referenced this pull request in basho/riak_pb
Merged

Add counter messages #45

src/riak_kv_gcounter.erl
@@ -0,0 +1,187 @@
+%% -------------------------------------------------------------------
+%%
+%% riak_kv_gcounter: A state based, grow only, convergent counter
+%%
+%% Copyright (c) 2007-2012 Basho Technologies, Inc. All Rights Reserved.
@jrwest
jrwest added a note

copyright update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_gcounter.erl
((8 lines not shown))
+%% Version 2.0 (the "License"); you may not use this file
+%% except in compliance with the License. You may obtain
+%% a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+%%
+%% -------------------------------------------------------------------
+
+%%% @doc
@jrwest
jrwest added a note

would be nice to add a reference similar to vclock.erl [1]

[1] https://github.com/basho/riak_core/blob/master/src/vclock.erl#L25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_gcounter.erl
((26 lines not shown))
+
+-module(riak_kv_gcounter).
+
+-export([new/0, new/2, value/1, update/3, merge/2, equal/2]).
+
+-ifdef(EQC).
+-include_lib("eqc/include/eqc.hrl").
+-export([gen_op/0, update_expected/3, eqc_state_value/1]).
+-endif.
+
+-ifdef(TEST).
+-include_lib("eunit/include/eunit.hrl").
+-endif.
+
+%% EQC generator
+-ifdef(EQC).
@jrwest
jrwest added a note

know this is nitpicky but any reason this is at the top of the file instead of w/ all the other eqc stuff. found it a bit difficult reading the code. unless I'm missing some reason it needs to be defined here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_counter.erl
@@ -0,0 +1,140 @@
+%% -------------------------------------------------------------------
+%%
+%% riak_kv_counter: Counter logic to bridge riak_object and riak_kv_pncounter
+%%
+%% Copyright (c) 2007-2010 Basho Technologies, Inc. All Rights Reserved.
@jrwest
jrwest added a note

copyright update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_counter.erl
((13 lines not shown))
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+%%
+%% -------------------------------------------------------------------
+-module(riak_kv_counter).
+
+-export([update/3, merge/1, value/1]).
+
+-include("riak_kv_wm_raw.hrl").
+
+%% @doc A counter is a two tuple of a `riak_kv_pncounter'
@jrwest
jrwest added a note

think this should be a module edoc but it would be considered the edoc for update/3?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_counter.erl
((21 lines not shown))
+%% -------------------------------------------------------------------
+-module(riak_kv_counter).
+
+-export([update/3, merge/1, value/1]).
+
+-include("riak_kv_wm_raw.hrl").
+
+%% @doc A counter is a two tuple of a `riak_kv_pncounter'
+%% stored in a `riak_object'
+%% with the tag `riak_kv_pncounter' as the first element.
+%% Since counters can be stored with any name, in any bucket, there is a
+%% chance that some sibling value for a counter is
+%% not a `riak_kv_pncounter' in that case, we keep the sibling
+%% for later resolution by the user.
+%%
+%% @TODO How do we let callers now about the sibling values?
@jrwest
jrwest added a note

@russelldb did you have any further thoughts on this? I can't think of a conceivable use case where this would happen on purpose. I think there is only so much we can do here. Forgetting about counters for a moment, if I write an application that accidentally uses the same key for two purposes Riak will never let me know. Errors would most likely (or hopefully) show up at the application level. With counters, the same holds but its even more likely I get an errror since my deserializer won't know what know what to do w/ the counter bytes. Since my assumption is this is an application-level bug I don't think its an issue.

tl;dr I don't think Riak needs to do anything special here (the TODO can be removed). Assuming the application wants to read what it wrote, when reading the sibling it will come across the counter bytes and highlight an application-level bug.

@russelldb Owner

I agree. And since I added the metadata content-type entry of application/riak-pncounter to all counter objects, client applications can detect a counter sibling and take some appropriate action.

@jrwest
jrwest added a note

cool, my vote would be to remove the TODO then. it has a spelling error anyways ;).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_pncounter.erl
@@ -0,0 +1,182 @@
+%% -------------------------------------------------------------------
+%%
+%% riak_kv_pncounter: A convergent, replicated, state based PN counter
+%%
+%% Copyright (c) 2007-2012 Basho Technologies, Inc. All Rights Reserved.
@jrwest
jrwest added a note

copyright update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_pncounter.erl
((26 lines not shown))
+-include_lib("eqc/include/eqc.hrl").
+-endif.
+
+-ifdef(TEST).
+-include_lib("eunit/include/eunit.hrl").
+-endif.
+
+%% API
+-export([new/0, new/2, value/1, update/3, merge/2, equal/2]).
+
+%% EQC API
+-ifdef(EQC).
+-export([gen_op/0, update_expected/3, eqc_state_value/1]).
+-endif.
+
+%% EQC generator
@jrwest
jrwest added a note

same comment here as the one on gcounter re: location of these definitions in file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_pncounter.erl
((8 lines not shown))
+%% Version 2.0 (the "License"); you may not use this file
+%% except in compliance with the License. You may obtain
+%% a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+%%
+%% -------------------------------------------------------------------
+
+-module(riak_kv_pncounter).
@jrwest
jrwest added a note

this module could use some typespecs. one for pncounter and all the exported functions.

@jrwest
jrwest added a note

this module could also use a reference to crdt paper

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_gcounter.erl
((12 lines not shown))
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+%%
+%% -------------------------------------------------------------------
+
+%%% @doc
+%%% a G-Counter CRDT, borrows liberally from argv0 and Justin Sheehy's vclock module
+%%% @end
+
+-module(riak_kv_gcounter).
@jrwest
jrwest added a note

same here, re: typespecs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jrwest jrwest commented on the diff
src/riak_kv_util.erl
@@ -332,6 +333,11 @@ mark_indexes_reformatted(Idx, 0, ForUpgrade) ->
mark_indexes_reformatted(_Idx, _ErrorCount, _ForUpgrade) ->
undefined.
+%% @Doc vtag creation function
+-spec make_vtag(erlang:timestamp()) -> list().
+make_vtag(Now) ->
@jrwest
jrwest added a note

The yessir backend had borrowed this function as well [1]. maybe replace its usage too?

[1] https://github.com/basho/riak_kv/blob/master/src/riak_kv_yessir_backend.erl#L319

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_vnode.erl
@@ -945,8 +949,18 @@ prepare_put(#state{idx=Idx,
IndexSpecs = []
end,
ObjToStore = case Coord of
- true ->
- riak_object:increment_vclock(RObj, VId, StartTime);
+ true ->
@jrwest
jrwest added a note

this was somewhat hard to grok w/ all the nested case statements. maybe pull out part into something like prepare_coord_put?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/riak_kv_wm_counter.erl
((8 lines not shown))
+%% Version 2.0 (the "License"); you may not use this file
+%% except in compliance with the License. You may obtain
+%% a copy of the License at
+%%
+%% http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing,
+%% software distributed under the License is distributed on an
+%% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+%% KIND, either express or implied. See the License for the
+%% specific language governing permissions and limitations
+%% under the License.
+%%
+%% -------------------------------------------------------------------
+
+%% @doc @TODO doc this
@jrwest
jrwest added a note

this should be finished w/ similar API docs to wm_object pre-merge if possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@jrwest

ran all the tests (including riak_test). code looks good. Was initially worried about upgrade path and future work but @russelldb has done a great job of making the changes to riak_kv minimal. It should be easy to integrate or even rip out his kv specific changes to make it work w/ DVVSets, etc. Most of the comments are code/style related not functional but a few things we discussed offline:

  • @russelldb is working on a branch that cleans up some of the duplication in the webmachine and pb code. going to look at this before merging (although if we want to get this in, imo it can wait if it has to, we have this type of duplication elsewhere, but its not ideal).
  • Adding a returnvalue option similar to returnbody on write that returns the counters value (also something that can be added later if we have to, but I think this would be good to include initially)
  • A more compact binary format for counters. Logically to users counters are numbers (regardless of implementation), having overhead similar to what we currently have for vector clocks is a lot. Also, this gets us binary versions for counters which is always a win.

I have not done any perf. testing w/ these changes. I believe @evanmcc was planning to do some if he has time. I can do some quick runs on EC2 spot instances or similar. I suspect the performance will be very similar to riak_kv since, as mentioned above, little is actually changed in the write path, but numbers are always better than my conjecture.

@russelldb russelldb Address review comments
Add `returnvalue` to both endpoints
Add specs + docs to counter modules
Add API docs to HTTP resource
Refactor prepare put code in vnode
Remove copied fun from yessir backend
1d7cbd9
@russelldb
Owner

Addressed them all except "compact binary format" which I will do after this scheduled break.

@russelldb
Owner

Heh, I tried copying the riak_object style binary encoding and it turns out a tad larger than t2b.

@jrwest

heh, I'm sure there are some gymnastics we could do but probably not worth it now. In that case: I have reviewed all the other changes and have taken returnvalue for a spin. Dunno if maybe someone from clients (e.g. @seancribbs) should take a loot at the interface side of things but :+1: from me! nice work @rdb.

@russelldb
Owner

Actually…I'm surprised. So a riak_object style scheme for the gcounter lead to a slightly larger size than t2b, as it did also for the pncounter (which just used the gcounter) but by the time you get up to riak_kv_counter (which use pncounter) the binary scheme is slightly better than t2b.

I wonder if there isn;'t some tweaking to be done here, to use t2b when it suits, and our own scheme elsewhere. I bet t2b is smarter about the size of the integers (for example) by using binary:encode_unsigned.

I pooshed the branch, in case you want to take a peek rdb-kv-counter...rdb-kv-counter_bin

@russelldb russelldb merged commit 2718992 into from
@seancribbs seancribbs was assigned
@thefosk

Can this feature be used to create atomic counters across the cluster?

@russelldb
Owner

No, it cannot. These are eventually consistent counters. Any replica / node can take an increment/decrement (even when partitioned) and that write will persist, and be correct when handoff / healing occurs. But no, not atomic across the cluster. Also be aware of the partial failure / partial success case: if you do an increment with W=2 and Riak reports a failure, maybe 1 replica did succeed. Re-attempting that operation may lead to 2 counts. That is to say these counters are not idempotent, either.

If you're familiar with Cassandra they're conceptually similar to those counters. Feel free to ask on the riak-users list[1] if you have more questions, please.

[1] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Feb 4, 2013
  1. @russelldb
Commits on Feb 7, 2013
  1. @russelldb
Commits on Feb 11, 2013
  1. @russelldb

    WIP commit

    russelldb authored
    EQC test
    Changes to counter because of it
    Export riak_object:index_specs
  2. @russelldb
  3. @russelldb
Commits on Feb 12, 2013
  1. @russelldb
  2. @russelldb
  3. @russelldb

    WIP commit

    russelldb authored
    EQC test
    Changes to counter because of it
    Export riak_object:index_specs
  4. @russelldb
  5. @russelldb
  6. @russelldb
  7. @russelldb

    Merge branch 'rdb-kv-counter' of beast:riak_kv into rdb-kv-counter

    russelldb authored
    Conflicts:
    	test/kv_counter_eqc.erl
Commits on Feb 18, 2013
  1. @russelldb

    WIP add PB handler

    russelldb authored
Commits on Apr 9, 2013
  1. @russelldb
  2. @russelldb

    Merge branch 'master' into rdb-kv-counter

    russelldb authored
    Conflicts:
    	src/riak_object.erl
Commits on Apr 10, 2013
  1. @russelldb

    Drop indexes from counters

    russelldb authored
    Add minimal required meta data to work with HTTP API
  2. @russelldb

    Respond with error if allow_mult is false on counter post

    russelldb authored
    Counters require sibling values to converge
Commits on May 9, 2013
  1. @russelldb
  2. @russelldb

    Add include for types

    russelldb authored
Commits on May 22, 2013
  1. @russelldb

    Address review comments

    russelldb authored
    Add `returnvalue` to both endpoints
    Add specs + docs to counter modules
    Add API docs to HTTP resource
    Refactor prepare put code in vnode
    Remove copied fun from yessir backend
Commits on May 23, 2013
  1. @russelldb

    Tentative addition of custom binary format for counters

    russelldb authored
    Counters all the way down.
Commits on May 24, 2013
  1. @russelldb

    Update binary format to use t2b for gcounter

    russelldb authored
    Fix eqc test ot work with binary encoded counters.
  2. @russelldb
  3. @russelldb

    Remove dbg statement

    russelldb authored
  4. @russelldb

    Merge pull request #563 from basho/rdb-kv-counter_bin

    russelldb authored
    Add binary format for counters
Something went wrong with that request. Please try again.