WIP: Moving AAE to riak_core #456

Open
wants to merge 18 commits into
from

Conversation

Projects
None yet
5 participants
Contributor

Licenser commented Nov 21, 2013

So,
I've started moving AAE from riak_kv to riak_core. It misses documentation and is not too well tested but it seems to work in first tests.

It changes the interface to the AAE code slightly by adding two more options that get passed:

  • The ring serivce (formerly riak_kv)
  • The responsible vnode module

This allows to run AAE for services other then _kv and also for multiple services on the same ring.

There is a riak_core_aae_vnode behavior that includes the functions the vnode needs to implement. And there still needs to move some code in there that is fairly general.

So please don't merge yet it's mostly to track progress and collect comments on the effort (and hopefully fine some people giving it a test :P)

It would be super-awesome if basho guys merge this.

lenary commented Nov 22, 2013

@Licenser I'm going to go through and make nit-picky comments in the docs. The devil is in the details, which we care about.

I may also have comments about the code, I think I saw one or two odd things in there.

docs/aae.md
+ {riak_core_entropy_manager, start_link,
+ [snarl_user, snarl_user_vnode]},
+ permanent, 30000, worker, [riak_core_entropy_manager]},
+```
@lenary

lenary Nov 22, 2013

Can you make a function in riak_core_entropy_manager that gives you this? something like riak_core_entropy_manager:supervisor_spec(Name, Service, VNode)

@lenary

lenary Nov 22, 2013

Does the first argument go with gen_name(Service) somewhere far below? If so, the function can (and should) take only two arguments so users don't trip up.

docs/aae.md
+* Add the `riak_core_aae_vnode` behavior to the vnode in question.
+* Implement the functions required by the behavior.
+* Implement the callback matches detailed in the VNode section.
+* Makre sure that `init/1` creates a hashtree, see `maybe_create_hashtrees/2` as example how to do that.
docs/aae.md
+
+* Add the Entropy Manager to your apps supervisor tree.
+* Add the `riak_core_aae_vnode` behavior to the vnode in question.
+* Implement the functions required by the behavior.
@lenary

lenary Nov 22, 2013

Probably shouldn't be a separate bullet point, could go with the one above.

docs/aae.md
+
+```erlang
+% ...
+handle_command(?FOLD_REQ{foldfun=Fun, acc0=Acc0}, _Sender, State) ->
@lenary

lenary Nov 22, 2013

What is this for? What should it do? Can you remove the call to fifo_db:fold/1 and we can work on a better toy example? Maybe an in-memory db that stores everything at each partition in a list?

docs/aae.md
+ {ok, State}.
+```
+
+An example implementation (taken from `riak_kv`) for internal utility functions used to create and maintain the hashtree:
@lenary

lenary Nov 22, 2013

Again, we may want to work on a better example of some kind. I'm not sure exactly what though.

docs/aae.md
+ %% Only maintain a hashtree if a primary vnode
+ {ok, Ring} = riak_core_ring_manager:get_my_ring(),
+ lager:debug("snarl_user/~p: creating hashtree.", [Index]),
+ case riak_core_ring:vnode_type(Ring, Index) of
@lenary

lenary Nov 22, 2013

You probably need a better explanation above about only having the hashtree on the primary node. Is this something we can build into the api so users don't have to do the work themselves?

priv/riak_core.schema
+%% -*- erlang -*-
+
+%% @doc enable active anti-entropy subsystem
+{mapping, "anti_entropy", "riak_kv.anti_entropy", [
@lenary

lenary Nov 22, 2013

surely "riak_core.anti_entropy"?

src/riak_core_aae_vnode.erl
+-type preflist() :: [{Index::integer(), Node :: term()}].
+
+-spec behaviour_info(atom()) -> 'undefined' | [{atom(), arity()}].
+behaviour_info(callbacks) ->
@lenary

lenary Nov 22, 2013

I'm not sure, but I think we may be looking to move to using -callback annotations. I'm checking with the rest of the Core team (the reason not to would be backwards compatibility)

src/riak_core_aae_vnode.erl
+ undefined.
+
+
+%% @doc aae_repair is called when the AAE system detectes a difference
@lenary

lenary Nov 22, 2013

On which vnode(s)? In vnode code, or somewhere else?

src/riak_core_aae_vnode.erl
+
+%% @doc aae_repair is called when the AAE system detectes a difference
+%% the simplest method to handle this is causing a read-repair if the
+%% system supports it. But the actuall implemetation is left to the
src/riak_core_aae_vnode.erl
+ ok.
+
+
+%% @doc hash_object is called to hash a object, how it does this is opaque
@lenary

lenary Nov 22, 2013

I assume this is for the tree?

src/riak_core_aae_vnode.erl
+
+%% @doc This is a asyncronous command that needs to send a term in the form
+%% `{ok, Hashtree::pid()}` or `{error, wrong_node}` to the process it was called
+%% from.
@lenary

lenary Nov 22, 2013

What's this for? How do you use it? Where do you use it?

@lenary

lenary Nov 26, 2013

This would normally call into your vnode, requesting the pid, but also in such a way that the vnode sends a message back to the caller. It also has some relatively specific code that it uses in the handle_command callback.

src/riak_core_aae_vnode.erl
+ {error, wrong_node}.
+
+%% @doc Returns the vnode master for this vnode type, that is the same
+%% used when registering the vnode.
@lenary

lenary Nov 22, 2013

Again, what is this for? Where is it used? How?

@lenary

lenary Nov 26, 2013

This is used to return the name your vnode master is registered under, to make various calls easier. i.e. for riak_kv it's riak_kv_vnode_master.

@lenary

lenary Nov 26, 2013

this should not call into the VNode, or else you'll probably break something. Just hard-code a value into the function.

+%% that do not have divergent KV replicas. In that case, read repair is never
+%% triggered. Always rehashing keys after any attempt at repair ensures that
+%% AAE does not try to repair the same non-divergent keys over and over.
+
@lenary

lenary Nov 22, 2013

Ok, and how do we use this? Where? What should it do?

@lenary

lenary Nov 26, 2013

Ok, this is used after repair, when something external wants all vnodes dealing with {Bucket, Key} to check the info in the hashtreee is up to date. This should probably just make a preflist command call into each vnode, but i suppose that's up to the developer.

src/riak_core_index_hashtree.erl
+ undefined ->
+ case riak_core_entropy_manager:enabled() of
+ true ->
+ lager:warning("Neither riak_kv/anti_entropy_data_dir or "
@lenary

lenary Nov 22, 2013

"riak_core/anti_entropy_data_dir" ?

lenary commented Nov 22, 2013

I still need to look over the code, but for the moment that should be enough to be getting on with.

Sorry for the tiny criticisms and weird questions, i just want to make sure this PR is 100 kinds of awesome.

Contributor

rzezeski commented Nov 22, 2013

I'm going to jump in here not having looked at any of the code yet. I just have a few concerns I want to raise.

I think this needs to be a post 2.0 project. Yokozuna and Riak Enterprise both have their own version of AAE as well and this generic version in core would need to account for that. Yokozuna is particularly tricky because it doesn't actually have real vnodes like KV. I'm not sure exactly how a generic core system would handle these differences. E.g. in Yokozuna all index hashtrees live under a supervisor, not the vnode. Another example, Yokozuna only exchanges intra-partition, not preflists between different partitions. If you look at the AAE status code you can see I at least managed to make that generic to KV and Yokozuna. In any case, any change to AAE has to be closely vetted in Yokozuna as well and a change like this was just not on the roadmap for 2.0 planning.

That said I would LOVE to have almost all AAE code shared between the different systems like KV and Yokozuna. Currently I have to keep an eye out for any AAE changes, study them carefully, and determine if they need to be ported to Yokozuna AAE code (which is mostly a copy & paste of KV but with some non-trivial differences). That's very error prone and hopefully this would solve all or most of that pain.

I'd like time to help review this ticket since it can greatly effect the Yokozuna code and right now I just don't have time for that given the upcoming 2.0 release.

Contributor

Licenser commented Nov 23, 2013

@lenary thanks a bunch for all those comments :) you rise some very valid points (and found some very misspelled typos ;) I'll update the pull request with changes regarding your points today.

I 100% agree with the -callback notation the only reason I stuck with the behavior info is that the riak_core_vnode uses it and I did not want to go a different path with this behavior. If you don't need the backwards compatibility (I think it's R14) I'll change it in an instant. Also all the functions in the aae_vnode are just there as a 'idea' for people how the headers look like since behavior info does not give a hint like that, kind of a poor mans -callback ;)

Contributor

Licenser commented Nov 23, 2013

@rzezeski I hear you on that, totally agree it should be general enough to be used on Yokozuna and Riak EE. That said I don't know Yokozuna, and have entirely no access to EE so I'll have to rely on input on that side.

But all in all I'm not too cornered, the AAE code is very well separated from the rest as it is not sure how it ties into Yokozuna.

Also there is an alternative to integrating AAE into core I wanted to bring up, making it own repository people can include if they like to - obviously tied to _core since it uses a good bit of _cores functions but a standalone thing. (just a thought, not sure if it's a good one)

As a note, the form here is running quite well in FiFo's Snarl and Sniffle :) so it does not seem to be horribly broken ;)

creation, recommended

so it should listen for

lenary commented Nov 23, 2013

@Licenser This is great work! Thanks!

I'm still waiting on an answer from other bashtronauts as to whether we can use -callback annotations.

Everything's looking a lot better, sorry for holding you to a higher standard than we do for ourselves yet, but my job soon will be to make sure we're at that high standard in _core.

Ok, there are still a few outstanding comments, on the stuff in aae_vnode, that I'd like cleared up for documentation purposes, if you'd be so kind.

@rzezeski I'm very bad at managing expectation. I'm not imagining this change happening in time for 2.0, but hopefully the next major release afterwards (when it should land with the rest of my _core work)

Contributor

Licenser commented Nov 26, 2013

Okay put in some more detailed information to address @lenary's comments only thing I'm not sure about what is wrong with the comment on Line 71, it seems very well explained (and has very few typos and grammer errors since it wasn't written by me :P)

Contributor

Licenser commented Nov 27, 2013

I'm stuck with one problem and was hoping for some help. every now and then one of the exchange_fsm's crashes with this message

2013-11-27 05:20:37 =ERROR REPORT====
** State machine <0.1021.0> terminating
** Last event in was start_exchange
** When State == prepare_exchange
**      Data  == {state,{0,'dev4@127.0.0.1'},{182687704666362864775460604089535377456991567872,'dev4@127.0.0.1'},{0,3},<0.517.0>,undefined,0,300000,sniffle_vm_vnode,sniffle_vm}
** Reason for termination =
** {{noproc,{gen_server,call,[{ok,'dev4@127.0.0.1'},{riak_vnode_req_v1,0,{server,undefined,undefined},{riak_core_fold_req_v2,#Fun<riak_core_index_hashtree.7.99184084>,ok,false,[aae_reconstruction]}},infinity]}},{gen_server,call,[<0.517.0>,{get_lock,local_fsm,<0.1021.0>},infinity]}}
2013-11-27 05:20:37 =CRASH REPORT====
  crasher:
    initial call: riak_core_exchange_fsm:init/1
    pid: <0.1021.0>
    registered_name: []
    exception exit: {{{noproc,{gen_server,call,[{ok,'dev4@127.0.0.1'},{riak_vnode_req_v1,0,{server,undefined,undefined},{riak_core_fold_req_v2,#Fun<riak_core_index_hashtree.7.99184084>,ok,false,[aae_reconstruction]}},infinity]}},{gen_server,call,[<0.517.0>,{get_lock,local_fsm,<0.1021.0>},infinity]}},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,622}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [sniffle_vm_entropy_manager,sniffle_sup,<0.341.0>]
    messages: [{'DOWN',#Ref<0.0.0.3643>,process,<0.517.0>,{noproc,{gen_server,call,[{ok,'dev4@127.0.0.1'},{riak_vnode_req_v1,0,{server,undefined,undefined},{riak_core_fold_req_v2,#Fun<riak_core_index_hashtree.7.99184084>,ok,false,[aae_reconstruction]}},infinity]}}}]
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 175
  neighbours:

It thinks it should send something to {ok,'dev4@127.0.0.1'} and I'm entirely uncertain where it gets that idea from, I've looked thorough the code path to this point as good as I understand it and see no place this should be returned.

If someone with a deeper understanding of the AAE system could give a hint here it would be greatly appreciated.

Contributor

Licenser commented Nov 27, 2013

Some of the callbakcs are now in the 'convention' land and do not need to be exported by the implementing vnode any more.

lenary commented Nov 28, 2013

Heads up @Licenser - there's likely to be a few other optimisations to AAE for 2.0, I'll keep you posted. Allegedly they're minor

Contributor

Licenser commented Nov 28, 2013

Okay @lenary fortunately found the problem :)

lenary commented Nov 28, 2013

Is that ETS table per-service, or is it ok to have one globally?

Contributor

Licenser commented Nov 28, 2013

I made it be one global one and not one per service, it tags the things it inserts the service :) I found it more sensible then having one per service.

Contributor

jrwest commented Mar 24, 2014

As @rzezeski mentioned, this is best-suited as a post-2.0 project. We are doing some issue cleaning today, so I am marking this for milestone 2.1 (which is an umbrella for potentially planned future work at this point, not an actual commitment to the 2.1 release)

@jrwest jrwest added this to the 2.1 milestone Mar 24, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment