False/bogus/needless sibling creation during handoff #1046
Comments
Could the handle_handoff_command on the handoff node just strip the coord from the put request?
@jonmeredith That code is twisted. I spent a long time before I realized the following thingies, from internal chat.
I think it needs to do more than that. But it needs to modify the request, which means a small change to riak_core to allow that. I’m on with it. Just change https://github.com/basho/riak_core/blob/1.4/src/riak_core_vnode.erl#L369 to allow either
On 17 Nov 2014, at 16:03, Jon Meredith notifications@github.com wrote:
Er, sorry ... that was intended to show that handle_handoff_command() doesn't just do stuff on a handoff receiver's execution path.
FWIW, I built Riak 1.3.2 from source, using the
The sibling explosion comes later, but it still happens. Instead of the 10-second delay in visit_item() that he originally suggested, perhaps a 1-second delay would make the transfers happen a bit more quickly and reveal the extra siblings? A list of something like
At https://github.com/basho/riak_kv/blob/1.4/src/riak_kv_vnode.erl#L872 can
handle_handoff_command(Req=?KV_PUT_REQ{}, Sender, State) ->
strip_coord(?KV_PUT_REQ{options= Options} = Req) ->
On Mon, Nov 17, 2014 at 9:06 AM, Russell Brown notifications@github.com
Jon Meredith
Can't do that -- riak_kv_vnode:handle_handoff_command() is called in the 1st coordinating vnode's execution path. Madness!
Since vnode1 is handing off, handle_handoff_command is called. This makes sense to me. Removing coord is not enough, anyway. The request is a put request, so you need to generate a frontier on it; it must be coordinated somewhere.
Thanks for the clarification. Adding support for returning a {forward, NewReq, NewModState} definitely seems like the answer.
Wow, nice work on this diagnosis @shino, @slfritchie, and @russelldb!
I want to ask one question about a necessary condition.
@shino I don't think you need any interleaving. Node D has been added to the system and claims partition X from node A. Now repeat: D,X again handles an incoming [{ax, 2}], which conflicts with [{ax,1}, {dx, 2}], giving another sibling (now 3), and so on for each new write to A,X that is forwarded. The client never sees the [{dx, n}] entry, so its updates never dominate them. Does that sound right to you? The fix works by
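Russell's argument comes down to vector-clock dominance: a stored value is replaced only when the incoming clock descends it. The following is a minimal Python sketch of that check using his exact clocks. It assumes clocks are plain dicts of actor to counter; this is a simplified model, not riak_core's vclock module.

```python
from typing import Dict

# Simplified model (assumption): a vector clock is a dict of actor -> counter.
VClock = Dict[str, int]


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def concurrent(a: VClock, b: VClock) -> bool:
    """Neither clock descends the other, so both values must be kept as siblings."""
    return not descends(a, b) and not descends(b, a)


# What D,X holds after its own (second) coordination added a "dx" entry.
stored = {"ax": 1, "dx": 2}
# The next forwarded write: the client read its context via A,X, so it has no "dx".
incoming = {"ax": 2}

print(descends(incoming, stored))    # False: the client never saw {dx, 2}
print(concurrent(incoming, stored))  # True: D,X keeps both, one more sibling
```

If the client's context had carried the dx entry (for example {"ax": 2, "dx": 2}), descends would return True and the write would simply replace the stored value; the client never gets that entry, so it can never win.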
Perfect explanation, thank you!
Finally took the time to read through this issue. Great work Scott, Russell
On Tue, Nov 18, 2014 at 7:27 AM, Shunichi Shinohara
Fixed by #1047, closing.
…s during handoff. Forward port of this issue from 1.4 to 2.0: basho/riak_kv#1046. Original fix is: 76cbdc2. There is a matching PR for riak_kv.
First, many thanks to @shino for being tenacious and putting together the necessary evil. I would've floundered without it.
Prerequisite: Set up Riak 1.4.x using Shino's recipe at https://gist.github.com/shino/f30eb66d8a53b8d71224 ... I used Riak 1.4.10.
My annotated log messages discussing the problem are at:
https://gist.github.com/slfritchie/442e09035c70c5ab6240 ... The first put is boring. The next two intentionally simulate a read-update-put cycle that is written correctly but is 100% unlucky with regard to the interleaving of concurrent operations. The 2nd put is the winner and thus is also boring.
The 3rd put is exciting because it causes the vector clock to be coordinated twice, at the vnode level and not at the FSM level. The put's 1st coordinator vnode is the vnode that is also handing off, so the vnode (riak_kv and riak_core interacting here, alas) forwards the op to the handoff destination vnode. The put op still has the [coord] options list, so the handoff receiver also acts as a coordinator. As far as Shino and I can tell, and probably @russelldb also, this 2nd coord put is bad. It creates a new actor ID in the vclock (call it ID_c2), and all subsequent vnode ops that are forwarded to that vnode by a single, sequential, correctly-written client never have the ID_c2 actor in the vclock, because the get ops are never sent to the handoff destination vnode, and the handoff destination vnode is the only one that has a record of the ID_c2 actor. Thus each single, sequential write to this key will create a new sibling, and writes will continue to create siblings until handoff is finished.
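The runaway sibling count can be reproduced with a toy model. This is a simplified Python sketch, not riak_kv's put path. It assumes that the handing-off vnode coordinates with actor "c1", that client reads only ever return clocks without "c2", and that the handoff receiver stores the key and, in the buggy case, coordinates the unmodified forwarded request again with its own actor "c2". The merge rule (replace if the incoming clock descends the stored one, otherwise keep both values and merge the clocks) is also an assumed simplification. The second branch models forwarding the 1st coordinator's result with coord removed.

```python
from typing import Dict, List, Optional, Tuple

VClock = Dict[str, int]
Stored = Tuple[VClock, List[str]]  # (clock, sibling values) held by the receiver


def descends(a: VClock, b: VClock) -> bool:
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def increment(clock: VClock, actor: str) -> VClock:
    bumped = dict(clock)
    bumped[actor] = bumped.get(actor, 0) + 1
    return bumped


def merge(a: VClock, b: VClock) -> VClock:
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def receiver_put(stored: Stored, incoming: VClock, value: str,
                 coord_actor: Optional[str]) -> Stored:
    clock, values = stored
    if descends(incoming, clock):
        # The write has seen everything stored: it replaces the old value(s).
        new_clock, new_values = dict(incoming), [value]
    else:
        # Concurrent write: keep the old values and add the new one as a sibling.
        new_clock, new_values = merge(clock, incoming), values + [value]
    if coord_actor is not None:
        # Buggy path: the request still carries coord, so the receiver coordinates again.
        new_clock = increment(new_clock, coord_actor)
    return new_clock, new_values


def siblings_after(writes: int, double_coordination: bool) -> int:
    read_clock: VClock = {}    # what the client reads back; never contains "c2"
    stored: Stored = ({}, [])  # the handoff receiver's copy of the key
    for n in range(writes):
        context = read_clock                    # one sequential read-update-put client
        coordinated = increment(context, "c1")  # 1st coordinator (the handing-off vnode)
        read_clock = coordinated
        if double_coordination:
            # Original request forwarded unmodified: the receiver coordinates it as "c2".
            stored = receiver_put(stored, context, f"v{n}", "c2")
        else:
            # Coordinated result forwarded with coord stripped: a plain replica put.
            stored = receiver_put(stored, coordinated, f"v{n}", None)
    return len(stored[1])


print(siblings_after(5, double_coordination=True))   # 5
print(siblings_after(5, double_coordination=False))  # 1
```

In the double-coordinated case every stored clock carries a "c2" counter the client has never read, so no write descends it and each one adds a sibling; when the receiver only stores the already-coordinated object, each write replaces the last.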
In an ideal world, the coord option would be removed from an op that's forwarded. Unfortunately, that's impossible with the current API between riak_core_vnode and riak_kv_vnode:handle_handoff_command(), specifically the {forward, ...} return tuple. Russell tells me that the forwarded object should be the result of the 1st coordinator put, so it's quite likely that the current forward-the-original-object-without-modification behavior is wrong for a second reason that we haven't noticed yet (but future CRDT work could be tripped up by this error).
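The shape proposed in the comments (returning a rewritten request, analogous to {forward, NewReq, NewModState}) can be sketched in Python: the handing-off vnode coordinates once, then forwards the coordinated object with coord removed. PutReq, its fields, the tuple-of-strings options, and the local_actor parameter are hypothetical stand-ins. This is not the riak_kv_vnode or riak_core_vnode API, and not necessarily what #1047 does.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Tuple

VClock = Dict[str, int]


@dataclass(frozen=True)
class PutReq:
    """Hypothetical stand-in for a vnode-level put request."""
    key: str
    clock: VClock
    value: Any
    options: Tuple[str, ...]  # assumption: "coord" marks the coordinating vnode


def handle_handoff_command(req: PutReq, state: Any,
                           local_actor: str) -> Tuple[str, PutReq, Any]:
    """Return ("forward", NewReq, NewState) so the handoff target gets a rewritten request."""
    if "coord" not in req.options:
        # Plain replica put: forward it unchanged.
        return ("forward", req, state)
    # Coordinate exactly once, here: bump this vnode's actor in the object's clock.
    clock = dict(req.clock)
    clock[local_actor] = clock.get(local_actor, 0) + 1
    # Strip coord so the receiver stores the result instead of coordinating it again.
    options = tuple(opt for opt in req.options if opt != "coord")
    return ("forward", replace(req, clock=clock, options=options), state)


original = PutReq(key="k", clock={"c1": 1}, value="v2", options=("coord",))
tag, forwarded, _ = handle_handoff_command(original, state=None, local_actor="c1")
print(tag, forwarded.clock, forwarded.options)  # forward {'c1': 2} ()
```

The point of returning a new request, rather than only forwarding the original one, is that it carries both fixes at once: the receiver sees the 1st coordinator's clock, and without coord it has no reason to add an actor of its own.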
Basho folk: see also the Eng chat room on the morning of Monday, 2014-11-17, US time.