assume handoff if vnode exits 'normal' during queue requeuest #68

Merged
merged 3 commits into from Mar 1, 2013

Conversation

Projects
None yet
2 participants
Member

russelldb commented Feb 28, 2013

RE-OPENING…#64 closed in error.

The pipe_verify_handoff_blocking test in riak_test PR 154 demonstrated
that the monitor used by riak_pipe_vnode:queue_work_send may fire while
handoff is taking place. This situation is further described in
riak_pipe issue 63.

This commit addresses the problem by first checking whether the Reason
for the vnode's exit was 'normal'. If it was not 'normal', the
vnode_down error is returned as before. If it was 'normal', then it is
assumed that handoff happened, and the monitor is moved to the new
vnode, as indicated by riak_core_ring:next_owner/2.

Bryan Fink assume handoff if vnode exits 'normal' during queue requeuest
The pipe_verify_handoff_blocking test in riak_test PR 154 demonstrated
that the monitor used by riak_pipe_vnode:queue_work_send may fire while
handoff is taking place. This situation is further described in
riak_pipe issue 63.

This commit addresses the problem by first checking whether the Reason
for the vnode's exit was 'normal'. If it was not 'normal', the
vnode_down error is returned as before. If it was 'normal', then it is
assumed that handoff happened, and the monitor is moved to the new
vnode, as indicated by riak_core_ring:next_owner/2.
fd72f55

@russelldb russelldb commented on an outdated diff Mar 1, 2013

src/riak_pipe_vnode.erl
@@ -353,6 +343,41 @@ queue_work_send(#fitting{ref=Ref}=Fitting,
{error, {nodedown, Node}}
end.
+queue_work_wait(Ref, Index, Node, VnodePid) ->
+ %% monitor in case the vnode is gone before it
+ %% responds to this request
+ MonRef = erlang:monitor(process, VnodePid),
+ %% block until input confirmed queued, for backpressure
+ receive
+ {Ref, Reply} ->
+ erlang:demonitor(MonRef),
+ Reply;
+ {'DOWN',MonRef,process,VnodePid,normal} ->
+ %% the vnode likely just shut down after completing handoff
+ {ok, Ring} = riak_core_ring_manager:get_my_ring(),
+ case riak_core_ring:next_owner(Ring, Index) of
@russelldb

russelldb Mar 1, 2013

Member

Emacs erlang-flymake complains about exporting Next from this case statement. I'm a recent, zealotus convert to this stylistic preference. I think it makes the provenance of bindings much, much easier to find. I this case Next is used very close to this case statement, so is less of an issue.

@russelldb russelldb commented on an outdated diff Mar 1, 2013

src/riak_pipe_vnode.erl
+ %% block until input confirmed queued, for backpressure
+ receive
+ {Ref, Reply} ->
+ erlang:demonitor(MonRef),
+ Reply;
+ {'DOWN',MonRef,process,VnodePid,normal} ->
+ %% the vnode likely just shut down after completing handoff
+ {ok, Ring} = riak_core_ring_manager:get_my_ring(),
+ case riak_core_ring:next_owner(Ring, Index) of
+ {undefined, undefined, undefined} ->
+ %% ownership finished changing before we asked
+ %% ... check if Next==Node?
+ Next = riak_core_ring:index_owner(Ring);
+ {Node, Next, _Status} ->
+ %% ownership is still changing ... should this be
+ %% looser, and not care whether the transfer was
@russelldb

russelldb Mar 1, 2013

Member

I'd say so, all you want is the current Pid, is there a case where two ownership transfers happened so fast not relaxing this would lead to a badmatch for this case statement?

Bryan Fink added some commits Mar 1, 2013

Bryan Fink match result of case instead of exporting Next
Russell's review says this makes Emacs erlang-flymake happier.
b5dcc7d
Bryan Fink don't insist on transfer being from a specific node
As Russell pointed out, it's possible that two+ transfers could happen
in the time between receiving the DOWN and checking the owner. Ignore
where the transfer is _from_, and just monitor wherever it's going to.
a87f40a
Member

russelldb commented Mar 1, 2013

Ran basho/riak_test#154 before and after and got fail and pass as you'd expect.

WFM +1.

beerriot was assigned Mar 1, 2013

@beerriot beerriot pushed a commit that referenced this pull request Mar 1, 2013

Bryan Fink Merge pull request #68 from branch 'bwf-vnode-down' 9752bb1

@beerriot beerriot merged commit a87f40a into master Mar 1, 2013

1 check passed

default The Travis build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment