Permalink
Browse files

move more control of copy transfers to vnode and add basic forwarding

* Instead of vnode manager triggering each transfer for each source
  index it triggers one "copy" transfer each. The
  copy transfer contains the list of target indexes to "copy"
  to. The vnode then triggers an outbound ownership_copy one at a time
  until all transfers for the list of indexes are complete. Once
  complete, it notifies the vnode manager like reglular handoff.
* Added (barely tested) support for forwarding.
* This approach more closely resembles typical ownership
  transfer/hinted handoff for a vnode. The primary differences are: 1)
  data is not deleted after handoff completes  (this needs to be
  addressed -- at some point some data needs to be deleted, see
  comments). 2) in the case that an index exists in both old & new
  rings it may copy its data to target indexes and then keep
  running. In this case data also needs to be deleted (also punted on)
  but some data must still remain (referred to as rehash in Core 2.0
  doc). 3) the same vnodes that are affected by #2 also differ in that
  after they begin forwarding they may stop and continue running in
  their regular state. In addition, when forwarding, these indexes will
  forward some requests but others will still be handled by the local vnode
  (not forwarded). What to do with a request during explicit forwarding
  (when vnode returns {forward, X} during handle_handoff_command) when
  forwarding that message would result in it being delivered to same
  vnode still needs to be addressed (see comments).
* This commit adds a vnode callback, request_hash, required only if
  supporting changing ring sizes. We probably need something better than
  this but its sufficient for a prototype. The function's argument is
  the request to be handled by the vnode and the return value is the
  hashed value of the key from the request. This is necessary because
  the request is opaque to riak_core_vnode. One obvious issue, for
  example, is in the case of FOLD_REQ there is no key to hash -- even
  though we probably shouldn't be and in some cases don't forward this
  anyways.
  • Loading branch information...
1 parent 085e3f9 commit ac639643a2778d54971521e20ca231d47b6eb09b @jrwest jrwest committed Jan 28, 2013
Showing with 248 additions and 154 deletions.
  1. +2 −1 src/riak_core_claimant.erl
  2. +3 −2 src/riak_core_handoff_sender.erl
  3. +12 −7 src/riak_core_ring.erl
  4. +178 −102 src/riak_core_vnode.erl
  5. +53 −42 src/riak_core_vnode_manager.erl
@@ -716,7 +716,8 @@ maybe_finish_size_change(Node, CState) ->
Claimant = riak_core_ring:claimant(CState),
case Claimant of
Node ->
- {true, riak_core_ring:future_ring(CState)};
+ {true,
+ riak_core_ring:increment_ring_version(node(), riak_core_ring:future_ring(CState))};
_ ->
{false, CState}
end.
@@ -167,6 +167,7 @@ start_fold(TargetNode, Module, {Type, Opts}, ParentPid, SslOpts) ->
module=Module,
parent=ParentPid,
tcp_mod=TcpMod,
+ stats=FinalStats,
total=SentCount} = R,
case ErrStatus of
@@ -190,9 +191,9 @@ start_fold(TargetNode, Module, {Type, Opts}, ParentPid, SslOpts) ->
FoldTimeDiff = end_fold_time(StartFoldTime),
lager:info("~p transfer of ~p from ~p ~p to ~p ~p"
- " completed: sent ~p objects in ~.2f seconds",
+ " completed: sent ~p of ~p objects in ~.2f seconds",
[Type, Module, SrcNode, SrcPartition,
- TargetNode, TargetPartition, SentCount,
+ TargetNode, TargetPartition, FinalStats#ho_stats.objs, SentCount,
FoldTimeDiff]),
case Type of
View
@@ -799,13 +799,18 @@ next_owner(?CHSTATE{next={_,Next}}, Idx, Mod) ->
case lists:keyfind(Idx, 1, Next) of
false ->
{undefined, undefined, undefined};
- {_, Owner, NextOwner, _Transfers} ->
- %% TODO: this is just a temporary patch since this function is only called
- %% by riak_core_vnode_manager:check_forward/3 so by always returning
- %% awaiting we essentially punt on forwarding. this should actually
- %% compute if all transfers for the idx under expansion have completed
- %% and return that status
- {Owner, NextOwner, awaiting};
+ {_, Owner, NextOwner, Transfers} ->
+ %% TODO: this really shouldn't be necessary but the next list
+ %% version from the last apprach is still being used. update when changed
+ HasAwaiting = lists:any(fun({_, _, Mods, _}) ->
+ not ordsets:is_element(Mod, Mods)
+ end,
+ Transfers),
+ Status = case HasAwaiting of
+ true -> awaiting;
+ false -> complete
+ end,
+ {Owner, NextOwner, Status};
{_, Owner, NextOwner, _Transfers, complete} ->
{Owner, NextOwner, complete};
{_, Owner, NextOwner, Transfers, _Status} ->
Oops, something went wrong.

0 comments on commit ac63964

Please sign in to comment.