Commits on Dec 23, 2011
  1. @jtuple

    Fix handoff issues in vnode/vnode manager

    Change trigger_handoff and finish_handoff events in the
    vnode to be ignored if the vnode modstate is already deleted.
    
    Add periodic timer to vnode manager to perform various management
    activities. Currently it re-triggers ownership handoff, so that handoff
    under load is not triggered only on ring changes.
    jtuple committed Dec 23, 2011
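
    A minimal sketch of both changes, assuming gen_fsm-style event handling; module, state, and event names are illustrative, not the actual riak_core_vnode/riak_core_vnode_manager code:

        -module(vnode_handoff_sketch).
        -export([active/2, start_tick/0]).

        -record(state, {modstate = running}).

        %% Ignore handoff events once the vnode's module state is deleted.
        active(Event, State = #state{modstate = deleted})
          when Event =:= trigger_handoff; Event =:= finish_handoff ->
            {next_state, active, State};
        active(trigger_handoff, State) ->
            %% normal path: this is where handoff would actually start
            {next_state, active, State}.

        %% Vnode manager side: a periodic tick, so ownership handoff is
        %% re-triggered even between ring changes.
        start_tick() ->
            erlang:send_after(10000, self(), management_tick).
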
  2. @jtuple

    Merge branch 'az896-vnode-manager-rewrite-part2-rb'

    Conflicts:
    	src/riak_core_vnode.erl
    jtuple committed Dec 23, 2011
  3. @jtuple

    Refactor riak_core vnode management (part 2)

    Move forwarding and handoff decisions out of individual vnode processes
    and into the vnode manager. The vnode manager makes handoff and
    forwarding decisions whenever the ring changes, and triggers vnode
    state changes as appropriate.
    
    Rewrite the logic by which per-vnode handoff is marked as complete in
    the ring. In particular, move the logic out of riak_core_gossip and into
    riak_core_vnode. The underlying ring changes are still serialized by
    riak_core_ring_manager through the ring_trans function.
    
    Change gossip throttling logic to trigger gossip whenever gossip tokens
    reset, replacing the gossip interval approach.
    
    Perform various tuning and additional minor changes that improve cluster
    operation during a heavy gossip spike.
    jtuple committed Dec 2, 2011
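
    The serialization point mentioned above is riak_core_ring_manager:ring_trans/2, whose update fun returns {new_ring, NewRing} or ignore. A hedged sketch of the idiom; needs_update/2 and mark_complete/2 are hypothetical helpers standing in for the real per-vnode bookkeeping:

        mark_handoff_complete(Idx) ->
            riak_core_ring_manager:ring_trans(
              fun(Ring, _Args) ->
                      %% this fun runs inside the ring manager, which
                      %% applies ring changes one at a time
                      case needs_update(Ring, Idx) of
                          true  -> {new_ring, mark_complete(Ring, Idx)};
                          false -> ignore  %% leave the ring untouched
                      end
              end, []).
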
  4. @massung

    Merge pull request #128 from basho/jrm-handoff-1.1

    Handoff Manager now in charge of senders and receivers.
    massung committed Dec 23, 2011
Commits on Dec 22, 2011
  1. @massung

    Changed handoff_status to status, returning the list instead of {ok,List}

    handoff_status.status now defaults to a list instead of undefined.
    massung committed Dec 22, 2011
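
    The call-site change this implies, sketched in the shell (the manager module name is an assumption):

        %% before: {ok, Handoffs} = riak_core_handoff_manager:handoff_status().
        %% after:
        Handoffs = riak_core_handoff_manager:status().
        %% and handoff_status.status starts as [] rather than undefined
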
  2. @massung
  3. @Vagabond

    Merge pull request #115 from basho/az903-repl-improvements

    Allow applications to register repl helpers, for custom per-object repl behaviour
    Vagabond committed Dec 22, 2011
  4. @massung

    More tweaks: removed handoff tuple; handoff_status now has a modindex and a separate node field

    send_outbound kills the existing handoff (if different) before starting a new one.
    massung committed Dec 22, 2011
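
    A sketch of the reshaped record; field names beyond modindex and node are assumptions:

        -record(handoff_status,
                { modindex    :: {module(), non_neg_integer()}, %% replaces the old handoff tuple
                  node        :: node(),                        %% now its own field
                  direction   :: inbound | outbound,            %% assumed
                  status = [] :: list()                         %% defaults to a list (Dec 22 commit)
                }).
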
  5. Merge branch 'az996-listkeys-backpressure'

    Bryan Fink committed Dec 22, 2011
  6. @massung

    Vnodes now always try to handoff, ignoring whether they think one is already running

    The handoff_manager is the final arbiter of whether a handoff should proceed.
    massung committed Dec 22, 2011
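
    Sketch of the vnode side under this scheme: always ask, let the manager decide. The add_outbound/4 shape and return values are assumptions:

        %% The vnode no longer tracks whether a handoff is already running;
        %% it always asks, and the manager either starts one or refuses.
        maybe_handoff(Mod, Idx, TargetNode) ->
            case riak_core_handoff_manager:add_outbound(Mod, Idx, TargetNode, self()) of
                {ok, _SenderPid}         -> handoff_started;
                {error, max_concurrency} -> handoff_refused
            end.
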
  7. @massung
  8. @massung
  9. @massung

    Handoff manager now tracks the vnode_pid for outbound handoffs

    It notifies the vnode when the handoff completes.
    massung committed Dec 22, 2011
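
    A generic sketch of the tracking/notification pattern described; the message shapes are assumptions:

        %% Remember the vnode pid for an outbound handoff and notify it
        %% when the sender finishes, successfully or not.
        run_outbound(VnodePid, SenderFun) ->
            {SenderPid, MRef} = spawn_monitor(SenderFun),
            receive
                {'DOWN', MRef, process, SenderPid, normal} ->
                    VnodePid ! handoff_complete;
                {'DOWN', MRef, process, SenderPid, Reason} ->
                    VnodePid ! {handoff_error, Reason}
            end.
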
Commits on Dec 21, 2011
  1. @jonmeredith @massung
  2. @massung
Commits on Dec 20, 2011
  1. @massung

    Fixed bug in listener; added support for killing the oldest handoffs

    If set_concurrency is reduced below the number of active handoffs, the oldest are killed.
    massung committed Dec 20, 2011
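
    The kill-oldest behaviour sketched, assuming the active list is ordered oldest-first:

        enforce_limit(Limit, Active) when length(Active) > Limit ->
            %% kill the excess from the front (the oldest handoffs)
            {Oldest, Keep} = lists:split(length(Active) - Limit, Active),
            [exit(Pid, max_concurrency) || Pid <- Oldest],
            Keep;
        enforce_limit(_Limit, Active) ->
            Active.
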
  2. @massung
  3. @massung

    Updated the handoff_listener to properly reject incoming handoffs instead of shutting down

    Also added a rejected_handoffs stat to riak_core_stat that should act as a sliding window.
    massung committed Dec 20, 2011
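
    Sketch of the rejection path: deny a single connection without taking down the listener. The add_inbound/0 call is an assumption:

        handle_incoming(Socket) ->
            case riak_core_handoff_manager:add_inbound() of
                {ok, ReceiverPid} ->
                    ok = gen_tcp:controlling_process(Socket, ReceiverPid);
                {error, max_concurrency} ->
                    %% reject only this handoff; the listener keeps
                    %% accepting, and the rejection feeds the stat
                    gen_tcp:close(Socket)
            end.
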
  4. @massung

    Simple eunit test for the handoff manager to build off of

    Also modified the listener to follow the protocol.
    massung committed Dec 20, 2011
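
    A minimal eunit skeleton of the kind this commit introduces; the module name and assertion are illustrative only:

        -module(handoff_manager_sketch_tests).
        -include_lib("eunit/include/eunit.hrl").

        %% Placeholder to build on: a fresh manager reports no handoffs.
        empty_status_test() ->
            {ok, _Pid} = riak_core_handoff_manager:start_link(),
            ?assertEqual([], riak_core_handoff_manager:status()).
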
Commits on Dec 19, 2011
  1. @massung

    Handoff manager controls senders and receivers and tracks them

    Handoffs are now a record. Handoff locks are gone. The listener and vnode now go through the handoff manager instead of spawning senders and receivers themselves.
    massung committed Dec 19, 2011
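
    The new control flow, roughly (entry-point names as sketched above; treat as an approximation):

        %% Nothing spawns senders/receivers directly any more, and no
        %% handoff locks are taken:
        %%
        %%   vnode    --> riak_core_handoff_manager:add_outbound(...) --> sender
        %%   listener --> riak_core_handoff_manager:add_inbound(...)  --> receiver
        %%   manager  --> tracks both as #handoff_status{} records
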
  2. @rzezeski

    Merge branch '1.0'

    rzezeski committed Dec 19, 2011
  3. @rzezeski
Commits on Dec 16, 2011
  1. @rzezeski
  2. @russelldb

    Merge branch 'rdb_bz1278'

    russelldb committed Dec 16, 2011
  3. @rzezeski

    Convert new_claim to act as pass-thru to claim module

    Consider removing this module in a future major version, with a warning
    in the release notes for anyone referencing it.
    rzezeski committed Dec 12, 2011
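
    A sketch of the pass-thru; the function names follow riak_core's wants/choose convention and should be treated as an approximation:

        -module(riak_core_new_claim).
        -export([new_wants_claim/2, new_choose_claim/2]).

        %% Kept only so existing references keep working; delegates to
        %% the claim module.
        new_wants_claim(Ring, Node)  -> riak_core_claim:wants_claim_v2(Ring, Node).
        new_choose_claim(Ring, Node) -> riak_core_claim:choose_claim_v2(Ring, Node).
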
  4. @rzezeski

    Revert "Remove new_claim module, everything was moved into claim module"

    This reverts commit b6409ca.
    
    Since we've already introduced this module name, there may be people using
    it already, and we don't want to break their systems.
    rzezeski committed Dec 12, 2011
  5. @rzezeski
  6. @rzezeski

    Default to v2 claim, update QC tests, fix bug in select_indices

    1. The new default claim is now set to v2.
    
    2. The semantics of wants_claim changed, so I had to update the wants_claim
       test.  Essentially, the old wants_claim was simply an indicator of whether
       the ring is imbalanced at all, and would return `{yes,0}` if it is.  The new
       wants_claim is more true to its name in that it returns `{yes,N}`, meaning
       the node would like to claim `N` partitions.
    
    3. Based on the unique nodes property, there was an edge case in the situation
       where there are 16 partitions and 15 nodes.  I'm not sure if this edge case
       would appear in other situations.  Anyway, the way select_indices was written,
       when the 15th node went to claim, it would determine that there was no safe
       partition it could claim and would then perform a rebalance (diagonalize).
       However, a rebalance doesn't make any guarantee about keeping the target_n
       invariant on wrap-around.  So you would end up with the last and first partition
       being owned by the same node.  The problem was that select_indices assumed that
       the first owner could give up its partition, `First = (LastNth =:= Nth)`, but that
       wouldn't hold true, and then no other partition could be claimed because they
       would all be within target_n of the LastNth/FirstNth.  My change is to pass
       an explicit flag in the accumulator that represents whether or not the node has
       claimed anything yet.  This makes the (possibly incorrect) assumption that the node
       never currently owns anything when `select_indices` is called.  I was able to get a
       500K-iteration run of the QC prop to pass, but I do wonder if things could be
       different in production.  After talking with Joe, he seemed to think the change was safe.
    rzezeski committed Dec 2, 2011
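
    A hedged sketch of the select_indices fix described in point 3: carry an explicit has-claimed flag in the fold accumulator instead of inferring the first claim from LastNth =:= Nth. Names are illustrative, not the actual riak_core_claim code:

        -module(claim_sketch).
        -export([select_indices/2]).

        select_indices(Candidates, TargetN) ->
            %% accumulator: {ClaimedSoFar, LastClaimedNth, HasClaimed}
            {Claimed, _, _} =
                lists:foldl(
                  fun({Nth, Idx}, {Acc, LastNth, HasClaimed}) ->
                          %% until the node claims something, no spacing
                          %% constraint applies, so the first claim is safe
                          Safe = (not HasClaimed) orelse (Nth - LastNth >= TargetN),
                          case Safe of
                              true  -> {[Idx | Acc], Nth, true};
                              false -> {Acc, LastNth, HasClaimed}
                          end
                  end,
                  {[], 0, false}, Candidates),
            lists:reverse(Claimed).
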
  7. @rzezeski
  8. @rzezeski

    Add 1 & 2 arity claim APIs

    The claim APIs currently require both 1 & 2 arity functions
    because of the two different ways legacy gossip and new gossip
    call claim.
    
    The reason both default and v1 are exported is that the default will soon
    be v2, and you still need a way to allow the user to set the claim algo
    to v1.
    rzezeski committed Nov 30, 2011
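
    Sketched shape of the dual-arity entry points (a fragment, with the v1 body elided; an approximation of riak_core_claim):

        -export([default_wants_claim/1, default_wants_claim/2,
                 wants_claim_v1/1, wants_claim_v1/2]).

        %% 1-arity for legacy gossip, 2-arity for new gossip.
        default_wants_claim(Ring) -> default_wants_claim(Ring, node()).
        default_wants_claim(Ring, Node) -> wants_claim_v1(Ring, Node).

        wants_claim_v1(Ring) -> wants_claim_v1(Ring, node()).
        wants_claim_v1(_Ring, _Node) -> no.  %% v1 algorithm elided
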
  9. @rzezeski

    Rename current claim algo to v1

    The new claim algo in riak_core_new_claim is going to replace
    the current default.  Rename current to v1 and later add new
    as v2.
    rzezeski committed Nov 30, 2011
  10. @rzezeski

    Comment out spiraltime QC

    This test always results in a timeout for me.
    For now just don't run it.
    rzezeski committed Nov 30, 2011
Commits on Dec 15, 2011
  1. @rzezeski

    Merge branch '1.0'

    rzezeski committed Dec 15, 2011
  2. @rzezeski
  3. @rzezeski

    Set default handoff_concurrency to 1

    We've found that under extreme load our default handoff concurrency,
    combined with the fact that no incoming throttling is currently done, can
    cause the node to become overloaded and latency to spike.  Rather than ship
    a potentially harmful default, Riak should err on the side of safety.
    rzezeski committed Dec 15, 2011
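
    Operators who want more concurrency back can raise the limit in the riak_core section of app.config; handoff_concurrency is the real setting name:

        %% app.config excerpt
        {riak_core, [
            %% default is now 1; raise deliberately under load
            {handoff_concurrency, 4}
        ]}
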