Commits on Jun 18, 2012
Commits on Jun 13, 2012
Commits on Jun 12, 2012
  1. Kill local inbound handoffs

    rzezeski committed Jun 12, 2012
    When killing repairs, attempt to kill both sides of the xfer to make
    sure it's dead.  That is, kill both the senders on the remote nodes
    and the receivers on the local node.
  2. Factor general filter code into Core

    rzezeski committed Jun 11, 2012
    Factor out the repair filter code shared by Search & KV into Core.
  3. Log error for non-existing xfer

    rzezeski committed Jun 11, 2012
    Rather than throwing, and thus crashing the vnode mgr, when an xfer
    complete msg arrives for a non-existing xfer, log an error and move
    on.  This case can occur if the vnode mgr crashes and its state gets
    out of sync with the handoff mgr.
  4. Match in function head

    rzezeski committed Jun 11, 2012
  5. Check repair preconditions on owner

    rzezeski committed Jun 11, 2012
    Make the various precondition checks for repair on the owning node of
    the partition.  This is better because the repair call could be made
    from any node but ideally you want to check the ring and node status
    from the owning node for the simple fact that nodes don't always agree
    on the ring or the state of other nodes.
  6. Rewrite logging code to be cleaner

    rzezeski committed Jun 11, 2012
    The logging code in handoff sender was unnecessarily complex.  Also,
    it's better not to use a helper function as it obfuscates _where_ the
    logging statement was actually made.  I.e. lager will always print the
    name of the helper fun and not where the call actually originated.
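
The call-site problem is not specific to lager.  As a rough analogy in
Python (not Erlang, and not code from this repository), the standard
`logging` module also records the function that issued the logging call,
so a helper hides the real origin.  The names `log_helper`,
`sender_via_helper`, and `sender_direct` are hypothetical.

```python
import io
import logging

def make_logger(stream):
    """Build a logger that prefixes each message with the calling function's name."""
    logger = logging.getLogger("handoff_sender_demo")
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(funcName)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

def log_helper(logger, msg):
    # The record is created here, so funcName is always "log_helper".
    logger.info(msg)

def sender_via_helper(logger):
    log_helper(logger, "transfer started")

def sender_direct(logger):
    # Logging directly keeps the real call site in the record.
    logger.info("transfer started")

stream = io.StringIO()
logger = make_logger(stream)
sender_via_helper(logger)
sender_direct(logger)
lines = stream.getvalue().splitlines()
```

Here `lines[0]` names the helper while `lines[1]` names the function that
actually wanted to log, which is the behaviour this commit is after.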
  7. Fix call to `trigger_ownership_handoff`

    rzezeski committed Jun 9, 2012
    In a previous commit I changed the `trigger_ownership_handoff` to take
    the transfers instead of the ring but forgot to update both places it
    is called.
  8. Pass in ring, extract chash

    rzezeski committed Jun 9, 2012
    Rather than extracting the chash outside of core, pass the ring and
    extract the chash there.
  9. Use existing `pending_changes` API

    rzezeski committed Jun 9, 2012
    Don't bother with the wrapper and just match directly against list.
  10. Add chash API to `riak_core_ring`

    rzezeski committed Jun 8, 2012
    Repair needs to get at the hash ring to build its key filter.
  11. Make xfer async and idempotent

    rzezeski committed Jun 8, 2012
    The synchronous xfer call from vnode mgr to handoff mgr was very
    brittle.  Firstly, the handoff sender needs the vnode pid, and the pid
    comes from the vnode manager.  If the src and target partitions are on
    the same node, the handoff mgr tries to get the pid from the vnode mgr
    while the vnode mgr is still waiting for the handoff mgr to return,
    and the two deadlock.  Secondly, errors can occur during the call or
    max concurrency could be reached.
    
    Rather than code for these cases it's easier to make the xfer API
    idempotent and have the vnode manager constantly retry until it has
    confirmation of a completed xfer.
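
A minimal sketch of what "idempotent" means here, written in Python
rather than Erlang and not taken from the actual handoff manager.  The
class `HandoffManager`, its `xfer`/`finish` methods, and the status
strings are assumptions made for illustration; the returned status
stands in for whatever confirmation the vnode manager eventually sees.

```python
class HandoffManager:
    """Toy handoff manager whose xfer request is idempotent."""

    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        self.running = set()    # transfers currently in flight
        self.completed = set()  # transfers that have finished

    def xfer(self, key):
        """Request a transfer; safe to call any number of times for the same key."""
        if key in self.completed:
            return "complete"
        if key in self.running:
            return "running"          # duplicate request, nothing new is started
        if len(self.running) >= self.max_concurrency:
            return "max_concurrency"  # caller simply asks again later
        self.running.add(key)
        return "running"

    def finish(self, key):
        """Mark a running transfer as done (stands in for the sender exiting)."""
        self.running.discard(key)
        self.completed.add(key)


mgr = HandoffManager(max_concurrency=1)
first = mgr.xfer(("src", "target"))
second = mgr.xfer(("src", "target"))   # retry of the same request
other = mgr.xfer(("src2", "target"))   # refused, limit already reached
```

Because a repeated request never starts a second transfer, the caller
does not need to track whether an earlier request was lost, refused, or
is still in progress; it just asks again.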
  12. Clarify types a bit

    rzezeski committed Jun 8, 2012
    Remove some confusion around the handoff types.
  13. Don't let `max_concurrency` crash handoff manager

    rzezeski committed Jun 7, 2012
    Check for `{error, max_concurrency}` return from `send_handoff` and
    create a `#handoff_status` with status of `max_concurrency` so that
    it will be retried at the next available opportunity.
    
    This is a bit hackish but if I put this code in `send_handoff` then I
    have to modify how hinted handoff works.
  14. Check for same node repair special case

    rzezeski committed Jun 7, 2012
    If the source and target vnodes are on the same owner then don't
    recursively call vnode manager to get the pid.  Otherwise, a deadlock
    occurs.
  15. Check `Reason` in vnode terminate

    rzezeski committed Jun 7, 2012
    If the worker pool is busy doing something, or `Mod:terminate` does
    blocking work, it could keep the vnode process in the `terminate`
    callback for up to 60s.  This is bad because for that 60s msgs will be
    queued up on the mailbox but then at the end of terminate they will
    just drop.  In the case of normal termination you probably want to let
    the worker finish its work, but a non-normal exit should bring down
    the vnode quickly to allow it to restart.
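
The decision described above can be sketched as follows.  This is a
Python illustration, not the vnode's Erlang `terminate` callback; the
function signature, the `worker_idle` event, and the choice to treat
only `"normal"` as a normal exit are assumptions.

```python
import threading

def terminate(reason, worker_idle, timeout=60.0):
    """Decide how long a terminating vnode waits for its worker pool.

    worker_idle is a threading.Event set once the pool has no work left.
    Returns "drained" if the worker finished, "timed_out" if the wait
    expired, and "skipped" if no wait was attempted.
    """
    if reason != "normal":
        # Non-normal exit: do not block, so the vnode can restart quickly
        # instead of queueing messages that would only be dropped.
        return "skipped"
    return "drained" if worker_idle.wait(timeout) else "timed_out"
```

With this shape, `terminate("crash", event)` returns immediately, while
`terminate("normal", event)` waits up to the timeout for the worker.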
  16. Remove intermediate handoff pid

    rzezeski committed Jun 6, 2012
    Instead of using the handoff pid (`HPid`) "trick" to detect vnode
    death, which complicates the sender code, make the handoff manager
    responsible for monitoring the vnode and killing the sender if vnode
    death is detected.
    
    As a refresher, the reason the `HPid` was introduced in the first
    place is because vnode master `sync_command` will wait infinitely if
    the vnode dies.  See commit `c44b086fd924f72809913c8059f4f02596b41548`
    for more info.
  17. Remove dead code

    rzezeski committed Jun 6, 2012
    The `handoff_finished` msg is no longer used.  Instead the 'DOWN' msg
    causes the xfer to be marked as complete.
  18. Fix dialyzer targets

    rzezeski committed Jun 6, 2012
  19. Remove timeout in handoff mgr

    rzezeski committed Jun 5, 2012
    This timeout was leftover from a previous attempt to write repair and
    is not used anymore.
  20. Move common repair stuff into core

    rzezeski committed May 17, 2012
    Place all repair functions common to Search and KV in the
    `riak_core_repair` module.
  21. Add ability to get all bucket props

    rzezeski committed May 16, 2012
    It's useful to be able to get all the bucket properties in one go.
    This is used by repair to get the `n_val` for all buckets.
  22. Add repair ability

    rzezeski committed May 4, 2012
    Add the framework needed for services on top of core to provide a
    "repair" mechanism.  At this point repair was built with the sole
    purpose of repairing data for KV and Search.
    
    The key insight behind repair is that since Riak is a replicated data
    store one partition can be rebuilt from the replicas on other
    partitions.  Specifically, the adjacent (i.e. before and after)
    partitions on the ring, together, contain all the replicas that are
    stored on the partition to be repaired.  The rub is that adjacent
    partitions also contain replicas that are _not_ meant to be on the
    partition under repair.  This means a filter function must be used
    while folding over the source partitions to transfer only the data
    that belongs.  This is done as efficiently as possible by generating a
    hash range for all the buckets and thus avoiding a preflist
    calculation for each key.  Each key is hashed once, its range is
    looked up in a bucket->range map, and the hash is checked against
    that range.  The services under repair (i.e. Search or KV) must
    provide callback functions to generate these ranges since they are
    specific to the service.
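
The range-based filter can be sketched on a toy ring.  This is a Python
illustration under stated assumptions, not the repair code itself: the
ring size, the partition count, the hashing scheme, and the rule that a
key landing in slice `s` is replicated on partitions `s .. s+n_val-1`
are all simplifications, and every function name is hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 16       # toy hash space; assumption, far smaller than a real ring
NUM_PARTITIONS = 8
SLICE = RING_SIZE // NUM_PARTITIONS

def key_hash(bucket, key):
    """Hash a bucket/key pair onto the toy ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest[:2], "big")  # 0 .. RING_SIZE - 1

def repair_range(partition, n_val):
    """Half-open hash range [lo, hi) whose keys have a replica on `partition`.

    Toy model: a key whose hash lands in slice s is replicated on
    partitions s, s+1, ..., s+n_val-1 (mod NUM_PARTITIONS).
    """
    lo = ((partition - n_val + 1) % NUM_PARTITIONS) * SLICE
    hi = ((partition + 1) % NUM_PARTITIONS) * SLICE
    return lo, hi

def in_range(h, lo, hi):
    """Range membership with wrap-around at the end of the ring."""
    return lo <= h < hi if lo < hi else (h >= lo or h < hi)

def make_filter(partition, bucket_n_vals):
    """Build the fold filter once; each key then costs one hash and one lookup."""
    ranges = {b: repair_range(partition, n) for b, n in bucket_n_vals.items()}
    def keep(bucket, key):
        lo, hi = ranges[bucket]
        return in_range(key_hash(bucket, key), lo, hi)
    return keep

keep = make_filter(partition=1, bucket_n_vals={"users": 3})
```

The ranges are computed once per bucket before the fold starts, so the
per-key work inside the fold is a hash, a dictionary lookup, and a
comparison, with no per-key replica list.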
    
    In order to not repeat code and capitalize on concurrency control the
    repair mechanism is currently an extension of handoff.  This creates
    some awkwardness as repair is _not_ handoff.  Some of the differences
    include:
    
    1. It needs to filter data during the fold
    
    2. All vnodes involved are primary and thus repair _cannot_ block
    
    3. Repair involves 3 distinct partitions and 3 distinct vnodes
    
    4. Repair does not imply a change of responsibility from one vnode to
    another, but is more a sharing of data
    
    In the future the handoff subsystem should be rewritten around the
    notion of "transfers" in which repair, handoff, and ownership are all
    different logical operations but use the transfer mechanism
    underneath.  Especially now that repair is in it should be more clear
    what that system should look like to meet the goals of all 3.
    
    The idea of repair lives in the vnode manager as it is very much a
    vnode semantic.  This is important because other things like handoff
    and ownership also affect vnodes and thus affect repair.  The vnode
    manager controls the logical repair, but the handoff/transfer
    mechanism controls the physical movement of replicas from source to
    target.
    
    For now, it seemed easiest and smartest to _not_ allow ownership
    change and repair to run concurrently.  In the event that an ownership
    change is detected all repairs will be hard killed, regardless of
    status.
    
    In the case where the low-level repair transfer is killed because of
    `handoff_concurrency`, a msg is _not_ sent back to the vnode mgr to
    indicate a failure for that transfer.  Instead, the vnode mgr has a
    periodic tick.  During each tick the vnode mgr checks the status of
    all its repairs and retries any transfers that have since died for a
    reason other than completion.
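
The tick-driven retry can be sketched as follows.  This is a Python
illustration, not the vnode manager's Erlang code; the `Transfer` and
`Repair` records, the status strings, and the `start_xfer` callback are
assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Transfer:
    source: int            # source partition index
    status: str = "dead"   # "running", "complete", or "dead"

@dataclass
class Repair:
    target: int
    transfers: list = field(default_factory=list)

def tick(repairs, start_xfer):
    """One periodic tick: retry dead transfers, drop repairs that have finished.

    start_xfer(target, source) returns True if a sender was started and False
    if it was refused (for example because the concurrency limit was hit).
    """
    remaining = []
    for repair in repairs:
        for xfer in repair.transfers:
            if xfer.status == "dead" and start_xfer(repair.target, xfer.source):
                xfer.status = "running"
        if any(x.status != "complete" for x in repair.transfers):
            remaining.append(repair)
    return remaining
```

A refused transfer simply stays `"dead"` and is picked up again on the
next tick, so no failure message needs to travel back from the transfer
layer.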
Commits on Jun 11, 2012
  1. Update capability bugfix based on review feedback

    jtuple committed Jun 11, 2012
    Reduce scope of try/catch logic to only catch failures related to
    the capabilities ETS table, therefore ensuring we do not silently
    catch failures in other parts of the code.
    
    Fix incorrect return value on failure: Ring => {false, Ring}
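
A sketch of the narrowed scope, in Python rather than Erlang and not
taken from the capability code.  The function `add_capabilities`, the
dictionary standing in for the ETS table, and the `(changed, ring)`
return shape mirroring `{false, Ring}` are assumptions for illustration.

```python
def add_capabilities(ring, table):
    """Return (changed, ring); only the table lookup is guarded."""
    try:
        caps = table["capabilities"]   # stands in for the ETS lookup
    except KeyError:
        # Table not available: report "no change" in the same shape as success.
        return (False, ring)
    # Anything raised below is a real bug and is allowed to propagate.
    new_ring = dict(ring)
    new_ring["capabilities"] = sorted(caps)
    return (True, new_ring)
```

Keeping the guarded region to the single lookup means an unrelated
failure in the rest of the function surfaces as an error instead of
being reported as "no change".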
  2. Fix issues with previous merge and cleanup some -specs

    jtuple committed Jun 11, 2012
    Merge of 'gh177-staged-clustering' and 'master' had a few issues.
    
    Re-add "clear member metadata on leave" logic that was removed in merge.
    Re-add riak_core_ring:cancel_transfers/1 that was removed in merge.
    Perform minor code clean-up.
    Fix some typespecs to make dialyzer happier (more work still needed).
    
    This finalizes integration into 'master' for #181
  3. Merge branch 'gh177-staged-clustering' into jdm-merge-staged-clustering

    jonmeredith committed Jun 11, 2012
    Resolves merge conflicts between staged clustering and claim sim/claimv3 branches.
    
    Conflicts:
    	src/riak_core.erl
    	src/riak_core_gossip.erl
    	src/riak_core_ring.erl