Commits on May 22, 2012
  1. Clean-up handoff_ip code provided by micmac in pull request #176

    jtuple committed May 22, 2012
    Fix potential badmatch from re-using variable TNHandoffIP.
    
    Change get_handoff_ip to use standard try/catch block, and default to
    using the normal hostname if unable to resolve the handoff IP.
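
The fallback described in this commit can be sketched roughly as follows. This is a Python illustration only, not the Erlang code in riak_core; the function name `handoff_ip_or_hostname`, its parameters, and the injectable `resolve` hook are all hypothetical.

```python
import socket


def handoff_ip_or_hostname(configured_ip, node_host, resolve=socket.gethostbyname):
    """Return the address to use for handoff.

    configured_ip: the handoff address setting (hypothetical name), or None.
    node_host:     the host part of the node name, used as the default.
    resolve:       resolver function; injectable so the fallback can be tested.
    """
    if configured_ip is None:
        return node_host
    try:
        # A single try/except around the lookup, mirroring the
        # "standard try/catch block" the commit message mentions.
        return resolve(configured_ip)
    except OSError:
        # socket.gaierror is a subclass of OSError; fall back to the hostname.
        return node_host
```

The point of the single default is that a lookup failure degrades to the previous behaviour (use the node's hostname) rather than crashing the caller.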
Commits on May 18, 2012
  1. Working handoff_ip=0.0.0.0 fix

    Endre Hirling committed May 18, 2012
Commits on May 17, 2012
  1. Handle changing handoff_ip to other than the ip associated with node name

    Endre Hirling committed May 17, 2012
Commits on Apr 29, 2012
  1. Merge pull request #172 from basho/edocs-readme-add

    Mark Phillips committed Apr 29, 2012
    adding pointer to core edocs to README
Commits on Apr 28, 2012
  1. adding pointer to core edocs to readme

    Mark Phillips committed Apr 28, 2012
Commits on Apr 25, 2012
  1. Add visibility into transfers/handoff

    rzezeski committed Apr 18, 2012
    Purpose
    ----------
    
    Visibility into handoff is really poor.  The typical method used to
    discover handoff information is `riak-admin transfers` but that gives
    hardly any useful information, as shown below.
    
        'dev3@127.0.0.1' waiting to handoff 7 partitions
        'dev2@127.0.0.1' waiting to handoff 4 partitions
        'dev1@127.0.0.1' waiting to handoff 5 partitions
    
    This PR adds visibility into transfers/handoff by tracking various stats on
    active transfers and displaying this information in a human friendly
    way, as shown below.
    
        ./dev/dev1/bin/riak-admin transfers
        'dev3@127.0.0.1' waiting to handoff 6 partitions
        'dev2@127.0.0.1' waiting to handoff 4 partitions
        'dev1@127.0.0.1' waiting to handoff 6 partitions
    
        Active Transfers:
    
        transfer type: ownership_handoff
        vnode type: riak_kv_vnode
        partition: 365375409332725729550921208179070754913983135744
        started: 2012-04-24 18:43:44 [5.96 s ago]
        last update: 2012-04-24 18:43:48 [1.91 s ago]
        objects transferred: 8651
    
                               2135 Objs/s
          dev3@127.0.0.1 =======================>   dev1@127.0.0.1
                                17.62 MB/s
    
    This PR also gets rid of the annoying side effect of resetting the
    inactivity timeout when calling `riak-admin transfers`.  This would
    often cause users to wonder why handoffs were never occurring.
    
    Implementation Details
    ----------
    
    One issue with handoff is that it uses vnode folds to do all its
    work.  This has the one nice benefit that it avoids a local copy of
    data (1) but has the bad side effect of using an uninterruptible fold.  That
    is, the vnode fold does the work as fast as it can and doesn't stop
    until it's done (2).
    
    In order to get status updates about the handoff the accumulator keeps
    some local stats and _approximately_ every 2 seconds sends those stats
    to the handoff manager via the `status_update/2` API.  I say the
    timing is approximate because expiration of the interval is only
    checked during a sender/receiver sync phase (determined by
    `ACK_COUNT`).  If the receiver can't keep up or the sender fold is
    slow then the status updates could take longer.  Essentially, this
    code assumes that `ACK_COUNT` objects can be transferred in less than
    2s.  **N.B.** The duration of the status update interval will not
    invalidate the stats since they are based on start time and time of
    last sync (see `riak_core_handoff_manager:update_stats/2`).
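
A minimal sketch of the accumulator behaviour described above, written in Python for illustration rather than taken from the Erlang implementation. The value of `ACK_COUNT`, the function names `fold_with_status` and `rates`, and the shape of the stats dict are assumptions; only the 2 second interval and the "check the clock at sync points only" rule come from the text.

```python
import time

ACK_COUNT = 1000        # assumed value; the text names ACK_COUNT but gives no number
STATUS_INTERVAL = 2.0   # seconds, as described above


def fold_with_status(objects, send_status, clock=time.monotonic):
    """Fold over (key, value_bytes) pairs, reporting stats at sync points only."""
    start = clock()
    last_update = start
    count = 0
    total_bytes = 0
    for _key, value in objects:
        count += 1
        total_bytes += len(value)
        # The clock is read once per ACK_COUNT objects (the sync point),
        # never once per object, to keep the tight loop cheap.
        if count % ACK_COUNT == 0:
            now = clock()
            if now - last_update >= STATUS_INTERVAL:
                send_status({"objs": count, "bytes": total_bytes,
                             "start": start, "last_sync": now})
                last_update = now
    return count, total_bytes


def rates(stats):
    """Objects/s and bytes/s computed from start time and time of last sync."""
    elapsed = stats["last_sync"] - stats["start"]
    if elapsed <= 0:
        return 0.0, 0.0
    return stats["objs"] / elapsed, stats["bytes"] / elapsed
```

Because the rates divide by the time between start and last sync, a late status update changes how fresh the numbers are but not whether they are correct, which is the point of the N.B. above.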
    
    The reason the sender only sends a status update every 2s and only
    checks if this interval has expired on sender/receiver sync is because
    the vnode fold is a tight loop.  Sending an update for every object
    would be too chatty and checking the interval on every object could
    potentially slow the fold due to the overhead of getting the time and doing math.
    
    There are two types of transfer currently, _ownership handoff_ and
    _hinted handoff_.  Soon there will be another type, _repair_.  In
    order to disambiguate the two types of handoff I have to determine if
    the source vnode is primary or not.  In the case of ownership handoff
    it is a `primary -> secondary` handoff (where the secondary becomes
    primary after handoff completes) and for hinted handoff it's
    `secondary -> primary`.
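
The disambiguation rule can be sketched as below. This is an illustrative Python sketch, not the project's code: it assumes "primary" means the source node is the ring owner of the partition, represents the ring as a plain dict, and assumes the label `hinted_handoff` by analogy with the `ownership_handoff` label shown in the sample output.

```python
def transfer_type(ring_owners, partition, source_node):
    """Classify a transfer by whether the source vnode is primary.

    ring_owners: dict mapping partition -> owning node (hypothetical shape).
    """
    source_is_primary = ring_owners.get(partition) == source_node
    # primary -> secondary is ownership handoff;
    # secondary -> primary is hinted handoff.
    return "ownership_handoff" if source_is_primary else "hinted_handoff"
```

A third type such as repair would need an extra input, since the primary/secondary test alone only separates the two existing cases.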
    
    In order to make the stats a little easier to read I added a little
    human friendly formatting.  I decided to put the code to support this
    in Core rather than KV.  I stole and modified the code from
    @seancribbs.
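
As an illustration of the kind of formatting involved, here is a small Python sketch that turns a raw byte rate into a string like the `17.62 MB/s` seen in the sample output. It is not the code added to Core; the function name `format_rate`, the 1024-based units, and the two-decimal precision are assumptions.

```python
def format_rate(bytes_per_sec):
    """Format a byte rate with a human-friendly unit (1024-based, assumed)."""
    units = ["B/s", "KB/s", "MB/s", "GB/s", "TB/s"]
    value = float(bytes_per_sec)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024.0 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024.0
```

For example, `format_rate(2048)` yields `2.00 KB/s`.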
    
    One aspect of this PR I'm not wild about is the fact that in order to
    get the status a msg must be sent to each handoff manager on each
    node every time `riak-admin transfers` is called (3).  I'd rather
    see a push system where all active status data is collated at a
    particular node, like the claimant node in ownership, and the status
    call simply reads that.  The pull system is probably fine for now but
    could cause trouble on larger clusters, especially if some script
    accidentally calls it in a tight loop.
    
    I'm wondering if I should have made use of the stats API for the
    collection of data in the handoff manager rather than a dict?
    
    Footnotes
    ----------
    
    1: That is, if the handoff sender process itself was running the
    handoff then the vnode data would have to be copied from vnode heap to
    sender heap.
    
    2: In the future I think an iterator/cursor based approach to handoff
    that is async, interruptible, and rate limited would be good.
    
    3: Which calls `riak_core_status:all_active_transfers` where the RPC
    is done.
Commits on Apr 13, 2012
  1. Merge branch '1.1'

    jtuple committed Apr 13, 2012
    Conflicts:
    	rebar.config
  2. Merge pull request #168 from basho/jdm-raise-handoff-to-two

    Jared Morrow committed Apr 13, 2012
    Raise handoff concurrency to two
Commits on Apr 6, 2012
  1. Roll riak_core version 1.1.2

    Jared Morrow committed Apr 6, 2012
  2. Update dep on riak_sysmon to get 1.1.2

    Jared Morrow committed Apr 6, 2012
  3. Correct env var name

    russelldb authored and jonmeredith committed Mar 21, 2012
    Fix typo in timeout var name
  4. Add timeout to all handoff sender's receives

    russelldb authored and jonmeredith committed Mar 16, 2012
    Don't bother sending the final 'sync' message if handoff failed
Commits on Apr 5, 2012
Commits on Apr 4, 2012
  1. Change vnode manager to periodically start never-before-started vnodes

    jtuple committed Apr 4, 2012
    Ensures vnodes responsible for fallback data are eventually started.
    Resolves issue #154
Commits on Apr 2, 2012
Commits on Mar 22, 2012
  1. Merge pull request #153 from basho/gh153_rdb_hoff_timeout

    russelldb committed Mar 22, 2012
    Intermittent hang with handoff sender
Commits on Mar 21, 2012
  1. Correct env var name

    russelldb committed Mar 21, 2012
    Fix typo in timeout var name
Commits on Mar 16, 2012
  1. Add timeout to all handoff sender's receives

    russelldb committed Mar 16, 2012
    Don't bother sending the final 'sync' message if handoff failed
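
The idea behind this commit (bounded waits on every receive, and no final sync after a failure) can be sketched as follows. This is a Python illustration using a queue as a stand-in for the sender's mailbox; the function name `send_all`, its parameters, and the `"sync"` marker value are hypothetical, and the real sender is Erlang code not shown on this page.

```python
import queue


def send_all(batches, send, inbox, timeout_s=1.0):
    """Send each batch and wait for an ack; return True only on full success."""
    for batch in batches:
        send(batch)
        try:
            # Bounded wait: a silent receiver can no longer hang the sender.
            inbox.get(timeout=timeout_s)
        except queue.Empty:
            # Handoff failed, so skip the final sync message entirely.
            return False
    send("sync")
    try:
        inbox.get(timeout=timeout_s)
    except queue.Empty:
        return False
    return True
```

Returning early on the first timeout keeps the failure path simple: the caller sees `False` and can retry the handoff later.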
Commits on Mar 1, 2012
  1. Merge branch '1.1'

    Jared Morrow committed Mar 1, 2012
Commits on Feb 27, 2012
  1. Roll version riak_core 1.1.1

    Jared Morrow committed Feb 27, 2012
Commits on Feb 25, 2012
  1. Merge pull request #145 from basho/gh144-rolling-upgrade-mapred

    jtuple committed Feb 25, 2012
    Fix map_reduce during rolling upgrade to 1.1.
Commits on Feb 24, 2012
  1. Fix map_reduce during rolling upgrade to 1.1.

    jtuple committed Feb 24, 2012
    Resolve issue #144.
    
    Change riak_core_vnode_master to still handle return_vnode messages
    given that pre-1.1 nodes may still send those messages to a 1.1 node.
    
    Add legacy routing logic to riak_core_vnode_master:command_return_vnode
    in order to properly send messages to pre-1.0 nodes that do not have
    vnode proxy processes.
  2. Merge branch '1.1'

    Vagabond committed Feb 24, 2012
Commits on Feb 20, 2012
  1. Merge branch '1.1'

    Jared Morrow committed Feb 20, 2012
  2. Fix dependency versions for release

    Jared Morrow committed Feb 20, 2012