Branch: jdb-handoff-ip
Commits on May 22, 2012
  1. @jtuple

    Clean-up handoff_ip code provided by micmac in pull request #176

    jtuple authored
    Fix potential badmatch from re-using variable TNHandoffIP.
    Change get_handoff_ip to use a standard try/catch block, and default to
    using the normal hostname if unable to resolve the handoff IP.
  2. @jtuple
Commits on May 18, 2012
  1. @micmac

    Working handoff_ip= fix

    micmac authored
  2. @micmac
Commits on May 17, 2012
  1. @micmac
Commits on Apr 29, 2012
  1. Merge pull request #172 from basho/edocs-readme-add

    Mark Phillips authored
    adding pointer to core edocs to README
Commits on Apr 28, 2012
  1. adding pointer to core edocs to readme

    Mark Phillips authored
Commits on Apr 25, 2012
  1. @rzezeski
  2. @rzezeski

    Add visibility into transfers/handoff

    rzezeski authored
    Visibility into handoff is really poor.  The typical method used to
    discover handoff information is `riak-admin transfers` but that gives
    hardly any useful information, as shown below.
        'dev3@' waiting to handoff 7 partitions
        'dev2@' waiting to handoff 4 partitions
        'dev1@' waiting to handoff 5 partitions
    This PR adds visibility into transfers/handoff by tracking various stats on
    active transfers and displaying this information in a human friendly
    way, as shown below.
        ./dev/dev1/bin/riak-admin transfers
        'dev3@' waiting to handoff 6 partitions
        'dev2@' waiting to handoff 4 partitions
        'dev1@' waiting to handoff 6 partitions
        Active Transfers:
        transfer type: ownership_handoff
        vnode type: riak_kv_vnode
        partition: 365375409332725729550921208179070754913983135744
        started: 2012-04-24 18:43:44 [5.96 s ago]
        last update: 2012-04-24 18:43:48 [1.91 s ago]
        objects transferred: 8651
                               2135 Objs/s
          dev3@ =======================>   dev1@
                                17.62 MB/s
    This PR also gets rid of the annoying side effect of resetting the
    inactivity timeout when calling `riak-admin transfers`.  This would
    often cause users to wonder why handoffs were never occurring.
    Implementation Details
    One issue with handoff is that it uses vnode folds to do all its
    work.  This has the one nice benefit that it avoids a local copy of
    data (1) but has the bad side effect of using an uninterruptible fold.
    That is, the vnode fold does the work as fast as it can and doesn't
    stop until it's done (2).
    In order to get status updates about the handoff the accumulator keeps
    some local stats and _approximately_ every 2 seconds sends those stats
    to the handoff manager via the `status_update/2` API.  I say the
    timing is approximate because expiration of the interval is only
    checked during a sender/receiver sync phase (determined by
    `ACK_COUNT`).  If the receiver can't keep up or the sender fold is
    slow then the status updates could take longer.  Essentially, this
    code assumes that `ACK_COUNT` objects can be transferred in less than
    2s.  **N.B.** The duration of the status update interval will not
    invalidate the stats since they are based on start time and time of
    last sync (see `riak_core_handoff_manager:update_stats/2`).
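The N.B. above can be illustrated with a small Python sketch. It is not the Erlang code in `riak_core_handoff_manager:update_stats/2`; the `TransferStats` record, its field names, and the `rates` helper are hypothetical, chosen only to show why rates derived from the start time and the time of the last sync stay valid when updates arrive at irregular intervals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransferStats:
    """Hypothetical per-transfer record; field names are assumptions."""
    start_time: float      # when the transfer started (seconds)
    last_update: float     # time of the most recent sender/receiver sync
    objs: int = 0          # cumulative objects transferred
    bytes: int = 0         # cumulative bytes transferred


def update_stats(stats, update_time, objs_delta, bytes_delta):
    """Fold one status update into the cumulative totals."""
    return replace(
        stats,
        last_update=update_time,
        objs=stats.objs + objs_delta,
        bytes=stats.bytes + bytes_delta,
    )


def rates(stats):
    """Return (objects/s, bytes/s) over the whole transfer so far.

    Only the totals and the elapsed time since start are used, so the
    spacing between individual updates does not affect the result.
    """
    elapsed = stats.last_update - stats.start_time
    if elapsed <= 0:
        return (0.0, 0.0)
    return (stats.objs / elapsed, stats.bytes / elapsed)
```

For example, an update after 2 s followed by a late update after 7 s still yields the same rate as long as the throughput itself was steady.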
    The reason the sender only sends a status update every 2s and only
    checks if this interval has expired on sender/receiver sync is because
    the vnode fold is a tight loop.  Sending an update for every object
    would be too chatty, and checking the interval on every object could
    slow the fold through the overhead of getting the time and doing math.
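The throttling described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the sender's Erlang code: the `fold_with_status` function, the `send_status` callback, the injectable `clock`, and the value chosen for `ACK_COUNT` are all hypothetical. Only the idea comes from the commit message: the clock is read at sync points (every `ACK_COUNT` objects), and an update is sent only if about 2 seconds have passed.

```python
import time

ACK_COUNT = 1000        # assumed value; the real constant lives in the sender
STATUS_INTERVAL = 2.0   # seconds between status updates (approximate)


def fold_with_status(objects, send_status, clock=time.monotonic,
                     ack_count=ACK_COUNT, interval=STATUS_INTERVAL):
    """Fold over objects, reporting progress only at sync points."""
    last_status = clock()
    total_objs = total_bytes = 0
    pending_objs = pending_bytes = 0   # counted since the last update

    for obj in objects:
        total_objs += 1
        total_bytes += len(obj)
        pending_objs += 1
        pending_bytes += len(obj)

        # The clock is read only once per ack_count objects, keeping
        # the per-object path free of time lookups and arithmetic.
        if total_objs % ack_count == 0:
            now = clock()
            if now - last_status >= interval:
                send_status(now, pending_objs, pending_bytes)
                pending_objs = pending_bytes = 0
                last_status = now

    return total_objs, total_bytes
```

If the fold is slow or the receiver cannot keep up, sync points are further apart and updates arrive later than every 2 seconds, which is the approximation the commit message describes.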
    There are two types of transfer currently, _ownership handoff_ and
    _hinted handoff_.  Soon there will be another type, _repair_.  In
    order to disambiguate the two types of handoff I have to determine if
    the source vnode is primary or not.  In the case of ownership handoff
    it is a `primary -> secondary` handoff (where the secondary becomes
    primary after handoff completes) and for hinted handoff it's
    `secondary -> primary`.
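As a rough Python sketch of this classification: the commit message does not say how the primary check is made, so the ring lookup below (a plain dict from partition to owning node) is an assumption, as are the `transfer_type` function name and the `hinted_handoff` label. Only `ownership_handoff` appears in the sample output above.

```python
def transfer_type(ring_owners, partition, source_node):
    """Classify a transfer by whether the source vnode is primary.

    ring_owners: hypothetical mapping of partition -> owning node.
    A source that owns the partition is primary, so the transfer is
    primary -> secondary (ownership handoff); otherwise the source is
    a fallback and the transfer is secondary -> primary (hinted handoff).
    """
    source_is_primary = ring_owners.get(partition) == source_node
    return "ownership_handoff" if source_is_primary else "hinted_handoff"
```

A future _repair_ type would need a third case that this two-way check cannot express.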
    In order to make the stats a little easier to read I added a little
    human friendly formatting.  I decided to put the code to support this
    in Core rather than KV.  I stole and modified the code from
    One aspect of this PR I'm not wild about is the fact that in order to
    get the status a msg must be sent to the handoff manager on each
    node every time `riak-admin transfers` is called (3).  I'd rather
    see a push system where all active status data is collated at a
    particular node, like the claimant node in ownership, and the status
    call simply reads that.  The pull system is probably fine for now but
    could cause trouble on larger clusters, especially if some script
    accidentally calls it in a tight loop.
    I'm wondering if I should have made use of the stats API for the
    collection of data in the handoff manager rather than a dict?
    1: That is, if the handoff sender process itself was running the
    handoff then the vnode data would have to be copied from vnode heap to
    sender heap.
    2: In the future I think an iterator/cursor based approach to handoff
    that is async, interruptible, and rate limited would be good.
    3: Which calls `riak_core_status:all_active_transfers` where the RPC
    is done.
Commits on Apr 13, 2012
  1. @jtuple

    Merge branch '1.1'

    jtuple authored
  2. @jaredmorrow

    Merge pull request #168 from basho/jdm-raise-handoff-to-two

    jaredmorrow authored
    Raise handoff concurrency to two
  3. @jonmeredith
Commits on Apr 6, 2012
  1. @jaredmorrow
  2. @jaredmorrow
  3. @jonmeredith
  4. @russelldb @jonmeredith

    Correct env var name

    russelldb authored jonmeredith committed
    Fix typo in timeout var name
  5. @russelldb @jonmeredith

    Add timeout to all handoff sender's receives

    russelldb authored jonmeredith committed
    Don't bother sending the final 'sync' message if handoff failed
  6. @jtuple
  7. @jtuple
Commits on Apr 5, 2012
  1. @jtuple
Commits on Apr 4, 2012
  1. @jtuple
  2. @jtuple

    Change vnode manager to periodically start never-before-started vnodes

    jtuple authored
    Ensures vnodes responsible for fallback data are eventually started.
    Resolves issue #154
Commits on Apr 2, 2012
  1. @jtuple
  2. @jtuple
Commits on Mar 22, 2012
  1. @russelldb

    Merge pull request #153 from basho/gh153_rdb_hoff_timeout

    russelldb authored
    Intermittent hang with handoff sender
Commits on Mar 21, 2012
  1. @russelldb

    Correct env var name

    russelldb authored
    Fix typo in timeout var name
Commits on Mar 16, 2012
  1. @russelldb

    Add timeout to all handoff sender's receives

    russelldb authored
    Don't bother sending the final 'sync' message if handoff failed
Commits on Mar 1, 2012
  1. @jaredmorrow

    Merge branch '1.1'

    jaredmorrow authored
Commits on Feb 27, 2012
  1. @jaredmorrow
Commits on Feb 25, 2012
  1. @jtuple

    Merge pull request #145 from basho/gh144-rolling-upgrade-mapred

    jtuple authored
    Fix map_reduce during rolling upgrade to 1.1.
Commits on Feb 24, 2012
  1. @jtuple

    Fix map_reduce during rolling upgrade to 1.1.

    jtuple authored
    Resolve issue #144.
    Change riak_core_vnode_master to still handle return_vnode messages
    given that pre-1.1 nodes may still send those messages to a 1.1 node.
    Add legacy routing logic to riak_core_vnode_master:command_return_vnode
    in order to properly send messages to pre-1.0 nodes that do not have
    vnode proxy processes.
  2. @Vagabond

    Merge branch '1.1'

    Vagabond authored
  3. @Vagabond
Commits on Feb 20, 2012
  1. @jaredmorrow

    Merge branch '1.1'

    jaredmorrow authored
  2. @jaredmorrow