Commits on Apr 25, 2012
  1. @rzezeski

    Add visibility into transfers/handoff

    rzezeski authored
    Visibility into handoff is really poor.  The typical method used to
    discover handoff information is `riak-admin transfers` but that gives
    hardly any useful information, as shown below.
        'dev3@' waiting to handoff 7 partitions
        'dev2@' waiting to handoff 4 partitions
        'dev1@' waiting to handoff 5 partitions
    This PR adds visibility into transfers/handoff by tracking various stats on
    active transfers and displaying this information in a human friendly
    way, as shown below.
        ./dev/dev1/bin/riak-admin transfers
        'dev3@' waiting to handoff 6 partitions
        'dev2@' waiting to handoff 4 partitions
        'dev1@' waiting to handoff 6 partitions
        Active Transfers:
        transfer type: ownership_handoff
        vnode type: riak_kv_vnode
        partition: 365375409332725729550921208179070754913983135744
        started: 2012-04-24 18:43:44 [5.96 s ago]
        last update: 2012-04-24 18:43:48 [1.91 s ago]
        objects transferred: 8651
                               2135 Objs/s
          dev3@ =======================>   dev1@
                                17.62 MB/s
    This PR also gets rid of the annoying side effect of resetting the
    inactivity timeout when calling `riak-admin transfers`.  This would
    often cause users to wonder why handoffs were never occurring.
    Implementation Details
    One issue with handoff is that it uses vnode folds to do all its
    work.  This has the one nice benefit that it avoids a local copy of
    data (1) but has the bad side effect of using an uninterruptible
    fold.  That is, the vnode fold does the work as fast as it can and
    doesn't stop until it's done (2).
    In order to get status updates about the handoff the accumulator keeps
    some local stats and _approximately_ every 2 seconds sends those stats
    to the handoff manager via the `status_update/2` API.  I say the
    timing is approximate because expiration of the interval is only
    checked during a sender/receiver sync phase (determined by
    `ACK_COUNT`).  If the receiver can't keep up or the sender fold is
    slow then the status updates could take longer.  Essentially, this
    code assumes that `ACK_COUNT` objects can be transferred in less than
    2s.  **N.B.** The duration of the status update interval will not
    invalidate the stats since they are based on start time and time of
    last sync (see `riak_core_handoff_manager:update_stats/2`).
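A minimal sketch of the timing logic described above, written in Python rather than Erlang purely for illustration.  It is not the riak_core code: the class name `HandoffAcc`, the `report` callback (a stand-in for `status_update/2`), the `rates` helper, and the value chosen for `ACK_COUNT` are all assumptions.  It shows why the interval is approximate (it is only checked at a sync point) and why a late update does not skew the rates (they are computed from the start time and the time of the last sync).

```python
import time

ACK_COUNT = 1000        # assumed value: objects sent between sender/receiver syncs
STATUS_INTERVAL = 2.0   # seconds between status updates (approximate)


class HandoffAcc:
    """Hypothetical per-transfer accumulator carried through the fold."""

    def __init__(self, report, clock=time.monotonic):
        self.report = report          # stand-in for status_update/2
        self.clock = clock
        self.start = clock()
        self.last_report = self.start
        self.objs = 0
        self.bytes = 0

    def visit(self, obj: bytes) -> None:
        """Called once per object by the fold, so it only bumps counters."""
        self.objs += 1
        self.bytes += len(obj)
        if self.objs % ACK_COUNT == 0:
            self._sync()

    def _sync(self) -> None:
        # A real sender would wait for the receiver's ack here.  The clock is
        # read only at this point, never per object.
        now = self.clock()
        if now - self.last_report >= STATUS_INTERVAL:
            self.report({"objs": self.objs, "bytes": self.bytes,
                         "start": self.start, "last_sync": now})
            self.last_report = now


def rates(stats):
    """Return (objects/s, bytes/s) from start time and time of last sync."""
    elapsed = stats["last_sync"] - stats["start"]
    if elapsed <= 0:
        return 0.0, 0.0
    return stats["objs"] / elapsed, stats["bytes"] / elapsed
```

If a sync is delayed because the receiver is slow, the next report simply arrives later; the rates still divide by the real elapsed time, so they stay meaningful.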
    The reason the sender only sends a status update every 2s and only
    checks if this interval has expired on sender/receiver sync is because
    the vnode fold is a tight loop.  Sending an update for every object
    would be too chatty, and checking the interval on every object could
    slow the fold down with the overhead of getting the time and doing
    the math.
    There are two types of transfer currently, _ownership handoff_ and
    _hinted handoff_.  Soon there will be another type, _repair_.  In
    order to disambiguate the two types of handoff I have to determine if
    the source vnode is primary or not.  In the case of ownership handoff
    it is a `primary -> secondary` handoff (where the secondary becomes
    primary after handoff completes) and for hinted handoff it's
    `secondary -> primary`.
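The rule above can be sketched as follows, again in Python for illustration only.  The sketch assumes that "primary" means the source node is the partition's current owner in the ring, and it models the ring as a plain dict; the function name `transfer_type` and the label `hinted_handoff` are hypothetical (only `ownership_handoff` appears in the output shown earlier).

```python
def transfer_type(ring_owners, partition, source_node):
    """Classify a transfer by whether the source vnode is primary.

    ring_owners: dict mapping partition index -> owning node (assumed model).
    """
    source_is_primary = ring_owners.get(partition) == source_node
    # primary -> secondary is ownership handoff;
    # secondary -> primary is hinted handoff.
    return "ownership_handoff" if source_is_primary else "hinted_handoff"


# Example: dev3@ owns the partition, so a transfer from dev3@ is ownership
# handoff, while a transfer from a fallback on dev2@ is hinted handoff.
ring = {365375409332725729550921208179070754913983135744: "dev3@"}
```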
    In order to make the stats a little easier to read I added a little
    human friendly formatting.  I decided to put the code to support this
    in Core rather than KV.  I stole and modified the code from
    One aspect of this PR I'm not wild about is the fact that in order to
    get the status a msg must be sent to each handoff manager on each
    node every time `riak-admin transfers` is called (3).  I'd rather
    see a push system where all active status data is collated at a
    particular node, like the claimant node in ownership, and the status
    call simply reads that.  The pull system is probably fine for now but
    could cause trouble on larger clusters, especially if some script
    accidentally calls it in a tight loop.
    I'm wondering if I should have made use of the stats API for the
    collection of data in the handoff manager rather than a dict?
    1: That is, if the handoff sender process itself was running the
    handoff then the vnode data would have to be copied from vnode heap to
    sender heap.
    2: In the future I think an iterator/cursor based approach to handoff
    that is async, interruptible, and rate limited would be good.
    3: Which calls `riak_core_status:all_active_transfers` where the RPC
    is done.