Commits on May 28, 2013
  1. add ring resizing impl. doc

    jrwest committed Apr 15, 2013
  2. cleanup code around ring resizing

    riak_core_claimant:
      * removes unneeded clauses in filter_changes_pred/4
      * add comments in several places to clarify behaviour
    riak_core_ring:
      * add comments and typespecs, fix typos
      * refactor reschedule_resize_transfers/3
      * refactor resize_transfer_status/4
      * simplify future_index/4 a bit
      * bring back determine_handoff_target in vnode mgr
    riak_core_vnode:
      * fix forward typespec in state record
      * refactor small similarity btwn active & forward_or_vnode_command
      * pass forwarding type to vnode_forward for easier tracing
    jrwest committed May 24, 2013
  3. prevent passing of commands to vnode mod after post-resize delete

    unlike when vnodes are marked as deleted during normal ownership transfer,
    the vnodes that are deleted during the cleanup process do not forward. As
    a result, they may receive messages between the time they unregister and the
    time they are shut down.
    jrwest committed May 13, 2013
  4. stop-gap prevention of ring resizing for unsupported applications

    * see comment in riak_core_claimant on why this is a hack
    * search will not, initially, be supported
    * control may break as ring status evolves to provide more observability during
      the operation. failures would be transient but it's easier to require disabling
      it than to require that it not be accessed during the operation. In the future, control
      will be aware of the operation and this can be dropped altogether
    jrwest committed May 3, 2013
  5. remove unneeded change to riak_core_ring:all_next_owners

    its implementation was a bit misleading with the changes and was only
    needed to support one part of the codebase (starting extra vnode proxies
    when ring expands)
    jrwest committed May 1, 2013
  6. improve performance and fix several bugs in future index calculation

    * make future index a constant time calculation (it was O(RingSize * FutureRingSize) before)
    * for no possible reason the notsent acc was a linked list instead of a set
    * during shrinking the n-value for a preflist could be, depending on ownership assignment and
      order of transfers, implicitly grown (e.g. N=3 -> N=5) leaving behind unreachable data. This
      is because primaries involved in shrinking may transfer data out after receiving data in. In
      this case it is necessary to identify keys that are owned by the source partition in the *future*
      ring. This is done by detecting two conditions. The first is when the position of the source in
      the current preflist is greater than or equal to the size of the new ring (an impossible N value
      in the new ring). The second is an optional N-value threshold, which solves a specific case when halving
      the ring where the first condition is not triggered until data has already been copied where it shouldn't be
    
    add eqc test for riak_core_ring:future_index
    jrwest committed May 1, 2013
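
The constant-time future index calculation and the two "invalid position" checks described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the riak_core code: it assumes a 2^160 hash space divided evenly among partitions, and the names `future_index`, `ring_increment`, and `n_val` are hypothetical.

```python
RING_TOP = 2 ** 160  # assumed size of the hash space


def ring_increment(num_partitions):
    """Distance between adjacent partition indexes (assumes an even split)."""
    return RING_TOP // num_partitions


def future_index(key_hash, orig_idx, orig_count, next_count, n_val=None):
    """Return the index in the resized ring that should receive the key from
    the source partition `orig_idx`, or None if no transfer is valid.

    Uses only integer arithmetic, so the cost does not depend on ring size.
    """
    orig_inc = ring_increment(orig_count)
    next_inc = ring_increment(next_count)

    # 1-based ring position of the partition at the head of the key's preflist.
    owner_pos = key_hash // orig_inc + 1
    # Ring position of the source partition; index 0 is the last position.
    orig_pos = orig_count if orig_idx == 0 else orig_idx // orig_inc
    # Position of the source within the key's current preflist (0 = head),
    # wrapping around the ring when needed.
    orig_dist = (orig_pos - owner_pos) % orig_count

    # Condition 1: the position cannot exist in the new (smaller) ring.
    # Condition 2: the position is at or beyond the optional N-value threshold.
    if orig_dist >= next_count or (n_val is not None and orig_dist >= n_val):
        return None

    # Head of the key's preflist in the future ring, then the partition at the
    # same preflist position the source holds today.
    next_owner = (key_hash // next_inc + 1) * next_inc
    return (next_owner + next_inc * orig_dist) % RING_TOP
```

For example, when a ring of 8 partitions shrinks to 4, a source partition sitting at preflist position 7 for a key gets `None` back, because position 7 cannot exist in a 4-partition ring.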
  7. use object_info/1 to hash keys for filtering during resize

    using riak_core_util:chash_key was not flexible enough for other
    applications (e.g. riak_search). This makes object_info/1 a required
    callback for applications implementing dynamic ring.
    jrwest committed Apr 12, 2013
  8. honor forced_ownership_handoff for resize operations

    ensures that at most forced_ownership_handoff vnodes will be performing
    resize transfers across the cluster at once
    
    also, the previous code was just wrong
    jrwest committed Apr 11, 2013
  9. support for when SHTF during resize

    allow ring resize operation to be aborted. The operation is staged like
    any other cluster op. The abort will only be staged/performed if the
    operation is in the process of performing resize transfers (resized ring
    has not been installed). Otherwise, there is nothing to abort, so
    the operation is ignored.
    
    Support force-replace during resize. Works like force-replace during
    typical cluster operations. Transfers are rescheduled on the replacing
    node under the assumption that the replacing node may have been rebuilt
    from a recent backup.
    
    force-remove is not supported because it may end up causing too
    many transfers to be rescheduled on the remaining nodes, overloading
    them. Instead force-remove is supported by performing an abort
    of the resize operation first.
    jrwest committed May 28, 2013
  10. ability to shrink ring

    not many changes were necessary with the exception that when the ring shrinks
    some indexes disappear, which needs to be handled specially. Note: this only works
    assuming NewRingSize < MaxN, where MaxN is the largest N value used on any bucket
    in the cluster. Also, reduce impact on gossip during resize using 'set_only'.
    jrwest committed Feb 15, 2013
  11. cleanup after resized ring

    After a ring resizes there are several cleanup operations that must
    be performed. In some cases a partition no longer exists (if the ring
    shrinks) or its owner has been moved to another node. In these cases the
    vnode must be unregistered and all its data deleted. In other cases, where
    the vnode does continue to exist and has not moved, some data must be removed
    but the vnode continues to run. This commit addresses the former.
    
    After completion, the resized ring will have deletions scheduled for the
    appropriate indexes. The deletions are scheduled like any other ring transition.
    Although this is considered part of the resize operation, at this time the
    future ring has been installed and the vnodes targeted for cleanup appear as fallbacks.
    After each deletion is complete, the vnode unregisters. After all deletions are
    complete the full resize operation is considered complete.
    
    Also move starting of extra vnode proxies to ring handler
    jrwest committed May 28, 2013
  12. add support for resizable ring

    Allows the ring to be resized as a cluster operation similar to join/leave.
    Only expansion is allowed, but the restriction is temporary, and shrinking
    requires only small modifications.
    
    When the operation is staged/committed the claimant calculates a resized ring,
    running it through claim. It then schedules a "resize operation" for each
    partition, via the next list. Each partition is scheduled for the operation because
    during resize, each index potentially has data to send to others. Per partition,
    the resize operation consists of several "resize transfers", each of which must be
    completed to complete the entire operation. The first of these resize transfers is
    scheduled by the claimant. Subsequent transfers are scheduled by the vnode that
    owns the partition. When all of the transfers for a partition have completed,
    the operation is marked as such in the next list. Once the operation has completed
    for *all* partitions the claimant installs the resized ring.
    
    The handoff subsystem has been modified to add a new transfer type: resize_transfer.
    In addition, the handoff sender now supports applying a function to keys that are
    not sent (due to the handoff filter). The function operates on the unsent key and
    a per-sender defined accumulator. Resize transfers are triggered like hinted and
    ownership handoff by the vnode manager and vnode. The primary difference between
    those transfers and a resize_transfer is that the latter includes a filter
    that only sends keys between the source and target partitions if, for both, the
    key hashes to the same position in the preflist. Resize transfers also include an
    unsent keys function and accumulator that determines which partitions the key
    would be destined for. The vnode uses this information to schedule its subsequent
    resize transfers. A partition which has not been transferred to yet and is in the
    returned list will be scheduled. No more resize transfers will be scheduled when
    the returned list contains no partitions or only partitions that have already been
    scheduled.
    
    One major difference between typical transfers and a resize operation is the partition
    does not delete its data (or unregister) after completing. For some partitions, they will
    remain running but a portion of their data should be removed. Other partitions will no
    longer live on their original owner and should have all of their data deleted. This is
    not addressed in this commit.
    
    During resizing, how vnodes forward has been modified, as well. Unlike regular handoff,
    forwarding is performed only explicitly (when the vnode returns {forward,...} from
    handle_handoff_command). The difference is that since the operation consists of multiple
    transfers, and will support the ability to abort the operation, all requests are handled
    locally and may be forwarded explicitly if they have a defined "request hash". To determine
    the hash of the request, a new vnode callback has been added: request_hash. This callback is
    optional, unless the application supports ring resizing. The callback can also return undefined
    instead of a hash, in which case the message will always be handled locally. Special care
    must be taken to handle messages on a vnode forwarded from another, because primaries take part
    in forwarding during resize. Coverage requests are not forwarded during a resize.
    jrwest committed Feb 15, 2013
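
The interplay between the handoff filter, the unsent-keys function, and the scheduling of subsequent resize transfers described in the commit above might look roughly like the sketch below. It is written in Python purely for illustration and is not the riak_core handoff sender. The helpers `resize_transfer_fold` and `next_targets` are hypothetical names, and the key-to-partition mapping in the usage note is a stand-in for the real preflist calculation.

```python
def resize_transfer_fold(keys, should_send, not_sent_fun, acc):
    """Fold over a partition's keys for one resize transfer.

    Keys accepted by the filter `should_send` are "sent". Every other key is
    passed to `not_sent_fun` together with the per-sender accumulator.
    """
    sent = []
    for key in keys:
        if should_send(key):
            sent.append(key)
        else:
            acc = not_sent_fun(key, acc)
    return sent, acc


def next_targets(unsent_targets, already_scheduled):
    """Partitions still needing a resize transfer from this source.

    An empty result means no further transfers are scheduled.
    """
    return sorted(set(unsent_targets) - set(already_scheduled))
```

As a usage sketch, suppose (hypothetically) that key `k` is destined for partition `k % 3` and the current transfer targets partition 0. Folding over keys 1 through 6 with the filter `lambda k: k % 3 == 0` and the unsent function `lambda k, acc: acc | {k % 3}` sends keys 3 and 6 and accumulates the set of partitions 1 and 2. Passing that set to `next_targets` along with the partitions already scheduled yields the transfers the vnode would schedule next. The accumulator is a set rather than a list so that each destination is recorded once.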
Commits on Apr 8, 2013
  1. Merge pull request #296 from basho/1.3_to_master

    1.3 to master
    engelsanchez committed Apr 8, 2013
Commits on Apr 5, 2013
  1. Merge branch '1.3' into 1.3_to_master

    Conflicts:
    	rebar.config
    	src/riak_core_ring_handler.erl
    	src/riak_core_util.erl
    	src/riak_core_vnode.erl
    	src/riak_core_vnode_manager.erl
    engelsanchez committed Apr 5, 2013
  2. Merge pull request #291 from basho/dip_ssl

    SSL support
    Vagabond committed Apr 5, 2013
  3. Implement SSL support for riak_core_connection and riak_core_service_mgr

    This is a port of the SSL implementation from Riak's MDC implementation.
    The app.config arguments are the same, only now they're under riak_core.
    
    SSL is negotiated right after capabilities are exchanged, so minimal
    information is sent 'in the clear'. If one side requests SSL and the
    other side does not have it enabled, SSL is not allowed to connect.
    Dave Parfitt committed with Vagabond Mar 21, 2013
Commits on Apr 3, 2013
  1. Merge pull request #292 from basho/dip_pin_ranch

    use custom Ranch build to support R14B03|4
    Dave Parfitt committed Apr 3, 2013
  2. use custom Ranch build to support R14B03|4

    Dave Parfitt committed Apr 3, 2013
Commits on Apr 1, 2013
  1. Merge pull request #289 from basho/dip_typos

    fixed typos found by @DeadZen
    Dave Parfitt committed Apr 1, 2013
  2. fixed typos found by @DeadZen

    Dave Parfitt committed Apr 1, 2013
Commits on Mar 22, 2013
  1. Roll riak_core version 1.3.1

    Jared Morrow committed Mar 22, 2013
  2. Merge pull request #288 from basho/kv508-stats-warn

    Failure to calculate a stats value should be temporary so warn only
    russelldb committed Mar 22, 2013
Commits on Mar 21, 2013
  1. Make stats more robust in the face of failure

    Use pid() for timer call so that a crashing stat cache does
    not end up with multiple timers for the same stat mods
    
    In some cases underlying ets tables for stats go away. When this
    happens the affected stats break and stay broken.
    
    When a stat is broken the stat calculation throws an error. For
    the sake of robustness this commit wraps stat calculation in a try
    catch, and returns the atom `unavailable` if a stat cannot be
    calculated. Broken stats are expected to be detected and repaired
    when they are updated.
    
    Rather than calculate stats on demand when stale, backfill the cache
    
    Always serve the stats that are in the cache, no matter how old they are.
    Add a timestamp to the stats so consumers know how stale they are.
    Fill the cache continuously in the background.
    russelldb committed Mar 21, 2013
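
The caching behaviour described in the commit above (always serve what is in the cache, attach a timestamp, and report `unavailable` when a calculation fails) can be sketched as follows. This is a simplified Python illustration, not the riak_core stat cache. The class `StatCache` and its methods are hypothetical, and the background fill is represented by calling `refresh` directly rather than from a timer or separate process.

```python
import time


class StatCache:
    """Serve the last computed value of a stat together with its timestamp."""

    def __init__(self, calc_fun, clock=time.time):
        self._calc = calc_fun          # function that computes the stat
        self._clock = clock            # injectable clock, eases testing
        self._value = "unavailable"    # nothing has been computed yet
        self._ts = None

    def refresh(self):
        """Recompute the stat; intended to be called periodically in the
        background rather than on demand by readers."""
        try:
            value = self._calc()
        except Exception:
            # A broken stat must not crash the cache; report it as unavailable.
            value = "unavailable"
        self._value, self._ts = value, self._clock()

    def get(self):
        """Return (timestamp, value) from the cache, however old it is."""
        return self._ts, self._value
```

Readers call only `get`, so a slow or failing calculation never blocks them. The timestamp lets a consumer decide for itself whether the value is too stale to use.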
  2. Merge pull request #281 from basho/eas-parallel-vnode-init-backport

    Porting parallel vnode init fix to 1.3 + revert switch
    engelsanchez committed Mar 21, 2013
Commits on Mar 20, 2013
  1. Merge pull request #284 from basho/dip_conn_mgr

    initial add of the Riak Core Connection Manager
    
    We'll be circling back to fix the Ranch incompatibilities with R14B03|4 soon.
    Dave Parfitt committed Mar 20, 2013
Commits on Mar 19, 2013
Commits on Mar 18, 2013
  1. Fix backwards compatibility of start_vnode

    Allows parallel initialization of vnodes without changing the API in a
    non-backwards-compatible way. To use it, the vnode module has to
    implement start_vnodes/1.
    engelsanchez committed Mar 18, 2013