Commits on Sep 4, 2014
  1. @jaredmorrow

    Merge branch '2.0'

    jaredmorrow committed
Commits on Jul 2, 2014
  1. @andrewjstone
  2. @andrewjstone

    Demonitor on success in proxy_spawn/1

    andrewjstone committed
    When we spawn a proxy process to run a function and it returns
    successfully, we need to demonitor it so that we don't receive errant
    'DOWN' messages.
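The demonitor-on-success pattern described in this commit can be sketched as follows. This is a minimal illustration, not the riak_core_util code: the module name `proxy_sketch` and the `{error, Reason}` return shape are assumptions made for the example.

```erlang
-module(proxy_sketch).
-export([proxy_spawn/1]).

%% Run Fun in a monitored helper process and return its result.
%% Hypothetical sketch; names and return shapes are assumptions.
proxy_spawn(Fun) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Success: drop the monitor and flush any 'DOWN' message that
            %% is already queued, so none shows up in the caller's mailbox.
            erlang:demonitor(MRef, [flush]),
            Result;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The helper died before sending a result.
            {error, Reason}
    end.
```

Without the `erlang:demonitor/2` call, the helper's normal exit would still deliver a `'DOWN'` message after the result, which a long-lived caller such as a gen_server would then have to handle or ignore.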
Commits on Jul 1, 2014
  1. @andrewjstone

    Use a proxy process when joining/removing SC nodes

    andrewjstone committed
    Centralize the proxy process implementation in riak_core_util and
    utilize that in riak_core_claimant:bootstrap_members/1.
    
    We use a proxy process for riak_ensemble_manager:{join/remove} to handle
    any errors resulting from the riak_ensemble_manager crashing, and
    to prevent late responses from getting sent to the claimant in the case
    that it already got a timeout for the given operation.
    
    Also update dialyzer.ignore-warnings.
Commits on Jun 20, 2014
  1. @andrewjstone

    Stub out fakemod:handle_overload_info/2 for test

    andrewjstone committed
    In riak_core_vnode_proxy:overload_test_/0 we use fakemod as a vnode
    module because we don't expect any callbacks from the proxy, even in
    overload. The prior commit, however, ensures that the vnode will get
    called back with any messages that aren't handled directly via the
    proxy. Previously those messages were dropped on the floor.
Commits on Jun 19, 2014
  1. @andrewjstone
Commits on Jun 13, 2014
  1. @borshop

    Merge pull request #603 from basho/bugfix/reip-update-claimant

    borshop committed
    Bugfix/reip update claimant
    
    Reviewed-by: reiddraper
Commits on Jun 12, 2014
  1. @engelsanchez

    Add pmap stall unit test

    engelsanchez committed
  2. @jcapricebasho

    Convert members and seen in rename_node from list to orddict, fixing broken orddict:find calls after a reip

    jcapricebasho committed
Commits on Jun 11, 2014
  1. @jtuple

    Fix riak_core_util:pmap/2 infinite stall

    jtuple committed
    This commit changes riak_core_util:pmap/2 to use spawn_link rather
    than spawn to create the asynchronous processes. Without this change,
    pmap/2 can stall forever if any of these processes crashes -- e.g., by
    using a map function that generates an error.
    
    This commit also fixes the function's -spec and optimizes the
    final stage of the pmap by using a list comprehension rather than
    lists:unzip (this is faster and generates less garbage).
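A linked parallel map in the spirit of this commit might look like the sketch below. It is an illustration only: the module name `pmap_sketch` and the per-pid result collection are assumptions, and the actual riak_core_util:pmap/2 may order and gather results differently.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

-spec pmap(fun((A) -> B), [A]) -> [B].
pmap(F, L) ->
    Parent = self(),
    %% spawn_link rather than spawn: if a worker crashes, the exit signal
    %% reaches the caller instead of leaving it blocked in receive forever.
    Pids = [spawn_link(fun() -> Parent ! {pmap, self(), F(X)} end) || X <- L],
    %% A list comprehension gathers results in input order by matching on
    %% each worker's pid.
    [receive {pmap, Pid, Result} -> Result end || Pid <- Pids].
```

With plain `spawn`, a worker that fails inside `F(X)` never sends its message, so the matching `receive` waits indefinitely; linking turns that silent stall into a crash of the caller (unless it traps exits).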
  2. @jonmeredith

    Update rename_node to also set claimant field.

    jonmeredith committed
    If reip is being used to restore a dead cluster and the previous
    claimant node is renamed first then the ring never converges (see below).
    This change modifies rename_node to also check the claimant field
    and update as necessary.  The ring manager / claimant could also
    be hardened to verify the claimant is a valid node in the cluster,
    but I suspect that will be worked with riak_ensemble in the near future.
    
    Example with previous claimant dev1 renamed to dev4
    
    red:dev john$ dev1/bin/riak-admin ring-status
    ================================== Claimant ===================================
    Claimant:  'dev1@127.0.0.1'
    Status:    down
    Ring Ready: unknown
    
    ============================== Ownership Handoff ==============================
    No pending changes.
    
    ============================== Unreachable Nodes ==============================
    The following nodes are unreachable: ['dev1@127.0.0.1','dev2@127.0.0.1',
                                          'dev3@127.0.0.1']
    
    WARNING: The cluster state will not converge until all nodes
    are up. Once the above nodes come back online, convergence
    will continue. If the outages are long-term or permanent, you
    can either mark the nodes as down (riak-admin down NODE) or
    forcibly remove the nodes from the cluster (riak-admin
    force-remove NODE) to allow the remaining nodes to settle.
    red:dev john$ dev1/bin/riak-admin down dev3@127.0.0.1
    Failed: "dev3@127.0.0.1" is not a member of the cluster.
Commits on Jun 5, 2014
  1. @jtuple

    Add logic to automatically enable consensus system

    jtuple committed
    Currently, in addition to enabling consensus in app.config, a user must
    also manually call 'riak_ensemble_manager:enable()' from one and only
    one node in a cluster to activate the consensus sub-system. This is
    necessary to ensure that there is only a single logical root ensemble
    history -- all other nodes adopt the history from the single enabled
    node.
    
    However, this step is not only annoying but also error-prone. Enabling
    consensus on multiple nodes can break the consensus system, requiring
    manual intervention.
    
    This commit addresses this problem by making riak_core automatically
    enable the consensus system in a safe way. This is accomplished by
    having the claimant node enable the consensus system. To avoid the
    issue where the claimant in multiple 1-node clusters enables consensus
    before being joined, this commit requires the cluster to have at least
    three nodes before the claimant will enable the consensus system.
    
    To prevent a race during claimant changes, a claimant must first write
    a special ring metadata value that prevents future claimants from
    activating the consensus system. It is not until after the ring has
    converged cluster wide, and the claimant sees the appropriate metadata
    value, that the claimant activates the consensus system.
    
    Resolves #571
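The activation rules described in this commit can be read as a small decision function. The sketch below is purely illustrative: the module name, the map keys, and the returned atoms are all hypothetical and do not correspond to riak_core's ring metadata or claimant API.

```erlang
-module(consensus_sketch).
-export([next_step/1]).

%% Decide what a node should do next, given a map of hypothetical inputs:
%%   is_claimant, enabled, member_count, marker_written,
%%   ring_converged, marker_seen
next_step(#{is_claimant := false}) -> wait;        % only the claimant acts
next_step(#{enabled := true}) -> done;             % already activated
next_step(#{member_count := N}) when N < 3 -> wait;% avoid 1-node clusters
next_step(#{marker_written := false}) -> write_marker;
next_step(#{ring_converged := true, marker_seen := true}) -> enable;
next_step(_) -> wait.                              % marker not yet converged
```

The two-phase shape (write a marker, then enable only after the marker is visible in a converged ring) is what keeps a newly elected claimant from activating consensus a second time.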
Commits on Jun 4, 2014
  1. @reiddraper

    Add spawn directive to tcp_mon nodeupdown_test

    reiddraper committed
    See added comment for more details. This may _not_ fix the occasional
    failing test, but is an attempt. The related issue is #596.
  2. @borshop

    Merge pull request #578 from basho/ajs/ensemble_remove_node

    borshop committed
    Ensure ensembles reconfigure before nodes exit
    
    Reviewed-by: andrewjstone
  3. @jtuple
Commits on Jun 3, 2014
  1. @borshop

    Merge pull request #592 from basho/bugfix/jrw/590-bucket-fixup-test-cleanup

    borshop committed
    
    change bucket_fixup_test:fixup_test_/0 to wait for ring manager death
    
    Reviewed-by: reiddraper
  2. @borshop

    Merge pull request #600 from basho/bugfix/jrw/593

    borshop committed
    attempt to isolate hashtree tests more by using a reference
    
    Reviewed-by: reiddraper
  3. @reiddraper

    Merge pull request #598 from basho/bugfix/jra/rex

    reiddraper committed
    Make riak_core_util:safe_rpc catch exit correctly
  4. @jrwest
Commits on May 30, 2014
  1. @bowrocker

    Make safe_rpc catch exit correctly.

    bowrocker committed
    - change 'EXIT':{noproc, ...} to exit:{noproc, ...}
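The difference between the two clause forms can be sketched as follows. In a `try ... catch`, the class of an exit is the atom `exit`. The module name, the helper `safe/1`, and the `{badrpc, rpc_process_down}` return value are assumptions for this example, not a copy of riak_core_util:safe_rpc.

```erlang
-module(safe_rpc_sketch).
-export([safe_rpc/4, safe/1]).

%% Hypothetical wrapper around rpc:call/4.
safe_rpc(Node, Module, Function, Args) ->
    safe(fun() -> rpc:call(Node, Module, Function, Args) end).

%% Run Fun, converting a noproc exit into an error tuple.
safe(Fun) ->
    try
        Fun()
    catch
        %% The exception class for exit/1 is `exit`; a clause whose class
        %% is the atom 'EXIT' does not match it.
        exit:{noproc, _Details} ->
            {badrpc, rpc_process_down}
    end.
```

Any other exception class or reason still propagates to the caller, so only the specific "server process is not running" case is softened.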
Commits on May 29, 2014
  1. @borshop

    Merge pull request #543 from basho/feature/sdc/vnode-callbacks

    borshop committed
    Add callback annotations to riak_core_vnode
    
    Reviewed-by: kellymclaughlin
  2. @borshop

    Merge pull request #591 from basho/jrd-sources-display

    borshop committed
    Fix minor display bug with security sources
    
    Reviewed-by: lordnull
  3. @borshop

    Merge pull request #581 from basho/bugfix/sdc/bucket-set-spec-incomplete

    borshop committed
    Correct return type information on riak_core_bucket:set_bucket/2
    
    Reviewed-by: reiddraper
Commits on May 28, 2014
  1. @lordnull

    Merge pull request #586 from basho/bugfix/mw/safer_rpc

    lordnull committed
    Made many rpc:call/4,5 calls safer if rex is down.
Commits on May 27, 2014
  1. @jrwest

    change bucket_fixup_test:fixup_test_/0 to wait for ring manager death

    jrwest committed
    riak_core_ring_manager:stop/0 is async and races w/ the start of load_test_/0. see #590
Commits on May 26, 2014
  1. @macintux

    Also fix typo in comment

    macintux committed
  2. @macintux
Commits on May 23, 2014
  1. @kellymclaughlin
  2. @kellymclaughlin

    Extend hashtree eqc test timeouts

    kellymclaughlin committed
    The sha_test_ and eqc_test_ test suites in the hashtree module have
    had trouble with slowness and timeouts on slower test machines.  Use
    spawn so each test runs in its own process and extend the timeouts to
    120 seconds to avoid these spurious failures.
Commits on May 22, 2014
  1. @reiddraper

    Fix hashtree:sha_test_ from timing out

    reiddraper committed
    EQC takes a long time, and uses gigabytes of memory to generate 1MB
    random binaries, as evidenced by:
    
        eqc_gen:sample(eqc_gen:binary(1024 * 1024)).
    
    We're still able to write a solid test, by ensuring that we always chunk
    the binary into between 1 and 16 chunks. So instead of generating the
    chunk size, we now generate the number of chunks, and derive chunk size
    from that. This change allowed us to remove one level of ?FORALL
    nesting, as well.
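Deriving the chunk size from a chosen number of chunks, as this commit describes, can be sketched in plain Erlang without EQC. The module and function names are hypothetical, and the ceiling-division rule is an assumption about one reasonable way to do the derivation; it yields at most the requested number of chunks.

```erlang
-module(chunk_sketch).
-export([chunk_size/2, chunks/2]).

%% Derive a chunk size so that Bin splits into at most NumChunks pieces.
chunk_size(Bin, NumChunks) when NumChunks >= 1 ->
    Size = byte_size(Bin),
    %% Ceiling division, with a floor of 1 so the size is never zero.
    max(1, (Size + NumChunks - 1) div NumChunks).

%% Split Bin into consecutive chunks of the derived size.
chunks(Bin, NumChunks) ->
    split(Bin, chunk_size(Bin, NumChunks)).

split(<<>>, _Size) -> [];
split(Bin, Size) when byte_size(Bin) =< Size -> [Bin];
split(Bin, Size) ->
    <<Chunk:Size/binary, Rest/binary>> = Bin,
    [Chunk | split(Rest, Size)].
```

In a property test, the generator would then only need to pick a small integer (for example 1 to 16) for the number of chunks, rather than generating a chunk size that depends on the binary's length.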
  2. @lordnull

    Made many rpc:call/4,5 calls safer if rex is down.

    lordnull committed
    Created a couple of utility functions to do the wrapping. Most
    rpc:call/4,5 calls that were not wrapped in a try/catch have been changed
    to use the utility. Those that weren't changed appeared to want a crash due
    to context (such as explicitly matching for only the success value).
Commits on May 19, 2014
  1. @seancribbs
Commits on Apr 22, 2014
  1. @reiddraper

    Update core:security:bucket() spec

    reiddraper committed
    Buckets are binaries, not strings. See basho/riak_kv#920 for more
    detail. This was discovered by the way kv was calling
    riak_core_security:check_permission/2.
Commits on Apr 21, 2014
  1. @andrewjstone

    Add ready_to_exit check to ensure_vnodes_started

    andrewjstone committed
    Ensure all vnode modules that implement the ready_to_exit/0 callback
    return true before transitioning vnodes from leaving to exiting state.
Commits on Apr 18, 2014
  1. @andrewjstone