Commits on Oct 14, 2013
  1. At this point in riak's history disterl is our primary bottleneck.

    evanmcc committed with engelsanchez Oct 4, 2013
    Something about the way it frames messages causes its TCP connection
    to size its send and recv buffers much smaller than is optimal.
    
    This change adds code to automatically set dist port sizes on nodeup,
    as well as code to manually set all disterl connections to specific
    sizes, meant to be used from the console for testing and tuning.
    
    Small refactorings:
    
    * Store the send & recv buffer sizes in #state so that nodeup events
    that arrive after a set_dist_buf_sizes() call will use the specified
    sizes rather than OTP app environment var defaults.
    
    * Create get_riak_env_vars/0 helper function and export it for debugging use.
    After further refactoring, it's only used once internally, so YMMV:
    I'm fine with inlining this function's code and removing the export.
    
    * Add explicit buffer size args to set_port_buffers().
    
    Variable renaming to match inet:setopts() option names
    
    - remove redundant supervisor.
    
    allow feature to be disabled
Commits on Aug 23, 2013
  1. Roll riak_core version 1.4.2

    Jared Morrow committed Aug 23, 2013
  2. Roll webmachine dep to 1.10.4

    Jared Morrow committed Aug 23, 2013
Commits on Aug 20, 2013
  1. Merge pull request #356 from basho/eas-folsom-stat-error-protection

    engelsanchez committed Aug 20, 2013
    Add protection against folsom stat errors
Commits on Aug 19, 2013
  1. Merge pull request #359 from basho/pevm-drop-bad-data

    evanmcc committed Aug 19, 2013
    Corruption filtering changes for core.
  2. Add protection against folsom stat errors

    engelsanchez committed Aug 14, 2013
    Folsom may sometimes return an error tuple if something goes wrong (see
    folsom_ets.erl), but our code was only catching exceptions. So the error
    would end up being used as a valid value and crash the riak_kv_stat
    process later. This fixes that problem and gives us better protection
    from folsom funkiness.
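The protection described above can be modeled as follows. This is a minimal sketch in Python, not the project's Erlang code: `safe_stat` is a hypothetical helper, and an Erlang `{error, Reason}` tuple is represented here as a Python tuple whose first element is the string `"error"`.

```python
def safe_stat(fetch, default=None):
    """Return fetch()'s value, or `default` if it raises or reports an error.

    Hypothetical helper: it guards against both failure styles the commit
    message mentions, a raised exception and an error tuple returned as if
    it were an ordinary value.
    """
    try:
        result = fetch()
    except Exception:
        # The case the old code already handled: the call blew up.
        return default
    # The case the old code missed: an error tuple returned as a value.
    if isinstance(result, tuple) and len(result) >= 1 and result[0] == "error":
        return default
    return result
```

Checking for the error shape at the point of the call keeps a bad value from travelling further and crashing a consumer later.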
Commits on Aug 17, 2013
Commits on Aug 1, 2013
  1. Roll version 1.4.1

    rzezeski committed Aug 1, 2013
Commits on Jul 31, 2013
  1. Merge pull request #351 from basho/gh350-vnodeq-stats

    russelldb committed Jul 31, 2013
    Fix catch pattern to match all errors
Commits on Jul 30, 2013
  1. Merge pull request #352 from basho/jdm-tcp-mon-add-dist-fix

    jonmeredith committed Jul 30, 2013
    Fix TCP mon to correctly spot nodes coming up.
  2. Fix TCP mon to correctly spot nodes coming up.

    jonmeredith committed Jul 30, 2013
    Corrected add_dist_conn argument order on nodeup event.
Commits on Jul 29, 2013
Commits on Jul 9, 2013
  1. Fix two major vnode manager bugs

    jtuple committed Jul 9, 2013
    First, fix a bug that enabled a race condition wherein the vnode
    manager could start the same vnode multiple times. This would result
    in both vnode instances trying to acquire the same backend, which
    would fail and force the Riak node to shut down.
    
    The cause of this bug was a change introduced during the large
    ring optimization work for Riak 1.4. In this work, an unbounded
    `ets:match_delete` that resulted in a table scan was changed to
    a straightforward `ets:delete`. Unfortunately, the `ets:delete`
    could delete data associated with a newer instance of a given vnode
    in cases where a monitor for a prior instance fired after the new
    instance was created.  This bug was fixed by switching to a bounded
    `ets:match_delete` that avoids the table scan while also avoiding
    unintended deletes.
    
    Second, fix a bug introduced during the parallel vnode initialization
    work from Riak 1.3.1 that caused the vnode manager to newly monitor a
    given vnode each time get_vnode_pid was called. This bug could result
    in an unbounded number of monitors being created in certain scenarios,
    causing a node to become slower over time until it was restarted.
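The first bug can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the vnode manager's code: the table is a plain dict mapping a vnode index to the pid of its current instance, and both function names are hypothetical. The real ETS table layout is not given in the commit message.

```python
def delete_by_key(table, index):
    """Model of the buggy delete: drop whatever is stored for this index."""
    table.pop(index, None)


def delete_if_pid_matches(table, index, pid):
    """Model of the bounded match delete: drop the entry only if it still
    belongs to the instance whose monitor fired."""
    if table.get(index) == pid:
        del table[index]


# A new instance has registered before the old instance's monitor fires.
table = {7: "pid_new"}

# The late monitor for "pid_old" must leave the new entry alone.
delete_if_pid_matches(table, 7, "pid_old")
```

With `delete_by_key`, the same late monitor would have removed the entry for the new instance, leaving the manager free to start that vnode a second time.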
Commits on Jul 1, 2013
  1. Merge pull request #345 from basho/jrw-incrvsn-resize-replace

    jrwest committed Jul 1, 2013
    Increment Ring Version when Force-Replacing during Resize
Commits on Jun 28, 2013
  1. Increment Ring Version when Force-Replacing during Resize

    jrwest committed Jun 28, 2013
    Because the claimant runs in a different "mode", the ring version may
    not be incremented otherwise, causing reconciliation during gossip to
    fail. Seen in the wild and recreated periodically during riak_test
Commits on Jun 26, 2013
  1. Roll riak_core version 1.4.0

    Jared Morrow committed Jun 26, 2013
Commits on Jun 24, 2013
  1. Merge pull request #331 from basho/jrw-resize-foh-fix

    jrwest committed Jun 24, 2013
    fix forced_ownership_handoff during resize
Commits on Jun 21, 2013
  1. Merge pull request #336 from basho/gh335-reshed-stats

    russelldb committed Jun 21, 2013
    Fix crashing stat mod never getting rescheduled
Commits on Jun 19, 2013
  1. only silently drop DOWN-normal messages in deleted modstate

    Bryan Fink committed Jun 19, 2013
    This is a restriction of the modification made in PR #334.
    
    Dropping all {'DOWN',_,process,_,normal} messages on the floor instead
    of passing them to vnode handle_info functions causes riak_pipe vnodes
    to miss messages that they use to clean up workers for pipes that
    shut down unexpectedly.
    
    This commit restricts the DOWN-normal message dropping to the case
    that the vnode's modstate is {deleted, _}. PR #334 suggests the original
    modification was made only to quiet the log spam generated by the
    following clause, which also only operates in modstate-deleted.
    
    Before this commit, the riak_test pipe_verify_exceptions would fail
    during its verify_middle_fitting_normal test, because workers would be
    left running after the fitting exited 'normal'. After this commit,
    workers are once again terminated correctly, so the test passes again.
Commits on Jun 17, 2013
  1. Merge pull request #339 from basho/eas-fix-partition-repair-not-sent-fun

    engelsanchez committed Jun 17, 2013
    Fix repair handoff crash, missing not sent fun
Commits on Jun 15, 2013
Commits on Jun 14, 2013
Commits on Jun 13, 2013
  1. Merge pull request #334 from basho/slf-no-log-spam-on-normal-shutdown

    slfritchie committed Jun 13, 2013
    Reporting 'normal' events is spammy, don't do it
Commits on Jun 12, 2013
  1. Fix crashing stat mod never getting rescheduled

    russelldb committed Jun 12, 2013
    1.3.1 updated the cache to fetch stats in the background rather than
    on demand. A new bug was added. If the stat mod crashes during
    production of stats, it is never rescheduled.
    
    Fix by rescheduling when a crash is detected. Exponentially back off
    the schedule after an error so as not to spam the log.
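The rescheduling policy can be sketched like this. It is a Python model under assumptions, not the project's code: the function name, the base delay, the cap, and the reset-on-success rule are all hypothetical, since the commit message only says that the schedule backs off exponentially after an error.

```python
def reschedule_delays(outcomes, base=1.0, cap=60.0):
    """Return the delay before the next run after each outcome.

    `outcomes` is a sequence of booleans, True for a successful run and
    False for a crash. Each consecutive crash doubles the delay up to
    `cap`; a success resets it to `base` (an assumption of this sketch).
    """
    errors = 0
    delays = []
    for ok in outcomes:
        errors = 0 if ok else errors + 1
        # Always reschedule, even after a crash, so the job never stalls.
        delays.append(min(cap, base * 2 ** errors))
    return delays
```

For example, `reschedule_delays([True, False, False, True])` yields `[1.0, 2.0, 4.0, 1.0]`: the job is rescheduled after every run, and only repeated crashes stretch the interval.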
Commits on Jun 7, 2013
Commits on Jun 4, 2013
  1. Merge pull request #332 from basho/pevm-timeout-guard

    evanmcc committed Jun 4, 2013
    update bad value protection for timer value
  2. remove superfluous case

    evanmcc committed Jun 4, 2013
Commits on Jun 3, 2013
  1. fix forced_ownership_handoff during resize

    jrwest committed Jun 3, 2013
    All resize operations remain in the ring's list of pending
    changes until all complete. Prior to this change transfers would
    only be triggered for the first forced_ownership_handoff operations.
    Subsequent operations would only be triggered by vnode *inactivity*.
    
    This commit modifies the use of forced_ownership_handoff during resize
    to ensure that only resize operations that are still pending are in
    the throttled transfer list.
Commits on May 30, 2013
  1. Merge pull request #330 from basho/jrw-infinity-timeout-fix

    jrwest committed May 30, 2013
    don't start coverage timeout timer if timeout is infinite