Commits on May 3, 2013
  1. Rewrite and extend the vnode overload protection logic

    jtuple committed Apr 25, 2013
    The primary change is the addition of a fast path method for adjusting
    the estimated mailbox size of a vnode. The vnode proxy now sends a ping
    to the vnode after a configurable number of messages.  If the vnode
    replies, the vnode proxy knows that all messages sent before the ping
    have been handled and therefore it can adjust the mailbox estimate
    accordingly. In the common case, this ping/pong approach keeps the
    mailbox estimate up-to-date without ever falling back to the more
    expensive process_info call.
    Minor changes:
    -- When a message is dropped, the `dropped_vnode_requests_total` counter
       is incremented.
    -- When a vnode exits, the proxy resets its overload state.
    -- Overload parameters are now configurable via application variables.
    The `vnode_overload_threshold` setting determines the overload threshold.
    If a vnode mailbox ever reaches this number, the corresponding vnode proxy
    starts dropping messages. When set to `undefined`, overload protection is
    disabled and the vnode proxy uses the older/more efficient code path.
    The `vnode_check_request_interval` setting sets the frequency of mailbox
    pings. After this number of messages have been forwarded by the proxy,
    the proxy will send a ping message.
    The `vnode_check_interval` setting sets the maximum number of messages
    that can be sent before the proxy will directly query the vnode mailbox
    size using process_info.
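
    The interaction between the three settings can be modelled roughly as
    follows. This is a minimal Python sketch of the logic described above,
    not the riak_core implementation: the class, its method names, and the
    choice to count dropped requests toward the direct-check interval are
    all assumptions made for illustration.

```python
class ProxyModel:
    """Toy model of a vnode proxy's mailbox estimate (hypothetical names)."""

    def __init__(self, threshold, request_interval, check_interval, mailbox_size):
        self.threshold = threshold                # vnode_overload_threshold (None = undefined)
        self.request_interval = request_interval  # vnode_check_request_interval
        self.check_interval = check_interval      # vnode_check_interval
        self.mailbox_size = mailbox_size          # callable standing in for process_info
        self.estimate = 0      # estimated number of messages in the vnode mailbox
        self.since_check = 0   # requests seen since the last direct check
        self.since_ping = 0    # messages forwarded since the last ping
        self.ping_mark = None  # estimate recorded when the outstanding ping was sent
        self.dropped = 0       # stands in for dropped_vnode_requests_total

    def forward(self):
        """Return True if the request is forwarded, False if it is dropped."""
        if self.threshold is None:
            return True  # overload protection disabled
        if self.since_check >= self.check_interval:
            # Slow path: ask for the real mailbox size and start over.
            self.estimate = self.mailbox_size()
            self.since_check = 0
            self.ping_mark = None
        self.since_check += 1
        if self.estimate >= self.threshold:
            self.dropped += 1
            return False
        self.estimate += 1
        self.since_ping += 1
        if self.ping_mark is None and self.since_ping >= self.request_interval:
            # Fast path: remember how many messages precede the ping.
            self.ping_mark = self.estimate
            self.since_ping = 0
        return True

    def pong(self):
        """The vnode replied: everything sent before the ping has been handled."""
        if self.ping_mark is not None:
            self.estimate -= self.ping_mark
            self.ping_mark = None
```

    In this model a pong only subtracts the messages that were queued ahead
    of the ping, so requests forwarded after the ping still count toward the
    estimate.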
Commits on May 2, 2013
  1. Implement vnode overload protection

    Vagabond authored and jtuple committed Apr 12, 2013
Commits on Oct 17, 2012
  1. Merge pull request #232 from basho/core232-sysmon-memory-usage

    Jared Morrow committed Oct 17, 2012
    Address high memory use by riak_core_sysmon_handler
Commits on Oct 1, 2012
  1. Address high memory usage by riak_core_sysmon_handler.

    kellymclaughlin committed Oct 1, 2012
    Fixes CORE232
    Change riak_core_sysmon_handler to use hibernation to free up
    resources when it is idle since it does not do a good job of freeing
    these resources on its own. Also force garbage collection on the
    riak_core_sysmon_handler process if it receives a large_heap message
    about itself. This is to avoid a feedback loop that can lead to severe
    memory usage by the handler, node slowness, and even cause the node to
    crash after consuming all available memory.
Commits on Sep 27, 2012
Commits on Sep 26, 2012
  1. Improve exception handling for synchronous calls in riak_core_vnode_p…

    kellymclaughlin committed Sep 26, 2012
    Fixes CORE231
    The call functions in riak_core_vnode_proxy only expected a result
    of {ok, Res}. This change calls a new function, call_reply, to
    better handle the case when an exception is thrown and wrap
    the reason in an error tuple.
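
    The idea can be sketched in Python as below. Only the name `call_reply`
    comes from the commit message; the tuple shapes and the decision to pass
    other results through unchanged are assumptions for illustration, not the
    actual Erlang code.

```python
def call_reply(thunk):
    """Run a synchronous call and wrap any exception in an error tuple.

    `thunk` is a zero-argument callable standing in for the synchronous
    call to the vnode (a hypothetical stand-in, not a riak_core API).
    """
    try:
        return thunk()
    except Exception as exc:
        # Report the failure as a value instead of letting the caller crash.
        return ("error", f"{type(exc).__name__}: {exc}")
```

    A caller can then match on the first element of the result rather than
    guarding every call site with its own exception handler.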
Commits on Sep 25, 2012
Commits on Sep 21, 2012
  1. Call ring_trans synchronously, not in a spawn

    reiddraper committed Sep 21, 2012
    Calling `add_supported_to_ring` is not thread-safe.
    If a process retrieves the member_meta and it is then
    concurrently updated by another process,
    the original process's changes will be overwritten.
    To exhibit the original bug, I added a
    timer:sleep(crypto:rand_uniform(1, 1000))
    line inside the spawned fun that calls
    riak_core_ring_manager:ring_trans(F, ok)
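
    The lost update can be reproduced deterministically in a small Python
    model. The `RingManagerModel` class and its methods are hypothetical
    stand-ins; only the idea of running the read and the write inside one
    `ring_trans` call is taken from the commit message.

```python
import threading


class RingManagerModel:
    """Holds member metadata and serialises transactions with a lock."""

    def __init__(self):
        self._meta = {}
        self._lock = threading.Lock()

    def get_meta(self):
        return dict(self._meta)  # snapshot; may be stale by the time it is used

    def set_meta(self, meta):
        self._meta = dict(meta)

    def ring_trans(self, fun):
        # Read, modify, and write happen as one step under the lock.
        with self._lock:
            self._meta = fun(dict(self._meta))
            return dict(self._meta)


def racy_updates(manager):
    """Two writers each read a snapshot first, then write: one update is lost."""
    snap_a = manager.get_meta()
    snap_b = manager.get_meta()
    snap_a["a"] = True
    manager.set_meta(snap_a)
    snap_b["b"] = True
    manager.set_meta(snap_b)  # overwrites the write that added "a"
    return manager.get_meta()


def transactional_updates(manager):
    """The same two updates, each performed inside ring_trans."""
    manager.ring_trans(lambda meta: {**meta, "a": True})
    manager.ring_trans(lambda meta: {**meta, "b": True})
    return manager.get_meta()
```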
Commits on Sep 13, 2012
  1. Merge pull request #223 from basho/jdb-timers

    Jared Morrow committed Sep 13, 2012
    Change ticks from timer to more efficient erlang:send_after
  2. Merge pull request #224 from basho/adt-os-timestamp

    Jared Morrow committed Sep 13, 2012
    erlang:now() -> os:timestamp() in all the places it is safe
Commits on Sep 8, 2012
  1. Update webmachine dep to bring in new mochiweb

    Jared Morrow authored and Vagabond committed Sep 8, 2012
  2. erlang:now() -> os:timestamp() in all the places it is safe

    Vagabond committed Sep 8, 2012
    There are a few places I didn't touch as it was unclear if the values
    needed to be monotonic or not. Specifically core_claimant, core_gossip,
    core_ring and core_ring_manager.
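
    The distinction matters because erlang:now() guarantees strictly
    increasing values, while os:timestamp() simply reports the operating
    system clock. A rough Python analogy follows; both function names are
    made up for this sketch.

```python
import threading
import time

_lock = threading.Lock()
_last_us = 0


def strict_now_us():
    """Strictly increasing microsecond timestamps, in the spirit of erlang:now().

    The lock and the bump-by-one step are the price of the guarantee.
    """
    global _last_us
    with _lock:
        t = time.time_ns() // 1000
        _last_us = t if t > _last_us else _last_us + 1
        return _last_us


def os_timestamp_us():
    """Plain wall-clock reading, analogous to os:timestamp().

    Cheaper, but two calls may return equal values, or go backwards if the
    system clock is adjusted.
    """
    return time.time_ns() // 1000
```

    Code that only measures elapsed time or logs a timestamp can use the
    cheaper call; code that needs unique or ordered values cannot, which is
    presumably why the listed modules were left alone.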
Commits on Sep 7, 2012
  1. Roll riak_core version 1.2.1

    Jared Morrow committed Sep 7, 2012
Commits on Aug 29, 2012
  1. Include eunit header.

    kellymclaughlin committed Aug 27, 2012
  2. Add eunit test.

    kellymclaughlin committed Aug 27, 2012
Commits on Aug 16, 2012
  1. Take newer upstream poolboy.

    jonmeredith committed Aug 16, 2012
    @vagabond promised it works.
Commits on Aug 3, 2012
  1. Ensure legacy nodes are probed when new capabilities registered

    jtuple committed Aug 3, 2012
    The capability system caches prior probes of legacy app vars when dealing
    with legacy nodes. Prior to this commit, the logic was simple. If there
    were any cached results, no probes were performed. Unfortunately, this
    could lead to a race condition. If capabilities were probed before all
    applications (e.g. riak_core, riak_kv) had started and registered their
    capabilities, the cache would only include some results, and no probes
    would be performed for the newly registered capabilities. This commit
    makes things more fine-grained, checking for cached results of individual
    capabilities.
    This change does nothing for non-legacy nodes. All nodes that support
    the capability system natively already worked with delayed registration.
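
    The difference between the old and new cache checks can be illustrated
    with a short Python sketch. The function names and the capability names
    in the usage note are invented for the example.

```python
def probes_needed_old(registered, cache):
    """Old behaviour: any cached result suppresses all further probes."""
    return [] if cache else list(registered)


def probes_needed_new(registered, cache):
    """New behaviour: probe every capability that has no cached result."""
    return [name for name in registered if name not in cache]
```

    With a cache holding only a result for a hypothetical `core_cap` and a
    newly registered `kv_cap`, the old check returns nothing to probe, while
    the new check returns `kv_cap`.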
Commits on Jul 25, 2012
  1. Fix spurious "Forcing update of stalled ring"

    jtuple committed Jul 25, 2012
    Changed the force update logic in riak_core_claimant to not perform
    a forced update if we have pending staged joins and no auto-joining
    nodes. Forcing a ring update because of staged joins will not actually
    change the ring, because staged joins will not transition until
    committed. This was a false positive detection of a stalled ring.
  2. restructure supervision tree so that folsom is an included app

    russelldb committed Jul 25, 2012
    All stat mods depend on folsom, yet they are not linked to it.
    This change brings folsom under supervision of a core stat sup,
    which also supervises the riak stat subsystem. Now when folsom exits
    everyone gets to restart clean and recover.
    riak_core_stat_sup (rest_for_one)
           - folsom_sup
           - riak_core_stats_sup (one_for_one)
              - riak_*_stat
              - riak_stat_cache
    riak_core_stats_sup will start and supervise gen_server stat mods at
    registration time, and will re-start them should the sup crash.
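
    The effect of the two restart strategies in the tree above can be
    modelled in a few lines of Python. This only mirrors the standard OTP
    semantics; the function is illustrative and not part of riak_core.

```python
def children_to_restart(strategy, children, crashed):
    """Return the children a supervisor restarts when `crashed` exits.

    `children` is listed in start order.
    """
    index = children.index(crashed)
    if strategy == "one_for_one":
        return [crashed]          # only the crashed child
    if strategy == "rest_for_one":
        return children[index:]   # the crashed child and everything started after it
    raise ValueError(f"unsupported strategy: {strategy}")
```

    Because folsom_sup is started first under the rest_for_one supervisor, a
    folsom exit also restarts riak_core_stats_sup, whereas a crash of a
    single stat module under the one_for_one supervisor restarts only that
    module.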
Commits on Jul 24, 2012
Commits on Jul 22, 2012
  1. Merge pull request #214 from basho/jdb-ring-manager-load

    jtuple committed Jul 22, 2012
    Fix race during start-up that could overwrite existing ring file.
    Fix issue #166.
  2. Fix capability system race condition

    jtuple committed Jul 21, 2012
    Prevent the node from crashing when the capability system attempts to
    negotiate capabilities before the node has registered any capabilities.
  3. Make the ring manager responsible for loading the ring

    jtuple committed Jul 22, 2012
    Change riak_core_ring_manager and riak_core_app so that the ring manager
    is responsible for loading the ring file from the disk rather than starting
    with an initially empty ring and then relying upon the riak_core app to
    later load the ring. This avoids a race condition with the ring manager
    writing the empty ring to the disk before the riak_core app loads the
    prior ring.
    Note: Riak previously relied upon starting with a fresh ring in order to
    ensure secondary vnodes were started in case any had fallback data that
    needed to be handed off. The act of starting secondaries has long since
    been moved to the riak_core_vnode_manager that periodically starts up
    secondary vnodes over time, therefore there is no longer any need to
    start with a fresh ring. This commit will therefore always load a saved
    ring when the ring_manager starts, rather than starting with a fresh ring.
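
    A minimal Python sketch of the start-up ordering follows. The class, the
    JSON file format, and the helper names are assumptions made for the
    example; the point is only that the saved ring is loaded before the
    manager can write anything.

```python
import json
import os


def load_or_create_ring(path, fresh_ring):
    """Return the saved ring if a ring file exists, otherwise a fresh ring."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return fresh_ring()


class RingManagerModel:
    def __init__(self, path, fresh_ring):
        self.path = path
        # Loading happens during initialisation, so a later save() can
        # never overwrite the prior ring with an empty one.
        self.ring = load_or_create_ring(path, fresh_ring)

    def save(self):
        with open(self.path, "w") as fh:
            json.dump(self.ring, fh)
```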
Commits on Jul 20, 2012
  1. Register stat mods with the riak_core app

    russelldb committed Jul 19, 2012
    When the stat cache crashes, we must re-register stat mods with
    the cache so that it works when re-started.
    Delete stats before register
    This is to ensure that a restarted riak_core_stat will not
    leave any orphaned folsom stats. Folsom needs some work to handle
    crashing owners better. Some tables in folsom are owned
    by the creating process, and some by folsom. If riak_core_stat
    crashes, some folsom tables can be left inconsistent. This cleans up
    at start time.
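
    The delete-before-register step can be sketched in Python with a plain
    dictionary standing in for the folsom tables. No folsom API is used
    here; the registry layout and function name are hypothetical.

```python
def register_stats(registry, owner, names):
    """Register `names` for `owner`, first removing anything it left behind.

    `registry` maps (owner, stat_name) to a value.
    """
    stale = [key for key in registry if key[0] == owner]
    for key in stale:
        del registry[key]  # drop orphans from a previous, crashed owner
    for name in names:
        registry[(owner, name)] = 0
    return registry
```

    Running the registration twice leaves the same set of entries, so a
    restarted owner begins from a clean slate.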
Commits on Jul 18, 2012
  1. Fix tag on riak_sysmon to be 1.1.2

    Jared Morrow committed Jul 18, 2012