Skip to content


Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: bugfix/pevm-fi…
Commits on Nov 7, 2013
  1. @evanmcc

    - add stats return for non-existent fstats entries

    evanmcc authored
    - log when non-existent fstats entries are trimmed
    - dedup the fstats list so the above logging isn't constantly
Commits on Oct 30, 2013
  1. @evanmcc
Commits on Oct 28, 2013
  1. @evanmcc

    add keydir_trim_fstats to bitcask_nifs, allowing the post merge live

    evanmcc authored
    files update to remove entries for deleted files so they are no longer
    considered for merge or for status
Commits on Oct 15, 2013
  1. @joedevivo

    Merge pull request #112 from basho/jd-cuttlefish

    joedevivo authored
    moved in bitcask schema bits from riak.schema
  2. reworded comment for require_crc_hint in schema

    Joe DeVivo authored
Commits on Oct 9, 2013
  1. moved in bitcask schema bits from riak.schema

    Joe DeVivo authored
Commits on Sep 23, 2013
  1. @evanmcc

    Merge pull request #110 from basho/pevm-remove-stdbool

    evanmcc authored
    remove unused header
  2. @evanmcc

    remove unused header

    evanmcc authored
Commits on Sep 21, 2013
  1. @evanmcc

    Merge pull request #104 from basho/pevm-hybrid-multifold

    evanmcc authored
    Improve bitcask iteration concurrency.
  2. @evanmcc

    Merge branch 'develop' into pevm-hybrid-multifold

    evanmcc authored
  3. @evanmcc

    Merge pull request #106 from basho/eas-multifold-refactor

    evanmcc authored
    Refactor of Evan's multifold
Commits on Sep 20, 2013
  1. @engelsanchez

    Fix fold start time

    engelsanchez authored
    The fold code was creating a timestamp *before* the keydir was frozen
    and using that to look at a snapshot of the data. This was hidden by the
    fact that the snapshot code would give in and give you a closest match
    if no entry found for the given timestamp, even if this violated the
    snapshot logic. When I cleaned up that logic, this surfaced.
    Also, fixing get to return not found in the case an entry for the given
    snapshot time was not found.
  2. @evanmcc

    Merge pull request #103 from basho/pevm-keydir-overhead

    evanmcc authored
    add optional key transformer to support new key formats
  3. @evanmcc

    remove pdict usage

    evanmcc authored
Commits on Sep 19, 2013
  1. @engelsanchez
  2. @engelsanchez

    Fix uninitialized variable

    engelsanchez authored
    Found by Andrew running Valgrind.
  3. @engelsanchez

    Fix warnings in debug printing code

    engelsanchez authored
    Also, this will only compile now if BITCASK_DEBUG is set.
  4. @engelsanchez

    Refactor of Evan's multifold

    engelsanchez authored
    This is an attempt to clean up the multifold logic to make it more
    - New find_result struct holds the results of a find operation. This
      simplifies the logic of most operations that use a find and later have
      to figure out exactly where things were found, whether they were a
      tombstone, etc.
    - Removes the need to malloc temporary entries while manipulating the
      keydir. For this, a new proxy struct was created to hold the values
      of a regular or list entry when it's found.
    - Removes the need to malloc temporary entries when looking up stuff
      from the keydir. A new khash operation was added to allow custom hash
      and equal functions to be used, so you can look up from something
      other than a variable length entry.
    - Adds locking to iteration now that it can happen while puts are
      updating the entries hash. Found during review of original PR
Commits on Sep 17, 2013
  1. @evanmcc

    - add safety parens to the MAX macros

    evanmcc authored
    - add frees to close memory leaks revealed by valgrind
      one of which fixes a small leak in master
    - change key count and bytes count to 64bit (previously
      size_t, then uint32_t)
  2. @evanmcc
Commits on Sep 12, 2013
  1. @vinoski
Commits on Sep 10, 2013
  1. @evanmcc

    Improve bitcask iteration concurrency.

    evanmcc authored
    The problem:
    Bitcask was designed to have a single iteration in progress at any one
    time (multiple concurrent iterations are allowed, but must explicitly
    ask to use the old snapshot). Effectively this means that to have a
    trustworthy snapshot, subsequent iterations must wait upon the first
    to complete, whereupon they're awoken and can attemt to iterate.
    The fix:
    The code change here makes a second form of bitcask keydir entry, the
    entry list.  Entry lists are a deduplicated version of entries.  The
    head contains the key, the key length, and a pointer to a list of
    siblings, which contain the keydir values that can vary from instance
    to instance of a key.
    When a put or delete happens while a fold is under way, the entry is
    converted to an entry list with the original entry and the update.
    Gets (including fold gets) now discriminate between versions via
    An arguable implementation decision here was to implement entry lists
    as tagged pointers (LSB set to 1).  This does make reasoning about
    types more complicated, but in a space-constrained environment, the
    extra complexity is worth the space overhead incurred by a union type
    with an internal tag.
    _A complication_
    Keydir entries are stored in an in-memory hashtable using the khash
    library.  Unfortunately, khash isn't iteration-safe across resizes.
    Thus, when we detect that a new put would cause a resize, we fall back
    on the old path, where a pending keydir is instantiated and all new
    updates are noted there until all running iterations complete and the
    keydirs are folded back in.  While this still means that folds can
    block for a potentially long time, since khash doubles the size of its
    storage at each resize, that populated keydirs are much less likely to
    run into blocking behavior.
    Bitcask has a reasonable test suite.  It should be run repeatedly in
    both IO modes.  Note that #93 is
    still open and fails occasionally.  That code is not changed here, so
    should not be counted.
    A long PULSE testing run should be done.  I've accumulated ~9 hours or
    so have been done so far, but ideally more should be done.
    Some riak tests that involve iteration and listing would be good to
    attempt.  Real-world fold-speed tests like handoff and AAE rebuilds
    would also be interesting to see.
    The problematic sitation that we're attempting to avoid here is
    something like this:
     - on a large keydir, a slow iteration, like a merge, is happening.
     - at the same time, AAE is waiting with a thrown-away tree
     - and someone is trying to do a MR job that requires a keyfold
    You should be able to run riak without this changeand see increased
    latencies on fold-dependent operations during merges and AAE tree
    rebuilds.  With this change there should be fewer and shorter latency
    spikes while these things are going on.
    Alternate approaches:
    _Switch to a hash structure with snapshots_
    This likely is a good idea, rather than the current hybrid approach
    but the existing options are honestly somewhat uninspiring.  Because
    of time limitations, it was decided that there simply wasn't enough
    time to do this.
    _layers of pending keydirs_
    While this is intuitively appealing, there are two major problems with
    this approach.  The first is the change of get behavior from O(1) to
    O(N), which could be problematic if N is large.  The second is more
    serious, which is that it's impossible to establish a consistent
    multi-level iteration snapshot without a lot of memory or time
    Eg. with levels 1, 2, and 3, and values A, B, and C
    1 A:3  B:4
    2      B:del C:7
    3 A:1  B:5   C:nf
    You need to either make a new keydir and merge all the pending keydirs
    into that temporary keydir for the snapshot, or keep track of seen
    keys, which is essentially the same approach.  Some thinking about the
    issue suggests that instead of N keydirs, you'd really need N**2
    Open work:
    _rw locks_
    This approach requires taking a lock per step of the iterator.  It's
    possible that this could slow down folds considerably due to lock
    contention.  It may be better to move from the `enif_mutex` to a
    `posix_rw_lock`.  This isn't a trivial change, I don't think, so
    careful measurement should be done to see if it's a problem before we
    do the work.
    This change has broader appeal, as it could potentially make concurrent
    readers (once we've added multiple readers per vnode) faster.
    _gc thread_
    At the moment, we only clean up entry-lists when the value is updated,
    so if you had a large run of writes during a fold that were then never
    updated, the overhead there might never be reclaimed.  It may be worth it
    to add a thread that slowly iterates over the keydir when there are no
    active folders and cleans up existing entry lists.
    Files changed, w/ brief summary of changes:
    - `c_src/bitcask_nifs.c`
      the bulk of the changes described above are here
    - `src/bitcask.erl`
      remove keydir git before put
      instrument folds with time for snapshots
      update tests, add new tests
    - `src/bitcask_nifs.erl`
      add keydir_get/3, timestamp specific get
      make itr timestamps consitent with the other bitcask timestamps
      update tests to work with new behavior
    - `c_src/kash.h`
      add will_resize check
    - `src/bitcask_fileops.erl`
      merges expect hintfiles to always be created, but multiple
      concurrent folds seem to violate this invariant. sleep for a short
      time and then abort the merge if the hintfile doesn't get created.
    - `test/bitcask_qc.erl`
      changes to how deletion is accounted for, as the keydir put
      before delete has been removed to reduce the number of entry
      list members
Commits on Aug 27, 2013
  1. @evanmcc
Commits on Aug 24, 2013
  1. @evanmcc
Commits on Aug 23, 2013
  1. @jtuple

    Tag meck in rebar.config

    jtuple authored
Commits on Aug 21, 2013
  1. removed duplicate line

    Joe DeVivo authored
Commits on Jul 11, 2013
  1. @lemenkov

    Remove warn/2 function

    lemenkov authored
    It's better just to call error_logger:warning_msg/2 here.
    See also commit f9a1069
    Signed-off-by: Peter Lemenkov <>
Commits on Jun 29, 2013
  1. @evanmcc
Commits on Jun 25, 2013
  1. @jaredmorrow

    Merge branch '1.6'

    jaredmorrow authored
  2. @jaredmorrow

    Roll bitcask version 1.6.3

    jaredmorrow authored
Commits on Jun 7, 2013
  1. @evanmcc
Commits on May 31, 2013
  1. @slfritchie

    Merge pull request #92 from basho/slf-fix-merge-worker-logging

    slfritchie authored
    Fix merge logging bug introduced by bs-merge-expiration-change branch
Commits on May 30, 2013
  1. @slfritchie
Commits on May 24, 2013
  1. @evanmcc

    Merge pull request #90 from basho/pevm-dialyzer-fixes

    evanmcc authored
    Dialyzer fixes
Commits on May 23, 2013
  1. @evanmcc
Something went wrong with that request. Please try again.