Skip to content


Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: slf-tombstone-…
Commits on Feb 20, 2014
  1. @slfritchie
Commits on Jan 30, 2014
  1. @slfritchie

    Add new NIF, bitcask_nifs_keydir_global_pending_frozen/0, to assist P…

    slfritchie authored
    …ULSE test
    Bitcask's fold semantics are difficult enough to try to predict,
    but when a keydir is actually frozen, the job is even more difficult.
    This NIF is added to reliably inspect if the keydir was frozen during
    a PULSE test case, and if we find a fold or fold_keys problem, we
    let it pass.
    The new NIF is also tested by an existing EUnit test,
  2. @slfritchie
  3. @slfritchie
Commits on Jan 28, 2014
  1. @slfritchie
  2. @slfritchie

    Add new EUnit test and fix PULSE model

    slfritchie authored
    The EUnit test is freeze_close_reopen_test(), which forces an
    actual old-style freeze of the keydir and then checks the sanity
    of folds while frozen.
    The PULSE model change is something I'm not 100% happy with,
    but anytime the PULSE model has false positives, it takes a huge
    amount of time to determine that it's a false alarm.  So, this
    change should eliminate a rare source of false reports ... but I
    hope I haven't introduced something that will also hide a real
    The problem comes from having a read-only proc folding a cask
    and having it frozen, then the read-write proc closes & reopens
    the cask and does a fold.  If the keydir has been frozen the
    entire time, the PULSE model doesn't know about the freezing and
    thus reports an error in fold/fold_keys results.
    This model change will discover if there has ever been a fold
    in progress while the 1st/read-write pid opens the cask.  If yes,
    then fold/fold_keys mismatches are excused.
Commits on Jan 24, 2014
  1. @slfritchie

    Add sibling->regular entry conversion sweeper, to GC keydir siblings

    slfritchie authored
    The amortization mechanism attempts to limit itself to less than
    1 msec of additional latency for any get/put/delete call, but as
    shown below, it doesn't always stay strictly under that limit when
    you're freeing hundreds of megabytes of data.
    Below are histograms showing the NIF function latency as recorded
    on OS X on my MBP for a couple of different mutation rates: 90%
    and 9%.  The workload is:
    * Put 10M keys
    * Create an iterator
    * Modify some percentage of the keys (e.g. 90%, 9%)
    * Close the iterator
    * Fetch all 10M keys, measuring the NIF latency time of each call.
    ---- snip ---- snip ---- snip ---- snip ---- snip ----
    By my back-of-the-envelope calculations (1 word = 8 bytes)
          bitcask_keydir_entry = 3 words + key bytes
          bitcask_keydir_entry_head = 2 words + key bytes
          bitcask_keydir_entry_sib = 5 words
    So each 2-sibling entry is using 2 + (2*5) = 12 words, not counting
    the key bytes.
    After the sibling sweep, we're going from 12 words -> 3 words per key.
    So for 10M keys, that's a savings of 9 words -> 687MBytes.
    (RSS for the 10M keys @ 90% mutation tests peaks at 1.45GB of RAM.  By
     comparison, the 10M key @ 0% mutation test peaks at ~850MByte RSS.
     So, those numbers roughly match, yay.)
    No wonder that there are a very small number of
    bitcask_nifs_keydir_get_int() calls that are taking > 10msec to
    finish: it may be the OS getting involved with side-effects from
    free(3) calls??
    ** 90% mutation
    10M keys, ~90% mutated, 0% deleted
        bitcask_nifs:yoo(10*1000000, 9*1000*1000, 0).
    *** Tracing on for sequence 1...
        bitcask_nifs_keydir_get_int latency with off-cpu (usec)
                 value  ------------- Distribution ------------- count
                     8 |                                         0
                    16 |@@@@@@@@@@@@@@@                          40
                    32 |@@@@@@                                   17
                    64 |@@@                                      8
                   128 |@@                                       5
                   256 |@@                                       5
                   512 |@@@@@@@@@@                               27
                  1024 |@@                                       5
                  2048 |                                         0
        bitcask_nifs_keydir_get_int latency (usec)
                 value  ------------- Distribution ------------- count
                     0 |                                         0
                     1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         7954469
                     2 |@@@@@@@@                                 2051656
                     4 |                                         10440
                     8 |                                         25446
                    16 |                                         209
                    32 |                                         5
                    64 |                                         0
                   128 |                                         9
                   256 |                                         0
                   512 |                                         11698
                  1024 |                                         723
                  2048 |                                         12
                  4096 |                                         0
                  8192 |                                         0
                 16384 |                                         1
                 32768 |                                         1
                 65536 |                                         0
    *** Tracing on for sequence 4...
        bitcask_nifs_keydir_get_int latency with off-cpu (usec)
                 value  ------------- Distribution ------------- count
                     8 |                                         0
                    16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              56
                    32 |@@@@@@                                   13
                    64 |@@@@                                     8
                   128 |@@                                       5
                   256 |                                         0
        bitcask_nifs_keydir_get_int latency (usec)
                 value  ------------- Distribution ------------- count
                     0 |                                         0
                     1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          7754589
                     2 |@@@@@@@@@                                2211364
                     4 |                                         15906
                     8 |                                         25269
                    16 |                                         375
                    32 |                                         8
                    64 |                                         0
    ** 9% mutation
    10M keys, ~9% mutated, 0% deleted
        bitcask_nifs:yoo(10*1000000, 9 * 100*1000, 0).
    *** Tracing on for sequence 1...
        bitcask_nifs_keydir_get_int latency with off-cpu (usec)
                 value  ------------- Distribution ------------- count
                     8 |                                         0
                    16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             64
                    32 |@@@@@                                    12
                    64 |@@@                                      7
                   128 |@@                                       4
                   256 |@                                        3
                   512 |@                                        2
                  1024 |                                         0
        bitcask_nifs_keydir_get_int latency (usec)
                 value  ------------- Distribution ------------- count
                     0 |                                         0
                     1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           7530608
                     2 |@@@@@@@@@@                               2465556
                     4 |                                         12014
                     8 |                                         24721
                    16 |                                         234
                    32 |                                         6
                    64 |                                         0
                   128 |                                         0
                   256 |                                         1
                   512 |                                         1473
                  1024 |                                         3
                  2048 |                                         0
    *** Tracing on for sequence 4...
        bitcask_nifs_keydir_get_int latency with off-cpu (usec)
                 value  ------------- Distribution ------------- count
                     4 |                                         0
                     8 |@                                        3
                    16 |@@@@@@@@@@@@@@@@@@@@@@@@@                55
                    32 |@@@@@@@                                  15
                    64 |@@                                       5
                   128 |@@@                                      6
                   256 |@                                        2
                   512 |                                         1
                  1024 |                                         0
        bitcask_nifs_keydir_get_int latency (usec)
                 value  ------------- Distribution ------------- count
                     0 |                                         0
                     1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           7631794
                     2 |@@@@@@@@@                                2345179
                     4 |                                         13769
                     8 |                                         26056
                    16 |                                         324
                    32 |                                         22
                    64 |                                         2
                   128 |                                         0
Commits on Jan 22, 2014
  1. @slfritchie

    Add an 'epoch' counter to subdivide Bitcask timestamps

    slfritchie authored
    In order for the multi-folder work to operate correctly, it
    needs to be able to keep track of exactly when a fold started
    relative to any mutations that happen during the same 1-second
    time interval.  If a merge process cannot tell exactly if a
    mutation happened before or after its fold started, then merge
    may do the wrong thing: operate on a key zero times, or
    operate on a key multiple times.
    An epoch counter now subdivides Bitcask timestamps.  The epoch
    counter is incremented whenever an iterator is formed.  A new NIF
    was added, keydir_fold_is_starting(), to inform a fold what the
    current epoch is.  The fold starting timestamp + epoch are used for
    all get operations that the folder performs.
    If the keydir contains only a single entry for a key, there's
    no need to store the epoch with that key.  The epoch is stored
    in the struct bitcask_keydir_entry_sib, when there are multiple
    entries per key.
    Things are very tricky (alas) when keeping entries in the
    siblings 'next' linked list in newest-to-oldest timewise order.
    A merge can do something "newer" wall-clock-wise with a mutation
    that is "older" by that same wall-clock view.  The 'tstamp'
    stored in the keydir is the wall-clock-time when the entry
    was written for any reason, including a merge.  However, we
    do NOT want a merge to change a key's expiration time, thus a
    merge may not change a key's tstamp -- the solution is to have
    the keydir also store the key's 'orig_tstamp' to keep a copy
    of they key's specified-by-the-client timestamp for expiration
    To avoid mis-behavior of merging when the OS system clock
    moves backward across a 1-second boundary, there is new checking
    for mutations where Now < keydir->biggest_timestamp.  Operations
    are retried when this condition is detected.
    Try to avoid false-positives in the
    bitcask_pulse:check_no_tombstones() predicate by calling only
    when opened in read-write mode.
    Remove the fold_visits_unfrozen_test_() and replace with a
    corrected fold_snapshotX_test_()
Commits on Jan 7, 2014
  1. @slfritchie
  2. @slfritchie
  3. @slfritchie
Commits on Dec 26, 2013
  1. @slfritchie
  2. @slfritchie
  3. @slfritchie

    Dialyzer fixes

    slfritchie authored
  4. @slfritchie
  5. @slfritchie

    Avoid storing tombstones in the keydir when possible

    slfritchie authored
    When a Bitcask is opened and scan_key_files() is reading data
    from disk and loading the RAM keydir, we now detect if the key
    is a tombstone and, if so, do not store it in the keydir.
    Normally, only hint files are scanned during startup.  However,
    hint files have not stored enough information to confirm that
    a key is/is not a tombstone.  I have added such a flag in a
    backward-compatible way: the offset size has been reduced from
    64 -> 63 bits, and the uppermost bit (which is assumed to be
    0 in all cases -- we assume nobody has actually written a file
    bit enough to require 64 bits to describe the offset) is used
    to signal tombstone status.
    An optional argument was given to the increment_file_id() NIF
    to communicate to the NIF that a data file exists ... a fact
    that would otherwise be lost if a hint/data file contains
    only tombstones.
    For testing purposes, fold() and fold_keys() are extended with
    another argument to expose the presence of keydir tombstones.
    Adjust timeouts and concurrency limits in bitcask_pulse.erl
    to avoid the worst of false-positive errors when using the
    PULSE model: {badrpc,timeout} nonsense.
  6. @slfritchie
Commits on Dec 25, 2013
  1. @slfritchie
Commits on Dec 24, 2013
  1. @slfritchie
  2. @slfritchie
  3. @slfritchie
Commits on Dec 23, 2013
  1. @slfritchie

    Fix merge race when keydir_put specifies old fileid & offset races wi…

    slfritchie authored
    …th delete
    Scenario, with 3 writers, 2 & 3 are racing:
    * Writer 1: Put K, write @ {file 1, offset 63}
    * Writer 2: Delete operation starts ... but there is no write to disk yet
    * Writer 3: Merge scans file 1, sees K @ {1,30} -> not out of date ->
      but there is no write to disk yet
    * Writer 2: writes a tombstone @ {3,48}
    * Writer 2: Keydir conditional delete @ old location @ {1,63} is ok
    * Writer 2: keydir delete returns from NIF-land
    * Writer 3: merge copies data from {1, 63} -> {4, 42}
    * Writer 3: keydir put {4, 42} conditional on {1,63} succeeds due to
      incorrect conditional validation: the record is gone, but bug
      permits put to return 'ok'.
  2. @slfritchie
Commits on Dec 21, 2013
  1. @slfritchie
  2. @slfritchie

    Add UNIX epoch time to put_int NIF call, to avoid C use of time(3).

    slfritchie authored
    I hope this will eliminate a nasty source of nondeterminism during
    PULSE testing.
  3. @slfritchie

    Omnibus bugfixing, including:

    slfritchie authored
    * Add 'already_exists' return to bitcask_nifs_keydir_remove(): we need
    it to signal races with merge, alas.
    * Add state to #filestate to be able to 'undo' last update to both a
    data file and its hint file.  This probably means that we're going
    to have to play some games with merge file naming, TBD, stay tuned.
    * For bitcask:delete(), make the keydir delete conditional: if it fails,
    redo the entire thing again.
    * inner_merge_write() can have a race that, if a partial merge happens
    at the proper time after, we see an old value reappearing.  Fix by
    checking the return value of the keydir put, and if 'already_exists',
    then undo the write.
    * When do_put() has a race and gets 'already_exists' from the keydir,
    undo the write before retrying.  If this key is deleted sometime later,
    and then a partial merge happens after that, we might see this value
    reappear after the merge is done.
    * Add file_truncate() to bitcask_file.erl.  TODO: do the same for the
    NIF style I/O.
    * Better robustness (I hope) to EUnit tests in bitcask_merge_delete.erl
Commits on Dec 18, 2013
  1. @slfritchie
  2. @slfritchie

    Fix bug introduced by almost-bugfix in commit 1a9c99 that was suppose…

    slfritchie authored
    …d to be a partial fix for GH #82
  3. @slfritchie

    Fix iterator freeze bug demonstrated by last commit: remove kh_put_wi…

    slfritchie authored
    …ll_resize() predicate test for creating a keydir->pending
  4. @slfritchie
  5. @slfritchie

    WIP: new test new_20131217_c_test_() is broken, need to experiment wi…

    slfritchie authored
    …th bitcask:delete & NIF usage before continuing
Commits on Dec 17, 2013
  1. @slfritchie

    FML: fix tombstone merging bug when K & K's tombstone are in same fileid

    slfritchie authored
    Originally found with bitcask_pulse, I deconstructed the test case to
    help understand what was happening: the new EUnit test is
    As a result of the puts, key #13 is written 3x to fileid #1 (normal,
    tombstone, normal) and 1x to fileid #2 (normal @ the very beginning
    of the file).  The merge creates fileid #3 and copies only the
    tombstone (the normal entry isn't copied because it is out-of-date).
    Before the close, the internal keydir contains the correct info
    about key #13, but after the close and re-open, we see key #13's
    entries: normal (and most recent) in fileid 32, and tombstone in
    fileid #3, oops.
    The fix is to remove all of the merge input fileids from the set of fileids
    that will survive/exist after the merge is finished.
  2. @slfritchie

    Fix remaining bug when bitcask scan/fold order was changed to oldest-…

    slfritchie authored
    The NIF change fixes a long-standing latent bug: when put'ing a key
    that does not exist, if there's a race with a merge, keydir_put_int()
    would return 'ok' (error) rather than 'already_exists' (correct).  The
    'already_exists' return value is a signal to the read-write owner of
    the bitcask that the current append file must be closed and a new one
    opened (with a larger fileid than any merge).
    The tombstone change adds a new tombstone data format.  Old tombstones
    will be handled correctly.  New tombstones for any key K contain
    the fileid & offset of the key that it is deleting.  If the fileid
    F still exists, then the tombstone will always be merged forward.
    If the fileid F does not exist, then merging forward is not
    necessary. When F was merged, the on-disk representation of key K
    not be merged forward: K does not exist in the keydir (because it
    was deleted by this tombstone), or it was replaced by a newer put.
Commits on Dec 15, 2013
  1. @slfritchie
  2. @slfritchie
Something went wrong with that request. Please try again.