Commits on Dec 3, 2012
  1. @jtuple

    Retry merge

    jtuple committed Dec 3, 2012
  2. @jtuple

    Add more debug

    jtuple committed Dec 3, 2012
  3. @jtuple

    Add more debug

    jtuple committed Dec 3, 2012
  4. @jtuple

    Add try/catch

    jtuple committed Dec 3, 2012
  5. @jtuple
  6. @jtuple

    Merge branch 'jdb-avoid-file-nifs'

    jtuple committed Dec 3, 2012
    Conflicts:
    	test/bitcask_timeshift.erl
  7. @jtuple
  8. @jtuple
Commits on Nov 29, 2012
  1. @jtuple
  2. @jtuple

    Increase timeout on slower tests

    jtuple committed Nov 28, 2012
  3. @jtuple
Commits on Nov 28, 2012
  1. @jtuple
  2. @jtuple

    Use Erlang file I/O from dedicated procs rather than NIFs

    jtuple committed Nov 27, 2012
    Bitcask previously used raw file I/O to read/write files. However, since
    raw file I/O uses a non-optimized selective receive to wait for a reply
    back from the efile driver, this approach had numerous problems when
    Bitcask was used within processes with many incoming messages (such as how
    Bitcask is used in Riak).
    
    In commit 79d5eb3, NIFs were introduced
    to solve this problem. The file I/O NIFs would block the Erlang scheduler,
    but solve the issue encountered with selective receive. Unfortunately,
    using blocking NIFs is much worse than originally thought. Thus, NIFs are
    not the right solution to this problem.
    
    This commit changes Bitcask to once again use Erlang's built-in file I/O,
    but now wraps each open file in a separate gen_server that interacts with
    the raw port. The original process now waits on a gen_server reply which
    uses an optimized selective receive, while the file process handles the
    unoptimized selective receive from the port driver. In our usage, the file
    process only has a single request outstanding, and therefore does not run
    into the selective receive issue.
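To illustrate the approach described in the commit message above, here is a minimal sketch of a per-file gen_server that owns a raw file descriptor. It is not the code from this commit: the module name `file_proc_sketch` and its `open/2`, `pread/3`, `write/2`, and `close/1` functions are hypothetical names chosen for illustration. Only `gen_server` and the standard `file` module calls are real OTP APIs.

```erlang
-module(file_proc_sketch).
-behaviour(gen_server).

%% Hypothetical client API; these names are illustrative, not Bitcask's.
-export([open/2, pread/3, write/2, close/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-record(state, {fd}).

%% Start one server process per open file.
open(Filename, Modes) ->
    gen_server:start_link(?MODULE, {Filename, Modes}, []).

%% The caller only ever waits on a gen_server reply, never on the
%% file driver directly.
pread(Pid, Offset, Size) ->
    gen_server:call(Pid, {pread, Offset, Size}, infinity).

write(Pid, Bytes) ->
    gen_server:call(Pid, {write, Bytes}, infinity).

close(Pid) ->
    gen_server:call(Pid, close, infinity).

init({Filename, Modes}) ->
    %% The raw fd is opened by, and therefore owned by, this process,
    %% so replies to raw file operations are delivered here.
    case file:open(Filename, [raw, binary | Modes]) of
        {ok, Fd}        -> {ok, #state{fd = Fd}};
        {error, Reason} -> {stop, Reason}
    end.

%% Calls are handled one at a time, so this process has at most one
%% file request outstanding.
handle_call({pread, Offset, Size}, _From, S = #state{fd = Fd}) ->
    {reply, file:pread(Fd, Offset, Size), S};
handle_call({write, Bytes}, _From, S = #state{fd = Fd}) ->
    {reply, file:write(Fd, Bytes), S};
handle_call(close, _From, S = #state{fd = Fd}) ->
    {stop, normal, file:close(Fd), S#state{fd = undefined}}.

handle_cast(_Msg, S) -> {noreply, S}.
handle_info(_Info, S) -> {noreply, S}.

%% Close the fd on shutdown unless close/1 already did.
terminate(_Reason, #state{fd = undefined}) -> ok;
terminate(_Reason, #state{fd = Fd}) -> file:close(Fd), ok.

code_change(_OldVsn, S, _Extra) -> {ok, S}.
```

A caller with a crowded mailbox would use it roughly as `{ok, Pid} = file_proc_sketch:open(Path, [read, write])` followed by `file_proc_sketch:pread(Pid, 0, 5)`. Because `open/2` links the file process to the caller, a failed open may also deliver an exit signal to the caller, so a real implementation would likely need a deliberate policy for that case.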
Commits on Nov 27, 2012
  1. @slfritchie
  2. @slfritchie
Commits on Nov 16, 2012
  1. @slfritchie

    Merge pull request #70 from basho/slf-dialyzer20121116

    slfritchie committed Nov 16, 2012
    Clear all Dialyzer warnings
  2. @slfritchie

    Remove type inference cruft

    slfritchie committed Nov 16, 2012
  3. @slfritchie

    Minimal changes to get zero Dialyzer warnings

    slfritchie committed Nov 16, 2012
    Nice to see that Dialyzer caught a bug from parallel development,
    adding the is_empty_estimate/1 function, and today's merge of
    the new QuickCheck model.
  4. @slfritchie
  5. @slfritchie
  6. @slfritchie

    Add usage example to comments at top of Run-eunit-loop.expect

    slfritchie committed Nov 16, 2012
    For the record, the versions of QuickCheck & PULSE that I
    was using for this testing:
    
    * QuickCheck 1.27.2
    * PULSE 1.27.2
    * git://github.com/Quviq/pulse_otp.git
      commit dff6ea12af94c0320d4a5beabc16a1fa50abf688
      Author: Hans Svensson <hanssv@gmail.com>
      Date:   Mon Aug 27 15:42:43 2012 +0200
Commits on Nov 15, 2012
  1. @slfritchie
  2. @slfritchie
  3. @slfritchie
  4. @slfritchie
  5. @slfritchie
Commits on Nov 14, 2012
  1. @slfritchie

    Merge pull request #66 from basho/slf-crc-error-spam

    slfritchie committed Nov 13, 2012
    Fix log spam introduced by branch 'gh62-badrecord-mstate'
  2. @slfritchie
Commits on Nov 13, 2012
  1. @slfritchie

    Add Run-eunit-loop.expect

    slfritchie committed Nov 13, 2012
    The Run-eunit-loop.expect script is a work-around for a number of
    problems that I ran into when using PULSE and the bitcask_eqc.erl
    test model.  It's a mess and could really use a refactoring, but
    it does what I needed it to....
    
    Before identifying the cause of SIGSEGV and "Bad tag" aborts,
    I wanted to automatically restart testing if the test failed for
    either reason.  Ditto for deadlock and timeout problems within
    PULSE itself.  The timeout problems have been quite mysterious,
    but as far as I can tell it isn't a problem with the bitcask
    code.  I would run rebar with network distribution enabled, then
    attach to it via "erl -remsh rebar@localhost" and use Redbug to
    watch for calls to bitcask, bitcask_nifs, bitcask_lockops, and
    bitcask_fileops when one of the timeouts happened ... and there
    were zero calls to those functions.
    
    The most common problem I see, as of this commit, is an "Invariant
    broken" error, e.g.,
    
         Invariant broken, <0.28726.1> did send at {"c_src/bitcask_nifs.c",1718} when 'handle_info.WorkerPid' is supposed to be the only running process!!
    
    If I change pulse_send() so that we send messages directly to
    their recipient (instead of the PULSE controller), then I see
    PULSE timeout errors instead of invariant errors.  {shrug}
    
    Another PULSE problem that I'd seen in the past was errors
    like this:
    
        =ERROR REPORT==== 10-Nov-2012::17:43:40 ===
        Unknown command for pulse: loaded
        =ERROR REPORT==== 11-Nov-2012::21:31:43 ===
        Unknown command for pulse: [code_server|{module,pulse_gen_server}]
        =ERROR REPORT==== 4-Nov-2012::21:47:14 ===
        Unknown command for pulse: {11,<<121,48,236,98,2,156,137,137,27,144,44>>}
        Unknown command for pulse: [{root,yield},
                                    {root,yield},
                                    {root,yield},
                                    .......
  2. @slfritchie

    Clean up bitcask_eqc.erl

    slfritchie committed Nov 13, 2012
    * Allow any test process to call incr_clock()
    * Clean up unused var warnings
    * Add explicit support to run on local or slave node, default = local
    * Comment out the useful but now (I hope) unnecessary scribbling of
      the test inputs to a /tmp scratch file before executing.
    * prop_pulse_test_() runs for 60 seconds
    * Fix fork_results() `after` timeout to avoid false positives when
      the test case is really huge.
  3. @slfritchie
  4. @slfritchie

    Add bitcask:init_keydir_scan_key_files/0 to retry initial keydir scan

    slfritchie committed Nov 12, 2012
    %% If someone launches enough parallel merge operations to
    %% interfere with our attempts to scan this keydir for this many
    %% times, then we are just plain unlucky.  Or QuickCheck smites us
    %% from lofty Mt. Stochastic.
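The bounded-retry idea in the comment above can be sketched as follows. This is an assumption-laden illustration, not the body of the function named in the commit title: the module `retry_sketch`, the function `retry/2`, and the `retries_exhausted` error atom are all hypothetical, and the real scan's return values may differ.

```erlang
-module(retry_sketch).
-export([retry/2]).

%% Run Fun up to Count times. Fun is assumed (for this sketch) to return
%% {ok, Result} on success or {error, Reason} when something, such as a
%% concurrent merge, interfered with the attempt.
retry(_Fun, 0) ->
    %% Out of attempts: "we are just plain unlucky".
    {error, retries_exhausted};
retry(Fun, Count) when is_function(Fun, 0), is_integer(Count), Count > 0 ->
    try Fun() of
        {ok, Result}     -> {ok, Result};
        {error, _Reason} -> retry(Fun, Count - 1)
    catch
        %% Treat a crash in the attempt like any other failed attempt.
        _Class:_Reason -> retry(Fun, Count - 1)
    end.
```

For example, `retry_sketch:retry(fun() -> {error, busy} end, 3)` gives up after three attempts and returns `{error, retries_exhausted}`. Capping the attempts keeps a pathological interleaving of merges from stalling the keydir scan forever.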
  5. @slfritchie

    Fix very elusive GC-triggered VM crash caused by pulse_send()

    slfritchie committed Nov 12, 2012
    This change appears to fix a very vexing, elusive VM crash that
    is triggered by a major GC event, well after the data corruption
    that causes it has occurred.
    
    The problem was finally found, after many different experiments,
    using valgrind plus a specific test case that always managed to
    provoke it.  It was quite difficult to find a deterministic
    counter-example -- almost all were ones that would succeed most
    of the time and only fail once every 20 or 50 or 200 attempts.
    
    Fix: use the same env for all parts of the term that we send
         to the PULSE process.
Commits on Nov 12, 2012
  1. @slfritchie
  2. @slfritchie

    Fix old/reintroduced race condition with merge

    slfritchie committed Nov 12, 2012
    An earlier version of this if statement in bitcask_nifs_keydir_put_int()
    included this timestamp check ... and it's certainly necessary, to avoid
    a race with a merge where the merge candidate has a timestamp that's
    older than the current keydir entry.