Fix Travis CI failure of multiple_merges_during_fold_test

After many intermittent timing failures over in Travis CI land, we have this:

BUILD PASSED: [bitcask jdb-fix-travis-failure:771c92d] by Scott Lystig Fritchie
https://travis-ci.org/basho/bitcask/builds/5266676
in cases where the op list is too short to make a reasonable & correct guess of behavior. Use eqc:testing_time/2 to control the test length, and set the EUnit timeout to 2x the testing time.
Additional testing demonstrates that using Erlang's built-in I/O provides more consistent behavior than the synchronous NIF approach. While the NIF approach is known to provide higher throughput, the non-deterministic impact on the Erlang VM from the use of blocking NIFs outweighs the benefit. Defaulting to the safer option. Users can manually revert to using NIFs if desired.
The recent changes to revert Bitcask to using pure Erlang file I/O have led to noticeable performance regressions in certain workloads. This change makes the file I/O mode configurable and sets the default back to the previous NIF-based approach. Users can switch to the Erlang-based I/O mode if they run into situations where the NIF approach leads to scheduler collapse. The mode is determined by the Bitcask application variable 'io_mode'. When it is missing or set to 'nif', the NIF approach is used. When it is set to 'erlang', standard Erlang efile is used.
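As a rough illustration, the setting might look like this in a node's configuration file. The file name and surrounding layout are assumptions; only the 'io_mode' key and its 'nif' / 'erlang' values come from the description above.

```erlang
%% Hypothetical app.config / sys.config fragment (layout is an assumption).
[
 {bitcask, [
     %% 'erlang' selects standard Erlang file I/O (efile);
     %% 'nif', or leaving the key out, selects the NIF-based I/O.
     {io_mode, erlang}
 ]}
].
```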
The environment that Travis runs tests under unfortunately does not immediately pick up the expected number of files to merge for the different merge operations in the multiple_merges_during_fold_test. This commit changes the test to retry merging until the expected result is seen (or until the test eventually times out).
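The retry idea can be sketched roughly as follows. This is not the code from the test itself: the module name, the function name, the 100 ms pause, and the bounded attempt count are all assumptions made for illustration (the commit as described relies on the test timeout rather than a fixed retry limit).

```erlang
-module(retry_sketch).
-export([retry_until/3]).

%% Hypothetical helper: call Fun() until it returns Expected,
%% giving up after Attempts tries.
retry_until(_Fun, _Expected, 0) ->
    {error, retries_exhausted};
retry_until(Fun, Expected, Attempts) when is_integer(Attempts), Attempts > 0 ->
    case Fun() of
        Expected ->
            %% Expected is already bound, so this clause matches
            %% only when Fun() returned exactly that value.
            ok;
        _Other ->
            %% Give the environment a moment to settle, then try again.
            timer:sleep(100),
            retry_until(Fun, Expected, Attempts - 1)
    end.
```

A test could pass a fun that performs the merge and returns the number of files merged, together with the count it expects to see.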
Bitcask previously used raw file I/O to read/write files. However, since raw file I/O uses a non-optimized selective receive to wait for a reply back from the efile driver, this approach had numerous problems when Bitcask was used within processes with many incoming messages (such as how Bitcask is used in Riak).

In commit 79d5eb3, NIFs were introduced to solve this problem. The file I/O NIFs would block the Erlang scheduler, but solve the issue encountered with selective receive. Unfortunately, using blocking NIFs is much worse than originally thought. Thus, NIFs are not the right solution to this problem.

This commit changes Bitcask to once again use Erlang's built-in file I/O, but now wraps each open file in a separate gen_server that interacts with the raw port. The original process now waits on a gen_server reply which uses an optimized selective receive, while the file process handles the unoptimized selective receive from the port driver. In our usage, the file process only has a single request outstanding, and therefore does not run into the selective receive issue.
For the record, the versions of QuickCheck & PULSE that I was using for this testing:

* QuickCheck 1.27.2
* PULSE 1.27.2
* git://github.com/Quviq/pulse_otp.git
  commit dff6ea12af94c0320d4a5beabc16a1fa50abf688
  Author: Hans Svensson <email@example.com>
  Date: Mon Aug 27 15:42:43 2012 +0200