Use Erlang file I/O from dedicated procs rather than NIFs #73

Merged

merged 7 commits into master from jdb-avoid-file-nifs Dec 3, 2012

3 participants

@jtuple
jtuple commented Dec 3, 2012

Bitcask previously used raw file I/O to read/write files. However, since raw file I/O uses a non-optimized selective receive to wait for a reply back from the efile driver, this approach had numerous problems when Bitcask was used within processes with many incoming messages (such as how Bitcask is used in Riak).
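The cost can be illustrated with a sketch (illustrative only; the message shape and names are hypothetical, not the actual efile driver protocol): waiting for a driver reply with a plain selective receive forces the runtime to scan every message already sitting in the caller's mailbox before the reply is matched, so each file operation on a busy process pays a cost proportional to the mailbox length.

```erlang
%% Hypothetical sketch of the problem, not Bitcask or efile code.
%% A plain selective receive must skip past every unrelated message
%% queued ahead of the reply; on a busy Riak vnode that can be
%% thousands of messages per file operation.
wait_for_reply(Ref) ->
    receive
        {file_reply, Ref, Result} ->   %% matched only after scanning
            Result                     %% all earlier, unrelated messages
    end.
```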

In commit 79d5eb3, NIFs were introduced to solve this problem. The file I/O NIFs block the Erlang scheduler, but they sidestep the selective-receive issue. Unfortunately, blocking NIFs turned out to be much worse than originally thought, and are therefore not the right solution to this problem.

This commit changes Bitcask to once again use Erlang's built-in file I/O, but now wraps each open file in a separate gen_server that interacts with the raw port. The original process now waits on a gen_server reply which uses an optimized selective receive, while the file process handles the unoptimized selective receive from the port driver. In our usage, the file process only has a single request outstanding, and therefore does not run into the selective receive issue.
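A minimal sketch of the pattern (a hypothetical simplification, not the actual bitcask_file module): callers go through gen_server:call/2, whose reply receive is optimized around the monitor reference, while the dedicated process, whose mailbox holds at most one outstanding request, performs the unoptimized receive inside file:pread/3.

```erlang
%% Hypothetical sketch of one-gen_server-per-file; module and function
%% names are illustrative, not the real bitcask_file API.
-module(file_proc).
-behaviour(gen_server).
-export([open/1, pread/3]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Spawn a dedicated process owning the raw file descriptor.
open(Filename) ->
    gen_server:start_link(?MODULE, [Filename], []).

%% Caller blocks in gen_server:call's optimized receive.
pread(Pid, Offset, Size) ->
    gen_server:call(Pid, {pread, Offset, Size}).

init([Filename]) ->
    {ok, Fd} = file:open(Filename, [read, raw, binary]),
    {ok, Fd}.

%% The unoptimized selective receive happens inside file:pread/3,
%% but this process's mailbox is effectively empty, so it is cheap.
handle_call({pread, Offset, Size}, _From, Fd) ->
    {reply, file:pread(Fd, Offset, Size), Fd}.

handle_cast(_Msg, Fd) ->
    {noreply, Fd}.
```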

jtuple added some commits Nov 27, 2012
@jtuple jtuple Use Erlang file I/O from dedicated procs rather than NIFs f20c143
@jtuple jtuple Handle exclusive mode in bitcask_file:file_open a6cc3ae
@jtuple jtuple Changes to make PULSE test more stable d950b95
@jtuple jtuple Increase timeout on slower tests 2d11479
@jtuple jtuple Clean-up open bitcask_file instances when owner exits 69e02fb
@jtuple jtuple Clean-up bitcask_file code + add state typespecs 047a322
@gburd gburd and 1 other commented on an outdated diff Dec 3, 2012
c_src/bitcask_nifs.c
@@ -1646,7 +1646,7 @@ static void msg_pending_awaken(ErlNifEnv* env, bitcask_keydir* keydir,
for (idx = 0; idx < keydir->pending_awaken_count; idx++)
{
enif_clear_env(msg_env);
-#ifdef PULSE
+#ifdef PULSE_NOWAY_JOSE
@gburd
gburd added a note Dec 3, 2012

Is there a better name for this? Maybe a comment explaining what's going on would help too.

@slfritchie
slfritchie added a note Dec 3, 2012

FWIW, the gsb-async-nifs branch uses:

#ifdef PULSE_NO_REALLY_JUST_USE_THE_REGULAR_SEND

Removing the unused code would be good because it's, well, unused. However, I think it's also a useful placeholder for later discussion with Hans & Ulf & the rest of the Quviq folk: it's only 52% clear to me why you'd want to use a PULSE-instrumented send here in the NIF code, but it's 100% clear that actually using a PULSE-instrumented send here in the NIF code causes all kinds of deadlock problems for PULSE. Some kind of "TODO" reminder would be good. {shrug}

@gburd
gburd added a note Dec 3, 2012
@gburd
gburd commented Dec 3, 2012

I've run tests on a 5-node cluster with AAE enabled and was able to load 75M keys using this code without issue, at a speed similar to what is in production now. The code looks clean and straightforward, nice work. If you have time in the future, you might consider adding a config option to choose between NIFs and Erlang file I/O; that could come in handy as a simple fallback if we needed it. Other than that, looks good. +1

@jtuple jtuple merged commit 9621aaf into master Dec 3, 2012

1 check failed (default): The Travis build failed
@engelsanchez engelsanchez deleted the jdb-avoid-file-nifs branch Mar 28, 2014