
Bz877 listkeys fix #1

Merged
3 commits

4 participants

Dave Smith · Jon Meredith · Bryan Fink · Erik Søe Sørensen
Dave Smith

Please review this patch for bz://877.

dizzyd added some commits
Add defines for condition variables on R13B04 (3a8a98f)

Fix bz://877; keydir NIF should not hold a rwlock across invocations (even though it works on most platforms).

A new thread is now spun up on iteration start to hold the rwlock until iteration completes. (60509d0)

Move iterator out to keydir_handle; multiple independent readers can now fold (correctly) at the same time. (cfc8f46)
Jon Meredith
Owner

I've given the code a quick review over the weekend. I'll spin up the code and test today.

Bryan Fink

Just gave this a smoke test locally, basic operations seem to work, as well as concurrent key folding.

Erik Søe Sørensen

Re. bitcask_nifs_keydir_itr(), the check tagged "// If a iterator thread is already active for this keydir, bail":
This part is not thread safe; there is a race condition: two (or more) iterator lock threads can be created, each with its own copy of the mutex and condition variable. Confusion ensues.

So, yes, it is a race condition -- but only if you use the same handle from two different processes. The state info is per handle, not per keydir.

(Reported as https://issues.basho.com/show_bug.cgi?id=1090, with triggering example.)
True, you'd have to use the same handle from different processes; that might be deadly, though, cf. BZ1090.
However, we have experienced something that looks like this issue: OS-level process stack traces show one thread blocking in iterator_lock_thread() and another blocking in bitcask_keydir_put_int().
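The race Erik describes is the classic unguarded check-then-act: two callers can both observe "no iterator thread active" and both spawn one. A minimal sketch of closing that window, assuming a mutex-protected test-and-set (names here are hypothetical, not the actual NIF code):

```c
#include <pthread.h>
#include <stdbool.h>

/* Illustrative handle state (hypothetical names). */
typedef struct {
    pthread_mutex_t mutex;
    bool itr_thread_active;
} keydir_handle;

/* Returns true if the caller won the right to spawn the iterator
 * lock thread; false if one is already active.  The check and the
 * set happen under the same mutex, so only one caller can win. */
static bool try_claim_iterator(keydir_handle *h)
{
    bool claimed = false;
    pthread_mutex_lock(&h->mutex);
    if (!h->itr_thread_active) {
        h->itr_thread_active = true;
        claimed = true;
    }
    pthread_mutex_unlock(&h->mutex);
    return claimed;
}
```

With an unguarded `if (!h->itr_thread_active) { spawn(); h->itr_thread_active = true; }`, two threads interleaving between the check and the set would each spawn a lock thread, which matches the duplicate-mutex confusion described above.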

Scott Lystig Fritchie (slfritchie) referenced this pull request from a commit:

Work-around model limitation/bug in PULSE test, bitcask_eqc.erl
In the C65 counterexample below, the {var,2} worker proc tries to
get key #19 and allegedly fails.  The last step of the main proc is
a put loop of keys #1 - #34.  The {var,2} worker is racing with
both the main proc and the {var,7} worker that is
doing a fold.  The fold happens first, which freezes the keydir
when the next put happens.  Then the {var,2} worker does its get
and finds (correctly) `not_found` ... but the model is buggy and
can't predict the correct answer.

I'm not good enough at the temporal logic to fix the model
The Correct Way.  So I've patched it up so that any 'bad' thing
detected by the main/1st proc is truly bad.  But if a 'bad' thing
happens in a forked worker, then if we know that there's a fold
happening at the same time, we assume that the model's predicted
answer is bad and thus we *ignore* the 'bad' thing.

    191> C65 = binary_to_term(element(2,file:read_file("zzz.c65@develop@commit=a2d75.bin"))).
    [[{set,{var,2},
           {call,bitcask_eqc,fork,
                 [[{init,{state,undefined,false,false,[]}},
                   {set,{not_var,8},{not_call,bitcask_eqc,bc_open,[false]}}]]}},
      {set,{var,4},
           {call,bitcask_eqc,fork,
                 [[{init,{state,undefined,false,false,[]}},
                   {set,{not_var,2},{not_call,bitcask_eqc,bc_open,[false]}},
                   {set,{not_var,3},
                        {not_call,bitcask_eqc,bc_close,[{not_var,2}]}},
                   {set,{not_var,9},{not_call,bitcask_eqc,bc_open,[false]}},
                   {set,{not_var,10},
                        {not_call,bitcask_eqc,get,[{not_var,9},19]}}]]}},
      {set,{var,6},
           {call,bitcask_eqc,fork,
                 [[{init,{state,undefined,false,false,[]}},
                   {set,{not_var,3},{not_call,bitcask_eqc,bc_open,[false]}}]]}},
      {set,{var,7},
           {call,bitcask_eqc,fork,
                 [[{init,{state,undefined,false,false,[]}},
                   {set,{not_var,5},{not_call,bitcask_eqc,bc_open,[false]}},
                   {set,{not_var,7},
                        {not_call,bitcask_eqc,fold,[{not_var,5}]}}]]}},
      {set,{var,12},
           {call,bitcask_eqc,fork,
                 [[{init,{state,undefined,false,false,[]}},
                   {set,{not_var,3},{not_call,bitcask_eqc,bc_open,[false]}},
                   {set,{not_var,8},
                        {not_call,bitcask_eqc,fold,[{not_var,3}]}}]]}},
      {set,{var,17},{call,bitcask_eqc,bc_open,[true]}},
      {set,{var,28},{call,bitcask_eqc,put,[{var,17},1,<<>>]}},
      {set,{var,29},{call,bitcask_eqc,needs_merge,[{var,17}]}},
      {set,{var,30},{call,bitcask_eqc,fork_merge,[{var,17}]}},
      {set,{var,32},
           {call,bitcask_eqc,puts,[{var,17},{1,34},<<>>]}}],
     {77863,46676,48146},
     [{events,[]}]]
3b70204
slfritchie referenced this pull request from a commit:

FML: fix tombstone merging bug when K & K's tombstone are in same fileid
Originally found with bitcask_pulse, I deconstructed the test case to
help understand what was happening: the new EUnit test is
new_20131217_a_test_.

As a result of the puts, key #13 is written 3x to fileid #1 (normal,
tombstone, normal) and 1x to fileid #2 (normal @ the very beginning
of the file).  The merge creates fileid #3 and copies only the
tombstone (the normal entry isn't copied because it is out-of-date).
Before the close, the internal keydir contains the correct info
about key #13, but after the close and re-open, we see key #13's
entries: normal (and most recent) in fileid #2, and tombstone in
fileid #3, oops.

The fix is to remove all of the merge input fileids from the set of fileids
that will survive/exist after the merge is finished.
1712650
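The fix described above amounts to a set difference: the fileids that survive the merge are the current fileids minus every merge input. A sketch under that reading (illustrative code; the real merge operates on bitcask's own data structures, and these names are hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear membership test over a small fileid set. */
static bool contains(const uint32_t *set, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == id)
            return true;
    return false;
}

/* Write into 'out' every current fileid that was NOT a merge input;
 * returns how many fileids survive the merge. */
static size_t surviving_fileids(const uint32_t *current, size_t ncur,
                                const uint32_t *merge_inputs, size_t nin,
                                uint32_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < ncur; i++)
        if (!contains(merge_inputs, nin, current[i]))
            out[n++] = current[i];
    return n;
}
```

In the bug scenario above, fileids #1 and #2 were merge inputs producing fileid #3, so only #3 should remain in the surviving set; leaving #2 in the set is what let the stale normal entry shadow the tombstone after re-open.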
slfritchie referenced this pull request from a commit:

Fix race with merge & bitcask:delete(): remove from keydir *first*
Change order of scan & fold: now oldest -> youngest

New regression test (currently broken) for recent scan change of oldest -> youngest

Fix rough edges on test/bitcask_qc.erl & test/bitcask_qc_expiry.erl

Write tombstone on delete only if there's a keydir entry, plus race bugfix

Fix remaining bug when bitcask scan/fold order was changed to oldest->youngest

The NIF change fixes a long-standing latent bug: when put'ing a key
that does not exist, if there's a race with a merge, keydir_put_int()
would return 'ok' (error) rather than 'already_exists' (correct).  The
'already_exists' return value is a signal to the read-write owner of
the bitcask that the current append file must be closed and a new one
opened (with a larger fileid than any merge).

The tombstone change adds a new tombstone data format.  Old tombstones
will be handled correctly.  New tombstones for any key K contain
the fileid & offset of the key that it is deleting.  If the fileid
F still exists, then the tombstone will always be merged forward.
If the fileid F does not exist, then merging forward is not
necessary.  Once F has been merged, the on-disk representation of key K
need not be merged forward: either K does not exist in the keydir (because
it was deleted by this tombstone), or it was replaced by a newer put.
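The merge-forward rule for the new tombstone format reduces to one question: is the fileid the tombstone shadows still among the live data files? A minimal sketch, assuming an illustrative layout (the struct and function names are hypothetical, not bitcask's actual on-disk format):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* New-format tombstone: records the location of the entry it deletes. */
typedef struct {
    uint32_t fileid;   /* fileid of the deleted key's entry */
    uint64_t offset;
} tombstone;

static bool fileid_exists(const uint32_t *live, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (live[i] == id)
            return true;
    return false;
}

/* true  -> copy the tombstone forward into the merge output;
 * false -> the shadowed fileid is gone, so the tombstone can be dropped. */
static bool keep_tombstone(const tombstone *t,
                           const uint32_t *live, size_t n)
{
    return fileid_exists(live, n, t->fileid);
}
```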

FML: fix tombstone merging bug when K & K's tombstone are in same fileid

Originally found with bitcask_pulse, I deconstructed the test case to
help understand what was happening: the new EUnit test is
new_20131217_a_test_.

As a result of the puts, key #13 is written 3x to fileid #1 (normal,
tombstone, normal) and 1x to fileid #2 (normal @ the very beginning
of the file).  The merge creates fileid #3 and copies only the
tombstone (the normal entry isn't copied because it is out-of-date).
Before the close, the internal keydir contains the correct info
about key #13, but after the close and re-open, we see key #13's
entries: normal (and most recent) in fileid #2, and tombstone in
fileid #3, oops.

The fix is to remove all of the merge input fileids from the set of fileids
that will survive/exist after the merge is finished.

WIP: new test new_20131217_c_test_() is broken, need to experiment with bitcask:delete & NIF usage before continuing

Fix non-determinism in the fold_visits_unfrozen_test test ... now 100% broken

Fix iterator freeze bug demonstrated by last commit: remove kh_put_will_resize() predicate test for creating a keydir->pending

Fix bug introduced by almost-bugfix in commit 1a9c99 that was supposed to be a partial fix for GH #82

Fix unexported function errors, thanks Buildbot

Omnibus bugfixing, including:

* Add 'already_exists' return to bitcask_nifs_keydir_remove(): we need
it to signal races with merge, alas.

* Add state to #filestate to be able to 'undo' last update to both a
data file and its hint file.  This probably means that we're going
to have to play some games with merge file naming, TBD, stay tuned.

* For bitcask:delete(), make the keydir delete conditional: if it fails,
redo the entire thing again.

* inner_merge_write() can have a race that, if a partial merge happens
at the proper time after, we see an old value reappearing.  Fix by
checking the return value of the keydir put, and if 'already_exists',
then undo the write.

* When do_put() has a race and gets 'already_exists' from the keydir,
undo the write before retrying.  If this key is deleted sometime later,
and then a partial merge happens after that, we might see this value
reappear after the merge is done.

* Add file_truncate() to bitcask_file.erl.  TODO: do the same for the
NIF style I/O.

* Better robustness (I hope) to EUnit tests in bitcask_merge_delete.erl

Add UNIX epoch time to put_int NIF call, to avoid C use of time(3).

I hope this will eliminate a nasty source of nondeterminism during
PULSE testing.

Fix buildbot ?TESTDIR definition location problem

Uff da: most PULSE non-determinism problems fixed?

Fix merge race when a keydir_put specifying the old fileid & offset races with a delete

Scenario, with 3 writers, 2 & 3 are racing:

* Writer 1: Put K, write @ {file 1, offset 63}
* Writer 2: Delete operation starts ... but there is no write to disk yet
* Writer 3: Merge scans file 1, sees K @ {1,30} -> not out of date ->
  but there is no write to disk yet
* Writer 2: writes a tombstone @ {3,48}
* Writer 2: Keydir conditional delete @ old location @ {1,63} is ok
* Writer 2: keydir delete returns from NIF-land
* Writer 3: merge copies data from {1, 63} -> {4, 42}
* Writer 3: keydir put {4, 42} conditional on {1,63} succeeds due to
  incorrect conditional validation: the record is gone, but bug
  permits put to return 'ok'.
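The incorrect conditional validation in the last step can be sketched as follows. This is an illustrative model, not the real keydir NIF: the point is that when the record is already gone, the conditional put must fail with 'already_exists' rather than succeed.

```c
#include <stdbool.h>
#include <stdint.h>

enum put_result { PUT_OK, PUT_ALREADY_EXISTS };

/* Illustrative keydir entry (hypothetical layout). */
typedef struct {
    bool     present;   /* false once the entry has been deleted */
    uint32_t fileid;
    uint64_t offset;
} keydir_entry;

/* Update the entry only if it still matches the expected old
 * {fileid, offset}.  A deleted record means the condition cannot
 * hold, so the merge's put is rejected -- the buggy version
 * returned PUT_OK here, resurrecting the old value. */
static enum put_result
conditional_put(keydir_entry *e,
                uint32_t old_fileid, uint64_t old_offset,
                uint32_t new_fileid, uint64_t new_offset)
{
    if (!e->present)
        return PUT_ALREADY_EXISTS;
    if (e->fileid != old_fileid || e->offset != old_offset)
        return PUT_ALREADY_EXISTS;
    e->fileid = new_fileid;
    e->offset = new_offset;
    return PUT_OK;
}
```

In the scenario above, Writer 2's conditional delete removes the record, so Writer 3's put conditioned on {1,63} must see the record gone and fail.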

Fix bug in previous commit: offset=0 is valid, do not check it

Use larger retry # for open_fold_files() and keydir_wait_ready() when PULSE testing

Small bitcask_pulse changes: file size, pretty-print the keys used, slightly less verbosity

Adjust command frequency weightings for PULSE model

Add no_tombstones_after_reopen_test() to test desired condition

Avoid storing tombstones in the keydir when possible

When a Bitcask is opened and scan_key_files() is reading data
from disk and loading the RAM keydir, we now detect if the key
is a tombstone and, if so, do not store it in the keydir.

Normally, only hint files are scanned during startup.  However,
hint files have not stored enough information to confirm that
a key is or is not a tombstone.  I have added such a flag in a
backward-compatible way: the offset size has been reduced from
64 -> 63 bits, and the uppermost bit (which is assumed to be
0 in all cases -- we assume nobody has actually written a file
big enough to require 64 bits to describe the offset) is used
to signal tombstone status.
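The 64 -> 63 bit change above is a simple bit-packing scheme; a sketch of the encode/decode half (macro and function names are illustrative, not bitcask's actual identifiers):

```c
#include <stdbool.h>
#include <stdint.h>

/* Top bit of the 64-bit offset field flags a tombstone,
 * leaving 63 bits for the offset itself. */
#define TOMBSTONE_BIT  (UINT64_C(1) << 63)
#define OFFSET_MASK    (TOMBSTONE_BIT - 1)

/* Pack an offset (must fit in 63 bits) plus the tombstone flag. */
static uint64_t pack_offset(uint64_t offset, bool tomb)
{
    return (offset & OFFSET_MASK) | (tomb ? TOMBSTONE_BIT : 0);
}

static uint64_t unpack_offset(uint64_t field)
{
    return field & OFFSET_MASK;
}

static bool field_is_tombstone(uint64_t field)
{
    return (field & TOMBSTONE_BIT) != 0;
}
```

Backward compatibility follows from the assumption stated above: every previously written hint file has a 0 in the top bit, so old entries decode as non-tombstones with an unchanged offset.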

An optional argument was given to the increment_file_id() NIF
to communicate to the NIF that a data file exists ... a fact
that would otherwise be lost if a hint/data file contains
only tombstones.

For testing purposes, fold() and fold_keys() are extended with
another argument to expose the presence of keydir tombstones.

Adjust timeouts and concurrency limits in bitcask_pulse.erl
to avoid the worst of false-positive errors when using the
PULSE model: {badrpc,timeout} nonsense.

Add checks to the PULSE model to try to catch improper tombstones in keydir

Dialyzer fixes

Add failing test: tombstones exist in keydir if hint files are destroyed

Add {tombstone,Key} wrapper when folding .data files; fixes test failure in last commit

Fix intermittent bitcask_merge_worker timeout failure

NIF review: avoid looking at argv[1] directly

Remove unwanted oopsie foo.erl
04ada05
slfritchie referenced this pull request from a commit:

Fix race with concurrent merges and deletes

Restore put will resize check for multi-fold

Restore the check that postpones freezing the keydir until a put
would require a structural change.  Without this, the new multifold
code is crippled.

Redo merge/delete race fix

Consolidate the retry-on-merge-race logic in do_put, making delete simply
a wrapper around a put(tombstone) operation.
Also some cleanup regarding the different tombstone versions.

Add missing file truncate op for NIF mode

Plus a little unit test to make sure the file truncation operation
works.
bd037b8
slfritchie referenced this pull request from a commit:

Fix race with concurrent merges and deletes
4466fea
slfritchie referenced this pull request from a commit:

Fix race with concurrent merges and deletes
Add regression test for GH #82

Change order of scan & fold: now oldest -> youngest

New regression test (currently broken) for recent scan change of oldest -> youngest

Fix rough edges on test/bitcask_qc.erl & test/bitcask_qc_expiry.erl

Write tombstone on delete only if there's a keydir entry, plus race bugfix

Fix remaining bug when bitcask scan/fold order was changed to oldest->youngest

The NIF change fixes a long-standing latent bug: when put'ing a key
that does not exist, if there's a race with a merge, keydir_put_int()
would return 'ok' (error) rather than 'already_exists' (correct).  The
'already_exists' return value is a signal to the read-write owner of
the bitcask that the current append file must be closed and a new one
opened (with a larger fileid than any merge).

The tombstone change adds a new tombstone data format.  Old tombstones
will be handled correctly.  New tombstones for any key K contain
the fileid & offset of the key that it is deleting.  If the fileid
F still exists, then the tombstone will always be merged forward.
If the fileid F does not exist, then merging forward is not
necessary. When F was merged, the on-disk representation of key K
not be merged forward: K does not exist in the keydir (because it
was deleted by this tombstone), or it was replaced by a newer put.

FML: fix tombstone merging bug when K & K's tombstone are in same fileid

Originally found with bitcask_pulse, I deconstructed the test case to
help understand what was happening: the new EUnit test is
new_20131217_a_test_.

As a result of the puts, key #13 is written 3x to fileid #1 (normal,
tombstone, normal) and 1x to fileid #2 (normal @ the very beginning
of the file).  The merge creates fileid #3 and copies only the
tombstone (the normal entry isn't copied because it is out-of-date).
Before the close, the internal keydir contains the correct info
about key #13, but after the close and re-open, we see key #13's
entries: normal (and most recent) in fileid 32, and tombstone in
fileid #3, oops.

The fix is to remove all of the merge input fileids from the set of fileids
that will survive/exist after the merge is finished.

WIP: new test new_20131217_c_test_() is broken, need to experiment with bitcask:delete & NIF usage before continuing

Fix non-determinism in the fold_visits_unfrozen_test test ... now 100% broken

Fix iterator freeze bug demonstrated by last commit: remove kh_put_will_resize() predicate test for creating a keydir->pending

Fix bug introduced by almost-bugfix in commit 1a9c99 that was supposed to be a partial fix for GH #82

Fix unexported function errors, thanks Buildbot

Omnibus bugfixing, including:

* Add 'already_exists' return to bitcask_nifs_keydir_remove(): we need
it to signal races with merge, alas.

* Add state to #filestate to be able to 'undo' last update to both a
data file and its hint file.  This probably means that we're going
to have to play some games with merge file naming, TBD, stay tuned.

* For bitcask:delete(), make the keydir delete conditional: if it fails,
redo the entire thing again.

* inner_merge_write() can have a race that, if a partial merge happens
at the proper time after, we see an old value reappearing.  Fix by
checking the return value of the keydir put, and if 'already_exists',
then undo the write.

* When do_put() has a race and gets 'already_exists' from the keydir,
undo the write before retrying.  If this key is deleted sometime later,
and then a partial merge happens after that, we might see this value
reappear after the merge is done.

* Add file_truncate() to bitcask_file.erl.  TODO: do the same for the
NIF style I/O.

* Better robustness (I hope) to EUnit tests in bitcask_merge_delete.erl

Add UNIX epoch time to put_int NIF call, to avoid C use of time(3).

I hope this will eliminate a nasty source of nondeterminism during
PULSE testing.

Fix buildbot ?TESTDIR definition location problem

Uff da: most PULSE non-determinism problems fixed?

Fix merge race when keydir_put specifies old fileid & offset races with delete

Scenario, with 3 writers, 2 & 3 are racing:

* Writer 1: Put K, write @ {file 1, offset 63}
* Writer 2: Delete operation starts ... but there is no write to disk yet
* Writer 3: Merge scans file 1, sees K @ {1,30} -> not out of date ->
  but there is no write to disk yet
* Writer 2: writes a tombstone @ {3,48}
* Writer 2: Keydir conditional delete @ old location @ {1,63} is ok
* Writer 2: keydir delete returns from NIF-land
* Writer 3: merge copies data from {1, 63} -> {4, 42}
* Writer 3: keydir put {4, 42} conditional on {1,63} succeeds due to
  incorrect conditional validation: the record is gone, but bug
  permits put to return 'ok'.

Fix bug in previous commit: offset=0 is valid, do not check it

Use larger retry # for open_fold_files() and keydir_wait_ready() when PULSE testing

Small bitcask_pulse changes: file size, pretty-print the keys used, slightly less verbosity

Adjust command frequency weightings for PULSE model

Add no_tombstones_after_reopen_test() to test desired condition

Avoid storing tombstones in the keydir when possible

When a Bitcask is opened and scan_key_files() is reading data
from disk and loading the RAM keydir, we now detect if the key
is a tombstone and, if so, do not store it in the keydir.

Normally, only hint files are scanned during startup.  However,
hint files have not stored enough information to confirm that
a key is/is not a tombstone.  I have added such a flag in a
backward-compatible way: the offset size has been reduced from
64 -> 63 bits, and the uppermost bit (which is assumed to be
0 in all cases -- we assume nobody has actually written a file
big enough to require 64 bits to describe the offset) is used
to signal tombstone status.

An optional argument was given to the increment_file_id() NIF
to communicate to the NIF that a data file exists ... a fact
that would otherwise be lost if a hint/data file contains
only tombstones.

For testing purposes, fold() and fold_keys() are extended with
another argument to expose the presence of keydir tombstones.

Adjust timeouts and concurrency limits in bitcask_pulse.erl
to avoid the worst of false-positive errors when using the
PULSE model: {badrpc,timeout} nonsense.

Add checks to the PULSE model to try to catch improper tombstones in keydir

Dialyzer fixes

Add failing test: tombstones exist in keydir if hint files are destroyed

Add {tombstone,Key} wrapper when folding .data files; fixes test failure in last commit

Fix intermittent bitcask_merge_worker timeout failure

NIF review: avoid looking at argv[1] directly

Remove unwanted oopsie foo.erl

Restore 'put will resize' check for multi-fold

Restore the check that postpones freezing the keydir until a put
would require a structural change. Without this, the new multifold code
is crippled.

Redo merge/delete race fix

Consolidating retry on merge race logic in do_put, making delete simply
a wrapper around a put(tombstone) operation.
Some cleanup regarding the different tombstone versions

Add missing file truncate op for NIF mode

Plus a little unit test to make sure the file truncation operation
works.

Fix unit tests

Scott's time fudge was leaking into other tests, causing them to see
static timestamps and sometimes fail.
The un_write code requires support for file:position style offset
parameters in our NIF version ({bof, 5}, {eof, -1}, etc).

Remove debug logging and clear on set fudged time
e7f9647
Scott Lystig Fritchie (slfritchie) referenced this pull request from a commit:
Fix race with concurrent merges and deletes
Add regression test for GH #82

Change order of scan & fold: now oldest -> youngest

New regression test (currently broken) for recent scan change of oldest -> youngest

Fix rough edges on test/bitcask_qc.erl & test/bitcask_qc_expiry.erl

Write tombstone on delete only if there's a keydir entry, plus race bugfix

Fix remaining bug when bitcask scan/fold order was changed to oldest->youngest

The NIF change fixes a long-standing latent bug: when put'ing a key
that does not exist, if there's a race with a merge, keydir_put_int()
would return 'ok' (error) rather than 'already_exists' (correct).  The
'already_exists' return value is a signal to the read-write owner of
the bitcask that the current append file must be closed and a new one
opened (with a larger fileid than any merge).

The tombstone change adds a new tombstone data format.  Old tombstones
will be handled correctly.  New tombstones for any key K contain
the fileid & offset of the key that it is deleting.  If the fileid
F still exists, then the tombstone will always be merged forward.
If the fileid F does not exist, then merging forward is not
necessary: when F was merged, the on-disk representation of key K
did not need to be merged forward, because either K no longer existed
in the keydir (it was deleted by this tombstone) or it had been
replaced by a newer put.

FML: fix tombstone merging bug when K & K's tombstone are in same fileid

Originally found with bitcask_pulse, I deconstructed the test case to
help understand what was happening: the new EUnit test is
new_20131217_a_test_.

As a result of the puts, key #13 is written 3x to fileid #1 (normal,
tombstone, normal) and 1x to fileid #2 (normal @ the very beginning
of the file).  The merge creates fileid #3 and copies only the
tombstone (the normal entry isn't copied because it is out-of-date).
Before the close, the internal keydir contains the correct info
about key #13, but after the close and re-open, we see key #13's
entries: normal (and most recent) in fileid #2, and tombstone in
fileid #3, oops.

The fix is to remove all of the merge input fileids from the set of fileids
that will survive/exist after the merge is finished.

WIP: new test new_20131217_c_test_() is broken, need to experiment with bitcask:delete & NIF usage before continuing

Fix non-determinism in the fold_visits_unfrozen_test test ... now 100% broken

Fix iterator freeze bug demonstrated by last commit: remove kh_put_will_resize() predicate test for creating a keydir->pending

Fix bug introduced by almost-bugfix in commit 1a9c99 that was supposed to be a partial fix for GH #82

Fix unexported function errors, thanks Buildbot

53bb6a8
Scott Lystig Fritchie (slfritchie) referenced this pull request from a commit:
Bugfix: Race with first put/delete with a merge...
... where the biggest file in the keydir (name-wise) is 0 bytes,
e.g. created due to another merge race and was undone via un_write().

Failing test case:

    Cu2 = [[{set,{var,1},{call,bitcask_pulse,bc_open,[true]}},{set,{var,2},{call,bitcask_pulse,bc_close,[{var,1}]}},{set,{var,3},{call,bitcask_pulse,bc_open,[true]}},{set,{var,4},{call,bitcask_pulse,puts,[{var,3},{1,5},<<0,0,0>>]}},{set,{var,14},{call,bitcask_pulse,delete,[{var,3},1]}},{set,{var,15},{call,bitcask_pulse,bc_close,[{var,3}]}},{set,{var,16},{call,bitcask_pulse,bc_open,[true]}},{set,{var,18},{call,bitcask_pulse,puts,[{var,16},{2,6},<<0,0,0>>]}},{set,{var,21},{call,bitcask_pulse,merge,[{var,16}]}},{set,{var,22},{call,bitcask_pulse,delete,[{var,16},2]}},{set,{var,23},{call,bitcask_pulse,delete,[{var,16},2]}},{set,{var,24},{call,bitcask_pulse,fork_merge,[{var,16}]}},{set,{var,25},{call,bitcask_pulse,bc_close,[{var,16}]}},{set,{var,26},{call,bitcask_pulse,incr_clock,[]}},{set,{var,29},{call,bitcask_pulse,bc_open,[true]}},{set,{var,30},{call,bitcask_pulse,delete,[{var,29},1]}}],{1394,530114,962634},[{0,[]}]].
    eqc:check(bitcask_pulse:prop_pulse(), Cu2).

The last commands are: fork_merge, close, incr_clock, open, and delete key #1.

Error found:

    {exception,{'EXIT',{{badmatch,{error,{badmatch,{error,eexist}}}},
                        [{bitcask,do_put,6,[{file,"src/bitcask.erl"},{line,1297}]},
                         {bitcask,put,4,[{file,"src/bitcask.erl"},{line,267}]},
                         {bitcask,delete,2,[{file,"src/bitcask.erl"},{line,276}]},
                         {bitcask_pulse,delete,2,
                                        [{file,"test/bitcask_pulse.erl"},
                                         {line,857}]},
                         {eqc_statem,run_commands,2,
                                     [{file,"../src/eqc_statem.erl"},
                                      {line,737}]}]}}} /= ok
1c6193c
Scott Lystig Fritchie (slfritchie) referenced this pull request from a commit:
Fix race with concurrent merges and deletes
* A delete in the writer process may race with a merge writing a value
 for the same key on a file with id greater than the current write file.
 Re-opening the cask would end up loading the merged value and forgetting
 the delete. The delete path will now detect the race and retry the
 tombstone write on a file with id greater than the merge file until it
 prevails.
* A partial merge could end up removing the tombstone for a key that
 still has a value in one of the unmerged files. To detect this, a new
 kind of tombstone that contains a pointer to the value it is replacing
 has been added.
* Change order of scan & fold: now oldest -> youngest

b7f04b1
This issue was closed.
Commits on Nov 12, 2010
  1. Dave Smith (dizzyd): Add defines for condition variables on R13B04
  2. Dave Smith (dizzyd): Fix bz://877; keydir NIF should not hold a rwlock across invocations (even though it works on most platforms). A new thread is now spun up on iteration start to hold the rwlock until iteration completes.
  3. Dave Smith (dizzyd): Move iterator out to keydir_handle; multiple independent readers can now fold (correctly) at the same time.
Showing with 111 additions and 24 deletions.
  1. +105 −24 c_src/bitcask_nifs.c
  2. +6 −0 c_src/erl_nif_compat.h
129 c_src/bitcask_nifs.c
@@ -66,7 +66,6 @@ KHASH_MAP_INIT_INT(fstats, bitcask_fstats_entry*);
typedef struct
{
khash_t(entries)* entries;
- khiter_t iterator;
khash_t(fstats)* fstats;
size_t key_count;
size_t key_bytes;
@@ -79,6 +78,11 @@ typedef struct
typedef struct
{
bitcask_keydir* keydir;
+ khiter_t iterator;
+ ErlNifTid il_thread; /* Iterator lock thread */
+ ErlNifMutex* il_signal_mutex;
+ ErlNifCond* il_signal;
+ int il_signal_flag;
} bitcask_keydir_handle;
typedef struct
@@ -121,7 +125,10 @@ static ERL_NIF_TERM ATOM_FALSE;
static ERL_NIF_TERM ATOM_FSTAT_ERROR;
static ERL_NIF_TERM ATOM_FTRUNCATE_ERROR;
static ERL_NIF_TERM ATOM_GETFL_ERROR;
+static ERL_NIF_TERM ATOM_ILT_CREATE_ERROR; /* Iteration lock thread creation error */
+static ERL_NIF_TERM ATOM_ITERATION_IN_PROCESS;
static ERL_NIF_TERM ATOM_ITERATION_NOT_PERMITTED;
+static ERL_NIF_TERM ATOM_ITERATION_NOT_STARTED;
static ERL_NIF_TERM ATOM_LOCK_NOT_WRITABLE;
static ERL_NIF_TERM ATOM_NOT_FOUND;
static ERL_NIF_TERM ATOM_NOT_READY;
@@ -195,13 +202,13 @@ ERL_NIF_TERM bitcask_nifs_keydir_new0(ErlNifEnv* env, int argc, const ERL_NIF_TE
bitcask_keydir_handle* handle = enif_alloc_resource_compat(env,
bitcask_keydir_RESOURCE,
sizeof(bitcask_keydir_handle));
+ memset(handle, '\0', sizeof(bitcask_keydir_handle));
// Now allocate the actual keydir instance. Because it's unnamed/shared, we'll
// leave the name and lock portions null'd out
bitcask_keydir* keydir = enif_alloc_compat(env, sizeof(bitcask_keydir));
memset(keydir, '\0', sizeof(bitcask_keydir));
keydir->entries = kh_init(entries);
- keydir->iterator = kh_begin(keydir->entries);
keydir->fstats = kh_init(fstats);
// Assign the keydir to our handle and hand it back
@@ -252,7 +259,6 @@ ERL_NIF_TERM bitcask_nifs_keydir_new1(ErlNifEnv* env, int argc, const ERL_NIF_TE
// Initialize hash tables
keydir->entries = kh_init(entries);
- keydir->iterator = kh_begin(keydir->entries);
keydir->fstats = kh_init(fstats);
// Be sure to initialize the rwlock and set our refcount
@@ -269,6 +275,7 @@ ERL_NIF_TERM bitcask_nifs_keydir_new1(ErlNifEnv* env, int argc, const ERL_NIF_TE
bitcask_keydir_handle* handle = enif_alloc_resource_compat(env,
bitcask_keydir_RESOURCE,
sizeof(bitcask_keydir_handle));
+ memset(handle, '\0', sizeof(bitcask_keydir_handle));
handle->keydir = keydir;
ERL_NIF_TERM result = enif_make_resource(env, handle);
enif_release_resource_compat(env, handle);
@@ -410,10 +417,6 @@ ERL_NIF_TERM bitcask_nifs_keydir_put_int(ErlNifEnv* env, int argc, const ERL_NIF
update_fstats(env, keydir, entry.file_id, 1, 1,
entry.total_sz, entry.total_sz);
- // Reset the iterator to ensure that someone doesn't cause a crash
- // by trying to interleave change operations with iterations
- keydir->iterator = kh_begin(keydir->entries);
-
RW_UNLOCK(keydir);
return ATOM_OK;
}
@@ -573,10 +576,6 @@ ERL_NIF_TERM bitcask_nifs_keydir_remove(ErlNifEnv* env, int argc, const ERL_NIF_
kh_del(entries, keydir->entries, itr);
- // Reset the iterator to ensure that someone doesn't cause a crash
- // by trying to interleave change operations with iterations
- keydir->iterator = kh_begin(keydir->entries);
-
enif_free_compat(env, entry);
}
@@ -601,6 +600,7 @@ ERL_NIF_TERM bitcask_nifs_keydir_copy(ErlNifEnv* env, int argc, const ERL_NIF_TE
bitcask_keydir_handle* new_handle = enif_alloc_resource_compat(env,
bitcask_keydir_RESOURCE,
sizeof(bitcask_keydir_handle));
+ memset(handle, '\0', sizeof(bitcask_keydir_handle));
// Now allocate the actual keydir instance. Because it's unnamed/shared, we'll
// leave the name and lock portions null'd out
@@ -608,7 +608,6 @@ ERL_NIF_TERM bitcask_nifs_keydir_copy(ErlNifEnv* env, int argc, const ERL_NIF_TE
new_handle->keydir = new_keydir;
memset(new_keydir, '\0', sizeof(bitcask_keydir));
new_keydir->entries = kh_init(entries);
- new_keydir->iterator = kh_begin(new_keydir->entries);
new_keydir->fstats = kh_init(fstats);
// Deep copy each item from the existing handle
@@ -652,17 +651,75 @@ ERL_NIF_TERM bitcask_nifs_keydir_copy(ErlNifEnv* env, int argc, const ERL_NIF_TE
}
}
+static void* iterator_lock_thread(void* arg)
+{
+ bitcask_keydir_handle* handle = (bitcask_keydir_handle*)arg;
+
+ // First, grab the read lock for iteration
+ R_LOCK(handle->keydir);
+
+ // Set the flag that we are ready to begin iteration
+ handle->il_signal_flag = 1;
+
+ // Signal the invoking thread to indicate that the read lock is acquired
+ // and then wait for a signal back to release the read lock.
+ enif_mutex_lock(handle->il_signal_mutex);
+ enif_cond_signal(handle->il_signal);
+ while (handle->il_signal_flag)
+ {
+ enif_cond_wait(handle->il_signal, handle->il_signal_mutex);
+ }
+
+ // Iterating thread is all done with the read lock; release it and exit
+ R_UNLOCK(handle->keydir);
+ enif_mutex_unlock(handle->il_signal_mutex);
+ return 0;
+}
+
ERL_NIF_TERM bitcask_nifs_keydir_itr(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
bitcask_keydir_handle* handle;
if (enif_get_resource(env, argv[0], bitcask_keydir_RESOURCE, (void**)&handle))
{
- bitcask_keydir* keydir = handle->keydir;
+ // If a iterator thread is already active for this keydir, bail
+ if (handle->il_thread)
+ {
+ return enif_make_tuple2(env, ATOM_ERROR, ATOM_ITERATION_IN_PROCESS);
+ }
+
+ // Create the mutex and signal
+ handle->il_signal_flag = 0;
+ handle->il_signal_mutex = enif_mutex_create("bitcask_itr_signal_mutex");
+ handle->il_signal = enif_cond_create("bitcask_itr_signal");
+
+ // Grab the mutex BEFORE spawning the thread; otherwise we might miss the signal
+ enif_mutex_lock(handle->il_signal_mutex);
+
+ // Spawn the lock thread
+ int rc = enif_thread_create("bitcask_itr_lock_thread", &(handle->il_thread),
+ iterator_lock_thread, handle, 0);
+ if (rc != 0)
+ {
+ // Failed to create the lock holder -- release the lock and cleanup
+ enif_mutex_unlock(handle->il_signal_mutex);
+ enif_cond_destroy(handle->il_signal);
+ enif_mutex_destroy(handle->il_signal_mutex);
+ handle->il_thread = 0;
+ return enif_make_tuple2(env, ATOM_ERROR,
+ enif_make_tuple2(env, ATOM_ILT_CREATE_ERROR, rc));
+ }
+
+ // Wait for the signal from the lock thread that iteration may commence
+ while (!handle->il_signal_flag)
+ {
+ enif_cond_wait(handle->il_signal, handle->il_signal_mutex);
+ }
+
+ // Ready to go; initialize the iterator and unlock the signal mutex
+ handle->iterator = kh_begin(handle->keydir->entries);
+ enif_mutex_unlock(handle->il_signal_mutex);
- // Grab the lock and initialize the iterator
- R_LOCK(keydir);
- keydir->iterator = kh_begin(keydir->entries);
return ATOM_OK;
}
else
@@ -679,11 +736,17 @@ ERL_NIF_TERM bitcask_nifs_keydir_itr_next(ErlNifEnv* env, int argc, const ERL_NI
{
bitcask_keydir* keydir = handle->keydir;
- while (keydir->iterator != kh_end(keydir->entries))
+ if (handle->il_signal_flag != 1)
{
- if (kh_exist(keydir->entries, keydir->iterator))
+ // Iteration not started!
+ return enif_make_tuple2(env, ATOM_ERROR, ATOM_ITERATION_NOT_STARTED);
+ }
+
+ while (handle->iterator != kh_end(keydir->entries))
+ {
+ if (kh_exist(keydir->entries, handle->iterator))
{
- bitcask_keydir_entry* entry = kh_key(keydir->entries, keydir->iterator);
+ bitcask_keydir_entry* entry = kh_key(keydir->entries, handle->iterator);
ErlNifBinary key;
// Alloc the binary and make sure it succeeded
@@ -705,13 +768,13 @@ ERL_NIF_TERM bitcask_nifs_keydir_itr_next(ErlNifEnv* env, int argc, const ERL_NI
enif_make_uint(env, entry->tstamp));
// Update the iterator to the next entry
- (keydir->iterator)++;
+ (handle->iterator)++;
return curr;
}
else
{
// No item in this slot; increment the iterator and keep looping
- (keydir->iterator)++;
+ (handle->iterator)++;
}
}
@@ -730,10 +793,25 @@ ERL_NIF_TERM bitcask_nifs_keydir_itr_release(ErlNifEnv* env, int argc, const ERL
if (enif_get_resource(env, argv[0], bitcask_keydir_RESOURCE, (void**)&handle))
{
- bitcask_keydir* keydir = handle->keydir;
+ if (handle->il_signal_flag != 1)
+ {
+ // Iteration not started!
+ return enif_make_tuple2(env, ATOM_ERROR, ATOM_ITERATION_NOT_STARTED);
+ }
+
+ // Lock the signal mutex, clear the il_signal_flag and tell the lock thread
+ // to drop the read lock
+ enif_mutex_lock(handle->il_signal_mutex);
+ handle->il_signal_flag = 0;
+ enif_cond_signal(handle->il_signal);
+ enif_mutex_unlock(handle->il_signal_mutex);
+ enif_thread_join(handle->il_thread, 0);
+
+ // Cleanup
+ enif_cond_destroy(handle->il_signal);
+ enif_mutex_destroy(handle->il_signal_mutex);
+ handle->il_thread = 0;
- // Unlock the keydir
- R_UNLOCK(keydir);
return ATOM_OK;
}
else
@@ -1144,7 +1222,10 @@ static int on_load(ErlNifEnv* env, void** priv_data, ERL_NIF_TERM load_info)
ATOM_FSTAT_ERROR = enif_make_atom(env, "fstat_error");
ATOM_FTRUNCATE_ERROR = enif_make_atom(env, "ftruncate_error");
ATOM_GETFL_ERROR = enif_make_atom(env, "getfl_error");
+ ATOM_ILT_CREATE_ERROR = enif_make_atom(env, "ilt_create_error");
+ ATOM_ITERATION_IN_PROCESS = enif_make_atom(env, "iteration_in_process");
ATOM_ITERATION_NOT_PERMITTED = enif_make_atom(env, "iteration_not_permitted");
+ ATOM_ITERATION_NOT_STARTED = enif_make_atom(env, "iteration_not_started");
ATOM_LOCK_NOT_WRITABLE = enif_make_atom(env, "lock_not_writable");
ATOM_NOT_FOUND = enif_make_atom(env, "not_found");
ATOM_NOT_READY = enif_make_atom(env, "not_ready");
6 c_src/erl_nif_compat.h
@@ -15,6 +15,12 @@ extern "C" {
#define enif_alloc_binary_compat enif_alloc_binary
#define enif_alloc_compat enif_alloc
#define enif_free_compat enif_free
+#define enif_cond_create erl_drv_cond_create
+#define enif_cond_destroy erl_drv_cond_destroy
+#define enif_cond_signal erl_drv_cond_signal
+#define enif_cond_broadcast erl_drv_cond_broadcast
+#define enif_cond_wait erl_drv_cond_wait
+#define ErlNifCond ErlDrvCond
#endif /* R13B04 */
#if ERL_NIF_MAJOR_VERSION == 2 && ERL_NIF_MINOR_VERSION == 0