Fix race with deleting stale input files from merge, h/t @evanmcc #86

Merged: 1 commit, Mar 29, 2013

3 participants

slfritchie commented Mar 28, 2013

Evan found a failing QuickCheck + PULSE test case that causes a
fold to fail when racing with a merge. The race appears to be
very difficult to trigger via QC + PULSE, alas, even when
tweaking the operation weights to get more puts + folds + merge
operations to happen on average.

My proposed fix is to move the call to `purge_setuid_files/1`
to the `init_keydir()` function.  `init_keydir()` knows if
we're opening the keydir for the first time (`not_ready` or
not), so purge the stale setuid files only in the `not_ready`
case.
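A minimal Python model of the proposed fix, assuming that a keydir starts in a `not_ready` state for its first opener and is marked ready once loaded, so later openers find it ready. `Keydir`, `Cask`, `pending_delete` (a set standing in for the setuid-marked merge inputs), and the body of `init_keydir` are hypothetical illustrations, not bitcask's Erlang code.

```python
from dataclasses import dataclass, field


@dataclass
class Keydir:
    # Hypothetical stand-in for the shared in-memory keydir.
    state: str = "not_ready"


@dataclass
class Cask:
    # Hypothetical model of one bitcask directory on disk.
    files: set = field(default_factory=set)
    # Merge input files marked for deferred deletion (the "setuid files").
    pending_delete: set = field(default_factory=set)
    # Keydirs shared by every opener of the same cask, keyed by name.
    keydirs: dict = field(default_factory=dict)


def purge_setuid_files(cask):
    """Delete every file a previous merge marked as stale input."""
    cask.files -= cask.pending_delete
    cask.pending_delete.clear()


def init_keydir(cask, name):
    """Return the keydir for `name`, purging stale files only on first open."""
    keydir = cask.keydirs.get(name)
    if keydir is None:
        keydir = Keydir()  # freshly created, so its state is "not_ready"
        cask.keydirs[name] = keydir
    if keydir.state == "not_ready":
        # Nobody else shares this keydir yet, so no fold can be iterating
        # over the files that an earlier merge left behind.
        purge_setuid_files(cask)
        keydir.state = "ready"  # loading the keydir from disk would go here
    # An already-ready keydir means other users (possibly a running fold)
    # exist, so the deferred deletions are left alone.
    return keydir
```

The first call to `init_keydir` on a cask purges the marked files; a second call against the same, already ready keydir leaves any newly marked files on disk for the merge's own deferred deletion to handle.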

evanmcc commented Mar 28, 2013

This looks good to me, but it's best to have another reviewer in the loop. PULSE testing is ongoing here as well.

More detail on the race:
This can occur when there's an ongoing merge on a keydir that was frozen before the merge began. If the merge has scheduled deferred deletions, a new thread opening the cask `read_write` can come in and force those deletions to happen immediately. That is too early in some cases: an ongoing fold may still be iterating over these files, and if one is missing, all of the keys that the deleted file owns at that moment will be omitted from the fold.
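The interleaving described above can be replayed deterministically in a small Python model. Everything here is hypothetical: files are dict entries holding key lists, the frozen keydir is a fixed list of file names, and the concurrent `read_write` opener is injected at a chosen step of the fold. `fold_keys` and `run` are illustrative names, not bitcask functions.

```python
def fold_keys(files, snapshot, on_step=None):
    """Collect keys from each file named in `snapshot`.

    `on_step(i)` runs just before the i-th file is read, so a caller can
    inject a concurrent action at a chosen point in the iteration.
    """
    seen = []
    for i, name in enumerate(snapshot):
        if on_step:
            on_step(i)
        if name not in files:  # file was deleted underneath the fold
            continue           # so its keys are silently omitted
        seen.extend(files[name])
    return seen


def run(purge_on_every_open):
    # "3.data" is the merge output; it is not in the frozen snapshot.
    files = {"1.data": ["a", "b"], "2.data": ["c"], "3.data": ["a", "b", "c"]}
    pending_delete = {"1.data", "2.data"}  # merge inputs awaiting deletion
    snapshot = ["1.data", "2.data"]        # files the frozen keydir points at
    keydir_ready = True                    # the cask is already open elsewhere

    def read_write_open(step):
        # A second opener arrives while the fold is between files.
        if step == 1 and (purge_on_every_open or not keydir_ready):
            for name in pending_delete:
                files.pop(name, None)

    return fold_keys(files, snapshot, read_write_open)
```

`run(True)` models purging on every open and returns `["a", "b"]`: key `"c"` disappears because `2.data` was removed before the fold reached it. `run(False)` models purging only for a `not_ready` keydir and returns `["a", "b", "c"]`.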


jonmeredith commented Mar 29, 2013

+1 merge. All unit tests and QuickCheck tests pass. Ran it through its paces on PULSE for 30 minutes:

```
OK, passed 10323 tests

54.8238% incr_clock
9.0563% puts
8.9450% put
4.5027% get
3.6507% fork
3.3076% {needs_merge,false}
3.2887% {fork_merge,not_needed}
2.7448% bc_open
2.6975% delete
2.1199% {needs_merge,true}
1.8473% {fork_merge,ok}
0.8787% sync
0.4668% fold_keys
0.4580% bc_close
0.4422% fold
0.3107% {kill,merger}
0.3033% {fork_merge,already_queued}
0.1459% {kill,reader}
0.0101% {fork_merge,'EXIT'}
schedule:    Count: 10323   Min: 57   Max: 7097   Avg: 189.45   Total: 1955733
true
2> q().
```

slfritchie added a commit that referenced this pull request Mar 29, 2013

Merge pull request #86 from basho/slf-purge-stale-file-race
Fix race with deleting stale input files from merge, h/t @evanmcc

@slfritchie merged commit 469cd7d into master Mar 29, 2013

1 check passed (default: The Travis build passed)

@engelsanchez deleted the slf-purge-stale-file-race branch Mar 28, 2014
