Possible Race Condition in bitcask:merge #47

benmmurphy opened this Issue Aug 8, 2012 · 3 comments

4 participants


I don't understand the merging code very well. But I'm working under the assumption that if a merge is performed on a file that is currently being written to then data can be lost and this is why the merge function tries to exclude the file currently being written to.

when you don't pass in an array of files merge calls readable_files:

-spec merge(Dirname::string()) -> ok.
merge(Dirname) ->
    merge(Dirname, [], readable_files(Dirname)).

readable_files gets the current file being written to so it can ignore that files in list_data_files.

readable_files(Dirname) ->
    %% Check the write and/or merge locks to see what files are currently
    %% being written to. Generate our list excepting those.
    WritingFile = bitcask_lockops:read_activefile(write, Dirname),
    MergingFile = bitcask_lockops:read_activefile(merge, Dirname),
    list_data_files(Dirname, WritingFile, MergingFile).

list_data_files(Dirname, WritingFile, Mergingfile) ->
    %% Get list of {tstamp, filename} for all files in the directory then
    %% reverse sort that list and extract the fully-qualified filename.
    Files = bitcask_fileops:data_file_tstamps(Dirname),
    [F || {_Tstamp, F} <- reverse_sort(Files),
          F /= WritingFile,
          F /= Mergingfile].

however, i think there is a small race condition because the following can happen:

  1. current write file is F1
  2. merge function reads current write file as F1
  3. current write file is switched from F1 to F2
  4. list_data_files finds F1 and F2 and excludes F1
  5. merge starts working on F2 which is currently being written to
  6. data loss?

We'll have to ponder this a bit, but at first glance this will likely just cause the merge to crash when it tries to merge an incomplete file, worst case. I agree it's suspect, though -- thanks for pointing it out!


As Dizzy says, we'll need to take a better look at it. The way Riak uses bitcask is by calling bitcask_merge_worker:merge/3 ( https://github.com/basho/riak_kv/blob/master/src/riak_kv_bitcask_backend.erl#L373 ) but we want other bitcask users to be safe.


Many thanks for the report, sorry about the delay. Fixed by #68

@slfritchie slfritchie closed this Nov 16, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment