Browse files

MB-6638: use fork of file_sorter that doesn't use compression

We've found that file driver may close underlying file (either OS file
or gzFile) well before async read on it is done. With real OS file
that's mostly harmless, and kernel will synchronize racing close and
read (or any other syscall for same fd) and read will either succeed
just before close or it'll get EBADF (well, or read some other newly
opened innocent file).

But gzFile doesn't have any synchronization. So attempt to read it
concurrently or shortly after close may cause lots of problems.

Given we don't really use compressed feature of file sorter I'm just
cutting out one small piece of it that's using this efile feature.

Entire file_sorter.erl is forked inside couchdb source tree and single
change is made. It removes compressed option in file:open call in

NOTE: file_sorter.erl is building improper lists which dialyzer
authors choose to dislike. I had to disable that somewhat questionable
warning in order to keep code dialyzer-clean. "Fixing" file_sorter.erl
is IMHO not an option as we want to keep it as intact as possible.

Change-Id: I8ded572c18e236661d5b7df2a780c8f293fc689d
Reviewed-by: Aliaksey Artamonau <>
Reviewed-by: Filipe David Borba Manana <>
Tested-by: Aliaksey Kandratsenka <>
  • Loading branch information...
1 parent 5938ee3 commit 6bbe1cf89b2f6b5c9cf098b81c5ea60d339f8f0a Aliaksey Kandratsenka committed with alk Dec 26, 2012
@@ -198,6 +198,7 @@ rebuild_plt:
dialyzer: all $(COUCHDB_PLT)
dialyzer --plt $(COUCHDB_PLT) \
--verbose \
+ -Wno_improper_lists \
-pa src/couchdb \
-pa src/couch_set_view \
-pa src/couch_index_merger \
@@ -1174,7 +1174,7 @@ spawn_merge_worker(LessFun, TmpDir, Workers, FilesToMerge, DestFile) ->
{format, fun file_sorter_format_function/1}
- case file_sorter:merge(FilesToMerge, DestFile, SortOptions) of
+ case file_sorter_2:merge(FilesToMerge, DestFile, SortOptions) of
ok ->
{error, Reason} ->
@@ -29,6 +29,7 @@ CLEANFILES = $(compiled_files) $(doc_base)
source_files = \
erl_diag.erl \
file2.erl \
+ file_sorter_2.erl \
couch.erl \
couch_api_wrap.erl \
couch_api_wrap_httpc.erl \
@@ -94,6 +95,7 @@ EXTRA_DIST = $(source_files) couch_api_wrap.hrl couch_db.hrl couch_js_functions.
compiled_files = \
erl_diag.beam \
file2.beam \
+ file_sorter_2.beam \ \
couch.beam \
couch_api_wrap.beam \
@@ -812,7 +812,7 @@ initial_copy_compact(#db{docinfo_by_seq_btree=SrcBySeq,
SrcById, DestFd, []),
% no need to specify a sort function, the built in erlang sort will
% sort the terms by the first differing slot, which is id
- {ok, NewByIdRoot} = file_sorter:sort([TempFilepath],
+ {ok, NewByIdRoot} = file_sorter_2:sort([TempFilepath],
BtreeOutputFun,[{tmpdir, TempDir}]),
ok = file:delete(TempFilepath),
ok = couch_file:flush(DestFd),
Oops, something went wrong.

0 comments on commit 6bbe1cf

Please sign in to comment.