Improve fold speed for large files filled with small objects. #83

Merged
merged 1 commit into master from pevm-fast-fold on Apr 23, 2013

Conversation

2 participants
Contributor

evanmcc commented Mar 19, 2013

This patch rewrites all of the file folds as subfolds of a function that folds over adaptively-sized chunks of the file in question, using return values as a simple protocol to communicate with the main folding function.
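
In rough outline, the new fold loop looks something like the sketch below. This is a minimal illustration, not the patch itself: the module name (chunk_fold_sketch) and function names are made up, it reads through the plain file module rather than bitcask_io, and it only shows the more/done part of the return-value protocol; the real code also supports skip and an end-of-input flag.

%% Minimal sketch of the chunked-fold idea (hypothetical module, not the patch).
-module(chunk_fold_sketch).
-export([fold_file/4]).

%% FoldFn(Bytes, Acc) returns one of:
%%   {more, Acc1, Consumed} - not enough bytes to finish the current record
%%   {done, Acc1}           - folding is complete
fold_file(Fd, FoldFn, Acc0, ChunkSz) ->
    fold_loop(Fd, FoldFn, Acc0, <<>>, ChunkSz).

fold_loop(Fd, FoldFn, Acc0, Prev, ChunkSz) ->
    case file:read(Fd, ChunkSz) of
        {ok, Bytes0} ->
            Bytes = <<Prev/binary, Bytes0/binary>>,
            case FoldFn(Bytes, Acc0) of
                {more, Acc, Consumed} ->
                    %% keep the unconsumed tail around for the next read
                    Rest = case Consumed >= byte_size(Bytes) of
                               true  -> <<>>;
                               false ->
                                   <<_:Consumed/bytes, R/binary>> = Bytes,
                                   R
                           end,
                    %% grow the chunk size when a single record spans chunks,
                    %% so big values don't degenerate into many small reads
                    NextSz = case byte_size(Rest) of
                                 N when N > ChunkSz -> ChunkSz * 2;
                                 _ -> ChunkSz
                             end,
                    fold_loop(Fd, FoldFn, Acc, Rest, NextSz);
                {done, Acc} ->
                    Acc
            end;
        eof ->
            Acc0;
        {error, _} = Err ->
            Err
    end.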

Some simple benchmarking. See test/bcfold_setup for info on key and value sizes.

pevm-fast-fold:

Report for run: small
Open:                     23843056
Datafile Key Fold:         8119489   6795835
Datafile Fold:            31599952   6795835

Report for run: medium
Open:                      1866682
Datafile Key Fold:         6934109    509849
Datafile Fold:             9501606    509849

Report for run: large
Open:                       205041
Datafile Key Fold:         7065257     20948
Datafile Fold:             8051616     20948

Report for run: cs
Open:                       150339
Datafile Key Fold:          573293      1024
Datafile Fold:             8601723      1024

master:

Report for run: small
Open:                    298579027
Datafile Key Fold:       397753135   6795835
Datafile Fold:           168100962   6795835

Report for run: medium
Open:                     22597198
Datafile Key Fold:        72591363    509849
Datafile Fold:            17351239    509849

Report for run: large
Open:                      1095118
Datafile Key Fold:         9041969     20948
Datafile Fold:             7874795     20948

Report for run: cs
Open:                       210885
Datafile Key Fold:          590771      1024
Datafile Fold:             7196062      1024

These tests still need to be re-run on some machines with slower disks, but I wanted to get some feedback on the code first, because I am concerned about the fragility/complexity of the current interface.

@ghost ghost assigned evanmcc Mar 19, 2013

@ghost ghost assigned engelsanchez Apr 15, 2013

@engelsanchez engelsanchez and 1 other commented on an outdated diff Apr 17, 2013

src/bitcask_fileops.erl
+ erlang:crc32(CRC0, Bytes),
+ Rem - ReadLen);
+ error ->
+ false
+ end;
+ _ -> false
+ end.
+
+read_crc(Fd) ->
+ case bitcask_io:file_read(Fd, ?HINT_RECORD_SZ) of
+ {ok, <<0:?TSTAMPFIELD,
+ 0:?KEYSIZEFIELD,
+ ExpectCRC:?TOTALSIZEFIELD,
+ (?MAXOFFSET):?OFFSETFIELD>>} ->
+ ExpectCRC;
+ _ -> 0
@engelsanchez

engelsanchez Apr 17, 2013

Contributor

The chances are infinitesimal, but it is possible for bitcask_io:file_read on line 335 to return a truncated Bytes if EOF was found early, and then for read_crc to return 0 and that somehow match the expected CRC. Why not return something that will always fail the comparison here, like the atom undefined?
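
Something like this sketch of the suggestion (reusing the record-layout macros already visible in the diff; the final code may look different):

%% Sketch of the suggested change: return an atom that can never equal a
%% computed CRC, so a short read can't accidentally look like a match.
read_crc(Fd) ->
    case bitcask_io:file_read(Fd, ?HINT_RECORD_SZ) of
        {ok, <<0:?TSTAMPFIELD,
               0:?KEYSIZEFIELD,
               ExpectCRC:?TOTALSIZEFIELD,
               (?MAXOFFSET):?OFFSETFIELD>>} ->
            ExpectCRC;
        _ -> undefined
    end.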

@evanmcc

evanmcc Apr 17, 2013

Contributor

will do.

@engelsanchez engelsanchez commented on the diff Apr 17, 2013

src/bitcask_fileops.erl
+ N when N > ChunkSz0 ->
+ ChunkSz0 * 2;
+ _ -> ChunkSz0
+ end,
+ {Other, CS}
+ end,
+ case bitcask_io:file_read(Fd, ChunkSz) of
+ {ok, <<Bytes0/binary>>} ->
+ Bytes = <<Prev/binary, Bytes0/binary>>,
+ case FoldFn(Bytes, IntFoldFn, Acc0, 0, Args0,
+ byte_size(Bytes0) /= ChunkSz) of
+ %% foldfuns should return more when they don't have enough
+ %% bytes to satisfy their main binary match.
+ {more, Acc, Consumed, Args} ->
+ Rest =
+ case Consumed > byte_size(Bytes) of
@engelsanchez

engelsanchez Apr 17, 2013

Contributor

It's functionally equivalent, but here it could have been Consumed >= byte_size(Bytes)

@engelsanchez engelsanchez and 1 other commented on an outdated diff Apr 17, 2013

src/bitcask_fileops.erl
+ byte_size(Bytes0) /= ChunkSz) of
+ %% foldfuns should return more when they don't have enough
+ %% bytes to satisfy their main binary match.
+ {more, Acc, Consumed, Args} ->
+ Rest =
+ case Consumed > byte_size(Bytes) of
+ true -> <<>>;
+ false ->
+ <<_:Consumed/bytes, R/binary>> = Bytes,
+ R
+ end,
+ fold_file_loop(Fd, FoldFn, IntFoldFn,
+ Acc, Args, Rest, ChunkSz);
+ %% foldfuns should return skip when they have no need for
+ %% the rest of the binary that they've been handed.
+ %% see fold_int_loosp for proper usage.
@engelsanchez

engelsanchez Apr 17, 2013

Contributor

If you're back in here, maybe fix that typo? Reading loosp makes me giggly :)

@evanmcc

evanmcc Apr 19, 2013

Contributor

done

@engelsanchez engelsanchez and 1 other commented on an outdated diff Apr 17, 2013

src/bitcask_fileops.erl
error_logger:error_msg("Hintfile '~s' contains pointer ~p ~p "
"that is greater than total data size ~p\n",
[HintFile, Offset, TotalSz, DataSize]),
- {error, {incomplete_hint, 3}};
+ {error, {trunc_datafile, Acc0}}
+ end;
+%% error case where we've gotten to the end of the file without the CRC match
+fold_hintfile_loop(<<>>, _Fun, Acc, _Consumed, _Args, EOI) when EOI =:= true ->
@engelsanchez

engelsanchez Apr 17, 2013

Contributor

It looks as if we are not handling the case of a truncated hintfile. That is, it could be truncated in the middle of a hint record, which would trigger the 'more' case below, then come back here with EOI=true and some unprocessed bytes. That should return an incomplete_hint error too. Or am I missing something?

@evanmcc

evanmcc Apr 19, 2013

Contributor

So, most of the time this is going to be caught by the CRC check; we only use this hintfile fold on open. That said, there is a race there, and this will fail with a function_clause error. I'll add a case to handle that. Good catch.
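
For illustration, a clause along these lines would do it (the error term here is only a guess at the final shape):

%% Sketch: end-of-input reached with unparsed bytes left over means the
%% hintfile was truncated mid-record, so report it instead of crashing
%% with a function_clause error.
fold_hintfile_loop(Rest, _Fun, _Acc, _Consumed, _Args, true)
  when byte_size(Rest) > 0 ->
    {error, {incomplete_hint, byte_size(Rest)}};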

Contributor

engelsanchez commented Apr 22, 2013

The last commit addressing the review comments looks good.

Here is a summary of running the accompanying perf tests on two systems. Overall it looks pretty good, with improvements across the board except for the CS-like data, which stays the same. I'm going to run a few mixed-data scenarios in riak and report back soon.

Results OSX 10.8.2 on SSD:

Small: Key fold 60x, datafile fold 5x
Medium: Key fold 10x, datafile fold 2x
Large: ~20-30% improvement
CS: Virtually the same

Results Ubuntu 12 on RAID:

Small: Key fold 24x, datafile fold 2-3x
Medium: Key fold 4x, datafile fold 2x
Large: Key fold 3x, datafile fold same
CS: Virtually the same

@engelsanchez engelsanchez commented on the diff Apr 22, 2013

src/bitcask_file.erl
@@ -64,6 +64,9 @@ file_read(Pid, Size) ->
file_write(Pid, Bytes) ->
file_request(Pid, {file_write, Bytes}).
+file_position(Pid, Position) ->
@engelsanchez

engelsanchez Apr 22, 2013

Contributor

One minor nitpick: we have now added file_position (Erlangy) alongside the existing file_seekbof (C-ish). Consider standardizing the two: either rename this to file_seek, or change file_seekbof to file_position_bof or something.
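
For reference, the new call presumably just mirrors the file_write/2 pattern above, so only the name is in question (a sketch, assuming the same file_request plumbing):

%% Assumed shape, following file_write/2; only the name is being debated.
file_position(Pid, Position) ->
    file_request(Pid, {file_position, Position}).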

Contributor

engelsanchez commented Apr 22, 2013

Some startup time results. The data sets consist of objects with 1-byte values and variable key lengths following a uniform distribution. The _ff columns use the "fast fold" version of bitcask; _m is master.

Size   OSX_m   OSX_ff   Ubuntu_m   Ubuntu_ff
1k     2m15s   0m45s    3m00s      1m00s
4k     0m41s   0m27s    1m15s      0m46s
8k     1m18s   1m10s    1m12s      0m53s
32k    1m00s   1m00s    1m00s      1m00s

As expected, the gains are most marked for small keys and disappear as the key sizes grow larger.

Contributor

engelsanchez commented Apr 23, 2013

I ran a test to see if handoff speed had changed at all with these changes applied. I had 4 partitions, 2 of which were transferred (1 node -> 2 node setup), filled with 100k objects. In the end there wasn't any difference.
I could run a bunch more tests with different sizes, randomized sizes, etc., but I think at this point it's all looking good.

👍 💃 ⛵️

Let's do the merge: let me eyeball it to make sure there weren't any problems and call it a day, shall we, @evanmcc?

Contributor

evanmcc commented Apr 23, 2013

I'll go ahead and merge. I don't think that fold speed is a limiting factor to handoff at this point, but it'll be interesting to see how things look with and without this patch once the ongoing handoff work lands.

@ghost ghost assigned evanmcc Apr 23, 2013

Speed up open and various fold operations. This comprises
two major changes:
  - replace the visitor fold with an explicit CRC check.
  - rewrite all folds in terms of a new large-file fold function that
    adaptively prefetches data (and jumps around when prefetching is
    the wrong thing to do) so that we're operating on larger
    chunks of data, instead of doing a large number of small reads.

evanmcc added a commit that referenced this pull request Apr 23, 2013

Merge pull request #83 from basho/pevm-fast-fold
Improve fold speed for large files filled with small objects.

@evanmcc evanmcc merged commit 2d8f1b9 into master Apr 23, 2013

1 check failed: default (The Travis build failed)

Contributor

engelsanchez commented Apr 24, 2013

Hmmm... Did you just rebase and squash before merging? I see all the history has been destroyed :(

Contributor

evanmcc commented Apr 24, 2013

Yeah. I was under the misapprehension that this was our standard (and, IMO, misguided) practice. Sorry, it didn't occur to me to open a new PR for the squashed version; that's a much better option.

@jaredmorrow jaredmorrow deleted the pevm-fast-fold branch May 11, 2013
