3.x optimize compactor #2754

Merged
merged 6 commits into from Apr 14, 2020

davisp
Member

@davisp davisp commented Apr 2, 2020

Overview

Reviving an old compactor optimization from PR #806. I'm also going to add in some work from #1006 to give us better visibility on compaction id_seq copy phases.

Testing recommendations

$ make check

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

@davisp davisp changed the base branch from master to 3.x Apr 2, 2020
@davisp davisp force-pushed the 3.x-optimize-compactor branch 3 times, most recently from 455d399 to 1fe106f Compare Apr 3, 2020
@davisp davisp marked this pull request as ready for review Apr 3, 2020
@davisp davisp force-pushed the 3.x-optimize-compactor branch from 1fe106f to ea11d1e Compare Apr 13, 2020
src/couch/src/couch_file.erl (outdated review thread, resolved)
@davisp davisp force-pushed the 3.x-optimize-compactor branch from ea11d1e to 262db67 Compare Apr 13, 2020
@nickva
Contributor

@nickva nickva commented Apr 13, 2020

Found a bug when running ./compact_bench.py -r 50 -n 5000000

 db shards/00000000-ffffffff/cbenchdb.1586802992 died with reason {badarith,[
   {erlang,div,[1000,16291.3],[]},
   {couch_bt_engine_compactor,update_compact_task,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,724}]},
   {couch_emsort,merge_back_bone,5,[{file,"src/couch_emsort.erl"},{line,278}]},
   {couch_emsort,decimate,2,[{file,"src/couch_emsort.erl"},{line,265}]},
   {couch_emsort,merge,2,[{file,"src/couch_emsort.erl"},{line,208}]},
   {couch_bt_engine_compactor,sort_meta_data,1,[{file,"src/couch_bt_engine_compactor.erl"},{line,515}]},
   {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
   {couch_bt_engine_compactor,start,4,[{file,"src/couch_bt_engine_compactor.erl"},{line,75}]}
 ]}

@davisp davisp force-pushed the 3.x-optimize-compactor branch 2 times, most recently from 138850e to a7b5f8b Compare Apr 13, 2020
@davisp
Member Author

@davisp davisp commented Apr 13, 2020

Oops. That div bug should be fixed now. I forgot to use integer division.
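The trace above shows `erlang:div/2` being called with `1000` and `16291.3`. Erlang's `div` accepts only integers, so a float operand raises `badarith`. A minimal sketch of the idea behind the fix, written in Python rather than CouchDB's Erlang; the function name and the clamping to 0..100 are illustrative assumptions, not the actual `update_compact_task` code:

```python
def progress_percent(done, total):
    """Return progress as an integer percentage in the range 0..100.

    Both inputs are truncated to integers first, so the division below is
    always an integer division, even if a caller passes a float estimate.
    """
    done_int = int(done)
    total_int = int(total)
    if total_int <= 0:
        return 0  # nothing to measure yet; also avoids dividing by zero
    return max(0, min(100, (done_int * 100) // total_int))


# The operands from the crash report no longer cause an error.
print(progress_percent(1000, 16291.3))  # prints 6
```

Truncating the float before dividing keeps the arithmetic in integers, which is what a progress counter reported to operators needs anyway.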

@davisp davisp force-pushed the 3.x-optimize-compactor branch from a7b5f8b to 45458fe Compare Apr 13, 2020
nickva
nickva approved these changes Apr 13, 2020
Contributor

@nickva nickva left a comment

Excellent work.

I like the improved, detailed stats, and there is an impressive metadata copy phase speedup:

For instance, I used my compact_bench to generate 100k docs with 1000 revisions each.

On 3.x branch:

./compact_bench.py  -r 1000 -n 100000

**************** num=100000,revisions=1000 ****************
Version: 3.0.0-cf5f963
 Compacting : 720.8s docs/s:138 revs/s:138735 fsize:2812248278

With this PR

./compact_bench.py  -r 1000 -n 100000

**************** num=100000,revisions=1000 ****************
Version: 3.0.0-45458fe
 Compacting : 425.4s docs/s:235 revs/s:235079 fsize:2812264662

Roughly a 40% reduction in compaction time (!) when doc revisions start bumping into the revs_limit
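For reference, the figure can be checked from the two "Compacting" times reported above. This is only a worked calculation on the quoted numbers; the variable names are made up for the example:

```python
before_s = 720.8  # compaction time on 3.x (3.0.0-cf5f963)
after_s = 425.4   # compaction time with this PR (3.0.0-45458fe)

time_saved = (before_s - after_s) / before_s  # fraction of wall time removed
speedup = before_s / after_s                  # throughput ratio

print(f"time reduction: {time_saved:.1%}")  # time reduction: 41.0%
print(f"speedup: {speedup:.2f}x")           # speedup: 1.69x
```

The same ratio shows up in the revs/s column (138735 versus 235079).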

@davisp davisp force-pushed the 3.x-optimize-compactor branch from 45458fe to 1412219 Compare Apr 14, 2020
davisp and others added 6 commits Apr 14, 2020
This change adds a new `#comp_st{}` record that is used to pass
compaction state through the various compaction steps. There are zero
changes to the existing compaction logic. This merely sets the stage for
adding our docid copy optimization.

These functions allow the caller to append multiple terms or binaries to
a file and receive the file position and size for each individual
element. This is to optimize throughput in situations where we want to
write multiple pieces of independent data.
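The batching idea described here can be sketched as follows. This is a Python illustration, not `couch_file` itself: `append_binaries` and `pread_binaries` are hypothetical names, and the sketch leaves out whatever framing or checksumming the real file format applies.

```python
import io


def append_binaries(f, chunks):
    """Append every chunk with one write call; return [(pos, size), ...]."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    locations = []
    for chunk in chunks:
        locations.append((pos, len(chunk)))
        pos += len(chunk)
    f.write(b"".join(chunks))  # a single write covers the whole batch
    return locations


def pread_binaries(f, locations):
    """Read back each element given its (pos, size) pair."""
    out = []
    for pos, size in locations:
        f.seek(pos)
        out.append(f.read(size))
    return out


f = io.BytesIO(b"header")
locs = append_binaries(f, [b"alpha", b"be", b"gamma"])
print(locs)                     # [(6, 5), (11, 2), (13, 5)]
print(pread_binaries(f, locs))  # [b'alpha', b'be', b'gamma']
```

The caller pays for one write regardless of how many elements are in the batch, yet still gets an individual pointer for each element.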

This uses the new couch_file:append_terms/2 function to write all chunks
in a single write call.

This updates couch_db_updater to use the new multi-IO API functions
(append_terms/pread_terms) in couch_file. This optimization benefits us
by no longer requiring the `couch_emsort:merge/1` step to copy
`#full_doc_info{}` records multiple times while also not being penalized
by significantly increasing the number of calls through couch_file APIs.
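One way to picture why this avoids repeated copies: write each large record once, and let the sort move only small (id, pointer) pairs. The sketch below is a Python illustration under that assumption; the function names are hypothetical, and an in-memory `sorted` stands in for the on-disk merge that `couch_emsort` performs.

```python
import io


def write_records_once(f, records):
    """Append each (doc_id, payload) once; return [(doc_id, (pos, size))]."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    index = []
    for doc_id, payload in records:
        index.append((doc_id, (pos, len(payload))))
        pos += len(payload)
    f.write(b"".join(payload for _, payload in records))  # one batched write
    return index


def read_in_id_order(f, index):
    """Sort the small (doc_id, pointer) pairs, then fetch payloads in order."""
    for doc_id, (pos, size) in sorted(index):
        f.seek(pos)
        yield doc_id, f.read(size)


f = io.BytesIO()
records = [("doc-b", b"BBBB"), ("doc-a", b"AA"), ("doc-c", b"C")]
index = write_records_once(f, records)
print(list(read_in_id_order(f, index)))
# [('doc-a', b'AA'), ('doc-b', b'BBBB'), ('doc-c', b'C')]
```

However many passes the sort takes, the payload bytes are written once and read once; only the pointers are shuffled.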

Previously the sort and copy phases when handling document IDs were not
measured in _active_tasks. This adds size tracking to give operators a
way to measure progress during those phases.

I'd like to thank Vitaly for the example in #1006 that showed a clean
way for tracking the size info in `couch_emsort`.

Co-Authored-By: Vitaly Goot <vitaly.goot@gmail.com>
@davisp davisp merged commit 123bf82 into 3.x Apr 14, 2020
1 check passed
@davisp davisp deleted the 3.x-optimize-compactor branch Apr 14, 2020

2 participants