Mv level work4
- merged to "master" August 20, 2013
- ready for code review August 1, 2013
- development started July 22, 2013
History / Context
This branch is a collection of small changes. Each is subtle and helps smooth latencies; none produces a big performance gain.
- fix all Google unit tests to run with Basho's leveldb changes
- finish mv-sst-fadvise via FADV_RANDOM support (completes work in mv-sst-fadvise)
- minor throttle adjustments in DoCompactionWork()
- InsertQueue0 code
"make check" unit tests
Several source files ending in _test.cc change in this branch. These are Google's unit tests. They are tightly tied to file sizing and to where compaction output is expected to be placed. This branch updates those internal assumptions to match Basho's adjustments.
The mv-sst-fadvise branch needed to be released before the final pair of changes could be tested. Those two changes are now ready: read-only files are set to FADV_RANDOM by default (util/env_posix.cc), and a cached file is reset to FADV_SEQUENTIAL when it subsequently becomes a compaction input (db/table_cache.cc). The result is better page cache management on Linux systems.
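The two hints described above map onto the POSIX call posix_fadvise(), whose constants are spelled POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL. The following is a minimal sketch of that idea only; the helper names AdviseRandom and AdviseSequential are hypothetical and are not taken from util/env_posix.cc or db/table_cache.cc.

```cpp
#include <fcntl.h>    // open, posix_fadvise, POSIX_FADV_* constants
#include <unistd.h>   // close, unlink

// Hypothetical helper: hint that reads on fd will be random (point lookups),
// so the kernel should not waste page cache on read-ahead.
// Returns 0 on success or an error number on failure (posix_fadvise convention).
inline int AdviseRandom(int fd) {
  // offset 0 with len 0 applies the advice to the whole file.
  return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}

// Hypothetical helper: hint that the file is about to be read front to back,
// as when a cached table file becomes a compaction input.
inline int AdviseSequential(int fd) {
  return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```

In this sketch a caller would invoke AdviseRandom() right after opening a table file read-only, and AdviseSequential() on the same descriptor once the file is selected for compaction. The advice is only a hint; the kernel is free to ignore it.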
write throttle adjustments
Two minor throttle adjustments were made within DoCompactionWork() (db/db_impl.cc). First, the time spent waiting on higher priority files was previously not removed from the measured total time to compact. This was in keeping with the idea of being conservative, even overly conservative, in how the throttle was calculated. Six months later we now know this conservative parameter was overkill, so the wait time is now removed from the measurement.
The second change involves throttling "write batches". Riak 1.3 introduced active anti-entropy (AAE). Riak 1.3's throttle was overly conservative and would horribly slow AAE when it submitted large write batches. Riak 1.4's throttle is much improved and no longer hurts AAE. Restoring the proper throttling to write batches also greatly improves long-term latencies with Riak's 2i feature.
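The first adjustment is simple arithmetic on the measured compaction time. The sketch below illustrates it under stated assumptions: the struct CompactionTiming, its fields, and the function ThrottleMicros are hypothetical names for illustration and do not reproduce the actual code in db/db_impl.cc.

```cpp
#include <cstdint>

// Hypothetical timing record for one compaction (all values in microseconds).
struct CompactionTiming {
  uint64_t start_micros;  // when the compaction began
  uint64_t end_micros;    // when the compaction finished
  uint64_t wait_micros;   // time spent paused for higher priority files
};

// Returns the compaction time that feeds the throttle calculation.
// subtract_wait == false models the older, more conservative measurement;
// subtract_wait == true models removing the wait time from the total.
inline uint64_t ThrottleMicros(const CompactionTiming& t, bool subtract_wait) {
  const uint64_t elapsed = t.end_micros - t.start_micros;
  if (!subtract_wait) return elapsed;
  // Guard against underflow if the recorded wait exceeds the elapsed time.
  return elapsed > t.wait_micros ? elapsed - t.wait_micros : 0;
}
```

For a compaction that ran 1000 microseconds in total and spent 400 of them waiting, the conservative measurement reports 1000 while the adjusted one reports 600, which yields a lighter throttle.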
The ultimate goal of the next major release is to eliminate Basho's thread blocks and the hacked priority queues (InsertQueue2() and now InsertQueue0() in util/env_posix.cc). But time is not always available for big changes in any given release cycle. This quick hack helps during heavy write operations. Hopefully it will be eliminated by a proper, global worker queue model ... soon.
The current thread block model assigns multiple databases/vnodes to a group of background worker threads. A database can only have one pending work item across any of the three worker threads in its thread block. A database might have low priority compaction of level-1 or higher waiting in the low priority queue. This low priority compaction can be replaced by a higher priority level-0 or imm compaction, losing its place in the low priority queue. This hack lets the database regain some positioning in the low priority queue once its high priority compaction completes.
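The queue behavior described above can be modeled with a small sketch. It assumes that "regaining some positioning" means re-entering the low priority queue ahead of the tail, here at the head; the real placement chosen by InsertQueue0() is not stated on this page. The class LowPriorityQueue and its methods are hypothetical and are not the util/env_posix.cc implementation.

```cpp
#include <algorithm>
#include <deque>
#include <string>

// Hypothetical model of the low priority queue in one thread block.
// Each entry names a database (vnode) with a pending low priority compaction.
class LowPriorityQueue {
 public:
  // Normal submission: a new low priority compaction joins the tail.
  void PushBack(const std::string& db) { q_.push_back(db); }

  // Models a database's single pending item being replaced by a higher
  // priority level-0 or imm compaction: its low priority entry is dropped.
  // Returns false if the database had no entry in this queue.
  bool Remove(const std::string& db) {
    auto it = std::find(q_.begin(), q_.end(), db);
    if (it == q_.end()) return false;
    q_.erase(it);
    return true;
  }

  // ASSUMPTION: after its high priority compaction completes, the database
  // re-enters at the head instead of the tail to regain lost position.
  void PushFront(const std::string& db) { q_.push_front(db); }

  const std::deque<std::string>& items() const { return q_; }

 private:
  std::deque<std::string> q_;
};
```

Without the front insertion, a database that keeps receiving level-0 work would be sent to the tail every time and its lower level compactions could be delayed for a long while under heavy writes.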