TokuMX Concurrency

TokuMX Concurrency (you're right, this is weird)

The Project

Create a concurrent query benchmark. Add incompressible data. Run with a data set larger than memory.
Create a concurrent update benchmark. Add an invariant for correctness testing.
- http://github.com/leifwalsh/cortisol
Collect profiling data while running these benchmarks.
Create a concurrent insert benchmark.
- It should support both sequential and random keys.
Create a concurrent delete benchmark.
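The two key modes for the insert benchmark could be generated as in the following minimal Python sketch. The function names `sequential_keys` and `random_keys` and the seeding are hypothetical illustrations, not cortisol's actual code.

```python
import itertools
import random
from typing import Iterator


def sequential_keys(start: int = 0) -> Iterator[int]:
    """Yield start, start + 1, start + 2, ...; every insert lands at the
    right-hand end of the index."""
    return itertools.count(start)


def random_keys(key_space: int, seed: int = 0) -> Iterator[int]:
    """Yield keys drawn uniformly from [0, key_space); inserts scatter across
    the whole index. Duplicates are possible, so a benchmark that needs
    unique _ids would have to draw from a permutation instead."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(key_space)


if __name__ == "__main__":
    print(list(itertools.islice(sequential_keys(), 5)))  # [0, 1, 2, 3, 4]
    print(list(itertools.islice(random_keys(10**9, seed=42), 5)))
```

A fixed seed keeps random-key runs reproducible across the two servers being compared.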

The Performance Scaling Problem

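Reads scale with the number of client threads, because a read takes a read lock on the database's reader-writer lock, and many readers can hold it at once.
Writes do not scale, because a write takes the write lock on the database's reader-writer lock, which only one writer can hold at a time.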
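To illustrate why reads scale and writes do not, here is a minimal Python reader-writer lock. It is a sketch of the general technique, not MongoDB's actual lock; `peak_concurrency` is a hypothetical helper that measures how many threads are inside the lock at the same moment.

```python
import threading
import time


class ReaderWriterLock:
    """Many readers may hold the lock at once; a writer needs it alone.
    Fairness (e.g. keeping writers from starving) is left out for brevity."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake waiting readers and writers


def peak_concurrency(lock: ReaderWriterLock, mode: str, threads: int = 8) -> int:
    """Start `threads` workers that each hold `lock` in `mode` ("read" or
    "write") for 50 ms; return the most that were ever inside at once."""
    acquire = lock.acquire_read if mode == "read" else lock.acquire_write
    release = lock.release_read if mode == "read" else lock.release_write
    guard = threading.Lock()
    inside = peak = 0

    def worker() -> None:
        nonlocal inside, peak
        acquire()
        try:
            with guard:
                inside += 1
                peak = max(peak, inside)
            time.sleep(0.05)  # simulated work while holding the lock
            with guard:
                inside -= 1
        finally:
            release()

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return peak


if __name__ == "__main__":
    print(peak_concurrency(ReaderWriterLock(), "read"))   # typically 8
    print(peak_concurrency(ReaderWriterLock(), "write"))  # always 1
```

Readers overlap, so adding client threads adds throughput; writers are serialized, so extra writer threads only queue up.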

TokuMX Locking Strategy

Delegate row-level locking to the fractal tree's lock tree.
Take a read lock on the database's reader-writer lock when reading a collection.
Take a read lock on the database's reader-writer lock when writing a collection.
Take a write lock on the database's reader-writer lock when changing the database's meta-data.
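The strategy above can be sketched in Python as two pieces: a rule for which mode of the database lock an operation takes, and a per-row lock table standing in for the lock tree. This is a minimal sketch under simplifying assumptions: the operation names are hypothetical, the real lock tree locks key ranges rather than single keys, and a real lock table would wait on conflicts and need its own mutex.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

# Hypothetical operation names for the meta-data changes listed on this page.
METADATA_OPS = {"create_collection", "drop_collection", "create_index", "drop_index"}

Row = Tuple[str, Hashable]  # (collection, key)


def db_lock_mode(op: str) -> str:
    """Reads and ordinary writes share the database lock; only meta-data
    changes take it exclusively."""
    return "write" if op in METADATA_OPS else "read"


class RowLockTable:
    """Stand-in for the lock tree: exclusive point locks on (collection, key),
    owned by a transaction until it commits or aborts. Single-threaded and
    non-blocking: a conflicting request is refused instead of queued."""

    def __init__(self) -> None:
        self._owner: Dict[Row, int] = {}
        self._held: Dict[int, Set[Row]] = defaultdict(set)

    def try_lock(self, txn: int, coll: str, key: Hashable) -> bool:
        row = (coll, key)
        owner = self._owner.get(row)
        if owner is not None and owner != txn:
            return False  # another transaction is writing this row
        self._owner[row] = txn
        self._held[txn].add(row)
        return True

    def release_all(self, txn: int) -> None:
        """Drop every row lock owned by `txn` (at commit or abort)."""
        for row in self._held.pop(txn, set()):
            del self._owner[row]


if __name__ == "__main__":
    locks = RowLockTable()
    print(db_lock_mode("update"))              # read: writers share the DB lock
    print(locks.try_lock(1, "users", 42))      # True
    print(locks.try_lock(2, "users", 43))      # True: different row, no conflict
    print(locks.try_lock(2, "users", 42))      # False: same row as txn 1
```

Two writers only conflict when they touch the same row, which is what lets writes scale once the database lock is held in read mode.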

Implicit collection creation

Creating a collection requires a write lock, because it changes the meta-data.
When a write operation needs to implicitly create a collection, it releases the read lock it acquired, takes the write lock, and restarts the write operation from the beginning.
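The release-and-restart protocol can be sketched in Python as follows. `Database`, `NeedsWriteLock`, and `lock_log` are hypothetical names, and the lock itself is only simulated by recording which mode each attempt ran under.

```python
class NeedsWriteLock(Exception):
    """Raised by a write that discovers it must change meta-data."""


class Database:
    """Toy database: `collections` plays the role of the meta-data, and
    `lock_log` records the lock mode each attempt ran under."""

    def __init__(self) -> None:
        self.collections: dict = {}
        self.lock_log: list = []

    def _insert(self, coll: str, doc: dict, mode: str) -> None:
        if coll not in self.collections:
            if mode != "write":
                # Creating the collection changes meta-data; the read lock
                # is not enough, so give up and let the caller restart.
                raise NeedsWriteLock(coll)
            self.collections[coll] = []
        self.collections[coll].append(doc)

    def insert(self, coll: str, doc: dict) -> None:
        """Try the insert under the read lock; on NeedsWriteLock, retry the
        whole operation under the write lock."""
        for mode in ("read", "write"):
            self.lock_log.append(mode)  # stands in for acquiring the lock
            try:
                self._insert(coll, doc, mode)
                return
            except NeedsWriteLock:
                # A real server would release the read lock here before
                # taking the write lock and restarting.
                continue


if __name__ == "__main__":
    db = Database()
    db.insert("users", {"_id": 1})  # implicit creation: read, then write
    db.insert("users", {"_id": 2})  # collection exists: read only
    print(db.lock_log)              # ['read', 'write', 'read']
```

Because the read lock is dropped before the write lock is taken, another client may create the collection in between, so the restarted attempt re-checks instead of assuming the collection is still missing.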

Multi-key

If you insert a document that has an array as the value of an indexed field, that index becomes "multi-key". If an index is multi-key, queries that read from it must de-duplicate their results.
Currently, the metadata about whether an index is multi-key is stored in the (per-collection) NamespaceDetails object, which is stored on disk in the namespace index.
So when an index becomes multi-key, the same transaction must modify the namespace index, which would require a write lock on the database, or at least on the collection.
We need to investigate how to handle this.
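The reason for de-duplication can be seen in a small Python sketch: an array value contributes one index entry per element, so a range scan can hit the same document more than once. `build_index` and `range_query` are hypothetical helpers, and the sketch only handles integer values (MongoDB orders mixed BSON types, which this does not attempt).

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (indexed value, document _id)


def build_index(docs: List[dict], field: str) -> Tuple[List[Entry], bool]:
    """Return the sorted index entries for `field` and whether the index is
    multi-key. An array value yields one entry per element."""
    entries: List[Entry] = []
    multikey = False
    for doc in docs:
        value = doc[field]
        if isinstance(value, list):
            multikey = True  # the flag that lives in NamespaceDetails
            entries.extend((v, doc["_id"]) for v in value)
        else:
            entries.append((value, doc["_id"]))
    entries.sort()
    return entries, multikey


def range_query(entries: List[Entry], lo: int, hi: int, dedup: bool) -> List[int]:
    """Scan entries with lo <= value <= hi and return the matching _ids in
    index order, dropping repeats when `dedup` is set."""
    ids = [doc_id for value, doc_id in entries if lo <= value <= hi]
    if not dedup:
        return ids
    seen = set()
    unique = []
    for doc_id in ids:
        if doc_id not in seen:
            seen.add(doc_id)
            unique.append(doc_id)
    return unique


if __name__ == "__main__":
    docs = [{"_id": 1, "tags": [3, 5]}, {"_id": 2, "tags": 4}, {"_id": 3, "tags": [1, 9]}]
    entries, multikey = build_index(docs, "tags")
    print(multikey)                                  # True
    print(range_query(entries, 3, 5, dedup=False))   # [1, 2, 1]
    print(range_query(entries, 3, 5, dedup=True))    # [1, 2]
```

Document 1 matches through both of its array elements, which is why the multi-key flag has to be known before a query decides whether to de-duplicate.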

Index Creation

An index is created by inserting a document into the system.indexes collection. Since index creation changes the meta-data, this insert must take the database's write lock.
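In MongoDB 2.4, an index build is an insert of a spec document (with fields such as ns, key, and name) into the database's system.indexes collection. The Python sketch below builds a subset of that spec and shows how a server could tell, from the target namespace alone, that an insert needs the write lock. The helper names are hypothetical; the default name format (field and direction joined by underscores) follows MongoDB's convention.

```python
def default_index_name(key: dict) -> str:
    """Default index name: each field and direction joined by underscores,
    e.g. {"a": 1, "b": -1} -> "a_1_b_-1"."""
    return "_".join(f"{field}_{direction}" for field, direction in key.items())


def index_spec(db: str, coll: str, key: dict) -> dict:
    """A subset of the document inserted into <db>.system.indexes."""
    return {"ns": f"{db}.{coll}", "key": key, "name": default_index_name(key)}


def db_lock_mode_for_insert(ns: str) -> str:
    """An insert into system.indexes changes meta-data, so it needs the
    database write lock; an ordinary insert only needs the read lock."""
    _db, _, coll = ns.partition(".")
    return "write" if coll == "system.indexes" else "read"


if __name__ == "__main__":
    print(index_spec("test", "users", {"a": 1, "b": -1}))
    print(db_lock_mode_for_insert("test.system.indexes"))  # write
    print(db_lock_mode_for_insert("test.users"))           # read
```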

Drawbacks of database level locking

The following operations block reads and writes on the other collections in the same database:

Creating a collection takes the database's write lock.
Dropping a collection takes the database's write lock.
Creating an index takes the database's write lock.
Dropping an index takes the database's write lock.

Design Alternatives

The fractal tree's lock tree could be used to transactionally manage meta-data locks.
These locks could be acquired and released as side effects of reading and writing rows in the namespace dictionary.
Alternatively, these locks could be managed by the lock tree if it exposed lock APIs similar to Berkeley DB's (BDB's).
The current lock tree does not support read locks, which meta-data locking requires.
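The missing piece, read (shared) locks alongside write (exclusive) locks, can be sketched in Python as a lock table keyed by namespace-dictionary entries. This is a hypothetical design illustration, not the lock tree's API: reading a collection would take a shared lock on its namespace entry, dropping it would take an exclusive one, and locks are held until commit.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class SharedExclusiveLocks:
    """Hypothetical transactional lock table with read and write modes.
    Non-blocking and single-threaded: conflicts are refused, not queued."""

    def __init__(self) -> None:
        self._readers: Dict[Hashable, Set[int]] = defaultdict(set)
        self._writer: Dict[Hashable, int] = {}
        self._held: Dict[int, Set[Hashable]] = defaultdict(set)

    def try_read(self, txn: int, key: Hashable) -> bool:
        writer = self._writer.get(key)
        if writer is not None and writer != txn:
            return False  # someone else is changing this meta-data
        self._readers[key].add(txn)
        self._held[txn].add(key)
        return True

    def try_write(self, txn: int, key: Hashable) -> bool:
        writer = self._writer.get(key)
        if writer is not None and writer != txn:
            return False
        if self._readers[key] - {txn}:
            return False  # other transactions are still reading it
        self._writer[key] = txn
        self._held[txn].add(key)
        return True

    def commit(self, txn: int) -> None:
        """Release every lock `txn` holds."""
        for key in self._held.pop(txn, set()):
            self._readers[key].discard(txn)
            if self._writer.get(key) == txn:
                del self._writer[key]


if __name__ == "__main__":
    m = SharedExclusiveLocks()
    print(m.try_read(1, "test.users"))    # True: txn 1 reads users
    print(m.try_read(2, "test.users"))    # True: readers share
    print(m.try_write(3, "test.users"))   # False: can't drop while read
    print(m.try_write(3, "test.orders"))  # True: other collection unaffected
```

Unlike the database-level write lock, dropping test.orders here does not block readers of test.users.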

Benchmarks

In-memory

The in-memory tests used the following configuration:

--collections 1
--indexes 0
--documents 1000000
--fields 2
--padding 100
--compressibility 0.50
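A knob like --compressibility 0.50 can be approximated by making that fraction of each padding value highly redundant and the rest random. The Python sketch below assumes that interpretation; it is not cortisol's actual generator, and `make_padding` and `compressed_fraction` are hypothetical names.

```python
import os
import zlib


def make_padding(size: int, compressibility: float) -> bytes:
    """Return `size` bytes: roughly `compressibility` of them are a single
    repeated byte (compresses to almost nothing) and the rest are random
    (essentially incompressible)."""
    if not 0.0 <= compressibility <= 1.0:
        raise ValueError("compressibility must be in [0, 1]")
    redundant = int(size * compressibility)
    return os.urandom(size - redundant) + b"\x00" * redundant


def compressed_fraction(data: bytes) -> float:
    """Size after zlib compression, as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    pad = make_padding(100_000, 0.50)
    print(f"{compressed_fraction(pad):.2f}")  # close to 0.50
```

Small values such as --padding 100 carry relatively more compressor overhead, so the measured fraction is only meaningful over larger amounts of data.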

We ran the in-memory tests on vanilla MongoDB 2.4.0 and on TokuMX at 31531af, using cortisol at 52ab434, on roadrunner (8 cores, 8GB RAM, 3x1TB RAID0 array). Each test ran for 180 seconds, and each concurrency level was run twice.

At this revision of TokuMX, a bug made every index a clustering index. This slowed down the load phase as well as updates and saves.

In vanilla MongoDB, the load rate was 22792.422844 inserts/s and the size was 1.2G. In TokuMX, the load rate was 15092.933327 inserts/s and the size was 238M.

We ran the TokuMX benchmarks again at revision 0626f30 (after the lock breakup). This time, the load rate was 26245.151444 inserts/s and the size was 170M.
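The relative changes implied by these load-phase numbers can be worked out with a short Python calculation. It assumes binary units (1.2G = 1.2 × 1024 MB); the `LoadResult` type is just a convenience for the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class LoadResult:
    label: str
    inserts_per_sec: float
    size_mb: float  # size after the load phase


def speedup(new: LoadResult, old: LoadResult) -> float:
    """How many times faster `new` loaded than `old`."""
    return new.inserts_per_sec / old.inserts_per_sec


def size_reduction(big: LoadResult, small: LoadResult) -> float:
    """How many times smaller `small` is than `big`."""
    return big.size_mb / small.size_mb


# Figures reported above; 1.2G is taken as 1.2 * 1024 MB (assumed units).
vanilla = LoadResult("mongodb 2.4.0", 22792.422844, 1.2 * 1024)
toku_old = LoadResult("tokumx 31531af", 15092.933327, 238)
toku_new = LoadResult("tokumx 0626f30", 26245.151444, 170)

if __name__ == "__main__":
    print(f"{speedup(toku_new, toku_old):.2f}x")        # 1.74x
    print(f"{speedup(toku_new, vanilla):.2f}x")         # 1.15x
    print(f"{size_reduction(vanilla, toku_new):.2f}x")  # 7.23x
```

So the post-breakup revision loaded about 1.74 times faster than 31531af and about 1.15 times faster than vanilla, while taking roughly 7 times less space than vanilla.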

Out-of-memory

The out-of-memory tests used the following configuration:

--collections 1
--indexes 3
--documents 1000000000
--fields 5
--padding 0
--compressibility 0.50

The tests were run on lex2 (4 cores, 16GB RAM, single disk) on vanilla MongoDB 2.4.1 and TokuMX f3d68fc, using cortisol at df7cbb4. Each test ran for 180 seconds and was run twice, with the results averaged.

In vanilla MongoDB, the load rate was 29503.127449 inserts/s and the size was 202G. In TokuMX, the load rate was 17042.512093 inserts/s and the size was 23G.
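For the 1,000,000,000-document load, the reported numbers imply a load duration and an average size per document, worked out below in Python. This assumes the reported rate is the average over the entire load and that "G" means GiB; the helper names are illustrative.

```python
DOCUMENTS = 1_000_000_000  # --documents in the out-of-memory configuration
GIB = 1024 ** 3            # assumes "G" in the reported sizes means GiB


def load_hours(inserts_per_sec: float) -> float:
    """Wall-clock hours to load DOCUMENTS at the given average rate."""
    return DOCUMENTS / inserts_per_sec / 3600


def bytes_per_document(size_gib: float) -> float:
    """Average bytes per document implied by the reported size."""
    return size_gib * GIB / DOCUMENTS


if __name__ == "__main__":
    print(f"vanilla: {load_hours(29503.127449):.1f} h, "
          f"{bytes_per_document(202):.1f} B/doc")  # 9.4 h, 216.9 B/doc
    print(f"tokumx:  {load_hours(17042.512093):.1f} h, "
          f"{bytes_per_document(23):.1f} B/doc")   # 16.3 h, 24.7 B/doc
```

The load took TokuMX roughly 16.3 hours versus roughly 9.4 hours for vanilla, but the result occupies close to 9 times less space.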
