
mv mutex cleanup

Matthew Von-Maszewski edited this page May 5, 2014 · 10 revisions

Status

  • merged to master
  • code complete May 4, 2014
  • development started April 30, 2014

History / Context

Each database / vnode in leveldb contains a mutex that protects the table file (.sst) management structures. All user activity must briefly acquire the mutex before a Read, Write, or Iterator operation. The background compactions must also briefly acquire the mutex before and after a compaction. This branch addresses two scenarios where a background compaction might hold the mutex for an extended time during heavy disk activity. Holding the mutex during compaction blocks user operations, which hurts both throughput and latency.

Logging blocks

The first scenario deals with common logging statements in the compaction process. Here is a sample of the target messages:

2014/04/25-18:39:08.496084 7f8d767a2700 Level-0 table #15: started
2014/04/25-18:39:08.707713 7f8d767a2700 Level-0 table #15: 35220571 bytes, 29779 keys OK
2014/04/25-18:39:08.725394 7f8d767a2700 Delete type=0 #12
2014/04/25-18:39:08.727119 7f8d71798700 Compacting 6@0 + 0@1 files
2014/04/25-18:39:10.886134 7f8d71798700 Generated table #16: 178675 keys, 210224702 bytes
2014/04/25-18:39:10.887666 7f8d71798700 Compacted 6@0 + 0@1 files => 210224702 bytes
2014/04/25-18:39:10.889495 7f8d71798700 compacted to: files[ 0 1 0 0 0 0 0 ]

These messages were being generated while the compaction thread held the database mutex. The problem is that each message generated an fwrite() and fflush() call. Both calls, particularly the fflush(), could block for extended periods during heavy disk activity. Because the mutex was held, a block during logging became a block of all user operations. To prevent this, the targeted log messages were either moved to a point where the mutex is not held, or the mutex is now unlocked around them.
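The pattern can be sketched in a few lines of C++. This is only an illustration under assumptions, not the code from the branch: `TrackedMutex`, `DemoDb`, `StartTableLocked()`, and `StartTableUnlocked()` are hypothetical names, and the in-memory `log` vector stands in for the real logger whose fwrite() / fflush() calls may block.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical mutex wrapper that remembers whether it is locked,
// so the demo can record where each log call happens.
class TrackedMutex {
 public:
  void Lock() { mu_.lock(); held_.store(true); }
  void Unlock() { held_.store(false); mu_.unlock(); }
  bool Held() const { return held_.load(); }

 private:
  std::mutex mu_;
  std::atomic<bool> held_{false};
};

struct LogEntry {
  std::string text;
  bool mutex_was_held;  // true if the message was written under the lock
};

// Hypothetical stand-in for a database object; not leveldb's DBImpl.
struct DemoDb {
  TrackedMutex mutex_;
  int next_file_number = 15;  // state guarded by mutex_
  std::vector<LogEntry> log;  // stand-in for the LOG file

  // Stand-in for the logger. In real code this is where
  // fwrite() and fflush() could block on a busy disk.
  void Log(const std::string& text) {
    log.push_back({text, mutex_.Held()});
  }

  // Old shape: the message is written while the mutex is held.
  int StartTableLocked() {
    mutex_.Lock();
    int number = next_file_number++;
    Log("Level-0 table #" + std::to_string(number) + ": started");
    mutex_.Unlock();
    return number;
  }

  // New shape: copy what the message needs, unlock, then log.
  int StartTableUnlocked() {
    mutex_.Lock();
    int number = next_file_number++;
    mutex_.Unlock();
    assert(!mutex_.Held());  // logging must not happen under the lock
    Log("Level-0 table #" + std::to_string(number) + ": started");
    return number;
  }
};
```

The only data the message needs (here, the file number) is copied into a local variable while the lock is held, so the slow logging call can run after the unlock without touching shared state.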

The changes only addressed common, regularly used log messages. Error logging, debug logging, and such were not changed.

Old file deletion

The second scenario addressed by this branch deals with the deletion of old .sst files. There is a routine in db/db_impl.cc called DeleteObsoleteFiles(). This routine was regularly called with the database mutex held. The routine reads the current disk directories for all levels and potentially deletes any .sst files that are no longer needed. Large databases can easily contain tens of thousands of directory entries (.sst files). The reading, processing, and deleting of the files can hold the mutex for extended periods. This branch no longer holds the mutex during these disk operations.
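A minimal sketch of the idea, assuming the mutex is only needed to read the set of live files: `DemoStore`, `live_files`, `on_disk`, and `DeleteObsolete()` are hypothetical names, and an in-memory set stands in for the directory listing and the file deletes. It is not the code from db/db_impl.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical stand-in for a database; not leveldb's DBImpl.
struct DemoStore {
  std::mutex mutex_;
  std::set<uint64_t> live_files;  // guarded by mutex_ (version structures)
  std::set<uint64_t> on_disk;     // stand-in for the .sst directory entries

  // Returns the file numbers that were removed.
  std::vector<uint64_t> DeleteObsolete() {
    // Step 1: hold the mutex only long enough to snapshot the live set.
    std::set<uint64_t> live;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      live = live_files;
    }

    // Step 2: scan the "directory" and delete with the mutex released.
    // In real code this is the slow part: directory reads and unlinks.
    std::vector<uint64_t> deleted;
    for (auto it = on_disk.begin(); it != on_disk.end();) {
      if (live.count(*it) == 0) {
        deleted.push_back(*it);
        it = on_disk.erase(it);  // stand-in for removing the file
      } else {
        ++it;
      }
    }
    return deleted;
  }
};
```

One caveat of this simplified version: a file created after the snapshot would look obsolete to the scan. A real implementation has to guard against that, for example by also tracking files that are still being written.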

[Note: there is an old, unimplemented branch that reduced the number of times per minute that DeleteObsoleteFiles() would be called. That branch will likely be revived, since its incremental benefits were probably hidden by the mutex problem.]

Branch description

db/db_impl.cc

DeleteObsoleteFiles() is called by the KeepOrDelete() routine. The original code assumed the database mutex, this->mutex_, was held. The new assumption is that the mutex is NOT held. The mutex is only locked if the version structures / routines are used. The changes move the "Deleted" log messages and all directory processing for deletes outside of the mutex lock.

WriteLevel0Table() addresses the mutex and logging problem with a simple copy/paste of its logging statements to a different position in the routine, so that they no longer run while the mutex is held. BackgroundCall2() and BackgroundImmCompactCall() are changed in the same way.
