Skip to content

mv aae segfault

Matthew Von-Maszewski edited this page Apr 6, 2016 · 6 revisions

Status

  • merged to develop - April 5, 2016
  • code complete - April 4, 2016
  • development started - April 2, 2016

History / Context

This is a discussion of changes made to eleveldb, Basho's Erlang to leveldb interface. Tradition is to place changes to either in this wiki.

Recent test of not yet released AAE improvements (active anti-entropy) stumbled across a coding bug in eleveldb.cc's async_iterator_move(). This function is extremely complex. The complexity exists to counter a large performance loss that occurred when Basho moved iterator operations off Erlang scheduler threads to independent worker threads. The threading change was forced because Erlang was killing off its own scheduler threads when it thought leveldb was taking "too long", fundamentally killing Riak.

Basho countered the performance loss by anticipating that an iterator move request is most likely to be followed by another iterator move request. The moment the code releases the current move's data to Erlang it automatically initiates the next move request. That lets Erlang process the current move's results while eleveldb simultaneously retrieves the next move. The next move is therefore already available to Erlang upon its next request instead of Erlang waiting for leveldb.

The AAE improvements experienced segfaults in more than one location within leveldb. But it turns out that all the locations were symptoms of one problem. Debugging via leveldb's assert() calls demonstrated that all segfaults were caused by iterator state changes happening in the midst of a seek operation. Further debugging isolated the seek operations to being after an iterator move operation that reached the end of the key space. A move operation reaching the key space end sets the iterator to "invalid" and clears an internal key buffer within the iterator object. This key buffer is also used by the seek operation to store its key.

All leveldb debug asserts() were the result of either the internal key buffer being cleared after the seek operation populated it or the iterator state changing to invalid in the middle of the seek operation. The only way this could happen is if a prefetch operation was still active during the seek operation. AAE code explicitly disables prefetch prior to sending a seek operation, except when the seek was preceded by an end of key space error. Testing demonstrated that attempting to disable prefetch after end of key space caused the code to deadlock.

Worse, a prefetch operation that hits the end of key space error does not disable the next prefetch, it initiates it. This next prefetch is the stray operation that interacts with the seek operation AAE sends when its move hits the end of key space.

Branch Description

The same bug appears in two locations within the eleveldb code base. The first location is within eleveldb.cc's async_iterator_move() function. The second location is workitem.cc's MoveTask::DoWork() function. async_iterator_move() executes on Erlang scheduler threads and negotiates returning iterator results from worker threads. MoveTask::DoWork() executes on the eleveldb worker threads and negotiates returning work to the Erlang scheduler thread. Both routines had the same problem. They needed to automatically disable future prefetch operations when the iterator reaches the end of the key space. The deadlock symptom was due to MoveTask::DoWork() not stopping prefetch. The segfault symptom was due to async_iterator_move() not stopping prefetch.

eleveldb repository: c_src/eleveldb.cc

async_iterator_move() handles all iterator seek and move requests. It manages all requests via one of three code blocks. The first code block is for all operations that are not prefetch related. The second code block is for prefetch operations where Erlang scheduler thread precedes the worker thread's prefetch completion. The third block is for prefetch operations where the worker thread has prefetch data waiting for the Erlang scheduler thread. It is this third block that required a correction.

The third block previously retrieved the prefetch data, which might actually be an end of key space flag (invalid_iterator error). And it then immediately requested a prefetch of the next key before returning the results of the current operation. Requesting the next key when the current key is an end of key space flag was wrong. This branch verifies that the end of key space has not been reached before initiating the prefetch request of the next key.

eleveldb repository: c_src/workitems.cc

MoveTask::DoWork() passes all iterator seek and move requests directly to leveldb. It executes on eleveldb worker threads. Upon completion of the leveldb iterator calls, MoveTask::DoWork() uses atomic operations to negotiate the method of delivering data to Erlang scheduler threads. If the worker thread has data before the Erlang scheduler thread requests the data, the worker thread does nothing in anticipation that the scheduler thread will later retrieve the data directly from the MoveTask object. If the worker thread finishes after the Erlang scheduler thread, the worker thread must send the data to Erlang via Erlang's messaging protocol. In this latter case, the worker thread will also initiate a prefetch of the next key.

MoveTask::DoWork() did test whether or not the iterator had reached the end of key space before initiating a prefetch. A deadlock condition occurred because necessary state flags were inconsistent with MoveTask::DoWork() not initiating a prefetch upon reaching the end of key space. Setting the state flag m_PrefetchStarted to false avoids the deadlock. This flag informs async_iterator_move() that no prefetch operation is pending.

eleveldb repository: test/iterators.erl

This Eunit test now includes additional steps that will fail both of the previous problems, but succeed due to this branch. The test does not necessarily trip the segfault. The segfault has only manifested under certain operating systems and hardware configurations after hundreds of thousands of move / seek operations.

eleveldb repository: c_src/build_deps.sh

This change is unrelated to the iterator bug. This is subtle script correction that allows the script operate properly on Solaris platforms.