Background. Prior to Riak 0.14.2, all fold operations would block the relevant vnode and prevent the vnode from servicing requests. This was changed in Riak 1.0, with the introduction of asynchronous folds that used an async worker pool, as well as additions to the various backends to support async folds.
To support async folds, Bitcask freezes its in-memory keydir and has async folds iterate over the frozen keydir, with new concurrent writes going to a pending keydir. Since the keydir is in-memory, Bitcask only allows a single frozen keydir. Multiple folds can reuse the same frozen keydir, but only if there have been no writes since the keydir was frozen. If a fold is started, a write occurs, and then a new fold is started, the second fold will block until the first fold finishes, and then re-freeze the keydir.
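The frozen/pending keydir rules can be sketched as a toy model. This is not the real Bitcask implementation (which is Erlang plus a C NIF); all names here are illustrative, and "blocking" is modeled as an exception rather than an actual wait:

```python
class Keydir:
    """Toy model of Bitcask's freeze/pending keydir behavior."""

    def __init__(self):
        self.live = {}            # current in-memory keydir
        self.frozen = None        # the single frozen snapshot shared by folds
        self.pending = None       # writes made while a freeze is active
        self.active_folds = 0
        self.dirty = False        # has any write occurred since the freeze?

    def start_fold(self):
        if self.frozen is None:
            self.frozen = dict(self.live)   # freeze a snapshot
            self.pending = {}
            self.dirty = False
        elif self.dirty:
            # A write happened after the freeze: this fold must wait for
            # the running folds to finish and then re-freeze. We model the
            # block as an exception instead of an actual wait.
            raise BlockingIOError("fold blocks until current folds end")
        self.active_folds += 1
        return self.frozen

    def put(self, key, value):
        self.live[key] = value
        if self.frozen is not None:
            self.pending[key] = value       # concurrent write goes to pending
            self.dirty = True               # no new fold can reuse the snapshot

    def end_fold(self):
        self.active_folds -= 1
        if self.active_folds == 0:
            # Last fold done: drop the snapshot; the next fold re-freezes.
            self.frozen = self.pending = None
```

Note how a fold started after a write cannot reuse the stale snapshot, and since only one frozen keydir is allowed, it has no choice but to wait.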
The Problem. Blocking async folds is expected and not a big deal. However, when determining if a vnode should hand off data, Riak will end up calling riak_kv_vnode:is_empty, which for a Bitcask vnode calls bitcask:is_empty. In Bitcask, the is_empty check is implemented as a fold (start a fold and exit as soon as any key is found) to deal with tombstones, expired keys, etc. This fold is executed directly in the vnode pid, not an async worker, and will block the vnode in scenarios such as the above.
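The fold-based check described above can be sketched as follows. This is a minimal Python model, not the real bitcask:is_empty; the tombstone marker and the `(value, expiry)` entry shape are assumptions for illustration:

```python
import time

TOMBSTONE = object()  # stand-in for Bitcask's tombstone marker

def is_empty(keydir, now=None):
    """Model of the is_empty-as-a-fold check: iterate over the keydir and
    return False as soon as any *live* key is found. Tombstones and
    expired keys do not count as data."""
    now = time.time() if now is None else now
    for key, (value, expiry) in keydir.items():
        if value is TOMBSTONE:
            continue                      # deleted key, not data
        if expiry is not None and expiry <= now:
            continue                      # expired key, not data
        return False                      # live key found: exit the fold early
    return True
```

The early exit makes the fold cheap when data exists, but the fold still has to acquire a keydir snapshot, which is exactly where it can block.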
There are two scenarios:
An existing fold is running (list keys, one of the folds used in MDC replication, etc.) and handoff is triggered. The vnode will then block until the first fold finishes, servicing no requests and leading to an ever-growing message queue.
Handoff is triggered (which starts a fold), and then handoff is re-triggered in the future. The vnode manager retriggers handoff periodically as a fault-tolerance mechanism. The handoff manager ensures that a handoff won't be started if one is already running. However, the is_empty check occurs before calling the handoff manager. So, handoff A to B, write, handoff A to B will cause the vnode to block on the second handoff request, again servicing no requests and leading to a growing message queue.
Add bitcask:is_empty_estimate to quickly determine if a bitcask contains
no data. Currently, determining if a bitcask has data requires folding
over the keydir to ensure tombstones and expired keys are skipped.
However, this is a potentially blocking operation, and nowhere in Riak do
we actually need perfect knowledge.
The estimate is determined from the bitcask stats, which may overcount
data, but will not undercount. Therefore, the estimated result may return
false when the bitcask is actually empty, but it will never return true
when there is data.
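The asymmetry above can be made concrete with a small sketch. This is a hypothetical model, not the real bitcask:is_empty_estimate, and the `key_count` stats field is an assumed name:

```python
def is_empty_estimate(stats):
    """Estimate emptiness from stats alone, with no fold.

    The count in the stats may overcount live data (it can include
    tombstones and expired keys not yet merged away) but never
    undercounts. So a zero count proves the bitcask is truly empty,
    while a non-zero count is answered conservatively as "not empty"
    even if every counted key is a tombstone."""
    return stats.get("key_count", 0) == 0
```

Because the estimate only errs on the "not empty" side, callers may do unnecessary work on an actually-empty bitcask, but they will never skip work that was needed.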
See issue: basho/riak_kv#423
Previously, determining if a bitcask was empty or not was accomplished
through a fold over the keydir which is a potentially blocking operation.
This commit changes riak_kv_bitcask_backend:is_empty to use the new
function bitcask:is_empty_estimate.
The estimate is determined from the bitcask stats, which may overcount
data, but will not undercount. Therefore, the estimated result may return
false when the bitcask is actually empty, but it will never return true
when there is data. In all cases where is_empty is currently used in
Riak, an estimate is acceptable. In the worst case, additional work may
be triggered that is unnecessary but safe (e.g. folding over an empty
bitcask).
See issue: #423