Riak KV vnodes can block in certain scenarios when using Bitcask #423

Closed
jtuple opened this issue Nov 2, 2012 · 0 comments

jtuple commented Nov 2, 2012

Background. Prior to Riak 0.14.2, all fold operations would block the relevant vnode and prevent the vnode from servicing requests. This was changed in Riak 1.0, with the introduction of asynchronous folds that used an async worker pool, as well as additions to the various backends to support async folds.

To support async folds, Bitcask freezes its in-memory keydir and has async folds iterate over the frozen keydir, with new concurrent writes going to a pending keydir. Since the keydir is in-memory, Bitcask only allows a single frozen keydir. Multiple folds can reuse the same frozen keydir, but only if there have been no writes since the keydir was frozen. If a fold is started, a write occurs, and then a new fold is started, the second fold will block until the first fold finishes, and then re-freeze the keydir.
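
The freeze rule above can be modelled in a few lines. This is only a toy sketch in Python, not Bitcask's actual code: the class `ToyKeydir` and its methods are hypothetical names, and "would block" is modelled as a `False` return rather than a real wait.

```python
class ToyKeydir:
    """Toy model (hypothetical, not Bitcask code) of a single frozen keydir."""

    def __init__(self):
        self.entries = {}   # the keydir that folds iterate over
        self.pending = {}   # writes that arrived while the keydir was frozen
        self.folds = 0      # number of folds sharing the frozen keydir

    def put(self, key, value):
        # While frozen, writes are diverted to the pending keydir.
        if self.folds > 0:
            self.pending[key] = value
        else:
            self.entries[key] = value

    def try_start_fold(self):
        # A new fold may share the frozen keydir only if nothing has been
        # written since the freeze; otherwise it has to wait.
        if self.folds > 0 and self.pending:
            return False
        self.folds += 1
        return True

    def finish_fold(self):
        self.folds -= 1
        if self.folds == 0:
            # Last fold done: merge pending writes and unfreeze.
            self.entries.update(self.pending)
            self.pending.clear()


kd = ToyKeydir()
kd.put("a", 1)
first = kd.try_start_fold()    # freezes the keydir
reuse = kd.try_start_fold()    # no writes since the freeze, so it is shared
kd.finish_fold()
kd.put("b", 2)                 # goes to the pending keydir
blocked = kd.try_start_fold()  # a write happened, so this fold must wait
kd.finish_fold()               # first fold finishes, keydir is unfrozen
after = kd.try_start_fold()    # can now re-freeze and proceed
print(first, reuse, blocked, after)  # True True False True
```

The `blocked` case is the one that matters for this issue: whoever starts that fold waits for the earlier fold to finish.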

The Problem. Blocking async folders is expected and not a big deal. However, when determining whether a vnode should hand off data, Riak ends up calling riak_kv_vnode:is_empty, which for a Bitcask vnode calls bitcask:is_empty. In Bitcask, the is_empty check is implemented as a fold (start a fold and exit as soon as any key is found) to deal with tombstones, expired keys, etc. This fold is executed directly in the vnode pid, not in an async worker, and will block the vnode in scenarios such as the one above.
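
For illustration, an is_empty check written as an early-exit fold looks roughly like the following. This is a Python sketch under assumptions, not the Bitcask implementation: `TOMBSTONE`, `is_empty_by_fold`, and the `(value, written_at)` entry layout are all made up for the example.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted key


def is_empty_by_fold(entries, now, expiry_secs=None):
    """Return True only if no live key exists.

    `entries` maps key -> (value, written_at). The loop stops at the first
    key that is neither a tombstone nor expired.
    """
    for _key, (value, written_at) in entries.items():
        if value is TOMBSTONE:
            continue  # deleted keys do not count as data
        if expiry_secs is not None and now - written_at > expiry_secs:
            continue  # expired keys do not count as data
        return False  # found a live key, exit the fold early
    return True


dead_only = {"a": (TOMBSTONE, 0), "b": ("v", 0)}
with_live = dict(dead_only, c=("v", 95))
empty_result = is_empty_by_fold(dead_only, now=100, expiry_secs=10)
live_result = is_empty_by_fold(with_live, now=100, expiry_secs=10)
print(empty_result, live_result)  # True False
```

The check itself is cheap, but because it is a fold it is subject to the same frozen-keydir rule as any other fold, which is why it can stall the vnode.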

There are two scenarios:

  1. An existing fold is running (list keys, one of the folds used in mDC replication, etc.) and handoff is triggered. The vnode will then block until the first fold finishes, servicing no requests and leading to an ever-growing message queue.
  2. Handoff is triggered (which starts a fold), and then handoff is re-triggered in the future. The vnode manager retriggers handoff periodically as a fault-tolerance mechanism. The handoff manager ensures that a handoff won't be started if one is already running. However, the is_empty check occurs before calling the handoff manager. So, handoff A to B, write, handoff A to B will cause the vnode to block on the second handoff request, again servicing no requests and leading to a growing message queue.
@ghost ghost assigned jtuple Nov 2, 2012
jtuple added a commit to basho/bitcask that referenced this issue Nov 2, 2012
Add bitcask:is_empty_estimate to quickly determine if a bitcask contains
no data. Currently, determining if a bitcask has data requires folding
over the keydir to ensure tombstones and expired keys are skipped.
However, this is a potentially blocking operation and nowhere in Riak do
we actually need perfect knowledge.

The estimate is determined from the bitcask stats, which may overcount
data, but will not undercount. Therefore, the estimated result may return
false when the bitcask is actually empty, but it will never return true
when there is data.

See issue: basho/riak_kv#423
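
The "may overcount, will not undercount" property described in this commit message can be sketched as follows. This is a toy Python model and not the Bitcask code: `ToyStore` and its `key_count` statistic are hypothetical stand-ins for the bitcask stats, and only key expiry is modelled.

```python
class ToyStore:
    """Toy store (hypothetical) with an exact and an estimated emptiness check."""

    def __init__(self, expiry_secs):
        self.expiry_secs = expiry_secs
        self.entries = {}   # key -> time of last write
        self.key_count = 0  # stat: counts every entry, expired ones included

    def put(self, key, now):
        if key not in self.entries:
            self.key_count += 1
        self.entries[key] = now

    def is_empty_exact(self, now):
        # Needs to walk the entries (a fold) to skip expired keys.
        return all(now - t > self.expiry_secs for t in self.entries.values())

    def is_empty_estimate(self):
        # Constant time, no fold. Expired keys are still counted, so this
        # can say "not empty" for an empty store, never the reverse.
        return self.key_count == 0


store = ToyStore(expiry_secs=10)
fresh_estimate = store.is_empty_estimate()     # nothing written yet
store.put("k", now=0)
exact_later = store.is_empty_exact(now=100)    # the only key has expired
estimate_later = store.is_empty_estimate()     # stat still counts it
print(fresh_estimate, exact_later, estimate_later)  # True True False
```

In the last case the estimate errs on the safe side: the caller may do unnecessary work on an empty store, but it never skips a store that holds data.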
jtuple added a commit that referenced this issue Nov 2, 2012
Previously, determining if a bitcask was empty or not was accomplished
through a fold over the keydir which is a potentially blocking operation.
This commit changes riak_kv_bitcask_backend:is_empty to use the new
function bitcask:is_empty_estimate.

The estimate is determined from the bitcask stats, which may overcount
data, but will not undercount. Therefore, the estimated result may return
false when the bitcask is actually empty, but it will never return true
when there is data. In all cases where is_empty is currently used in
Riak, an estimate is acceptable. In the worst case, additional work may
be triggered that is unnecessary but safe (e.g. folding over an empty
bitcask).

See issue: #423
@ghost ghost assigned reiddraper Nov 3, 2012
@evanmcc evanmcc closed this as completed Aug 9, 2013