
Add retry on eleveldb lock errors during open for up to 1 minute. #395

Merged (1 commit) on Sep 14, 2012

jonmeredith (Contributor)

If a vnode crashes, the eleveldb NIF closes the LevelDB database cleanly, writing out any pending data. However, the vnode supervisor can restart the crashed vnode before that close has finished. The restarted vnode's open then fails with a LevelDB lock error, the vnode fails to start, and Riak goes down.
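To make the "lock error" concrete: the failing open can be recognised from the status string LevelDB returns. The sketch below assumes that eleveldb reports an open failure as `{error, {db_open, Msg}}` and that a lock conflict's message begins with `"IO error: lock "`. The module name `lock_error` and that exact error shape are assumptions for illustration, not code taken from this patch.

```erlang
-module(lock_error).
-export([is_lock_error/1]).

%% Hypothetical helper: classify an open result as a LevelDB lock conflict.
%% ASSUMPTION: failures look like {error, {db_open, Msg}} where Msg is
%% LevelDB's status string, and lock conflicts start with "IO error: lock ".
-spec is_lock_error(term()) -> boolean().
is_lock_error({error, {db_open, Msg}}) when is_list(Msg) ->
    lists:prefix("IO error: lock ", Msg);
is_lock_error(_Other) ->
    %% Successful opens and any other error are not lock conflicts.
    false.
```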

This patch adds a retry every two seconds, waiting up to a minute for the lock to clear.
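The retry policy amounts to a bounded loop: two seconds between attempts and at most one minute of waiting, i.e. 60000 / 2000 = 30 retries after the first attempt. Below is a minimal Erlang sketch under stated assumptions: `open_retry:open_with_retry/4` is a hypothetical helper, and `OpenFun` / `IsRetryable` stand in for the real eleveldb open call and lock-error check, which this page does not show.

```erlang
-module(open_retry).
-export([open_with_retry/4]).

%% Hypothetical sketch of "retry every DelayMs, at most RetriesLeft times".
%% OpenFun performs one open attempt; IsRetryable decides whether the
%% result is a transient failure (e.g. the LevelDB lock is still held).
-spec open_with_retry(fun(() -> term()), fun((term()) -> boolean()),
                      non_neg_integer(), non_neg_integer()) -> term().
open_with_retry(OpenFun, IsRetryable, RetriesLeft, DelayMs) ->
    Result = OpenFun(),
    case IsRetryable(Result) of
        true when RetriesLeft > 0 ->
            %% Lock still held, presumably because the old handle is still
            %% closing: wait, then try again with one fewer retry.
            timer:sleep(DelayMs),
            open_with_retry(OpenFun, IsRetryable, RetriesLeft - 1, DelayMs);
        _ ->
            %% Success, a non-retryable error, or retries exhausted:
            %% return the last result unchanged.
            Result
    end.
```

With the numbers from the description this would be called as `open_with_retry(OpenFun, IsLockError, 30, 2000)`. Passing the predicate in means non-lock errors (corruption, bad options) still fail immediately instead of stalling vnode startup for a full minute.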

jonmeredith (Contributor, Author)

Here's a way to exercise it from the Riak console:

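%% Log at debug level so the retry messages show up on the console.
lager:set_loglevel(lager_console_backend, debug).
%% 1 MB payload to keep the backends busy writing.
Bytes = crypto:strong_rand_bytes(1000000).
{ok, C} = riak:local_client().
%% F rewrites the same key forever; it receives itself so the fun can recurse.
F = fun(X) -> C:put(riak_object:new(<<"b">>, <<"k">>, Bytes)), X(X) end.
%% Start 10 concurrent writers.
[spawn(fun() -> F(F) end) || _ <- lists:seq(1, 10)].
%% After 10 s of writes, kill every riak_kv vnode on every connected node.
timer:sleep(10000),
rpc:multicall(erlang,
              apply,
              [fun() ->
                       [exit(Pid, kill) ||
                           {_, _, Pid} <- riak_core_vnode_manager:all_vnodes(riak_kv_vnode),
                           is_pid(Pid)]
               end,
               []]).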

ghost assigned reiddraper on Sep 14, 2012

reiddraper (Contributor)

Nice tests. Looks good other than the two comments above.

jonmeredith pushed a commit referencing this pull request on Sep 14, 2012: "Add retry on eleveldb lock errors during open for up to 1 minute."
jonmeredith merged commit 1360c26 into 1.2 on Sep 14, 2012.
seancribbs deleted the jdm-eleveldb-retry-on-lock branch on April 1, 2015 at 23:27.