Add retry on eleveldb lock errors during open for up to 1 minute. #395

Merged
jonmeredith merged 1 commit into 1.2 from jdm-eleveldb-retry-on-lock on Sep 14, 2012

Contributor

jonmeredith commented Sep 14, 2012

If a vnode crashes, the eleveldb NIF will close the leveldb database cleanly, writing out any pending data. However, the vnode supervisor can restart the crashed vnode before that close has finished. The restarted vnode then hits a lock error on open, which fails the vnode start and takes riak down.

This patch adds a retry every two seconds, waiting up to a minute for the lock to clear.

Add retry on eleveldb lock errors during open for up to 1 minute.
If a vnode crashes then the eleveldb NIF will close the leveldb database
cleanly, writing out any pending data.  The vnode supervisor can restart
the crashed vnode faster than the close process and trigger a lock error.

This patch adds a retry every two seconds, waiting up to a minute
for the lock to clear.
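
For readers who want to see the shape of that loop, here is a minimal sketch of a bounded retry. It is not the code from this patch: the module name, `open_with_retry`, `OpenFun` and `IsLockErr` are hypothetical names chosen for illustration, and the retry count of 30 is only an assumption that matches "every two seconds, up to a minute".

```erlang
-module(open_retry_sketch).
-export([open_with_retry/2, open_with_retry/4]).

%% Hypothetical helper, not taken from the patch.
%% OpenFun   :: fun(() -> {ok, Ref} | {error, Reason})
%% IsLockErr :: fun((Reason) -> boolean()), decides whether an error
%%              is a lock error that is worth retrying.

%% Assumed defaults: 30 retries, 2000 ms apart (about one minute).
open_with_retry(OpenFun, IsLockErr) ->
    open_with_retry(OpenFun, IsLockErr, 30, 2000).

open_with_retry(OpenFun, IsLockErr, RetriesLeft, SleepMs) ->
    case OpenFun() of
        {ok, Ref} ->
            {ok, Ref};
        {error, Reason} = Error ->
            case RetriesLeft > 0 andalso IsLockErr(Reason) of
                true ->
                    %% Lock still held: wait, then try the open again.
                    timer:sleep(SleepMs),
                    open_with_retry(OpenFun, IsLockErr,
                                    RetriesLeft - 1, SleepMs);
                false ->
                    %% Not a lock error, or retries exhausted:
                    %% hand back the last error unchanged.
                    Error
            end
    end.
```

A caller would wrap the real open in the fun, for example `fun() -> eleveldb:open(DataDir, Opts) end`, and supply a predicate that recognises whatever lock error the backend actually returns. Passing the last error back unchanged means the caller sees the same failure it would have seen without the retry.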
Contributor

jonmeredith commented Sep 14, 2012

Here's a way to play with it on the console:

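%% Raise the console log level to debug.
lager:set_loglevel(lager_console_backend, debug).
%% Build a 1,000,000-byte value and get a local client.
Bytes = crypto:rand_bytes(1000000).
{ok,C}=riak:local_client().
%% F writes the same object over and over; start 10 writer processes.
F = fun(X) -> C:put(riak_object:new(<<"b">>,<<"k">>,Bytes)), X(X) end.
[spawn(fun() -> F(F) end) || _ <- lists:seq(1,10)].
%% Let the writers run for 10 seconds, then kill every riak_kv vnode
%% on all nodes so the supervisor restarts them.
timer:sleep(10000),
rpc:multicall(erlang,
                 apply,
                 [fun() ->
                         [
                          exit(Pid, kill) ||
                              {_,_,Pid} <-
                                  riak_core_vnode_manager:all_vnodes(riak_kv_vnode),
                              is_pid(Pid)
                         ]
                  end,
                  []]).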

@ghost ghost assigned reiddraper Sep 14, 2012

don't think you need the \n with lager

Contributor

jonmeredith replied Sep 14, 2012

It'll add one if I don't, so why make it do extra work :)

Contributor

reiddraper replied Sep 14, 2012

fair enough

how about a lager:debug here to say we've exhausted our retries?

Contributor

jonmeredith replied Sep 14, 2012

Could do - I deliberately pass back the last error I got, so I thought that would show up in logs... a little extra debug couldn't hurt.

Contributor

reiddraper replied Sep 14, 2012

Good point, nbd either way then

Contributor

reiddraper commented Sep 14, 2012

Nice tests. Looks good other than the two comments above.

jonmeredith added a commit that referenced this pull request Sep 14, 2012

Merge pull request #395 from basho/jdm-eleveldb-retry-on-lock
Add retry on eleveldb lock errors during open for up to 1 minute.

@jonmeredith jonmeredith merged commit 1360c26 into 1.2 Sep 14, 2012

1 check passed

default The Travis build passed

@seancribbs seancribbs deleted the jdm-eleveldb-retry-on-lock branch Apr 1, 2015
