Add retry on eleveldb lock errors during open for up to 1 minute. #395

Merged
merged 1 commit into from Sep 14, 2012


@jonmeredith
Contributor

If a vnode crashes, the eleveldb NIF will close the leveldb database cleanly, writing out any pending data. However, the vnode supervisor can restart the crashed vnode faster than the close process and trigger a lock error which fails the vnode start and takes riak down.

This patch adds a retry every two seconds, waiting up to a minute for the lock to clear.
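
The retry behaviour described above could be sketched roughly as follows. This is not the code from the patch: the module name, function names, and the `IsLockError` predicate are hypothetical, and the defaults (30 attempts, 2000 ms apart) are just the "every two seconds, up to a minute" figures from the description. The open call is passed in as a fun so the sketch does not assume any particular eleveldb API or error format.

```erlang
-module(open_retry_sketch).
-export([open_with_retry/2, open_with_retry/4]).

%% Assumed defaults, derived from the PR description:
%% one retry every 2 seconds, for up to 1 minute.
-define(RETRY_INTERVAL_MS, 2000).
-define(MAX_RETRIES, 30).

%% OpenFun     :: fun(() -> {ok, Ref} | {error, Reason})
%% IsLockError :: fun((Reason) -> boolean())  (hypothetical predicate)
open_with_retry(OpenFun, IsLockError) ->
    open_with_retry(OpenFun, IsLockError, ?MAX_RETRIES, ?RETRY_INTERVAL_MS).

open_with_retry(OpenFun, IsLockError, RetriesLeft, IntervalMs) ->
    case OpenFun() of
        {ok, Ref} ->
            {ok, Ref};
        {error, Reason} = Error ->
            %% Retry only lock errors, and only while retries remain.
            case RetriesLeft > 0 andalso IsLockError(Reason) of
                true ->
                    timer:sleep(IntervalMs),
                    open_with_retry(OpenFun, IsLockError,
                                    RetriesLeft - 1, IntervalMs);
                false ->
                    %% Out of retries, or not a lock error:
                    %% hand back the last error seen.
                    Error
            end
    end.
```

Returning the last error unchanged once the retries run out means the caller still sees the original failure reason, and errors that are not lock errors fail immediately instead of waiting out the full minute.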

@jonmeredith jonmeredith Add retry on eleveldb lock errors during open for up to 1 minute.
If a vnode crashes then the eleveldb NIF will close the leveldb database
cleanly, writing out any pending data.  The vnode supervisor can restart
the crashed vnode faster than the close process and trigger a lock error.

This patch adds a retry every two seconds, waiting up to a minute
for the lock to clear.
5032740
@jonmeredith
Contributor

Here's a way to play with it on the console:

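%% Enable debug logging on the console backend.
lager:set_loglevel(lager_console_backend, debug).
%% 1,000,000 random bytes used as the object value.
Bytes = crypto:rand_bytes(1000000).
{ok, C} = riak:local_client().
%% Writer loop: put the same bucket/key over and over.
F = fun(X) -> C:put(riak_object:new(<<"b">>, <<"k">>, Bytes)), X(X) end.
%% Start 10 concurrent writers.
[spawn(fun() -> F(F) end) || _ <- lists:seq(1, 10)].
%% Let the writers run for 10 seconds, then kill every riak_kv vnode on all nodes.
timer:sleep(10000),
rpc:multicall(erlang,
              apply,
              [fun() ->
                       [exit(Pid, kill) ||
                           {_, _, Pid} <-
                               riak_core_vnode_manager:all_vnodes(riak_kv_vnode),
                           is_pid(Pid)]
               end,
               []]).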

@reiddraper reiddraper was assigned Sep 14, 2012
@reiddraper

don't think you need the \n with lager

Contributor

It'll add one if I don't, so why make it do extra work :)

Contributor

fair enough

@reiddraper

how about a lager:debug here to say we've exhausted our retries?

Contributor

Could do - I deliberately pass back the last error I got, so I thought that would show up in logs... a little extra debug couldn't hurt.

Contributor

Good point, nbd either way then

@reiddraper
Contributor

Nice tests. Looks good other than the two comments above.

@jonmeredith jonmeredith merged commit 1360c26 into 1.2 Sep 14, 2012

1 check passed

default The Travis build passed
@seancribbs seancribbs deleted the jdm-eleveldb-retry-on-lock branch Apr 1, 2015