Allow parallel vnode initialization #274

Merged
merged 3 commits into from Feb 23, 2013

Projects

None yet

3 participants

@engelsanchez
Contributor

Solution to basho/platform_tasks#18

engelsanchez added some commits Feb 12, 2013
@engelsanchez engelsanchez Allow vnode init to happen in parallel
Added an extra state to riak_core_vnode to decouple vnode initialization
from process creation.
Added a function to block until initialization is truly finished.
Vnode init call now takes a list of indices or single index.
The list version creates vnode processes first, then waits for them to
initialize, which allows it to happen in parallel.
Used the above in riak_core_ring_handler when starting services
on first ring event.
Tests show I/O saturation at bitcask startup now, instead of the
serialized trickle we had.
This code still needs some work to handle edge cases, specially around
how to handle vnode initialization failing.
783bd60
@engelsanchez engelsanchez Add concurrency control to vnode initialization
Also, a bit of code cleanup.
New pmap with bounded concurrency utility added with unit tests.
6fc3a4e
@jonmeredith jonmeredith commented on the diff Feb 21, 2013
src/riak_core_util.erl
+ exit(normal)
+ end.
+
+incr_counter(CounterPid) ->
+ CounterPid ! {up, self()},
+ receive
+ {counter_value, N} -> N
+ after
+ 3000 ->
+ ?assert(false)
+ end.
+
+decr_counter(CounterPid) ->
+ CounterPid ! down.
+
+bounded_pmap_test_() ->
@jonmeredith
jonmeredith Feb 21, 2013 Contributor

Should #ifdef(TEST) around this

@engelsanchez
engelsanchez Feb 21, 2013 Contributor

It was added to an existing ifdef TEST region, so it doesn't show up in the diff, but it is.

@jonmeredith
jonmeredith Feb 21, 2013 Contributor

I think I'll make some phone calls or call a meeting or something.

@engelsanchez engelsanchez Fix cluster into to query vnode manager, not sup
With the parallel vnode change, there is more reason to not query the
supervisor directly, as its children may not have finished
initialization yet.
9f81cfd
@evanmcc
Contributor
evanmcc commented Feb 21, 2013

do we have any benchmarks or assessments of performance impact?

@engelsanchez
Contributor

Only on a developer laptop with an SSD, 16 partitions and 12GB of bitcask data. The riak_kv service took 4min+ to initialize without this, but only 1min + a few seconds with this change. All coming from being able to better saturate the I/O with concurrent work. This will be tested on real clusters ASAP. If you want to lend a hand, let me know. I know you have something set up and still warm that perhaps we could use? With the latest commit, now concurrency can be controlled with the previously existing vnode_rolling_start parameter. The default should be good (16), and you would get the old, sequential behavior if set to 1 or less. Tweaking on rotational drives should be interesting as I would expect it to turn around and get worse as you tune that setting up. I'm dying to see the real effect.

@evanmcc
Contributor
evanmcc commented Feb 23, 2013

code looks good

testing with 16 vnodes containing 320GB of data on a rotational disk raid system indicated a 10X speedup.

👍

@engelsanchez engelsanchez merged commit 7655482 into master Feb 23, 2013

1 check passed

default The Travis build passed
Details
@engelsanchez engelsanchez was assigned Feb 25, 2013
@jrwest jrwest referenced this pull request in basho/riak_search Feb 26, 2013
Merged

remove guard on riak_search_vnode:start_vnode/1 #140

@seancribbs seancribbs deleted the eas-parallel-vnode-init branch Apr 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment