
Integration of background manager with Handoff and AAE tree rebuild #481

Closed
wants to merge 5 commits

Conversation

buddhisthead (Contributor)

Replaced by #484

NOTE: DO NOT MERGE YET (requires #475 + a rebase to remove the bg_manager itself from this PR)

The riak_core background manager PR is here: #475

The riak_repl PR is here: basho/riak_repl#488

The background manager allows subsystems to coordinate via locks (and tokens) through a non-blocking API when desired. In our case, we've seen several customer support issues related to a production cluster being overloaded by the combined load of MDC fullsync replication running alongside handoff or AAE tree rebuilds (or all three!).

Each of these subsystems is being taught, for 2.0, to "take" a lock on the kv vnode/partition before it does a fold over that partition. Locks are released when the process dies or exits normally. Each subsystem (at this point in time) is coded to keep working on other partitions if it is denied a lock on one partition (signalled by a max_concurrency return value), which means progress is made even when one partition is busy.
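For illustration, a minimal Erlang sketch of that calling pattern follows. The function name (riak_core_bg_manager:get_lock/2) and the lock key shape are assumptions drawn from this description and #475, not necessarily the final API:

    %% Sketch: guard a per-partition fold with the background manager.
    %% get_lock/2 and the {vnode_lock, Partition} key are illustrative assumptions.
    maybe_fold(Partition, FoldFun) ->
        case riak_core_bg_manager:get_lock({vnode_lock, Partition}, self()) of
            max_concurrency ->
                %% Partition is busy; skip it and keep working on other partitions.
                {skipped, Partition};
            {ok, _LockRef} ->
                %% The lock is released automatically when this process exits.
                FoldFun(Partition)
        end.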

Use of this feature can be bypassed in production. Besides a global kill switch on the background manager itself, the riak_core configuration supports two new parameters, aae_skip_bg_manager and handoff_skip_bg_manager, which both default to false. Setting either to true makes the corresponding subsystem skip the background manager and proceed as it did before.
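For example, the flags could be set through the riak_core application environment (standard app.config/advanced.config terms; this is a sketch, and a cuttlefish schema is still a TODO, per the notes below):

    %% Sketch: bypass the background manager for AAE tree rebuilds only.
    [
     {riak_core, [
       {aae_skip_bg_manager, true},      %% AAE rebuilds skip the bg manager
       {handoff_skip_bg_manager, false}  %% handoff still takes per-vnode locks
     ]}
    ].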

@buddhisthead (Contributor Author)

Notes to self:

  • squash commits into a single entry.
  • add cuttlefish schema for bg_mgr configuration parameters

@@ -145,6 +146,7 @@ init([Mod, Index, InitialInactivityTimeout, Forward]) ->
    process_flag(trap_exit, true),
    State = #state{index=Index, mod=Mod, forward=Forward,
                   inactivity_timeout=InitialInactivityTimeout},
    try_set_vnode_lock_limit(Mod, Index),
Contributor:
As we discussed previously, I'm not sure riak_core_vnode is the correct place to be doing this. As the PR states, the goal of this work is to limit folds over riak_kv_vnodes. This registers a lock per partition, per module, so there is also, for example, a {vnode_lock, riak_pipe_vnode, Partition} lock for each partition. This may have the desired effect of also providing the locks for riak_kv_vnode, but the scope is too wide. I'll leave some comments elsewhere to try and illustrate.

It still seems to me that if we want to limit folds over riak_kv_vnode backends then riak_kv is the correct place to be registering these locks and acquiring them. The challenge is integrating that with handoff, because handoff is a general mechanism in core and its hooks at the riak_kv level are insufficient for grabbing locks. The correct way is probably to extend those hooks, but alas it is probably too late for that. As I previously expressed, I think we should skip handoff this go-around because of these issues. Handoff can still be limited via the node-wide concurrency limit, so providing exclusive access to the partition can be done via a mixture of two knobs instead of just one. I understand that handoff has been a large pain point and a motivation for this work, and that a more ideal approach is desired, but we have to start being realistic about what we can get in with the time we have.

Without handoff this becomes a much simpler problem. Register the locks in riak_kv_vnode as {riak_kv_vnode_lock, Partition}, and the Repl and AAE changes should basically remain the same. However, I really think the solution we should be discussing is doing locking across all backend folds that use ?FOLD_REQ.
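For illustration, a sketch of that kv-side registration; the bg-manager call (set_concurrency_limit/2) and the key shape are assumptions, not code from either PR:

    %% Sketch only: register the per-partition lock from riak_kv, not riak_core_vnode.
    %% A limit of 1 allows at most one coordinated fold per partition at a time.
    register_kv_vnode_lock(Partition) ->
        riak_core_bg_manager:set_concurrency_limit({riak_kv_vnode_lock, Partition}, 1).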

Contributor Author:

I agree that we should not register the pipe locks. I missed that. In the other integration points, I match on riak_kv_vnode and I think we should do the same here. There are plenty of other places in core that do the same thing - it's not like kv is completely isolated from core. I have run riak tests with this configuration and it works fine with handoff.

But maybe you have a good point about where to do the registration. We can do it from kv and still integrate with handoff.

Contributor:

We should definitely not match on the name of a specific vnode here. That is akin to a super-type being knowledgeable of its child types. In addition, riak_core is not only being used by Riak. This is registering locks on behalf of every vnode in every riak_core application. Core is certainly not the cleanest abstraction but the only references to a specific vnode are examples in tests.

Contributor Author:

Ok, I understand. Moving the registration to kv makes total sense to me. Sorry about the confusion.

* The goal is to allow riak sub-systems to coordinate use of shared resources,
* e.g. protect from concurrent vnode folds on the same partition.
* Locks and tokens have a settable maximum concurrency limit.
* "Taken" locks and tokens are tracked in an ETS table.
* max_concurrency is returned when the set limits are reached.
* Processes that take locks are monitored.
* Locks are released when the taking processes terminate.
* Tokens are refreshed at a specified periodic rate.
* Token processes are not monitored because tokens never "release".
* A table manager is introduced to add persistence across process crashes,
* and to allow proper table transfer to occur without losing the table.
* An EQC test exercises the majority of the API; see test/bg_manager_eqc.erl
* See the original PR for background manager here: #364
…le is unavailable
* Allows callers to try again later
* Use per-vnode locks to guard concurrent kv folds on the same partition.
* Include the kv module name in the lock to differentiate it from pipe/other vnode types.
@buddhisthead (Contributor Author)

replaced by #484

@jrwest jrwest modified the milestones: 2.0-beta, 2.0 Mar 24, 2014
@seancribbs seancribbs deleted the cet-bg-mgr-vnode-lock branch April 1, 2015 23:04