fix forced_ownership_handoff during resize #331

Merged
jrwest merged 1 commit into master from jrw-resize-foh-fix on Jun 24, 2013

Contributor

jrwest commented Jun 3, 2013

All resize operations remain in the ring's list of pending
changes until all complete. Prior to this change transfers would
only be triggered for the first forced_ownership_handoff operations.
Subsequent operations would only be triggered by vnode inactivity.

This commit modifies the use of forced_ownership_handoff during resize
to ensure that only resize operations that are still pending are in
the throttled transfer list.
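
The filtering described above can be illustrated with a toy model. This is only a sketch in Python, not riak_core's actual Erlang code: the `Transfer` record, its `status` field with the values "awaiting" and "complete", and the `throttled_transfers` function are all hypothetical names chosen for illustration. It assumes each entry in the pending-changes list carries a completion status.

```python
from dataclasses import dataclass


# Hypothetical model of one entry in the ring's list of pending changes.
@dataclass(frozen=True)
class Transfer:
    partition: int
    owner: str
    next_owner: str
    status: str  # "awaiting" or "complete" (assumed status values)


def throttled_transfers(pending_changes, node):
    """Return the transfers that `node` still has to hand off.

    In this toy model, completed resize operations stay in the pending
    list until the whole resize finishes, so they are dropped here and
    only operations that are still awaiting transfer are returned.
    """
    return [
        t for t in pending_changes
        if t.owner == node and t.status == "awaiting"
    ]


if __name__ == "__main__":
    pending = [
        Transfer(0, "node1", "node2", "complete"),
        Transfer(1, "node1", "node3", "awaiting"),
        Transfer(2, "node2", "node1", "awaiting"),
    ]
    for t in throttled_transfers(pending, "node1"):
        print(t.partition, t.next_owner)
```

The point of the sketch is only that the list handed to the throttled handoff step is rebuilt from operations that are still pending, rather than from every operation recorded for the resize.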

This addresses the feature-blocking issue described in the second-to-last paragraph here


This PR can be verified using basho_bench w/ configs here:

  1. Build a 64-partition, 4-node cluster and run mapred_populate.config. Verify everything is ok w/ mapred_verify.config.
  2. Kick off a resize with riak-admin cluster resize-ring 128, then plan and commit.
  3. Run mapred_verify.config over and over w/ something like while true; do ./basho_bench mapred_verify.config; done.

w/o this change, the resize will stall during step 3 (no transfer output in the logs is an easy way to verify this; ring-status will also stop progressing). w/ this change, you will see the resize progress despite the mapreduce traffic.

Note: mapreduce traffic is not necessary; any load sufficient to keep all vnodes from reaching their inactivity timeout will do.
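
Why steady load causes the stall can be sketched with a simple timer model. This is an illustrative assumption, not riak_core's implementation: `inactivity_fires` is a hypothetical helper, times are plain seconds, and the 60-second timeout used in the usage example is just an example value. The model treats a vnode as inactive once a gap of at least `timeout` passes with no requests.

```python
def inactivity_fires(request_times, timeout, end_time):
    """Return True if an idle gap of at least `timeout` seconds occurs
    between time 0 and `end_time`, given the request arrival times.

    Hypothetical model: each request resets the idle timer.
    """
    last = 0.0
    for t in sorted(request_times):
        if t > end_time:
            break
        if t - last >= timeout:
            return True
        last = t
    # Check the idle period after the final request.
    return end_time - last >= timeout


if __name__ == "__main__":
    # One request per second keeps the vnode busy for the whole window.
    busy = [float(t) for t in range(1, 101)]
    print(inactivity_fires(busy, timeout=60.0, end_time=100.0))
    # With no traffic at all, the timer gets a chance to expire.
    print(inactivity_fires([], timeout=60.0, end_time=100.0))
```

Under this model, any request stream whose gaps stay below the timeout keeps the timer from firing, which is why the kind of traffic does not matter for reproducing the stall.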

@jrwest jrwest fix forced_ownership_handoff during resize
d99549d
Contributor

jrwest commented Jun 3, 2013

Forgot to mention: when running mapred_verify.config after the resize completes, you will see errors from basho_bench because the expected key counts will be wrong, depending on the resize operation. For more information see the last paragraph here. This is expected behaviour for now and is not the subject of this PR.

Contributor

jtuple commented Jun 24, 2013

Reviewed code, nothing of consequence to note, all looks good there.

Retested against master to verify transfer stall. Then tested again using this branch merged to master, and no stall occurred. Awesomeness.

+1 merge away

@jrwest jrwest added a commit that referenced this pull request Jun 24, 2013

@jrwest jrwest Merge pull request #331 from basho/jrw-resize-foh-fix
fix forced_ownership_handoff during resize
7acc8c9

@jrwest jrwest merged commit 7acc8c9 into master Jun 24, 2013

1 check failed

default The Travis CI build failed

jaredmorrow deleted the jrw-resize-foh-fix branch Jun 25, 2013
