New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AZ447: Improve partition claim algorithm to reduce unneeded partition transfers #91
Conversation
Add riak_core_new_claim:new_wants_claim/2 and new_claim/2. Merge in claim simulation code provided by Greg Nelson (grourk@dropcam.com). Add pretty_print function to riak_core_ring. The new claim function is designed to reduce the number of partition transfers that occur when rebalancing the ring, aiming as close to possible for minimal consistent hashing.
{node_count, 8}, | ||
{node_capacity, 24} | ||
], | ||
run(Opts). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding two calls to run, one with the default claim and one with the new?
The claim simulation code is setup as an eunit test, and can be invoked through rebar: The code writes out a file in It should be easy to modify that file To test a real cluster, be sure to set lager debugging to the {riak_core, [
%% {choose_claim_fun, {riak_core_claim, default_choose_claim}},
%% {wants_claim_fun, {riak_core_claim, default_wants_claim}},
{choose_claim_fun, {riak_core_new_claim, new_claim}},
{wants_claim_fun, {riak_core_new_claim, new_wants_claim}}, There's also new pretty-printer for %% Single-line pretty-print of ring, grouped into target_n segments
> riak_core_ring:pretty_print(Ring, []).
abcc|abcc|abcc|abcc|abcc|abcc|abcc|abcc|abcc|abcc|abca|abba|abba|abba|abba|abba|
%% Include legend in pretty-print
> riak_core_ring:pretty_print(Ring, [legend]).
==================================== Nodes ====================================
Node a: 34.4% dev1@127.0.0.1
Node b: 32.8% dev2@127.0.0.1
Node c: 32.8% dev3@127.0.0.1
==================================== Ring =====================================
abcc|abcc|abcc|abcc|abcc|abcc|abcc|abcc|abcc|abcc|abca|abba|abba|abba|abba|abba|
%% Print using numeric ids instead of letters (useful for cluster > 26 nodes)
riak_core_ring:pretty_print(Ring, [numeric]).
["1","2","3","3","1","2","3","3","1","2","3","3","1","2","3","3","1","2","3",
"3","1","2","3","3","1","2","3","3","1","2","3","3","1","2","3","3","1","2",
"3","3","1","2","3","1","1","2","2","1","1","2","2","1","1","2","2","1","1",
"2","2","1","1","2","2","1"] |
Another interesting thing to point out is that the default For example, the old algorithm's ring progression going from 1 to 4 nodes clearly shows that with 3 nodes most N=3 preflists are on distinct nodes:
For the new algorithm, the default settings result in the 3 node ring having repeated nodes in most N=3 preflists, because the new algorithm doesn't maximize spread until after the number of nodes is greater than
If, however,
|
Code review, test runs, and local cluster testing look good. +1 to merge. |
Change claim_simulation.erl eunit test to run a simulation with both the new and old claim algorithm as suggested. Rename riak_core_new_claim:new_claim/2 to new_choose_claim/2 to match default_choose_claim/2. Fix two bugs in riak_core_new_claim.erl that are on code paths that cannot occur in 1.0 due to existing invariants, but should be fixed nevertheless: - Match error in prefilter_violations: change CNth to {CNth, _}. - Handle case where new_choose_claim fails to claim partitions by falling back to claim_rebalance_n.
bz://1067: remove unused try_cast and fallback functions from riak_kv_ut
This pull-request adds a new partition claim algorithm in riak_core_new_claim.erl that aims to reduce the number of unneeded partition transfers. However, the previous algorithm (default_claim) remains the default for 1.0.
The default algorithm falls back to re-diagonalization far too often, making it so that adding a new node to the more often than not requires more than half of the partitions to be handed off, rather than the expected 1/N (where N is the number of nodes, including the new node).
The new algorithm tries more closely hit the 1/N target while still maintaining the target_n spread constraint.
In addition to the new algorithm, this pull-request also merges in a claim simulation harness originally contributed by Greg Nelson at Dropcam (https://gist.github.com/992317). This can be used to compare the efficiency of the new and old algorithms.