ClaimV3#183

Closed
jonmeredith wants to merge 7 commits into jdm-claim-sim from jdm-claim-v3

Conversation

@jonmeredith
Contributor

Claim V3 - unlike the v1/v2 algorithms, v3 treats claim as an optimization problem. In its current form it creates a number of possible claim plans and evaluates them for violations, balance and diversity, choosing the 'best' plan.
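
As a rough illustration of the "generate plans, score them, pick the best" step, here is a minimal Python sketch (not the riak_core Erlang code). The description does not say how the three scores are combined, so this assumes a lexicographic comparison: fewest violations first, then balance, then diversity. `choose_best_plan` and the plan labels are hypothetical names used only for this example.

```python
def choose_best_plan(scored_plans):
    """Pick the plan with the lowest (violations, balance, diversity) tuple.

    scored_plans is an iterable of (plan, (violations, balance, diversity)).
    ASSUMPTION: scores are compared lexicographically; the PR text only says
    that lower is better for each score, not how they are weighed together.
    """
    best_plan, _best_score = min(scored_plans, key=lambda entry: entry[1])
    return best_plan


# Hypothetical candidate plans with made-up scores.
candidates = [
    ("plan_a", (1, 0.0, 0.0)),  # perfectly balanced, but has a violation
    ("plan_b", (0, 0.5, 0.2)),
    ("plan_c", (0, 0.5, 0.1)),  # ties plan_b until the diversity score
]
best = choose_best_plan(candidates)
```

With this ordering a plan with no violations always beats a plan with any, regardless of its other scores, which matches the "0 is desired if at all possible" wording but is still only one possible reading.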

Violations are a count of how many partitions owned by the same node are within target-n of one another. Lower is better, 0 is desired if at all possible.
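
A minimal Python sketch of the violation count (not the actual Erlang implementation). It assumes the ring is a list of owners indexed by partition position, that the ring wraps around, and that "within target-n" means fewer than `target_n` positions apart. `count_violations` is a hypothetical name.

```python
def count_violations(owners, target_n):
    """Count same-owner partition pairs fewer than target_n positions apart.

    owners[i] is the node owning the i-th partition on the ring.
    ASSUMPTIONS: the ring wraps around, "within target-n" means a distance
    strictly less than target_n, and the ring is much larger than target_n
    (so a pair is never counted from both directions).
    """
    ring_size = len(owners)
    violations = 0
    for i in range(ring_size):
        # Look clockwise at the next target_n - 1 partitions.
        for distance in range(1, min(target_n, ring_size)):
            j = (i + distance) % ring_size
            if owners[i] == owners[j]:
                violations += 1
    return violations


clean_ring = ["a", "b", "c", "d", "a", "b", "c", "d"]
bad_ring = ["a", "a", "b", "c", "a", "b", "c", "d"]
clean_count = count_violations(clean_ring, 4)
bad_count = count_violations(bad_ring, 3)
```

In `bad_ring` the only offending pair is the two adjacent partitions owned by `a` at positions 0 and 1.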

Balance is a measure of the number of partitions owned versus the number of partitions wanted. Want is supplied to the algorithm by the caller as a list of node/count pairs. The score for deviation is the RMS of the difference between what each node wanted and what it has. Lower is better, 0 if all wants are met.
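
The balance score can be sketched in a few lines of Python (an illustration, not the riak_core code). It assumes wants arrive as `(node, count)` pairs and that the RMS is taken over the nodes listed in the wants. `balance_score` is a hypothetical name.

```python
import math
from collections import Counter


def balance_score(owners, wants):
    """RMS of (wanted - owned) over the nodes listed in wants.

    owners is the list of partition owners; wants is a list of
    (node, wanted_count) pairs.
    ASSUMPTION: nodes absent from the ring count as owning 0 partitions.
    """
    owned = Counter(owners)
    diffs = [wanted - owned.get(node, 0) for node, wanted in wants]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


even = balance_score(["a", "b", "a", "b"], [("a", 2), ("b", 2)])
skewed = balance_score(["a", "a", "a", "b"], [("a", 2), ("b", 2)])
```

In the skewed ring each node is off by one partition, so the RMS is 1.0. In the even ring every want is met and the score is 0.0.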

Diversity measures how often nodes are close to one another in the preference list. The more diverse (spread of distances apart), the more evenly the responsibility for a failed node is spread across the cluster. Diversity is calculated by working out the count of each distance for each node pair (currently distances are limited up to target N) and computing the RMS on that. Lower diversity score is better, 0 if nodes are perfectly diverse.
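
To make the diversity idea concrete, here is a hedged Python sketch (not the Erlang implementation). The description does not say what the per-distance counts are compared against, so this assumes the RMS is taken over each count's deviation from the mean count, which yields 0 when every ordered node pair appears equally often at every distance. It also assumes distances run from 1 to `target_n` inclusive. `diversity_score` is a hypothetical name.

```python
import math
from itertools import permutations


def diversity_score(owners, target_n):
    """RMS deviation of (node pair, distance) counts from their mean.

    ASSUMPTIONS: pairs are ordered (a followed clockwise by b), distances
    run from 1 to target_n inclusive, the ring wraps around, and same-node
    pairs are ignored here because they are scored as violations instead.
    """
    ring_size = len(owners)
    nodes = sorted(set(owners))
    counts = {
        (a, b, d): 0
        for a, b in permutations(nodes, 2)
        for d in range(1, target_n + 1)
    }
    for i in range(ring_size):
        for d in range(1, target_n + 1):
            a, b = owners[i], owners[(i + d) % ring_size]
            if a != b:
                counts[(a, b, d)] += 1
    if not counts:
        return 0.0  # a single-node ring has no pairs to diversify
    mean = sum(counts.values()) / len(counts)
    return math.sqrt(sum((c - mean) ** 2 for c in counts.values()) / len(counts))


round_robin = diversity_score(list("abcabc"), 1)
mixed = diversity_score(list("abcacb"), 1)
```

In the round-robin ring `a` is always followed by `b` and never by `c`, so if `b` fails its load always shifts the same way. The mixed ordering has each node followed by each other node exactly once and scores 0 under this definition.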

For testing and playing around, you may want to look at running riak_core_claim_sim:commission(), which will run comparisons against the v1/v2 claim algorithms for some preset combinations of ring size, number of nodes, and target n.

You could also play with riak_core_claim_sim:print_failure_analysis(Ring, TargetN, NumFailures), which prints reports on the console showing how a ring fares given a target N and a number of nodes to fail.

This code is based on the jdm-claim-sim branch - conflicts with the cluster management changes will need to be resolved between the three branches before we can merge.

@ghost ghost assigned russelldb May 31, 2012
Contributor

In the riak_core.app file and in default_choose_params/1, the default target_n_val is 4:

TN = app_helper:get_env(riak_core, target_n_val, 4),

though the docs at the top of the file say the default target_n_val is 3…

Contributor Author

We should probably keep it set at 4 until we have a filter on the fallback preflists to ensure node uniqueness for as long as possible.

@russelldb
Contributor

A few comments about comments and surprising debug output above. Tested using the provided eqc tests, and a bit of prodding by hand.

Wondering if we want to add metrics for time to claim, and maybe the claim "metrics" (diversified / diagonal / violations etc)?

All works for me, +1 to merge.

Jon Meredith added 7 commits June 9, 2012 17:20
Claim V3 - unlike the v1/v2 algorithms, v3 treats claim as an optimization problem.
In its current form it creates a number of possible claim plans and evaluates
them for violations, balance and diversity, choosing the 'best' plan.

Violations are a count of how many partitions owned by the same node are within target-n
of one another. Lower is better, 0 is desired if at all possible.

Balance is a measure of the number of partitions owned versus the number of partitions
wanted.  Want is supplied to the algorithm by the caller as a list of node/counts.  The
score for deviation is the RMS of the difference between what the node wanted and what it
has.  Lower is better, 0 if all wants are met.

Diversity measures how often nodes are close to one another in the preference
list.  The more diverse (spread of distances apart), the more evenly the
responsibility for a failed node is spread across the cluster.  Diversity is
calculated by working out the count of each distance for each node pair
(currently distances are limited up to target N) and computing the RMS on that.
Lower diversity score is better, 0 if nodes are perfectly diverse.
diversify is still used if claimv3 cannot find a plan without
violations.

Reduced lager:debug output and made it more readable.
metadata changed by the claim function is not persisted to the
raw ring.
Also added a force_claim environment variable to trigger a one
time reclaim.
@jonmeredith
Contributor Author

rebased and merged

@ghost ghost assigned rzezeski and jonmeredith Jun 11, 2012
@seancribbs seancribbs deleted the jdm-claim-v3 branch April 1, 2015 22:59
@seancribbs seancribbs restored the jdm-claim-v3 branch April 1, 2015 22:59
@martincox martincox deleted the jdm-claim-v3 branch June 14, 2019 09:44
3 participants