Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved location claim algorithm #1008

Merged
merged 32 commits into from Jun 12, 2023
Merged

Conversation

martinsumner
Copy link
Contributor

@martinsumner martinsumner commented May 26, 2023

This PR adds a new v4 claim algorithm, which significantly improves the efficiency of claim with location awareness.

The algorithm has a brute force approach, whereby it will in the plan stage potentially do an exhaustive search of the possible swaps and moves, in order to provide a ring which meets the target n_val for both nodes and locations. Firstly by trying to minimise the number of changes, and if that is unsuccessful it will attempt to find a valid solution without concern for the plan complexity.

To enable the feature, set via riak.conf:

choose_claim_fun -> choose_claim_v4
full_rebalance_onleave -> on
target_n_val -> 4 (default)
target_location_n_val -> 3` (default)

riak_test testa are added and updated in basho/riak_test#1375.

There is a write-up in docs/claim-version4.md.

In some cases cluster plans may take much longer than at present, potentially > 60s.

The new claim algorithm was developed by Ulf Norell and Thomas Arts.

martinsumner and others added 30 commits March 27, 2023 13:18
Add claim function and target_n_val configuration into cuttlefish.

Move modules around to try and make it more obvious where functions used in membership reside.
Stops fake wants being required to prompt claim on a location change.

Allow for a claim module to implement a sort_members_for_choose(Ring, Members, Owners) -> SortedMembers function, to pre-sort the members being passed into claim_rebalance.

Add further specs.
Location claim improved so that it will try to balance the spread of vnodes, if it reaches the end and is still unbalanced.

Also uses a stronger meets_target_n to fallback to sequential_claim more reliably on incorrect spacing (of vnodes across nodes, but not yet across locations).
Extended potential for test by determining what nodes are safe to both add and remove from loops, rather than simply relying on sequential order.
Resolves some test issues.  Also try harder to do safe removals when looking back in the ring (as opposed to the removal other additions)
* Support two transition changes

Where the second transition is triggered by a change of location.  Need to ensure that the location_changed status update is recognised in the ring

* Unrelated fix to remove reference to gen_fsm_compat

* unrelated fix to get rid of deprecation warning

* Testing claim

* The new claim algorithm as purely functional algorithm

* add new entry for version 5 claiming

* Refactor v5 into v4

* move impossible config test to place where we actually may enter recursion

* Documentation

The algorithm should be described in more detail in a markup document

* Allow configurations with zero nodes in location for better placement update

This works better when a location is emptied on nodes.
Less transfers.

* Keep order of nodes to avoid back translate issue

---------

Co-authored-by: Martin Sumner <martin.sumner@adaptip.co.uk>
Otherwise, if all vnodes have become excluded there is no escape from this condition (unless other traffic can trigger the creation of vnodes).  This is helpful in situations where transfers are performed on standby clusters with no other traffic.

This commit also logs a timing of the claim each time it is called.
To allow for the riak_core_ring_manager and riak_core_claimant to remember v4 solutions, they are shared via the state of the claimant
* Add an extra test to show owners may stay the same if only location changes

This is not always the case, but holds when there is a solution in the first place

* Fix type error that dialyzer could not find

* Introduce necessary conditions to fallback to version 2

* update tests

* Check whether it is worth to use brute force

* make historic values the norm

* Introduce nvals map type

* Take nr nodes into account when checking for brute force cond.

* Property to evaluate skipping brute force strategy

* QuickCheck property starts with choosing ring size.

* Remove fallback for necessary conditions

* Filter tests to get away with flakyness

* In order to re-run tests suite, remove strict precondition

* Check in test suite

* Replace claim_suite.suite by larger claim.suite

* Sometimes it is worth to brute_force to a zero node violation

* better documentation binring algorithm

* Run property with a sufficient condition
This reverts commit 6527033.
The cache of v4 solutions is required by the ring_manager and the claimant - so specifically update both of these processes each time.  Otherwise cache will be missed when the ring_manager calls to riak_core_claimant:ring_changed/2.

There is a fix to the last gasp check before writing the ring file.  prune_write_notify_ring function does not care if the write of a ring errors - so error for this function rather than crashing the ring manager.  This otherwise causes instability in location tests.
* Remove pre-computed test suite

* cleanup

* Make claim_eqc tests not fail on weird configs by supplying a diverse list of options
The leave call on a failure of simple_transfer will call sequential_claim - which is part of the v2 claim family.  Now we have v4, if this is configured it should call v4 as it does handle leaves.
Property temporarily changed to consider only failures with locations
* Use application env to read target_n_val

* Re-introduce v2 in riak_core_claim_eqc

* Move precondition to postcondition to also test less perfect cases

* Fixed error in transfer_node usage

* cleanup not using remove_from_cluster/5.
In this case - full_rebalance should be enabled
Add recommendation to use full_rebalance_on_leave for locations
@martinsumner
Copy link
Contributor Author

@systream - as you did the original location awareness, I suspect you may have an interest in testing this.

@martinsumner
Copy link
Contributor Author

basho/riak_test#1375

Copy link
Contributor

@ThomasArts ThomasArts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

* Change doc

Change introduction to refer to vnodes and nodes.  Removes the recommendation not to vary location_n_val and node_n_val.

* Update comments on having different target n_vals

* Further doc updates

* Update docs/claim-version4.md

* Update docs/claim-version4.md

* Update docs/claim-version4.md

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

* Update docs/claim-version4.md

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

---------

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
@martinsumner martinsumner merged commit 7b04e2c into develop-3.0 Jun 12, 2023
1 check passed
@martinsumner martinsumner deleted the mas-i1001-propclaim-leave branch June 12, 2023 11:52
martinsumner added a commit that referenced this pull request Jun 19, 2023
* Configuration and module alignment

Add claim function and target_n_val configuration into cuttlefish.

Move modules around to try and make it more obvious where functions used in membership reside.

* Remove deprecated v1 claim/wants

* Update framework for Claim

Stops fake wants being required to prompt claim on a location change.

Allow for a claim module to implement a sort_members_for_choose(Ring, Members, Owners) -> SortedMembers function, to pre-sort the members being passed into claim_rebalance.

Add further specs.

* Add choose_claim_v4

* Location claim improvements

Location claim improved so that it will try to balance the spread of vnodes, if it reaches the end and is still unbalanced.

Also uses a stronger meets_target_n to fallback to sequential_claim more reliably on incorrect spacing (of vnodes across nodes, but not yet across locations).

* Refinements to claim_v4

Extended potential for test by determining what nodes are safe to both add and remove from loops, rather than simply relying on sequential order.

* Correction following removal of log

* Count remove list not Excess to determine loops

* Better order of initial striping

Resolves some test issues.  Also try harder to do safe removals when looking back in the ring (as opposed to the removal other additions)

* A new claim algorithm (#1003)

* Support two transition changes

Where the second transition is triggered by a change of location.  Need to ensure that the location_changed status update is recognised in the ring

* Unrelated fix to remove reference to gen_fsm_compat

* unrelated fix to get rid of deprecation warning

* Testing claim

* The new claim algorithm as purely functional algorithm

* add new entry for version 5 claiming

* Refactor v5 into v4

* move impossible config test to place where we actually may enter recursion

* Documentation

The algorithm should be described in more detail in a markup document

* Allow configurations with zero nodes in location for better placement update

This works better when a location is emptied on nodes.
Less transfers.

* Keep order of nodes to avoid back translate issue

---------

Co-authored-by: Martin Sumner <martin.sumner@adaptip.co.uk>

* Claim API requires export of 2-arity choose function as well as 3-arity

* Always return indices

Otherwise, if all vnodes have become excluded there is no escape from this condition (unless other traffic can trigger the creation of vnodes).  This is helpful in situations where transfers are performed on standby clusters with no other traffic.

This commit also logs a timing of the claim each time it is called.

* Calculate swaps only once

* Remember v4 solutions via claimant

To allow for the riak_core_ring_manager and riak_core_claimant to remember v4 solutions, they are shared via the state of the claimant

* Long-running tests

* Adding an extra test (#1004)

* Add an extra test to show owners may stay the same if only location changes

This is not always the case, but holds when there is a solution in the first place

* Fix type error that dialyzer could not find

* Introduce necessary conditions to fallback to version 2

* update tests

* Check whether it is worth to use brute force

* make historic values the norm

* Introduce nvals map type

* Take nr nodes into account when checking for brute force cond.

* Property to evaluate skipping brute force strategy

* QuickCheck property starts with choosing ring size.

* Remove fallback for necessary conditions

* Filter tests to get away with flakyness

* In order to re-run tests suite, remove strict precondition

* Check in test suite

* Replace claim_suite.suite by larger claim.suite

* Sometimes it is worth to brute_force to a zero node violation

* better documentation binring algorithm

* Run property with a sufficient condition

* Revert "Long-running tests"

This reverts commit 6527033.

* Test adjustments

* Test adjustments

* Add support for configured target_location_n_val

* Memoise fixes

The cache of v4 solutions is required by the ring_manager and the claimant - so specifically update both of these processes each time.  Otherwise cache will be missed when the ring_manager calls to riak_core_claimant:ring_changed/2.

There is a fix to the last gasp check before writing the ring file.  prune_write_notify_ring function does not care if the write of a ring errors - so error for this function rather than crashing the ring manager.  This otherwise causes instability in location tests.

* Example configurations saves in source format (#1005)

* Remove pre-computed test suite

* cleanup

* Make claim_eqc tests not fail on weird configs by supplying a diverse list of options

* Add full-rebalance for v4

The leave call on a failure of simple_transfer will call sequential_claim - which is part of the v2 claim family.  Now we have v4, if this is configured it should call v4 as it does handle leaves.

* Support leave in prop_claim

* Update - to use correct claim_fun on leave

Property temporarily changed to consider only failures with locations

* Use application env to read target_n_val (#1007)

* Use application env to read target_n_val

* Re-introduce v2 in riak_core_claim_eqc

* Move precondition to postcondition to also test less perfect cases

* Fixed error in transfer_node usage

* cleanup not using remove_from_cluster/5.

* Add warning if simple_transfer produces unbalanced result

In this case - full_rebalance should be enabled

* only_swap/swap_only confusion

Add recommendation to use full_rebalance_on_leave for locations

* Update riak_core_claim_eqc.erl

* Mas i1001 docupdate (#1009)

* Change doc

Change introduction to refer to vnodes and nodes.  Removes the recommendation not to vary location_n_val and node_n_val.

* Update comments on having different target n_vals

* Further doc updates

* Update docs/claim-version4.md

* Update docs/claim-version4.md

* Update docs/claim-version4.md

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

* Update docs/claim-version4.md

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

---------

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>

---------

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants