Enhance handoff repair tests #1369

Merged: 6 commits, Feb 6, 2023

martinsumner

verify_rejoin was a completely broken test. It is now fixed, with a section added at the end to prove the impact of the configuration on fallback repairs.

verify_handoff is extended to alter the batch_threshold_count.

The test has two functions:

Issue 1759 (KV). Confirm that, through all the leaves and rejoins, AAE does not falsely kick in because of old state left lingering.

Issue 994 (Core). Confirm that controlling read repair to fallbacks works as expected. Nodes that are configured not to repair fallbacks do not repair them, but still repair primaries.

Resolved "rogue repairs" caused by a sequence of events sparked by repairs prompted by key_amnesia, some of which occurred before the transfer completed.

Extended verify_rejoin to test partition repair via the new riak_client:repair_node() function.
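
The fallback-repair behaviour checked for Issue 994 can be pictured with a small model. This is only a Python sketch under stated assumptions, not Riak's implementation and not the riak_test code: `Replica`, `repair_targets` and the `repair_fallbacks` flag are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of one replica seen during a read."""
    node: str
    is_primary: bool  # False means the replica is held by a fallback
    is_stale: bool    # True if the replica is missing or out of date


def repair_targets(replicas, repair_fallbacks):
    """Return the nodes that would receive a read repair.

    Stale primaries are always repaired. Stale fallbacks are repaired
    only when repair_fallbacks is True (an assumed stand-in for the
    node configuration mentioned in the description).
    """
    return [
        r.node
        for r in replicas
        if r.is_stale and (r.is_primary or repair_fallbacks)
    ]


replicas = [
    Replica("node1", is_primary=True, is_stale=False),
    Replica("node2", is_primary=True, is_stale=True),
    Replica("node3", is_primary=False, is_stale=True),
]
```

With `repair_fallbacks=False` only the stale primary (`node2`) is selected, which is the property the description says the test confirms: fallbacks are skipped, primaries are still repaired.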
@martinsumner martinsumner merged commit b095fd7 into develop-3.0 Feb 6, 2023
martinsumner added a commit that referenced this pull request Feb 7, 2023
* Enhance handoff repair tests

* Use riak stats to confirm

The test has two functions:

Issue 1759 (KV). Confirm that, through all the leaves and rejoins, AAE does not falsely kick in because of old state left lingering.

Issue 994 (Core). Confirm that controlling read repair to fallbacks works as expected. Nodes that are configured not to repair fallbacks do not repair them, but still repair primaries.

* Clean of repairs - vnode_id issue

Resolved "rogue repairs" caused by a sequence of events sparked by repairs prompted by key_amnesia, some of which occurred before the transfer completed.

* Log on failure

* Add test for partition repair

Extend verify_rejoin to test partition repair via the new riak_client:repair_node() function.

* Reduce race chances
tburghart added a commit to OpenRiak/riak_test that referenced this pull request May 21, 2024
  Test PR PW when set by bucket properties (basho#1366)
  Extend test to use riak_client (basho#1367)
  Fix Riak version check in test
  Add tolerance for the occasional tombstone left hanging
  Issues of stability post partition
  Mas i1847 putapi (basho#1370)
  Enhance handoff repair tests (basho#1369)
  Mas delete and transfer (basho#1371)
  Test reliability improvements (basho#1372)
  Add AAE
  Update verify_rejoin.erl (basho#1373)
  Test range repl with node_confirms set to 2 (basho#1374)

wday-contrib 1100 1548

Co-authored-by: Martin Sumner <martinsumner@users.noreply.github.com>
Co-authored-by: Ted Burghart <tburghart@users.noreply.github.com>
Co-authored-by: Fred Dushin <fadushin@users.noreply.github.com>