HDDS-9592. Replication Manager: Save UNHEALTHY replicas with highest BCSID for a QUASI_CLOSED container #5794
Merged
siddhantsangwan merged 5 commits into apache:master on Dec 20, 2023
Contributor
Author
I've added some tests to …
sodonnel
approved these changes
Dec 18, 2023
Contributor
sodonnel
left a comment
Thanks for working on this. LGTM.
Contributor
Author
@sodonnel Thanks for the review. Merging to the master branch.
symious
pushed a commit
that referenced
this pull request
Dec 20, 2023
…BCSID for a QUASI_CLOSED container (#5794)
jojochuang
pushed a commit
to jojochuang/ozone
that referenced
this pull request
Feb 1, 2024
…ith highest BCSID for a QUASI_CLOSED container (apache#5794) (cherry picked from commit faa1990) Change-Id: Id7e86699c64a48eaa0005018cfaffb348243aeef
What changes were proposed in this pull request?
A QUASI_CLOSED container may have some UNHEALTHY replicas with the same sequence id as the container, while there are no healthy replicas with the correct sequence id. Such UNHEALTHY replicas cannot be deleted and must be kept around.

If the DN hosting such an UNHEALTHY replica is put in decommission, then decommission will stay blocked because the UNHEALTHY replica cannot be lost, but at the same time RM currently does nothing about it. This PR tries to do something about these vulnerable UNHEALTHY replicas so that decommission can succeed.

Changes introduced:
1. VulnerableUnhealthyReplicasHandler, which leverages the existing replicaCount.getVulnerableUnhealthyReplicas API to find such UNHEALTHY replicas. If found, the container is marked as under replicated and added to the under replication queue.
2. RatisUnderReplicationHandler. It tries to find a new target DN for each UNHEALTHY replica and sends replicate commands. The logic is similar to what we have already done for legacy RM. Some additional changes were required to correctly find out the used and excluded nodes to pass into the placement policy API for finding target DNs.
3. replicaSet.isHealthyEnoughForOffline API.

The third point above basically solves "ReplicationManager: Unhealthy replicas of a sufficiently replicated container can block decommissioning". If required, this can be split off into its own PR since this one is quite large.
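For readers unfamiliar with the check behind the first change, here is a minimal sketch of the idea as described above. It is not the code in this PR: the class, enum, and method names (VulnerableReplicaSketch, Replica, findVulnerableUnhealthyReplicas, shouldQueueAsUnderReplicated) are hypothetical, and the sketch ignores details such as datanode operational state and replica origin. It only shows the rule "UNHEALTHY replicas at the container's sequence id are vulnerable when no healthy replica has that sequence id".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model for illustration; these are NOT Ozone's real classes.
class VulnerableReplicaSketch {

  enum ContainerState { OPEN, QUASI_CLOSED, CLOSED }

  enum ReplicaState { QUASI_CLOSED, CLOSED, UNHEALTHY }

  static final class Replica {
    final String datanode;
    final ReplicaState state;
    final long sequenceId;

    Replica(String datanode, ReplicaState state, long sequenceId) {
      this.datanode = datanode;
      this.state = state;
      this.sequenceId = sequenceId;
    }
  }

  /**
   * Returns the UNHEALTHY replicas that carry the container's sequence id,
   * but only if the container is QUASI_CLOSED and no healthy replica
   * carries that sequence id. Otherwise returns an empty list.
   */
  static List<Replica> findVulnerableUnhealthyReplicas(
      ContainerState containerState, long containerSequenceId, List<Replica> replicas) {
    List<Replica> vulnerable = new ArrayList<>();
    if (containerState != ContainerState.QUASI_CLOSED) {
      return vulnerable;
    }
    for (Replica replica : replicas) {
      if (replica.sequenceId != containerSequenceId) {
        continue; // replicas at another sequence id are not relevant here
      }
      if (replica.state != ReplicaState.UNHEALTHY) {
        // A healthy replica already has the correct sequence id,
        // so the UNHEALTHY ones are not the last good copy.
        return new ArrayList<>();
      }
      vulnerable.add(replica);
    }
    return vulnerable;
  }

  /** Assumed handler behaviour: queue the container if any vulnerable replica exists. */
  static boolean shouldQueueAsUnderReplicated(
      ContainerState containerState, long containerSequenceId, List<Replica> replicas) {
    return !findVulnerableUnhealthyReplicas(containerState, containerSequenceId, replicas).isEmpty();
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", ReplicaState.UNHEALTHY, 10L),
        new Replica("dn2", ReplicaState.UNHEALTHY, 10L),
        new Replica("dn3", ReplicaState.QUASI_CLOSED, 8L));
    for (Replica r : findVulnerableUnhealthyReplicas(ContainerState.QUASI_CLOSED, 10L, replicas)) {
      System.out.println("vulnerable UNHEALTHY replica on " + r.datanode);
    }
  }
}
```

In this model, a container for which shouldQueueAsUnderReplicated returns true would be handed to the under replication queue, where the second change then picks new target DNs for the vulnerable replicas.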
Need to add some more tests to TestRatisUnderReplicationHandler.

What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9592
How was this patch tested?
New tests.