HDDS-6940. EC: Skip the EC container for balancer #3547
// remove EC containers
containerIDSet.removeIf(containerID -> {
  try {
Wouldn't it be cleaner to move the try-catch block from the lambda into the isECContainer() method?
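Some background on why the try-catch ends up in the lambda in the first place: `Collection.removeIf` takes a `java.util.function.Predicate`, and `Predicate.test` declares no checked exceptions. A lookup that throws a checked exception (as the catch in the diff suggests this one does) must therefore be handled inside the lambda body unless a helper absorbs it. A minimal sketch of the suggestion, assuming simplified, hypothetical stand-ins for the container lookup and `ContainerNotFoundException` rather than Ozone's real classes:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class EcFilterSketch {
  // Hypothetical stand-ins, not Ozone's real classes.
  enum ReplicationType { RATIS, EC }

  static final class ContainerNotFoundException extends Exception {
    ContainerNotFoundException(String message) {
      super(message);
    }
  }

  private final Map<Long, ReplicationType> containers;

  EcFilterSketch(Map<Long, ReplicationType> containers) {
    this.containers = containers;
  }

  // Lookup that throws a checked exception, like the call wrapped in the diff.
  ReplicationType lookup(Long id) throws ContainerNotFoundException {
    ReplicationType type = containers.get(id);
    if (type == null) {
      throw new ContainerNotFoundException("Container " + id + " not found");
    }
    return type;
  }

  // The try-catch lives in the helper, so it fits Predicate<Long> directly.
  boolean isECContainer(Long id) {
    try {
      return lookup(id) == ReplicationType.EC;
    } catch (ContainerNotFoundException e) {
      // Sketch-only choice: an unknown container is reported as "not EC".
      return false;
    }
  }

  Set<Long> removeEcContainers(Set<Long> candidates) {
    Set<Long> remaining = new HashSet<>(candidates);
    remaining.removeIf(this::isECContainer); // no lambda, no try-catch here
    return remaining;
  }

  public static void main(String[] args) {
    EcFilterSketch sketch = new EcFilterSketch(
        Map.of(1L, ReplicationType.RATIS, 2L, ReplicationType.EC));
    // Container 2 (EC) is removed; unknown container 3 is kept.
    System.out.println(sketch.removeEcContainers(Set.of(1L, 2L, 3L)));
  }
}
```

Treating an unknown container as "not EC" is a choice made only for this sketch; whether such containers should also be dropped from the candidate set is a separate decision.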
I agree with @myskov. What's more, we could merge these conditions that require a container lookup. Something like this:
// single removeIf
containerIDSet.removeIf(this::shouldBeExcluded);
...
}

boolean shouldBeExcluded(ContainerID id) {
  ContainerInfo container;
  try {
    container = containerManager.getContainer(id);
  } catch (ContainerNotFoundException e) {
    LOG.warn(...);
    return true;
  }
  return isClosed(container) || ... || isECContainer(container);
}
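A runnable version of the merged predicate, as a sketch under assumptions: `ContainerInfo`, the lookup map, and the two exclusion rules (not `CLOSED`, EC replication) are hypothetical simplifications standing in for the elided `...` conditions, not the actual `ContainerBalancerSelectionCriteria` logic. What it shows is that each container is looked up once, and a missing container is logged and excluded in a single place:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SelectionCriteriaSketch {
  // Hypothetical, simplified stand-ins for Ozone's container metadata.
  enum State { OPEN, CLOSED }

  enum ReplicationType { RATIS, EC }

  static final class ContainerInfo {
    final State state;
    final ReplicationType type;

    ContainerInfo(State state, ReplicationType type) {
      this.state = state;
      this.type = type;
    }
  }

  static final class ContainerNotFoundException extends Exception {
    ContainerNotFoundException(String message) {
      super(message);
    }
  }

  private final Map<Long, ContainerInfo> containers;

  SelectionCriteriaSketch(Map<Long, ContainerInfo> containers) {
    this.containers = containers;
  }

  ContainerInfo getContainer(Long id) throws ContainerNotFoundException {
    ContainerInfo info = containers.get(id);
    if (info == null) {
      throw new ContainerNotFoundException("Container " + id + " not found");
    }
    return info;
  }

  // One lookup per container; every exclusion rule reuses its result.
  boolean shouldBeExcluded(Long id) {
    ContainerInfo container;
    try {
      container = getContainer(id);
    } catch (ContainerNotFoundException e) {
      // Stand-in for LOG.warn(...): missing containers are excluded.
      System.err.println("Excluding " + id + ": " + e.getMessage());
      return true;
    }
    // Illustrative rules only: keep CLOSED, non-EC containers.
    return container.state != State.CLOSED
        || container.type == ReplicationType.EC;
  }

  Set<Long> candidates(Set<Long> ids) {
    Set<Long> result = new HashSet<>(ids);
    result.removeIf(this::shouldBeExcluded); // single removeIf
    return result;
  }

  public static void main(String[] args) {
    SelectionCriteriaSketch criteria = new SelectionCriteriaSketch(Map.of(
        1L, new ContainerInfo(State.CLOSED, ReplicationType.RATIS),
        2L, new ContainerInfo(State.CLOSED, ReplicationType.EC),
        3L, new ContainerInfo(State.OPEN, ReplicationType.RATIS)));
    // 2 is EC, 3 is not CLOSED, 4 does not exist: only 1 remains.
    System.out.println(criteria.candidates(Set.of(1L, 2L, 3L, 4L))); // [1]
  }
}
```

Adding another exclusion rule then means adding one more term to the final boolean expression, with no extra lookup or try-catch.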
Great suggestions. Added.
@siddhantsangwan Thanks for this patch. Can you explain why we should skip EC containers for now, and when we can take EC containers into account?
@JacksonYao287 We have fully done the actual recovery work. We have also relaxed the placement checks in the replication flows for now. Unless we have an analysis showing that the balancer can work with the current version of the EC RM, we can switch EC containers off in the balancer, just to avoid going the wrong way (similar to how RM returned early for EC containers until we started the offline recovery work). I see you have another patch in which you are working to make the Balancer work with EC. Once it is tested and works with the balancer, we can remove the flag and enable it.
@umamaheswararao There are two steps when balancing a container. For EC containers, the only difference in step 1 is the placement policy; this is what HDDS-6533 does, and I think it has nothing to do with RM, so we can merge it first. For EC containers, we can just skip the move in the legacy RM until we complete the RM-related work. However, the container balancer may still select an EC container, and the move will fail since RM skips it for now. So, theoretically, there might be a scenario where an EC container is always selected for a move, but the move always fails and the utilization of the datanode never changes, which would lead to an infinite loop. Please note this. We can fix it after the RM-related work is completed.
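A toy simulation of the failure mode described above, under stated assumptions: the single-node model, the largest-first selection rule, the iteration cap, and all names are hypothetical and are not Ozone's actual balancer logic. The only behavior borrowed from the discussion is that a move of an EC container is skipped by RM and therefore fails:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BalancerLoopSketch {
  // Hypothetical toy model, not Ozone's balancer: one over-utilized node.
  static final class Container {
    final long id;
    final long sizeGb;
    final boolean ec;

    Container(long id, long sizeGb, boolean ec) {
      this.id = id;
      this.sizeGb = sizeGb;
      this.ec = ec;
    }
  }

  // Stand-in for RM skipping EC moves: they always "fail".
  static boolean move(Container c) {
    return !c.ec;
  }

  /**
   * Returns the iteration on which usedGb first drops to thresholdGb or below,
   * or -1 if that does not happen before candidates run out or
   * maxIterations is reached.
   */
  static int balance(List<Container> onNode, long usedGb, long thresholdGb,
                     int maxIterations, boolean excludeEc) {
    List<Container> candidates = new ArrayList<>(onNode);
    if (excludeEc) {
      candidates.removeIf(c -> c.ec); // filter EC containers up front
    }
    // Toy selection rule: always try the largest remaining container first.
    candidates.sort(Comparator.comparingLong((Container c) -> c.sizeGb).reversed());

    for (int i = 1; i <= maxIterations; i++) {
      if (candidates.isEmpty()) {
        return -1;
      }
      Container selected = candidates.get(0);
      if (move(selected)) {
        usedGb -= selected.sizeGb;
        candidates.remove(0);
      }
      // On failure nothing changes, so the same container is selected again.
      if (usedGb <= thresholdGb) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Container> node = List.of(
        new Container(1, 10, true),
        new Container(2, 5, false),
        new Container(3, 5, false));
    System.out.println(balance(node, 100, 92, 50, false)); // -1
    System.out.println(balance(node, 100, 92, 50, true));  // 2
  }
}
```

In this toy model, the first call only terminates because of the iteration cap: the EC container is re-selected on every iteration and utilization never moves. Filtering EC containers out of the candidate set lets the remaining containers bring the node under the threshold.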
Thanks for the reviews.
To avoid this scenario, we can merge this PR to exclude EC containers from balancing for now. We can then proceed with #3455 and finally include EC containers (by undoing the code in this PR) once steps 1 and 2 are complete. What do you all think?
@siddhantsangwan I am OK with your proposal. Let's skip EC containers on both the container balancer side and the RM side until we complete the RM refactoring.
Thank you @siddhantsangwan and @JacksonYao287 for the discussion and for coming to a conclusion.
Thanks @siddhantsangwan for updating the patch, LGTM.
@myskov @JacksonYao287 Can you also review the latest changes, please?
I have just merged this since we had received approval. Sorry, I overlooked the question above from @siddhantsangwan. Feel free to file a follow-up if you have any points to cover.
HDDS-6940. EC: Skip the EC container for balancer (apache#3547) (cherry picked from commit 6afe31a) Change-Id: Ie4194f2122aedc789fe4339792afccec41e30709
What changes were proposed in this pull request?
Excluding EC containers from balancing by removing them from the set of candidate containers selected in ContainerBalancerSelectionCriteria.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6940
How was this patch tested?
Existing UTs.