Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-6940. EC: Skip the EC container for balancer #3547

Merged
merged 2 commits into from Jul 13, 2022

Conversation

siddhantsangwan
Copy link
Contributor

What changes were proposed in this pull request?

Excluding EC containers from balancing by removing them from the set of Candidate containers selected in ContainerBalancerSelectionCriteria.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6940

How was this patch tested?

Existing UTs.


// remove EC containers
containerIDSet.removeIf(containerID -> {
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be cleaner to move the try-catch block from lambda to the isECContainer() method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @myskov. What's more, we could merge these conditions that require a container lookup. Something like this:

    // single removeIf
    containerIDSet.removeIf(this::shouldBeExcluded);
    ...
  }

  boolean shouldBeExcluded(ContainerID id) {
    ContainerInfo container;
    try {
      container = containerManager.getContainer(id);
    } catch (ContainerNotFoundException e) {
      LOG.warn(...);
      return true;
    }

    return isClosed(container) || ... || isECContainer(container);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestions. Added.

@JacksonYao287
Copy link
Contributor

@siddhantsangwan thanks for this patch. can you explain why we should skip EC container for now, and when can we take EC container into account?

@adoroszlai adoroszlai added the EC label Jun 28, 2022
@umamaheswararao
Copy link
Contributor

@JacksonYao287 We have fully done the actual recovery work. We also relaxed placement checks in replication flows currently. Unless we have ready analysis balancer can work with current version EC RM, we can switch off EC container from balancer just to avoid, not to go in wrong way. ( similar to how we were returning from RM for EC containers until before we started offline recovery work)

I see you have another patch where you are working to make Balancer work with EC. If you tested and works with balancer, we can remove the flag and enable it.

@JacksonYao287
Copy link
Contributor

@umamaheswararao there are two steps when balancing container:
1 containerBalancer will select a candidate container from a source over-utilized Dn, and find an under-utilized Dn as a target
2 call legacyRm#move to take this move action.

the only difference for EC container in step 1 is the placement policy, this is what is done in HDDS-6533, and i think it has nothing to do with RM. we can merge is first.

for EC container, we can just skip move in legacy in RM until we complete the RM related work.

@siddhantsangwan

if container balancer will select an EC container but the move will fail since we skip it now. so theoretically,there might be a scenario that an EC container might always be selected for move ,but move always fail and the utilization of the datanode will never change, which will lead to Infinite loop。 please notice this. we can fix this after RM related work is completed

@siddhantsangwan
Copy link
Contributor Author

Thanks for the reviews.

if container balancer will select an EC container but the move will fail since we skip it now. so theoretically,there might be a scenario that an EC container might always be selected for move ,but move always fail and the utilization of the datanode will never change, which will lead to Infinite loop。 please notice this. we can fix this after RM related work is completed

To avoid this scenario, we can merge this PR to exclude EC containers from balancing for now. We can then proceed with #3455 and finally include EC containers (by undoing the code in this PR) once steps 1 and 2 are complete. What do you all think?

@JacksonYao287
Copy link
Contributor

@siddhantsangwan i am ok with your proposal. lets skip EC container at both containerBalancer side and RM side until we complete refactoring RM

@umamaheswararao
Copy link
Contributor

Thank you @siddhantsangwan and @JacksonYao287 for the discussion and coming to a conclusion.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @siddhantsangwan for updating the patch, LGTM.

@siddhantsangwan
Copy link
Contributor Author

@myskov @JacksonYao287 Can you also review the latest changes please?

@umamaheswararao umamaheswararao merged commit 6afe31a into apache:master Jul 13, 2022
@umamaheswararao
Copy link
Contributor

I have just merged this as we have received approval. I have overlooked the above question from @siddhantsangwan sorry. Feel free to file a followup if you have any points to cover.

duongkame pushed a commit to duongkame/ozone that referenced this pull request Aug 16, 2022
HDDS-6940. EC: Skip the EC container for balancer (apache#3547)

(cherry picked from commit 6afe31a)
Change-Id: Ie4194f2122aedc789fe4339792afccec41e30709
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants