HDDS-8369. Decommissioning with rack aware placement policy does not replicate to correct rack. #4556
Conversation
```java
// are on different racks.
for (int i = 0; i < excludedNodesCount; i++) {
  for (int j = i + 1; j < excludedNodesCount; j++) {
    if (excludedNodes.get(j).isDecomissioned()) {
```
excludedNodes should not contain decommissioned nodes. Shouldn't the ReplicationManager remove decommissioned nodes from excludedNodes before invoking the rack-aware policy?
@krishnaasawa1 thanks for the review. I have removed decommissioned nodes from excludedNodes in the ReplicationManager as well. Should we also keep the check in the topology so decommissioned nodes are not considered by the rack-aware logic?
When picking new nodes, you have to pass in all the nodes that are not allowed to be used for a new replica, and that should include all the nodes that already have a replica, as they cannot host a new one. The issue here is that SCMContainerPlacementRackAware uses "excluded" nodes to mean something like "existing nodes I want to replace", but nodes could be excluded for other reasons too, such as being overloaded.
We changed the interface to pass "usedNodes" and "excludedNodes" separately, but until https://issues.apache.org/jira/browse/HDDS-7226 is implemented, the usedNodes are not used in SCMContainerPlacementRackAware.
Looking around the code, it is not strictly necessary to exclude a decommissioning node, as the placement policies check whether each selected node is valid, and one of those checks is to ensure the node is IN_SERVICE.
Therefore I don't think we need the "isDecommissioned" check here. Either we should fix the placement policy to use usedNodes correctly, or we should just not pass the decommissioned node in the exclude list to begin with.
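To make the intended semantics concrete, here is a minimal sketch of the two-list idea. The signature is assumed and simplified (it only fully takes effect once HDDS-7226 lands), and the helper names are hypothetical:

```java
import java.util.List;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Illustrative only: "used" nodes already hold replicas that will remain,
// so new targets must be placed relative to their racks; "excluded" nodes
// are unusable for a new replica for any other reason (overloaded, etc.).
List<DatanodeDetails> usedNodes = replicasThatWillRemain();     // hypothetical helper
List<DatanodeDetails> excludedNodes = nodesUnusableAsTargets(); // hypothetical helper

// Assumed/simplified signature based on the interface change described above:
List<DatanodeDetails> targets = placementPolicy.chooseDatanodes(
    usedNodes, excludedNodes, /* favoredNodes */ null,
    /* nodesRequired */ 1, /* metadataSize */ 0, /* dataSize */ containerSize);
```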
@ChenSammi, @siddhantsangwan, can you please help review?
```java
// maintenance nodes, as the replicas will remain present in the
// container manager, even when they go dead.
.filter(r -> getNodeStatus(r.getDatanodeDetails()).isHealthy()
    && !r.getDatanodeDetails().isDecomissioned()
```
I don't think this is the correct place to filter out decommissioned nodes. This is forming a list of replication source nodes - it is valid to replicate from a decommissioning host, and in some cases it is necessary.
The correct place to filter out the decommissioning / maintenance nodes would be in the method "replicateAnyWithTopology", as that is where the exclude list is formed before being passed into the placement policy. But we would also need to take care of this in the new replication manager.
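As a rough illustration of that idea (variable and helper names are mine, not the actual patch), the exclude list built in replicateAnyWithTopology could keep only IN_SERVICE replica hosts:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;
import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeOperationalState;
import org.apache.hadoop.hdds.scm.container.ContainerReplica;

// Sketch: drop DECOMMISSIONING / IN_MAINTENANCE hosts from the exclude list
// so they do not skew the rack choice made by the placement policy.
// getNodeStatus(...) is assumed to resolve a node's operational state.
List<DatanodeDetails> excludedNodes = replicas.stream()
    .map(ContainerReplica::getDatanodeDetails)
    .filter(dn -> getNodeStatus(dn).getOperationalState()
        == NodeOperationalState.IN_SERVICE)
    .collect(Collectors.toList());
```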
Hi @sodonnel, thanks for the review. I have handled this for both the legacy and the new replication manager.
I think we should look at the difficulty of fixing https://issues.apache.org/jira/browse/HDDS-7226 before proceeding with this fix. It would be easy for other code areas to fall into a similar trap to this one, and ultimately we need to fix https://issues.apache.org/jira/browse/HDDS-7226 anyway.
Considering this is a two-line fix, is there any harm in letting it go in and then taking on HDDS-7226?
It needs a unit test, at least in the non-legacy RM. I also think it just papers over the problem, and we should fix it correctly by addressing HDDS-7226 and then adjusting the code to use usedNodes and excludedNodes correctly. If something changes in the topology, it could end up returning the decommissioning node, as we are no longer excluding it. Fixing HDDS-7226 should not be difficult, and it will give a more robust solution going forward.
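Something along these lines could serve as the unit test. This is an illustrative JUnit skeleton only, with assumed class/method names, not the actual test added in this PR:

```java
import org.junit.jupiter.api.Test;

// Illustrative skeleton, assuming a test harness that can register
// datanodes with racks and operational states.
@Test
public void testDecommissioningReplicaDoesNotPinRackChoice() {
  // given: a container with replicas on rack0 (IN_SERVICE) and
  //        rack1 (DECOMMISSIONING), in a cluster with racks 0..2
  // when:  ReplicationManager requests one replication target
  // then:  the target is not the decommissioning node, and the final
  //        placement still spans at least 2 racks
}
```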
Agreed, we need a UT at least. I am not sure how much work HDDS-7226 is going to be, and Ashish mentioned we will need a follow-up fix after HDDS-7226 to address this specific issue. So we need to balance the short-term problem with the long-term fix. Ashish, can you discuss with Stephen and decide a path forward?
It's debatable whether this is actually a problem at all - the placement policy says the container should be on at least 2 racks, not exactly 2 racks. The "problem" is that it ends up on more than 2 racks, which isn't really a problem.
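For reference, the satisfaction check is of the "at least" form, roughly like the sketch below (simplified, with a hypothetical helper; the real logic lives in the policy's placement-status validation):

```java
// Simplified sketch of the rack-count check: using MORE racks than
// required still satisfies the policy, so spilling onto a 3rd rack
// is not a violation in itself.
int racksUsed = countDistinctRacks(replicaNodes); // hypothetical helper
int racksRequired = Math.min(requiredRackCount, totalRackCount);
boolean placementSatisfied = racksUsed >= racksRequired;
```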
I also wonder about other scenarios. E.g., let's say we have 3 replicas and one is unhealthy (the scrubber found a problem with it and marked it bad). Right now, when we make the call to the placement policy, we pass the 2 good nodes and the unhealthy replica's node as excluded and ask for 1 new node - which is effectively the same as passing 2 good nodes and a decommissioning node, as it will still confuse the placement algorithm. What we really need to do is pass nodes 1 and 2 as used nodes, as they are going to stay, and pass the unhealthy replica's node as excluded so we don't select it for the new copy. But we need that other Jira implemented to have usedNodes working in the RackAwarePlacementPolicy. So the fix here is a partial solution for a specific scenario, but leaves other cases unfixed.
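Concretely, for the unhealthy-replica example the call we would want to make (using the assumed/simplified two-list signature sketched earlier) looks like:

```java
import java.util.List;
import org.apache.hadoop.hdds.protocol.DatanodeDetails;

// Sketch: nodes 1 and 2 keep their replicas -> usedNodes; the node with
// the unhealthy replica must not receive the new copy -> excludedNodes.
List<DatanodeDetails> usedNodes = List.of(node1, node2);
List<DatanodeDetails> excludedNodes = List.of(unhealthyNode);
List<DatanodeDetails> targets = placementPolicy.chooseDatanodes(
    usedNodes, excludedNodes, /* favoredNodes */ null,
    /* nodesRequired */ 1, /* metadataSize */ 0, /* dataSize */ containerSize);
```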
@sodonnel, I have updated the PR for just the legacy RM, which will continue to use the old interface without usedNodes.
Thanks @ashishkumar50 for the patch, @krishnaasawa1, @sodonnel for the review.
What changes were proposed in this pull request?
Currently, decommissioned nodes are also considered when determining racks, which leads to undesired results and wrong rack selection. With this change, decommissioned nodes are no longer considered by the rack-aware placement policy algorithm.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8369
How was this patch tested?
Verified in a real test environment.