HDDS-5437. Move dead DN out of topology #2435
@ChenSammi @sodonnel PTAL, thanks!
dineshchitlangia left a comment
@JacksonYao287 LGTM
Thanks @JacksonYao287 for working on this. When a DN registers, the DatanodeDetails instance which represents the DN is added to both the topology tree and nodeStateManager. nodeStateManager is used more frequently than the topology to get a DN's info. Would you add a Precondition check that
And would you add a new UT for it?
Thanks @ChenSammi for the review!
I will make this change.
Thanks @dineshchitlangia for the review!
@ChenSammi PTAL, thanks!
// First set the node to IN_MAINTENANCE and ensure the container replicas
// are not removed on the dead event
datanode1 = nodeManager.getNodeByUuid(datanode1.getUuidString());
How does this change test this feature? I think we need a test (or extend this test) to ensure the node is removed from the topology when the dead handler is called and another test to ensure it is put back when the healthy event is fired.
Thanks @sodonnel for the review!
> How does this change test this feature?
for (DatanodeInfo node : nodeStateMap.getAllDatanodeInfos()) {
  // ...
  case STALE:
    // Move the node to DEAD if the last heartbeat time is less than
    // configured dead-node interval.
    updateNodeState(node, deadNodeCondition, status,
        NodeLifeCycleEvent.TIMEOUT);
  // ...
}
This is the current code in NodeStateManager#checkNodesHealth, and it is the only place where a DEAD event is fired and DeadNodeHandler is called. When judging the state of all the nodes, it gets every DatanodeInfo directly from nodeStateMap. When a datanode registers itself with SCM, it is added to nodeStateMap, but let us look at NodeStateMap#addNode:
public void addNode(...) {
  // ...
  nodeMap.put(id, new DatanodeInfo(datanodeDetails, nodeStatus,
      layoutInfo));
  // ...
}
As we can see, a new DatanodeInfo object is created here and put into the map. So the registered DatanodeDetails instance is not the same object that is stored in nodeMap and retrieved by DeadNodeHandler when the DEAD event is fired. When running the test, this leads to a failure of the Preconditions.checkState in DeadNodeHandler. The existing test code did not account for this, which is why the test now re-fetches datanode1 with nodeManager.getNodeByUuid.
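The instance mismatch described above can be reproduced in isolation. This is a hypothetical sketch: `Details`, `Info`, `addNode`, `getNodeByUuid` and `passesIdentityCheck` are simplified stand-ins, not the real Ozone classes, and it assumes the handler's precondition is sensitive to which object it receives (reference equality) rather than to `equals()`. It shows that storing `new Info(...)` breaks identity with the registered object, and that re-fetching by UUID, as the updated test does, gives back the stored instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch, not the real Ozone SCM classes.
public class InstanceMismatchSketch {

  // Stand-in for DatanodeDetails.
  static class Details {
    final UUID uuid;

    Details(UUID uuid) {
      this.uuid = uuid;
    }
  }

  // Stand-in for DatanodeInfo, which is built from the registered details.
  static class Info extends Details {
    Info(Details d) {
      super(d.uuid);
    }
  }

  // Stand-in for NodeStateMap#nodeMap.
  static final Map<UUID, Info> nodeMap = new HashMap<>();

  // Mirrors addNode: a NEW Info object is stored, not the argument itself.
  static void addNode(Details registered) {
    nodeMap.put(registered.uuid, new Info(registered));
  }

  // Mirrors getNodeByUuid: returns the instance that is actually tracked.
  static Details getNodeByUuid(UUID uuid) {
    return nodeMap.get(uuid);
  }

  // Hypothetical handler precondition: the node passed in must be the very
  // instance held in nodeMap (reference equality, not equals()).
  static boolean passesIdentityCheck(Details node) {
    return nodeMap.get(node.uuid) == node;
  }

  public static void main(String[] args) {
    Details datanode1 = new Details(UUID.randomUUID());
    addNode(datanode1);

    // The object the test registered is not the one stored in nodeMap.
    System.out.println(passesIdentityCheck(datanode1));

    // Re-fetching it by UUID yields the stored instance.
    datanode1 = getNodeByUuid(datanode1.uuid);
    System.out.println(passesIdentityCheck(datanode1));
  }
}
```

Running `main` prints `false` and then `true`.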
> I think we need a test (or extend this test) to ensure the node is removed from the topology when the dead handler is called and another test to ensure it is put back when the healthy event is fired.
Yes. Although I have added a Preconditions.checkState in the handlers, I think it's better to add these tests as well. I will do it.
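The two tests asked for in this thread reduce to a round trip on the topology: present after registration, gone after the DEAD handler runs, back after the healthy event. A minimal sketch, assuming a set of UUIDs as a stand-in for the SCM network topology; `Node`, `Topology`, `onDeadNode` and `onHealthyNode` are hypothetical names, not Ozone's DeadNodeHandler or NetworkTopology API, and the `IllegalStateException` only imitates the role of the Preconditions.checkState in the handlers.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch, not the real Ozone SCM handlers or topology.
public class DeadNodeTopologySketch {

  // Stand-in for DatanodeDetails: identified by its UUID.
  static final class Node {
    final UUID uuid;

    Node(UUID uuid) {
      this.uuid = uuid;
    }
  }

  // Stand-in for the SCM network topology.
  static final class Topology {
    private final Set<UUID> members = new HashSet<>();

    void add(Node n) {
      members.add(n.uuid);
    }

    void remove(Node n) {
      members.remove(n.uuid);
    }

    boolean contains(Node n) {
      return members.contains(n.uuid);
    }
  }

  // Hypothetical DEAD-event handler: the node must be in the topology
  // (imitating a Preconditions.checkState), then it is removed.
  static void onDeadNode(Topology topology, Node node) {
    if (!topology.contains(node)) {
      throw new IllegalStateException("Dead node not in topology: " + node.uuid);
    }
    topology.remove(node);
  }

  // Hypothetical healthy-event handler: put the node back. Idempotent, so a
  // node that is still in the topology is left as it is.
  static void onHealthyNode(Topology topology, Node node) {
    if (!topology.contains(node)) {
      topology.add(node);
    }
  }

  public static void main(String[] args) {
    Topology topology = new Topology();
    Node dn = new Node(UUID.randomUUID());
    topology.add(dn); // registration puts the node into the topology

    onDeadNode(topology, dn); // test 1: removed on the DEAD event
    System.out.println("after dead: " + topology.contains(dn));

    onHealthyNode(topology, dn); // test 2: back after the healthy event
    System.out.println("after healthy: " + topology.contains(dn));
  }
}
```

Running `main` prints `after dead: false` and then `after healthy: true`. Making the healthy handler idempotent is a sketch assumption: a healthy event for a node that is still in the topology leaves it unchanged instead of failing.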
@sodonnel @ChenSammi can you please take a look?
This seems to be a flaky test, not caused by this patch.
Thanks @sodonnel for the review!
What changes were proposed in this pull request?
Move dead DN out of topology
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5437
How was this patch tested?
unit test