HDDS-5277. Recon shows operational status as "DECOMMISSIONING" for "DECOMMISSIONED" DNs #2286

Merged
merged 3 commits into apache:master on Jun 2, 2021

Conversation

avijayanhwx
Contributor

What changes were proposed in this pull request?

Recon relies on DN heartbeats to learn a node's operational state. If a node goes down before it reports itself as DECOMMISSIONED, that information is lost on the Recon side. SCM moves the node from DECOMMISSIONING to DECOMMISSIONED first; the Datanode then persists the state change locally and subsequently heartbeats the new state to SCM and Recon. If the DN is shut down before it can heartbeat the state change to Recon, Recon is left with the stale information.

This patch adds a DeadNodeHandler for Recon that updates the node operational status information from SCM if needed. It also adds a step in the Recon SCM Sync Task that updates the node operational status from SCM for every dead node as seen from Recon.
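
For illustration, here is a minimal sketch of the dead node handler idea. All identifiers apart from the HDDS event and protocol types (ReconDeadNodeHandler, StorageContainerServiceProvider#getNodeOperationalState, ReconNodeManager#setNodeOperationalState) are assumptions for the sketch, not necessarily what the patch uses:

import java.io.IOException;

import org.apache.hadoop.hdds.protocol.DatanodeDetails;
import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeOperationalState;
import org.apache.hadoop.hdds.server.events.EventHandler;
import org.apache.hadoop.hdds.server.events.EventPublisher;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Sketch only: reacts to a datanode going dead by pulling the node's
 * authoritative operational state from SCM, since a dead node can no
 * longer heartbeat its own state change to Recon.
 */
public class ReconDeadNodeHandler implements EventHandler<DatanodeDetails> {

  private static final Logger LOG =
      LoggerFactory.getLogger(ReconDeadNodeHandler.class);

  private final ReconNodeManager nodeManager;        // Recon's node view
  private final StorageContainerServiceProvider scm; // proxy to SCM (assumed)

  public ReconDeadNodeHandler(ReconNodeManager nodeManager,
      StorageContainerServiceProvider scm) {
    this.nodeManager = nodeManager;
    this.scm = scm;
  }

  @Override
  public void onMessage(DatanodeDetails deadNode, EventPublisher publisher) {
    try {
      // Ask SCM for its view of the node's operational state and overwrite
      // Recon's potentially stale copy, e.g. a node that died between
      // DECOMMISSIONING and DECOMMISSIONED.
      NodeOperationalState scmState =
          scm.getNodeOperationalState(deadNode);     // assumed RPC
      nodeManager.setNodeOperationalState(deadNode, scmState);
    } catch (IOException e) {
      LOG.warn("Could not refresh operational state of dead node {} from SCM.",
          deadNode, e);
    }
  }
}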

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-5277

How was this patch tested?

Manually tested using the Ozone decommissioning CLI.

Aravindan Vijayan added 2 commits May 26, 2021 14:22
avijayanhwx requested a review from sodonnel May 26, 2021 23:32
avijayanhwx requested a review from smengcl May 27, 2021 16:16
@sodonnel
Contributor

sodonnel commented Jun 1, 2021

I think there is another scenario this change won't catch.

  1. Node is put to maintenance with a timeout value.
  2. Node goes to IN_MAINTENANCE and is then stopped and goes dead.
  3. Node does not come back in time, and SCM automatically expires the maintenance, moving the host to IN_SERVICE + DEAD.

In this case the dead node handler would fire once on Recon, and populate the dead nodes array, then the state would be checked and the array cleared. The later transition to IN_SERVICE + DEAD would never get seen by Recon.

I wonder if it would be simpler to forget about the dead node handler on Recon, and just sync up all nodes on SCM with the nodes on Recon on each run (once every 5 minutes, I think). Even with a few thousand nodes, this should not add much overhead.

In saying that, using the dead node handler is a nice optimisation.

We could ask SCM for a list of nodes which are only in the DEAD state and sync up that group. There is an existing API for that in SCMNodeManager:

  /**
   * Returns all datanodes that are in the given states. Passing null for one
   * of the states acts like a wildcard for that state. This function works by
   * taking a snapshot of the current collection and then returning the list
   * from that collection. This means the real map might have changed by the
   * time we return this list.
   *
   * @param opState The operational state of the node
   * @param health The health of the node
   * @return List of Datanodes that are known to SCM in the requested states.
   */
  @Override
  public List<DatanodeDetails> getNodes(
      NodeOperationalState opState, NodeState health) {
    return nodeStateManager.getNodes(opState, health)
        .stream()
        .map(node -> (DatanodeDetails) node)
        .collect(Collectors.toList());
  }

I think that would cover both scenarios.
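
For example, the sync step could then be as simple as the following sketch. The names scmNodeClient and reconNodeManager are placeholders for whatever proxy and node manager Recon would use, and it assumes getPersistedOpState() on the returned DatanodeDetails reflects SCM's view:

// Fetch only the nodes SCM considers DEAD; per the javadoc above, null
// acts as a wildcard for the operational state.
List<DatanodeDetails> deadOnScm =
    scmNodeClient.getNodes(null, NodeState.DEAD);
for (DatanodeDetails dn : deadOnScm) {
  // Overwrite Recon's copy with the operational state SCM reports.
  reconNodeManager.setNodeOperationalState(dn, dn.getPersistedOpState());
}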

@avijayanhwx
Contributor Author

> I think there is another scenario this change won't catch. […] I think that would cover both scenarios.

Thanks for the review @sodonnel. The current patch does a general sync through the PipelineSyncTask as well. Doesn't that handle that case? To be fair, the DeadNodeHandler logic that I have added is redundant here.

@sodonnel
Contributor

sodonnel commented Jun 1, 2021

I think you are correct and I misunderstood how the change works. I will check again in the morning.

@sodonnel
Contributor

sodonnel commented Jun 2, 2021

I understand this better now. The DeadNodeHandler will catch the case where a node is stopped before it reports its own status to Recon.

The sync in PipelineSyncTask would catch the maintenance timing out scenario too.

This change LGTM, +1.

avijayanhwx merged commit f3f258a into apache:master on Jun 2, 2021
@avijayanhwx
Contributor Author

Thanks for the review @sodonnel.
