HDDS-15034. Query SCM status for ozone admin upgrade status command#10084

Open
dombizita wants to merge 8 commits into apache:HDDS-14496-zdu from dombizita:HDDS-15034

Conversation

@dombizita
Contributor

What changes were proposed in this pull request?

After #10011 is merged, the hardcoded placeholder responses can be removed and the command connected to SCM for real values. Based on @errose28's suggestion, I used HDDSLayoutVersionManager to check the finalization status of SCM, and added a new counter to SCMNodeManager to track the number of finalized DNs, which is used for the ozone admin upgrade status output.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15034

How was this patch tested?

Added tests, green CI on my fork: https://github.com/dombizita/ozone/actions/runs/24517013218

@dombizita dombizita requested review from errose28 and sodonnel April 17, 2026 09:13
@github-actions github-actions Bot added the zdu Pull requests for Zero Downtime Upgrade (ZDU) https://issues.apache.org/jira/browse/HDDS-14496 label Apr 17, 2026
@dombizita
Contributor Author

Thank you for the review @sodonnel, addressed your comments in the latest commit.

@dombizita
Contributor Author

Thanks @sodonnel, based on your comments and our offline discussion I agree that it's safer and easier to just get the count each time while iterating through the nodes rather than storing it as a counter, which could go out of sync in corner cases.


static boolean shouldFinalize(UpgradeFinalization.Status scmUpgradeStatus,
    int finalizedDatanodes, int healthyDatanodes) {
  return UpgradeFinalization.Status.FINALIZATION_REQUIRED.equals(scmUpgradeStatus)
Contributor

shouldFinalize should be true when SCM is finalized and all DNs are finalized, as it is the trigger for other components to finalize. I think the existing enum states are quite confusing:

  public enum Status {
    ALREADY_FINALIZED,          // => I think it's this value if it starts up already finalized?
    STARTING_FINALIZATION,
    FINALIZATION_IN_PROGRESS,
    FINALIZATION_DONE,          // => It gets here if it was REQUIRED and then gets finalized.
    FINALIZATION_REQUIRED,      // => Starts up unfinalized.
  }

So I think here we need to check:

if ((UpgradeFinalization.Status.FINALIZATION_DONE.equals(scmUpgradeStatus)
    || UpgradeFinalization.Status.ALREADY_FINALIZED.equals(scmUpgradeStatus))
    && finalizedDatanodes == healthyDatanodes) {
  // Should finalize == true
} else {
  // should finalize == false
}
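The proposed check can be sketched as a self-contained snippet. The Status values are copied from the enum quoted above; the class and method names here are stand-ins for illustration, not the PR's actual code:

```java
// Runnable sketch of the check proposed above; ScmFinalizedCheck is a
// hypothetical container class, not part of the Ozone codebase.
public class ScmFinalizedCheck {
  enum Status {
    ALREADY_FINALIZED, STARTING_FINALIZATION, FINALIZATION_IN_PROGRESS,
    FINALIZATION_DONE, FINALIZATION_REQUIRED
  }

  // True only when SCM itself has finalized (by either path) and every
  // healthy datanode has finalized too.
  static boolean shouldFinalize(Status scmStatus, int finalizedDatanodes,
      int healthyDatanodes) {
    return (Status.FINALIZATION_DONE == scmStatus
        || Status.ALREADY_FINALIZED == scmStatus)
        && finalizedDatanodes == healthyDatanodes;
  }

  public static void main(String[] args) {
    System.out.println(shouldFinalize(Status.FINALIZATION_DONE, 3, 3));     // true
    System.out.println(shouldFinalize(Status.FINALIZATION_REQUIRED, 3, 3)); // false
    System.out.println(shouldFinalize(Status.ALREADY_FINALIZED, 2, 3));     // false
  }
}
```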

Contributor

We already have isScmFinalized above, so we can just reuse it.

So it's just scmFinalized && numDNs == finalizedDNs, and then we can drop this method.

Contributor Author

Ohh, thanks. I only added the numDNs == finalizedDNs check in the previous round of review and forgot to fix the SCM finalization check, as OM will poll this status.
Shouldn't the CLI output be more specific about this? Like "OM should finalize"?

out().println("Update status:");
out().println(" SCM Finalized: " + status.getScmFinalized());
out().println(" Datanodes finalized: " + status.getNumDatanodesFinalized());
out().println(" Total Datanodes: " + status.getNumDatanodesTotal());
out().println(" Should Finalize: " + status.getShouldFinalize());

Contributor

Yea we can tweak the CLI output, as what is there is just kind of a placeholder and we still need to add the JSON part. Possibly the Should Finalize part doesn't even need to be in the human readable part and only in the JSON response. We can consider it more in another PR.

Contributor

@sodonnel left a comment

This version looks good if we get green CI.

Contributor

@errose28 left a comment

Thanks for working on this @dombizita. I don't think we are quite ready to merge yet.

try {
  // Only check HEALTHY nodes. STALE/DEAD nodes will be told to
  // finalize when they recover.
  if (!nodeManager.getNodeStatus(dn).isHealthy()) {
Contributor

This doesn't address the previous comment because it is only checking the health state returned by getNodeStatus, not the operational state. We can probably just loop over nodeManager.getNodes(NodeStatus.inServiceHealthy()) then not need to skip any entries in the loop.
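That filtering can be modeled with a self-contained sketch, assuming hypothetical Health/OpState enums in place of Ozone's NodeStatus and node manager (a sketch of the idea, not the real API):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for looping over
// nodeManager.getNodes(NodeStatus.inServiceHealthy()): the node manager hands
// back only IN_SERVICE + HEALTHY nodes, so the loop needs no skip logic.
public class FinalizedNodeCount {
  enum Health { HEALTHY, STALE, DEAD }
  enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

  record Node(Health health, OpState opState, boolean finalized) { }

  // Models getNodes(NodeStatus.inServiceHealthy()): both the health state
  // and the operational state are checked.
  static List<Node> inServiceHealthy(List<Node> all) {
    Predicate<Node> eligible =
        n -> n.health() == Health.HEALTHY && n.opState() == OpState.IN_SERVICE;
    return all.stream().filter(eligible).toList();
  }

  static int countFinalized(List<Node> all) {
    return (int) inServiceHealthy(all).stream().filter(Node::finalized).count();
  }

  public static void main(String[] args) {
    List<Node> nodes = List.of(
        new Node(Health.HEALTHY, OpState.IN_SERVICE, true),
        new Node(Health.HEALTHY, OpState.DECOMMISSIONED, true), // excluded
        new Node(Health.STALE, OpState.IN_SERVICE, false));     // excluded
    System.out.println(inServiceHealthy(nodes).size()); // 1
    System.out.println(countFinalized(nodes));          // 1
  }
}
```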

Contributor

I did wonder if we should let nodes in any of the decommission(ed / ing) and maintenance states finalize if they are healthy (meaning they are still heartbeating). If they are heartbeating, they should either finalize or stop heartbeating and go dead. They can be forced to finalize if they ever transition back to in_service, but I don't think that transition requires a re-register, so it would be another code path to worry about later.

Contributor

I do think we should heartbeat to all nodes we can reach that they should finalize. We don't need to count all those nodes towards the HDDS finalization exit criteria, but maybe we should. It is simpler to reason that all live nodes have finalized, and we don't need to worry about decom/maintenance at all. I was considering a scenario where a decom node has just been shut down and might block finalization for the full heartbeat timeout, but that is only 10 minutes.

Along the same lines, I'm wondering if we should also count stale nodes towards the total count and wait for them to either finalize or go dead. If we exclude them from the count, we cannot say that all registered nodes have finalized because stale nodes will not have to re-register when they heartbeat again. If we include them, there could be one or a few slow nodes that periodically go stale and are also not processing the commands which would hold up OM.

Mostly I think we just need to make a decision on what states we count and why.

Contributor

I was considering a scenario where a decom node has just been shut down and might block finalization for the full heartbeat timeout, but that is only 10 minutes.

Yea, I think we can live with that. People should not be upgrading clusters with nodes in strange states IMO, as it just adds to potential problems.

On stale nodes, there are:

stale : unfinalized
stale : finalized

The second we can just count as a finalized node and not worry about it. The first is a problem.

It can either go dead and we can ignore it, or it can go back to healthy and may or may not finalize and go stale again. If it is able to go stale -> healthy, it must have heartbeated and hence picked up its finalize command, but it may not process it if the DN is in bad shape, and it may not heartbeat again.

I think our upgrade write path design could handle a node that slips through un-finalized? Or do we put some extra logic in to fence it out until it finalizes? For a stale node, all its Ratis pipelines get closed on transition to stale anyway, so it is effectively out of the write path at that point.

Comment on lines +197 to +199
private final int numFinalizedDatanodes;
private final int totalHealthyDatanodes;
private final int numUnfinalizedDatanodes;
Contributor

Do we actually need three fields here? I think we can just track total healthy in-service nodes and number of finalized nodes. The number of unfinalized nodes can be derived from that. These are the only two fields the upgrade status API uses.

The finalization wait code can be changed to check that the healthy node count equals the finalized count. We could also wrap that in an allNodesFinalized method inside DatanodeFinalizationCount.

getNumUnfinalizedDatanodes has no test coverage, but I think it should be removed. getTotalHealthyDatanodes also has no test coverage, which should be added.
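A minimal sketch of the suggested two-field shape, with the unfinalized count derived rather than stored (the class and method names follow this review's suggestion and are hypothetical, not merged code):

```java
// Sketch of the two-field counts holder suggested above.
public class DatanodeFinalizationCounts {
  private final int totalHealthyDatanodes;
  private final int numFinalizedDatanodes;

  public DatanodeFinalizationCounts(int totalHealthy, int numFinalized) {
    this.totalHealthyDatanodes = totalHealthy;
    this.numFinalizedDatanodes = numFinalized;
  }

  public int getTotalHealthyDatanodes() { return totalHealthyDatanodes; }
  public int getNumFinalizedDatanodes() { return numFinalizedDatanodes; }

  // Derived, so it can never drift out of sync with the other two fields.
  public int getNumUnfinalizedDatanodes() {
    return totalHealthyDatanodes - numFinalizedDatanodes;
  }

  // The finalization wait code can poll this instead of comparing counts.
  public boolean allNodesFinalized() {
    return numFinalizedDatanodes == totalHealthyDatanodes;
  }

  public static void main(String[] args) {
    DatanodeFinalizationCounts c = new DatanodeFinalizationCounts(3, 2);
    System.out.println(c.getNumUnfinalizedDatanodes()); // 1
    System.out.println(c.allNodesFinalized());          // false
  }
}
```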

}
}

public static DatanodeFinalizationCounts getNumFinalizedDatanodes(
Contributor

This should be an instance method in NodeManager, not a static method that takes an instance as its first/only argument. DatanodeFinalizationCounts can be moved there as well, or moved to a standalone class.

.setNumDatanodesTotal(10)
.setShouldFinalize(true)
.build();
UpgradeFinalization.Status scmUpgradeStatus = scm.getLayoutVersionManager().getUpgradeState();
Contributor

This upgrade status enum is going to be removed in the new version manager. We will only use the proto facing version for compatibility with the older APIs. This call can be changed to scm.getLayoutVersionManager().needsFinalization() which easily maps to the same call in the new version manager when we switch. From here we can:

  • Remove the isScmFinalized helper method
  • Add a DatanodeFinalizationCounts#allNodesFinalized method as suggested here
  • Inline the shouldFinalize method to just a variable: shouldFinalize = scmFinalized && datanodeFinalizationCounts.allNodesFinalized()
  • Inline buildUpgradeStatus to invoke the builder directly in this method since the labelled builder setters are clearer than the unlabeled method parameters and it is now just a passthrough.

This minimizes the amount of upgrade specific logic in the server side translator.
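A minimal runnable sketch of the suggested inlining; needsFinalization() and allNodesFinalized() are the names proposed in this review, stubbed here as functional interfaces rather than the real Ozone types:

```java
// Hypothetical sketch: shouldFinalize collapses to a single expression once
// the helper methods above are removed.
public class InlinedShouldFinalize {
  interface LayoutVersionManager { boolean needsFinalization(); }
  interface DatanodeFinalizationCounts { boolean allNodesFinalized(); }

  static boolean shouldFinalize(LayoutVersionManager scmLvm,
      DatanodeFinalizationCounts counts) {
    boolean scmFinalized = !scmLvm.needsFinalization();
    return scmFinalized && counts.allNodesFinalized();
  }

  public static void main(String[] args) {
    // SCM finalized, all DNs finalized.
    System.out.println(shouldFinalize(() -> false, () -> true)); // true
    // SCM still needs finalization.
    System.out.println(shouldFinalize(() -> true, () -> true));  // false
  }
}
```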

Comment on lines +791 to +792
nonFinalizedNode.setPersistedOpState(
HddsProtos.NodeOperationalState.DECOMMISSIONED);
Contributor

We should have test coverage for all ineligible health and op states to make the intent clear. It should be fast to enumerate all of them.
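One quick way to enumerate every combination, assuming a hypothetical eligible predicate where only HEALTHY + IN_SERVICE counts toward finalization (a sketch of the test idea, not the PR's test code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of every (health, op-state) pair; a test can loop
// over these and assert each ineligible pair is excluded from the count.
public class EligibilityMatrix {
  enum Health { HEALTHY, STALE, DEAD }
  enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
                 ENTERING_MAINTENANCE, IN_MAINTENANCE }

  static boolean eligible(Health h, OpState o) {
    return h == Health.HEALTHY && o == OpState.IN_SERVICE;
  }

  static List<String> ineligiblePairs() {
    List<String> pairs = new ArrayList<>();
    for (Health h : Health.values()) {
      for (OpState o : OpState.values()) {
        if (!eligible(h, o)) {
          pairs.add(h + "/" + o);
        }
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    // 3 health states * 5 op states = 15 pairs, exactly one eligible.
    System.out.println(ineligiblePairs().size()); // 14
  }
}
```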

HddsProtos.ReplicationFactor.THREE).getContainerInfoList().size());
}

@Test
Contributor

With the simplifications mentioned above I don't think we need all these tests, it looks like the AI went overboard here. These aren't actually testing the queryUpgradeStatus method either. Looking at the other methods in this class it looks like we usually only cover them with integration tests. So in this PR I would switch all test usage of queryUpgradeFinalizationProgress to queryUpgradeStatus for test coverage.
