HDDS-14387. CLI tool to repair scm deletedBlocksTxn summary counters#9981
HDDS-14387. CLI tool to repair scm deletedBlocksTxn summary counters#9981priyeshkaratha wants to merge 1 commit intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an admin CLI workflow to compare and repair SCM deleted-blocks transaction summary counters by scanning a RocksDB checkpoint and optionally applying a repair to bring counters back in sync.
Changes:
- Introduces
--detailand--repairmodes forozone admin scm deletedBlocksTxn summaryto show checkpoint comparison and apply repairs with confirmation. - Extends SCM client/server protocols and protobufs to support checkpoint-based comparison and repair RPCs.
- Adds unit/integration tests and supporting SCM plumbing for summary persistence/repair.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/shell/TestDeletedBlocksTxnShell.java | Adds integration tests for --detail, --repair, and checkpoint summary APIs. |
| hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/ozone/admin/scm/GetDeletedBlockSummarySubcommand.java | Implements CLI --detail/--repair flows and output formatting. |
| hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/ContainerOperationClient.java | Exposes new checkpoint compare/repair calls through CLI client. |
| hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestDeletedBlockLog.java | Adds unit tests around checkpoint comparison and reset/repair helpers. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/ozone/audit/SCMAction.java | Adds audit actions for new checkpoint summary operations. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java | Adds server-side RPC handlers for checkpoint summary compare/repair with audit logging. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java | Wires new request types through server-side PB translator. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHADBTransactionBufferStub.java | Implements new hasPendingOperations() method for stub. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHADBTransactionBufferImpl.java | Implements hasPendingOperations() based on pending flush counter. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHADBTransactionBuffer.java | Adds hasPendingOperations() API for pending DB operations visibility. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMDeletedBlockTransactionStatusManager.java | Adds resetToSummary() helper to update in-memory counters. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogStateManagerImpl.java | Adds replicated updateSummaryInDb() helper to persist corrected summary. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogStateManager.java | Adds replicated updateSummaryInDb() API. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java | Implements checkpoint scan/compare/repair logic and summary persistence. |
| hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLog.java | Adds checkpoint summary comparison API and result container type. |
| hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto | Adds new request/response protos and command types for checkpoint compare/repair. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java | Adds client-side PB translator calls for the new operations. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java | Extends protocol interface with checkpoint compare/repair methods. |
| hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/client/ScmClient.java | Extends ScmClient interface with checkpoint compare/repair methods. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @Test | ||
| public void testGetTransactionSummaryFromCheckpointRepairAppliedWhenDiffExists() | ||
| throws Exception { | ||
| int txCount = 8; | ||
| addTransactions(generateData(txCount), true); | ||
|
|
||
| // Corrupt the in-memory state to simulate a counter drift. | ||
| SCMDeletedBlockTransactionStatusManager statusManager = | ||
| deletedBlockLog.getSCMDeletedBlockTransactionStatusManager(); | ||
| HddsProtos.DeletedBlocksTransactionSummary drifted = | ||
| HddsProtos.DeletedBlocksTransactionSummary.newBuilder() | ||
| .setTotalTransactionCount(999) | ||
| .setTotalBlockCount(9999) | ||
| .setTotalBlockSize(999_999L) | ||
| .setTotalBlockReplicatedSize(9_999_999L) | ||
| .build(); | ||
| statusManager.resetToSummary(drifted); | ||
| assertEquals(999, statusManager.getTransactionSummary().getTotalTransactionCount()); | ||
|
|
||
| // The repair should detect: cpActual(8) != cpPersisted(8) -> no diff from checkpoint | ||
| // perspective, so liveInMemBefore != liveInMemAfter only when cpActual != cpPersisted. | ||
| // In this test the checkpoint is internally consistent (actual == persisted), | ||
| // so hasDiff == false and no repair is applied from checkpoint alone. | ||
| // This validates that resetToSummary and hasDiff work together correctly. | ||
| DeletedBlockLog.CheckpointSummaryResult result = | ||
| deletedBlockLog.getTransactionSummaryFromCheckpoint(true); | ||
|
|
||
| // liveInMemBefore should reflect our drifted value. | ||
| assertEquals(999, result.getLiveInMemBefore().getTotalTransactionCount()); | ||
| // Since cpActual == cpPersisted (both 8), diff == 0, no repair applied. | ||
| assertEquals(999, result.getLiveInMemAfter().getTotalTransactionCount()); | ||
| } |
There was a problem hiding this comment.
Test name and assertions don't match: testGetTransactionSummaryFromCheckpointRepairAppliedWhenDiffExists sets an in-memory drift but then asserts no repair is applied because cpActual == cpPersisted. Rename the test (or change setup) so the name reflects the behavior being validated.
| // The repair should detect: cpActual(8) != cpPersisted(8) -> no diff from checkpoint | ||
| // perspective, so liveInMemBefore != liveInMemAfter only when cpActual != cpPersisted. | ||
| // In this test the checkpoint is internally consistent (actual == persisted), | ||
| // so hasDiff == false and no repair is applied from checkpoint alone. |
There was a problem hiding this comment.
The comment says cpActual(8) != cpPersisted(8) but the intended point is that they are equal (both 8) and therefore diff==0. Please correct this comment to avoid confusion about the checkpoint-diff logic.
| // The repair should detect: cpActual(8) != cpPersisted(8) -> no diff from checkpoint | |
| // perspective, so liveInMemBefore != liveInMemAfter only when cpActual != cpPersisted. | |
| // In this test the checkpoint is internally consistent (actual == persisted), | |
| // so hasDiff == false and no repair is applied from checkpoint alone. | |
| // The repair should detect that cpActual(8) == cpPersisted(8), so there is | |
| // no diff from the checkpoint perspective. In this test the checkpoint is | |
| // internally consistent (actual == persisted), so hasDiff == false and no | |
| // repair is applied from checkpoint alone. |
| @Override | ||
| public void execute(ScmClient client) throws IOException { | ||
| HddsProtos.DeletedBlocksTransactionSummary summary = client.getDeletedBlockSummary(); | ||
| if (summary == null) { | ||
| System.out.println("DeletedBlocksTransaction summary is not available"); | ||
| if (repair) { | ||
| executeRepair(client); | ||
| } else if (detail) { | ||
| executeDetail(client); | ||
| } else { | ||
| System.out.println("DeletedBlocksTransaction summary:"); | ||
| System.out.println(" Total number of transactions: " + | ||
| summary.getTotalTransactionCount()); | ||
| System.out.println(" Total number of blocks: " + | ||
| summary.getTotalBlockCount()); | ||
| System.out.println(" Total size of blocks: " + | ||
| summary.getTotalBlockSize()); | ||
| System.out.println(" Total replicated size of blocks: " + | ||
| summary.getTotalBlockReplicatedSize()); | ||
| executeDefault(client); | ||
| } |
There was a problem hiding this comment.
--detail and --repair can both be specified at the same time; the current implementation silently prioritizes --repair. Consider making these options mutually exclusive (eg via picocli ArgGroup/exclusive or an explicit validation in execute) and emitting a clear error message when both are set.
| private void executeRepair(ScmClient client) throws IOException { | ||
| System.out.println("Taking RocksDB checkpoint and scanning deletedBlocks table ..."); | ||
| RepairDeletedBlocksTxnSummaryFromCheckpointResponseProto resp = | ||
| client.getDeletedBlockSummaryFromCheckpoint(); | ||
| printComparisonTable(resp); | ||
|
|
||
| DeletedBlocksTransactionSummary actual = | ||
| resp.hasCheckpointActual() ? resp.getCheckpointActual() : null; | ||
| DeletedBlocksTransactionSummary persisted = | ||
| resp.hasCheckpointPersisted() ? resp.getCheckpointPersisted() : null; | ||
|
|
||
| if (!hasDiff(actual, persisted)) { | ||
| System.out.println("\nCheckpoint is consistent and nothing to repair."); | ||
| return; | ||
| } | ||
|
|
||
| System.out.println("\nDiff (Checkpoint-Actual minus Checkpoint-Persisted):"); | ||
| printDiff(actual, persisted); | ||
|
|
||
| System.out.print("\nApply this diff to live in-memory counters? [y/N]: "); | ||
| System.out.flush(); | ||
| String answer = new Scanner(System.in, StandardCharsets.UTF_8.name()).nextLine().trim(); | ||
| if (!answer.equalsIgnoreCase("y") && !answer.equalsIgnoreCase("yes")) { | ||
| System.out.println("Aborted. No changes applied."); | ||
| return; | ||
| } | ||
|
|
||
| System.out.println("Applying diff ..."); | ||
| RepairDeletedBlocksTxnSummaryFromCheckpointResponseProto repaired = | ||
| client.repairDeletedBlockSummaryFromCheckpoint(); | ||
|
|
There was a problem hiding this comment.
In --repair mode the code prints a diff from getDeletedBlockSummaryFromCheckpoint(), but then calls repairDeletedBlockSummaryFromCheckpoint() which takes a fresh checkpoint on the server. If deletedBlocks changes between the two RPCs, the applied repair may not correspond to the diff the operator confirmed. Consider combining compare+repair into a single RPC (eg include a checkpoint token / expected persisted summary / computed diff in the repair request) so the repair is applied to the exact snapshot that was presented.
| System.out.print("\nApply this diff to live in-memory counters? [y/N]: "); | ||
| System.out.flush(); | ||
| String answer = new Scanner(System.in, StandardCharsets.UTF_8.name()).nextLine().trim(); | ||
| if (!answer.equalsIgnoreCase("y") && !answer.equalsIgnoreCase("yes")) { | ||
| System.out.println("Aborted. No changes applied."); | ||
| return; |
There was a problem hiding this comment.
The confirmation prompt reads from System.in via new Scanner(...).nextLine() without handling EOF / closed stdin. In non-interactive environments this can throw and terminate the CLI; consider guarding for NoSuchElementException/IllegalStateException and treating it as a "no" response (or providing a --yes/--no-prompt flag).
| final Map<String, String> auditMap = Maps.newHashMap(); | ||
| auditMap.put("remoteUser", remoteUser.getUserName()); | ||
| try { | ||
| getScm().checkAdminAccess(remoteUser, false); |
There was a problem hiding this comment.
getDeletedBlockSummaryFromCheckpoint() is a read operation but it calls checkAdminAccess(remoteUser, false), which blocks read-only SCM admins. If the intent is to allow read-only admins to run --detail, pass isRead=true here (keep false for the repair RPC).
| getScm().checkAdminAccess(remoteUser, false); | |
| getScm().checkAdminAccess(remoteUser, true); |
| Table<Long, DeletedBlocksTransaction> cpDeletedTable = | ||
| org.apache.hadoop.hdds.scm.metadata.SCMDBDefinition.DELETED_BLOCKS.getTable(cpStore); | ||
| try (Table.KeyValueIterator<Long, DeletedBlocksTransaction> iter = | ||
| cpDeletedTable.iterator()) { | ||
| while (iter.hasNext()) { | ||
| DeletedBlocksTransaction tx = iter.next().getValue(); | ||
| if (tx != null && tx.hasTotalBlockSize()) { | ||
| cpTxCount++; | ||
| cpBlockCount += tx.getLocalIDCount(); | ||
| cpBlockSize += tx.getTotalBlockSize(); | ||
| if (tx.hasTotalBlockReplicatedSize()) { | ||
| cpReplicatedSize += tx.getTotalBlockReplicatedSize(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Checkpoint scan currently counts a transaction only when tx.hasTotalBlockSize(). totalBlockSize is optional in DeletedBlocksTransaction (eg when STORAGE_SPACE_DISTRIBUTION isn't finalized / older OM writes), so this can undercount cpTxCount/cpBlockCount and produce incorrect comparisons/repairs. Count txns/blocks regardless, and treat missing size/replicatedSize as 0 when summing.
| transactionStatusManager.resetToSummary(newSummary); | ||
| deletedBlockLogStateManager.updateSummaryInDb(newSummary); |
There was a problem hiding this comment.
doRepair() resets the in-memory counters before persisting the corrected summary via updateSummaryInDb(). If the DB update throws (eg RocksDB/codec error), the method will exit with in-memory state changed but the persisted summary unchanged, potentially worsening drift. Persist first (or rollback on failure) so in-memory and DB remain consistent under errors.
| transactionStatusManager.resetToSummary(newSummary); | |
| deletedBlockLogStateManager.updateSummaryInDb(newSummary); | |
| deletedBlockLogStateManager.updateSummaryInDb(newSummary); | |
| transactionStatusManager.resetToSummary(newSummary); |
What changes were proposed in this pull request?
This pull request improves how administrators manage deleted blocks in Apache Ozone.
It introduces a new command-line tool that helps detect and fix mismatches in the deleted blocks summary. The tool works by taking a snapshot (checkpoint) of the database and comparing the actual deleted blocks with what the system thinks is deleted.
If it finds any differences, it clearly shows them to the user and can also fix the issue with confirmation. This makes it easier to maintain accurate data and improves the reliability of the system.
What is the link to the Apache JIRA
HDDS-14387
How was this patch tested?
Tested using unit test and manually