Skip to content

HDDS-14699. Fix orphan snapshot versions handling when snapshot chain tableKey mapping is stale#9810

Merged
jojochuang merged 4 commits intoapache:masterfrom
smengcl:HDDS-14699-fix-orphan-check
Feb 24, 2026
Merged

HDDS-14699. Fix orphan snapshot versions handling when snapshot chain tableKey mapping is stale#9810
jojochuang merged 4 commits intoapache:masterfrom
smengcl:HDDS-14699-fix-orphan-check

Conversation

@smengcl
Copy link
Contributor

@smengcl smengcl commented Feb 23, 2026

What changes were proposed in this pull request?

In isSnapshotPurged() check, snapshot chain tableKey returning null should not be the sole indicator for judging whether the snapshot is still active or not.

isSnapshotPurged() incorrectly returning true causes checkOrphanSnapshotVersions() to incorrectly removing active snapshot's YAML metadata (in OmSnapshotLocalDataManagerService runs). This in turn causes NPE in CacheLoader when attempting to load the snapshot.

  • Fixed isSnapshotPurged() check
  • Added relevant debug logging messages that might be helpful at some point
  • Add entries to snapshotIdToTableKey in case of Raft replay in OMSnapshotCreateResponse#addToDBBatch (which was a bug / an oversight previously)
  • Check getMeta() nullness first, throw OMEx if null instead of NPE

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14699

How was this patch tested?

  • Added test testCheckOrphanSnapshotVersionsWithStaleSnapshotChain
  • Manually tested downstream that with the patch, the issue does not appear in the system test suite again

@smengcl smengcl requested a review from jojochuang February 23, 2026 18:23
@smengcl smengcl self-assigned this Feb 23, 2026
Copilot AI review requested due to automatic review settings February 23, 2026 18:23
@smengcl smengcl added bug Something isn't working snapshot https://issues.apache.org/jira/browse/HDDS-6517 labels Feb 23, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where active snapshots were incorrectly marked as purged when the SnapshotChainManager's snapshotIdToTableKey map became stale, leading to NullPointerException in the snapshot cache loader.

Changes:

  • Fixed isSnapshotPurged() to fall back to transactionInfo when tableKey lookup returns null, treating snapshots without purge transaction info as active
  • Added addSnapshotToTableKey() method to update the snapshotIdToTableKey map during Raft log replay in OMSnapshotCreateResponse#addToDBBatch
  • Added debug logging throughout SnapshotChainManager to aid in diagnosing snapshot chain synchronization issues

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
OmSnapshotManager.java Fixed isSnapshotPurged() logic to handle null tableKey by falling back to transactionInfo check; added debug logging
SnapshotChainManager.java Added addSnapshotToTableKey() method for Raft replay; enhanced debug logging in getTableKey(), addSnapshot(), and removeFromSnapshotIdToTable()
OMSnapshotCreateResponse.java Added call to addSnapshotToTableKey() in addToDBBatch() to maintain snapshotIdToTableKey map during Raft replay
TestOmSnapshotLocalDataManager.java Added regression test testCheckOrphanSnapshotVersionsWithStaleSnapshotChain() to verify fix

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from the debug log message, the rest is good to go

@jojochuang jojochuang merged commit 56b9b80 into apache:master Feb 24, 2026
89 of 90 checks passed
@jojochuang
Copy link
Contributor

Thanks @smengcl merged.

@smengcl
Copy link
Contributor Author

smengcl commented Feb 25, 2026

Thanks @jojochuang for reviewing and merging this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants