Skip to content

HBASE-29296 Missing critical snapshot expiration checks #6970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

dParikesit
Copy link

Jira: HBASE-29296

In HBase it is crucial to prevent expired snapshots returned to clients to ensure correctness. There have been existing efforts (e.g., HBASE-27671 and HBASE-28704) adding snapshot expiration checks in different scenarios to avoid such issues. However, we found such protection is not consistent. Specifically, several operations still miss such checks in the latest hbase version (5dafa9e). Their patterns are similar to the previous tickets mentioned above. In practice, we observed expired snapshots still returning to clients successfully without generating any alarms.

We have written test cases to prove these issues can be reproduced successfully (see attached). We also attach the manual steps in case anyone is interested.

Your insights are very much appreciated. We will continue following up this issue until it is resolved.

Reproducing steps (3 scenarios in total)

  1. Restore
    Doing a restore on full backup will succeed even if the snapshot has expired. This expiration can happen if during the backup, hbase.master.snapshot.ttl was set.

Steps to reproduce this bug:
A. Start an HBase cluster, and set hbase.master.snapshot.ttl config value
B. Create a table
C. Create a full backup using hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Restore the table using hbase restore hdfs://host5:9000/data/backup <backup_id>
F. Check that the table is restored successfully

We propose to add a snapshot expiration check on RestoreTool.java:createAndRestoreTable to prevent this issue.

  1. Incremental backup
    Incremental backup is done based on a previous full backup. Incremental backup will succeed even if the full backup has expired.

Steps to reproduce this bug:
A. Start an HBase cluster, and set hbase.master.snapshot.ttl config value
B. Create a table
C. Create a full backup using hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Create an incremental backup using hbase backup create incremental hdfs://host5:9000/data/backup -t tableName
F. Check that the backup succeed

We propose to add a snapshot expiration check on IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.

  1. Snapshot procedure

We found that it is possible to create a snapshot with a TTL value so low that it will expire before the SnapshotProcedure has finished. The SnapshotProcedure will finish normally as if the snapshot is fine.

Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using hbase shell with TTL=1
snapshot 'mytable', 'snapshot1234', {TTL => 1}
C. Check that the command finished without an error, and the snapshot has expired

This behavior is only possible if the user accidentally sets the TTL to be too low or if the SnapshotProcedure is interrupted after the SNAPSHOT_WRITE_SNAPSHOT_INFO but before it’s fully finished.

We propose to add an expiration check in the SNAPSHOT_COMPLETE_SNAPSHOT phase right before the snapshot is marked as completed to ensure that the snapshot hasn’t expired before the SnapshotProcedure is considered successfully finished.

Granted, we’re not sure whether this one is a bug or an intended behavior

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@dParikesit
Copy link
Author

Hi @guluo2016 and @Apache9, could you help confirming this bug? It is related to HBASE-28704 that you participated in before.

Thank you for your time.

snapshot.getCreationTime(), EnvironmentEdgeManager.currentTime())
) {
throw new SnapshotTTLExpiredException(ProtobufUtil.createSnapshotDesc(snapshot));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!
For case 3:
I think we don't need to perform checks now, because even if the snapshot has expired, as long as we perform proper checks during the restore process, it should be fine. Let's hear what @Apache9 thinks about it.

If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java, since HBase also supports creating snapshots without using snapshot procedure, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!

I agree that it might be fine because we perform checks during the restore process. I just thought that redundant check would make it more consistent. Let's hear what @Apache9 thinks of it.

If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java, since HBase also supports creating snapshots without using snapshot procedure, right?

Oh that's right, I missed that one. Thanks for noticing it, I'll put the check if we agree that preemptive checking is needed

@guluo2016
Copy link
Contributor

Hi @dParikesit , Could you please continue updating the code with your approach? I'd be happy to review this PR.
Thanks

@guluo2016 guluo2016 requested a review from Copilot July 4, 2025 14:45
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds missing snapshot TTL expiration checks across snapshot procedures, backup restore, and incremental backup flows to prevent expired snapshots from being returned to clients. It also includes new tests to validate early expiration behavior.

  • Introduces SnapshotTTLExpiredException checks in SnapshotProcedure, RestoreTool, and IncrementalTableBackupClient
  • Adds new unit test TestSnapshotProcedureEarlyExpiration for snapshot procedures
  • Expands backup/restore tests in TestBackupRestoreExpiry for full and incremental expiration scenarios

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotProcedure.java Throw SnapshotTTLExpiredException in the SNAPSHOT_COMPLETE_SNAPSHOT state if the snapshot has expired
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/RestoreTool.java Add expiration check before restoring table metadata and snapshots
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java Add expiration check in verifyCfCompatibility for incremental backups
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSnapshotProcedureEarlyExpiration.java New test ensuring a snapshot fails if it expires during the procedure
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java Tests for restore and incremental backup expiration errors
Comments suppressed due to low confidence (2)

hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java:107

  • [nitpick] Test method names should start with a lowercase letter to follow Java naming conventions (e.g., testSequentially).
  public void TestSequentially() throws Exception {

hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java:106

  • [nitpick] Combining both full and incremental expiration scenarios in a single @Test method reduces isolation. Split into two @Test methods for clearer test reporting and isolation of failures.
  @Test

if (
SnapshotDescriptionUtils.isExpiredSnapshot(snapshot.getTtl(),
snapshot.getCreationTime(), EnvironmentEdgeManager.currentTime())
) {
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The TTL expiration check is duplicated in multiple classes. Consider extracting a shared helper method to centralize this logic and reduce code duplication.

Copilot uses AI. Check for mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this suggestion is a good idea because it is easier to see the thrown exception directly in the caller function. What do you think?

@dParikesit
Copy link
Author

Great! I have added the check in TakeSnapshotHandler.java (which is called from SnapshotManager.java). Thanks for reviewing.

@dParikesit dParikesit requested a review from guluo2016 July 5, 2025 03:03
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 26s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for branch
+1 💚 mvninstall 3m 6s master passed
+1 💚 compile 3m 36s master passed
+1 💚 checkstyle 0m 46s master passed
+1 💚 spotbugs 1m 57s master passed
+1 💚 spotless 0m 46s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 2m 58s the patch passed
+1 💚 compile 3m 36s the patch passed
-0 ⚠️ javac 0m 29s /results-compile-javac-hbase-backup.txt hbase-backup generated 1 new + 124 unchanged - 0 fixed = 125 total (was 124)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 44s the patch passed
+1 💚 spotbugs 2m 11s the patch passed
+1 💚 hadoopcheck 11m 14s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 42s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 17s The patch does not generate ASF License warnings.
40m 9s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6970
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 567e34cecea8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8676f9a
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for branch
+1 💚 mvninstall 3m 18s master passed
+1 💚 compile 1m 16s master passed
+1 💚 javadoc 0m 42s master passed
+1 💚 shadedjars 6m 9s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 14s the patch passed
+1 💚 compile 1m 17s the patch passed
+1 💚 javac 1m 17s the patch passed
+1 💚 javadoc 0m 41s the patch passed
+1 💚 shadedjars 6m 4s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 216m 58s hbase-server in the patch passed.
+1 💚 unit 10m 57s hbase-backup in the patch passed.
256m 30s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6970
Optional Tests javac javadoc unit compile shadedjars
uname Linux dc08a543bda3 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 8676f9a
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/3/testReport/
Max. process+thread count 5429 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6970/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

} finally {
EnvironmentEdgeManager.reset();
backupAdmin.close();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The org.junit.Assert.assertThrows() method efficiently and concisely tests exception scenarios, outperforming direct try-catch usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants