-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29296 Missing critical snapshot expiration checks #6970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Hi @guluo2016 and @Apache9, could you help confirming this bug? It is related to HBASE-28704 that you participated in before. Thank you for your time. |
snapshot.getCreationTime(), EnvironmentEdgeManager.currentTime()) | ||
) { | ||
throw new SnapshotTTLExpiredException(ProtobufUtil.createSnapshotDesc(snapshot)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
For case 3:
I think we don't need to perform checks now, because even if the snapshot has expired, as long as we perform proper checks during the restore process, it should be fine. Let's hear what @Apache9 thinks about it.
If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java
, since HBase also supports creating snapshots without using snapshot procedure, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review!
I agree that it might be fine because we perform checks during the restore process. I just thought that redundant check would make it more consistent. Let's hear what @Apache9 thinks of it.
If we go with your approach, i think we should also add the same validation logic in SnapshotManager.java, since HBase also supports creating snapshots without using snapshot procedure, right?
Oh that's right, I missed that one. Thanks for noticing it, I'll put the check if we agree that preemptive checking is needed
Hi @dParikesit , Could you please continue updating the code with your approach? I'd be happy to review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds missing snapshot TTL expiration checks across snapshot procedures, backup restore, and incremental backup flows to prevent expired snapshots from being returned to clients. It also includes new tests to validate early expiration behavior.
- Introduces
SnapshotTTLExpiredException
checks inSnapshotProcedure
,RestoreTool
, andIncrementalTableBackupClient
- Adds new unit test
TestSnapshotProcedureEarlyExpiration
for snapshot procedures - Expands backup/restore tests in
TestBackupRestoreExpiry
for full and incremental expiration scenarios
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SnapshotProcedure.java | Throw SnapshotTTLExpiredException in the SNAPSHOT_COMPLETE_SNAPSHOT state if the snapshot has expired |
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/util/RestoreTool.java | Add expiration check before restoring table metadata and snapshots |
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/IncrementalTableBackupClient.java | Add expiration check in verifyCfCompatibility for incremental backups |
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSnapshotProcedureEarlyExpiration.java | New test ensuring a snapshot fails if it expires during the procedure |
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java | Tests for restore and incremental backup expiration errors |
Comments suppressed due to low confidence (2)
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java:107
- [nitpick] Test method names should start with a lowercase letter to follow Java naming conventions (e.g.,
testSequentially
).
public void TestSequentially() throws Exception {
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java:106
- [nitpick] Combining both full and incremental expiration scenarios in a single
@Test
method reduces isolation. Split into two@Test
methods for clearer test reporting and isolation of failures.
@Test
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupRestoreExpiry.java
Outdated
Show resolved
Hide resolved
if ( | ||
SnapshotDescriptionUtils.isExpiredSnapshot(snapshot.getTtl(), | ||
snapshot.getCreationTime(), EnvironmentEdgeManager.currentTime()) | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The TTL expiration check is duplicated in multiple classes. Consider extracting a shared helper method to centralize this logic and reduce code duplication.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this suggestion is a good idea because it is easier to see the thrown exception directly in the caller function. What do you think?
Great! I have added the check in |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
} finally { | ||
EnvironmentEdgeManager.reset(); | ||
backupAdmin.close(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The org.junit.Assert.assertThrows()
method efficiently and concisely tests exception scenarios, outperforming direct try-catch
usage.
Jira: HBASE-29296
In HBase it is crucial to prevent expired snapshots returned to clients to ensure correctness. There have been existing efforts (e.g., HBASE-27671 and HBASE-28704) adding snapshot expiration checks in different scenarios to avoid such issues. However, we found such protection is not consistent. Specifically, several operations still miss such checks in the latest hbase version (5dafa9e). Their patterns are similar to the previous tickets mentioned above. In practice, we observed expired snapshots still returning to clients successfully without generating any alarms.
We have written test cases to prove these issues can be reproduced successfully (see attached). We also attach the manual steps in case anyone is interested.
Your insights are very much appreciated. We will continue following up this issue until it is resolved.
Reproducing steps (3 scenarios in total)
Doing a restore on full backup will succeed even if the snapshot has expired. This expiration can happen if during the backup,
hbase.master.snapshot.ttl
was set.Steps to reproduce this bug:
A. Start an HBase cluster, and set
hbase.master.snapshot.ttl
config valueB. Create a table
C. Create a full backup using
hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Restore the table using
hbase restore hdfs://host5:9000/data/backup <backup_id>
F. Check that the table is restored successfully
We propose to add a snapshot expiration check on RestoreTool.java:createAndRestoreTable to prevent this issue.
Incremental backup is done based on a previous full backup. Incremental backup will succeed even if the full backup has expired.
Steps to reproduce this bug:
A. Start an HBase cluster, and set
hbase.master.snapshot.ttl
config valueB. Create a table
C. Create a full backup using
hbase backup create full hdfs://host5:9000/data/backup -t tableName
D. Wait until the snapshot has expired
E. Create an incremental backup using
hbase backup create incremental hdfs://host5:9000/data/backup -t tableName
F. Check that the backup succeed
We propose to add a snapshot expiration check on IncrementalTableBackupClient.java:verifyCfCompatibility to prevent this issue.
We found that it is possible to create a snapshot with a TTL value so low that it will expire before the SnapshotProcedure has finished. The SnapshotProcedure will finish normally as if the snapshot is fine.
Steps to reproduce this bug:
A. Start an HBase cluster and create a table
B. Create a snapshot using hbase shell with TTL=1
snapshot 'mytable', 'snapshot1234', {TTL => 1}
C. Check that the command finished without an error, and the snapshot has expired
This behavior is only possible if the user accidentally sets the TTL to be too low or if the SnapshotProcedure is interrupted after the
SNAPSHOT_WRITE_SNAPSHOT_INFO
but before it’s fully finished.We propose to add an expiration check in the
SNAPSHOT_COMPLETE_SNAPSHOT
phase right before the snapshot is marked as completed to ensure that the snapshot hasn’t expired before the SnapshotProcedure is considered successfully finished.Granted, we’re not sure whether this one is a bug or an intended behavior