-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29350: Ensure Cleanup of Continuous Backup WALs After Last Backup is Force Deleted #7090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses HBASE-29350 by ensuring that continuous backup WALs are properly cleaned up after a last backup is force deleted. Key changes include:
- Enhancements to test coverage by adding tests for backup deletion cleanup and single backup force deletion.
- Refactoring of backup deletion logic with helper methods for cleanup of WALs, removal of continuous backup metadata, and disabling of replication peers.
- Updates to production code in BackupCommands and BackupSystemTable to improve backup cleanup procedures.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupDeleteWithCleanup.java | Added new setup/teardown logic and tests to verify force deletion cleanups. |
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupBase.java | Introduced an overloaded createBackupRequest to support continuous backup settings. |
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java | Changed deletion of backup metadata from addColumns to addColumn per table entry. |
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java | Added helper methods for disabling replication peers, removing backup table metadata, and deleting WAL files during cleanup. |
Comments suppressed due to low confidence (2)
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:924
- Consider using a logging framework (e.g., Log4j or SLF4J) instead of System.out.println for improved control over logging levels and output consistency.
System.out.println("No full backups found. Cleaning up all WALs and disabling replication peer.");
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:1013
- Consider replacing System.out.println with a proper logging mechanism to ensure warnings are handled consistently in production.
System.out.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir + ". Error: " + e.getMessage());
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just few minor changes.
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java
Show resolved
Hide resolved
System.out.println("Deleted all contents under WAL directory: " + backupWalDir); | ||
} | ||
} catch (IOException e) { | ||
System.out.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
System.out.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir | |
System.err.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir |
hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java
Show resolved
Hide resolved
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupBase.java
Outdated
Show resolved
Hide resolved
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupDeleteWithCleanup.java
Show resolved
Hide resolved
hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupDeleteWithCleanup.java
Outdated
Show resolved
Hide resolved
FileSystem fs = FileSystem.get(conf); | ||
Path walPath = new Path(backupWalDir); | ||
if (fs.exists(walPath)) { | ||
FileStatus[] contents = fs.listStatus(walPath); | ||
for (FileStatus item : contents) { | ||
fs.delete(item.getPath(), true); // recursive delete of each child | ||
} | ||
System.out.println("Deleted all contents under WAL directory: " + backupWalDir); | ||
} | ||
} catch (IOException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit and out of scope.
there are potential problems of this delete wal directory if the amount of WALs are huge list of files.
- if there is many paths need to be removed for WAL, this may take long on non-HDFS filesystem (but our use cases are always using HDFS)
- we don't have any knowledge if this may take times and we don't have a preview for deleting what files. also without the instrument of LOGGER DEBUG to tell if something is being deleted, this lacks visibility..
I found most of other code did the same, so, just FYI I'm wondered we should introduce LOGGER DEBUG and see if that would at least tell the CLI user that something is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right — the backup and restore framework needs proper logging. I've already created a JIRA for this: HBASE-29374. We'll address this issue as part of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I agree with @taklwu's comment about how we don't necessarily know how long a WAL dir will take to delete due to what its size could be.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
TestPointInTimeRestore is not working , do you know why ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but let's clarify the failure of TestPointInTimeRestore before moving further.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
These tests are passing on my local system, so I’m not sure why they’re failing here. I did notice the following in the logs: |
but in the https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/3/testReport/ , I saw the test
|
da1e2f5
to
a1e6bfb
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
No description provided.