
HBASE-29350: Ensure Cleanup of Continuous Backup WALs After Last Backup is Force Deleted #7090


Merged
merged 4 commits into apache:HBASE-28957 on Jun 23, 2025

Conversation

vinayakphegde
Contributor

No description provided.


@taklwu taklwu requested a review from Copilot June 11, 2025 22:57

@Copilot Copilot AI left a comment


Pull Request Overview

This PR addresses HBASE-29350 by ensuring that continuous backup WALs are properly cleaned up after the last backup is force deleted. Key changes include:

  • Enhancements to test coverage by adding tests for backup deletion cleanup and single backup force deletion.
  • Refactoring of backup deletion logic with helper methods for cleanup of WALs, removal of continuous backup metadata, and disabling of replication peers (see the sketch after this list).
  • Updates to production code in BackupCommands and BackupSystemTable to improve backup cleanup procedures.
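
A minimal sketch of that cleanup sequence, assuming hypothetical helper names, a hypothetical peer id, and placeholder backup-system-table schema (none of these are the actual constants or methods in BackupCommands/BackupSystemTable):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch only; helper names, peer id, and row/table names are illustrative.
class ForceDeleteCleanupSketch {
  private static final String PEER_ID = "continuous_backup_peer"; // assumed peer id

  private final Connection conn;

  ForceDeleteCleanupSketch(Connection conn) {
    this.conn = conn;
  }

  /** Runs when force-deleting the last backup leaves no remaining full backups. */
  void cleanupAfterLastBackupDeleted(String backupWalDir) throws IOException {
    try (Admin admin = conn.getAdmin()) {
      admin.disableReplicationPeer(PEER_ID); // stop shipping new WAL entries
    }
    removeContinuousBackupMetadata();
    // The WAL directory contents are then removed; see the deletion loop discussed below.
  }

  private void removeContinuousBackupMetadata() throws IOException {
    // Assumed table name and row key, for illustration only.
    try (Table backupSystem = conn.getTable(TableName.valueOf("backup:system"))) {
      backupSystem.delete(new Delete(Bytes.toBytes("continuous_backup_metadata")));
    }
  }
}
```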

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File descriptions:

  • hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupDeleteWithCleanup.java: Added new setup/teardown logic and tests to verify force deletion cleanups.
  • hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupBase.java: Introduced an overloaded createBackupRequest to support continuous backup settings.
  • hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupSystemTable.java: Changed deletion of backup metadata from addColumns to addColumn per table entry.
  • hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java: Added helper methods for disabling replication peers, removing backup table metadata, and deleting WAL files during cleanup.
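
For context on the BackupSystemTable change above: in the HBase client Delete API, addColumns(family, qualifier) marks all versions of a column for deletion, while addColumn(family, qualifier) marks only the most recent version. A minimal sketch with placeholder schema names (not the real BackupSystemTable constants):

```java
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.util.Bytes;

// Placeholder family/qualifier names for illustration only.
class PerTableEntryDeleteSketch {
  static Delete deleteTableEntry(String backupId, String tableEntryQualifier) {
    Delete delete = new Delete(Bytes.toBytes(backupId));
    // addColumn(...) targets the latest version of this one column;
    // an addColumns(...) call would target all versions of it.
    delete.addColumn(Bytes.toBytes("meta"), Bytes.toBytes(tableEntryQualifier));
    return delete;
  }
}
```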
Comments suppressed due to low confidence (2)

hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:924

  • Consider using a logging framework (e.g., Log4j or SLF4J) instead of System.out.println for improved control over logging levels and output consistency.
System.out.println("No full backups found. Cleaning up all WALs and disabling replication peer.");

hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BackupCommands.java:1013

  • Consider replacing System.out.println with a proper logging mechanism to ensure warnings are handled consistently in production.
System.out.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir + ". Error: " + e.getMessage());

Contributor

@taklwu taklwu left a comment


LGTM, just a few minor changes.

System.out.println("Deleted all contents under WAL directory: " + backupWalDir);
}
} catch (IOException e) {
System.out.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir
Contributor


nit:

Suggested change
System.out.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir
System.err.println("WARNING: Failed to delete contents under WAL directory: " + backupWalDir

Comment on lines 1003 to 1027
FileSystem fs = FileSystem.get(conf);
Path walPath = new Path(backupWalDir);
if (fs.exists(walPath)) {
FileStatus[] contents = fs.listStatus(walPath);
for (FileStatus item : contents) {
fs.delete(item.getPath(), true); // recursive delete of each child
}
System.out.println("Deleted all contents under WAL directory: " + backupWalDir);
}
} catch (IOException e) {
Contributor


nit, and out of scope:

Deleting the WAL directory this way can be a problem when it contains a huge number of files.

  1. If many paths need to be removed, the deletion may take a long time on a non-HDFS filesystem (though our use cases always run on HDFS).
  2. We have no indication of how long it may take and no preview of which files will be deleted. Without DEBUG-level logging to show that something is actually being deleted, this lacks visibility.

I found that most other code does the same, so just FYI: I wonder whether we should introduce DEBUG logging so the CLI user at least sees that something is happening.

Contributor Author


You're right — the backup and restore framework needs proper logging. I've already created a JIRA for this: HBASE-29374. We'll address this issue as part of that.
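
A rough sketch of the kind of DEBUG instrumentation being discussed, assuming an SLF4J logger is available (class, method, and logger names are illustrative, not code from this PR or from HBASE-29374):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch of DEBUG-level visibility for the WAL directory cleanup loop.
class WalDirectoryCleanupWithLogging {
  private static final Logger LOG = LoggerFactory.getLogger(WalDirectoryCleanupWithLogging.class);

  void deleteWalDirectoryContents(Configuration conf, String backupWalDir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path walPath = new Path(backupWalDir);
    if (!fs.exists(walPath)) {
      return;
    }
    FileStatus[] contents = fs.listStatus(walPath);
    LOG.debug("Deleting {} entries under WAL directory {}", contents.length, backupWalDir);
    for (FileStatus item : contents) {
      LOG.debug("Deleting {}", item.getPath());
      fs.delete(item.getPath(), true); // recursive delete of each child
    }
    LOG.debug("Deleted all contents under WAL directory: {}", backupWalDir);
  }
}
```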


@kgeisz kgeisz left a comment


LGTM. I agree with @taklwu's comment that we don't necessarily know how long deleting a WAL directory will take, given how large it could be.


@taklwu
Contributor

taklwu commented Jun 17, 2025

TestPointInTimeRestore is not working, do you know why?

Contributor

@taklwu taklwu left a comment


LGTM, but let's clarify the failure of TestPointInTimeRestore before moving further.


@vinayakphegde
Contributor Author

> LGTM, but let's clarify the failure of TestPointInTimeRestore before moving further.

These tests are passing on my local system, so I’m not sure why they’re failing here. I did notice the following in the logs:
TestTimedOut: test timed out after 780 seconds.
This might be the cause of the failure.

@taklwu
Contributor

taklwu commented Jun 18, 2025

But in https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/3/testReport/ I saw the test org.apache.hadoop.hbase.backup.TestPointInTimeRestore fail with an assertion error:

java.lang.AssertionError: Backup should succeed expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.apache.hadoop.hbase.backup.TestPointInTimeRestore.setUpBackups(TestPointInTimeRestore.java:101)
	at org.apache.hadoop.hbase.backup.TestPointInTimeRestore.setupBeforeClass(TestPointInTimeRestore.java:70)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:840)


@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957 Compile Tests _
+1 💚 mvninstall 3m 7s HBASE-28957 passed
+1 💚 compile 0m 31s HBASE-28957 passed
-0 ⚠️ checkstyle 0m 9s /buildtool-branch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 0m 28s HBASE-28957 passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 6s the patch passed
+1 💚 compile 0m 31s the patch passed
+1 💚 javac 0m 31s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 9s /buildtool-patch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 0m 36s the patch passed
+1 💚 hadoopcheck 12m 1s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
-1 ❌ spotless 0m 49s patch has 1 errors when running spotless:check, run spotless:apply to fix.
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
30m 24s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7090
JIRA Issue HBASE-29350
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 57077e93a0f0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / 73cb513
Default Java Eclipse Adoptium-17.0.11+9
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/5/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/5/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957 Compile Tests _
+1 💚 mvninstall 3m 3s HBASE-28957 passed
+1 💚 compile 0m 19s HBASE-28957 passed
+1 💚 javadoc 0m 13s HBASE-28957 passed
+1 💚 shadedjars 5m 55s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 2s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 javadoc 0m 13s the patch passed
+1 💚 shadedjars 6m 0s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 22m 44s /patch-unit-hbase-backup.txt hbase-backup in the patch failed.
43m 20s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7090
JIRA Issue HBASE-29350
Optional Tests javac javadoc unit compile shadedjars
uname Linux 9f5009e5d52f 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / 73cb513
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/5/testReport/
Max. process+thread count 3614 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7090/5/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@taklwu taklwu merged commit f0ec33b into apache:HBASE-28957 Jun 23, 2025
1 check failed