HDDS-15166. Increased cleanup interval and purged partial reports during cleanup. #10250

Open
SaketaChalamchala wants to merge 4 commits into apache:master from SaketaChalamchala:HDDS-15166

@SaketaChalamchala
Contributor

What changes were proposed in this pull request?

After HDDS-14829, snapshot diff cleanup is decoupled from snapshot diff job re-submission. A job therefore no longer needs to be cleaned up aggressively to allow re-submission, which risked losing the actual job status because the job had already been cleaned up.
If a snapshot diff job generates a partial report before it is marked failed or is cancelled, cleanup skips removing the report entries: totalDiffEntries is never updated for such jobs, and cleanup only removes report entries for jobs with totalDiffEntries > 0.

Proposed changes:

  1. Increase the snapshot diff cleanup interval to 1h. The status of failed/rejected/cancelled jobs can then be retrieved for up to 1h before it is cleaned up. The status of completed jobs is still retained for the configured time.
  2. Diff cleanup removes all report entries of jobs in the purged table regardless of whether totalDiffEntries > 0, i.e., the cleanup job does a deleteRange for every job in the purged table.
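
The effect of dropping the totalDiffEntries > 0 guard can be illustrated with a minimal, self-contained sketch. This is not the code in SnapshotDiffCleanupService: a TreeMap stands in for the report table, and the class name, the `<jobId>-<index>` key layout, the sample data, and all method names are assumptions made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PurgedJobCleanupSketch {

  /** Hypothetical report table: keys are "<jobId>-<index>", values are diff entries. */
  public static NavigableMap<String, String> sampleReportTable() {
    NavigableMap<String, String> table = new TreeMap<>();
    table.put("jobA-0", "partial entry");   // jobA failed after writing a partial report
    table.put("jobA-1", "partial entry");
    table.put("jobB-0", "entry");           // jobB completed with two entries
    table.put("jobB-1", "entry");
    table.put("jobC-0", "entry");           // jobC is not purged and must survive
    return table;
  }

  /** Hypothetical purged table: jobId mapped to its recorded totalDiffEntries. */
  public static Map<String, Long> samplePurgedJobs() {
    Map<String, Long> purged = new LinkedHashMap<>();
    purged.put("jobA", 0L);  // never updated because the job failed
    purged.put("jobB", 2L);
    return purged;
  }

  /** Behavior described as the problem: report entries are removed only when totalDiffEntries > 0. */
  public static void cleanupGuarded(NavigableMap<String, String> table, Map<String, Long> purged) {
    for (Map.Entry<String, Long> job : purged.entrySet()) {
      if (job.getValue() > 0) {
        deleteRange(table, job.getKey());
      }
    }
  }

  /** Proposed behavior: delete the report range of every purged job. */
  public static void cleanupAll(NavigableMap<String, String> table, Map<String, Long> purged) {
    for (String jobId : purged.keySet()) {
      deleteRange(table, jobId);
    }
  }

  /** Removes all keys in ["<jobId>-", "<jobId>."), which covers every key with prefix "<jobId>-". */
  static void deleteRange(NavigableMap<String, String> table, String jobId) {
    String begin = jobId + "-";
    String end = jobId + ".";  // '.' directly follows '-' in ASCII, so it is an exclusive upper bound
    table.subMap(begin, true, end, false).clear();
  }

  public static void main(String[] args) {
    NavigableMap<String, String> guarded = sampleReportTable();
    cleanupGuarded(guarded, samplePurgedJobs());
    System.out.println("guarded leaves:   " + guarded.keySet());

    NavigableMap<String, String> all = sampleReportTable();
    cleanupAll(all, samplePurgedJobs());
    System.out.println("unguarded leaves: " + all.keySet());
  }
}
```

With the sample data, the guarded variant leaves the two partial entries of jobA behind, while the unguarded variant leaves only the entry of the non-purged jobC.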

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15166

How was this patch tested?

Unit tests

SaketaChalamchala added the snapshot (https://issues.apache.org/jira/browse/HDDS-6517) label on May 12, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts snapshot diff cleanup behavior so failed/rejected/cancelled jobs remain queryable longer and report cleanup also removes partial reports for purged jobs with zero recorded diff entries.

Changes:

  • Increases the default snapshot diff cleanup interval from 1 minute to 60 minutes.
  • Updates cleanup logic to delete report ranges for every purged job, regardless of totalDiffEntries.
  • Adds a unit test covering partial report cleanup for a zero-entry failed job.
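
The retention behavior summarized above can be sketched as a small decision function. This is an illustration under stated assumptions, not the project's implementation: the class, the enum values, and the method name are hypothetical, and the 7-day retention used in `main` is an arbitrary example value rather than a documented default.

```java
import java.time.Duration;

final class SnapshotDiffRetentionSketch {

  /** Hypothetical job states used only for this example. */
  enum JobStatus { QUEUED, IN_PROGRESS, DONE, REJECTED, FAILED, CANCELLED }

  /**
   * Decides whether a cleanup run would purge a job.
   * Failed/rejected/cancelled jobs go on the next run, so their status stays
   * queryable for at most one cleanup interval. Completed jobs are kept until
   * they are older than the configured retention. Other jobs are left alone
   * (an assumption of this sketch).
   */
  public static boolean eligibleForPurge(JobStatus status, Duration age, Duration doneRetention) {
    switch (status) {
      case FAILED:
      case REJECTED:
      case CANCELLED:
        return true;
      case DONE:
        return age.compareTo(doneRetention) > 0;
      default:
        return false;
    }
  }

  public static void main(String[] args) {
    Duration retention = Duration.ofDays(7);  // example value, not taken from the PR
    System.out.println(eligibleForPurge(JobStatus.FAILED, Duration.ofMinutes(5), retention));
    System.out.println(eligibleForPurge(JobStatus.DONE, Duration.ofMinutes(5), retention));
    System.out.println(eligibleForPurge(JobStatus.DONE, Duration.ofDays(8), retention));
  }
}
```

Under this model, raising the cleanup interval from 1 minute to 60 minutes only widens the window in which a failed, rejected, or cancelled job can still be queried; the retention of completed jobs is governed by the separate retention setting.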

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDiffCleanupService.java | Updates cleanup accounting/logging and removes the totalDiffEntries > 0 guard before report range deletion. |
| hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestSnapshotDiffCleanupService.java | Adds cleanup coverage for zero-entry purged jobs and drops test column families during teardown. |
| hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/OMConfigKeys.java | Changes the default cleanup interval constant to 60 minutes. |
| hadoop-hdds/common/src/main/resources/ozone-default.xml | Updates the default cleanup interval configuration value to 60m. |

Comment thread hadoop-hdds/common/src/main/resources/ozone-default.xml
jojochuang marked this pull request as ready for review on May 15, 2026 20:33
3 participants