HDDS-6962. [Snapshot] Background Service to delete irrelevant SST files in a snapshot. #3883

Merged
sadanand48 merged 17 commits into apache:HDDS-6517-Snapshot from sadanand48:HDDS-6962
Dec 6, 2022

@sadanand48 sadanand48 commented Oct 25, 2022

What changes were proposed in this pull request?

This PR introduces a background service, SSTFilteringService, that deletes the irrelevant SST files of a snapshot. It uses the RocksDB#deleteFile API, which only permits deleting SST files in the last level: if there are n levels of SSTs at a given point, the service deletes only the nth-level SSTs. This is still beneficial, because the last-level SSTs are the bulkiest and deleting them can help save space. On successful deletion, the service updates a marker file, filtered-snapshots, by writing the snapshot ID of the processed snapshot to it. Only SST files of the keyTable, fileTable, and directoryTable are deleted, as these tables can grow very large compared to the others.
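The selection logic described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `SstMeta` class, the method names, and the `/volume/bucket/` key prefix are hypothetical, the last level is assumed to be the highest level that currently holds files for a table, and key ranges are compared as Java strings rather than raw bytes. In RocksJava, comparable metadata is likely available from `RocksDB#getLiveFilesMetaData()`, and the selected files would then be passed to `RocksDB#deleteFile`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SstFilterSketch {

  /** Hypothetical stand-in for the per-file metadata that RocksDB reports. */
  static final class SstMeta {
    final String fileName;
    final String table;
    final int level;
    final String smallestKey;
    final String largestKey;

    SstMeta(String fileName, String table, int level,
        String smallestKey, String largestKey) {
      this.fileName = fileName;
      this.table = table;
      this.level = level;
      this.smallestKey = smallestKey;
      this.largestKey = largestKey;
    }
  }

  /** Only these tables are filtered, per the PR description. */
  static final Set<String> FILTERED_TABLES = new HashSet<>(
      Arrays.asList("keyTable", "fileTable", "directoryTable"));

  /** First n characters of key, or the whole key if it is shorter. */
  static String head(String key, int n) {
    return key.length() <= n ? key : key.substring(0, n);
  }

  /** True if no key in [smallestKey, largestKey] can start with prefix. */
  static boolean isIrrelevant(SstMeta f, String prefix) {
    String lo = head(f.smallestKey, prefix.length());
    String hi = head(f.largestKey, prefix.length());
    return prefix.compareTo(lo) < 0 || prefix.compareTo(hi) > 0;
  }

  /** Names of last-level files in the filtered tables that miss the prefix. */
  static List<String> selectDeletable(List<SstMeta> files, String prefix) {
    // Assumption: "last level" = highest populated level of each table.
    Map<String, Integer> lastLevel = new HashMap<>();
    for (SstMeta f : files) {
      lastLevel.merge(f.table, f.level, Math::max);
    }
    List<String> deletable = new ArrayList<>();
    for (SstMeta f : files) {
      if (FILTERED_TABLES.contains(f.table)
          && f.level == lastLevel.get(f.table)
          && isIrrelevant(f, prefix)) {
        deletable.add(f.fileName);
      }
    }
    return deletable;
  }

  /** Appends the snapshot ID to the marker file, creating it if needed. */
  static void markFiltered(Path markerFile, String snapshotId) {
    try {
      Files.write(markerFile,
          (snapshotId + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would compute `selectDeletable(files, "/vol1/bucket1/")` for the snapshot's bucket, delete each returned file, and only then call `markFiltered` so that a failed run is retried on the next pass.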

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6962

How was this patch tested?

Unit tests

  • Background Service to cleanup SST Files
  • Filter out and add only relevant SST Files from Compaction DAG output

@sadanand48 sadanand48 marked this pull request as ready for review October 25, 2022 14:56
@sadanand48 sadanand48 changed the title from "HDDS-6962.[Snapshot] Background Service to delete irrelevant SST files in a snapshot." to "HDDS-6962. [Snapshot] Background Service to delete irrelevant SST files in a snapshot." Oct 25, 2022
@sadanand48 sadanand48 marked this pull request as draft October 25, 2022 17:06
smengcl added a commit to smengcl/hadoop-ozone that referenced this pull request Nov 2, 2022
@GeorgeJahad
Contributor

> This is still beneficial as the last level SST's are the bulkiest

This surprises me. I always thought the first level was the bulkiest. Is there any documentation you can point me to?

@sadanand48
Contributor Author

sadanand48 commented Nov 3, 2022

> This surprises me. I always thought the first level was the bulkiest. Is there any documentation you can point me to?

Compaction is triggered when the size of a level exceeds its target size, and the target size increases with each level, so the last level holds the most data:
https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#levels-target-size
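As a rough illustration of the linked page, the per-level target can be computed as below. This is a sketch under assumptions: it models static level sizing (dynamic level bytes disabled), where the target for level n is the base size times the multiplier raised to n - 1. The class and method names are hypothetical; the two parameters correspond to the RocksDB options `max_bytes_for_level_base` and `max_bytes_for_level_multiplier`, and the multiplier is simplified to an integer here.

```java
final class LevelTargetSketch {

  /**
   * Target size in bytes of level n (n >= 1) under static level sizing:
   * baseBytes * multiplier^(n - 1).
   */
  static long targetBytes(int level, long baseBytes, int multiplier) {
    if (level < 1) {
      throw new IllegalArgumentException("level must be >= 1");
    }
    long target = baseBytes;
    for (int i = 1; i < level; i++) {
      // multiplyExact throws ArithmeticException instead of overflowing silently.
      target = Math.multiplyExact(target, (long) multiplier);
    }
    return target;
  }

  public static void main(String[] args) {
    // Illustrative values: base of 16384 bytes, multiplier of 10.
    for (int level = 1; level <= 4; level++) {
      System.out.println("L" + level + " target: " + targetBytes(level, 16384L, 10));
    }
  }
}
```

With a multiplier of 10, each level's target is ten times that of the level before it, which is why the last level accounts for most of the data on disk.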

Sadanand Shenoy added 7 commits November 4, 2022 01:20
(cherry picked from commit d4bf31123a243b4569bbd0ecd66a382ce13c126c)
(cherry picked from commit ebb11917f0a08732e366bf60ab811740cc9d355d)
@sadanand48 sadanand48 marked this pull request as ready for review November 9, 2022 11:52
smengcl added a commit to smengcl/hadoop-ozone that referenced this pull request Nov 9, 2022
…elete irrelevant SST files in a snapshot

https://github.com/apache/pull/3883
@sadanand48 sadanand48 added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Nov 24, 2022
@prashantpogde
Contributor

@sadanand48 can we resolve the pending comments and merge this PR?

@prashantpogde prashantpogde left a comment

LGTM

@sadanand48 sadanand48 merged commit f77dfa6 into apache:HDDS-6517-Snapshot Dec 6, 2022
Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

4 participants