HDDS-7524. Compaction DAG node pruning #4045
Merged
smengcl merged 21 commits into apache:HDDS-6517-Snapshot on Jan 3, 2023
smengcl
reviewed
Dec 14, 2022
Contributor
smengcl
left a comment
Thanks @hemantk-12 . Core logic looks good to me. Some comments inline.
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
Outdated
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStoreBuilder.java
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Outdated
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Outdated
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Outdated
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Outdated
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
...-checkpoint-differ/src/test/java/org/apache/ozone/rocksdiff/TestRocksDBCheckpointDiffer.java
Outdated
Addressed cosmetic and indentation comments. Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
smengcl
reviewed
Dec 14, 2022
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
Outdated
GeorgeJahad
reviewed
Dec 14, 2022
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Outdated
smengcl
reviewed
Dec 21, 2022
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
smengcl
approved these changes
Dec 22, 2022
Contributor
Thanks @hemantk-12 for the patch. Thanks @GeorgeJahad and @prashantpogde for reviewing this.
What changes were proposed in this pull request?
To generate diffs between snapshots faster, we maintain a compaction DAG in memory. Whenever a compaction happens, the related SST file nodes are added to the DAG. Over time the DAG keeps growing and may cause memory pressure or become a bottleneck. To solve this, we can prune unnecessary SST file nodes from the DAG, since we have a concept of the oldest snapshot with compaction history.
Related doc: https://docs.google.com/document/d/19ZOT-lY2WS8nDj1s9HGVyUY3kgnKIfv-Eck--SokaDU/
This change proposes the traversal and pruning of the DAG.
The idea is to remove the nodes and arcs that were created before the snapshot being deleted was taken, because they are no longer needed to generate the diff.
pruneBackwardDag prunes the backward DAG and removes the nodes and arcs upstream of the level at which the snapshot was taken.
pruneForwardDag prunes the forward DAG and removes the nodes and arcs downstream of the level at which the snapshot was taken.
Let's take the following diagram (backward DAG) as an example.
Snapshots were taken at level-1, level-3 and level-5:
Snapshot-1: 000015.sst, 000013.sst, 000011.sst, 000009.sst
Snapshot-2: 000027.sst, 000030.sst, 000028.sst, 000031.sst, 000029.sst, 000039.sst, 000037.sst, 000035.sst, 000033.sst
Snapshot-3: 000059.sst, 000055.sst, 000056.sst, 000060.sst, 000057.sst, 000058.sst
If Snapshot-1 and Snapshot-2 need to be pruned, we can simply prune upstream of level-3 in the backward DAG and downstream of level-3 in the forward DAG. All nodes from level-1, level-2 and level-3, the arcs between level-1 & level-2 and between level-2 & level-3, and the outgoing/incoming arcs from/to level-3 are removed from both the forward and backward DAGs.
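The pruning step described above can be sketched roughly as follows. This is a minimal illustration using plain Java collections, not the code in RocksDBCheckpointDiffer: the class DagPruneSketch, its methods, and the node names (a1, b1, ...) are hypothetical, and the sketch assumes arcs point from a compaction's output file to its input files, so that following arcs from a snapshot's SST files reaches the older files. It removes the start nodes, every node reachable from them, and all arcs touching a removed node.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in for one compaction DAG; not the Ozone implementation. */
public class DagPruneSketch {
  // node -> nodes it points to
  private final Map<String, Set<String>> successors = new HashMap<>();
  // node -> nodes that point to it
  private final Map<String, Set<String>> predecessors = new HashMap<>();

  /** Adds an arc, creating both endpoints if they are not present yet. */
  public void putEdge(String from, String to) {
    for (String n : Arrays.asList(from, to)) {
      successors.computeIfAbsent(n, k -> new HashSet<>());
      predecessors.computeIfAbsent(n, k -> new HashSet<>());
    }
    successors.get(from).add(to);
    predecessors.get(to).add(from);
  }

  public Set<String> nodes() {
    return new HashSet<>(successors.keySet());
  }

  public Set<String> successorsOf(String node) {
    Set<String> s = successors.get(node);
    return s == null ? new HashSet<>() : new HashSet<>(s);
  }

  /**
   * Removes the start nodes and everything reachable from them along arcs,
   * together with every arc touching a removed node. Returns the removed set.
   */
  public Set<String> prune(Set<String> startNodes) {
    // Phase 1: breadth-first traversal to collect the nodes to remove.
    Set<String> doomed = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    for (String s : startNodes) {
      if (successors.containsKey(s) && doomed.add(s)) {
        queue.add(s);
      }
    }
    while (!queue.isEmpty()) {
      for (String next : successors.get(queue.poll())) {
        if (doomed.add(next)) {
          queue.add(next);
        }
      }
    }
    // Phase 2: drop each collected node and detach it from surviving neighbours.
    for (String node : doomed) {
      for (String next : successors.remove(node)) {
        Set<String> preds = predecessors.get(next);
        if (preds != null) {
          preds.remove(node);
        }
      }
      for (String prev : predecessors.remove(node)) {
        Set<String> succs = successors.get(prev);
        if (succs != null) {
          succs.remove(node);
        }
      }
    }
    return doomed;
  }

  public static void main(String[] args) {
    DagPruneSketch dag = new DagPruneSketch();
    // Hypothetical nodes: a* = level-1, b* = level-2, c* = level-3,
    // d* = level-4, e* = level-5. Arcs point from newer to older files.
    dag.putEdge("e1", "d1");
    dag.putEdge("d1", "c1");
    dag.putEdge("d1", "c2");
    dag.putEdge("c1", "b1");
    dag.putEdge("c2", "b1");
    dag.putEdge("b1", "a1");
    dag.putEdge("b1", "a2");
    // Prune from the level-3 snapshot files: levels 1 to 3 disappear.
    Set<String> removed = dag.prune(new HashSet<>(Arrays.asList("c1", "c2")));
    System.out.println("removed=" + removed.size()
        + " remaining=" + dag.nodes().size()); // prints removed=5 remaining=2
  }
}
```

In a mirrored DAG whose arcs point the other way, the same set of older nodes would be reached by following predecessors instead of successors; which direction applies to the forward and backward DAGs in the actual patch is not stated in this description.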
What is the link to the Apache JIRA
How was this patch tested?