HDDS-7524. Compaction DAG node pruning #4045

Merged
smengcl merged 21 commits into apache:HDDS-6517-Snapshot from hemantk-12:HDDS-7524 on Jan 3, 2023

@hemantk-12 (Contributor) commented Dec 5, 2022

What changes were proposed in this pull request?

To generate snapshot diffs faster, we maintain a compaction DAG in memory. Whenever a compaction happens, the related SST file nodes are added to the DAG. Over time, the DAG keeps growing and may cause memory pressure or become a performance bottleneck. To solve this, we can prune SST file nodes that are no longer needed, since we track the oldest snapshot that still has compaction history.
Related doc: https://docs.google.com/document/d/19ZOT-lY2WS8nDj1s9HGVyUY3kgnKIfv-Eck--SokaDU/
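
The in-memory DAG described above can be pictured as two adjacency maps keyed by SST file name. The following is a minimal sketch, not the Ozone implementation: the class name `CompactionDag`, the map layout, and the arc directions (forward DAG: compaction output to input, i.e. newer to older; backward DAG: input to output, i.e. older to newer) are assumptions made for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory compaction DAG whose nodes are SST file names.
 * Assumed arc directions:
 *   forward DAG:  output file -> input file  (newer -> older)
 *   backward DAG: input file  -> output file (older -> newer)
 */
final class CompactionDag {
  // node -> successors; both maps always contain every known node as a key
  final Map<String, Set<String>> forward = new HashMap<>();
  final Map<String, Set<String>> backward = new HashMap<>();

  private void addNode(String file) {
    forward.computeIfAbsent(file, k -> new HashSet<>());
    backward.computeIfAbsent(file, k -> new HashSet<>());
  }

  /** Records one compaction: every input SST is linked to every output SST. */
  void addCompaction(Collection<String> inputs, Collection<String> outputs) {
    inputs.forEach(this::addNode);
    outputs.forEach(this::addNode);
    for (String in : inputs) {
      for (String out : outputs) {
        forward.get(out).add(in);  // newer -> older
        backward.get(in).add(out); // older -> newer
      }
    }
  }

  /** Number of distinct SST file nodes currently tracked. */
  int nodeCount() {
    return forward.size();
  }
}
```

In this sketch a compaction with m input files and n output files adds up to m × n arcs to each map, which is why the DAG keeps growing until it is pruned.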

This change proposes the traversal and pruning of the DAG.
The idea is to remove the nodes and arcs that were created before the snapshot being deleted was created, because they are no longer needed to generate diffs.
pruneBackwardDag prunes the backward DAG by removing the nodes and arcs upstream of the level at which the snapshot was taken.
pruneForwardDag prunes the forward DAG by removing the nodes and arcs downstream of the level at which the snapshot was taken.
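
Both pruning passes can be sketched as breadth-first traversals that start from the nodes of the level at which the snapshot was taken. This sketch borrows the method names `pruneBackwardDag` and `pruneForwardDag` from the description, but their signatures, the `DagPruner` class, and the plain `Map` representation are assumptions. It also assumes the forward DAG points from newer to older files and the backward DAG from older to newer files, and it removes the start nodes themselves along with every arc that touches a removed node.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical pruning helpers over node -> successors maps. */
final class DagPruner {
  /** Removes startNodes and everything upstream of them (older files) in the backward DAG. */
  static Set<String> pruneBackwardDag(Map<String, Set<String>> backward, Collection<String> startNodes) {
    // The map stores successors only, so build predecessor lists to walk upstream.
    Map<String, Set<String>> preds = new HashMap<>();
    backward.forEach((from, tos) ->
        tos.forEach(to -> preds.computeIfAbsent(to, k -> new HashSet<>()).add(from)));
    Set<String> removed = reachable(preds, startNodes);
    removeNodes(backward, removed);
    return removed;
  }

  /** Removes startNodes and everything downstream of them (older files) in the forward DAG. */
  static Set<String> pruneForwardDag(Map<String, Set<String>> forward, Collection<String> startNodes) {
    Set<String> removed = reachable(forward, startNodes);
    removeNodes(forward, removed);
    return removed;
  }

  // Breadth-first search following `next`; the result includes the start nodes.
  private static Set<String> reachable(Map<String, Set<String>> next, Collection<String> start) {
    Set<String> visited = new HashSet<>(start);
    Deque<String> queue = new ArrayDeque<>(start);
    while (!queue.isEmpty()) {
      for (String n : next.getOrDefault(queue.poll(), Set.of())) {
        if (visited.add(n)) {
          queue.add(n);
        }
      }
    }
    return visited;
  }

  // Drops the removed nodes and every remaining arc that points at one of them.
  private static void removeNodes(Map<String, Set<String>> dag, Set<String> removed) {
    dag.keySet().removeAll(removed);
    dag.values().forEach(tos -> tos.removeAll(removed));
  }
}
```

Because the backward DAG is the reverse of the forward DAG, walking upstream in one and downstream in the other visits the same nodes, so both calls should report the same removed set.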

Consider the following backward DAG as an example.

[Figure: reverseGraph (backward compaction DAG)]

Snapshots were taken at level-1, level-3 and level-5:
Snapshot-1: 000015.sst, 000013.sst, 000011.sst, 000009.sst
Snapshot-2: 000027.sst, 000030.sst, 000028.sst, 000031.sst, 000029.sst, 000039.sst, 000037.sst, 000035.sst, 000033.sst
Snapshot-3: 000059.sst, 000055.sst, 000056.sst, 000060.sst, 000057.sst, 000058.sst

If Snapshot-1 and Snapshot-2 need to be pruned, we can prune everything upstream of level-3 in the backward DAG and downstream of level-3 in the forward DAG. As a result, all nodes in level-1, level-2 and level-3 are removed from both the forward and backward DAGs, together with the arcs between level-1 and level-2, the arcs between level-2 and level-3, and all outgoing/incoming arcs from/to level-3.
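
The example can be reproduced with a small standalone program. This is a hypothetical sketch: it treats each snapshot's file list as the nodes of its level, invents placeholder files for level-2 and level-4 (`L2-x.sst`, `L2-y.sst`, `L4-x.sst`) because the diagram is not available here, and assumes every file in one level was compacted into every file of the next. For brevity it prunes only the forward DAG (assumed to point from newer to older files).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PruneExample {
  /** Forward DAG (newer -> older); each level i+1 file is assumed to come from all level i files. */
  static Map<String, Set<String>> buildForwardDag(List<List<String>> levels) {
    Map<String, Set<String>> fwd = new HashMap<>();
    for (List<String> level : levels) {
      for (String f : level) {
        fwd.put(f, new HashSet<>());
      }
    }
    for (int i = 0; i + 1 < levels.size(); i++) {
      for (String out : levels.get(i + 1)) {
        fwd.get(out).addAll(levels.get(i));
      }
    }
    return fwd;
  }

  /** Removes the start nodes, everything downstream (older) of them, and arcs into them. */
  static void pruneDownstream(Map<String, Set<String>> fwd, List<String> start) {
    Set<String> removed = new HashSet<>(start);
    Deque<String> queue = new ArrayDeque<>(start);
    while (!queue.isEmpty()) {
      for (String older : fwd.getOrDefault(queue.poll(), Set.of())) {
        if (removed.add(older)) {
          queue.add(older);
        }
      }
    }
    fwd.keySet().removeAll(removed);
    fwd.values().forEach(s -> s.removeAll(removed));
  }

  static List<List<String>> exampleLevels() {
    return List.of(
        List.of("000015.sst", "000013.sst", "000011.sst", "000009.sst"),  // level-1, Snapshot-1
        List.of("L2-x.sst", "L2-y.sst"),                                 // level-2, hypothetical
        List.of("000027.sst", "000030.sst", "000028.sst", "000031.sst", "000029.sst",
            "000039.sst", "000037.sst", "000035.sst", "000033.sst"),      // level-3, Snapshot-2
        List.of("L4-x.sst"),                                             // level-4, hypothetical
        List.of("000059.sst", "000055.sst", "000056.sst", "000060.sst",
            "000057.sst", "000058.sst"));                                // level-5, Snapshot-3
  }

  public static void main(String[] args) {
    List<List<String>> levels = exampleLevels();
    Map<String, Set<String>> fwd = buildForwardDag(levels);
    pruneDownstream(fwd, levels.get(2)); // prune Snapshot-1 and Snapshot-2
    // prints [000055.sst, 000056.sst, 000057.sst, 000058.sst, 000059.sst, 000060.sst, L4-x.sst]
    System.out.println(new TreeSet<>(fwd.keySet()));
  }
}
```

Only the level-4 and level-5 nodes survive, and the level-4 node no longer has any arcs pointing into level-3.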

What is the link to the Apache JIRA?

How was this patch tested?

  • Unit tests.

@prashantpogde added the snapshot label (https://issues.apache.org/jira/browse/HDDS-6517) on Dec 6, 2022
@hemantk-12 marked this pull request as ready for review on December 12, 2022 20:12

@smengcl (Contributor) left a comment

Thanks @hemantk-12. Core logic looks good to me. Some comments inline.

Addressed cosmetic and indentation comments.

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
@smengcl merged commit 9fe6d10 into apache:HDDS-6517-Snapshot on Jan 3, 2023

@smengcl (Contributor) commented Jan 3, 2023

Thanks @hemantk-12 for the patch. Thanks @GeorgeJahad and @prashantpogde for reviewing this.

7 participants