HDDS-8314. [Snapshot] SnapDiff job and compaction DAG/SST file pruning synchronization #4553
Conversation
Force-pushed from 03bb2b4 to 338a274 (Compare)
```java
 * to DAG), compaction DAG pruning job (to remove older snapshots from DAG)
 * or a snap diff job (reads compaction DAG).
 */
private final Object compactionDagLock = new Object();
```
We should make RocksDBCheckpointDiffer a singleton class; otherwise, this could still be a problem with future changes.
I don't think it is a good idea to make it a singleton in this case, because a RocksDBCheckpointDiffer object is tied to a RocksDB db directory.
Discussed offline. Changing it to a singleton will protect against multiple instances of RocksDBCheckpointDiffer being created. We want to have only one RocksDBCheckpointDiffer object throughout the OM process.
After changing RocksDBCheckpointDiffer to a plain singleton, most of the OM unit and integration tests failed because each test creates a new instance of OmMetadataManagerImpl, which initializes RDBStore and RocksDBCheckpointDiffer. I tried to fix the tests by creating only one instance of OmMetadataManagerImpl per test class, but that doesn't work either; assertions fail in that case.
https://github.com/hemantk-12/ozone/actions/runs/4813598710/jobs/8570328203
One way to fix this is to have one RocksDBCheckpointDiffer instance per RocksDB dir and keep it in memory, which solves the unit test failures and is close to what we want to achieve. I made the changes accordingly.
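A minimal sketch of such a per-DB-directory cache, assuming a hypothetical holder class and a single-argument constructor for brevity (the real constructor takes more parameters):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder that caches one RocksDBCheckpointDiffer per DB directory,
// so the OM process never creates two differs for the same RocksDB dir.
public final class RocksDBCheckpointDifferHolder {

  private static final Map<String, RocksDBCheckpointDiffer> INSTANCE_MAP =
      new ConcurrentHashMap<>();

  private RocksDBCheckpointDifferHolder() {
  }

  // Returns the cached differ for dbDir, creating it atomically on first use.
  public static RocksDBCheckpointDiffer getInstance(String dbDir) {
    return INSTANCE_MAP.computeIfAbsent(dbDir, RocksDBCheckpointDiffer::new);
  }
}
```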
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
prashantpogde left a comment:
Other than a couple of minor comments, the overall changes look good.
…urn unique object per DB dir.
Thanks @hemantk-12 for the patch.
What changes were proposed in this pull request?
Currently, while a snapDiff job is running, we may lose part of the compaction DAG if a snapshot's age overlaps with the time at which snapshots become stale in the compaction DAG.
Another issue is that the DAG may return some SST file(s) as the diff, but those files get removed by RocksDBCheckpointDiffer#pruneOlderSnapshotsWithCompactionHistory while they are being read to generate the diff report.
For example, if one or both snapshots in a snapDiff job are 30 days old and the compaction DAG pruning service removes snapshots older than 30 days, we could end up in that situation.
This change synchronizes compaction DAG updates (appending and pruning) with the snapDiff job so that the snapDiff report is complete and correct instead of partial and incorrect.
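For illustration, the intended synchronization can be sketched as the DAG writers (compaction listener, pruning service) and the DAG reader (snapDiff) contending on the same compactionDagLock shown in the diff above. This is a minimal sketch only; appendToCompactionDag, the parameter lists, and the method bodies are illustrative stand-ins, not the actual patch:

```java
import java.util.Collections;
import java.util.List;

public class RocksDBCheckpointDiffer {

  // Guards the compaction DAG against concurrent append, prune, and read.
  private final Object compactionDagLock = new Object();

  // Called when RocksDB finishes a compaction: records new nodes/edges in the DAG.
  // (Illustrative name; the real code populates the DAG from compaction job info.)
  public void appendToCompactionDag(/* compaction job info */) {
    synchronized (compactionDagLock) {
      // ... add nodes and edges for the newly compacted SST files ...
    }
  }

  // Background pruning: drops DAG nodes (and their SST files) past the retention window.
  public void pruneOlderSnapshotsWithCompactionHistory() {
    synchronized (compactionDagLock) {
      // ... remove stale snapshot nodes and delete the SST files they own ...
    }
  }

  // SnapDiff: traverses the DAG and reads the SST files it references under the
  // same lock, so pruning cannot delete files out from under a running diff.
  public List<String> getSSTDiffList(/* fromSnapshot, toSnapshot */) {
    synchronized (compactionDagLock) {
      // ... walk the DAG between the two snapshots and collect SST file names ...
      return Collections.emptyList();
    }
  }
}
```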
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8314
How was this patch tested?
Existing tests as of now. More tests will be added as part of HDDS-8315 and HDDS-8389.