Skip to content

HDDS-8314. [Snapshot] SnapDiff job and compaction DAG/SST file pruning synchronization#4553

Merged
prashantpogde merged 9 commits intoapache:masterfrom
hemantk-12:HDDS-8314
Apr 27, 2023
Merged

HDDS-8314. [Snapshot] SnapDiff job and compaction DAG/SST file pruning synchronization#4553
prashantpogde merged 9 commits intoapache:masterfrom
hemantk-12:HDDS-8314

Conversation

@hemantk-12
Copy link
Contributor

@hemantk-12 hemantk-12 commented Apr 9, 2023

What changes were proposed in this pull request?

Currently, it is possible that when snapDiff job is running we may loose some of compaction DAG if snapshots timing is overlapping with time by which snapshot becomes stale in compaction DAG.
Other thing could happen is that DAG returned some SST file/s as diff but those files are removed by RocksDBCheckpointDiffer#pruneOlderSnapshotsWithCompactionHistory while reading them to generate diff report.

For example, if one or both of the snapshots of snapDiff job are 30 days old and compaction DAG pruning service is removing 30 days older snapshots, we could be in that situation.

This change is to achieve the synchronization between compaction DAG update (appending and pruning) and SnapDiff job so that snapDiff report is complete and correct instead of partial and incorrect.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8314

How was this patch tested?

Existing tests as of now. More tests will be added as part of HDDS-8315 and HDDS-8389.

@hemantk-12 hemantk-12 force-pushed the HDDS-8314 branch 3 times, most recently from 03bb2b4 to 338a274 Compare April 10, 2023 16:21
@smengcl smengcl added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Apr 13, 2023
@hemantk-12 hemantk-12 marked this pull request as ready for review April 20, 2023 22:02
* to DAG), compaction DAG pruning job (to removes older snapshot's from DAG)
* or a snap diff job (reads compaction DAG).
*/
private final Object compactionDagLock = new Object();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make RocksDBCheckpointDiffer a singleton class otherwise this could still be a problem with future changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a good idea to make it singleton in this case because RocksDBCheckpointDiffer object is based on RocksDB's db directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. Changing it to singleton will protect from multiple instances get created for RocksDBCheckpointDiffer. We want to have only one object of RocksDBCheckpointDiffer throughout OM process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing RocksDBCheckpointDiffer to simply singleton, most of OM unit and integration tests failed because each test is creating a new instance of OmMetadataManagerImpl which initializes RBDStore and RocksDBCheckpointDiffer. I tired to fix the tests by creating only one instance of OmMetadataManagerImpl per test class but that doesn’t work either. Assertions are failing in the case.

https://github.com/hemantk-12/ozone/actions/runs/4813598710/jobs/8570328203

One way to fix this, is to have one instance of per RocksDBCheckpointDiffer RocksDB dir and keep it in memory. Which solves the unit test failure and is close to what we want achieve. I made the changes accordingly.

Copy link
Contributor

@prashantpogde prashantpogde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than couple minor comments, the overall changes look good.

@prashantpogde
Copy link
Contributor

Thanks @hemantk-12 for the patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants