Skip to content

HDDS-13960. Convert FlatResource Lock into DAG based lock ordering for the snapshot lock#9328

Merged
swamirishi merged 3 commits intoapache:masterfrom
swamirishi:HDDS-13960
Nov 25, 2025
Merged

HDDS-13960. Convert FlatResource Lock into DAG based lock ordering for the snapshot lock#9328
swamirishi merged 3 commits intoapache:masterfrom
swamirishi:HDDS-13960

Conversation

@swamirishi
Copy link
Contributor

What changes were proposed in this pull request?

The current leveled resource implementation is not ideal for having an implementation since it only works on mask which can mean that only one resource can be there at one level. However for more complex locking levels this is not sufficient. Proposal here is to convert FlatResource into DAGLeveledResource with which one can enforce more complex ordering to ensure lock ordering.

This would be useful for ensuring SNAPSHOT_DB_CONTENT_LOCK is always acquired before SNAPSHOT_DB_HANDLE_LOCK. This is to avoid any sort of deadlock later on. The same implementation can be also extended to bootstrap lock.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13960

How was this patch tested?

Added unit test

@swamirishi swamirishi added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Nov 19, 2025
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR needs to rebase. Apart from a very small variable name suggestion, and the test code change which I don't understand, the rest is starightforward.

for (Resource otherResource : resources) {
String[] otherResourceName = generateResourceName(otherResource);
String[] flatResourceName = generateResourceName(flatResource);
String[] flatResourceName = generateResourceName(dagLeveledResource);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit update variable name

Suggested change
String[] flatResourceName = generateResourceName(dagLeveledResource);
String[] dagResourceName = generateResourceName(dagLeveledResource);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +167 to +178
CompletableFuture<Void> future = new CompletableFuture<>();
CompletableFuture.runAsync(() -> {
try (BatchOperation batchOperation = getOmMetadataManager().getStore().initBatchOperation()) {
response.addToDBBatch(getOmMetadataManager(), batchOperation);
getOmMetadataManager().getStore().commitBatchOperation(batchOperation);
} catch (IOException e) {
future.completeExceptionally(e);
return;
}
future.complete(null);
});
future.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand this change. It made the original code asynchronous but then it waits for the result to return, so it has essentially the same effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh this is because the now the lock acquisition can only happen in a different thread. The test thread currently holds lock for snapshot which would hold Snapshot_DB_LOCK and now that we are preventing SNAPSHOT_CONTENT_LOCK to be acquired while SNAPSHOT_DB_LOCK is held by the same thread the test thread cannot flush the response. Separate thread would be able to run it.

*
* The enum implements the {@code Resource} interface, providing a name for identification and
* an associated {@link IOzoneManagerLock.ResourceManager} to manage its locking behavior.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little more explanation will be nice.

Add: DAGLeveledResource defines the order in which resources can be locked. Attempting to lock the resources out of order will throw a RuntimeException. For instance, acquiring SNAPSHOT_DB_CONTENT_LOCK followed by SNAPSHOT_DB_LOCK is allowed; acquiring SNAPSHOT_LOCAL_DATA_LOCK followed by SNAPSHOT_DB_CONTENT_LOCK is not permitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/**
* Performs a depth-first traversal (DFS) on the directed acyclic graph (DAG)
* of {@link DAGLeveledResource} objects, processing dependencies and updating
* the lock-dependent adjacency set with resource relationships.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After executing this method, lockDependentAdjacencySet is populated with the transitive set of dependent child resources of a given resource type.

…r the snapshot lock

Change-Id: If951c89d3977bc711d083da7c26778beee46dab9
Change-Id: I42ef3debf4fda6738dad382b534eb38871b50db5
Change-Id: I01cd34eaf9e86f3e9776da81b4494aab4bad7802
@swamirishi swamirishi marked this pull request as ready for review November 25, 2025 12:24
@swamirishi swamirishi merged commit bc06db9 into apache:master Nov 25, 2025
55 checks passed
@swamirishi
Copy link
Contributor Author

thank you @jojochuang for reviewing the patch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments