HDDS-13960. Convert FlatResource Lock into DAG based lock ordering for the snapshot lock#9328
Conversation
jojochuang
left a comment
There was a problem hiding this comment.
The PR needs to rebase. Apart from a very small variable name suggestion, and the test code change which I don't understand, the rest is starightforward.
| for (Resource otherResource : resources) { | ||
| String[] otherResourceName = generateResourceName(otherResource); | ||
| String[] flatResourceName = generateResourceName(flatResource); | ||
| String[] flatResourceName = generateResourceName(dagLeveledResource); |
There was a problem hiding this comment.
nit update variable name
| String[] flatResourceName = generateResourceName(dagLeveledResource); | |
| String[] dagResourceName = generateResourceName(dagLeveledResource); |
| CompletableFuture<Void> future = new CompletableFuture<>(); | ||
| CompletableFuture.runAsync(() -> { | ||
| try (BatchOperation batchOperation = getOmMetadataManager().getStore().initBatchOperation()) { | ||
| response.addToDBBatch(getOmMetadataManager(), batchOperation); | ||
| getOmMetadataManager().getStore().commitBatchOperation(batchOperation); | ||
| } catch (IOException e) { | ||
| future.completeExceptionally(e); | ||
| return; | ||
| } | ||
| future.complete(null); | ||
| }); | ||
| future.get(); |
There was a problem hiding this comment.
i don't understand this change. It made the original code asynchronous but then it waits for the result to return, so it has essentially the same effect?
There was a problem hiding this comment.
oh this is because the now the lock acquisition can only happen in a different thread. The test thread currently holds lock for snapshot which would hold Snapshot_DB_LOCK and now that we are preventing SNAPSHOT_CONTENT_LOCK to be acquired while SNAPSHOT_DB_LOCK is held by the same thread the test thread cannot flush the response. Separate thread would be able to run it.
| * | ||
| * The enum implements the {@code Resource} interface, providing a name for identification and | ||
| * an associated {@link IOzoneManagerLock.ResourceManager} to manage its locking behavior. | ||
| */ |
There was a problem hiding this comment.
A little more explanation will be nice.
Add: DAGLeveledResource defines the order in which resources can be locked. Attempting to lock the resources out of order will throw a RuntimeException. For instance, acquiring SNAPSHOT_DB_CONTENT_LOCK followed by SNAPSHOT_DB_LOCK is allowed; acquiring SNAPSHOT_LOCAL_DATA_LOCK followed by SNAPSHOT_DB_CONTENT_LOCK is not permitted.
| /** | ||
| * Performs a depth-first traversal (DFS) on the directed acyclic graph (DAG) | ||
| * of {@link DAGLeveledResource} objects, processing dependencies and updating | ||
| * the lock-dependent adjacency set with resource relationships. |
There was a problem hiding this comment.
After executing this method, lockDependentAdjacencySet is populated with the transitive set of dependent child resources of a given resource type.
…r the snapshot lock Change-Id: If951c89d3977bc711d083da7c26778beee46dab9
cfb516e to
cc56136
Compare
Change-Id: I42ef3debf4fda6738dad382b534eb38871b50db5
7b94620 to
6925ce6
Compare
Change-Id: I01cd34eaf9e86f3e9776da81b4494aab4bad7802
|
thank you @jojochuang for reviewing the patch |
What changes were proposed in this pull request?
The current leveled resource implementation is not ideal for having an implementation since it only works on mask which can mean that only one resource can be there at one level. However for more complex locking levels this is not sufficient. Proposal here is to convert FlatResource into DAGLeveledResource with which one can enforce more complex ordering to ensure lock ordering.
This would be useful for ensuring SNAPSHOT_DB_CONTENT_LOCK is always acquired before SNAPSHOT_DB_HANDLE_LOCK. This is to avoid any sort of deadlock later on. The same implementation can be also extended to bootstrap lock.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13960
How was this patch tested?
Added unit test