-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-6186] Fix lock identity in InProcessLockProvider #8658
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
...lient-common/src/test/java/org/apache/hudi/client/transaction/TestInProcessLockProvider.java
Show resolved
Hide resolved
...lient-common/src/test/java/org/apache/hudi/client/transaction/TestInProcessLockProvider.java
Outdated
Show resolved
Hide resolved
...lient-common/src/test/java/org/apache/hudi/client/transaction/TestInProcessLockProvider.java
Outdated
Show resolved
Hide resolved
...lient-common/src/test/java/org/apache/hudi/client/transaction/TestInProcessLockProvider.java
Show resolved
Hide resolved
...lient-common/src/test/java/org/apache/hudi/client/transaction/TestInProcessLockProvider.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestions on making the test deterministic
LOG.info("Writer 2 tries to acquire the lock."); | ||
writer2TryLock.set(true); | ||
lockProvider2.lock(); | ||
LOG.info("Writer 2 acquires the lock."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CDL for writer2 lock acquisition : count down by 1.
}); | ||
writer2Locked.set(true); | ||
|
||
while (!writer3TryLock.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CDL for writer3 start. await for it
}); | ||
|
||
Thread writer3 = new Thread(() -> { | ||
while (!writer2Locked.get() || !writer1Completed.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait for CDL for writer2 lock.
CDL for writer3 start : count down by 1
Synced with @nsivabalan and the test is good enough now, as the count down latch may not solve the problem in the locking in this case. |
This commit fixes a bug introduced by apache#6847. apache#6847 extends the InProcessLockProvider to support multiple tables in the same process, by having an in-memory static final map storing the mapping of the table base path to the read-write reentrant lock, so that the writer uses the corresponding lock based on the base path. When closing the lock provider, close() removes the lock entry. Since close() is called when closing the write client, the lock is removed and subsequent concurrent writers will get a different lock instance on the same table, causing the locking mechanism on the same table to be useless. The fix gets rid of the lock removal operation in the `close()` call since it has to be kept for concurrent writers. A new test `TestInProcessLockProvider#testLockIdentity` based on the above scenario is added to guard the behavior.
This commit fixes a bug introduced by apache#6847. apache#6847 extends the InProcessLockProvider to support multiple tables in the same process, by having an in-memory static final map storing the mapping of the table base path to the read-write reentrant lock, so that the writer uses the corresponding lock based on the base path. When closing the lock provider, close() removes the lock entry. Since close() is called when closing the write client, the lock is removed and subsequent concurrent writers will get a different lock instance on the same table, causing the locking mechanism on the same table to be useless. The fix gets rid of the lock removal operation in the `close()` call since it has to be kept for concurrent writers. A new test `TestInProcessLockProvider#testLockIdentity` based on the above scenario is added to guard the behavior.
This commit fixes a bug introduced by apache#6847. apache#6847 extends the InProcessLockProvider to support multiple tables in the same process, by having an in-memory static final map storing the mapping of the table base path to the read-write reentrant lock, so that the writer uses the corresponding lock based on the base path. When closing the lock provider, close() removes the lock entry. Since close() is called when closing the write client, the lock is removed and subsequent concurrent writers will get a different lock instance on the same table, causing the locking mechanism on the same table to be useless. The fix gets rid of the lock removal operation in the `close()` call since it has to be kept for concurrent writers. A new test `TestInProcessLockProvider#testLockIdentity` based on the above scenario is added to guard the behavior.
Change Logs
This PR fixes a bug introduced by #6847. #6847 extends the
InProcessLockProvider
to support multiple tables in the same process, by having an in-memory static final map storing the mapping of the table base path to the read-write reentrant lock, so that the writer uses the corresponding lock based on the base path. When closing the lock provider,close()
removes the lock entry. Sinceclose()
is called when closing the write client, the lock is removed and subsequent concurrent writers will get a different lock instance on the same table, causing the locking mechanism on the same table to be useless. Take the following example where three writers write to the same table concurrently and need to acquire the in-process lock:after Writer 1 releases the lock and closes the lock provider, the lock instance is removed from the map, and Writer 3 will get a different lock instance compared to Writer 2, making the lock ineffective.
The fix gets rid of the lock removal operation in the
close()
call since it has to be kept for concurrent writers.This bug is uncovered while investigating the flaky test
testArchivalWithMultiWriters
(HUDI-6176) where concurrent cannot properly do archival due to the ineffective locking mechanism, although archival is guarded by locks.A new test
TestInProcessLockProvider#testLockIdentity
based on the above scenario is added to guard the behavior. Before this fix, the test failed because of varying lock instances (see logs below); after the fix, the test succeeds.Impact
Fixes a bug in
InProcessLockProvider
that may make in-process lock useless on the same table.Risk level
low
Documentation Update
We need to update the release notes to mention this regression.
Contributor's checklist