[HUDI-6186] Fix lock identity in InProcessLockProvider #8658

Merged (3 commits) on May 8, 2023

yihua (Contributor) commented May 7, 2023

Change Logs

This PR fixes a bug introduced by #6847. #6847 extends the InProcessLockProvider to support multiple tables in the same process, by having an in-memory static final map storing the mapping of the table base path to the read-write reentrant lock, so that the writer uses the corresponding lock based on the base path. When closing the lock provider, close() removes the lock entry. Since close() is called when closing the write client, the lock is removed and subsequent concurrent writers will get a different lock instance on the same table, causing the locking mechanism on the same table to be useless. Take the following example where three writers write to the same table concurrently and need to acquire the in-process lock:

Writer 1:   lock |----------------| unlock and close
Writer 2:   try lock   |      ...       lock |-------------| unlock and close
Writer 3:                                    try lock  | ...  lock |------| unlock and close

After Writer 1 releases the lock and closes the lock provider, the lock instance is removed from the map. Writer 3 then gets a different lock instance from the one Writer 2 holds, making the lock ineffective.

The fix drops the lock removal operation from the close() call, since the lock entry has to be kept for concurrent writers.
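
The lock identity problem can be reproduced in isolation with a minimal sketch. This is not the actual InProcessLockProvider code: the class `LockIdentitySketch`, the map `LOCKS`, and the methods `lockFor`, `closeWithRemoval`, and `closeKeepingEntry` are hypothetical names chosen for illustration. The sketch assumes only what is described above, namely a static map from base path to a `ReentrantReadWriteLock`, and compares a close that removes the entry with one that keeps it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockIdentitySketch {
  // Hypothetical stand-in for the static map of table base path -> lock.
  private static final Map<String, ReentrantReadWriteLock> LOCKS = new ConcurrentHashMap<>();

  // Each writer looks up (or creates) the lock for its table base path.
  static ReentrantReadWriteLock lockFor(String basePath) {
    return LOCKS.computeIfAbsent(basePath, p -> new ReentrantReadWriteLock());
  }

  // Behavior before the fix: closing drops the map entry.
  static void closeWithRemoval(String basePath) {
    LOCKS.remove(basePath);
  }

  // Behavior after the fix: closing leaves the map entry in place.
  static void closeKeepingEntry(String basePath) {
    // Intentionally empty: the entry stays for other writers on the same table.
  }

  // Returns true if a writer arriving after a close sees the same lock
  // instance that an earlier, still-active writer obtained.
  static boolean sameLockAfterClose(boolean removeOnClose) {
    String basePath = "table1-" + removeOnClose; // separate key per scenario
    ReentrantReadWriteLock writer2Lock = lockFor(basePath); // Writer 2 is waiting on this
    if (removeOnClose) {
      closeWithRemoval(basePath); // Writer 1 closes its provider
    } else {
      closeKeepingEntry(basePath);
    }
    ReentrantReadWriteLock writer3Lock = lockFor(basePath); // Writer 3 arrives later
    return writer2Lock == writer3Lock;
  }

  public static void main(String[] args) {
    System.out.println("remove on close: same lock = " + sameLockAfterClose(true));  // false
    System.out.println("keep on close:   same lock = " + sameLockAfterClose(false)); // true
  }
}
```

With removal on close, Writer 2 and Writer 3 hold references to two distinct lock objects, so neither excludes the other. Keeping the entry trades a small amount of retained memory per table for a stable lock identity.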

This bug was uncovered while investigating the flaky test testArchivalWithMultiWriters (HUDI-6176), where concurrent writers cannot properly perform archival because the locking mechanism is ineffective, even though archival is guarded by locks.

A new test TestInProcessLockProvider#testLockIdentity based on the above scenario is added to guard the behavior. Before this fix, the test failed because of varying lock instances (see logs below); after the fix, the test succeeds.

1510 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 1 tries to acquire the lock.
1521 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 0, Read locks = 0], Thread ForkJoinPool-1-worker-9, In-process lock state ACQUIRING
1521 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 1, Read locks = 0], Thread ForkJoinPool-1-worker-9, In-process lock state ACQUIRED
1521 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 1 acquires the lock.
1523 [Thread-1] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 2 tries to acquire the lock.
1523 [Thread-1] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 1, Read locks = 0], Thread Thread-1, In-process lock state ACQUIRING
1623 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 1, Read locks = 0], Thread ForkJoinPool-1-worker-9, In-process lock state RELEASING
1624 [Thread-1] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 1, Read locks = 0], Thread Thread-1, In-process lock state ACQUIRED
1624 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 0, Read locks = 0], Thread ForkJoinPool-1-worker-9, In-process lock state RELEASED
1624 [Thread-1] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 2 acquires the lock.
1624 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 1 releases the lock.
1624 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 1, Read locks = 0], Thread ForkJoinPool-1-worker-9, In-process lock state ALREADY_RELEASED
1624 [ForkJoinPool-1-worker-9] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 1 closes the lock provider.
Exception in thread "Thread-2" junit.framework.AssertionFailedError: The lock instance in Writer 3 should be held by Writer 2: java.util.concurrent.locks.ReentrantReadWriteLock@6b563e78[Write locks = 0, Read locks = 0]
	at org.apache.hudi.client.transaction.TestInProcessLockProvider.lambda$testLockIdentity$6(TestInProcessLockProvider.java:127)
	at java.lang.Thread.run(Thread.java:748)
1727 [Thread-1] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 1, Read locks = 0], Thread Thread-1, In-process lock state RELEASING
1727 [Thread-1] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 0, Read locks = 0], Thread Thread-1, In-process lock state RELEASED
1728 [Thread-1] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 2 releases the lock.
1728 [Thread-1] INFO  org.apache.hudi.client.transaction.lock.InProcessLockProvider [] - Base Path table1, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@18488497[Write locks = 0, Read locks = 0], Thread Thread-1, In-process lock state ALREADY_RELEASED
1728 [Thread-1] INFO  org.apache.hudi.client.transaction.TestInProcessLockProvider [] - Writer 2 closes the lock provider.

org.opentest4j.AssertionFailedError: 
Expected :true
Actual   :false

at org.apache.hudi.client.transaction.TestInProcessLockProvider.testLockIdentity(TestInProcessLockProvider.java:169)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Impact

Fixes a bug in InProcessLockProvider that may make the in-process lock ineffective for concurrent writers on the same table.

Risk level

low

Documentation Update

We need to update the release notes to mention this regression.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

codope (Member) left a comment

Nice catch!

yihua merged commit 79412ed into apache:master on May 8, 2023
18 checks passed
nsivabalan (Contributor) left a comment

suggestions on making the test deterministic

LOG.info("Writer 2 tries to acquire the lock.");
writer2TryLock.set(true);
lockProvider2.lock();
LOG.info("Writer 2 acquires the lock.");

Use a CountDownLatch (CDL) for writer 2's lock acquisition: count down by 1 here.

});
writer2Locked.set(true);

while (!writer3TryLock.get()) {

Use a CDL for writer 3's start and await it here.

});

Thread writer3 = new Thread(() -> {
while (!writer2Locked.get() || !writer1Completed.get()) {

Wait on the CDL for writer 2's lock here.
CDL for writer 3's start: count down by 1.
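
The latch-based coordination suggested in these review comments could look roughly like the sketch below. It is an illustration only, not the test that was merged: the class `LatchCoordinationSketch`, the method `writer3SawLockHeld`, and the latch names are hypothetical, and a plain `ReentrantReadWriteLock` stands in for the lock provider. It shows the general pattern of replacing busy-wait loops on boolean flags with `CountDownLatch` signals.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LatchCoordinationSketch {
  // Returns true if writer 3 found the lock held by writer 2 when it tried to acquire it.
  static boolean writer3SawLockHeld() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch writer2Locked = new CountDownLatch(1); // signalled once writer 2 holds the lock
    CountDownLatch writer3Tried = new CountDownLatch(1);  // signalled once writer 3 has attempted
    AtomicBoolean writer3Acquired = new AtomicBoolean(true);

    Thread writer2 = new Thread(() -> {
      lock.writeLock().lock();
      try {
        writer2Locked.countDown();               // count down by 1 after acquisition
        writer3Tried.await(5, TimeUnit.SECONDS); // hold the lock until writer 3 has tried
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.writeLock().unlock();
      }
    });

    Thread writer3 = new Thread(() -> {
      try {
        writer2Locked.await();                   // wait until writer 2 holds the lock
        boolean acquired = lock.writeLock().tryLock(); // non-blocking attempt
        writer3Acquired.set(acquired);
        if (acquired) {
          lock.writeLock().unlock();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        writer3Tried.countDown();                // let writer 2 release the lock
      }
    });

    writer2.start();
    writer3.start();
    try {
      writer2.join();
      writer3.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for writers", e);
    }
    return !writer3Acquired.get();
  }

  public static void main(String[] args) {
    System.out.println("writer 3 saw lock held: " + writer3SawLockHeld());
  }
}
```

Latches make the ordering of the signalling steps explicit, but they cannot signal the moment a thread is blocked inside a blocking `lock()` call, which is one reason such a rewrite may not fully remove timing dependence from a test like this.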

yihua (Contributor, Author) commented May 8, 2023

Synced with @nsivabalan: the test is good enough now, since a CountDownLatch may not solve the locking problem in this case.

yihua added a commit to yihua/hudi that referenced this pull request May 15, 2023
This commit fixes a bug introduced by apache#6847. apache#6847 extends the InProcessLockProvider to support multiple tables in the same process, by having an in-memory static final map storing the mapping of the table base path to the read-write reentrant lock, so that the writer uses the corresponding lock based on the base path. When closing the lock provider, close() removes the lock entry. Since close() is called when closing the write client, the lock is removed and subsequent concurrent writers will get a different lock instance on the same table, causing the locking mechanism on the same table to be useless.  The fix gets rid of the lock removal operation in the `close()` call since it has to be kept for concurrent writers.  A new test `TestInProcessLockProvider#testLockIdentity` based on the above scenario is added to guard the behavior.
yihua added a commit to yihua/hudi that referenced this pull request May 17, 2023