[HUDI-5593] Fixing deadlocks due to async cleaner interplay w/ main thread#7739
Merged
codope merged 1 commit into apache:master on Jan 24, 2023
bf47862 to
80c1261
Compare
nsivabalan
commented
Jan 23, 2023
}

/**
 * Schedules a new cleaning instant.
Contributor
Author
NTR (note to reviewer): these methods are not used, so I am removing them as dead code.
 * @param skipLocking if this is triggered by another parent transaction, locking can be skipped.
 */
@Nullable
@Deprecated
Contributor
Author
NTR: I have deprecated the methods that took "skipLocking" as an argument, as we don't need it anymore.
protected void rollbackFailedWrites(Map<String, Option<HoodiePendingRollbackInfo>> instantsToRollback) {
  rollbackFailedWrites(instantsToRollback, false);
}
Contributor
Author
NTR: rollbackFailedWrites in L679 is invoked from the upgrade code path, which already runs within the lock. So I could not deprecate this method to get rid of "skipLocking".
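To illustrate why a caller that already holds the lock needs a way to skip locking, here is a minimal sketch. It is not Hudi code: the class `SkipLockingSketch`, its methods, and the use of a `Semaphore` as a non-reentrant stand-in for the table lock are all assumptions made for illustration.

```java
import java.util.concurrent.Semaphore;

class SkipLockingSketch {
  // Non-reentrant stand-in for a table-level lock (assumption; not Hudi's lock provider).
  final Semaphore tableLock = new Semaphore(1);
  int rollbacks = 0;

  // Hypothetical rollback: takes the lock itself unless the caller says it already holds it.
  void rollback(boolean skipLocking) {
    if (!skipLocking) {
      tableLock.acquireUninterruptibly();
    }
    try {
      rollbacks++; // the actual rollback work would go here
    } finally {
      if (!skipLocking) {
        tableLock.release();
      }
    }
  }

  // Hypothetical upgrade path that already runs under the lock.
  // Calling rollback(false) here would block forever on the non-reentrant lock.
  void upgrade() {
    tableLock.acquireUninterruptibly();
    try {
      rollback(true);
    } finally {
      tableLock.release();
    }
  }

  public static void main(String[] args) {
    SkipLockingSketch sketch = new SkipLockingSketch();
    sketch.upgrade();       // rollback runs inside the caller's lock
    sketch.rollback(false); // standalone rollback takes the lock itself
    System.out.println("rollbacks = " + sketch.rollbacks);
  }
}
```

In this sketch the flag is only safe when the caller really does hold the lock, which is one reason to keep such a parameter off any method that does not strictly need it.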
…read is acquired the lock and awaiting for async cleaner to finish
80c1261 to
afb4305
Compare
Collaborator
codope
reviewed
Jan 24, 2023
try {
  // Delete the marker directory for the instant.
  WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
      .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
Member
Why should this be done inside the lock if commit has already happened?
Contributor
Author
I need to think through this. Let's land this patch for now to unblock CI; I will get back to this by tomorrow.
codope
approved these changes
Jan 24, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023
…hile main thread is acquired the lock and awaiting for async cleaner to finish (apache#7739)
nsivabalan added a commit to nsivabalan/hudi that referenced this pull request Mar 22, 2023
…hile main thread is acquired the lock and awaiting for async cleaner to finish (apache#7739)
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
…hile main thread is acquired the lock and awaiting for async cleaner to finish (apache#7739)

Change Logs
In the post-commit step of the WriteClient, the main thread waits for the async cleaner to finish if async cleaning is enabled. This code block already runs within a lock. So, if multi-writer locks are enabled and the main thread has already acquired the lock, the async cleaner may not be able to acquire the lock and will keep timing out. This was the main reason our CI tests kept timing out so frequently. Because we also auto-enable lock provider configs for deltastreamer in continuous mode with async table services, some of those tests are impacted.
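The interplay described above can be reproduced in a few lines. The following is a minimal sketch, not Hudi code: `AsyncCleanerLockSketch`, `tryClean`, and `cleanerGetsLock` are hypothetical names, a plain `ReentrantLock` stands in for the multi-writer lock, and the bounded `tryLock` timeout is only there so the example terminates. Releasing the lock before waiting is shown as one way to avoid the problem and is not necessarily the exact change made in this patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class AsyncCleanerLockSketch {

  // Hypothetical cleaner body: it needs the lock and gives up after timeoutMs.
  static boolean tryClean(ReentrantLock lock, long timeoutMs) throws InterruptedException {
    if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      return false; // timed out waiting for the lock
    }
    try {
      return true; // the cleaning work would go here
    } finally {
      lock.unlock();
    }
  }

  // Starts the cleaner while holding the lock, then waits for it either
  // inside the lock (problematic) or after releasing it.
  // Returns whether the cleaner managed to acquire the lock.
  static boolean cleanerGetsLock(boolean awaitInsideLock, long timeoutMs) {
    ReentrantLock lock = new ReentrantLock(); // stand-in for the multi-writer lock
    ExecutorService cleanerPool = Executors.newSingleThreadExecutor();
    try {
      Future<Boolean> clean;
      lock.lock();
      try {
        Callable<Boolean> task = () -> tryClean(lock, timeoutMs);
        clean = cleanerPool.submit(task);
        if (awaitInsideLock) {
          return clean.get(); // main thread blocks while still holding the lock
        }
      } finally {
        lock.unlock();
      }
      return clean.get(); // lock already released, so the cleaner can proceed
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      cleanerPool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("await inside lock:  " + cleanerGetsLock(true, 200));
    System.out.println("await outside lock: " + cleanerGetsLock(false, 200));
  }
}
```

With the wait inside the lock, the cleaner thread can never acquire the lock before its timeout expires; with the wait moved after the unlock, it acquires the lock as soon as the main thread releases it.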
Impact
The async cleaner no longer deadlocks against the main thread over the lock when it is invoked.
Risk level (write none, low, medium or high below)
medium.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
N/A