-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide async version of markLedgerUnderreplicated for LedgerUnderreplicationManager #1619
Conversation
…nager ### Motivation Auditor has multiple places calling sync methods in async callbacks. This raises the possibility hitting deadlock. Issue apache#1578 is one of the examples. After looking into the `LedgerUnderreplicationManager`, `markLedgerUnderreplicated` is the only interface that will be called in async callbacks. This change is to provide an async version of `markLedgerUnderreplicated`. ### Changes - add `markLedgerUnderreplicatedAsync` interface in `LedgerUnderreplicationManager`. - implement the logic of `markLedgerUnderreplicated` using async callbacks - use `markLedgerUnderreplicatedAsync` in the Auditor Related Issues: apache#1578 Master Issue: apache#1617
This PR is to follow up the comment in #1608. After looking into LedgerUnderreplicationManager, I think |
FutureUtils.completeExceptionally(processFuture, BKException.create(rc)); | ||
} | ||
}, null, BKException.Code.OK, BKException.Code.ReadException); | ||
FutureUtils.result(processFuture, BKException.HANDLER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If processing is done async, will get impacted by the below admin.close()
and client.close()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
processFuture is completed only when finished processing the last ledger, so checkAllLedgers is still sync, which is fine, since checkAllLedgers is executing in Auditor's own executor.
regarding tests: this PR only focuses on rewriting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Great work. +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* @param missingReplicas missing replicas | ||
* @return a future presents the mark result. | ||
*/ | ||
CompletableFuture<Void> markLedgerUnderreplicatedAsync(long ledgerId, Collection<String> missingReplicas); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: dont we need to have async version of "void markLedgerReplicated(long ledgerId)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't. markLedgerReplicated
is used by replication worker, which replication worker is single threaded and using sync methods. so markLedgerReplicated
is fine at the context.
only markLedgerUnderreplicated
is the issue.
ideally, LedgerUnderreplicationManager should be splitted into at least 2 interfaces, one for Auditor, the other one for ReplicationWorker. That would make things much clearer.
if (cause instanceof ReplicationException) { | ||
return (ReplicationException) cause; | ||
} else { | ||
if (cause instanceof InterruptedException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FutureUtils.result would throw InterruptedException without applying the exceptionHandler function https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/concurrent/FutureUtils.java#L77
return (BKException) cause; | ||
} else { | ||
BKException ex; | ||
if (cause instanceof InterruptedException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FutureUtils.result would throw InterruptedException without applying the exceptionHandler function https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/concurrent/FutureUtils.java#L77
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. I would remove this here
* @param missingReplicas missing replicas | ||
* @return a future presents the mark result. | ||
*/ | ||
CompletableFuture<Void> markLedgerUnderreplicatedAsync(long ledgerId, Collection<String> missingReplicas); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there any need of making 'missingReplicas' collection, instead of just String missingReplica
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this for https://github.com/apache/bookkeeper/pull/1619/files#diff-7525f06ad3a1ad0a00a462df4deb4698L581 . then it doesn't need to do multiple zk calls for creating an UL.
final List<ACL> zkAcls = ZkUtils.getACLs(conf); | ||
final String znode = getUrLedgerZnode(ledgerId); | ||
final CompletableFuture<Void> createFuture = new CompletableFuture<>(); | ||
tryMarkLedgerUnderreplicatedAsync(znode, missingReplicas, zkAcls, createFuture); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why does caller has to create 'createFuture' (CompletableFuture) and pass it, why cann't method returns the CompletableFuture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this allows the method tryMarkLedgerUnderreplicatedAsync
to be attempted with same CompleteableFuture
.
@reddycharan I have addressed your comments. |
Sets.newHashSet(lh.getId()) | ||
).whenComplete((result, cause) -> { | ||
if (null != cause) { | ||
callback.processResult(Code.ReplicationException, null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just want to make sure that this 'cause' is not lost in log errors. will it not hide cause completely here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. I added the logging back. please take a look at the latest commit.
} | ||
|
||
callback.processResult(rc, null, null); | ||
lh.closeAsync().whenComplete((result, cause) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it responsibility of ProcessLostFragmentsCb to close the ledgerhandle, I'm not sure if it is appropriate to do it here, since ProcessLostFragmentsCb is not the owner of 'lh'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t change the behavior. The original logic is ProcessLostFramgmentsCb closing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkLedgersProcessor
closes the 'lh' https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java#L679
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line here is wrong.
However it was "okay" because close in ReadOnlyHandle is no-op.
changes LGTM, but i understand that there is no functional change and they are not public API methods, but change of this magnitude needs some new test coverage for confidence. |
if there is no functionality change, what new tests do you expect? |
run bookkeeper-server replication tests |
This test is only rewriting sync calls to use async calls. Although Agreed that we need to have better coverage and since |
…licationManager Descriptions of the changes in this PR: ### Motivation Auditor has multiple places calling sync methods in async callbacks. This raises the possibility hitting deadlock. Issue #1578 is one of the examples. After looking into the `LedgerUnderreplicationManager`, `markLedgerUnderreplicated` is the only interface that will be called in async callbacks. This change is to provide an async version of `markLedgerUnderreplicated`. ### Changes - add `markLedgerUnderreplicatedAsync` interface in `LedgerUnderreplicationManager`. - implement the logic of `markLedgerUnderreplicated` using async callbacks - use `markLedgerUnderreplicatedAsync` in the Auditor Related Issues: #1578 Master Issue: #1617 Author: Sijie Guo <sijie@apache.org> Reviewers: Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, Matteo Merli <mmerli@apache.org> This closes #1619 from sijie/async_sync_autorecovery (cherry picked from commit 73b428c) Signed-off-by: Sijie Guo <sijie@apache.org>
…licationManager Descriptions of the changes in this PR: ### Motivation Auditor has multiple places calling sync methods in async callbacks. This raises the possibility hitting deadlock. Issue #1578 is one of the examples. After looking into the `LedgerUnderreplicationManager`, `markLedgerUnderreplicated` is the only interface that will be called in async callbacks. This change is to provide an async version of `markLedgerUnderreplicated`. ### Changes - add `markLedgerUnderreplicatedAsync` interface in `LedgerUnderreplicationManager`. - implement the logic of `markLedgerUnderreplicated` using async callbacks - use `markLedgerUnderreplicatedAsync` in the Auditor Related Issues: #1578 Master Issue: #1617 Author: Sijie Guo <sijie@apache.org> Reviewers: Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, Matteo Merli <mmerli@apache.org> This closes #1619 from sijie/async_sync_autorecovery
…derreplicatedAsync *Motivation* We introduced LedgerUnderredplicationManager#markLedgerUnderreplicatedAsync in apache#1619. This exposes the async api to the public. Let's make sure we have enough test coverage for this new async api, including negative tests and concurrent tests. *Changes* - Add basic tests - Add negative tests - Add concurrent tests on resolving conflicts
…licationManager Descriptions of the changes in this PR: ### Motivation Auditor has multiple places calling sync methods in async callbacks. This raises the possibility hitting deadlock. Issue apache#1578 is one of the examples. After looking into the `LedgerUnderreplicationManager`, `markLedgerUnderreplicated` is the only interface that will be called in async callbacks. This change is to provide an async version of `markLedgerUnderreplicated`. ### Changes - add `markLedgerUnderreplicatedAsync` interface in `LedgerUnderreplicationManager`. - implement the logic of `markLedgerUnderreplicated` using async callbacks - use `markLedgerUnderreplicatedAsync` in the Auditor Related Issues: apache#1578 Master Issue: apache#1617 Author: Sijie Guo <sijie@apache.org> Reviewers: Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>, Matteo Merli <mmerli@apache.org> This closes apache#1619 from sijie/async_sync_autorecovery (cherry picked from commit 3e01125) Signed-off-by: JV Jujjuri <vjujjuri@salesforce.com> Checkstyle fix
Descriptions of the changes in this PR:
Motivation
Auditor has multiple places calling sync methods in async callbacks.
This raises the possibility hitting deadlock. Issue #1578 is one of the examples.
After looking into the
LedgerUnderreplicationManager
,markLedgerUnderreplicated
is the only interface that will be called in async callbacks. This change is
to provide an async version of
markLedgerUnderreplicated
.Changes
markLedgerUnderreplicatedAsync
interface inLedgerUnderreplicationManager
.markLedgerUnderreplicated
using async callbacksmarkLedgerUnderreplicatedAsync
in the AuditorRelated Issues: #1578
Master Issue: #1617