Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue 4109: Fix Flaky-test: HandleFailuresTest.testHandleFailureBookieNotInWriteSet #4110

Merged

Conversation

AnonHxy
Copy link
Contributor

@AnonHxy AnonHxy commented Oct 18, 2023

Descriptions of the changes in this PR:

Master Issue: #4109

Motivation

Fix Flaky-test: HandleFailuresTest.testHandleFailureBookieNotInWriteSet

When we call b1Delay.completeExceptionally(new BKException.BKWriteException()) at line480(code1), the preWriteHook will complete with exception and then do some actions in the choosen thread(code2), e.g., put the failure bookie to delayedWriteFailedBookies(code3). So the delayedWriteFailedBookies update is async.

However, lh.appendAsync("entry2".getBytes()))(Line483 in Code4) is invoked in main thread. So when appendAsync execute, the delayedWriteFailedBookies could be not updated yet, and changeInProgress.complete(null)(code5) will never be invoked.

We can reproduce this flasy-test if we put Thread.sleep(5000) just before line289 at code6.

code1:

b1Delay.completeExceptionally(new BKException.BKWriteException());

code2:

.whenCompleteAsync((res, ex) -> {
if (ex != null) {
cb.writeComplete(BKException.getExceptionCode(ex, BKException.Code.WriteException),
ledgerId, entryId, addr, ctx);
} else {
cb.writeComplete(BKException.Code.OK, ledgerId, entryId, addr, ctx);
}
}, executor.chooseThread(ledgerId));

code3:

void notifyWriteFailed(int index, BookieId addr) {
synchronized (metadataLock) {
delayedWriteFailedBookies.put(index, addr);
}
}

code4:

b1Delay.completeExceptionally(new BKException.BKWriteException());
log.info("write second entry, should have enough bookies, but blocks completion on failure handling");
CompletableFuture<?> e2 = lh.appendAsync("entry2".getBytes());

code5:

clientCtx.getMockLedgerManager().setPreWriteHook((ledgerId, metadata) -> {
changeInProgress.complete(null);
return blockEnsembleChange;
});

code6:

if (completed) {
if (rc != BKException.Code.OK) {
// Got an error after satisfying AQ. This means we are under replicated at the create itself.
// Update the stat to reflect it.
clientCtx.getClientStats().getAddOpUrCounter().inc();
if (!clientCtx.getConf().disableEnsembleChangeFeature.isAvailable()
&& !clientCtx.getConf().delayEnsembleChange) {
lh.notifyWriteFailed(bookieIndex, addr);
}

Changes

(Describe: what changes you have made)

@AnonHxy
Copy link
Contributor Author

AnonHxy commented Oct 18, 2023

@hangc0276
Copy link
Contributor

Nice catch!

@zymap zymap merged commit b3662b7 into apache:master Dec 4, 2023
16 checks passed
yangl pushed a commit to yangl/bookkeeper that referenced this pull request Dec 11, 2023
…eNotInWriteSet (apache#4110)

Master Issue: apache#4109
### Motivation

Fix Flaky-test: HandleFailuresTest.testHandleFailureBookieNotInWriteSet

When we call `b1Delay.completeExceptionally(new BKException.BKWriteException())` at line480(code1),  the `preWriteHook` will complete with exception  and then do some actions in the choosen thread(code2), e.g., put the failure bookie to `delayedWriteFailedBookies`(code3).  So the `delayedWriteFailedBookies` update is async.

However, `lh.appendAsync("entry2".getBytes()))`(Line483 in Code4) is invoked in main thread. So when `appendAsync` execute,  the `delayedWriteFailedBookies` could be not updated yet, and `changeInProgress.complete(null)`(code5) will never be invoked.
hangc0276 pushed a commit to hangc0276/bookkeeper that referenced this pull request Jan 18, 2024
…eNotInWriteSet (apache#4110)

Master Issue: apache#4109
### Motivation

Fix Flaky-test: HandleFailuresTest.testHandleFailureBookieNotInWriteSet

When we call `b1Delay.completeExceptionally(new BKException.BKWriteException())` at line480(code1),  the `preWriteHook` will complete with exception  and then do some actions in the choosen thread(code2), e.g., put the failure bookie to `delayedWriteFailedBookies`(code3).  So the `delayedWriteFailedBookies` update is async.

However, `lh.appendAsync("entry2".getBytes()))`(Line483 in Code4) is invoked in main thread. So when `appendAsync` execute,  the `delayedWriteFailedBookies` could be not updated yet, and `changeInProgress.complete(null)`(code5) will never be invoked.

(cherry picked from commit b3662b7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants