When calling openLedgerOp, make the timeout exception a separate error code #3562
ping @zymap @dlg99 @eolivelli @hangc0276 @shoothzj PTAL. Thanks.
62b42f5 to 7eeb397
@@ -25,6 +25,7 @@
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
import static org.mockito.Mockito.*;
Please avoid the star import.
I have already removed the star import.
@Test
public void testOpenLedgerNoRecoveryWithTimeoutEx() throws Exception {
    mockReadEntryTimeout();
    LedgerMetadata ledgerMetadata = generateLedgerMetadata(ensembleSize,
            writeQuorumSize, ackQuorumSize, password, customMetadata);
    registerMockLedgerMetadata(ledgerId, ledgerMetadata);
    ledgerMetadata.getAllEnsembles().values().forEach(bookieAddressList -> {
        bookieAddressList.forEach(bookieAddress -> {
            registerMockEntryForRead(ledgerId, BookieProtocol.LAST_ADD_CONFIRMED, bookieAddress, entryData, -1);
            registerMockEntryForRead(ledgerId, 0, bookieAddress, entryData, -1);
        });
    });
    try {
        result(newOpenLedgerOp()
                .withPassword(ledgerMetadata.getPassword())
                .withDigestType(DigestType.CRC32)
                .withLedgerId(ledgerId)
                .withRecovery(false)
                .execute());
        fail("Expect timeout error");
    } catch (BKException.BKTimeoutException timeoutException) {
        // Expect timeout error.
    }
    // Reset bk client.
    resetBKClient();
}

@Test
public void testOpenLedgerRecoveryWithTimeoutEx() throws Exception {
    mockReadEntryTimeout();
    LedgerMetadata ledgerMetadata = generateLedgerMetadata(ensembleSize,
            writeQuorumSize, ackQuorumSize, password, customMetadata);
    registerMockLedgerMetadata(ledgerId, ledgerMetadata);

    ledgerMetadata.getAllEnsembles().values().forEach(bookieAddressList -> {
        bookieAddressList.forEach(bookieAddress -> {
            registerMockEntryForRead(ledgerId, BookieProtocol.LAST_ADD_CONFIRMED, bookieAddress, entryData, -1);
            registerMockEntryForRead(ledgerId, 0, bookieAddress, entryData, -1);
        });
    });
    try {
        result(newOpenLedgerOp()
                .withPassword(ledgerMetadata.getPassword())
                .withDigestType(DigestType.CRC32)
                .withLedgerId(ledgerId)
                .withRecovery(true)
                .execute());
        fail("Expect timeout error");
    } catch (BKException.BKTimeoutException timeoutException) {
        // Expect timeout error.
    }
    // Reset bk client.
    resetBKClient();
}
It's better to add a data provider to reduce the duplicated code.
Good idea
I now use `@DataProvider` to simplify the test case.
Nice catch!
        if (ex != null) {
            LOG.error("Ledger {} read timeout", ledgerId, ex);
        }
        openComplete(rc, null);
    });
} else {
    openComplete(bk.getReturnRc(BKException.Code.LedgerRecoveryException), null);
Do we need to close the ledgerHandle when encountering other exceptions?
Good suggestion. Already fixed
@poorbarcode Good work. I have only added some minor comments. PTAL.
} else if (rc == BKException.Code.TimeoutException) {
    closeLedgerHandleAsync().whenComplete((r, ex) -> {
        if (ex != null) {
            LOG.error("Ledger {} read timeout", ledgerId, ex);
        }
        openComplete(rc, null);
    });
LOG.error("Ledger {} read timeout", ledgerId, ex);
should be replaced with the following:
LOG.error("Ledger {} close failed", ledgerId, ex);
The `ex` here is not the read timeout exception.
You are right, already fixed
    } else {
        openComplete(bk.getReturnRc(BKException.Code.LedgerRecoveryException), null);
        closeLedgerHandleAsync().whenComplete((r, ex) -> {
            openComplete(bk.getReturnRc(BKException.Code.LedgerRecoveryException), null);
        });
    }
}
We also need to log the error here:
if (ex != null) {
LOG.error("Ledger {} close failed", ledgerId, ex);
}
Already fixed
nice
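The pattern agreed on in this thread (close the handle first, log only a failed close, then report the original return code) can be sketched with plain JDK types. This is a minimal illustration, not the code from the PR: `CloseThenCompleteSketch`, `closeThenComplete`, and the `IntConsumer` callback are hypothetical stand-ins for `closeLedgerHandleAsync()` and `openComplete(rc, null)`, and `System.err` stands in for the logger.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.IntConsumer;

/** Sketch only: hypothetical stand-in for the close-then-complete logic discussed above. */
final class CloseThenCompleteSketch {

    /**
     * Waits for the close to finish, then reports the original return code.
     * A close failure is only logged; it does not replace the code that
     * caused the open to fail.
     */
    static void closeThenComplete(CompletableFuture<Void> closeFuture, long ledgerId,
                                  int originalRc, IntConsumer openComplete) {
        closeFuture.whenComplete((ignored, ex) -> {
            if (ex != null) {
                // ex describes the failed close, not the earlier read timeout.
                System.err.println("Ledger " + ledgerId + " close failed: " + ex);
            }
            openComplete.accept(originalRc);
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> failedClose = new CompletableFuture<>();
        failedClose.completeExceptionally(new IllegalStateException("close failed"));
        // The close failure is logged, yet the caller still sees the timeout code.
        closeThenComplete(failedClose, 1L, -23, rc -> System.out.println("rc=" + rc)); // prints "rc=-23"
    }
}
```

The callback runs whether or not the close succeeds, so the caller always receives exactly one completion carrying the original code.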
@poorbarcode Would you please take a look at the failed CI?
613b36d to 0379732
Lgtm
@poorbarcode Would you please rebase onto master? Thanks.
0379732 to 5c31945
@@ -331,47 +330,6 @@ public void testOpenLedgerClientClosed() throws Exception {
            .execute());
    }

    @Test
Why remove them?
+1
These two tests have been moved into `BookKeeperBuildersOpenLedgerTest.java`. See #3562 (comment).
rerun failure checks
@zymap Would you please help take a look? Thanks.
…pache#3562)

Descriptions of the changes in this PR:

### Motivation

When we execute `bkClient.openLedger(ledgerId)`, the execution flow is as follows:

1. start opening the ledger
2. get ledger meta
3. read the last confirmed entry
4. open ledger success

If we get the correct ledgerMeta at step 2, this means that this ledger has not been deleted. If step 3 times out, we should try again to make sure the ledger exists until we get a clear response from the BK server.

**(Highlight)** However, in the current implementation, the timeout exception is rewritten as a `LedgerRecoveryException`, making it impossible to determine whether we should retry.

Log:

```
Oct 17, 2022 22:54:05.818 [BookKeeperClientWorker-OrderedExecutor-16-0] ERROR org.apache.bookkeeper.client.ReadLastConfirmedOp - While readLastConfirmed ledger: 59158316 did not hear success responses from all quorums, QuorumCoverage(e:2,w:2,a:2) = [-23, -23]
Oct 17, 2022 22:54:05.818 [BookKeeperClientWorker-OrderedExecutor-16-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [order/org-217/persistent/p1010-tx] Opened ledger 59158316 for consumer order-service. rc=-10
```

Looking at the ledger metadata:

```
LedgerMetadata{formatVersion=3, ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2, state=IN_RECOVERY, digestType=CRC32C, password=base64:, ensembles={0=[***:3181, ***:3181]}, customMetadata={component=***, pulsar/managed-ledger=***, pulsar/cursor=***, application=***}}
```

see also: apache/pulsar#18123

### Changes

- When calling openLedgerOp, do not rewrite `TimeoutException` as a `LedgerRecoveryException`
- add the dependency: `junit4-dataprovider`
- use `@DataProvider` to simplify the test cases "testOpenLedgerRecover" & "testOpenLedgerNoRecover"

(cherry picked from commit ef31c7a)
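The effect of the change on callers can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `LedgerOpenOp` code: `OpenLedgerRcSketch` and its methods are hypothetical, the numeric codes are taken from the log excerpt in the description (`-23` from the bookies, `rc=-10` at the caller), and any other special cases the real code handles are omitted.

```java
import java.util.function.IntSupplier;

/** Sketch only: stand-in constants, not the real BKException.Code class. */
final class OpenLedgerRcSketch {
    // Assumed values, inferred from the log: bookies answered -23, the caller saw rc=-10.
    static final int OK = 0;
    static final int LEDGER_RECOVERY_EXCEPTION = -10;
    static final int TIMEOUT_EXCEPTION = -23;

    /** Before the change: every failure collapses into LedgerRecoveryException. */
    static int mapBefore(int rc) {
        return rc == OK ? OK : LEDGER_RECOVERY_EXCEPTION;
    }

    /** After the change: a timeout keeps its own code; other failures still collapse. */
    static int mapAfter(int rc) {
        if (rc == OK || rc == TIMEOUT_EXCEPTION) {
            return rc;
        }
        return LEDGER_RECOVERY_EXCEPTION;
    }

    /** Hypothetical caller-side policy: retry only while the result is a timeout. */
    static int openWithRetry(IntSupplier openAttempt, int maxAttempts) {
        int rc = mapAfter(openAttempt.getAsInt());
        for (int attempt = 1; attempt < maxAttempts && rc == TIMEOUT_EXCEPTION; attempt++) {
            rc = mapAfter(openAttempt.getAsInt());
        }
        return rc;
    }

    public static void main(String[] args) {
        System.out.println(mapBefore(TIMEOUT_EXCEPTION)); // prints -10: the timeout is hidden
        System.out.println(mapAfter(TIMEOUT_EXCEPTION));  // prints -23: the timeout is visible

        // Two timeouts followed by a success: the third attempt returns OK.
        int[] results = {TIMEOUT_EXCEPTION, TIMEOUT_EXCEPTION, OK};
        int[] calls = {0};
        System.out.println(openWithRetry(() -> results[calls[0]++], 3)); // prints 0
    }
}
```

With the old mapping the caller cannot tell a transient timeout from a genuine recovery failure, so a retry policy like `openWithRetry` has nothing to key on; keeping the timeout code distinct is what makes the retry decision possible.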