Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When cursor recovery encounters empty cursor ledger, fallback to latest snapshot #3487

Merged
merged 2 commits into from
Jan 31, 2019

Conversation

merlimat
Copy link
Contributor

Motivation

In some conditions, the ledger used for cursor recovery might appear to be empty after re-opening. In these cases we're currently getting runtime exception from BK client since we're trying to read entry -1.

In these case, use the same fallback mechanism to fallback to the previous cursor snapshot and re-open the cursor.

Example of logs of this error:

21:16:35.837 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/functions/persistent/coordinate] Created ledger 1988
21:16:35.841 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/functions/persistent/coordinate] Recovering from bookkeeper ledger cursor: participants
21:16:35.842 [bookkeeper-ml-workers-OrderedExecutor-6-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/functions/persistent/coordinate] Consumer participants meta-data recover from ledger 3
21:16:35.844 [pulsar-ordered-OrderedExecutor-3-0-EventThread] ERROR org.apache.bookkeeper.client.LedgerHandle - IncorrectParameterException on ledgerId:3 firstEntry:-1 lastEntry:-1
21:16:35.844 [pulsar-ordered-OrderedExecutor-3-0-EventThread] WARN  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/functions/persistent/coordinate] Error reading from metadata ledger 3 for consumer participants: Incorrect parameter input
21:16:35.844 [pulsar-ordered-OrderedExecutor-3-0-EventThread] WARN  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/functions/persistent/coordinate] Recovery for cursor participants failed
org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
21:16:35.845 [pulsar-ordered-OrderedExecutor-3-0-EventThread] WARN  org.apache.pulsar.broker.service.BrokerService - Failed to create topic persistent://public/functions/coordinate
org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
21:16:35.847 [pulsar-ordered-OrderedExecutor-3-0-EventThread] WARN  org.apache.pulsar.broker.service.ServerCnx - [/10.244.1.24:49452][persistent://public/functions/coordinate][participants] Failed to create consumer: org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_181]
	at org.apache.pulsar.broker.service.BrokerService$3.openLedgerFailed(BrokerService.java:680) ~[org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl.lambda$asyncOpen$5(ManagedLedgerFactoryImpl.java:254) ~[org.apache.pulsar-managed-ledger-original-2.2.1.jar:2.2.1]
	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_181]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_181]
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl$2.initializeFailed(ManagedLedgerFactoryImpl.java:247) ~[org.apache.pulsar-managed-ledger-original-2.2.1.jar:2.2.1]
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$3$1.operationFailed(ManagedLedgerImpl.java:452) ~[org.apache.pulsar-managed-ledger-original-2.2.1.jar:2.2.1]
	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.lambda$null$0(ManagedCursorImpl.java:299) ~[org.apache.pulsar-managed-ledger-original-2.2.1.jar:2.2.1]
	at org.apache.bookkeeper.client.LedgerHandle.asyncReadEntries(LedgerHandle.java:663) ~[org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.lambda$recoverFromLedger$1(ManagedCursorImpl.java:285) ~[org.apache.pulsar-managed-ledger-original-2.2.1.jar:2.2.1]
	at org.apache.bookkeeper.client.LedgerOpenOp.openComplete(LedgerOpenOp.java:225) [org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.bookkeeper.client.LedgerOpenOp.operationComplete(LedgerOpenOp.java:181) [org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.bookkeeper.client.LedgerOpenOp.operationComplete(LedgerOpenOp.java:49) [org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.bookkeeper.meta.CleanupLedgerManager$CleanupGenericCallback.operationComplete(CleanupLedgerManager.java:56) [org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$3.processResult(AbstractZkLedgerManager.java:421) [org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994) [org.apache.bookkeeper-bookkeeper-server-4.7.3.jar:4.7.3]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent_aroundBody0(ClientCnxn.java:572) [org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.apache.zookeeper.ClientCnxn$EventThread$AjcClosure1.run(ClientCnxn.java:1) [org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) [org.aspectj-aspectjrt-1.9.1.jar:?]
	at org.apache.pulsar.broker.zookeeper.aspectj.ClientCnxnAspect.timedProcessEvent(ClientCnxnAspect.java:73) [org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:528) [org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) [org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
Caused by: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
	... 23 more
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
21:16:35.853 [pulsar-client-io-44-1] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0xa29958a3, L:/10.244.1.24:49452 - R:10.244.1.24/10.244.1.24:6650] Received error from server: org.apache.bookkeeper.mledger.ManagedLedgerException: Incorrect parameter input
21:16:35.853 [pulsar-client-io-44-1] WARN  org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/functions/coordinate][participants] Failed to subscribe to topic on 10.244.1.24/10.244.1.24:6650
21:16:35.854 [main] ERROR org.apache.pulsar.functions.worker.WorkerService - Error Starting up in worker

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Jan 31, 2019
@merlimat merlimat added this to the 2.3.0 milestone Jan 31, 2019
@merlimat merlimat self-assigned this Jan 31, 2019
@merlimat
Copy link
Contributor Author

run cpp tests
run java8 tests

Copy link
Contributor

@ivankelly ivankelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. one comment about test.

cursor.markDelete(p3);

// Force-reopen so the recovery will be forced to read from ledger
bkc.returnEmptyLedgerAfter(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different to just not calling markDelete on L1110? Surely if you don't markDelete the cursor ledger will be empty anyhow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the same because the cursor ledger is supposed to always have at least one entry, which is appended before the ledger id is stored in the metadata... unless for some reasons that doesn't show up, like in the stack trace in the description.

@merlimat merlimat merged commit 7d9d879 into apache:master Jan 31, 2019
@merlimat merlimat deleted the fix-cursor-recovery branch January 31, 2019 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants