Fix producer stuck issue due to NPE thrown when creating a new ledger #7401
digestType, config.getPassword(), cb, ledgerCreated, finalMetadata);
} catch (Throwable cause) {
The try-catch here is good in any case, though we should also ensure that the BK client handles DNS errors by triggering the callback instead of throwing an exception.
The fix was already made in the bookkeeper client, but we haven't released that version of the bookkeeper client yet.
}
cb.createComplete(BKException.Code.TimeoutException, null, ledgerCreated);
Is this change intentional?
Shouldn't the callback only be triggered if (!ledgerCreated.get())?
It is intentional: when the timeout task is triggered, it always executes the callback. That is fine because we already have the logic to ensure the callback is triggered only once.
This change ensures all the logic is executed in one central place instead of being spread across multiple places, which could make the code much harder to maintain.
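To illustrate why firing the callback unconditionally from the timeout task is safe, here is a minimal sketch of a once-only guard. It is not the ManagedLedger code: `CreateCallback`, `once`, and the `OK`/`TIMEOUT` codes are hypothetical names, and the guard is assumed to be an `AtomicBoolean` flipped with `compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class OnceOnlyCallback {
    /** Hypothetical stand-in for a ledger-creation callback. */
    interface CreateCallback {
        void createComplete(int rc, Object ledger);
    }

    static final int OK = 0;        // illustrative success code
    static final int TIMEOUT = -1;  // illustrative timeout code

    /** Wraps a callback so that only the first invocation reaches the delegate. */
    static CreateCallback once(CreateCallback delegate, AtomicBoolean completed) {
        return (rc, ledger) -> {
            // compareAndSet succeeds for exactly one caller, even under races.
            if (completed.compareAndSet(false, true)) {
                delegate.createComplete(rc, ledger);
            }
        };
    }

    /** Simulates a normal completion followed by a late timeout task. */
    static int deliveriesAfterCompletionThenTimeout() {
        AtomicInteger deliveries = new AtomicInteger();
        CreateCallback cb = once((rc, ledger) -> deliveries.incrementAndGet(),
                new AtomicBoolean(false));
        cb.createComplete(OK, "ledger");   // normal completion wins
        cb.createComplete(TIMEOUT, null);  // timeout fires later and is ignored
        return deliveries.get();
    }

    public static void main(String[] args) {
        System.out.println(deliveriesAfterCompletionThenTimeout());
    }
}
```

With the guard in one place, neither the completion path nor the timeout path needs its own `if (!ledgerCreated.get())` check.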
@sijie One more thing. There are a couple more calls that create ledgers, so we should wrap all of them in try/catch:
|
Anyway, we can do that in a separate PR. But my next question is: can the NPE also happen on |
@merlimat The DNS resolution problem only happens when creating a ledger, so it doesn't happen when adding entries. I think we should cut a bug-fix release of bookkeeper soon.
(cherry picked from commit 86e2610)
Motivation
An NPE can be thrown when creating a ledger because the network address is unresolvable. If the NPE is thrown before the timeout task is scheduled, the timeout mechanism doesn't work.
Unresolvable network addresses are common in Kubernetes environments. They can occur when a bookie pod or a worker node restarts.
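The failure mode described above, where an asynchronous call throws synchronously and so never reaches its callback, can be sketched as follows. This is an illustration under stated assumptions, not the Pulsar or BookKeeper API: `AsyncCreator`, `guardedCreate`, and the `UNEXPECTED` code are hypothetical names.

```java
import java.util.function.IntConsumer;

class GuardedAsyncCreate {
    static final int OK = 0;            // illustrative success code
    static final int UNEXPECTED = -999; // hypothetical "unexpected condition" code

    /** Hypothetical async creator: it should report through cb, but may throw instead. */
    interface AsyncCreator {
        void create(IntConsumer cb);
    }

    /**
     * Routes a synchronous failure into the callback so the caller always
     * gets a completion signal and never waits forever.
     */
    static void guardedCreate(AsyncCreator creator, IntConsumer cb) {
        try {
            creator.create(cb);
        } catch (Throwable cause) {
            // Without this catch the exception would escape and cb would never run.
            cb.accept(UNEXPECTED);
        }
    }

    /** Runs a creator and returns the result code delivered to the callback. */
    static int run(AsyncCreator creator) {
        int[] result = {Integer.MIN_VALUE}; // sentinel: callback not invoked
        guardedCreate(creator, rc -> result[0] = rc);
        return result[0];
    }

    public static void main(String[] args) {
        // Simulates an unresolvable address surfacing as an NPE.
        System.out.println(run(cb -> { throw new NullPointerException("unresolved address"); }));
    }
}
```

If a creator could invoke the callback and then throw, the callback would run twice, so a guard like this is best paired with a callback that tolerates or suppresses repeated invocation.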
Changes
This pull request does the following:
CreatingLedger state is not moving.