-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NPE in managed ledger on read failed #11521
Comments
There is PR #11387 that fixes some thread safety issues in OpReadEntry. I wonder if that fix would help for this case. |
@lhotari thank you for pointing to #11387 . I looked through the changes and it appears that changes in #11387 try to serialize accesses to entries. I am not sure if that will help with this issue, though. For this issue, on a given OpReadEntry object one of readEntriesFailed() and checkReadCompletion() needs to get invoked more than once. From reading through the code, I am not entirely sure how that is possible. What makes it harder is that I don't have a way to reproduce the problem. |
@bharanic-dev OpReadEntry isn't thread safe. Breaking thread safety rules can lead to very surprising behaviour which is "impossible" from reading the code. One such example of an "impossible" problem in the past is #10433 . |
This problem might be fixed by #13328 . Netty Recycler was rewritten in 4.1.71.Final with PR netty/netty#11858 . Pulsar uses Netty Recycler extensively and the improvements to Netty Recycler might fix various thread-safety problems. |
The issue had no activity for 30 days, mark with Stale label. |
Closed as stale. Please create a new issue if it's still relevant to the maintained versions. |
Describe the bug
Under some scenarios, the following NPE exception is seen in Pulsar broker logs.
{"instant":{"epochSecond":1626050179,"nanoOfSecond":804217000},"thread":"bookkeeper-ml-workers-OrderedExecutor-7-0","level":"ERROR","loggerName":"org.apache.bookkeeper.common.util.SafeRunnable","message":"Unexpected throwable caught ","thrown":{"commonElementCount":0,"name":"java.lang.NullPointerException","extendedStackTrace":"java.lang.NullPointerException: null\n\tat org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$readEntriesFailed$0(OpReadEntry.java:94) ~[org.apache.pulsar-managed-ledger.jar]\n\tat org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) ~[org.apache.pulsar-managed-ledger.jar]\n\tat org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common.jar:?]\n\tat org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) [org.apache.bookkeeper-bookkeeper-common.jar:?]\n\tat org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) [org.apache.bookkeeper-bookkeeper-common.jar:?]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]\n\tat java.lang.Thread.run(Thread.java:834) [?:?]\n"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":99,"threadPriority":5,"time":"2021-07-12T00:02:30Z","service":"pulsar","hostname":"broker-2"}
To Reproduce
Currently, I don't have a reliable way to reproduce the problem.
Expected behavior
No NPE
The text was updated successfully, but these errors were encountered: