Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix npe when unloading persistent partitioned topic #11310

Conversation

wuzhanpeng
Copy link
Contributor

@wuzhanpeng wuzhanpeng commented Jul 14, 2021

Motivations

When we performed pressure tests on persistent partitioned topics, we found that the NullPointerException would occasionally appear when executing unload bundles operations, and at the same time the producers could no longer write messages.

Some server error log

java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.pulsar.broker.service.persistent.PersistentTopicTest.testPersistentPartitionedTopicUnload(PersistentTopicTest.java:268)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
        at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
        at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
        at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.pulsar.broker.service.BrokerService.removeTopicFromCache(BrokerService.java:1729)
        at org.apache.pulsar.broker.service.BrokerService.cleanUnloadedTopicFromCache(BrokerService.java:1699)
        at org.apache.pulsar.broker.namespace.OwnedBundle.lambda$handleUnloadRequest$1(OwnedBundle.java:136)
        at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)

The problem is that the cleanup logic in org.apache.pulsar.broker.service.BrokerService#cleanUnloadedTopicFromCache does not check wheather the topic reference is present when unloading bundles. This may cause an attempt to obtain bundle map which does not exist from org.apache.pulsar.broker.service.BrokerService#multiLayerTopicsMap during cache cleanup.

Modifications

Added more safety checks to fix this issue.

Documentation

Does this pull request introduce a new feature? (no)

@wuzhanpeng
Copy link
Contributor Author

Could you help review these changes ? @codelipenghui @rdhabalia @hangc0276

@wuzhanpeng
Copy link
Contributor Author

/pulsarbot run-failure-checks

@sijie sijie added type/bug The PR fixed a bug or issue reported a bug release/2.7.4 release/2.8.1 labels Jul 14, 2021
@sijie sijie added this to the 2.9.0 milestone Jul 14, 2021
@merlimat merlimat merged commit b928ba2 into apache:master Jul 14, 2021
merlimat pushed a commit that referenced this pull request Jul 14, 2021
Co-authored-by: wuzhanpeng <wuzhanpeng@bigo.sg>
@merlimat merlimat added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Jul 14, 2021
@@ -1682,7 +1682,8 @@ public boolean isTopicNsOwnedByBroker(TopicName topicName) {
public void cleanUnloadedTopicFromCache(NamespaceBundle serviceUnit) {
for (String topic : topics.keys()) {
TopicName topicName = TopicName.get(topic);
if (serviceUnit.includes(topicName)) {
if (getTopicReference(topic).isPresent() && serviceUnit.includes(topicName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to swap the position of getTopicReference(topic).isPresent() and serviceUnit.includes(topicName) to avoid too many calls of getTopicReference(topic).isPresent()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment was not addressed

Technoboy- added a commit to Technoboy-/pulsar that referenced this pull request Jul 20, 2021
Technoboy- added a commit to Technoboy-/pulsar that referenced this pull request Jul 22, 2021
codelipenghui pushed a commit that referenced this pull request Jul 23, 2021
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label Jul 23, 2021
@shoothzj
Copy link
Member

@wuzhanpeng I meeted this issue too. Could you please tell me, in what situation, will the topic can be this situation?

eolivelli pushed a commit to datastax/pulsar that referenced this pull request Feb 9, 2022
Co-authored-by: wuzhanpeng <wuzhanpeng@bigo.sg>
(cherry picked from commit a73dc61)
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
Co-authored-by: wuzhanpeng <wuzhanpeng@bigo.sg>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.7 Archived: 2.7 is end of life cherry-picked/branch-2.8 Archived: 2.8 is end of life release/2.7.4 release/2.8.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants