Fix unload namespaces bundle hangs. #9116

Merged

codelipenghui
Contributor

Motivation

Fix the hang that occurs when unloading namespace bundles. BrokerService maintains a ConcurrentOpenHashMap that stores all topic references. #8968 added cleanup of these topics when a namespace bundle is unloaded; see https://github.com/apache/pulsar/pull/8968/files#diff-0210356c8a88e4efa89eb769a027fa6c166db479dbad8bbbbc704c6ed6e317f5R1572-R1579

Since StampedLock is not reentrant and the forEach method of ConcurrentOpenHashMap also acquires a read lock, removing a topic from inside the forEach callback waits on the write lock of the same section and can block the namespace unload. Here is the thread dump:

"pulsar-io-16-7" #132 prio=5 os_prio=31 tid=0x00007ff370ae2800 nid=0x1f603 waiting on condition [0x00007000121d0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000780a0be18> (a org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section)
	at java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1119)
	at java.util.concurrent.locks.StampedLock.writeLock(StampedLock.java:354)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.remove(ConcurrentOpenHashMap.java:306)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.access$200(ConcurrentOpenHashMap.java:180)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.remove(ConcurrentOpenHashMap.java:135)
	at org.apache.pulsar.broker.service.BrokerService.removeTopicFromCache(BrokerService.java:1658)
	at org.apache.pulsar.broker.service.BrokerService.lambda$cleanUnloadedTopicFromCache$61(BrokerService.java:1611)
	at org.apache.pulsar.broker.service.BrokerService$$Lambda$1003/2064147704.accept(Unknown Source)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159)
	at org.apache.pulsar.broker.service.BrokerService.cleanUnloadedTopicFromCache(BrokerService.java:1607)
	at org.apache.pulsar.broker.namespace.OwnedBundle.lambda$handleUnloadRequest$1(OwnedBundle.java:140)
	at org.apache.pulsar.broker.namespace.OwnedBundle$$Lambda$999/503902413.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic.lambda$null$18(NonPersistentTopic.java:442)
	at org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic$$Lambda$994/682846231.run(Unknown Source)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

This also makes the current CI unstable when shutting down the mock broker after the tests.
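
To illustrate the non-reentrancy described above, here is a minimal sketch that uses only java.util.concurrent.locks.StampedLock. It calls tryWriteLock() rather than writeLock() so that the demo reports the failure instead of parking forever as in the thread dump. The class and method names are illustrative and are not part of Pulsar.

```java
import java.util.concurrent.locks.StampedLock;

public class StampedLockReentrancyDemo {

    /**
     * Returns true if the current thread can take the write lock
     * while it already holds a read lock on the same StampedLock.
     */
    static boolean canWriteWhileReading() {
        StampedLock lock = new StampedLock();
        long readStamp = lock.readLock();
        try {
            // Non-blocking attempt: a stamp of 0 means the lock was not available.
            // A blocking writeLock() here would never return, because the
            // read lock held by this same thread is never released first.
            long writeStamp = lock.tryWriteLock();
            if (writeStamp != 0L) {
                lock.unlockWrite(writeStamp);
                return true;
            }
            return false;
        } finally {
            lock.unlockRead(readStamp);
        }
    }

    public static void main(String[] args) {
        System.out.println("write lock acquired while reading: " + canWriteWhileReading());
    }
}
```

The same situation arises when a callback running under a section's read lock calls remove() on that section, which is what the stack trace above shows.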

Modifications

Use the keys() method of ConcurrentOpenHashMap to get a new list of the keys, and iterate over that copy instead of calling forEach.
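
The pattern can be sketched as follows. LockedMap is a hypothetical, simplified stand-in for a single section of the map (one StampedLock guarding a plain HashMap); it is not Pulsar's ConcurrentOpenHashMap, and removeMatching is an illustrative helper rather than the actual cleanUnloadedTopicFromCache logic. The point is only that keys() releases the read lock before the loop starts, so remove() can take the write lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

public class SnapshotKeysDemo {

    /** Hypothetical stand-in for one map section; not Pulsar's implementation. */
    static final class LockedMap {
        private final StampedLock lock = new StampedLock();
        private final Map<String, String> entries = new HashMap<>();

        void put(String key, String value) {
            long stamp = lock.writeLock();
            try {
                entries.put(key, value);
            } finally {
                lock.unlockWrite(stamp);
            }
        }

        void remove(String key) {
            long stamp = lock.writeLock();
            try {
                entries.remove(key);
            } finally {
                lock.unlockWrite(stamp);
            }
        }

        /** Returns a copy of the keys; the read lock is released before returning. */
        List<String> keys() {
            long stamp = lock.readLock();
            try {
                return new ArrayList<>(entries.keySet());
            } finally {
                lock.unlockRead(stamp);
            }
        }

        int size() {
            long stamp = lock.readLock();
            try {
                return entries.size();
            } finally {
                lock.unlockRead(stamp);
            }
        }
    }

    /** Removes every key with the given prefix by iterating over a snapshot. */
    static int removeMatching(LockedMap map, String prefix) {
        int removed = 0;
        for (String key : map.keys()) {
            if (key.startsWith(prefix)) {
                map.remove(key); // safe: no read lock is held at this point
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        LockedMap topics = new LockedMap();
        topics.put("ns1/topic-a", "x");
        topics.put("ns1/topic-b", "y");
        topics.put("ns2/topic-c", "z");
        int removed = removeMatching(topics, "ns1/");
        System.out.println("removed=" + removed + " remaining=" + topics.size());
    }
}
```

One trade-off of the snapshot approach is that entries added after keys() returns are not visited by the loop.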

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@codelipenghui codelipenghui self-assigned this Jan 4, 2021
@codelipenghui codelipenghui added this to the 2.8.0 milestone Jan 4, 2021
@codelipenghui codelipenghui added the type/bug The PR fixed a bug or issue reported a bug label Jan 4, 2021
@@ -1597,12 +1597,12 @@ public void checkTopicNsOwnership(final String topic) throws BrokerServiceExcept
     }

     public void cleanUnloadedTopicFromCache(NamespaceBundle serviceUnit) {
-        topics.forEach((name, topicFuture) -> {
-            TopicName topicName = TopicName.get(name);
+        for (String topic : topics.keys()) {
Contributor

Does keys() return a copy of the collection, or is it a live cursor?
Should we make a copy here?

Contributor Author

It returns a copy.

@codelipenghui codelipenghui merged commit 752319e into apache:master Jan 4, 2021
@codelipenghui codelipenghui deleted the penghui/fix-shutdown-hangs branch January 4, 2021 23:31