[pulsar-broker] clean up topic that failed to unload from the cache #8968

Merged 1 commit on Dec 16, 2020

rdhabalia

Motivation

Currently, if unloading a topic times out during a namespace-bundle unload, the broker does not remove the topic from its cache. The broker-stats API then reports false ownership of the topic, which leads to incorrect stats output.

Modification

Remove the topic from the cache if unloading it fails during a namespace-bundle unload.
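As a rough illustration of the idea (not the code in this PR), the sketch below attaches a `handle` stage to an unload future and drops the bundle's topics from a cache when the unload fails. The class `UnloadCleanupSketch`, the method `unloadBundle`, and the plain `Map` from topic name to bundle name are all hypothetical stand-ins for the broker's real structures.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names and types are assumptions, not Pulsar's API.
class UnloadCleanupSketch {

    /**
     * Completes with true if the topics unloaded cleanly. On failure (for
     * example a timeout), removes every cached topic owned by the bundle
     * so the cache no longer claims ownership, and completes with false.
     */
    static CompletableFuture<Boolean> unloadBundle(
            Map<String, String> topicToBundle,
            String bundle,
            CompletableFuture<Void> unloadTopics) {
        return unloadTopics.handle((ignored, error) -> {
            if (error != null) {
                // Unload failed: purge stale entries for this bundle only.
                topicToBundle.entrySet().removeIf(e -> e.getValue().equals(bundle));
            }
            return error == null;
        });
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("topic-1", "bundle-a");
        cache.put("topic-2", "bundle-b");

        // Simulate a topic unload that timed out.
        CompletableFuture<Void> timedOut = new CompletableFuture<>();
        timedOut.completeExceptionally(new TimeoutException("unload timed out"));

        boolean clean = unloadBundle(cache, "bundle-a", timedOut).join();
        System.out.println("clean unload: " + clean + ", cache: " + cache);
    }
}
```

Because the cleanup runs in the `handle` stage, it executes on whichever thread completes the unload future.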

@rdhabalia rdhabalia added this to the 2.8.0 milestone Dec 16, 2020
@rdhabalia rdhabalia self-assigned this Dec 16, 2020

@eolivelli eolivelli left a comment

LGTM

@rdhabalia

/pulsarbot run-failure-checks

@merlimat merlimat merged commit 639b45a into apache:master Dec 16, 2020
@rdhabalia rdhabalia deleted the bundle_clean branch December 21, 2020 18:07
codelipenghui added a commit that referenced this pull request Jan 4, 2021
### Motivation

Fix a hang in namespace bundle unloading. The BrokerService maintains a ConcurrentOpenHashMap that stores all topic references. #8968 added cleanup of topics from that map when unloading namespace bundles; see https://github.com/apache/pulsar/pull/8968/files#diff-0210356c8a88e4efa89eb769a027fa6c166db479dbad8bbbbc704c6ed6e317f5R1572-R1579

Because StampedLock is not reentrant and the `forEach` method of ConcurrentOpenHashMap also acquires a read lock, calling `remove` (which needs the write lock) from inside the `forEach` callback can block namespace unloading. Here is the thread dump:

```
"pulsar-io-16-7" #132 prio=5 os_prio=31 tid=0x00007ff370ae2800 nid=0x1f603 waiting on condition [0x00007000121d0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000780a0be18> (a org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section)
	at java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1119)
	at java.util.concurrent.locks.StampedLock.writeLock(StampedLock.java:354)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.remove(ConcurrentOpenHashMap.java:306)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.access$200(ConcurrentOpenHashMap.java:180)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.remove(ConcurrentOpenHashMap.java:135)
	at org.apache.pulsar.broker.service.BrokerService.removeTopicFromCache(BrokerService.java:1658)
	at org.apache.pulsar.broker.service.BrokerService.lambda$cleanUnloadedTopicFromCache$61(BrokerService.java:1611)
	at org.apache.pulsar.broker.service.BrokerService$$Lambda$1003/2064147704.accept(Unknown Source)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159)
	at org.apache.pulsar.broker.service.BrokerService.cleanUnloadedTopicFromCache(BrokerService.java:1607)
	at org.apache.pulsar.broker.namespace.OwnedBundle.lambda$handleUnloadRequest$1(OwnedBundle.java:140)
	at org.apache.pulsar.broker.namespace.OwnedBundle$$Lambda$999/503902413.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic.lambda$null$18(NonPersistentTopic.java:442)
	at org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic$$Lambda$994/682846231.run(Unknown Source)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
```
This also makes the current CI unstable when shutting down the mock broker after the tests.
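The non-reentrant behavior behind this hang can be reproduced in isolation with a plain `java.util.concurrent.locks.StampedLock`. The minimal sketch below takes a read lock and then, on the same thread, asks for the write lock with a timeout so the demo returns instead of parking forever. The class and method names are hypothetical and are not part of Pulsar.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical demo class; only StampedLock itself is a real API here.
class StampedLockReentrancyDemo {

    /**
     * Holds a read lock and then tries to take the write lock on the same
     * thread. Returns true only if the write lock was acquired in time.
     */
    static boolean writeWhileReading(long timeoutMillis) {
        StampedLock lock = new StampedLock();
        long readStamp = lock.readLock();
        try {
            // A plain writeLock() here would wait on our own read lock.
            long writeStamp = lock.tryWriteLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (writeStamp == 0L) {
                return false; // timed out: the read lock is still held
            }
            lock.unlockWrite(writeStamp);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlockRead(readStamp);
        }
    }

    public static void main(String[] args) {
        System.out.println("write lock acquired while reading: " + writeWhileReading(100));
    }
}
```

The call returns `false`, which matches the thread dump: `Section.remove` waits in `StampedLock.writeLock` while the enclosing `Section.forEach` frame on the same thread is still running.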

### Modifications

Use the `keys()` method of `ConcurrentOpenHashMap` to obtain a new list of the keys, so that entries are no longer removed from inside `forEach`.
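The snapshot-then-remove pattern can be sketched as follows. `LockedMap` is a deliberately simplified, hypothetical stand-in for a single section of `ConcurrentOpenHashMap`, and matching topics to a bundle by name prefix is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// Hypothetical simplified map guarded by a StampedLock; not the real class.
class LockedMap<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final StampedLock lock = new StampedLock();

    void put(K key, V value) {
        long stamp = lock.writeLock();
        try { entries.put(key, value); } finally { lock.unlockWrite(stamp); }
    }

    V remove(K key) {
        long stamp = lock.writeLock();
        try { return entries.remove(key); } finally { lock.unlockWrite(stamp); }
    }

    /** Returns a copy of the keys; the read lock is released before returning. */
    List<K> keys() {
        long stamp = lock.readLock();
        try { return new ArrayList<>(entries.keySet()); } finally { lock.unlockRead(stamp); }
    }

    int size() {
        long stamp = lock.readLock();
        try { return entries.size(); } finally { lock.unlockRead(stamp); }
    }
}

class SnapshotCleanupDemo {

    /** Removes topics whose name starts with the (hypothetical) bundle prefix. */
    static int removeTopicsOfBundle(LockedMap<String, String> topics, String bundlePrefix) {
        int removed = 0;
        // Iterate over the snapshot: no lock is held while remove() runs.
        for (String topic : topics.keys()) {
            if (topic.startsWith(bundlePrefix) && topics.remove(topic) != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        LockedMap<String, String> topics = new LockedMap<>();
        topics.put("persistent://tenant/ns-a/t1", "ref-1");
        topics.put("persistent://tenant/ns-a/t2", "ref-2");
        topics.put("persistent://tenant/ns-b/t3", "ref-3");
        int removed = removeTopicsOfBundle(topics, "persistent://tenant/ns-a/");
        System.out.println("removed " + removed + ", remaining " + topics.size());
    }
}
```

The trade-off is that the snapshot may be slightly stale: a topic added after `keys()` returns is not visited, and one removed concurrently simply yields `null` from `remove`.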