[broker] Close topics that remain fenced forcefully #8561

Merged: 3 commits, Nov 18, 2020

Conversation

massakam
Contributor

Motivation

The other day, we faced a problem where a topic became fenced and stayed unavailable until it was unloaded. The following is the broker log from that time.

11:37:55.905 [bookkeeper-ml-workers-OrderedExecutor-77-0] INFO  o.a.b.mledger.impl.OpAddEntry        - [tenant/ns/persistent/topic] Closing ledger 40891546 for being full
11:37:56.208 [pulsar-ordered-OrderedExecutor-0-0-EventThread] ERROR o.a.b.client.MetadataUpdateLoop      - UpdateLoop(ledgerId=40891546,loopId=6ce63876) Error writing metadata to store
11:37:56.209 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.b.mledger.impl.OpAddEntry        - Error when closing ledger 40891546. Status=Error while using ZooKeeper
11:37:56.359 [pulsar-ordered-OrderedExecutor-0-0-EventThread] ERROR o.a.b.mledger.impl.ManagedLedgerImpl - [tenant/ns/persistent/topic] Error creating ledger rc=-9 Error while using ZooKeeper
11:37:56.359 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  o.a.pulsar.broker.service.Producer   - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://tenant/ns/topic}, client=/xxx.xxx.xxx.xxx:40646, producerName=pulsar.repl.jp-west, producerId=668}
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  o.a.pulsar.broker.service.Producer   - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://tenant/ns/topic}, client=/xxx.xxx.xxx.xxx:40646, producerName=pulsar.repl.jp-west, producerId=668}
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  o.a.pulsar.broker.service.Producer   - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://tenant/ns/topic}, client=/xxx.xxx.xxx.xxx:40646, producerName=pulsar.repl.jp-west, producerId=668}
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
11:37:57.495 [ForkJoinPool.commonPool-worker-51] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40256][persistent://tenant/ns/topic] Creating producer. producerId=668
11:37:58.291 [bookkeeper-ml-workers-OrderedExecutor-77-0] INFO  o.a.b.mledger.impl.ManagedLedgerImpl - [tenant/ns/persistent/topic] End TrimConsumedLedgers. ledgers=2 totalSize=162868668
11:37:58.291 [bookkeeper-ml-workers-OrderedExecutor-77-0] INFO  o.a.b.mledger.impl.ManagedLedgerImpl - [tenant/ns/persistent/topic] Removing ledger 40880508 - size: 82183409
11:37:58.292 [ForkJoinPool.commonPool-worker-20] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40256]-668 persistent://tenant/ns/topic configured with schema false
11:37:58.292 [ForkJoinPool.commonPool-worker-20] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
11:37:58.292 [ForkJoinPool.commonPool-worker-20] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40256] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
11:37:58.728 [ForkJoinPool.commonPool-worker-75] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40330][persistent://tenant/ns/topic] Creating producer. producerId=668
11:37:58.729 [ForkJoinPool.commonPool-worker-75] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40330]-668 persistent://tenant/ns/topic configured with schema false
11:37:58.729 [ForkJoinPool.commonPool-worker-75] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
11:37:58.729 [ForkJoinPool.commonPool-worker-75] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40330] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
11:37:59.489 [ForkJoinPool.commonPool-worker-106] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40260][persistent://tenant/ns/topic] Creating producer. producerId=668
11:37:59.489 [ForkJoinPool.commonPool-worker-106] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40260]-668 persistent://tenant/ns/topic configured with schema false
11:37:59.489 [ForkJoinPool.commonPool-worker-106] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
11:37:59.489 [ForkJoinPool.commonPool-worker-106] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40260] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
11:38:01.062 [ForkJoinPool.commonPool-worker-51] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40248][persistent://tenant/ns/topic] Creating producer. producerId=668
11:38:01.062 [ForkJoinPool.commonPool-worker-51] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40248]-668 persistent://tenant/ns/topic configured with schema false
11:38:01.063 [ForkJoinPool.commonPool-worker-51] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
11:38:01.063 [ForkJoinPool.commonPool-worker-51] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40248] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
11:38:04.103 [ForkJoinPool.commonPool-worker-90] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40338][persistent://tenant/ns/topic] Creating producer. producerId=668
11:38:04.104 [ForkJoinPool.commonPool-worker-102] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40338]-668 persistent://tenant/ns/topic configured with schema false

We were performing maintenance on the ZooKeeper servers at the time, so I think this problem was triggered by the shutdown of some ZK servers. However, the causal relationship has not been confirmed.

Modifications

As a workaround, close the topic forcefully if it remains fenced for a configured period of time. When the clients reconnect, a new PersistentTopic instance is created and the topic goes back to normal.
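
The idea can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class `FencedTopicSketch`, its methods, and the `fenceTimeoutMillis` parameter are hypothetical names chosen for the example, and the real PersistentTopic does considerably more when it fences and closes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a topic; NOT the actual PersistentTopic class. */
final class FencedTopicSketch {
    private final ScheduledExecutorService scheduler;
    // Assumed setting for this sketch: a value <= 0 disables the forced close.
    private final long fenceTimeoutMillis;
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private boolean fenced = false;
    private ScheduledFuture<?> fenceMonitor;

    FencedTopicSketch(ScheduledExecutorService scheduler, long fenceTimeoutMillis) {
        this.scheduler = scheduler;
        this.fenceTimeoutMillis = fenceTimeoutMillis;
    }

    /** Mark the topic fenced and start a one-shot timer if none is pending. */
    synchronized void fence() {
        fenced = true;
        if (fenceTimeoutMillis > 0 && fenceMonitor == null) {
            fenceMonitor = scheduler.schedule(
                    this::closeIfStillFenced, fenceTimeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Normal recovery path: clear the fence and cancel the pending timer. */
    synchronized void unfence() {
        fenced = false;
        if (fenceMonitor != null) {
            fenceMonitor.cancel(false);
            fenceMonitor = null;
        }
    }

    /** Timer callback: close only if the fence was never lifted. */
    void closeIfStillFenced() {
        boolean stillFenced;
        synchronized (this) {
            stillFenced = fenced;
            fenceMonitor = null;
        }
        if (stillFenced) {
            // Called outside the lock so a slow close does not block fence()/unfence().
            close();
        }
    }

    /** Placeholder for the real close, which would disconnect clients and release resources. */
    void close() {
        closed.set(true);
    }

    boolean isClosed() {
        return closed.get();
    }
}
```

In this sketch a topic that recovers on its own cancels the timer in `unfence()`, so only a topic that stays fenced for the whole timeout is closed. The close itself runs outside the lock, which is one possible way to handle the question of whether `close()` needs to sit inside a synchronized block.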

@massakam massakam added this to the 2.7.0 milestone Nov 13, 2020
@massakam massakam self-assigned this Nov 13, 2020
@Huanli-Meng Huanli-Meng added the doc-required label Nov 13, 2020
@Huanli-Meng
Contributor

Added the doc-required label because the broker.conf and standalone.conf files are updated.

} else {
    log.error("[{}] Topic remained fenced for {} seconds, so close it (pendingWriteOps: {})", topic,
            timeout, pendingWriteOps.get());
    close();
Member

Do we need to put close in the synchronized block?

Contributor Author

I removed "synchronized" because there was no clear reason to make this method synchronized.

@codelipenghui
Contributor

@sijie Please help review this PR.

@sijie sijie merged commit 0df5f6f into apache:master Nov 18, 2020
@massakam massakam deleted the close-fenced-topic branch November 18, 2020 01:55
@codelipenghui
Contributor

/pulsarbot cherry-pick to branch-2.6

Labels: area/broker, doc-complete, release/2.6.3
5 participants