[Bug] Broker sees topic fenced #20526

Open
1 of 2 tasks
KannarFr opened this issue Jun 7, 2023 · 9 comments
Labels
Stale, type/bug

Comments

@KannarFr
Contributor

KannarFr commented Jun 7, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Version

2.11.1

Minimal reproduce step

I have a cluster with thousands of topics, and one of them became fenced. See the broker's logs:

Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,089+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://tenant/ns/topic-partition-0] Attempting to add producer to a fenced topic
Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,089+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.3:46348] Failed to add producer to topic persistent://tenant/ns/topic-partition-0: producerId=360, Topic is temporarily unavailable
Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,102+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://tenant/ns/topic-partition-0] Attempting to add producer to a fenced topic
Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,102+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] WARN  org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.3:46348] Failed to add producer to topic persistent://tenant/ns/topic-partition-0: producerId=361, Topic is temporarily unavailable
Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,112+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://tenant/ns/topic-partition-0] Attempting to add producer to a fenced topic
Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,112+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] WARN  org.apache.pulsar.broker.service.ServerCnx - [/192.168.1.3:46348] Failed to add producer to topic persistent://tenant/ns/topic-partition-0: producerId=362, Topic is temporarily unavailable

The topic stayed fenced for 30 minutes. I restarted the broker and everything looks good now. Any idea what causes this? Could it be a stale cached fenced state or something similar?
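To quantify how long a topic stayed fenced, the warnings shown above can be summarized per topic. The following is a minimal sketch, not an official Pulsar tool: it assumes the log lines keep the exact shape quoted in this report (an ISO-like timestamp with milliseconds and a UTC offset, followed by the "Attempting to add producer to a fenced topic" message), and the function name `fenced_windows` is hypothetical.

```python
import re
from datetime import datetime

# Matches the AbstractTopic warning as it appears in the logs quoted above.
# Assumption: timestamp format "YYYY-MM-DDTHH:MM:SS,mmm+ZZZZ" and the topic
# name in square brackets right before the message text.
FENCED_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}[+-]\d{4})"
    r".*? - \[(?P<topic>[^\]]+)\] Attempting to add producer to a fenced topic"
)


def fenced_windows(lines):
    """Return {topic: (first_seen, last_seen, count)} for fenced-topic warnings."""
    windows = {}
    for line in lines:
        match = FENCED_RE.search(line)
        if match is None:
            continue  # not a fenced-topic warning
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f%z")
        topic = match.group("topic")
        first, last, count = windows.get(topic, (ts, ts, 0))
        windows[topic] = (min(first, ts), max(last, ts), count + 1)
    return windows


if __name__ == "__main__":
    import sys

    # Usage: python fenced_windows.py < broker.log
    for topic, (first, last, count) in fenced_windows(sys.stdin).items():
        print(f"{topic}: {count} warnings over {last - first}")
```

The difference between the first and last warning is only a lower bound on the fenced duration, since warnings are logged only when a producer tries to attach.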

What did you expect to see?

Not fenced

What did you see instead?

Fenced

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@KannarFr KannarFr added the type/bug label Jun 7, 2023
lhotari added a commit to lhotari/pulsar that referenced this issue Jun 8, 2023
@lhotari
Member

lhotari commented Jun 8, 2023

Thanks for reporting this. This is an issue which has stayed unresolved for many years.
Some previous reports: #5284 and #14941.

There's an ugly workaround for the problem:
Setting topicFencingTimeoutSeconds=5 on the brokers releases the "fencing" after 5 seconds.
However, there is a chance that this causes other problems such as data consistency problems. If metadata gets overwritten, it could lead to data loss.
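
For reference, the workaround described above amounts to a one-line change in the broker configuration. This is a minimal sketch: the setting name and value come from this comment, while the file location `conf/broker.conf` is an assumption that depends on how the brokers are deployed.

```properties
# conf/broker.conf (path assumed; adjust to your deployment)
# Release a topic's "fenced" state after 5 seconds.
# Caveat noted above: this may cause data consistency problems.
topicFencingTimeoutSeconds=5
```

This sketch assumes the value is read at broker startup, so the brokers would likely need a restart for it to take effect.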

The recently merged fixes #18688 and #20527 could help improve the situation. I happened to investigate problems in this area yesterday.

I have created #20540 to address some issues that I have observed in the current solution. One of the remaining challenges in the PR is adding proper test coverage. I'm also waiting for feedback from other code contributors on the PR before finishing it. I'd appreciate feedback on the PR #20540.

@poorbarcode
Contributor

@KannarFr

Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,112+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://tenant/ns/topic-partition-0] Attempting to add producer to a fenced topic

Do you know whether a bundle unload or a namespace unload was executed at that time?

You can check the HTTP request log to confirm it.

@KannarFr
Contributor Author

Unfortunately, I do not retain such logs. I should have taken a dump, my bad.

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jul 13, 2023
@StevenLeRoux
Contributor

We are still impacted by this issue. Restarting brokers all day long is not a sustainable situation. I wonder whether any production deployment is unaffected by this issue and, if so, how.

@lhotari
Member

lhotari commented Dec 26, 2023

We are still impacted by this issue. Restarting brokers all day long is not a sustainable situation. I wonder whether any production deployment is unaffected by this issue and, if so, how.

@StevenLeRoux Which Pulsar version are you using? Do you have a chance to test #20540 with a custom build?

@StevenLeRoux
Contributor

@lhotari Thanks for pointing out #20540.

We're currently using v3.1.1, but we will get the chance to test with #20540 in a few days (cc @KannarFr).

@lhotari
Member

lhotari commented Jan 11, 2024

@lhotari Thanks for pointing out #20540.

We're currently using v3.1.1, but we will get the chance to test with #20540 in a few days (cc @KannarFr).

@StevenLeRoux @KannarFr FYI, there's a new bug report in this area, #21860, with a promising bug fix for the BookKeeper client in the works.

@KannarFr
Contributor Author

Thanks for pinging us @lhotari.


4 participants