Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] new load balancer system topic should not be auto-created now #20566

Merged
merged 1 commit into from Jun 14, 2023

Conversation

heesung-sn
Copy link
Contributor

@heesung-sn heesung-sn commented Jun 13, 2023

Motivation

Example Error Logs

Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: The subscription __compaction of the topic persistent://pulsar/system/loadbalancer-service-unit-state-partition-0 gets the last message id was failed
{"errorMsg":"Failed to get batch size for entry org.apache.bookkeeper.mledger.ManagedLedgerException: Error while reading ledger error code: -1","}
	at org.apache.pulsar.client.api.PulsarClientException.wrap(PulsarClientException.java:977) 
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalGetLastMessageIdAsync$61(ConsumerImpl.java:2424)
	... 29 more

This system topic persistent://pulsar/system/loadbalancer-service-unit-state-partition-0 should be a non-partitioned topic after this fix ,1080ad5 .
It seems like there is a racing condition that auto-topic creation creates this system topic.

Modifications

Do not auto-create the new load balancer system topic.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jun 13, 2023
@heesung-sn heesung-sn force-pushed the fix-new-lb-topic-auto-creation branch from 27007c9 to ff3e3fa Compare June 13, 2023 03:02
// ServiceUnitStateChannelImpl.TOPIC expects to be a non-partitioned-topic now.
// We don't allow the auto-creation here.
// ServiceUnitStateChannelImpl.start() is responsible to create the topic.
if (ServiceUnitStateChannelImpl.TOPIC.equals(topicName.toString())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me, but we should finally change it to a partitioned topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is another optimization area for scalability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it that in some cases we auto create in a non-partitioned way and in this case we do not auto create it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understood your question. Since the system topic is currently expected to work on a non-partitioned topic, we specifically create the topic in ServiceUnitStateChannelImpl.start(). This change will block the system topic auto-creation(when any other broker tries to auto-create the topic due to any race-condition )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was mixing up #20370 and #20397.

My primary concern is that the solution in this PR creates a special case instead of creating a framework to make it easier to build system topics in the future. In my opinion, the load manager should be contained by the relevant interfaces and should not be referenced here. There is some relevant discussion about system topics in this PR too #20514 (comment). There is also relevant discussion on the ML here https://lists.apache.org/thread/f0q8n0hf1lgw9r2j53tm4yjjfdyr9kjd.

I think it is relevant that until now, we have always allowed system topics to be auto created to prevent classes of failures.

If I were to guess, the problem this PR is trying to solve is actually more generic than the system topics we're working with here. It is likely a challenge for all pulsar users to know how to create their topics correctly. That is one reason I want to push for thinking about a general solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this fix general like the followings.

  1. Maintain a White-list for system topics that are pre-created. We can check this list and reject any auto creation here.

  2. Or Make Pulsar pre-create all system topics, and then we can reject auto-creation for all topics under the system namespace.

I think the first option is less intrusive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A third option is to only auto create topics as non-partitioned topics. Any component that needs to create a system topic as a partitioned topic could do so on start up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codecov-commenter
Copy link

Codecov Report

Merging #20566 (ff3e3fa) into master (51c2bb4) will increase coverage by 39.42%.
The diff coverage is 97.77%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #20566       +/-   ##
=============================================
+ Coverage     33.50%   72.92%   +39.42%     
- Complexity    12053    31930    +19877     
=============================================
  Files          1613     1867      +254     
  Lines        126120   138592    +12472     
  Branches      13749    15223     +1474     
=============================================
+ Hits          42254   101072    +58818     
+ Misses        78332    29499    -48833     
- Partials       5534     8021     +2487     
Flag Coverage Δ
inttests 24.12% <0.00%> (-0.05%) ⬇️
systests 24.91% <6.66%> (?)
unittests 72.22% <97.77%> (+40.29%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...dbalance/extensions/ExtensibleLoadManagerImpl.java 77.99% <92.30%> (+75.68%) ⬆️
...xtensions/channel/ServiceUnitStateChannelImpl.java 84.41% <100.00%> (+83.86%) ⬆️
...sar/broker/loadbalance/impl/LoadManagerShared.java 81.78% <100.00%> (+51.78%) ⬆️
...ache/pulsar/broker/namespace/NamespaceService.java 71.23% <100.00%> (+28.39%) ⬆️
...rg/apache/pulsar/broker/service/BrokerService.java 81.66% <100.00%> (+26.02%) ⬆️

... and 1515 files with indirect coverage changes

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@Technoboy- Technoboy- merged commit 441d0ef into apache:master Jun 14, 2023
48 checks passed
@heesung-sn heesung-sn deleted the fix-new-lb-topic-auto-creation branch April 2, 2024 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants