Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix namespace bundle stuck in unloading status #21445

Merged
merged 12 commits into from
Nov 8, 2023
Merged

[fix][broker] Fix namespace bundle stuck in unloading status #21445

merged 12 commits into from
Nov 8, 2023

Conversation

mattisonchao
Copy link
Member

@mattisonchao mattisonchao commented Oct 26, 2023

Motivation

PR #21231 made user topic creation rely on system topic __change_event if the user is enabling topicLevelPoliciesEnabled.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create __change_event reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

Modifications

  • Get the topic policy before loading.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@mattisonchao mattisonchao self-assigned this Oct 26, 2023
@mattisonchao mattisonchao marked this pull request as draft October 26, 2023 02:14
@mattisonchao mattisonchao added this to the 3.2.0 milestone Oct 26, 2023
@mattisonchao mattisonchao changed the title [fix][broker] Fix bundle stuck in unloading status [fix][broker] Fix namespace bundle stuck in unloading status Oct 26, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 26, 2023
@mattisonchao mattisonchao added the category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost label Oct 26, 2023
@mattisonchao mattisonchao added the release/blocker Indicate the PR or issue that should block the release until it gets resolved label Oct 26, 2023
@mattisonchao mattisonchao marked this pull request as ready for review October 26, 2023 10:12
@mattisonchao mattisonchao reopened this Oct 26, 2023
@mattisonchao mattisonchao marked this pull request as draft November 1, 2023 23:04
@mattisonchao mattisonchao marked this pull request as ready for review November 2, 2023 11:14
@mattisonchao mattisonchao reopened this Nov 2, 2023
});
}
return loadOrCreatePersistentTopic(tpName, createIfMissing, properties);
final CompletableFuture<Optional<TopicPolicies>> topicPoliciesFuture =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test to cover the getTopic method and the TopicPolicies are loaded?

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good.
Just left a few minor comments.

@codecov-commenter
Copy link

Codecov Report

Merging #21445 (3bda31b) into master (eebd821) will increase coverage by 0.01%.
Report is 2 commits behind head on master.
The diff coverage is 94.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21445      +/-   ##
============================================
+ Coverage     73.23%   73.24%   +0.01%     
+ Complexity    32674    32648      -26     
============================================
  Files          1892     1892              
  Lines        140632   140645      +13     
  Branches      15467    15467              
============================================
+ Hits         102986   103010      +24     
+ Misses        29565    29519      -46     
- Partials       8081     8116      +35     
Flag Coverage Δ
inttests 24.14% <69.18%> (-0.03%) ⬇️
systests 24.86% <64.77%> (+0.13%) ⬆️
unittests 72.52% <94.33%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../service/SystemTopicBasedTopicPoliciesService.java 73.10% <60.00%> (-1.13%) ⬇️
...rg/apache/pulsar/broker/service/BrokerService.java 80.89% <95.45%> (-0.04%) ⬇️

... and 89 files with indirect coverage changes

@codelipenghui codelipenghui merged commit 428c18c into apache:master Nov 8, 2023
48 checks passed
Technoboy- pushed a commit that referenced this pull request Nov 10, 2023
PR #21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

- Get the topic policy before loading.
@mattisonchao mattisonchao deleted the fix/unloading_stuck branch November 10, 2023 13:12
nborisov pushed a commit to nborisov/pulsar that referenced this pull request Nov 13, 2023
…21445)

### Motivation

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.


### Modifications

- Get the topic policy before loading.
poorbarcode pushed a commit that referenced this pull request Nov 14, 2023
Technoboy- added a commit that referenced this pull request Nov 16, 2023
coderzc pushed a commit to coderzc/pulsar that referenced this pull request Nov 21, 2023
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Dec 2, 2023
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
…21445)

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

- Get the topic policy before loading.
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
…21445)

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

- Get the topic policy before loading.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Feb 2, 2024
…licies before loading topics

### Motivation

apache#21445 introduces an implicit
requirement that if the topic policies cannot be loaded,
`BrokerService#getTopic` will fail and then the client will retry
loading the topic.

It could break some existing usages like the tests in the C++
client: apache/pulsar-client-cpp#394

This change is applied only to avoid the race condition when unloading a
namespace bundle. However, the client should not fail if the topic-level
policies are not available.

### Modifications

If the topic policies cannot be loaded due to a non-retryable error, we
should not fail the `getTopic`. For retryable errors, the client will
still retry until the broker gets the topic-level policies successfully
after some attempts.

Modify `TokenAuthenticatedProducerConsumerTest` to protect the change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost cherry-picked/branch-2.10 cherry-picked/branch-2.11 cherry-picked/branch-3.0 cherry-picked/branch-3.1 doc-not-needed Your PR changes do not impact docs ready-to-test release/blocker Indicate the PR or issue that should block the release until it gets resolved release/2.10.6 release/2.11.3 release/3.0.2 release/3.1.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants