Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Broker] Fix Broker HealthCheck Endpoint Exposes Race Conditions #14367

Merged
merged 17 commits into from
Mar 4, 2022
Merged

[Broker] Fix Broker HealthCheck Endpoint Exposes Race Conditions #14367

merged 17 commits into from
Mar 4, 2022

Conversation

mattisonchao
Copy link
Member

@mattisonchao mattisonchao commented Feb 18, 2022

Master Issue: #14362

Motivation

See #14362

According to relative PR #7724, we will force delete all subscriptions when calling healthCheck REST API. but it has a race condition when two threads call this API.
Please consider this case:

Thread A: Clean up all subscriptions, then create a reader.
Thread B: Prepare to force delete all subscriptions.

So, in this case, the reader of thread A is deleted and then an NPE or other exception occurs.

Modifications

  • Use Completable#handle to fix problem 1, the reader needs to be closed regardless of whether there is an exception.
  • Recheck the subscription after closing reading, and force deletion if it still exists after closing reading.
  • Added multi-threaded tests for health checks.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 18, 2022
@mattisonchao
Copy link
Member Author

I need more eyes to help me review this change to avoid regression.

@codelipenghui @michaeljmarshall @Technoboy-

@michaeljmarshall
Copy link
Member

Thanks for working on this @mattisonchao!

@michaeljmarshall
Copy link
Member

@mattisonchao - I should have added in my review that your tests look good.

@michaeljmarshall michaeljmarshall added area/broker type/bug The PR fixed a bug or issue reported a bug labels Feb 18, 2022
@michaeljmarshall michaeljmarshall modified the milestones: 2.10.0, 2.11.0 Feb 18, 2022
@mattisonchao mattisonchao changed the title [Issue 14362] Fix Broker HealthCheck Endpoint Exposes Race Conditions [WIP][Issue 14362] Fix Broker HealthCheck Endpoint Exposes Race Conditions Feb 18, 2022
@mattisonchao
Copy link
Member Author

@michaeljmarshall @Technoboy-
I changed logic of this PR, PTAL :)

@mattisonchao mattisonchao changed the title [WIP][Issue 14362] Fix Broker HealthCheck Endpoint Exposes Race Conditions [Broker] Fix Broker HealthCheck Endpoint Exposes Race Conditions Feb 21, 2022
@mattisonchao
Copy link
Member Author

/pulsarbot rerun-failure-checks

@mattisonchao
Copy link
Member Author

@codelipenghui @Technoboy- @eolivelli @lhotari @michaeljmarshall @Jason918

PTAL, when you have time ~

@Technoboy-
Copy link
Contributor

@codelipenghui @Technoboy- @eolivelli @lhotari @michaeljmarshall @Jason918

PTAL, when you have time ~

LGTM.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

great work !

@michaeljmarshall
Copy link
Member

I will review this within the next couple hours, thank you for your work @mattisonchao!

Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codelipenghui codelipenghui merged commit 4f1e39b into apache:master Mar 4, 2022
@eolivelli
Copy link
Contributor

@codelipenghui if we cut a new rc for 2.10 is would be great to have this patch.
It is a big problem for k8s users

@eolivelli
Copy link
Contributor

@mattisonchao
can you please port this patch to branch-2.8 and branch-2.9 ?
the patch does not apply cleanly

@codelipenghui codelipenghui added cherry-picked/branch-2.8 Archived: 2.8 is end of life and removed cherry-picked/branch-2.8 Archived: 2.8 is end of life labels Mar 9, 2022
gaoran10 pushed a commit to gaoran10/pulsar that referenced this pull request Mar 11, 2022
gaoran10 pushed a commit to gaoran10/pulsar that referenced this pull request Mar 11, 2022
codelipenghui pushed a commit that referenced this pull request Mar 12, 2022
@codelipenghui codelipenghui modified the milestones: 2.11.0, 2.10.0 Mar 12, 2022
codelipenghui pushed a commit that referenced this pull request Mar 14, 2022
…#14658)

Original PR #14367

Because the PR #14367 is based on PR #14091, so I want to cherry-pick these two PRs to branch-2.9, the PR #13525 is also needed.

---

Fix Issue: #14362

### Motivation

According to relative PR #7724, we will force delete all subscriptions when calling ``healthCheck`` REST API. but it has a race condition when two threads call this API.
Please consider this case:

> Thread A: Clean up all subscriptions, then create a reader.
> Thread B: Prepare to force delete all subscriptions.

So, in this case, the reader of thread A is deleted and then an NPE or other exception occurs.

### Modifications

- Use ``Completable#handle`` to fix problem 1, the reader needs to be closed regardless of whether there is an exception.
- Recheck the subscription after closing reading, and force deletion if it still exists after closing reading.
- Added multi-threaded tests for health checks.
@gaoran10
Copy link
Contributor

The #14658 cherry-pick this PR to branch-2.9.

@gaoran10 gaoran10 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Mar 14, 2022
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Mar 18, 2022
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.4 release/2.9.2 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants