[fix] [broker] Make the new exclusive consumer instead the inactive one faster #21183

Merged
merged 14 commits into from
Oct 30, 2023

Conversation

poorbarcode
Contributor

@poorbarcode poorbarcode commented Sep 14, 2023

Motivation

This is an issue similar to the one fixed by #21155.

The client assumed the connection was inactive, but the broker assumed the connection was fine. The client then tried to reconnect an exclusive consumer over a new connection and got the error "Exclusive consumer is already connected".

Modifications

  • Check whether the old consumer's connection is still alive when the new one tries to subscribe
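
The modification above can be pictured with a minimal, self-contained sketch. It is not the broker's actual code: SketchConnection, SketchConsumer and ExclusiveSubscriptionSketch are hypothetical names introduced for illustration, and the sketch assumes the liveness check reports its result through a future. It only shows the idea of probing the current holder's connection before rejecting a newcomer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a broker-side connection (not Pulsar's real class). */
interface SketchConnection {
    /** Completes with true if the peer still responds, false if it is stale. */
    CompletableFuture<Boolean> checkConnectionLiveness();
}

/** Hypothetical consumer record: a name plus the connection it arrived on. */
final class SketchConsumer {
    final String name;
    final SketchConnection cnx;

    SketchConsumer(String name, SketchConnection cnx) {
        this.name = name;
        this.cnx = cnx;
    }
}

/** Holds at most one active consumer, as an exclusive subscription does. */
final class ExclusiveSubscriptionSketch {
    private final AtomicReference<SketchConsumer> active = new AtomicReference<>();

    CompletableFuture<Void> addConsumer(SketchConsumer candidate) {
        SketchConsumer current = active.get();
        if (current == null) {
            return active.compareAndSet(null, candidate) ? accepted() : rejected();
        }
        // Someone already holds the subscription: probe its connection first
        // instead of rejecting the newcomer immediately.
        return current.cnx.checkConnectionLiveness().thenCompose(alive -> {
            if (alive) {
                return rejected();
            }
            // The old connection is stale, so the newcomer may take over,
            // unless another consumer has replaced it in the meantime.
            return active.compareAndSet(current, candidate) ? accepted() : rejected();
        });
    }

    String activeName() {
        SketchConsumer c = active.get();
        return c == null ? null : c.name;
    }

    private static CompletableFuture<Void> accepted() {
        return CompletableFuture.completedFuture(null);
    }

    private static CompletableFuture<Void> rejected() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        f.completeExceptionally(
                new IllegalStateException("Exclusive consumer is already connected"));
        return f;
    }
}
```

In this sketch a newcomer is rejected only after the holder's connection has answered the probe; a holder whose connection turns out to be stale is replaced without waiting for the broker to notice the dead connection by other means.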

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 14, 2023
@poorbarcode poorbarcode self-assigned this Sep 14, 2023
@poorbarcode poorbarcode added release/3.0.2 release/2.11.3 release/2.10.6 category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost labels Sep 14, 2023
@poorbarcode poorbarcode added this to the 3.2.0 milestone Sep 14, 2023
codelipenghui
codelipenghui previously approved these changes Sep 15, 2023
Comment on lines 169 to 182
Consumer actConsumer = ACTIVE_CONSUMER_UPDATER.get(this);
if (actConsumer != null) {
    actConsumer.cnx().checkConnectionLiveness();
}
Contributor

But the new consumer will not retry connecting to the topic, right? Do we need to wait for the connection liveness check to finish?

Contributor Author

@poorbarcode poorbarcode Sep 15, 2023

@codelipenghui

But the new consumer will not retry to connect to the topic, right? Do we need to wait for the connection liveness check to be done?

Sure, I have made this improvement.

(Highlight) I have a concern:

Background: PR #20026 changed the method dispatcher.addConsumer into an asynchronous method, which broke the locking provided by synchronized(dispatcher.this). This change only affects releases later than 3.0.0.

Concern: The improvement "wait for the connection liveness check to be done" relies on the asynchronous method dispatcher.addConsumer. I am deciding whether to keep the patch #20026 and fix the broken lock (which would make the logic more complex), or to revert that patch to keep the logic simple.

I'd like your advice on this concern.
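
To illustrate why turning a synchronized method into an asynchronous one can break its lock, here is a small sketch under stated assumptions. BrokenLockSketch and its methods are hypothetical and are not the dispatcher's real code. The point is general Java behavior: the monitor is released when the method returns its future, so a callback that runs later is no longer protected by it.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical holder of a single "active consumer" name. */
final class BrokenLockSketch {
    private String active;

    // Broken variant: the check runs under the monitor, but the update runs
    // in a callback after the method has returned and released the monitor.
    synchronized CompletableFuture<Boolean> addConsumer(
            String name, CompletableFuture<Void> precondition) {
        boolean free = (active == null);
        return precondition.thenApply(ignored -> {
            if (free) {
                active = name; // acts on a check that may be outdated by now
            }
            return free;
        });
    }

    // One possible repair (an assumption, not necessarily what the follow-up
    // PR did): check and update together, under the monitor, in the callback.
    CompletableFuture<Boolean> addConsumerFixed(
            String name, CompletableFuture<Void> precondition) {
        return precondition.thenApply(ignored -> {
            synchronized (this) {
                if (active != null) {
                    return false;
                }
                active = name;
                return true;
            }
        });
    }

    synchronized String active() {
        return active;
    }
}
```

If two callers invoke addConsumer before either precondition completes, both observe an empty slot and both report success, which violates exclusivity. With addConsumerFixed, only the first callback to run wins.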

Contributor Author

@poorbarcode poorbarcode Oct 23, 2023

After talking with @codelipenghui @BewareMyPower @RobertIndie @gaoran10 @Technoboy- , I will open a new PR to fix the lock that was broken by PR #20026.

@poorbarcode
Contributor Author

rebase master

Contributor

@BewareMyPower BewareMyPower left a comment

The client assumed the connection was inactive, but the Broker assumed the connection was fine.

In addition, could you explain in which situations the case described in the PR can happen?

pulsar.getConfig().setConnectionLivenessCheckTimeoutMillis(5000);
}

private AtomicBoolean startChannelMonitorToHandleUserTask() {
Contributor

This method is hard to understand. Maybe it's better to add some comments.

I removed AtomicBoolean channel1MonitorStopped = startChannelMonitorToHandleUserTask(); and channel1MonitorStopped.set(true); and the tests still passed.

Contributor Author

The channel is an EmbeddedChannel. When we call channel.execute(runnable), there is no background thread to run the task.

So starting a background thread that triggers the tasks in the queue makes the test more stable.
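
The idea can be sketched with the standard library alone. The code below is an analog, not Netty and not the test's real helper: ManualExecutorSketch imitates an executor whose execute() only queues work, and MonitorSketch.startMonitor is a hypothetical helper that drains the queue from a background thread until the returned flag is set to true.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Analog of an embedded event loop: execute() queues tasks, nothing runs them. */
final class ManualExecutorSketch implements Executor {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();

    @Override
    public void execute(Runnable task) {
        tasks.add(task);
    }

    /** Runs every task queued so far on the calling thread. */
    void runPendingTasks() {
        Runnable task;
        while ((task = tasks.poll()) != null) {
            task.run();
        }
    }
}

final class MonitorSketch {
    /**
     * Starts a daemon thread that keeps draining the queue.
     * Set the returned flag to true to stop the thread.
     */
    static AtomicBoolean startMonitor(ManualExecutorSketch executor) {
        AtomicBoolean stopped = new AtomicBoolean(false);
        Thread monitor = new Thread(() -> {
            while (!stopped.get()) {
                executor.runPendingTasks();
                try {
                    Thread.sleep(10); // poll interval chosen arbitrarily
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "sketch-channel-monitor");
        monitor.setDaemon(true);
        monitor.start();
        return stopped;
    }
}
```

Without the monitor, a task handed to execute() stays in the queue until some thread calls runPendingTasks(), so a test waiting on the task's side effect would depend on who happens to drain the queue.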

Contributor Author

Added a comment for this method

Contributor Author

I removed AtomicBoolean channel1MonitorStopped = startChannelMonitorToHandleUserTask(); and channel1MonitorStopped.set(true); and the tests still passed.

  • If you run the test testHandleConsumerAfterClientChannelInactive, it has a high probability of failing.
  • If you run the test testHandleConsumerAfterClientChannelInactiveWhenDisabledFeatureConnectionLivenessCheckTimeoutMillis, it will always pass, because it only confirms that nothing is affected by this fix when the feature connectionLivenessCheckTimeoutMillis is disabled.

@poorbarcode poorbarcode merged commit 29db8f8 into apache:master Oct 30, 2023
47 checks passed
poorbarcode added a commit that referenced this pull request Oct 31, 2023
…ne faster (#21183)

### Motivation

This is an issue similar to the one fixed by #21155.

The client assumed the connection was inactive, but the broker assumed the connection was fine. The client then tried to reconnect an exclusive consumer over a new connection and got the error `Exclusive consumer is already connected`.

### Modifications

- Check whether the old consumer's connection is still alive when the new one tries to subscribe

(cherry picked from commit 29db8f8)
poorbarcode added a commit that referenced this pull request Oct 31, 2023
poorbarcode added a commit that referenced this pull request Oct 31, 2023
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 15, 2024
…ne faster (apache#21183)

This is an issue similar to the one fixed by apache#21155.

The client assumed the connection was inactive, but the broker assumed the connection was fine. The client then tried to reconnect an exclusive consumer over a new connection and got the error `Exclusive consumer is already connected`.

- Check whether the old consumer's connection is still alive when the new one tries to subscribe

(cherry picked from commit 29db8f8)
(cherry picked from commit b796f56)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 15, 2024
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 17, 2024
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 19, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
Labels
category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost cherry-picked/branch-2.10 cherry-picked/branch-2.11 cherry-picked/branch-3.0 doc-not-needed Your PR changes do not impact docs release/2.10.6 release/2.11.3 release/3.0.3
9 participants