Fix hasMessageAvailable return true but can't read message #10414

Merged 1 commit into apache:master on May 4, 2021

315157973 (Contributor) commented Apr 28, 2021

Motivation

I temporarily fixed this problem in PR #10190.
We have now found a better approach that avoids the seek and therefore avoids triggering another reconnection.
Thank you @codelipenghui for troubleshooting this issue with me all night.

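We added extensive logging and found that this issue is caused by several race conditions. Here is the first cause (https://github.com/apache/pulsar/blob/f2d72c9fc13a33df584ec1bd96a4c147774b858d/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L1808-L1818):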

    cnx.sendRequestWithId(seek, requestId).thenRun(() -> {
        log.info("[{}][{}] Successfully reset subscription to {}", topic, subscription, seekBy);
        acknowledgmentsGroupingTracker.flushAndClean();
        seekMessageId = new BatchMessageIdImpl((MessageIdImpl) seekId);
        duringSeek.set(true);
        lastDequeuedMessageId = MessageId.earliest;
        clearIncomingMessages();
        seekFuture.complete(null);
    }).exceptionally(e -> {

The consumer uses an acknowledgmentsGroupingTracker to filter duplicate messages, and this tracker is cleaned up after the seek completes.

However, the connection may already be ready and the broker may already have pushed messages before acknowledgmentsGroupingTracker.flushAndClean() has been executed.

As a result, hasMessageAvailableAsync returns true, but the messages cannot be read because they were filtered out by the acknowledgmentsGroupingTracker.
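
The ordering described above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: `ToyTracker`, `ToyConsumer`, and `replayRace` are hypothetical names, message ids are plain `long` values, and the duplicate check is reduced to a set lookup. It is not Pulsar's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Toy model of the race; none of these classes are Pulsar's real implementation. */
class StaleTrackerDemo {

    /** Hypothetical stand-in for the tracker: remembers acked ids to drop redeliveries. */
    static class ToyTracker {
        private final Set<Long> acked = new HashSet<>();
        void ack(long id) { acked.add(id); }
        boolean isDuplicate(long id) { return acked.contains(id); }
        void flushAndClean() { acked.clear(); }
    }

    /** Hypothetical stand-in for the consumer: filters pushed messages through the tracker. */
    static class ToyConsumer {
        final ToyTracker tracker = new ToyTracker();
        final Queue<Long> incoming = new ArrayDeque<>();
        long lastDequeued = 0;     // 0 plays the role of MessageId.earliest
        long lastIdInBroker = 0;   // highest id the broker has pushed so far

        void onMessagePushed(long id) {
            lastIdInBroker = Math.max(lastIdInBroker, id);
            if (!tracker.isDuplicate(id)) {
                incoming.add(id);
            }
        }

        boolean hasMessageAvailable() { return lastIdInBroker > lastDequeued; }

        Long poll() {
            Long id = incoming.poll();
            if (id != null) {
                lastDequeued = id;
            }
            return id;
        }
    }

    /** Replays the problematic ordering and returns the consumer for inspection. */
    static ToyConsumer replayRace() {
        ToyConsumer c = new ToyConsumer();
        // Before the seek: messages 1..3 are consumed and acknowledged.
        for (long id = 1; id <= 3; id++) {
            c.onMessagePushed(id);
            c.poll();
            c.tracker.ack(id);
        }
        // Seek to earliest: the read position and the local queue are reset.
        c.lastDequeued = 0;
        c.incoming.clear();
        // The broker re-pushes 1..3 BEFORE the seek callback has cleaned the tracker,
        // so every message is treated as a duplicate and dropped.
        for (long id = 1; id <= 3; id++) {
            c.onMessagePushed(id);
        }
        // The cleanup runs too late to help.
        c.tracker.flushAndClean();
        return c;
    }

    public static void main(String[] args) {
        ToyConsumer c = replayRace();
        System.out.println("hasMessageAvailable = " + c.hasMessageAvailable());
        System.out.println("poll = " + c.poll());
    }
}
```

In this model the consumer reports that a message is available while `poll()` returns nothing, which mirrors the symptom in the PR title.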

Modifications

Clean the tracker when the connection is opened.
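
A minimal sketch of this idea, assuming a hook that runs when the connection is opened and before any message arrives on it. `CleanOnConnectDemo`, `connectionOpened`, and `replayWithFix` are illustrative names chosen for this example; the exact place where the real change hooks into ConsumerImpl is not shown in this conversation.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Toy sketch of the fix; the class and method names are illustrative, not Pulsar's. */
class CleanOnConnectDemo {
    private final Set<Long> ackedIds = new HashSet<>();      // toy duplicate filter
    private final Queue<Long> incoming = new ArrayDeque<>(); // toy receiver queue

    void ack(long id) { ackedIds.add(id); }

    /** Hypothetical hook, assumed to run before any message arrives on the new connection. */
    void connectionOpened() {
        ackedIds.clear();   // plays the role of flushAndClean() on the tracker
        incoming.clear();
    }

    void onMessagePushed(long id) {
        if (!ackedIds.contains(id)) {
            incoming.add(id);
        }
    }

    int readableCount() { return incoming.size(); }

    /** Acks 1..3, reconnects, lets the broker re-push 1..3, and returns how many are readable. */
    static int replayWithFix() {
        CleanOnConnectDemo c = new CleanOnConnectDemo();
        for (long id = 1; id <= 3; id++) {
            c.ack(id);
        }
        c.connectionOpened();
        for (long id = 1; id <= 3; id++) {
            c.onMessagePushed(id);
        }
        return c.readableCount();
    }

    public static void main(String[] args) {
        System.out.println("readable after reconnect = " + replayWithFix());
    }
}
```

Because the filter is emptied as part of opening the connection, no pushed message can be checked against stale acknowledgment state, regardless of when the seek callback runs.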

Verifying this change

lhotari (Member) commented Apr 28, 2021

I wonder whether #7796 and #7848 are related to this fix.

315157973 (Contributor, Author) commented

> I wonder whether #7796 and #7848 are related to this fix.

This problem has several causes, and this PR addresses only one of the scenarios. For example, in another scenario there are messages in incomeQueue, but they cannot be consumed; a follow-up PR will fix that.
I have looked at those issues, and this PR does not cover their scenario.
What can be confirmed now is that all the flaky tests in MultiTopicsReaderTest have been resolved.

@codelipenghui codelipenghui added this to the 2.8.0 milestone Apr 28, 2021
@wolfstudy wolfstudy self-requested a review April 29, 2021 09:32
wolfstudy (Member) left a comment

Cool, LGTM +1

@wolfstudy wolfstudy added the area/client and type/bug labels Apr 29, 2021
@sijie sijie merged commit f69a03b into apache:master May 4, 2021
@315157973 315157973 deleted the hasMessage branch May 11, 2021 09:02
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request May 11, 2021
codelipenghui pushed a commit that referenced this pull request Jun 26, 2021
(cherry picked from commit f69a03b)
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 label Jun 26, 2021
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Nov 24, 2021
(cherry picked from commit f69a03b)
Labels: area/client, cherry-picked/branch-2.7, release/2.7.3, type/bug