[Issue 12885]Fix unordered consuming case in Key_Shared subscription. #12890

Merged

Conversation

Jason918
Contributor

@Jason918 Jason918 commented Nov 19, 2021

Fixes #12885

Motivation

See #12885

Modifications

Ensure that removing a consumer happens only after pending entry reads have completed.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as KeySharedSubscriptionTest.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • no-need-doc
    Bug fix.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 19, 2021
@Jason918
Contributor Author

@codelipenghui @lhotari @eolivelli @merlimat Would you please help review this?

Contributor

@codelipenghui codelipenghui left a comment

Thanks for the case. I think the root cause is that there is an ongoing read op reading data from the managed ledger; after a consumer closes, some messages are added to the replay queue, and once the previous read op completes, the active consumer might get messages out of order by key.

I think we can just add a check after getting new messages from the managed ledger. If there are messages in the replay queue, we can add the new messages to the replay queue as well and then trigger a new read-entries operation, which will replay messages from the replay queue. Would this fix the issue?
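The suggestion above can be sketched roughly as follows. This is a toy model rather than Pulsar code: the class `ReplayFirstSketch`, its method names, and the use of plain `long` values as positions are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy model (not Pulsar code): positions are longs, "dispatch" only records them. */
public class ReplayFirstSketch {
    // Hypothetical stand-in for the dispatcher's replay queue, sorted by position.
    private final TreeSet<Long> replayQueue = new TreeSet<>();
    // Positions handed to consumers, in dispatch order.
    private final List<Long> dispatched = new ArrayList<>();

    /** A consumer closed: its unacked positions go back for replay. */
    public void onConsumerClosed(List<Long> unackedPositions) {
        replayQueue.addAll(unackedPositions);
    }

    /** A read from the (simulated) managed ledger completed. */
    public void onReadCompleted(List<Long> newPositions) {
        if (replayQueue.isEmpty()) {
            dispatched.addAll(newPositions);
            return;
        }
        // Replays are pending: park the new entries and read from the queue instead.
        replayQueue.addAll(newPositions);
        readFromReplayQueue();
    }

    // Drains the replay queue in ascending position order.
    private void readFromReplayQueue() {
        while (!replayQueue.isEmpty()) {
            dispatched.add(replayQueue.pollFirst());
        }
    }

    public List<Long> dispatched() {
        return dispatched;
    }

    public static void main(String[] args) {
        ReplayFirstSketch d = new ReplayFirstSketch();
        d.onConsumerClosed(Arrays.asList(0L, 1L, 2L)); // closed consumer's unacked messages
        d.onReadCompleted(Arrays.asList(10L, 11L));    // in-flight read finishes afterwards
        System.out.println(d.dispatched());            // prints [0, 1, 2, 10, 11]
    }
}
```

In this model the entries from the in-flight read are never dispatched ahead of older messages waiting for replay, which is the ordering property the suggestion aims for.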

@Jason918
Contributor Author

> Thanks for the case. I think the root cause is that there is an ongoing read op reading data from the managed ledger; after a consumer closes, some messages are added to the replay queue, and once the previous read op completes, the active consumer might get messages out of order by key.

The root cause of this case is that recentlyJoinedConsumers is cleared in Thread3 when c1 is closed. See here.

So in the subsequent Thread2 step, sendMessagesToConsumers is called for m10-m14. Previously, recentlyJoinedConsumers was used in getRestrictedMaxEntriesForConsumer to prevent these messages from being sent to c2. But now it has been cleared, so "maxMessages" is returned instead of "0". See here:
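To make the effect of the cleared map concrete, here is a simplified model of the restriction. It is only a sketch under stated assumptions: `RestrictedEntriesSketch`, `restrictedMaxEntries`, the use of consumer names as keys, and `long` positions are hypothetical and do not reproduce the real getRestrictedMaxEntriesForConsumer signature or logic.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model (not Pulsar code) of the recently-joined-consumer restriction. */
public class RestrictedEntriesSketch {
    // Hypothetical model: consumer name -> read position at the time it joined.
    private final Map<String, Long> recentlyJoinedConsumers = new HashMap<>();

    public void consumerJoined(String consumer, long readPositionWhenJoining) {
        recentlyJoinedConsumers.put(consumer, readPositionWhenJoining);
    }

    /** Models the map being cleared when the first consumer closes. */
    public void clearRecentlyJoined() {
        recentlyJoinedConsumers.clear();
    }

    /** Returns how many of the given entry positions may be sent to the consumer. */
    public int restrictedMaxEntries(String consumer, long[] entryPositions, int maxMessages) {
        Long maxReadPosition = recentlyJoinedConsumers.get(consumer);
        if (maxReadPosition == null) {
            return maxMessages; // consumer is not tracked, so nothing is held back
        }
        int allowed = 0;
        for (long position : entryPositions) {
            // Only entries before the join-time read position are allowed.
            if (position < maxReadPosition && allowed < maxMessages) {
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        RestrictedEntriesSketch s = new RestrictedEntriesSketch();
        s.consumerJoined("c2", 10L);                // c2 joined when the read position was 10
        long[] m10ToM14 = {10L, 11L, 12L, 13L, 14L};
        System.out.println(s.restrictedMaxEntries("c2", m10ToM14, 5)); // prints 0
        s.clearRecentlyJoined();                    // c1 closes and the map is cleared
        System.out.println(s.restrictedMaxEntries("c2", m10ToM14, 5)); // prints 5
    }
}
```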

@Jason918 Jason918 force-pushed the fix-12885-KeyShared-RemoveFirstConsumer branch from ea6aee5 to 7884c0a Compare November 21, 2021 14:36
@Jason918
Contributor Author

@codelipenghui I came up with a simpler approach for this case. PTAL.
The basic idea is to check whether there are messages with a smaller position to replay; in this case, the unacked messages prefetched by c1 are added for replay after the consumer is closed.
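The check described above might look roughly like the following. This is an illustrative sketch, not the merged change: `SmallerReplayCheck`, `mustReplayFirst`, and the `long` positions are assumed names and types.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Illustrative only: decides whether pending replays precede freshly read entries. */
public class SmallerReplayCheck {
    /**
     * Returns true when the replay set holds a position smaller than the first
     * newly read position, meaning the new entries should wait for the replay.
     */
    public static boolean mustReplayFirst(NavigableSet<Long> messagesToReplay, long firstReadPosition) {
        return !messagesToReplay.isEmpty() && messagesToReplay.first() < firstReadPosition;
    }

    public static void main(String[] args) {
        // Unacked messages prefetched by the closed consumer, queued for replay.
        NavigableSet<Long> replay = new TreeSet<>(Arrays.asList(3L, 4L));
        System.out.println(mustReplayFirst(replay, 10L));          // prints true
        System.out.println(mustReplayFirst(new TreeSet<>(), 10L)); // prints false
    }
}
```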

@Jason918
Contributor Author

/pulsarbot run-failure-checks

@Jason918 Jason918 force-pushed the fix-12885-KeyShared-RemoveFirstConsumer branch from 7884c0a to d61c17c Compare November 26, 2021 08:59
@codelipenghui
Contributor

@merlimat Please help review this PR.

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Nov 29, 2021
@Jason918
Contributor Author

/pulsarbot run-failure-checks

@Jason918 Jason918 closed this Nov 30, 2021
@Jason918 Jason918 reopened this Nov 30, 2021
@Jason918 Jason918 force-pushed the fix-12885-KeyShared-RemoveFirstConsumer branch from 5021128 to f7aa203 Compare November 30, 2021 03:44
@Jason918 Jason918 force-pushed the fix-12885-KeyShared-RemoveFirstConsumer branch from f7aa203 to ca97899 Compare November 30, 2021 06:46
@codelipenghui codelipenghui merged commit 73ef162 into apache:master Nov 30, 2021
zeo1995 pushed a commit to zeo1995/pulsar that referenced this pull request Dec 1, 2021
eolivelli pushed a commit that referenced this pull request Dec 15, 2021
@eolivelli eolivelli added cherry-picked/branch-2.9 Archived: 2.9 is end of life release/2.9.1 and removed release/2.9.2 labels Dec 15, 2021
fxbing pushed a commit to fxbing/pulsar that referenced this pull request Dec 19, 2021
zymap pushed a commit that referenced this pull request Dec 23, 2021
@zymap zymap added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Dec 23, 2021
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Feb 3, 2022
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.3 release/2.9.1 type/bug The PR fixed a bug or issue reported a bug
Development

Successfully merging this pull request may close these issues.

[Bug] Unordered consuming case in Key_Shared subscription
7 participants