Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] Producer is blocked on creation because backlog exceeded on topic, when dedup is enabled and no producer is there #20951

Merged
merged 2 commits into from Aug 24, 2023

Conversation

heesung-sn
Copy link
Contributor

@heesung-sn heesung-sn commented Aug 8, 2023

Motivation

As per backlog quota logic, Producer is blocked on creation when

  1. backlog exceeded on topic
  2. dedup is enabled
  3. no producer is there.

Especially, producer creation is indefinitely blocked when there brokerDeduplicationEnabled=false(default=false), which disables periodic dedup map snapshots every 2mins.

When checking backlog size, the logic computes the backlog size against the slowest consumer on the topic. Unfortunately, the dedup system consumer(for the dedup map recovery) is usually behind the latest position -- the broker takes deup recovery map snapshot every 1000 entry(by default, brokerDeduplicationEntriesInterval) or every 2 mins(by default) when brokerDeduplicationEnabled is enabled(default is false), so before the next snapshot cycle, the dedup system consumer's MarkDeletePos is usually behind, and this can hold backlog more than the quota. As a result, the dedup-enabled topic could show more backlog size and block produce creations, especially when there is no producer and the time-based dedup-map snapshot is disabled, although all of the actual consumers' MarkDeletePos are up-to-date.

Modifications

Add advanceSlowestMessageDeduplicationCursor in the backlogQuotaCheck failure handler to advance the Slowest MessageDeduplication Cursor when the backlog quota exceeds the threshold.

Verifying this change

# local repro steps
bin/pulsar-admin tenants create my-tenant
bin/pulsar-admin namespaces create my-tenant/my-namespace-3 --bundles 5
bin/pulsar-admin namespaces set-backlog-quota my-tenant/my-namespace-3 -l 1K -p producer_request_hold
bin/pulsar-admin namespaces set-deduplication my-tenant/my-namespace-3 --enable

# on another console
bin/pulsar-client consume -s sub-read persistent://my-tenant/my-namespace-3/my-topic -p Earliest -n 0 --hide-content

# on another console, try multiple times until hit
bin/pulsar-perf produce persistent://my-tenant/my-namespace-3/my-topic -r 100 -bm 1 -mk random -n 1 
  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:
Added the unit test to cover this logic change.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: heesung-sn#50

@github-actions
Copy link

github-actions bot commented Aug 8, 2023

@heesung-sn Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Aug 8, 2023
@heesung-sn heesung-sn force-pushed the fix-dedup-quota branch 3 times, most recently from cacf511 to 0261b82 Compare August 8, 2023 05:44
@@ -195,6 +195,10 @@ public class PersistentTopic extends AbstractTopic implements Topic, AddEntryCal
private final TopicName shadowSourceTopic;

static final String DEDUPLICATION_CURSOR_NAME = "pulsar.dedup";

public static boolean isDedupCursorName(String name) {
return name.equals(DEDUPLICATION_CURSOR_NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use DEDUPLICATION_CURSOR_NAME .equals(curosrName) is better, because the constant variable is never be null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Copy link
Contributor

@poorbarcode poorbarcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -406,6 +410,11 @@ private void takeSnapshot(Position position) {
if (log.isDebugEnabled()) {
log.debug("[{}] Taking snapshot of sequence ids map", topic.getName());
}

if (!snapshotTaking.compareAndSet(false, true)) {
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since one updating cursor metadata task has been skipped(maybe the skipped one has a larger position), shall we print a log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this snapshot can be called from other threads too. We better not overflow logs.

@poorbarcode poorbarcode added this to the 3.2.0 milestone Aug 8, 2023
@poorbarcode poorbarcode changed the title [broker][fix] Producer is blocked on creation because backlog exceeded on topic, when dedup is enabled and no producer is there [fix] [broker] Producer is blocked on creation because backlog exceeded on topic, when dedup is enabled and no producer is there Aug 8, 2023
@poorbarcode poorbarcode added type/bug The PR fixed a bug or issue reported a bug and removed type/bug The PR fixed a bug or issue reported a bug labels Aug 8, 2023
@codelipenghui codelipenghui added type/bug The PR fixed a bug or issue reported a bug area/broker labels Aug 9, 2023
@codelipenghui
Copy link
Contributor

@heesung-sn Please rebase to the master branch to resolve the CI issue.

@heesung-sn
Copy link
Contributor Author

@heesung-sn Please rebase to the master branch to resolve the CI issue.

Done

byte[] content = new byte[1024];
Producer<byte[]> producer = createProducer(client, topic1);

admin.topicPolicies().setDeduplicationStatus(topic1, dedupTestSet);
Thread.sleep((TIME_TO_CHECK_BACKLOG_QUOTA + 1) * 1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use admin to instead of sleep

assertFalse(gotException, "unable to publish due to " + sendException);

gotException = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need this line

@codelipenghui
Copy link
Contributor

@heesung-sn Is there anything blocking your committer process? I realized you still don't have permission for the pulsar repo.

@codelipenghui codelipenghui added the category/functionality Some functions are not working such as get errors label Aug 21, 2023
@heesung-sn heesung-sn merged commit 30073db into apache:master Aug 24, 2023
44 of 45 checks passed
@heesung-sn
Copy link
Contributor Author

@heesung-sn Is there anything blocking your committer process? I realized you still don't have permission for the pulsar repo.

I got the permission now. I merged this PR.

poorbarcode pushed a commit that referenced this pull request Aug 29, 2023
…ed on topic, when dedup is enabled and no producer is there (#20951)

(cherry picked from commit 30073db)
shibd pushed a commit to shibd/pulsar that referenced this pull request Oct 22, 2023
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
shibd pushed a commit to shibd/pulsar that referenced this pull request Oct 24, 2023
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
@heesung-sn heesung-sn deleted the fix-dedup-quota branch April 2, 2024 17:45
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 15, 2024
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
(cherry picked from commit f68589e)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 19, 2024
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
(cherry picked from commit f68589e)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 23, 2024
…ed on topic, when dedup is enabled and no producer is there (apache#20951)

(cherry picked from commit 30073db)
(cherry picked from commit f68589e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker category/functionality Some functions are not working such as get errors cherry-picked/branch-2.11 cherry-picked/branch-3.0 doc-not-needed Your PR changes do not impact docs ready-to-test release/2.10.7 release/2.11.3 release/3.0.2 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants