
[Broker] Replicator producer should have blockIfQueueFull=true #15724

Conversation

@lhotari (Member) commented on May 23, 2022

Motivation

In #15691, an exception is seen that isn't handled properly.

2022-05-19T18:35:09,916+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentReplicator - [persistent://tenant/namespace/perftest-partition-33][pulsar-cluster-src -> pulsar-cluster-dst] Error producing on remote broker
org.apache.pulsar.client.api.PulsarClientException$MemoryBufferIsFullError: Client memory buffer is full
	at org.apache.pulsar.client.impl.ProducerImpl.canEnqueueRequest(ProducerImpl.java:849) ~[com.datastax.oss-pulsar-client-original-2.10.0.3.jar:2.10.0.3]
	at org.apache.pulsar.client.impl.ProducerImpl.sendAsync(ProducerImpl.java:429) ~[com.datastax.oss-pulsar-client-original-2.10.0.3.jar:2.10.0.3]
	at org.apache.pulsar.broker.service.persistent.PersistentReplicator.lambda$readEntriesComplete$2(PersistentReplicator.java:369) ~[com.datastax.oss-pulsar-broker-2.10.0.3.jar:2.10.0.3]
	at java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108) ~[?:?]
	at org.apache.pulsar.broker.service.persistent.PersistentReplicator.readEntriesComplete(PersistentReplicator.java:367) ~[com.datastax.oss-pulsar-broker-2.10.0.3.jar:2.10.0.3]
	at org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:156) ~[com.datastax.oss-managed-ledger-2.10.0.3.jar:2.10.0.3]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [com.datastax.oss-managed-ledger-2.10.0.3.jar:2.10.0.3]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [com.datastax.oss-bookkeeper-common-4.14.5.1.0.0.jar:4.14.5.1.0.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.76.Final.jar:4.1.76.Final]
	at java.lang.Thread.run(Thread.java:829) [?:?]

The reason for this is the client memory limit controller, which is enabled by default. PR #15723 addresses that: it makes the limit configurable and disables it for replication clients by default. In addition, the replicator producer should use the blockIfQueueFull=true setting.

  • prevents exceptions when sending gets backpressured by the memory limit (or the queue-size limit); see the sketch below for how these settings surface in the client API
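
For context, here is a minimal, standalone sketch of how these settings surface in the Pulsar client API. The service URL, topic, and queue size are placeholders, and this is plain client code rather than the broker's replicator:

import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SizeUnit;

public class ReplicatorLikeProducerSketch {
    public static void main(String[] args) throws Exception {
        // Client-side memory limit controller: exhausting it is what raises
        // MemoryBufferIsFullError (made configurable for replication clients in #15723).
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")      // placeholder service URL
                .memoryLimit(64, SizeUnit.MEGA_BYTES)
                .build();

        // Producer configured along the lines of the replicator producer, plus the
        // blockIfQueueFull(true) setting proposed in this PR.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://tenant/namespace/perftest-partition-33")
                .enableBatching(false)
                .sendTimeout(0, TimeUnit.SECONDS)
                .maxPendingMessages(1000)                   // stand-in for producerQueueSize
                .blockIfQueueFull(true)                     // block instead of throwing
                .create();

        producer.close();
        client.close();
    }
}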

Modifications

  • set blockIfQueueFull=true for replicator producer

@lhotari added the type/bug, area/broker, area/geo-replication, and doc-not-needed labels on May 23, 2022
@lhotari added this to the 2.11.0 milestone on May 23, 2022
@lhotari self-assigned this on May 23, 2022
@@ -83,6 +83,7 @@ public AbstractReplicator(String topicName, String replicatorPrefix, String loca
                 .enableBatching(false)
                 .sendTimeout(0, TimeUnit.SECONDS) //
                 .maxPendingMessages(producerQueueSize) //
+                .blockIfQueueFull(true)
Contributor commented:

This would mean blocking the IO threads (or other shared threads) in the broker, causing a deadlock.
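
A minimal sketch of that failure mode, with a semaphore standing in for the producer's pending-message queue and a single-thread executor standing in for a shared broker thread (the class and variable names are hypothetical, not broker code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SharedThreadBlockingDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a broker IO / ordered-executor thread shared by many tasks.
        ExecutorService sharedThread = Executors.newSingleThreadExecutor();

        // Stand-in for the producer's pending-message queue (capacity 1).
        Semaphore queuePermits = new Semaphore(1);

        sharedThread.submit(() -> {
            try {
                queuePermits.acquire();                     // first "send" fills the queue
                System.out.println("first send enqueued");

                // The "ack" that would free a permit is scheduled on the SAME thread...
                sharedThread.submit(queuePermits::release);

                // ...but with blockIfQueueFull-style behaviour we block here first,
                // so the release task never runs: a self-deadlock.
                System.out.println("second send blocking for queue space...");
                queuePermits.acquire();
                System.out.println("second send enqueued (never reached)");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        sharedThread.awaitTermination(2, TimeUnit.SECONDS);
        System.out.println("still deadlocked; shutting down");
        sharedThread.shutdownNow();
    }
}

When the release runs on a different thread, the second acquire completes normally; the deadlock arises only because both are tied to the same shared thread.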

Contributor commented:

We need to find a better solution to have a broker-wide flow control mechanism for replicators that can be tied to the client memory limit.

lhotari (Member, author) commented:

Thanks for reviewing, @merlimat. Yes, blocking would be problematic in this case.

Would you mind also reviewing #15723? The memory limit is currently causing regressions in geo-replication in 2.10.
