Branch 2.5 All test #14

tuteng · 2020-02-23T04:29:05Z

(If this PR fixes a github issue, please add Fixes #<xyz>.)

Fixes #

(or if this PR is one task of a github issue, please add Master Issue: #<xyz> to link to the master issue.)

Master Issue: #

Motivation

Explain here the context, and why you're making that change. What is the problem you're trying to solve.

Modifications

Describe the modifications you've done.

Verifying this change

Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end deployment with large payloads (10MB)
Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API: (yes / no)
The schema: (yes / no / don't know)
The default values of configurations: (yes / no)
The wire protocol: (yes / no)
The rest endpoints: (yes / no)
The admin cli options: (yes / no)
Anything that affects deployment: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
If a feature is not applicable for documentation, explain why?
If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

…ation. (apache#5572) ### Motivation Currently, disable the topic auto creation will cause consumer create failed on a partitioned topic. Since the partitioned topic is already created, so we should handle the topic partition create when disable the topic auto creation. ### Modifications By default, create partitioned topics also try to create all partitions, and if create partitions failed, users can use `create-missed-partitions` to repair. If users already have a partitioned topic without created partitions, can also use `create-missed-partitions` to repair. (cherry picked from commit 602f1c2)

* Fixed static linking on C++ lib on MacOS * Use `-undefined dynamic_lookup` when linking on Mac to not include python's own runtime * Fixed searching for protobuf (cherry picked from commit 125a588)

…ip to avoid bad zk-version (apache#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ``` (cherry picked from commit 0a259ab)

### Motivation Expose bookkeeper expose explicit lac configuration in broker.conf It's related to apache#3828 apache#4976, some Pulsar SQL users need to enable the explicitLacInterval, so that they can get the last message in Pulsar SQL. (cherry picked from commit 4fd17d4)

…ache#5836) *Motivation* pulsar-client-kafka-compact depends on pulsar-client implementation hence it pulls in protobuf dependencies. This results in `class file for com.google.protobuf.GeneratedMessageV3 not found` errors when generating javadoc for those modules. *Modifications* Skip javadoc tasks for these modules. Because: - pulsar-client-kafka-compact is a kafka wrapper. Kafka already provides javadoc for this API. - we didn't publish the javadoc for this module. (cherry picked from commit 97f9431)

(cherry picked from commit d1d5cf7)

…pache#5915) * Allow to enable/disable delyed delivery for messages on namespace Signed-off-by: xiaolong.ran <rxl@apache.org> * add isDelayedDeliveryEnabled function Signed-off-by: xiaolong.ran <rxl@apache.org> * add delayed_delivery_time process logic Signed-off-by: xiaolong.ran <rxl@apache.org> * add test case Signed-off-by: xiaolong.ran <rxl@apache.org> * update admin cli docs Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * update import lib Signed-off-by: xiaolong.ran <rxl@apache.org> * avoid import * Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * remove unuse code Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * add test case for delayed delivery messages Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> * fix comments Signed-off-by: xiaolong.ran <rxl@apache.org> (cherry picked from commit f0d339e)

Fixes apache#5755 ### Motivation Fix negative un-ack messages in consumer stats while set maxUnackedMessagesPerConsumer=0 ### Verifying this change Added unit test (cherry picked from commit 9d94860)

### Motivation Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions. https://blog.godatadriven.com/apache-avro-1-9-release ### Modifications Avro 1.9 has some major changes: - The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631) - Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283) - Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema: ```json { "name": "fieldName", "type": [ "null", "string" ], "default": "defaultValue" } ``` The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior. (cherry picked from commit d6f240e)

…5942) ### Motivation Avoid using same OpAddEntry between different ledger handles. ### Modifications Add state for OpAddEntry, if op handled by new ledger handle, the op will set to CLOSED state, after the legacy callback happens will check the op state, only INITIATED can be processed. When ledger rollover happens, pendingAddEntries will be processed. when process pendingAddEntries, will create a new OpAddEntry by the old OpAddEntry to avoid different ledger handles use same OpAddEntry. (cherry picked from commit 7ec17b2)

…itioned topic (apache#5943) ### Motivation Currently, it is not possible to create a partitioned topic with the same name as an existing non-partitioned topic, but the reverse is possible. ``` $ ./bin/pulsar-admin topics create persistent://public/default/t1 $ ./bin/pulsar-admin topics create-partitioned-topic -p 2 persistent://public/default/t1 16:12:50.418 [AsyncHttpClient-5-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/t1/partitions] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 Conflict This topic already exists Reason: This topic already exists $ ./bin/pulsar-admin topics create-partitioned-topic -p 2 persistent://public/default/t2 $ ./bin/pulsar-admin topics create persistent://public/default/t2 $ ./bin/pulsar-admin topics list public/default "persistent://public/default/t2" "persistent://public/default/t1" $ ./bin/pulsar-admin topics list-partitioned-topics public/default "persistent://public/default/t2" ``` These non-partitioned topics are not available and should not be created. ### Modifications When creating a non-partitioned topic, "409 Conflict" error will be returned if a partitioned topic with the same name already exists. (cherry picked from commit 7fd3f70)

…oducer (apache#5988) * [pulsar-broker] Clean up closed producer to avoid publish-time for producer * fix test cases (cherry picked from commit 0bc54c5)

### Motivation Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master ### Verifying this change This is a test change (cherry picked from commit 275854e)

…apache#6051) --- Master Issue: apache#6046 *Motivation* Make people can use the timestamp to tell if acknowledge and consumption are happening. *Modifications* - Add lastConsumedTimestamp and lastAckedTimestamp to consume stats *Verify this change* - Pass the test `testConsumerStatsLastTimestamp` (cherry picked from commit 5728977)

(cherry picked from commit 56280ea)

(cherry picked from commit c90854a)

* PIP-55: Refresh Authentication Credentials * Fixed import order * Do not check for original client credential if it's not coming through proxy * Fixed import order * Fixed mocked test assumption * Addressed comments * Avoid to print NPE on auth refresh check if auth is disabled (cherry picked from commit 4af5223)

Motivation Message redelivery is not work well with zero queue consumer when using receive() or listeners to consume messages. This pull request is try to fix it. Modifications Add missed trackMessage() method call at zero queue size consumer. Verifying this change New unit tests added. (cherry picked from commit 787bee1)

### Motivation Currently, pulsar support delete inactive topic which has no active producers and no subscriptions. This pull request is support to delete inactive topics that all subscriptions of the topic are caught up and no active producers/consumer. ### Modifications Expose inactive topic delete mode in broker.conf, future more we can support namespace level configuration for the inactive topic delete mode. (cherry picked from commit dc7abd8)

…component (apache#6078) ### Motivation Some users may confuse by pulsar/bookie log without flushing immediately. ### Modifications Add a message in `bin/pulsar-daemon` when starting a component. (cherry picked from commit 4f461c3)

…to receive messages (apache#6090) Fix message redelivery for zero queue consumer while using async api to receive messages (cherry picked from commit d5fca06)

…ng (apache#6101) *Motivation* Related to apache#6084 apache#5400 introduces `customRuntimeOptions` in function details. But the description was wrong. The mistake was probably introduced by bad merges. *Modification* Fix the argument and description for `deadletterTopic` and `customRuntimeOptions`. (cherry picked from commit c6e258d)

) *Motivation* Fixes apache#5997 Fixes apache#6079 A regression was introduced in apache#5486. If websocket service as running as part of pulsar standalone, the cluster data is set with null service urls. This causes service url is not set correctly in the pulsar client and an illegal argument exception ("Param serviceUrl must not be blank.") will be thrown. *Modifications* 1. Pass `null` when constructing the websocket service. So the local cluster data can be refreshed when creating pulsar client. 2. Set the cluster data after both broker service and web service started and ports are allocated. (cherry picked from commit 49a9897)

### Motivation Available permits of ZeroQueueConsuemer must be 1 or less, however ZeroQueueConsuemer using listener may be greater than 1. ### Modifications If listener is processing message, ZeroQueueConsumer doesn't send permit when it reconnect to broker. ### Reproduction 1. ZeroQueueConsuemer using listener consume a topic. 2. Unload that topic( or restart a broker) when listener is processing message. 3. ZeroQueueConsumer sends permit when it reconnect to broker. https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L133 4. ZeroQueueConsumer also sends permit when finished processing message. https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L163 5. Available permits become 2. (cherry picked from commit c09314c)

`ManagedCursorImpl.asyncResetCursor` is used in three kinds of circumstances: - REST API: create a subscription with messageId. Per the document: Reset subscription to message position closest to given position. - REST API: reset subscription to a given position: Per the document: Reset subscription to message position closest to given position. - Consumer seek command. In all the cases above, when the user provides a MessageId, we should make the best effort to find the closest position, instead of throwing an InvalidCursorPosition Exception. This is because if a user provids an invalid position, it's not possible for he or she gets a valid position, since ledger ids for a given topic may not be continuous and only brokers are aware of the order. Therefore, we should avoid throw invalid cursor position but find the nearest position and do the reset stuff. (cherry picked from commit d2f37a7)

…pache#6122) ### Motivation In apache#2981, we have added support to grant subscriber-permission to manage subscription based apis. However, grant-subscription-permission api requires super-user access and it creates too much dependency on system-admin when many tenants want to grant subscription permission. So, allow each tenant to manage subscription permission in order to reduce administrative efforts for super user. (cherry picked from commit 254e54b)

…ng stuck (apache#6124) (cherry picked from commit d42cfa1)

when broker create the inside client, it sets tlsTrustCertsFilePath as "getTlsCertificateFilePath()", but it should be "getBrokerClientTrustCertsFilePath()" (cherry picked from commit 1fcccd6)

BatchReceivePolicy implements Serializable. (cherry picked from commit 792ab17)

netty 4.1.43 has a bug preventing it from using Linux native Epoll transport This results in pulsar brokers failing over to NioEventLoopGroup even when running on Linux. The bug is fixed in netty releases 4.1.45.Final (cherry picked from commit 760bd1a)

Motivation If set up maxMessagePublishBufferSizeInMB > Integer.MAX_VALUE / 1024 / 1024, the publish buffer limit does not take effect. The reason is maxMessagePublishBufferBytes always 0 when use following calculation method : pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024 * 1024; So, changed to pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024L * 1024L; (cherry picked from commit 75a321d)

…open instance (apache#6436) (cherry picked from commit 2ed2eb8)

Motivation fix the bug of authenticationData is't initialized. the method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect can't init the value of authenticationData. cause of the bug that you will get the null value form the method org.apache.pulsar.broker.authorization.AuthorizationProvider#canConsumeAsync when implements org.apache.pulsar.broker.authorization.AuthorizationProvider interface. Modifications init the value of authenticationData from the method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect. Verifying this change implements org.apache.pulsar.broker.authorization.AuthorizationProvider interface， and get the value of authenticationData. (cherry picked from commit b8f0ca0)

* Fixed the max backoff configuration for lookups * Fixed test expectation * More test fixes (cherry picked from commit 6ff87ee)

) Fixes apache#6453 ### Motivation `ConsumerBase` and `ProducerImpl` use `System.currentTimeMillis()` to measure the elapsed time in the 'operations' inner classes (`ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg`). An instance variable `createdAt` is initialized with `System.currentTimeMills()`, but it is not used for reading wall clock time, the variable is only used for computing elapsed time (e.g. timeout for a batch). When the variable is used to compute elapsed time, it would more sense to use `System.nanoTime()`. ### Modifications The instance variable `createdAt` in `ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg` is initialized with `System.nanoTime()`. Usage of the variable is updated to reflect that the variable holds nano time; computations of elapsed time takes the difference between the current system nano time and the `createdAt` variable. The `createdAt` field is package protected, and is currently only used in the declaring class and outer class, limiting the chances for unwanted side effects. (cherry picked from commit 459ec6e)

…one (apache#6457) When starting Pulsar in standalone mode with TLS enabled, it will fail to create two namespaces during start. This is because it's using the unencrypted URL/port while constructing the PulsarAdmin client. (cherry picked from commit 3e1b8f6)

…rpm (apache#6458) Fix apache#6439 We shouldn't static link libssl in libpulsar.a, as this is a security red flag. we should just use whatever the libssl the system provides. Because if there is a security problem in libssl, all the machines can just update their own libssl library without rebuilding libpulsar.a. As suggested, this change not change the old behavior, and mainly provides 2 other additional pulsar cpp client library in deb/rpm, and add related docs of how to use 4 libs in doc. The additional 2 libs: - pulsarSharedNossl (libpulsarnossl.so), similar to pulsarShared(libpulsar.so), with no ssl statically linked. - pulsarStaticWithDeps(libpulsarwithdeps.a), similar to pulsarStatic(libpulsar.a), and archived in the dependencies libraries of `libboost_regex`, `libboost_system`, `libcurl`, `libprotobuf`, `libzstd` and `libz` statically. Passed 4 libs rpm/deb build, install, and compile with a pulsar-client example code. * also add libpulsarwithdeps.a together with libpulsar.a into cpp client release * add documentation for libpulsarwithdeps.a, add g++ build examples * add pulsarSharedNossl target to build libpulsarnossl.so * update doc * verify 4 libs in rpm/deb build, installed, use all good (cherry picked from commit 33eea88)

…#6460) ### Motivation fix correct name for proxy thread executor name (cherry picked from commit 5c2c058)

### Motivation Proxy-logging fetches incorrect producerId for `Send` command because of that logging always gets producerId as 0 and it fetches invalid topic name for the logging. ### Modification Fixed topic logging by fetching correct producerId for `Send` command. (cherry picked from commit 65cc303)

…me. (apache#6478) Fixes apache#6468 Fix create a partitioned topic with a substring of an existing topic name. And make create partitioned topic async. (cherry picked from commit 19ccfd5)

…empty (apache#6480) (cherry picked from commit 6604f54)

(cherry picked from commit 47ca8e6)

Fixes apache#6482 ### Motivation Prevent topic compaction from leaking direct memory ### Modifications Several leaks were discovered using Netty leak detection and code review. * `CompactedTopicImpl.readOneMessageId` would get an `Enumeration` of `LedgerEntry`, but did not release the underlying buffers. Fix: iterate though the `Enumeration` and release underlying buffer. Instead of logging the case where the `Enumeration` did not contain any elements, complete the future exceptionally with the message (will be logged by Caffeine). * Two main sources of leak in `TwoPhaseCompactor`. The `RawBacthConverter.rebatchMessage` method failed to close/release a `ByteBuf` (uncompressedPayload). Also, the return ByteBuf of `RawBacthConverter.rebatchMessage` was not closed. The first one was easy to fix (release buffer), to fix the second one and make the code easier to read, I decided to not let `RawBacthConverter.rebatchMessage` close the message read from the topic, instead the message read from the topic can be closed in a try/finally clause surrounding most of the method body handing a message from a topic (in phase two loop). Then if a new message was produced by `RawBacthConverter.rebatchMessage` we check that after we have added the message to the compact ledger and release the message. ### Verifying this change Modified `RawReaderTest.testBatchingRebatch` to show new contract. One can run the test described to reproduce the issue, to verify no leak is detected. (cherry picked from commit f2ec1b4)

### Motivation Currently, the proxy only works to proxy v1/v2 functions routes to the function worker. ### Modifications This changes this code to proxy all routes for the function worker when those routes match. At the moment this is still a static list of prefixes, but in the future it may be possible to have this list of prefixes be dynamically fetched from the REST routes. ### Verifying this change - added some tests to ensure the routing works as expected (cherry picked from commit 329e231)

(cherry picked from commit ad5415a)

See apache#6416. This change ensures that all futures within BrokerService have a guranteed timeout. As stated in apache#6416, we see cases where it appears that loading or creating a topic fails to resolve the future for unknown reasons. It appears that these futures *may* not be returning. This seems like a sane change to make to ensure that these futures finish, however, it still isn't understood under what conditions these futures may not be returning, so this fix is mostly a workaround for some underlying issues Co-authored-by: Addison Higham <ahigham@instructure.com> (cherry picked from commit 4a4cce9)

…er When Ack Messages . (apache#6498) ### Motivation Because of apache#6391 , acked messages were counted as unacked messages. Although messages from brokers were acknowledged, the following log was output. ``` 2020-03-06 19:44:51.790 INFO ConsumerImpl:174 | [persistent://public/default/t1, sub1, 0] Created consumer on broker [127.0.0.1:58860 -> 127.0.0.1:6650] my-message-0: Fri Mar 6 19:45:05 2020 my-message-1: Fri Mar 6 19:45:05 2020 my-message-2: Fri Mar 6 19:45:05 2020 2020-03-06 19:45:15.818 INFO UnAckedMessageTrackerEnabled:53 | [persistent://public/default/t1, sub1, 0] : 3 Messages were not acked within 10000 time ``` This behavior happened on master branch. (cherry picked from commit 67f8cf3)

…er. (apache#6499) ### Motivation If the broker service is started, the client can connect to the broker and send requests depends on the namespace service, so we should create the namespace service before starting the broker. Otherwise, NPE occurs. ![image](https://user-images.githubusercontent.com/12592133/76090515-a9961400-5ff6-11ea-9077-cb8e79fa27c0.png) ![image](https://user-images.githubusercontent.com/12592133/76099838-b15db480-6006-11ea-8f39-31d820563c88.png) ### Modifications Move the namespace service creation and the schema registry service creation before start broker service. (cherry picked from commit 5285c68)

…access for topic (apache#6504) Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com> (cherry picked from commit 36ea153)

Fix apache#6462 ### Motivation admin api add getLastMessageId return batchIndex (cherry picked from commit 757824f)

apache#6550) ### Motivation Disable channel auto-read when publishing rate or publish buffer exceeded. Currently, ServerCnx set channel auto-read to false when getting a new message and publish rate exceeded or publish buffer exceeded. So, it depends on reading more one message. If there are too many ServerCnx(too many topics or clients), this will result in publish rate limitations with a large deviation. Here is an example to show the problem. Enable publish rate limit in broker.conf ``` brokerPublisherThrottlingTickTimeMillis=1 brokerPublisherThrottlingMaxByteRate=10000000 ``` Use Pulsar perf to test 100 partition message publishing: ``` bin/pulsar-perf produce -s 500000 -r 100000 -t 1 100p ``` The test result: ``` 10:45:28.844 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 367.8 msg/s --- 1402.9 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 710.008 ms - med: 256.969 - 95pct: 2461.439 - 99pct: 3460.255 - 99.9pct: 4755.007 - 99.99pct: 4755.007 - Max: 4755.007 10:45:38.919 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 456.6 msg/s --- 1741.9 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 2551.341 ms - med: 2347.599 - 95pct: 6852.639 - 99pct: 9630.015 - 99.9pct: 10824.319 - 99.99pct: 10824.319 - Max: 10824.319 10:45:48.959 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 432.0 msg/s --- 1648.0 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 4373.505 ms - med: 3972.047 - 95pct: 11754.687 - 99pct: 15713.663 - 99.9pct: 17638.527 - 99.99pct: 17705.727 - Max: 17705.727 10:45:58.996 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 430.6 msg/s --- 1642.6 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 5993.563 ms - med: 4291.071 - 95pct: 18022.527 - 99pct: 21649.663 - 99.9pct: 24885.375 - 99.99pct: 25335.551 - Max: 25335.551 10:46:09.195 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 403.2 msg/s --- 1538.3 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 7883.304 ms - med: 6184.159 - 95pct: 23625.343 - 99pct: 29524.991 - 99.9pct: 30813.823 - 99.99pct: 31467.775 - Max: 31467.775 10:46:19.314 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 401.1 msg/s --- 1530.1 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 9587.407 ms - med: 6907.007 - 95pct: 28524.927 - 99pct: 34815.999 - 99.9pct: 36759.551 - 99.99pct: 37581.567 - Max: 37581.567 10:46:29.389 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 372.8 msg/s --- 1422.0 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 11984.595 ms - med: 10095.231 - 95pct: 34515.967 - 99pct: 40754.175 - 99.9pct: 43553.535 - 99.99pct: 43603.199 - Max: 43603.199 10:46:39.459 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 374.6 msg/s --- 1429.1 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 12208.459 ms - med: 7807.455 - 95pct: 38799.871 - 99pct: 46936.575 - 99.9pct: 50500.095 - 99.99pct: 50500.095 - Max: 50500.095 10:46:49.537 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 295.6 msg/s --- 1127.5 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 14503.565 ms - med: 10753.087 - 95pct: 45041.407 - 99pct: 54307.327 - 99.9pct: 57786.623 - 99.99pct: 57786.623 - Max: 57786.623 ``` Analyze the reasons for such a large deviation is the producer sent batch messages and ServerCnx read more one message. This PR can not completely solve the problem but can alleviate this problem. When the message publish rate exceeded, the broker set channel auto-read to false for all topics. This will avoid parts of ServerCnx read more one message. ### Does this pull request potentially affect one of the following parts: *If `yes` was chosen, please highlight the changes* - Dependencies (does it add or upgrade a dependency): (no) - The public API: (no) - The schema: (no) - The default values of configurations: (no) - The wire protocol: (no) - The rest endpoints: (no) - The admin cli options: (no) - Anything that affects deployment: (no) ### Documentation - Does this pull request introduce a new feature? (no) (cherry picked from commit ec31d54)

…over subscription mode. (apache#6558) Fixes apache#6552 ### Motivation apache#6552 is introduced by apache#5929, so this PR stop increase unacked messages for the consumer with Exclusive/Failover subscription mode. (cherry picked from commit 2449696)

* Fix: topic with one partition cannot be updated (cherry picked from commit 9602c9b)

### Motivation Fixes apache#6561 ### Modifications Initialize `BatchMessageAckerDisabled` with a `new BitSet()` Object. (cherry picked from commit 2007de6)

github-actions · 2022-09-23T04:03:00Z

The pr had no activity for 30 days, mark with Stale label.

sijie and others added 2 commits January 6, 2020 10:18

Release 2.5.0

f2afad3

tuteng force-pushed the master branch from deafc24 to f0880f2 Compare March 17, 2020 05:04

merlimat and others added 27 commits March 21, 2020 11:04

Fixed static linking on C++ lib on MacOS (apache#5581)

2efd47d

* Fixed static linking on C++ lib on MacOS * Use `-undefined dynamic_lookup` when linking on Mac to not include python's own runtime * Fixed searching for protobuf (cherry picked from commit 125a588)

add_backlogSize_in_topicStat (apache#5914)

33b2e24

(cherry picked from commit d1d5cf7)

Fix negative un-ack messages in consumer stats (apache#5929)

63a66d0

Fixes apache#5755 ### Motivation Fix negative un-ack messages in consumer stats while set maxUnackedMessagesPerConsumer=0 ### Verifying this change Added unit test (cherry picked from commit 9d94860)

[pulsar-broker] Clean up closed producer to avoid publish-time for pr…

6be3149

…oducer (apache#5988) * [pulsar-broker] Clean up closed producer to avoid publish-time for producer * fix test cases (cherry picked from commit 0bc54c5)

Fix unit test (apache#6006)

f3ca73f

### Motivation Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master ### Verifying this change This is a test change (cherry picked from commit 275854e)

Fix issue 5505 (apache#6060)

ffa0b04

(cherry picked from commit 56280ea)

make acker transient (apache#6064)

4fbdc95

(cherry picked from commit c90854a)

Fix message redelivery for zero queue consumer while using async api …

693b59d

…to receive messages (apache#6090) Fix message redelivery for zero queue consumer while using async api to receive messages (cherry picked from commit d5fca06)

Add timeout to search for web service URLs to avoid web threads getti…

ea134ff

…ng stuck (apache#6124) (cherry picked from commit d42cfa1)

Fix broker client tls settings error (apache#6128)

7c13b40

when broker create the inside client, it sets tlsTrustCertsFilePath as "getTlsCertificateFilePath()", but it should be "getBrokerClientTrustCertsFilePath()" (cherry picked from commit 1fcccd6)

codelipenghui and others added 27 commits March 21, 2020 13:16

Update BatchReceivePolicy.java (apache#6423)

52d5d7c

BatchReceivePolicy implements Serializable. (cherry picked from commit 792ab17)

[Flink-Connector]Get PulsarClient from cache should always return an …

22fbdc1

…open instance (apache#6436) (cherry picked from commit 2ed2eb8)

Fixed the max backoff configuration for lookups (apache#6444)

56c7079

* Fixed the max backoff configuration for lookups * Fixed test expectation * More test fixes (cherry picked from commit 6ff87ee)

pulsar-proxy: fix correct name for proxy thread executor name (apache…

85257b5

…#6460) ### Motivation fix correct name for proxy thread executor name (cherry picked from commit 5c2c058)

Fix create partitioned topic with a substring of an existing topic na…

b5322bc

…me. (apache#6478) Fixes apache#6468 Fix create a partitioned topic with a substring of an existing topic name. And make create partitioned topic async. (cherry picked from commit 19ccfd5)

Avoid calling ConsumerImpl::redeliverMessages() when message list is …

c4902d6

…empty (apache#6480) (cherry picked from commit 6604f54)

Fix some async method problems at PersistentTopicsBase. (apache#6483)

1e1dd06

(cherry picked from commit 47ca8e6)

[pulsar-client] fix deadlock on send failure (apache#6488)

e05b786

(cherry picked from commit ad5415a)

Instead of always using admin access for topic, use read/write/admin …

394791d

…access for topic (apache#6504) Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com> (cherry picked from commit 36ea153)

Fix admin getLastMessageId return batchIndex (apache#6511)

b1c9c2f

Fix apache#6462 ### Motivation admin api add getLastMessageId return batchIndex (cherry picked from commit 757824f)

Fix: topic with one partition cannot be updated (apache#6560)

e7459b4

* Fix: topic with one partition cannot be updated (cherry picked from commit 9602c9b)

Fix NPE while call getLastMessageId. (apache#6562)

58e52e0

### Motivation Fixes apache#6561 ### Modifications Initialize `BatchMessageAckerDisabled` with a `new BitSet()` Object. (cherry picked from commit 2007de6)

Fixed protobuf-shaded version 2.5.0

68e5b79

tuteng force-pushed the branch-2.5 branch from 2b5cc0d to 68e5b79 Compare March 21, 2020 05:53

github-actions bot added the Stale label Sep 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Branch 2.5 All test #14

Branch 2.5 All test #14

tuteng commented Feb 23, 2020

github-actions bot commented Sep 23, 2022

Branch 2.5 All test #14

Are you sure you want to change the base?

Branch 2.5 All test #14

Conversation

tuteng commented Feb 23, 2020

Motivation

Modifications

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

github-actions bot commented Sep 23, 2022