Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branch 2.5 All test #14

Open
wants to merge 110 commits into
base: master
Choose a base branch
from
Open

Branch 2.5 All test #14

wants to merge 110 commits into from

Conversation

tuteng
Copy link

@tuteng tuteng commented Feb 23, 2020

(If this PR fixes a github issue, please add Fixes #<xyz>.)

Fixes #

(or if this PR is one task of a github issue, please add Master Issue: #<xyz> to link to the master issue.)

Master Issue: #

Motivation

Explain here the context, and why you're making that change. What is the problem you're trying to solve.

Modifications

Describe the modifications you've done.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

sijie and others added 2 commits January 6, 2020 10:18
…ation. (apache#5572)

### Motivation

Currently, disable the topic auto creation will cause consumer create failed on a partitioned topic. Since the partitioned topic is already created, so we should handle the topic partition create when disable the topic auto creation.

### Modifications

By default, create partitioned topics also try to create all partitions, and if create partitions failed, users can use `create-missed-partitions` to repair.

If users already have a partitioned topic without created partitions, can also use `create-missed-partitions` to repair.
(cherry picked from commit 602f1c2)
merlimat and others added 27 commits March 21, 2020 11:04
* Fixed static linking on C++ lib on MacOS

* Use `-undefined dynamic_lookup` when linking on Mac to not include python's own runtime

* Fixed searching for protobuf

(cherry picked from commit 125a588)
…ip to avoid bad zk-version (apache#5599)

### Motivation

We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException`

right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk.

```
01:01:13.452 [shutdown-thread-57-1] INFO  org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000
:
01:01:13.653 [shutdown-thread-57-1] INFO  org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic
:
01:02:13.677 [shutdown-thread-57-1] INFO  org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms
:
01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout
01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO  org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK
:
01:02:14.404 [shutdown-thread-57-1] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger
```

### Modification

This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves
```

01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed
org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
        at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo]
        at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final]
        at java.lang.Thread.run(Thread.java:834) [?:?]
```

(cherry picked from commit 0a259ab)
### Motivation

Expose bookkeeper expose explicit lac configuration in broker.conf
It's related to apache#3828 apache#4976, some Pulsar SQL users need to enable the explicitLacInterval, so that they can get the last message in Pulsar SQL.

(cherry picked from commit 4fd17d4)
…ache#5836)

*Motivation*

pulsar-client-kafka-compact depends on pulsar-client implementation hence it pulls in
protobuf dependencies. This results in `class file for com.google.protobuf.GeneratedMessageV3 not found`
errors when generating javadoc for those modules.

*Modifications*

Skip javadoc tasks for these modules. Because:

- pulsar-client-kafka-compact is a kafka wrapper. Kafka already provides javadoc for this API.
- we didn't publish the javadoc for this module.

(cherry picked from commit 97f9431)
…pache#5915)

* Allow to enable/disable delyed delivery for messages on namespace

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add isDelayedDeliveryEnabled function

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add delayed_delivery_time process logic

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add test case

Signed-off-by: xiaolong.ran <rxl@apache.org>

* update admin cli docs

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* update import lib

Signed-off-by: xiaolong.ran <rxl@apache.org>

* avoid import *

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* remove unuse code

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* add test case for delayed delivery messages

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>

* fix comments

Signed-off-by: xiaolong.ran <rxl@apache.org>
(cherry picked from commit f0d339e)
Fixes apache#5755

### Motivation

Fix negative un-ack messages in consumer stats while set maxUnackedMessagesPerConsumer=0

### Verifying this change

Added unit test

(cherry picked from commit 9d94860)
### Motivation

Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release

### Modifications

Avro 1.9 has some major changes:

- The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631)
- Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283)
- Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema:
```json
{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}
```
The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field as above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.

(cherry picked from commit d6f240e)
…5942)

### Motivation

Avoid using same OpAddEntry between different ledger handles.

### Modifications

Add state for OpAddEntry, if op handled by new ledger handle, the op will set to CLOSED state, after the legacy callback happens will check the op state, only INITIATED can be processed.

When ledger rollover happens, pendingAddEntries will be processed. when process pendingAddEntries, will create a new OpAddEntry by the old OpAddEntry to avoid different ledger handles use same OpAddEntry.

(cherry picked from commit 7ec17b2)
…itioned topic (apache#5943)

### Motivation


Currently, it is not possible to create a partitioned topic with the same name as an existing non-partitioned topic, but the reverse is possible.

```
$ ./bin/pulsar-admin topics create persistent://public/default/t1
$ ./bin/pulsar-admin topics create-partitioned-topic -p 2 persistent://public/default/t1

16:12:50.418 [AsyncHttpClient-5-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/t1/partitions] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 Conflict
This topic already exists

Reason: This topic already exists

$ ./bin/pulsar-admin topics create-partitioned-topic -p 2 persistent://public/default/t2
$ ./bin/pulsar-admin topics create persistent://public/default/t2
$ ./bin/pulsar-admin topics list public/default

"persistent://public/default/t2"
"persistent://public/default/t1"

$ ./bin/pulsar-admin topics list-partitioned-topics public/default

"persistent://public/default/t2"
```

These non-partitioned topics are not available and should not be created.

### Modifications

When creating a non-partitioned topic, "409 Conflict" error will be returned if a partitioned topic with the same name already exists.

(cherry picked from commit 7fd3f70)
…oducer (apache#5988)

* [pulsar-broker] Clean up closed producer to avoid publish-time  for producer

* fix test cases

(cherry picked from commit 0bc54c5)
### Motivation

Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master

### Verifying this change

This is a test change

(cherry picked from commit 275854e)
…apache#6051)

---

Master Issue: apache#6046

*Motivation*

Make people can use the timestamp to tell if acknowledge and consumption
are happening.

*Modifications*

- Add lastConsumedTimestamp and lastAckedTimestamp to consume stats

*Verify this change*

- Pass the test `testConsumerStatsLastTimestamp`

(cherry picked from commit 5728977)
(cherry picked from commit 56280ea)
(cherry picked from commit c90854a)
* PIP-55: Refresh Authentication Credentials

* Fixed import order

* Do not check for original client credential if it's not coming through proxy

* Fixed import order

* Fixed mocked test assumption

* Addressed comments

* Avoid to print NPE on auth refresh check if auth is disabled

(cherry picked from commit 4af5223)
Motivation
Message redelivery is not work well with zero queue consumer when using receive() or listeners to consume messages. This pull request is try to fix it.

Modifications
Add missed trackMessage() method call at zero queue size consumer.

Verifying this change
New unit tests added.

(cherry picked from commit 787bee1)
### Motivation

Currently, pulsar support delete inactive topic which has no active producers and no subscriptions. This pull request is support to delete inactive topics that all subscriptions of the topic are caught up and no active producers/consumer. 

### Modifications

Expose inactive topic delete mode in broker.conf, future more we can support namespace level configuration for the inactive topic delete mode.

(cherry picked from commit dc7abd8)
…component (apache#6078)

### Motivation

Some users may confuse by pulsar/bookie log without flushing immediately.

### Modifications

Add a message in `bin/pulsar-daemon` when starting a component.

(cherry picked from commit 4f461c3)
…to receive messages (apache#6090)

Fix message redelivery for zero queue consumer while using async api to receive messages

(cherry picked from commit d5fca06)
…ng (apache#6101)

*Motivation*

Related to apache#6084

 apache#5400 introduces `customRuntimeOptions` in function details. But the description was wrong. The mistake was probably introduced by bad merges.

*Modification*

Fix the argument and description for `deadletterTopic` and `customRuntimeOptions`.

(cherry picked from commit c6e258d)
)

*Motivation*

Fixes apache#5997
Fixes apache#6079

A regression was introduced in apache#5486. If websocket service as running as part of
pulsar standalone, the cluster data is set with null service urls. This causes
service url is not set correctly in the pulsar client and an illegal argument exception
("Param serviceUrl must not be blank.") will be thrown.

*Modifications*

1. Pass `null` when constructing the websocket service. So the local cluster data can
   be refreshed when creating pulsar client.
2. Set the cluster data after both broker service and web service started and ports are allocated.

(cherry picked from commit 49a9897)
### Motivation

Available permits of ZeroQueueConsuemer must be 1 or less, however ZeroQueueConsuemer using listener may be greater than 1.


### Modifications

If listener is processing message, ZeroQueueConsumer doesn't send permit when it reconnect to broker.


### Reproduction
1. ZeroQueueConsuemer using listener consume a topic.

2. Unload that topic( or restart a broker) when listener is processing message.

3.  ZeroQueueConsumer sends permit when it reconnect to broker.
https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L133

4. ZeroQueueConsumer also sends permit when finished processing message.
https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L163

5. Available permits become 2.

(cherry picked from commit c09314c)
`ManagedCursorImpl.asyncResetCursor` is used in three kinds of circumstances:
- REST API: create a subscription with messageId. Per the document: Reset subscription to message position closest to given position.
- REST API: reset subscription to a given position: Per the document: Reset subscription to message position closest to given position.
- Consumer seek command.

In all the cases above, when the user provides a MessageId, we should make the best effort to find the closest position, instead of throwing an InvalidCursorPosition Exception. 

This is because if a user provids an invalid position, it's not possible for he or she gets a valid position, since ledger ids for a given topic may not be continuous and only brokers are aware of the order. Therefore, we should avoid throw invalid cursor position but find the nearest position and do the reset stuff.

(cherry picked from commit d2f37a7)
…pache#6122)

### Motivation
In apache#2981, we have added support to grant subscriber-permission to manage subscription based apis. However, grant-subscription-permission api requires super-user access and it creates too much dependency on system-admin when many tenants want to grant subscription permission.
So, allow each tenant to manage subscription permission in order to reduce administrative efforts for super user. 

(cherry picked from commit 254e54b)
when broker create the inside client, it sets tlsTrustCertsFilePath as "getTlsCertificateFilePath()", but it should be "getBrokerClientTrustCertsFilePath()"

(cherry picked from commit 1fcccd6)
codelipenghui and others added 27 commits March 21, 2020 13:16
BatchReceivePolicy implements Serializable.

(cherry picked from commit 792ab17)
netty 4.1.43 has a bug preventing it from using Linux native Epoll transport

This results in pulsar brokers failing over to NioEventLoopGroup even when running on Linux.

The bug is fixed in netty releases 4.1.45.Final

(cherry picked from commit 760bd1a)
Motivation
If set up maxMessagePublishBufferSizeInMB > Integer.MAX_VALUE / 1024 / 1024, the publish buffer limit does not take effect. The reason is maxMessagePublishBufferBytes always 0 when use following calculation method :

pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024 * 1024;
So, changed to

pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024L * 1024L;

(cherry picked from commit 75a321d)
Motivation
fix the bug of authenticationData is't initialized.

the method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect can't init the value of authenticationData.
cause of the bug that you will get the null value form the method org.apache.pulsar.broker.authorization.AuthorizationProvider#canConsumeAsync
when implements org.apache.pulsar.broker.authorization.AuthorizationProvider interface.

Modifications
init the value of authenticationData from the method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect.

Verifying this change
implements org.apache.pulsar.broker.authorization.AuthorizationProvider interface, and get the value of authenticationData.

(cherry picked from commit b8f0ca0)
* Fixed the max backoff configuration for lookups

* Fixed test expectation

* More test fixes

(cherry picked from commit 6ff87ee)
)

Fixes apache#6453 

### Motivation
`ConsumerBase` and `ProducerImpl` use `System.currentTimeMillis()` to measure the elapsed time in the 'operations' inner classes (`ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg`).

An instance variable `createdAt` is initialized with `System.currentTimeMills()`, but it is not used for reading wall clock time, the variable is only used for computing elapsed time (e.g. timeout for a batch).

When the variable is used to compute elapsed time, it would more sense to use `System.nanoTime()`.

### Modifications

The instance variable `createdAt` in `ConsumerBase$OpBatchReceive` and  `ProducerImpl$OpSendMsg` is initialized with `System.nanoTime()`. Usage of the variable is updated to reflect that the variable holds nano time; computations of elapsed time takes the difference between the current system nano time and the `createdAt` variable.

The `createdAt` field is package protected, and is currently only used in the declaring class and outer class, limiting the chances for unwanted side effects.

(cherry picked from commit 459ec6e)
…one (apache#6457)

When starting Pulsar in standalone mode with TLS enabled, it will fail to create two namespaces during start. 

This is because it's using the unencrypted URL/port while constructing the PulsarAdmin client. 

(cherry picked from commit 3e1b8f6)
…rpm (apache#6458)

Fix apache#6439 
We shouldn't static link libssl in libpulsar.a, as this is a security red flag. we should just use whatever the libssl the system provides. Because if there is a security problem in libssl, all the machines can just update their own libssl library without rebuilding libpulsar.a.
As suggested, this change not change the old behavior, and mainly provides 2 other additional pulsar cpp client library in deb/rpm, and add related docs of how to use 4 libs in doc.
The additional 2 libs: 
- pulsarSharedNossl (libpulsarnossl.so), similar to pulsarShared(libpulsar.so), with no ssl statically linked.
- pulsarStaticWithDeps(libpulsarwithdeps.a), similar to pulsarStatic(libpulsar.a), and archived in the dependencies libraries of `libboost_regex`,  `libboost_system`, `libcurl`, `libprotobuf`, `libzstd` and `libz` statically.

Passed 4 libs rpm/deb build, install, and compile with a pulsar-client example code.

* also add libpulsarwithdeps.a together with libpulsar.a into cpp client release

* add documentation for libpulsarwithdeps.a, add g++ build examples

* add pulsarSharedNossl target to build libpulsarnossl.so

* update doc

* verify 4 libs in rpm/deb build, installed, use all good

(cherry picked from commit 33eea88)
…#6460)

### Motivation
fix correct name for proxy thread executor name

(cherry picked from commit 5c2c058)
### Motivation
Proxy-logging fetches incorrect producerId for `Send` command because of that logging always gets producerId as 0 and it fetches invalid topic name for the logging.

### Modification
Fixed topic logging by fetching correct producerId for `Send` command.

(cherry picked from commit 65cc303)
…me. (apache#6478)

Fixes apache#6468

Fix create a partitioned topic with a substring of an existing topic name. And make create partitioned topic async.

(cherry picked from commit 19ccfd5)
Fixes apache#6482

### Motivation
Prevent topic compaction from leaking direct memory

### Modifications

Several leaks were discovered using Netty leak detection and code review.
* `CompactedTopicImpl.readOneMessageId` would get an `Enumeration` of `LedgerEntry`, but did not release the underlying buffers. Fix: iterate though the `Enumeration` and release underlying buffer. Instead of logging the case where the `Enumeration` did not contain any elements, complete the future exceptionally with the message (will be logged by Caffeine).
* Two main sources of leak in `TwoPhaseCompactor`. The `RawBacthConverter.rebatchMessage` method failed to close/release a `ByteBuf` (uncompressedPayload). Also, the return ByteBuf of `RawBacthConverter.rebatchMessage` was not closed. The first one was easy to fix (release buffer), to fix the second one and make the code easier to read, I decided to not let `RawBacthConverter.rebatchMessage`  close the message read from the topic, instead the message read from the topic can be closed in a try/finally clause surrounding most of the method body handing a message from a topic (in phase two loop). Then if a new message was produced by `RawBacthConverter.rebatchMessage` we check that after we have added the message to the compact ledger and release the message.

### Verifying this change
Modified `RawReaderTest.testBatchingRebatch` to show new contract.

One can run the test described to reproduce the issue, to verify no leak is detected.

(cherry picked from commit f2ec1b4)
### Motivation


Currently, the proxy only works to proxy v1/v2 functions routes to the
function worker.

### Modifications

This changes this code to proxy all routes for the function worker when
those routes match. At the moment this is still a static list of
prefixes, but in the future it may be possible to have this list of
prefixes be dynamically fetched from the REST routes.

### Verifying this change
- added some tests to ensure the routing works as expected

(cherry picked from commit 329e231)
See apache#6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in apache#6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues

Co-authored-by: Addison Higham <ahigham@instructure.com>
(cherry picked from commit 4a4cce9)
…er When Ack Messages . (apache#6498)

### Motivation
Because of apache#6391 , acked messages were counted as unacked messages. 
Although messages from brokers were acknowledged, the following log was output.

```
2020-03-06 19:44:51.790 INFO  ConsumerImpl:174 | [persistent://public/default/t1, sub1, 0] Created consumer on broker [127.0.0.1:58860 -> 127.0.0.1:6650]
my-message-0: Fri Mar  6 19:45:05 2020
my-message-1: Fri Mar  6 19:45:05 2020
my-message-2: Fri Mar  6 19:45:05 2020
2020-03-06 19:45:15.818 INFO  UnAckedMessageTrackerEnabled:53 | [persistent://public/default/t1, sub1, 0] : 3 Messages were not acked within 10000 time

```

This behavior happened on master branch.

(cherry picked from commit 67f8cf3)
…er. (apache#6499)

### Motivation

If the broker service is started, the client can connect to the broker and send requests depends on the namespace service, so we should create the namespace service before starting the broker. Otherwise, NPE occurs.

![image](https://user-images.githubusercontent.com/12592133/76090515-a9961400-5ff6-11ea-9077-cb8e79fa27c0.png)

![image](https://user-images.githubusercontent.com/12592133/76099838-b15db480-6006-11ea-8f39-31d820563c88.png)


### Modifications

Move the namespace service creation and the schema registry service creation before start broker service.

(cherry picked from commit 5285c68)
…access for topic (apache#6504)

Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com>
(cherry picked from commit 36ea153)
Fix apache#6462 
### Motivation
admin api add getLastMessageId return batchIndex

(cherry picked from commit 757824f)
apache#6550)

### Motivation

Disable channel auto-read when publishing rate or publish buffer exceeded. Currently, ServerCnx set channel auto-read to false when getting a new message and publish rate exceeded or publish buffer exceeded. So, it depends on reading more one message. If there are too many ServerCnx(too many topics or clients), this will result in publish rate limitations with a large deviation. Here is an example to show the problem.

Enable publish rate limit in broker.conf
```
brokerPublisherThrottlingTickTimeMillis=1
brokerPublisherThrottlingMaxByteRate=10000000
```

Use Pulsar perf to test 100 partition message publishing:
```
bin/pulsar-perf produce -s 500000 -r 100000 -t 1 100p
```

The test result:
```
10:45:28.844 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    367.8  msg/s ---   1402.9 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 710.008 ms - med: 256.969 - 95pct: 2461.439 - 99pct: 3460.255 - 99.9pct: 4755.007 - 99.99pct: 4755.007 - Max: 4755.007
10:45:38.919 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    456.6  msg/s ---   1741.9 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 2551.341 ms - med: 2347.599 - 95pct: 6852.639 - 99pct: 9630.015 - 99.9pct: 10824.319 - 99.99pct: 10824.319 - Max: 10824.319
10:45:48.959 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    432.0  msg/s ---   1648.0 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 4373.505 ms - med: 3972.047 - 95pct: 11754.687 - 99pct: 15713.663 - 99.9pct: 17638.527 - 99.99pct: 17705.727 - Max: 17705.727
10:45:58.996 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    430.6  msg/s ---   1642.6 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 5993.563 ms - med: 4291.071 - 95pct: 18022.527 - 99pct: 21649.663 - 99.9pct: 24885.375 - 99.99pct: 25335.551 - Max: 25335.551
10:46:09.195 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    403.2  msg/s ---   1538.3 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 7883.304 ms - med: 6184.159 - 95pct: 23625.343 - 99pct: 29524.991 - 99.9pct: 30813.823 - 99.99pct: 31467.775 - Max: 31467.775
10:46:19.314 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    401.1  msg/s ---   1530.1 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 9587.407 ms - med: 6907.007 - 95pct: 28524.927 - 99pct: 34815.999 - 99.9pct: 36759.551 - 99.99pct: 37581.567 - Max: 37581.567
10:46:29.389 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    372.8  msg/s ---   1422.0 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 11984.595 ms - med: 10095.231 - 95pct: 34515.967 - 99pct: 40754.175 - 99.9pct: 43553.535 - 99.99pct: 43603.199 - Max: 43603.199
10:46:39.459 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    374.6  msg/s ---   1429.1 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 12208.459 ms - med: 7807.455 - 95pct: 38799.871 - 99pct: 46936.575 - 99.9pct: 50500.095 - 99.99pct: 50500.095 - Max: 50500.095
10:46:49.537 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:    295.6  msg/s ---   1127.5 Mbit/s --- failure      0.0 msg/s --- Latency: mean: 14503.565 ms - med: 10753.087 - 95pct: 45041.407 - 99pct: 54307.327 - 99.9pct: 57786.623 - 99.99pct: 57786.623 - Max: 57786.623
```

Analyze the reasons for such a large deviation is the producer sent batch messages and ServerCnx read more one message. 

This PR can not completely solve the problem but can alleviate this problem. When the message publish rate exceeded, the broker set channel auto-read to false for all topics. This will avoid parts of ServerCnx read more one message.

### Does this pull request potentially affect one of the following parts:

*If `yes` was chosen, please highlight the changes*

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API: (no)
  - The schema: (no)
  - The default values of configurations: (no)
  - The wire protocol: (no)
  - The rest endpoints: (no)
  - The admin cli options: (no)
  - Anything that affects deployment: (no)

### Documentation

  - Does this pull request introduce a new feature? (no)

(cherry picked from commit ec31d54)
…over subscription mode. (apache#6558)

Fixes apache#6552

### Motivation

apache#6552 is introduced by apache#5929, so this PR stop increase unacked messages for the consumer with Exclusive/Failover subscription mode.

(cherry picked from commit 2449696)
* Fix: topic with one partition cannot be updated

(cherry picked from commit 9602c9b)
### Motivation

Fixes apache#6561

### Modifications

Initialize `BatchMessageAckerDisabled` with a `new BitSet()` Object.

(cherry picked from commit 2007de6)
@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet