
KeyValue schema support for pulsar sql #6325

Merged
merged 8 commits into from Feb 17, 2020

@gaoran10 (Contributor) commented Feb 14, 2020

Fixes #5560

Motivation

Currently, Pulsar SQL can't read data that uses a KeyValue schema.

Modifications

Add KeyValue schema support for Pulsar SQL. Add the prefix `key.` to key field names and the prefix `value.` to value field names.

Verifying this change

This change added tests and can be verified as follows:

  • Added unit tests for keyValue schema handler

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (don't know)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (don't know)

Documentation

If the key uses `Schema.INT32` or another primitive schema, the key column is named `__key.__value__`.

If the key uses a struct schema such as `Schema.JSON(User.class)`, each struct field becomes a key column, for example `__key.name` and `__key.age`.

The format of value field names is unchanged.
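A minimal sketch of the naming rule described above. `KeyColumnNames` is a hypothetical helper, not Pulsar's code: key columns get the `__key.` prefix, a primitive key maps to the single column `__key.__value__`, and value columns keep their original names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the column-naming convention; not Pulsar's implementation.
final class KeyColumnNames {
    static final String KEY_PREFIX = "__key.";
    static final String PRIMITIVE_COLUMN = "__value__";

    private KeyColumnNames() {}

    /** Column for a key with a primitive schema such as Schema.INT32. */
    static String primitiveKeyColumn() {
        return KEY_PREFIX + PRIMITIVE_COLUMN;
    }

    /** Columns for a key with a struct schema: one prefixed column per struct field. */
    static List<String> structKeyColumns(List<String> fieldNames) {
        List<String> columns = new ArrayList<>();
        for (String field : fieldNames) {
            columns.add(KEY_PREFIX + field);
        }
        return columns;
    }

    /** Value columns keep their original names. */
    static List<String> valueColumns(List<String> fieldNames) {
        return new ArrayList<>(fieldNames);
    }
}
```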

gao.ran added 3 commits February 14, 2020 09:25
…lsar into pulsar-sql-schema-kv

Conflicts:
  pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java
  pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplit.java
  pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java
  pulsar-sql/presto-pulsar/src/test/java/org/apache/pulsar/sql/presto/TestPulsarConnector.java
@tuteng tuteng added area/sql Pulsar SQL related features type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages labels Feb 14, 2020
@tuteng tuteng added this to the 2.6.0 milestone Feb 14, 2020
ByteBuf valueByteBuf;
if (Objects.equals(keyValueEncodingType, KeyValueEncodingType.INLINE)) {
dataPayload.resetReaderIndex();
int keyLength2 = dataPayload.readInt();
Contributor review comment. Suggested change (rename `keyLength2` to `keyLength`):
int keyLength = dataPayload.readInt();

int keyLength2 = dataPayload.readInt();
keyByteBuf = dataPayload.readBytes(keyLength2);

int valueLength2 = dataPayload.readInt();
Contributor review comment. Suggested change (rename `valueLength2` to `valueLength`):
int valueLength = dataPayload.readInt();
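The snippet under review reads an INLINE-encoded KeyValue payload as a length-prefixed key followed by a length-prefixed value. Here is a minimal sketch of that layout. It uses `java.nio.ByteBuffer` instead of Netty's `ByteBuf` and makes three assumptions: lengths are 4-byte big-endian ints, key and value are non-null, and the class name `InlineKeyValue` is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the INLINE layout: [keyLength][keyBytes][valueLength][valueBytes].
final class InlineKeyValue {
    final byte[] key;
    final byte[] value;

    InlineKeyValue(byte[] key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    static byte[] encode(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(4 + key.length + 4 + value.length);
        buf.putInt(key.length).put(key);
        buf.putInt(value.length).put(value);
        return buf.array();
    }

    static InlineKeyValue decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload); // starts at position 0, like resetReaderIndex()
        int keyLength = buf.getInt();
        byte[] key = new byte[keyLength];
        buf.get(key);
        int valueLength = buf.getInt();
        byte[] value = new byte[valueLength];
        buf.get(value);
        return new InlineKeyValue(key, value);
    }
}
```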

Comment on lines 62 to 71
/**
* True if the column is key column handler for KeyValueSchema.
*/
private final boolean key;

/**
* True if the column is value column handler for KeyValueSchema.
*/
private final boolean value;

Contributor: It's better to use a single enum to indicate whether the column is a key column or a value column for KeyValueSchema.

Contributor Author: I'll use an enum instead of them.
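A sketch of the reviewer's suggestion: one enum field replaces the two booleans, so a column can never be flagged as both key and value. A later diff in this PR declares a `PulsarColumnHandle.HandleKeyValueType` field. The constant names and the `ColumnHandleSketch` class below are assumptions for illustration.

```java
// Hypothetical sketch: one enum field instead of two boolean flags.
enum HandleKeyValueType {
    NONE,  // column of a topic without a KeyValue schema
    KEY,   // column read from the key part of a KeyValue schema
    VALUE  // column read from the value part of a KeyValue schema
}

final class ColumnHandleSketch {
    private final String name;
    private final HandleKeyValueType handleKeyValueType;

    ColumnHandleSketch(String name, HandleKeyValueType handleKeyValueType) {
        this.name = name;
        this.handleKeyValueType = handleKeyValueType;
    }

    String getName() {
        return name;
    }

    boolean isKey() {
        return handleKeyValueType == HandleKeyValueType.KEY;
    }

    boolean isValue() {
        return handleKeyValueType == HandleKeyValueType.VALUE;
    }
}
```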

Comment on lines 437 to 449
Class<PulsarColumnMetadata> clazz = PulsarColumnMetadata.class;
Class<ColumnMetadata> superClazz = ColumnMetadata.class;
Field nameField = null;
Field nameWithCaseField = null;
try {
nameField = superClazz.getDeclaredField("name");
nameField.setAccessible(true);
nameWithCaseField = clazz.getDeclaredField("nameWithCase");
nameWithCaseField.setAccessible(true);
for (ColumnMetadata columnMetadata : columnMetadataList) {
nameField.set(columnMetadata, namePrefix + columnMetadata.getName());
nameWithCaseField.set(columnMetadata, columnMetadata.getName());
}
Contributor: Maybe this can be handled in `PulsarColumnMetadata`. If `PulsarColumnMetadata` knows whether the column is a key or a value column, it can rename the column directly.

Contributor Author: Good idea!
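A minimal sketch of that idea. It uses a hypothetical `ColumnMetadataSketch` class rather than Presto's `ColumnMetadata`. The column derives its exposed name from its key/value role when it is constructed, so no reflection on private fields is needed. The `__key.` prefix follows the naming settled on later in this review.

```java
// Hypothetical sketch: a column that knows its key/value role derives its exposed name itself,
// so no reflection on private fields of a parent class is needed.
final class ColumnMetadataSketch {
    enum Role { NONE, KEY, VALUE }

    static final String KEY_PREFIX = "__key.";

    private final String name;         // name exposed to SQL
    private final String nameWithCase; // original field name from the schema

    ColumnMetadataSketch(String fieldName, Role role) {
        this.nameWithCase = fieldName;
        // Only key columns are prefixed; value columns keep their original names.
        this.name = role == Role.KEY ? KEY_PREFIX + fieldName : fieldName;
    }

    String getName() {
        return name;
    }

    String getNameWithCase() {
        return nameWithCase;
    }
}
```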

Comment on lines 426 to 429
if (this.currentMessage.getKey().isPresent()) {
keyByteBuf = Unpooled.wrappedBuffer(
Base64.getDecoder().decode(this.currentMessage.getKey().get()));
}
Contributor: We can get the key bytes with `message.getKeyBytes()`, so there is no need to decode the key string with a Base64 decoder.
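This sketch shows roughly what a `getKeyBytes()`-style accessor does. It assumes a message stores its key as a string plus a flag saying whether that string is Base64-encoded. Pulsar's `Message` interface exposes `getKeyBytes()` and `hasBase64EncodedKey()`, but the `KeyedMessage` class below is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in for a message that carries a string key.
final class KeyedMessage {
    private final String key;
    private final boolean base64EncodedKey;

    KeyedMessage(String key, boolean base64EncodedKey) {
        this.key = key;
        this.base64EncodedKey = base64EncodedKey;
    }

    /** Returns the raw key bytes, decoding only when the key was stored as Base64. */
    byte[] getKeyBytes() {
        if (base64EncodedKey) {
            return Base64.getDecoder().decode(key);
        }
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

A caller such as the SQL record cursor could then wrap the returned array directly (for example with Netty's `Unpooled.wrappedBuffer(byte[])`) instead of decoding the string itself.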

@skyrocknroll (Contributor): /pulsarbot run-failure-checks

@codelipenghui (Contributor): /pulsarbot run-failure-checks

*
* @return true if the key is base64 encoded, false otherwise
*/
boolean hasBase64EncodedKey();
Contributor: It would be better to name it `isKeyBase64Encoded()`.

@codelipenghui (Contributor) left a comment: Overall looks good to me; I just left some minor comments.

.build();
} catch (Exception e) {
log.error("Create schemaInfo failed!", e);
schemaInfoTemp = SchemaInfo.builder().build();
Contributor: If an exception occurs while reading the schema properties, it would be better to throw a runtime exception.
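A minimal sketch of the suggested handling, using a hypothetical helper `buildOrThrow` (not a Pulsar API). Any exception from the builder is rethrown as an unchecked `IllegalStateException` that keeps the original cause. The alternative it replaces is logging the error and substituting an empty `SchemaInfo`.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: fail loudly instead of returning an empty/default object.
final class SchemaInfoBuilding {
    private SchemaInfoBuilding() {}

    /** Runs the builder; checked failures are rethrown unchecked with the original cause attached. */
    static <T> T buildOrThrow(Callable<T> builder, String what) {
        try {
            return builder.call();
        } catch (RuntimeException e) {
            throw e; // already unchecked, propagate as-is
        } catch (Exception e) {
            throw new IllegalStateException("Create " + what + " failed", e);
        }
    }
}
```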

@codelipenghui (Contributor): @sijie @jia Please help review this PR.

@codelipenghui (Contributor): /pulsarbot run-failure-checks

@sijie (Member) left a comment: @gaoran10 this is a great change. I only have one comment, regarding the names of the value fields.

@@ -32,15 +33,20 @@
private String nameWithCase;
private String[] fieldNames;
private Integer[] positionIndices;
private PulsarColumnHandle.HandleKeyValueType handleKeyValueType;
public final static String KEY_SCHEMA_COLUMN_PREFIX = "key.";
public final static String VALUE_SCHEMA_COLUMN_PREFIX = "value.";
Member: I don't think we need to add the `value.` prefix to value fields. We should keep the behavior consistent between messages with keys and messages without keys.

Contributor Author: Yes, it's better! 👍

@gaoran10 (Contributor Author), Feb 16, 2020: Maybe we could use `__key` as the prefix of key field names; this makes them easier to distinguish from ordinary field names.

Member: Yes. +1

@codelipenghui codelipenghui merged commit 3cf6be1 into apache:master Feb 17, 2020
kaynewu added a commit to kaynewu/pulsar that referenced this pull request Mar 10, 2020
* [Issue 5904] Support `unload` of all partitions of a partitioned topic (apache#6187)

Fixes apache#5904 

### Motivation
Pulsar supports unloading a non-partitioned topic or a single partition of a partitioned topic. For a partitioned topic with many partitions, users have to list all partitions and unload them one by one. We need to support unloading all partitions of a partitioned topic at once.
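Conceptually, unloading a whole partitioned topic means expanding it into its partition names and unloading each one. This sketch makes two assumptions. Partition topics are named `<topic>-partition-<i>` (Pulsar's naming convention). `unloadOne` is a hypothetical callback that stands in for the real per-partition unload call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: fan a partitioned-topic unload out to every partition.
final class PartitionUnloader {
    private PartitionUnloader() {}

    static List<String> partitionNames(String topic, int numPartitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            names.add(topic + "-partition-" + i);
        }
        return names;
    }

    /** Unloads every partition through the supplied per-partition unload call. */
    static void unloadAll(String topic, int numPartitions, Consumer<String> unloadOne) {
        for (String partition : partitionNames(topic, numPartitions)) {
            unloadOne.accept(partition);
        }
    }
}
```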

* [Issue 4175] [pulsar-function-go] Create integration tests for Go Functions for production-readiness (apache#6104)

This PR is to provide integration tests that test execution of Go functions that are managed by the Java FunctionManager. This will allow us to test things like behavior during function timeouts, heartbeat failures, and other situations that can only be effectively tested in an integration test. 

Master issue: apache#4175
Fixes issue: apache#6204 

### Modifications

We must add Go to the integration testing logic. We must also build the Go dependencies into the test Dockerfile to ensure the Go binaries are available at runtime for the integration tests.

* [Issue 5999] Support create/update tenant with empty cluster (apache#6027)

### Motivation

Fixes apache#5999

### Modifications

Add the logic to handle the blank cluster name.

* Introduce maxMessagePublishBufferSizeInMB configuration to avoid broker OOM (apache#6178)

Motivation
Introduce maxMessagePublishBufferSizeInMB configuration to avoid broker OOM.

Modifications
If the size of the messages being processed exceeds this value, the broker stops reading data from the connection. When the available size is greater than half of maxMessagePublishBufferSizeInMB, the broker resumes auto-reading data from the connection.
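The pause/resume rule above is a high/low watermark scheme. This is a minimal single-threaded sketch. The names (`PublishBufferGate`, `onMessageReceived`, `onMessagePersisted`) are hypothetical, and the limit is expressed in bytes. Reading stops once pending bytes exceed the limit. It resumes only when more than half of the buffer is available again.

```java
// Hypothetical single-threaded sketch of the publish-buffer pause/resume rule.
final class PublishBufferGate {
    private final long maxBytes;
    private long pendingBytes;
    private boolean autoRead = true;

    PublishBufferGate(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    void onMessageReceived(long size) {
        pendingBytes += size;
        if (pendingBytes > maxBytes) {
            autoRead = false; // stop reading from the connection
        }
    }

    void onMessagePersisted(long size) {
        pendingBytes -= size;
        long available = maxBytes - pendingBytes;
        if (!autoRead && available > maxBytes / 2) {
            autoRead = true; // resume reading from the connection
        }
    }

    boolean isAutoRead() {
        return autoRead;
    }
}
```

The gap between the two thresholds avoids toggling reads on and off for every message. In a Netty-based server the flag would typically map to `channel.config().setAutoRead(...)`.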

* Enable getting the precise backlog and the backlog without delayed messages. (apache#6310)

Fixes apache#6045 apache#6281 

### Motivation

Enable getting the precise backlog and the backlog without delayed messages.

### Verifying this change

Added new unit tests for the change.

* KeyValue schema support for pulsar sql (apache#6325)

Fixes apache#5560

### Motivation

Currently, Pulsar SQL can't read data that uses a KeyValue schema. This PR adds support for Pulsar SQL to read messages with a key-value schema.

### Modifications

Add KeyValue schema support for Pulsar SQL. Add the prefix `__key.` to key field names.

* Avoid getting partition metadata when the topic name is a partition name. (apache#6339)

Motivation

To avoid getting partition metadata when the topic name is a partition name.
Currently, if users want to skip all messages for a partitioned topic or unload a partitioned topic, the broker fetches the topic metadata many times. For a topic name that is already a partition name, it is not necessary to fetch the partitioned-topic metadata again.
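The shortcut depends on recognizing partition names. This sketch assumes Pulsar's `-partition-<index>` suffix convention. `PartitionNames` is a hypothetical helper; Pulsar's own `TopicName` class offers comparable checks.

```java
// Hypothetical helper: detect names of the form "<topic>-partition-<index>".
final class PartitionNames {
    static final String SUFFIX = "-partition-";

    private PartitionNames() {}

    /** Returns the partition index encoded in the name, or -1 if it is not a partition name. */
    static int partitionIndex(String topic) {
        int pos = topic.lastIndexOf(SUFFIX);
        if (pos < 0) {
            return -1;
        }
        String digits = topic.substring(pos + SUFFIX.length());
        // Require 1 to 9 ASCII digits so Integer.parseInt cannot overflow.
        if (digits.isEmpty() || digits.length() > 9) {
            return -1;
        }
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
        }
        return Integer.parseInt(digits);
    }

    /** A partition name needs no partitioned-topic metadata lookup. */
    static boolean isPartitionName(String topic) {
        return partitionIndex(topic) >= 0;
    }
}
```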

* Explicitly set the env vars 'BOOKIE_MEM' and 'BOOKIE_GC' in values-mini.yaml (apache#6340)

Fixes apache#6338

### Motivation
This commit started while I was using Helm on my local minikube and noticed a mismatch between the `values-mini.yaml` and `values.yaml` files. At first I thought it was a copy/paste error, so I created apache#6338.

Then I looked into the details of how these env vars [were used](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L36) and found out it's OK to use `PULSAR_MEM` as an alternative. But it introduces problems:
1. Since `BOOKIE_GC` was not defined, the default [BOOKIE_EXTRA_OPTS](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L39) ends up using the default value of `BOOKIE_GC`, which would override the same JVM parameters defined earlier in `PULSAR_MEM`.
2. It may cause problems if the bootstrap scripts change in later development, so it is better to make it explicit.

So I created this PR to solve the problems above (hidden trouble).

### Modifications

As mentioned above, I've made such modifications below:
1. Make `BOOKIE_MEM` and `BOOKIE_GC` explicit in the `values-mini.yaml` file, following the format in the `values.yaml` file.
2. Remove all GC-log printing arguments, considering the resource constraints of the minikube environment. The removed arguments are `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest`
3. Leave `PULSAR_PREFIX_dbStorage_rocksDB_blockCacheSize` empty as usual, since [conf/standalone.conf#L576](https://github.com/apache/pulsar/blob/df152109415f2b10dd83e8afe50d9db7ab7cbad5/conf/standalone.conf#L576) says it uses 10% of the direct memory size by default.

* Fix java doc for key shared policy. (apache#6341)

The key shared policy does not support setting the maximum key hash range, so fix the java doc.

* client: make SubscriptionMode a member of ConsumerConfigurationData (apache#6337)

Currently, SubscriptionMode is a parameter for creating ConsumerImpl, but it is not exposed, so users cannot set this value for a consumer. This change makes SubscriptionMode a member of ConsumerConfigurationData so that users can set this parameter when creating a consumer.

* Windows CMake corrections (apache#6336)

* Corrected method of specifying Windows path to LLVM tools

* Fixing Windows build

* Corrected the DLL install path

* Fixing pulsarShared paths

* remove future.join() from PulsarSinkEffectivelyOnceProcessor (apache#6361)

* use checkout@v2 to avoid fatal: reference is not a tree (apache#6386)

"fatal: reference is not a tree" is a known issue in actions/checkout#23 that is fixed in checkout@v2, so update the checkout action used in GitHub Actions.

* [Pulsar-Client] Stop shade snappy-java in pulsar-client-shaded (apache#6375)

Fixes apache#6260 

Snappy, like other compression codecs (LZ4, ZSTD), depends on native libraries to do the actual encoding and decoding. When we shade it in a fat jar, only the Java classes of snappy are relocated, which leaves the JNI bindings incompatible with the underlying C++ code.

We should remove the shading for snappy and let Maven import its library as a dependency.

I've tested the shaded jar generated locally by this PR; it works for all compression codecs.

* Fix CI not triggered (apache#6397)

In apache#6386, checkout@v2 was brought in for checkout.

However, it checks out the PR merge commit by default, which breaks the diff-only action that looks for the commits a PR is based on, and causes all tests to be skipped.

This PR fixes this issue and has been verified to work with the apache#6396 Brokers/unit-tests.

* [Issue 6355][HELM] autorecovery - could not find or load main class (apache#6373)

This applies the recommended fix from
apache#6355 (comment)

Fixes apache#6355

### Motivation

This PR corrects the configmap data which was causing the autorecovery pod to crashloop
with `could not find or load main class`

### Modifications

Updated the configmap var data per [this comment](apache#6355 (comment)) from @sijie

* Creating a topic does not wait for creating cursor of replicators (apache#6364)

### Motivation

Creating a topic does not wait for creating cursor of replicators

### Verifying this change

The existing unit tests cover this change.

* [Reader] Should set either start message id or start message from roll back duration. (apache#6392)

Currently, when constructing a reader, users can set both a start message ID and a start time.

This is strange and the behavior should be forbidden.
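Here is a sketch of the validation described above, using a hypothetical `ReaderConfigSketch`. Pulsar's `ReaderBuilder` does expose `startMessageId(...)` and `startMessageFromRollbackDuration(...)`, but the check below is illustrative, not the actual implementation. Exactly one start position must be configured.

```java
// Hypothetical sketch: a reader must start from exactly one kind of position.
final class ReaderConfigSketch {
    private String startMessageId;         // stand-in for a MessageId
    private long startRollbackDurationSec; // 0 means "not set"

    ReaderConfigSketch startMessageId(String id) {
        this.startMessageId = id;
        return this;
    }

    ReaderConfigSketch startMessageFromRollbackDuration(long seconds) {
        this.startRollbackDurationSec = seconds;
        return this;
    }

    /** Fails if neither or both start positions are configured. */
    void validate() {
        boolean hasId = startMessageId != null;
        boolean hasRollback = startRollbackDurationSec > 0;
        if (hasId == hasRollback) {
            throw new IllegalArgumentException(
                "Exactly one of start message id or rollback duration must be set");
        }
    }
}
```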

* Seek to the first one >= timestamp (apache#6393)

The current logic for `resetCursor` by timestamp is odd. The first message it returns is the last message published at or before the designated timestamp. This "earlier" message should not be emitted.
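The intended behavior is a lower-bound search: return the first entry whose publish time is at or after the target. Here is a sketch over a sorted array of publish timestamps; the method name `firstAtOrAfter` is hypothetical. The old behavior corresponds to returning the last entry at or before the target instead.

```java
// Hypothetical sketch: lower-bound binary search over sorted publish timestamps.
final class TimestampSeek {
    private TimestampSeek() {}

    /**
     * Returns the index of the first timestamp >= target in a sorted array,
     * or timestamps.length if every entry is earlier than target.
     */
    static int firstAtOrAfter(long[] timestamps, long target) {
        int lo = 0;
        int hi = timestamps.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```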

* [Minor] Fix Java code errors reported by LGTM (apache#6398)

Four kinds of errors are fixed in this PR:

- Array index out of bounds
- Inconsistent equals and hashCode
- Missing format argument
- Reference equality test of boxed types

According to https://lgtm.com/projects/g/apache/pulsar/alerts/?mode=tree&severity=error&id=&lang=java
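Two of the listed error kinds are easy to show in miniature. This is a generic sketch, not code from the Pulsar repository. It compares boxed values with `equals` rather than `==`, and it derives `hashCode` from the same fields as `equals`.

```java
import java.util.Objects;

// Generic illustrations of two LGTM error kinds; not taken from the Pulsar codebase.
final class LgtmExamples {
    private LgtmExamples() {}

    /** Correct comparison of boxed values: == would compare object identity, not value. */
    static boolean sameId(Long a, Long b) {
        return Objects.equals(a, b);
    }

    /** A value type whose equals and hashCode use the same fields. */
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Endpoint)) {
                return false;
            }
            Endpoint other = (Endpoint) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }
}
```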

* [Java Reader Client] Starting a reader inside a batch results in reading the first message of the batch. (apache#6345)

Fixes apache#6344 
Fixes apache#6350

The bug was introduced in apache#5622, which changed the skip logic incorrectly.

* Fix broker to specify a list of bookie groups. (apache#6349)

### Motivation

Fixes apache#6343

### Modifications

Add a method to cast object value to `String`.

* Fixed enum package not found (apache#6401)

Fixes apache#6400

### Motivation
This problem is blocking the current tests. Version 1.1.8 of `enum34` seems to have some problems. The problem can be reproduced as follows.

Using the latest Pulsar code:
```
cd pulsar
mvn clean install -DskipTests
docker pull apachepulsar/pulsar-build:ubuntu-16.04
docker run -it -v $PWD:/pulsar --name pulsar apachepulsar/pulsar-build:ubuntu-16.04 /bin/bash
docker exec -it pulsar /bin/bash
cmake .
make -j4 && make install 
cd python
python setup.py bdist_wheel
pip install dist/pulsar_client-*-linux_x86_64.whl
```
`pip show enum34`
```
Name: enum34
Version: 1.1.8
Summary: Python 3.4 Enum backported to 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4
Home-page: https://bitbucket.org/stoneleaf/enum34
Author: Ethan Furman
Author-email: ethan@stoneleaf.us
License: BSD License
Location: /usr/local/lib/python2.7/dist-packages
Requires:
Required-by: pulsar-client, grpcio
```

```
root@55e06c5c770f:/pulsar/pulsar-client-cpp/python# python
Python 2.7.12 (default, Oct  8 2019, 14:14:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from enum import Enum, EnumMeta
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named enum
>>> exit()
```

There is no problem with using 1.1.9 in the test.

### Modifications

* Upgrade enum34 from 1.1.8 to 1.1.9

### Verifying this change

Local tests pass.

* removed comma from yaml config (apache#6402)

* Fix broker client tls settings error (apache#6128)

When the broker creates its internal client, it sets tlsTrustCertsFilePath to "getTlsCertificateFilePath()", but it should be "getBrokerClientTrustCertsFilePath()".

* [Issue 3762][Schema] Fix the problem with parsing of an Avro schema related to shading in pulsar-client. (apache#6406)

Motivation
Avro schemas are quite important for proper data flow, and it is a pity that issue apache#3762 stayed untouched for so long. There were some workarounds to make Pulsar use an original Avro schema, but in the end it is pretty hard to run an enterprise solution on workarounds. With this PR I would like to solve the problem caused by shading Avro in pulsar-client. As discussed in the issue, there are two possible solutions:

1. Unshade the Avro library in the pulsar-client library. (IMHO this seems like the proper solution, but it also brings a risk of unknown side effects.)
2. Use reflection to get the original schemas from generated classes. (I went for this solution.)

Could you please comment on whether this is a proper solution for the problem? I will add tests once my approach is confirmed.

Modifications
First, we try to extract the original Avro schema from the `SCHEMA$` field using reflection. If that doesn't work, the process falls back to generating the schema from the POJO.
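Here is a sketch of the reflection-with-fallback approach. It assumes the generated class exposes a static `SCHEMA$` field, as Avro's code generator does. To keep the sketch self-contained, the schema is a `String` and `fallback` stands in for deriving a schema from the POJO; both are simplifications. `GeneratedRecord` and `PlainPojo` are illustrative inputs only.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.function.Function;

// Hypothetical sketch: prefer the schema embedded in a generated class, else derive one.
final class SchemaExtraction {
    private SchemaExtraction() {}

    static Object schemaFor(Class<?> pojo, Function<Class<?>, Object> fallback) {
        try {
            Field field = pojo.getDeclaredField("SCHEMA$");
            if (Modifier.isStatic(field.getModifiers())) {
                field.setAccessible(true);
                return field.get(null);
            }
        } catch (ReflectiveOperationException | RuntimeException e) {
            // No usable SCHEMA$ field: fall through to POJO-based generation.
        }
        return fallback.apply(pojo);
    }
}

// Illustrative input resembling an Avro-generated class.
final class GeneratedRecord {
    public static final String SCHEMA$ = "{\"type\":\"record\",\"name\":\"GeneratedRecord\",\"fields\":[]}";
}

// Illustrative plain POJO without an embedded schema.
final class PlainPojo {
}
```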

* Remove duplicated lombok annotations in the tests module (apache#6412)

* Add verification for SchemaDefinitionBuilderImpl.java (apache#6405)

### Motivation

Add verification for SchemaDefinitionBuilderImpl.java

### Verifying this change

Added a new unit test.

* Cleanup pom files in the tests module (apache#6421)

### Modifications

- Removed dependencies on test libraries that were already imported in the parent pom file.

- Removed groupId tags that are inherited from the parent pom file.

* Update BatchReceivePolicy.java (apache#6423)

BatchReceivePolicy implements Serializable.

* Consumer received duplicated delayed messages upon restart

Fix a case where, after a delayed message is sent, a consumer that restarts pulls duplicate messages. apache#6403

* Bump netty version to 4.1.45.Final (apache#6424)

netty 4.1.43 has a bug preventing it from using Linux native Epoll transport

This results in pulsar brokers failing over to NioEventLoopGroup even when running on Linux.

The bug is fixed in Netty 4.1.45.Final.

* Fix publish buffer limit does not take effect

Motivation
If maxMessagePublishBufferSizeInMB is set to a value greater than Integer.MAX_VALUE / 1024 / 1024, the publish buffer limit does not take effect. The reason is that the following calculation is done in int arithmetic and overflows, so maxMessagePublishBufferBytes ends up with a wrong value (0 or even negative):

pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024 * 1024;
So it was changed to:

pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024L * 1024L;
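The overflow is easy to reproduce. With `int` arithmetic the product wraps modulo 2^32, while making one operand a `long` keeps the full value. Here is a small sketch (the class and method names are hypothetical):

```java
// Hypothetical sketch contrasting the buggy and fixed size calculations.
final class BufferSizeMath {
    private BufferSizeMath() {}

    /** Buggy: the multiplication is evaluated in 32-bit int arithmetic and can overflow. */
    static long bytesWithIntMath(int sizeInMB) {
        return sizeInMB * 1024 * 1024;
    }

    /** Fixed: the long literals promote the whole expression to 64-bit arithmetic. */
    static long bytesWithLongMath(int sizeInMB) {
        return sizeInMB * 1024L * 1024L;
    }
}
```

For example, `bytesWithIntMath(2048)` yields -2147483648 and `bytesWithIntMath(4096)` yields 0, while `bytesWithLongMath(4096)` yields 4294967296.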

* doc: Add the missing right parenthesis (apache#6426)

* Add the missing right parenthesis

doc: Missing right parenthesis in the `token()` line of the Pulsar Client Java code.

* Add the missing right parenthesis on line L70

* Switch from deprecated MAINTAINER tag to LABEL with maintainer's info in Dockerfile (apache#6429)

Motivation & Modification
The MAINTAINER instruction is deprecated in favor of the LABEL instruction with the maintainer's info in docker files.

* Amend the default value of . (apache#6374)

* Fix the bug where authenticationData isn't initialized. (apache#6440)

Motivation
Fix the bug where authenticationData isn't initialized.

The method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect doesn't initialize the value of authenticationData.
Because of this bug, you get a null value from the method org.apache.pulsar.broker.authorization.AuthorizationProvider#canConsumeAsync
when implementing the org.apache.pulsar.broker.authorization.AuthorizationProvider interface.

Modifications
Initialize the value of authenticationData in the method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect.

Verifying this change
Implement the org.apache.pulsar.broker.authorization.AuthorizationProvider interface and check the value of authenticationData.

* Remove duplicated test libraries in POM dependencies (apache#6430)

### Motivation
The removed test libraries were already defined in the parent pom

### Modification
Removed duplicated test libraries in POM dependencies

* Add a message on how to make log refresh immediately when starting a component (apache#6078)

### Motivation

Some users may be confused when Pulsar/bookie logs are not flushed immediately.

### Modifications

Add a message in `bin/pulsar-daemon` when starting a component.

* Close ZK before canceling future with exception (apache#6228) (apache#6399)

Fixes apache#6228

* [Flink-Connector]Get PulsarClient from cache should always return an open instance (apache#6436)

* Update sidebars.json (apache#6434)

The referenced markdown files do not exist, so the "Next" and "Previous" buttons at the bottom of the pages surrounding them lead to 404 Not Found errors.

* [Broker] Create namespace failed when TLS is enabled in PulsarStandalone (apache#6457)

When starting Pulsar in standalone mode with TLS enabled, it will fail to create two namespaces during start. 

This is because it's using the unencrypted URL/port while constructing the PulsarAdmin client.

* Update version-2.5.0-sidebars.json (apache#6455)

The referenced markdown files do not exist, so the "Next" and "Previous" buttons at the bottom of the pages surrounding them lead to 404 Not Found errors.

* [Issue 6168] Fix Unacked Message Tracker by Using Time Partition on C++ (apache#6391)

### Motivation
Fixes apache#6168.
> In the C++ lib, as shown in the following log, unacked messages are redelivered after about 2 * unAckedMessagesTimeout.

### Modifications
As in apache#3118, fixed `UnackedMessageTracker` by using TimePartition.
- Add `TickDurationInMs`
- Add `redeliverUnacknowledgedMessages`, which requires `MessageIds`, to `ConsumerImpl`, `MultiTopicsConsumerImpl` and `PartitionedConsumerImpl`.
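Here is a sketch of the time-partition idea in Java. The PR itself changes the C++ client, and the class and method names here are hypothetical. The timeout is split into `ceil(timeout / tick) + 1` buckets, and new messages go into the newest bucket. On every tick the oldest bucket is redelivered. A message is therefore redelivered between one timeout and one timeout plus one tick after it was added. With a single timer the delay can approach two timeouts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-threaded sketch of a time-partitioned unacked-message tracker.
final class TimePartitionedTracker {
    private final ArrayDeque<Set<String>> partitions = new ArrayDeque<>();

    TimePartitionedTracker(long timeoutMs, long tickMs) {
        // ceil(timeout / tick) buckets, plus one for messages added during the current tick
        long buckets = (timeoutMs + tickMs - 1) / tickMs + 1;
        for (long i = 0; i < buckets; i++) {
            partitions.addLast(new HashSet<>());
        }
    }

    void add(String messageId) {
        partitions.peekLast().add(messageId);
    }

    void ack(String messageId) {
        for (Set<String> partition : partitions) {
            if (partition.remove(messageId)) {
                return;
            }
        }
    }

    /** Called once per tick: returns the expired message ids that should be redelivered. */
    List<String> tick() {
        Set<String> expired = partitions.pollFirst();
        partitions.addLast(new HashSet<>());
        return new ArrayList<>(expired);
    }
}
```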

* [ClientAPI]Fix hasMessageAvailable() (apache#6362)

Fixes apache#6333 

Previously, `hasMoreMessages` was tested against:
```
return lastMessageIdInBroker.compareTo(lastDequeuedMessage) == 0
                && incomingMessages.size() > 0;
```
However, `incomingMessages` could be empty when the consumer/reader has just started and hasn't received any messages yet.

In this PR, the last entry is retrieved and decoded to get the message metadata, which is used to populate the batchIndex field.

* Use System.nanoTime() instead of System.currentTimeMillis() (apache#6454)

Fixes apache#6453 

### Motivation
`ConsumerBase` and `ProducerImpl` use `System.currentTimeMillis()` to measure the elapsed time in the 'operations' inner classes (`ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg`).

An instance variable `createdAt` is initialized with `System.currentTimeMillis()`, but it is not used for reading wall-clock time; it is only used for computing elapsed time (e.g. the timeout for a batch).

When the variable is used to compute elapsed time, it makes more sense to use `System.nanoTime()`.

### Modifications

The instance variable `createdAt` in `ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg` is initialized with `System.nanoTime()`. Usage of the variable is updated to reflect that it holds nano time; computations of elapsed time take the difference between the current system nano time and `createdAt`.

The `createdAt` field is package protected, and is currently only used in the declaring class and outer class, limiting the chances for unwanted side effects.
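Here is a minimal sketch of the pattern, using a hypothetical `TimedOp` class. `System.nanoTime()` is a monotonic source intended for measuring elapsed time. `System.currentTimeMillis()` follows the wall clock and can jump when the clock is adjusted.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an operation that tracks its own age with a monotonic clock.
final class TimedOp {
    // Only meaningful as a difference between two System.nanoTime() readings.
    final long createdAt = System.nanoTime();

    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - createdAt);
    }

    boolean isTimedOut(long timeoutMillis) {
        // Compare differences, never absolute values, so nanoTime wrap-around is handled.
        return System.nanoTime() - createdAt >= TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }
}
```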

* Fixed the max backoff configuration for lookups (apache#6444)

* Fixed the max backoff configuration for lookups

* Fixed test expectation

* More test fixes

* upgrade scala-maven-plugin to 4.1.0 (apache#6469)

### Motivation
The Pulsar examples include some third-party libraries with security vulnerabilities.
- log4j-core-2.8.1
https://www.cvedetails.com/cve/CVE-2017-5645

### Modifications

- Upgraded the version of scala-maven-plugin from 4.0.1 to 4.1.0. log4j-core-2.8.1 was installed because scala-maven-plugin depends on it.

* [pulsar-proxy] fix logging for published messages (apache#6474)

### Motivation
Proxy logging fetched an incorrect producerId for the `Send` command; because of that, logging always got producerId 0 and fetched an invalid topic name.

### Modification
Fixed topic logging by fetching correct producerId for `Send` command.

* [Issue 6394] Add configuration to disable auto creation of subscriptions (apache#6456)

### Motivation

Fixes apache#6394

### Modifications

- provide a flag `allowAutoSubscriptionCreation` in `ServiceConfiguration`, defaults to `true`
- when `allowAutoSubscriptionCreation` is disabled and the specified subscription (`Durable`) on the topic does not exist when trying to subscribe via a consumer, the server should reject the request directly by `handleSubscribe` in `ServerCnx`
- create the subscription on the coordination topic if it does not exist when initializing `WorkerService`
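The subscribe-time check described above can be sketched as a small decision function. The names `SubscribeGuard` and `shouldReject` are hypothetical. The nested `SubscriptionMode` mirrors the client-side Durable/NonDurable distinction.

```java
import java.util.Set;

// Hypothetical sketch of the broker-side subscribe check.
final class SubscribeGuard {
    enum SubscriptionMode { Durable, NonDurable }

    private SubscribeGuard() {}

    /**
     * Returns true if a subscribe request must be rejected because it would
     * implicitly create a durable subscription while auto-creation is disabled.
     */
    static boolean shouldReject(boolean allowAutoSubscriptionCreation,
                                SubscriptionMode mode,
                                String subscription,
                                Set<String> existingSubscriptions) {
        return !allowAutoSubscriptionCreation
                && mode == SubscriptionMode.Durable
                && !existingSubscriptions.contains(subscription);
    }
}
```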

* Make tests more stable by using JSONAssert equals (apache#6435)

Similar to the change you already merged for AvroSchemaTest.java (apache#6247):
`jsonSchema.getSchemaInfo().getSchema()` in `pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/JSONSchemaTest.java` returns a JSON object, and `schemaJson` is compared with a hard-coded JSON string. However, the order of entries in `schemaJson` is not guaranteed. Similarly, the test `testKeyValueSchemaInfoToString` in `pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/KeyValueSchemaInfoTest.java` produces a JSON object, `havePrimitiveType`, which is compared with a hard-coded JSON string, and the order of entries in `havePrimitiveType` is not guaranteed.


This PR proposes to use JSONAssert and modify the corresponding JSON test assertions so that the test is more stable.

### Motivation

Using JSONAssert and modifying the corresponding JSON test assertions so that the test is more stable.

### Modifications

Adding `assertJSONEqual` method and replacing `assertEquals` with it in tests `testAllowNullSchema`, `testNotAllowNullSchema` and `testKeyValueSchemaInfoToString`.

* Avoid calling ConsumerImpl::redeliverMessages() when message list is empty (apache#6480)

* [pulsar-client] fix deadlock on send failure (apache#6488)

* Enhance Authorization by adding TenantAdmin interface (apache#6487)

* Enhance Authorization by adding TenantAdmin interface

* Remove debugging comment

Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com>

* Independent schema is set for each consumer generated by topic (apache#6356)

### Motivation

Master Issue: apache#5454 

When one consumer subscribes to multiple topics, the schema info provider set through setSchemaInfoProvider() is overwritten by the consumer generated for the last topic.

### Modification
Clone the schema for each consumer generated for a topic.
### Verifying this change
Add the schemaTest for it.
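Cloning helps for a simple reason. If several per-topic consumers share one mutable schema object, whichever consumer sets the schema info provider last wins. Here is a sketch with a hypothetical `MutableSchema` type, where a `String` stands in for the provider.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: give each per-topic consumer its own schema instance.
final class MutableSchema {
    private String schemaInfoProvider; // stand-in for the per-topic provider

    void setSchemaInfoProvider(String provider) {
        this.schemaInfoProvider = provider;
    }

    String getSchemaInfoProvider() {
        return schemaInfoProvider;
    }

    MutableSchema copy() {
        MutableSchema c = new MutableSchema();
        c.schemaInfoProvider = this.schemaInfoProvider;
        return c;
    }

    /** Copies the template per topic before setting the provider, so topics don't overwrite each other. */
    static List<MutableSchema> perTopic(MutableSchema template, List<String> topics) {
        List<MutableSchema> result = new ArrayList<>();
        for (String topic : topics) {
            MutableSchema s = template.copy();
            s.setSchemaInfoProvider("provider-for-" + topic);
            result.add(s);
        }
        return result;
    }
}
```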

* Fix memory leak when running topic compaction. (apache#6485)


Fixes apache#6482

### Motivation
Prevent topic compaction from leaking direct memory

### Modifications

Several leaks were discovered using Netty leak detection and code review.
* `CompactedTopicImpl.readOneMessageId` would get an `Enumeration` of `LedgerEntry`, but did not release the underlying buffers. Fix: iterate through the `Enumeration` and release the underlying buffers. Instead of logging the case where the `Enumeration` did not contain any elements, complete the future exceptionally with the message (it will be logged by Caffeine).
* There were two main sources of leaks in `TwoPhaseCompactor`. The `RawBatchConverter.rebatchMessage` method failed to close/release a `ByteBuf` (uncompressedPayload). Also, the `ByteBuf` returned by `RawBatchConverter.rebatchMessage` was not closed. The first one was easy to fix (release the buffer). To fix the second one and make the code easier to read, I decided not to let `RawBatchConverter.rebatchMessage` close the message read from the topic; instead, the message read from the topic is closed in a try/finally clause surrounding most of the method body that handles a message from the topic (in the phase-two loop). Then, if a new message was produced by `RawBatchConverter.rebatchMessage`, we release it after we have added it to the compacted ledger.
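The fix adopts an ownership rule. The caller that read a message releases it in a `finally` block. A converter that returns a new message hands ownership of that new message to the caller. Here is a sketch of that rule with a hypothetical reference-counted `Buffer` type standing in for Netty's `ByteBuf`.

```java
import java.util.function.UnaryOperator;

// Hypothetical reference-counted buffer standing in for Netty's ByteBuf.
final class Buffer {
    private int refCnt = 1;

    void release() {
        if (refCnt <= 0) {
            throw new IllegalStateException("double release");
        }
        refCnt--;
    }

    int refCnt() {
        return refCnt;
    }
}

final class CompactionStep {
    private CompactionStep() {}

    /**
     * Processes one message read from the topic. `rebatch` may return the same
     * buffer or a new one; a new buffer is owned (and released) by this method.
     */
    static void handle(Buffer read, UnaryOperator<Buffer> rebatch) {
        try {
            Buffer out = rebatch.apply(read);
            try {
                // ... append `out` to the compacted ledger here ...
            } finally {
                if (out != null && out != read) {
                    out.release(); // release the newly produced message
                }
            }
        } finally {
            read.release(); // always release the message read from the topic
        }
    }
}
```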

### Verifying this change
Modified `RawReaderTest.testBatchingRebatch` to show new contract.

One can run the test described above to reproduce the issue and to verify that no leak is detected.

* Fix create partitioned topic with a substring of an existing topic name. (apache#6478)

Fixes apache#6468

Fix creating a partitioned topic whose name is a substring of an existing topic name, and make partitioned-topic creation asynchronous.

* Bump jcloud version to 2.2.0 and remove jcloud-shade module (apache#6494)

In jclouds 2.2.0, the [gson is shaded internally](https://issues.apache.org/jira/browse/JCLOUDS-1166). We could safely remove the jcloud-shade module as a cleanup.

* Refactor tests in pulsar client tools test (apache#6472)

### Modifications

The main modification was the reduction of repeated initialization of the variables in the tests.

* Fix Topic metrics documentation (apache#6495)

### Motivation

Motivation is to have correct reference-metrics documentation.

### Modifications

There is an error in the `Topic metrics` section

`pulsar_producers_count` => `pulsar_in_messages_total`

* [pulsar-client] remove duplicate cnx method (apache#6490)

### Motivation
Remove duplicate `cnx()` method for `producer`

* [proxy] Fix proxy routing to functions worker (apache#6486)

### Motivation


Currently, the proxy only works to proxy v1/v2 functions routes to the
function worker.

### Modifications

This changes this code to proxy all routes for the function worker when
those routes match. At the moment this is still a static list of
prefixes, but in the future it may be possible to have this list of
prefixes be dynamically fetched from the REST routes.

### Verifying this change
- Added tests to ensure the routing works as expected.

* Fix some async method problems at PersistentTopicsBase. (apache#6483)

* Instead of always using admin access for topic, use read/write/admin access for topic (apache#6504)

Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com>

* [Minor] Remove unused property from pom (apache#6500)

This PR is a follow-up of apache#6494

* [pulsar-common] Remove duplicate RestException references (apache#6475)

### Motivation
Currently, several Pulsar modules each define their own `RestException` class, so the repo contains multiple duplicate classes. This change moves `RestException` to a common place so that all modules use the same exception class.

* pulsar-proxy: fix the proxy thread executor name (apache#6460)

### Motivation
Fix the name of the proxy thread executor.

* Add subscribe initial position for consumer cli. (apache#6442)

### Motivation

In some cases, users expect to consume messages from the beginning of a topic, similar to the `--from-beginning` option of the Kafka consumer CLI.

### Modifications

Add `--subscription-position` for `pulsar-client` and `pulsar-perf`.
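
The effect of the initial position on a new subscription can be modeled in a few lines. This is a hypothetical sketch: `InitialPosition` only mirrors the earliest/latest choice that the new option exposes, and the backlog is a plain list rather than a topic.

```java
import java.util.List;

// Hypothetical model of where a brand-new subscription starts reading.
enum InitialPosition { LATEST, EARLIEST }

final class SubscriptionStart {
    // Returns the already-published messages that a new subscription would receive.
    static List<String> visibleBacklog(List<String> backlog, InitialPosition position) {
        // EARLIEST starts from the oldest retained message; LATEST skips the existing
        // backlog and only receives messages published after subscribing.
        return position == InitialPosition.EARLIEST ? List.copyOf(backlog) : List.of();
    }
}
```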

* [Cleanup] Log format does not match arguments (apache#6509)

* Start namespace service and schema registry service before start broker. (apache#6499)

### Motivation

Once the broker service is started, clients can connect to the broker and send requests that depend on the namespace service, so the namespace service must be created before the broker starts. Otherwise, an NPE occurs.

![image](https://user-images.githubusercontent.com/12592133/76090515-a9961400-5ff6-11ea-9077-cb8e79fa27c0.png)

![image](https://user-images.githubusercontent.com/12592133/76099838-b15db480-6006-11ea-8f39-31d820563c88.png)


### Modifications

Move the namespace service creation and the schema registry service creation before start broker service.
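
The ordering fix can be illustrated with a toy startup sequence. All class and method names here are hypothetical; the point is only that the dependency a request handler dereferences is created before the component that accepts client requests is started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified services.
final class NamespaceService {
    String ownerOf(String topic) { return "broker-1"; }
}

final class BrokerSketch {
    private NamespaceService namespaceService; // null until created
    private Object schemaRegistry;             // placeholder for the schema registry
    private boolean acceptingRequests;
    final List<String> startupLog = new ArrayList<>();

    void start() {
        // Create the services that request handlers depend on first...
        namespaceService = new NamespaceService();
        startupLog.add("namespace service");
        schemaRegistry = new Object();
        startupLog.add("schema registry service");
        // ...and only then start accepting client requests.
        acceptingRequests = true;
        startupLog.add("broker service");
    }

    // A request that dereferences the namespace service. Had requests been accepted
    // before namespaceService was assigned, this call would throw an NPE.
    String lookup(String topic) {
        if (!acceptingRequests) {
            throw new IllegalStateException("broker not started");
        }
        return namespaceService.ownerOf(topic);
    }
}
```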

* [pulsar-client-cpp] Fix redelivery of messages by UnackedMessageTracker when messages are acked (apache#6498)

### Motivation
Because of apache#6391, acked messages were counted as unacked messages.
Although the messages from the broker were acknowledged, the following log was output:

```
2020-03-06 19:44:51.790 INFO  ConsumerImpl:174 | [persistent://public/default/t1, sub1, 0] Created consumer on broker [127.0.0.1:58860 -> 127.0.0.1:6650]
my-message-0: Fri Mar  6 19:45:05 2020
my-message-1: Fri Mar  6 19:45:05 2020
my-message-2: Fri Mar  6 19:45:05 2020
2020-03-06 19:45:15.818 INFO  UnAckedMessageTrackerEnabled:53 | [persistent://public/default/t1, sub1, 0] : 3 Messages were not acked within 10000 time

```

This behavior occurred on the master branch.
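
A simplified model of what the tracker should do, written in Java rather than C++ for illustration. It is a hypothetical single-threaded sketch, not the `UnAckedMessageTrackerEnabled` implementation: acknowledging a message must remove it from the tracked set, so that only messages that are still unacked are reported when the timeout expires.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded model of an unacked-message tracker.
final class UnackedTrackerSketch {
    private final long timeoutMs;
    private final Map<String, Long> pending = new HashMap<>(); // messageId -> receive time

    UnackedTrackerSketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void onReceive(String messageId, long nowMs) { pending.put(messageId, nowMs); }

    // Acking must stop tracking the message; otherwise it is still counted as
    // unacked and is redelivered when the timeout fires.
    void onAck(String messageId) { pending.remove(messageId); }

    // Returns the ids whose ack deadline has passed and stops tracking them.
    Set<String> expire(long nowMs) {
        Set<String> expired = new HashSet<>();
        pending.entrySet().removeIf(e -> {
            boolean late = nowMs - e.getValue() >= timeoutMs;
            if (late) {
                expired.add(e.getKey());
            }
            return late;
        });
        return expired;
    }
}
```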

* [pulsar-proxy] fix the data type of the logging level (apache#6476)

### Modification
`ProxyConfig` has a wrapper method for `proxyLogLevel` to expose it as an `Optional`. Since apache#3543, a config parameter can be declared as `Optional` directly, without a wrapper method.
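
A rough sketch of what declaring the field as `Optional` enables, assuming a reflection-based loader that recognizes `Optional` fields. `ProxyConfigSketch` and `ConfigLoaderSketch` are hypothetical and only handle `Integer` values; they are not the configuration loader introduced by apache#3543.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Optional;
import java.util.Properties;

// Hypothetical config class: "not configured" is Optional.empty(),
// so no wrapper getter is needed.
final class ProxyConfigSketch {
    Optional<Integer> proxyLogLevel = Optional.empty();
}

final class ConfigLoaderSketch {
    // Minimal loader that only understands Integer and Optional<Integer> fields.
    static <T> T load(T config, Properties props) {
        for (Field f : config.getClass().getDeclaredFields()) {
            String raw = props.getProperty(f.getName());
            if (raw == null) {
                continue; // property absent: keep the field's default
            }
            try {
                f.setAccessible(true);
                if (f.getType() == Optional.class) {
                    Type arg = ((ParameterizedType) f.getGenericType()).getActualTypeArguments()[0];
                    if (arg == Integer.class) {
                        f.set(config, Optional.of(Integer.valueOf(raw.trim())));
                    }
                } else if (f.getType() == Integer.class) {
                    f.set(config, Integer.valueOf(raw.trim()));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot set " + f.getName(), e);
            }
        }
        return config;
    }
}
```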

* [pulsar-broker] recover zk-badversion while updating cursor metadata (apache#5604)

fix test

Co-authored-by: ltamber <ltamber12@gmail.com>
Co-authored-by: Devin Bost <devinbost@users.noreply.github.com>
Co-authored-by: Fangbin Sun <sunfangbin@gmail.com>
Co-authored-by: lipenghui <penghui@apache.org>
Co-authored-by: ran <gaoran_10@126.com>
Co-authored-by: liyuntao <liyuntao58607@gmail.com>
Co-authored-by: Jia Zhai <zhaijia@apache.org>
Co-authored-by: Nick Rivera <heronr@users.noreply.github.com>
Co-authored-by: Neng Lu <freeneng@gmail.com>
Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
Co-authored-by: John Harris <jharris-@users.noreply.github.com>
Co-authored-by: guangning <guangning@apache.org>
Co-authored-by: newur <ruwen.reddig@gmail.com>
Co-authored-by: Sergii Zhevzhyk <vzhikserg@users.noreply.github.com>
Co-authored-by: liudezhi <33149602+liudezhi2098@users.noreply.github.com>
Co-authored-by: Dzmitry Kazimirchyk <dzmitryk@users.noreply.github.com>
Co-authored-by: futeng <ifuteng@gmail.com>
Co-authored-by: bilahepan <YTgaotianci@gmail.com>
Co-authored-by: Paweł Łoziński <pawel.lozinski@gmail.com>
Co-authored-by: Ryan Slominski <ryans@jlab.org>
Co-authored-by: k2la <mzq6mft9zz@gmail.com>
Co-authored-by: Rolf Arne Corneliussen <racorn@users.noreply.github.com>
Co-authored-by: Matteo Merli <mmerli@apache.org>
Co-authored-by: Sijie Guo <sijie@apache.org>
Co-authored-by: Rajan Dhabalia <rdhabalia@apache.org>
Co-authored-by: Sanjeev Kulkarni <sanjeevrk@gmail.com>
Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com>
Co-authored-by: congbo <39078850+congbobo184@users.noreply.github.com>
Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Co-authored-by: Addison Higham <addisonj@gmail.com>
@tuteng
Member

tuteng commented Mar 21, 2020

Adding label 2.5.1 because of the dependency on #6361.

tuteng pushed a commit to AmateurEvents/pulsar that referenced this pull request Mar 21, 2020
Fixes apache#5560

### Motivation

Currently, Pulsar SQL can't read data that uses a KeyValue schema. This PR adds support for Pulsar SQL to read messages with a key-value schema.

### Modifications

Add KeyValue schema support for Pulsar SQL. Add the prefix `__key.` to key field names.
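
The naming convention can be sketched as a small helper that derives SQL column names from the key and value schemas. This is a hypothetical illustration, not the connector code: a `null` key field list stands for a primitive key schema such as `Schema.INT32`, which maps to `__key.__value__` as described in the PR, while struct key fields get the `__key.` prefix and value field names are left unchanged.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyValueColumns {
    static final String KEY_PREFIX = "__key.";
    // Field name used for a primitive (non-struct) key, per the PR description.
    static final String PRIMITIVE_FIELD = "__value__";

    // keyFields == null models a primitive key schema (e.g. Schema.INT32);
    // otherwise it lists the struct fields of the key schema.
    static List<String> columnNames(List<String> keyFields, List<String> valueFields) {
        List<String> columns = new ArrayList<>();
        if (keyFields == null) {
            columns.add(KEY_PREFIX + PRIMITIVE_FIELD);
        } else {
            for (String field : keyFields) {
                columns.add(KEY_PREFIX + field);
            }
        }
        // Value field names are left unchanged.
        columns.addAll(valueFields);
        return columns;
    }
}
```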

(cherry picked from commit 3cf6be1)
tuteng pushed a commit that referenced this pull request Apr 6, 2020
tuteng pushed a commit that referenced this pull request Apr 13, 2020
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request May 18, 2020
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
Labels
area/sql (Pulsar SQL related features), release/2.5.1, type/enhancement (The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages)
Development

Successfully merging this pull request may close these issues.

KeyValue schema support in Pulsar SQL