PIP 83 : Pulsar client: Message consumption with pooled buffer #10184

Merged (1 commit, Apr 20, 2021)

Conversation

rdhabalia
Contributor

Motivation

The Pulsar client library should provide an API that allows the application to access the message payload from pooled buffers. The library must also provide an associated release API to release and deallocate the pooled buffers used by a message.

Modification

  • Add support for pooled messages in the consumer
  • Add a Message::release() API (a brief usage sketch follows this list)
  • Add pooled-message support to the client tools for testing
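
For illustration, a minimal usage sketch of the new API, assuming the poolMessages(true) consumer option and Message::release() introduced by this PR; the service URL, topic, and subscription names are placeholders:

// Minimal sketch, assuming ConsumerBuilder#poolMessages(boolean) and
// Message#release() from this PR; names below are placeholders.
PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();

Consumer<ByteBuffer> consumer = client.newConsumer(Schema.BYTEBUFFER)
        .topic("my-topic")
        .subscriptionName("my-sub")
        .poolMessages(true)   // serve the payload from a pooled, ref-counted buffer
        .subscribe();

Message<ByteBuffer> msg = consumer.receive();
try {
    ByteBuffer payload = msg.getValue(); // valid only until release()
    // ... process payload ...
    consumer.acknowledge(msg);
} finally {
    msg.release(); // return the pooled buffer to the pool
}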

@rdhabalia rdhabalia added this to the 2.8.0 milestone Apr 10, 2021
@rdhabalia rdhabalia self-assigned this Apr 10, 2021
Contributor

@eolivelli left a comment

Overall it looks good to me.

I left one comment on the Schema API, PTAL.

* @return the deserialized object
*/
default T decode(ByteBuf bytes, byte[] schemaVersion) {
return null;
Contributor

Can you please write a default implementation that calls the other decode method?

Otherwise custom schemas may break if, in the future, we rely more on this method.

Contributor Author

A custom schema will not break: MessageImpl already handles this and calls the default decode. Calling the default decode from here would require converting the ByteBuf to byte[], which should be handled at the top level, and in this case we handle it in MessageImpl.

Contributor

If you look at this from the Pulsar point of view, you are right.
But sometimes in your application or in a Pulsar IO Sink you have a Schema instance and you want to decode a given payload using that Schema.
In that case, if you see there is a decode(ByteBuf) method and you already have a ByteBuf, you are tempted to use this new method, but it is not implemented for most of the system Schemas and it is definitely not implemented for existing custom Schemas.

This is why I believe we should provide a default implementation that relies on the fact that the decode(byte[]...) method is always implemented.

The default implementation is straightforward and I believe it is worth adding.
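
A minimal sketch of the kind of default the reviewer is describing (not the actual patch; the PR later settled on java.nio.ByteBuffer rather than ByteBuf for the public Schema API, so the sketch uses ByteBuffer):

/**
 * Decode a ByteBuffer into an object, delegating to the byte[] overload,
 * which every Schema implementation already provides.
 */
default T decode(ByteBuffer data, byte[] schemaVersion) {
    if (data == null) {
        return null;
    }
    // Copy the readable bytes out and reuse decode(byte[], byte[]).
    byte[] bytes = new byte[data.remaining()];
    data.duplicate().get(bytes);
    return decode(bytes, schemaVersion);
}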

@@ -171,6 +174,12 @@ private String interpretMessage(Message<?> message, boolean displayHex) throws I
} else if (value instanceof GenericRecord) {
Map<String, Object> asMap = genericRecordToMap((GenericRecord) value);
data = asMap.toString();
} else if (value instanceof ByteBuffer) {
ByteBuffer payload = ((ByteBuffer)value);
Contributor

What about using the internal array of the ByteBuffer when hasArray() is true, the offset is 0, and the length equals remaining()? I did the same in BytesKafkaSource; this way we can save a copy.
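
Roughly what that copy-avoiding check could look like (a sketch with a hypothetical helper, not the actual patch):

// Sketch: reuse the backing array of a heap ByteBuffer when it exactly
// covers the readable region; otherwise fall back to a copy.
private static byte[] toByteArray(ByteBuffer payload) {
    if (payload.hasArray()
            && payload.arrayOffset() == 0
            && payload.position() == 0
            && payload.array().length == payload.remaining()) {
        return payload.array();
    }
    byte[] copy = new byte[payload.remaining()];
    payload.duplicate().get(copy);
    return copy;
}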

@@ -66,6 +66,13 @@
*/
byte[] getData();

/**
* Get the message payload size in bytes.
*
Contributor

We should specify whether it's the compressed or uncompressed size.

* the schema version to decode the object. null indicates using latest version.
* @return the deserialized object
*/
default T decode(ByteBuf bytes, byte[] schemaVersion) {
Contributor

Since Schema is also part of the public API, we can't use Netty's ByteBuf. Instead we can use java.nio.ByteBuffer.

Contributor Author

I introduced ByteBuf earlier because AbstractSchema already has the same API, so it won't impact other existing Schema implementations, as they all extend AbstractSchema.
But I agree: Schema is part of the public API and it should not expose ByteBuf, so I changed it to ByteBuffer.

@@ -122,7 +123,9 @@
@Parameter(names = { "-st", "--schema-type"}, description = "Set a schema type on the consumer, it can be 'bytes' or 'auto_consume'")
private String schematype = "bytes";


@Parameter(names = { "-pm", "--pool-messages" }, description = "Use the pooled message")
private boolean poolMessages = false;
Contributor

I'd say not to make it configurable for the tool. Just enable it always.

Contributor Author

I changed the default value to true. Let's keep this configuration for a release, because on a client host with little memory users might prefer heap memory, since GC can help free some space when the internal queue size is not tuned correctly and all the memory is used.
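
For illustration, with the boolean parameter as shown in the diff above, pooled messages would be switched on for the tool by passing the flag (hypothetical topic and subscription names):

bin/pulsar-client consume my-topic -s my-sub -pm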

@@ -167,6 +172,9 @@

@Parameter(names = {"--batch-index-ack" }, description = "Enable or disable the batch index acknowledgment")
public boolean batchIndexAck = false;

@Parameter(names = { "-pm", "--pool-messages" }, description = "Use the pooled message")
private boolean poolMessages = false;
Contributor

Again, for tools, I'd keep it always enabled.

if (null == byteBuf) {
return null;
} else if(byteBuf.isDirect()){
return byteBuf.nioBuffer();
Contributor

We can return the NIO buffer even if it's on the heap
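
That is, the conversion could collapse to something like this (sketch; nioBuffer() works for heap ByteBufs as well as direct ones):

if (byteBuf == null) {
    return null;
}
// No isDirect() branch needed: nioBuffer() wraps heap buffers too.
return byteBuf.nioBuffer();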

Comment on lines 138 to 139
msg.payload = payload;
payload.retain();
Contributor

Suggested change
msg.payload = payload;
payload.retain();
msg.payload = payload.retain();

msg.cnx = cnx;
msg.redeliveryCount = redeliveryCount;

msg.poolMessage = poolMessage;
Contributor

This portion is being repeated from the other init() method. Is there a way to reuse that?
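
One possible shape for the reuse (a sketch with a hypothetical private helper; field names taken from the snippet above):

// Hypothetical shared helper that both init() overloads could call.
private static void initCommon(MessageImpl<?> msg, ClientCnx cnx,
                               int redeliveryCount, boolean poolMessage) {
    msg.cnx = cnx;
    msg.redeliveryCount = redeliveryCount;
    msg.poolMessage = poolMessage;
}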

@@ -310,6 +378,11 @@ public boolean publishedEarlierThan(long timestamp) {
if (msgMetadata.isNullValue()) {
return null;
}
if (poolMessage) {
Contributor

Why do we need the special case here?

if (msgMetadata.isNullValue()) {
return 0;
}
return poolMessage ? payload.readableBytes() : getData().length;
Contributor

Shouldn't payload.readableBytes() work in both cases?
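
That is, the question is whether the pooled/non-pooled branch can simply collapse to (sketch):

if (msgMetadata.isNullValue()) {
    return 0;
}
// If the payload ByteBuf is retained in both modes, readableBytes() covers both.
return payload.readableBytes();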

Contributor

@eolivelli left a comment

LGTM

great work

@rdhabalia
Contributor Author

@eolivelli @merlimat addressed the feedback.

@eolivelli
Contributor

I have already approved.
Awesome work
