Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[client] Introduce batchingMaxBytes setting in pulsar producer #5045

Merged
merged 14 commits into from
Nov 15, 2019

Conversation

sijie
Copy link
Member

@sijie sijie commented Aug 26, 2019

Motivation

The message size can vary between applications. Using number of messages to estimate
the resources used for batching leads to unpredictability.

This pull request introduces a new setting batchingMaxBytes in pulsar producer.
It allows applications planning the resource usage for batching in a better way.

*Motivation*

The message size can vary between applications. Using number of messages to estimate
the resources used for batching leads to unpredictability.

This pull request introduces a new setting `batchingMaxBytes` in pulsar producer.
It allows applications planning the resource usage for batching in a better way.
@sijie sijie added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/client labels Aug 26, 2019
@sijie sijie added this to the 2.5.0 milestone Aug 26, 2019
@sijie sijie self-assigned this Aug 26, 2019
@sijie sijie requested a review from wolfstudy August 26, 2019 22:52
@@ -212,6 +212,12 @@ private ProducerBuilderImpl(PulsarClientImpl client, ProducerConfigurationData c
return this;
}

@Override
public ProducerBuilder<T> batchingMaxBytes(int batchingMaxBytes) {
conf.setBatchingMaxBytes(batchingMaxBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say to do validation check here for > 0 and < MAX_FRAME_SIZE (depending on the max frame size configured)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the check for >0 to allow users choosing the right batching strategy.

I will add the check for MAX_FRAME_SIZE.

@@ -83,5 +89,9 @@ public void setProducer(ProducerImpl<?> producer) {
.convertToWireProtocol(producer.getConfiguration().getCompressionType());
this.compressor = CompressionCodecProvider.getCompressionCodec(compressionType);
this.maxNumMessagesInBatch = producer.getConfiguration().getBatchingMaxMessages();
this.maxBytesInBatch = producer.getConfiguration().getBatchingMaxBytes();
if (this.maxBytesInBatch <= 0) {
this.maxBytesInBatch = MAX_MESSAGE_BATCH_SIZE_BYTES;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant MAX_MESSAGE_BATCH_SIZE_BYTES is set to 128 KB which is now different from the default max size (1MB).

In any case, I'd prefer the sanity check to be in the builder itself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't realize MAX_MESSAGE_BATCH_SIZE_BYTES is 128KB. will fix

@sijie
Copy link
Member Author

sijie commented Aug 26, 2019

@merlimat

I fixed the batch container check logic for handling max message size. although I didn't add any validations in producer configuration data. because:

  1. <=0 is used for disabling number-based or size-based batching strategy.
  2. it is hard to get the upper bound for max message size because it is a dynamic number. in that sense, if I configure the max message size to 10MB and maxBytes should be able to set to be more than 5MB but it should be capped to 10MB.

*/
ProducerBuilder<T> batchingMaxMessages(int batchMessagesMaxMessagesPerBatch);

/**
* Set the maximum number of bytes permitted in a batch. <i>default: 1MB</i>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Set the maximum number of bytes permitted in a batch. <i>default: 1MB</i>
* Set the maximum number of bytes permitted in a batch. <i>default: 128KB</i>

@@ -212,6 +212,12 @@ private ProducerBuilderImpl(PulsarClientImpl client, ProducerConfigurationData c
return this;
}

@Override
public ProducerBuilder<T> batchingMaxBytes(int batchingMaxBytes) {
conf.setBatchingMaxBytes(batchingMaxBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check for batchingMaxBytes <= ClientCnx.getMaxMessageSize().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do that. Because the max message size is only set when a client connects to a broker.

I explained that case in previous comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, max message size can be set at broker side.

protected int numMessagesInBatch = 0;
protected long currentBatchSizeBytes = 0;

protected static final int INITIAL_BATCH_BUFFER_SIZE = 1024;
protected static final int MAX_MESSAGE_BATCH_SIZE_BYTES = 128 * 1024;
protected static final int INITIAL_BATCH_BUFFER_SIZE = 128 * 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to keep INITIAL_BATCH_BUFFER_SIZE as 1KB, in key_based batcher properly create lot of ByteBuf, if the INITIAL_BATCH_BUFFER_SIZE is 128KB, more memory overhead may be incurred.

We can add the option for init batch buffer size in the future, so that user can decide according to their own situation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will probably just assign different values for different containers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, initially we were using the max batch size for this buffer, but it was using a huge amount of memory, especially when publishing on a partitioned topic.

The buffer size is self-adapting to the needed usage (eg. it will keep using the same buffer size), so I don't think we need to make it configurable.

@codelipenghui
Copy link
Contributor

run cpp tests
run java8 tests

@jiazhai
Copy link
Member

jiazhai commented Sep 9, 2019

run java8 tests

1 similar comment
@aahmed-se
Copy link
Contributor

run java8 tests

@sijie
Copy link
Member Author

sijie commented Sep 9, 2019

@codelipenghui @jiazhai @merlimat I have addressed all the comments. PTAL

@sijie
Copy link
Member Author

sijie commented Sep 16, 2019

retest this please

@sijie
Copy link
Member Author

sijie commented Sep 16, 2019

ping @jiazhai @codelipenghui @merlimat

// run cpp tests
// run java8 tests

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@codelipenghui
Copy link
Contributor

run cpp tests
run java8 tests

@sijie
Copy link
Member Author

sijie commented Sep 19, 2019

run java8 tests

@sijie
Copy link
Member Author

sijie commented Sep 22, 2019

run java8 tests

1 similar comment
@jiazhai
Copy link
Member

jiazhai commented Sep 28, 2019

run java8 tests

@jiazhai
Copy link
Member

jiazhai commented Oct 7, 2019

run java8 tests

1 similar comment
@jiazhai
Copy link
Member

jiazhai commented Oct 15, 2019

run java8 tests

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@codelipenghui
Copy link
Contributor

retest this please

@codelipenghui
Copy link
Contributor

@sijie Please take a look the tests.

@sijie
Copy link
Member Author

sijie commented Nov 12, 2019

run java8 tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/client type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants