[pulsar-client] Remove UUID generation on sending message #7705

Merged (2 commits) on Aug 2, 2020

rdhabalia
Contributor

Motivation

The Pulsar client producer needs a UUID only when sending chunked messages. Right now, the Pulsar client library generates a UUID for every message it sends, which is expensive and hurts publish performance. UUID generation should be cheap, and the producer should not generate a UUID at all for non-chunked messages.

Modification

  • Avoid UUID generation for non-chunked messages.
  • Generate the UUID for a chunked message from the globally unique producer name and the message sequenceId.
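
The two modifications above can be illustrated with a minimal sketch. It is not the code from this PR: the class `ChunkUuidSketch`, the method `chunkUuid`, and the `name-sequenceId` format are assumptions chosen for illustration. It only shows the idea of skipping the identifier for non-chunked messages and deriving it cheaply from values the producer already holds.

```java
// Hypothetical sketch; these names are illustrative and are not the Pulsar client API.
final class ChunkUuidSketch {

    // Returns an identifier for a chunked message, or null when the message
    // fits in a single chunk and therefore needs no identifier.
    static String chunkUuid(String producerName, long sequenceId, int totalChunks) {
        if (totalChunks <= 1) {
            return null; // non-chunked message: skip the work entirely
        }
        // Assumed format: plain concatenation instead of a random UUID.
        return producerName + "-" + sequenceId;
    }

    public static void main(String[] args) {
        System.out.println(chunkUuid("producer-a", 42L, 1)); // prints "null"
        System.out.println(chunkUuid("producer-a", 42L, 3)); // prints "producer-a-42"
    }
}
```

The identifier is only as unique as the (producer name, sequenceId) pair, which is why the choice of producer name matters for chunked messages.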

Note
This fix should be cherry-picked to 2.6.1 release as well.

@rdhabalia rdhabalia added this to the 2.7.0 milestone Jul 31, 2020
@rdhabalia rdhabalia self-assigned this Jul 31, 2020
// Globally unique producer name
/**
* Globally unique producer name generated by server. It should be the same as {@link #producerName} unless user
* configures {@link ProducerConfigurationData#setProducerName}.
Contributor

This will always be the same as producerName. If the client overrides it, the server will use that name; but if a producer with the same name is already connected, the server will return an error instead.

Contributor Author

I have changed it, but I don't think user_producer_id + sequence_id can serve as a UUID: user_producer_id can be reused by another producer later on, and sequence_id can start from 0 again. That produces a duplicate combination that is not unique across messages on the topic, which can be an issue for message chunking.
For example:
Producer P1 is given the name userProducer1 and publishes a message with sequenceId=0. That process then dies, and another process creates a producer with the same name userProducer1 and starts publishing from sequenceId=0, in which case the UUID will not be unique.
Therefore, always using serverProducerName guarantees a unique producer ID and UUID. Adding a new string serverProducerId will not create any overhead either. Can you please let me know your thoughts on it?
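
The scenario in this comment can be reproduced with a small sketch. The class `UuidCollisionSketch`, the helper `uuid`, and the `server-assigned-*` names are hypothetical and exist only for illustration. Whether server-generated names are in fact distinct across producer sessions is the comment's premise, not something this sketch demonstrates.

```java
// Hypothetical sketch of the collision described above; not the Pulsar client API.
final class UuidCollisionSketch {

    // Assumed identifier format: producer name joined with the sequence id.
    static String uuid(String producerName, long sequenceId) {
        return producerName + "-" + sequenceId;
    }

    public static void main(String[] args) {
        // First process: user-chosen name, first message.
        String first = uuid("userProducer1", 0L);
        // The process dies; a new process reuses the name and restarts at 0.
        String second = uuid("userProducer1", 0L);
        System.out.println(first.equals(second)); // prints "true": duplicate identifier

        // Assumption: each session gets a different server-generated name.
        String a = uuid("server-assigned-1", 0L);
        String b = uuid("server-assigned-2", 0L);
        System.out.println(a.equals(b)); // prints "false": identifiers stay distinct
    }
}
```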

Contributor Author

@merlimat did you get a chance to read my comment above before merging the PR?

@srkukarni srkukarni merged commit 094d90e into apache:master Aug 2, 2020
jerrypeng pushed a commit to jerrypeng/incubator-pulsar that referenced this pull request Aug 14, 2020
* [pulsar-client] Remove UUID generation on sending message

* fix prod name
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
* [pulsar-client] Remove UUID generation on sending message

* fix prod name
lbenc135 pushed a commit to lbenc135/pulsar that referenced this pull request Sep 5, 2020
* [pulsar-client] Remove UUID generation on sending message

* fix prod name