[broker] Optimize blocking backlogQuotaCheck to non-blocking in ServerCnx#handleProducer #12874

Merged
merged 4 commits into from
Jan 25, 2022

Conversation

Jason918
Copy link
Contributor

@Jason918 Jason918 commented Nov 18, 2021

Motivation

Currently, when the broker receives a "Producer" command, it checks whether the topic's backlog quota is exceeded ("isBacklogQuotaExceeded").
PersistentTopic#isBacklogQuotaExceeded calls isTimeBacklogExceeded, which becomes a blocking operation when "isPreciseTimeBasedBacklogQuotaCheck" is set to true.

Blocking operations in Pulsar IO threads may hurt broker performance. This PR changes this blocking procedure to async mode.

Modifications

Add "CompletableFuture checkTimeBacklogExceeded()" to PersistentTopic for the async check procedure.
Update the corresponding method calls in ServerCnx#handleProducer to async mode.
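
For readers unfamiliar with the pattern, here is a minimal sketch of what an async time-based check could look like. It is not the code from this PR: `readOldestEntryTimestampAsync` is a hypothetical stand-in for the asynchronous ledger read, and the `Boolean` result type is an assumption inferred from how the method is used in the review snippets.

```java
import java.util.concurrent.CompletableFuture;

public class BacklogCheckSketch {
    /** Hypothetical stand-in for an async ledger read; not a Pulsar API. */
    public static CompletableFuture<Long> readOldestEntryTimestampAsync() {
        // Pretend the oldest unacknowledged entry was published 10 seconds ago.
        return CompletableFuture.supplyAsync(() -> System.currentTimeMillis() - 10_000L);
    }

    /** Returns immediately; the comparison runs once the read completes. */
    public static CompletableFuture<Boolean> checkTimeBacklogExceeded(long limitMillis) {
        return readOldestEntryTimestampAsync()
                .thenApply(ts -> System.currentTimeMillis() - ts > limitMillis);
    }

    public static void main(String[] args) {
        checkTimeBacklogExceeded(5_000L)
                .thenAccept(exceeded -> System.out.println("exceeded=" + exceeded))
                .join(); // join only keeps the demo JVM alive; an IO thread would not block here
    }
}
```

The caller chains its follow-up work onto the returned future instead of calling `get()`, so the thread that handles the command is free while the read is in flight.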

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as org.apache.pulsar.broker.admin.TopicPoliciesTest and org.apache.pulsar.broker.service.BacklogQuotaManagerTest

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • no-need-doc

Code optimization only.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 18, 2021
@Jason918 Jason918 force-pushed the fix-blocking-ops-in-topic-creation branch from 9fad6cc to 1adf3fc Compare November 18, 2021 13:57
@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918
Copy link
Contributor Author

@codelipenghui @hangc0276 @congbobo184 Please help take a look.

@codelipenghui codelipenghui added this to the 2.10.0 milestone Nov 22, 2021
@codelipenghui
Copy link
Contributor

@Jason918 Do you want to resolve the conflicts?

@Jason918
Copy link
Contributor Author

@Jason918 Do you want to resolve the conflicts?

Resolved, sorry for missing this.

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918 Jason918 closed this Nov 23, 2021
@Jason918 Jason918 reopened this Nov 23, 2021
Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

This is a good change. I think we should probably discuss it on the mailing list before moving forward since it includes breaking changes to the Topic interface. @eolivelli, @codelipenghui, @hangc0276 - do you agree, or is it minor enough that we can move forward here?

topic.checkBacklogQuotaExceeded(producerName, BacklogQuotaType.message_age));
backlogQuotaCheckFuture.exceptionally(throwable -> {
//throwable should be CompletionException holding TopicBacklogQuotaExceededException
BrokerServiceException.TopicBacklogQuotaExceededException exception =
Copy link
Contributor

What happens if the throwable is something else?
We would get a ClassCastException or an NPE.
This will break the system.

I believe we should handle the case in which this expectation is not met
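
One way to address this concern is to unwrap the `CompletionException` and branch on the cause with `instanceof` rather than casting. The sketch below is illustrative only: `QuotaExceededException` is a hypothetical stand-in for `TopicBacklogQuotaExceededException`, and `classify` is not a method from the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class UnwrapSketch {
    /** Hypothetical stand-in for TopicBacklogQuotaExceededException. */
    public static class QuotaExceededException extends RuntimeException {
        public QuotaExceededException(String msg) { super(msg); }
    }

    /** Unwraps a CompletionException if present, then branches without an unchecked cast. */
    public static String classify(Throwable throwable) {
        Throwable cause = (throwable instanceof CompletionException && throwable.getCause() != null)
                ? throwable.getCause() : throwable;
        if (cause instanceof QuotaExceededException) {
            return "quota-exceeded: " + cause.getMessage();
        }
        // Anything else is reported instead of triggering a ClassCastException.
        return "unexpected: " + cause.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        CompletableFuture<Void> quota = new CompletableFuture<>();
        quota.completeExceptionally(new QuotaExceededException("backlog full"));
        // allOf wraps the component's failure in a CompletionException.
        CompletableFuture<Void> all = CompletableFuture.allOf(quota);
        System.out.println(all.handle((v, t) -> t == null ? "ok" : classify(t)).join());
    }
}
```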

Copy link
Contributor Author

Two situations:

  1. If preciseTimeBasedBacklogQuotaCheck is set to true (the default is false) and some other constraints are met, this is executed in the callback thread of ManagedLedgerImpl#asyncReadEntry(...).
  2. Otherwise, everything is executed in the original calling thread; nothing is async.

CompletableFuture<Void> backlogQuotaCheckFuture = CompletableFuture.allOf(
topic.checkBacklogQuotaExceeded(producerName, BacklogQuotaType.destination_storage),
topic.checkBacklogQuotaExceeded(producerName, BacklogQuotaType.message_age));
backlogQuotaCheckFuture.exceptionally(throwable -> {
Copy link
Contributor

Which thread will execute this code?

Copy link
Contributor Author

I am changing the outer thenAccept to thenCompose, so that the outer exceptionally will handle all unexpected exceptions.
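
The general difference between the two operators can be sketched as follows. This is a standalone illustration rather than the PR's code, and `failingCheck` is a hypothetical helper: a future created inside a `thenAccept` callback is discarded, so its failure never reaches the outer chain, whereas `thenCompose` flattens it into the chain.

```java
import java.util.concurrent.CompletableFuture;

public class ComposeSketch {
    /** Hypothetical check that always fails, standing in for a quota check. */
    public static CompletableFuture<Void> failingCheck() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        f.completeExceptionally(new IllegalStateException("quota exceeded"));
        return f;
    }

    /** thenAccept drops the inner future, so the outer stage completes normally. */
    public static String withThenAccept() {
        return CompletableFuture.completedFuture("topic")
                .thenAccept(t -> failingCheck())
                .handle((v, err) -> err == null ? "no error seen" : "error seen")
                .join();
    }

    /** thenCompose propagates the inner future's failure to the outer stage. */
    public static String withThenCompose() {
        return CompletableFuture.completedFuture("topic")
                .thenCompose(t -> failingCheck())
                .handle((v, err) -> err == null ? "no error seen" : "error seen")
                .join();
    }

    public static void main(String[] args) {
        System.out.println(withThenAccept());
        System.out.println(withThenCompose());
    }
}
```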


disableTcpNoDelayIfNeeded(topicName.toString(), producerName);
backlogQuotaCheckFuture.thenRun(() -> {
Copy link
Contributor

Which thread will execute this code?

The current thread (in case we reach here and the futures are already completed) or some other thread on completion of the one of the two futures above.

I am not sure we have control over what's happening here

Copy link
Contributor Author

The triggering thread is the same as for backlogQuotaCheckFuture.exceptionally above: either the current thread or the ReadEntry callback thread.
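
The two cases discussed here can be observed with a small standalone experiment. The sketch assumes nothing about Pulsar: the thread name `read-entry-callback` is made up for illustration. It relies on the usual behavior of non-async `CompletableFuture` methods, where a dependent action runs on the registering thread if the future is already complete, and otherwise on the thread that completes it.

```java
import java.util.concurrent.CompletableFuture;

public class CallbackThreadSketch {
    /** Future already complete: the callback runs on the thread that registers it. */
    public static String threadForCompleted() {
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        String[] seen = new String[1];
        done.thenRun(() -> seen[0] = Thread.currentThread().getName());
        return seen[0];
    }

    /** Future still pending: the callback runs on the thread that completes it. */
    public static String threadForPending() {
        CompletableFuture<Void> pending = new CompletableFuture<>();
        String[] seen = new String[1];
        pending.thenRun(() -> seen[0] = Thread.currentThread().getName());
        Thread completer = new Thread(() -> pending.complete(null), "read-entry-callback");
        completer.start();
        try {
            completer.join(); // join makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("completed: " + threadForCompleted());
        System.out.println("pending:   " + threadForPending());
    }
}
```

When a specific executor is required, the `thenRunAsync(Runnable, Executor)` variant pins the callback to it regardless of which thread completes the future.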

@@ -2518,17 +2528,26 @@ public boolean isSizeBacklogExceeded() {
* @return determine if backlog quota enforcement needs to be done for topic based on time limit
*/
public boolean isTimeBacklogExceeded() {
try {
return checkTimeBacklogExceeded().get();
} catch (Throwable e) {
Copy link
Contributor

This is a code smell

Copy link
Contributor Author

This just keeps the behavior of isTimeBacklogExceeded the same as before. The method is also called elsewhere.

Copy link
Contributor

The main problem with this approach is that it completely hides from the caller the fact that the method blocks internally. We have many other examples of this in the code base.

Copy link
Contributor Author

I get it. Thx.
I will remove this isTimeBacklogExceeded.

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Jason918 Jason918 force-pushed the fix-blocking-ops-in-topic-creation branch from da61c1b to e1196d1 Compare November 30, 2021 06:47
@Jason918
Copy link
Contributor Author

Jason918 commented Dec 6, 2021

@eolivelli PTAL

CompletableFuture<Void> backlogQuotaCheckFuture = CompletableFuture.allOf(
topic.checkBacklogQuotaExceeded(producerName, BacklogQuotaType.destination_storage),
topic.checkBacklogQuotaExceeded(producerName, BacklogQuotaType.message_age));
backlogQuotaCheckFuture.exceptionally(throwable -> {
Copy link
Contributor

thenCompose returns the backlogQuotaCheckFuture to the next stage. Would it be better to handle the TopicBacklogQuotaExceededException here: https://github.com/apache/pulsar/pull/12874/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1262 ?

Copy link
Contributor Author

@codelipenghui Moved TopicBacklogQuotaExceededException handling outside. PTAL.

@codelipenghui
Copy link
Contributor

@eolivelli Please help review this PR again

@Jason918 Jason918 force-pushed the fix-blocking-ops-in-topic-creation branch from e1196d1 to 1b8ea00 Compare January 18, 2022 09:10
@Jason918
Copy link
Contributor Author

/pulsarbot run-failure-checks

@codelipenghui
Copy link
Contributor

@eolivelli Please help review this PR again

Copy link
Contributor

@eolivelli eolivelli left a comment

LGTM

@eolivelli eolivelli merged commit 8c8738f into apache:master Jan 25, 2022
codelipenghui pushed a commit that referenced this pull request Jan 28, 2022
### Motivation

We should only send the error response to the client when the code is able to complete the `producerFuture`. This logic is described here:

https://github.com/apache/pulsar/blob/2285d02aa9957af7877b9d3d3c628a750d813ca7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1286-L1293

Edit: in a previous version of this motivation section, I attributed the current behavior to #12874. That PR did not introduce this behavior, though.

### Modifications

* Move the response to the client into a conditional block that only runs when this section of the code is able to complete the future.
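
A minimal sketch of this kind of guard, assuming the gate is the boolean returned by `CompletableFuture#completeExceptionally` (true only for the call that actually completes the future). The `failProducer` helper and the `outbox` list standing in for the client connection are hypothetical, not names from the Pulsar code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class CompleteOnceSketch {
    /**
     * Records an error response only if this call is the one that completed the future.
     * The outbox list is a hypothetical stand-in for writing to the client connection.
     */
    public static void failProducer(CompletableFuture<String> producerFuture,
                                    Throwable cause, List<String> outbox) {
        if (producerFuture.completeExceptionally(cause)) {
            outbox.add("error: " + cause.getMessage());
        }
        // Otherwise another code path already completed the future and owns the response.
    }

    public static void main(String[] args) {
        List<String> outbox = new ArrayList<>();
        CompletableFuture<String> producerFuture = new CompletableFuture<>();
        failProducer(producerFuture, new IllegalStateException("first"), outbox);
        failProducer(producerFuture, new IllegalStateException("second"), outbox);
        System.out.println(outbox); // only the first failure produced a response
    }
}
```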
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
…he#13949)
Labels
doc-not-needed Your PR changes do not impact docs
7 participants