Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Consumer thread safe and lock-free #10352

Merged
merged 14 commits into from
May 5, 2021
Merged

Conversation

315157973
Copy link
Contributor

Motivation

Lock-free solution for #10240

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way we are limiting the number of concurrent reads on the client, by the size of the executor.

@315157973
Copy link
Contributor Author

315157973 commented Apr 25, 2021

The performance test results are in two folders 2.6.1 and 2.8.0
https://github.com/315157973/openmessaging-benchmark/tree/executor/bin
Now the dependency in pom is 2.8.0, if you want to switch to 2.6.1, you only need to change the dependency to 2.6.1 version of pulsar-client-all.

Test command: bin/benchmark --drivers driver-pulsar/pulsar.yaml workloads/max-rate-1-topic-20-partitions-20p-20c-1kb.yaml

5 IO Thread
It seems that the performance has been improved by 50%, which is similar to the previous test results

client 2.6.1 :
image

client:2.8.0-snapshot (use executor)
image

@315157973
Copy link
Contributor Author

Wow, all unit tests are done at once

@315157973
Copy link
Contributor Author

I will re-trigger CI several times to see if there will be some occasional problems

@315157973
Copy link
Contributor Author

In the case of adding lock, there are still the following flaky tests
https://github.com/apache/pulsar/pull/10240/checks?check_run_id=2435235663#step:9:783
image

In the case of adding thread pool
https://github.com/apache/pulsar/pull/10352/checks?check_run_id=2440197210#step:9:782
image

Therefore, the performance of adding a thread pool and adding a lock is the same. I haven't seen other flaky errors when using the thread pool.

# Conflicts:
#	pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
@315157973 315157973 marked this pull request as ready for review April 28, 2021 13:13
@315157973
Copy link
Contributor Author

@merlimat @sijie @rdhabalia @codelipenghui @eolivelli PTAL, thanks

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall is good to me.
I left a comment.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MarvinCai
Copy link
Contributor

nice

@sijie sijie added this to the 2.8.0 milestone May 4, 2021
}
singleMessagePayload.release();
singleMessagePayload.release();
tryTriggerListener();
Copy link
Contributor

@linlinnn linlinnn May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall is good, just left a comment.
I think we don't need move tryTriggerListener into cycle, because we cannot use Listener while calling receive or receiveAsync.

Copy link
Contributor Author

@315157973 315157973 May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior here is consistent, the previous logic is

if(){
  // do some thing
}else{
  // do some thing
}
triggerListener

I just put the triggerListener into the if and else respectively.

If I don’t do this, triggerListener will execute in parallel with the if-else logic

Copy link
Contributor

@linlinnn linlinnn May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You put triggerListener into the cycle
Maybe have a bad influence when the batchSize is big
we enqueue all message and trigger once before, now we enqueue one message and trigger one time.

What about just

if(){
  // do some thing
}else{
  // do some thing
}
internalPinnedExecutor.execute(() -> {
      triggerListener();
});

@eolivelli
Copy link
Contributor

waiting for @linlinnn approval before merging

Copy link
Contributor

@linlinnn linlinnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@eolivelli
Copy link
Contributor

thank you @linlinnn for your comment.

let's wait for CI to complete

@sijie sijie merged commit def1932 into apache:master May 5, 2021
@315157973 315157973 deleted the lock-free branch May 11, 2021 09:02
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request May 11, 2021
codelipenghui pushed a commit that referenced this pull request Jul 1, 2021
Lock-free solution for #10240

(cherry picked from commit def1932)
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request Dec 2, 2021
Lock-free solution for apache#10240

(cherry picked from commit def1932)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Dec 2, 2021
Lock-free solution for apache#10240

(cherry picked from commit def1932)
codelipenghui added a commit to codelipenghui/incubator-pulsar that referenced this pull request Jun 21, 2022
…ving pending batch receives requests

### Motivation

The consumer will apply the default batch receive policy even if the user will not use the batch receive API.

https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/BatchReceivePolicy.java#L60-L61

This will consume lots of CPU if the client have many consumers (100k consumers)

The Pulsar perf tool can also reproduce the problem if run the test with many consumers

### Modification

If there is no pending batch receive operation for a consumer, no need to trigger the
batch timeout task periodically. We can only start the timeout check after adding batch
receive request to pending request queue.

Remove the lock in MultiTopicsConsumerImpl as apache#10352 does

### Verification

Added new test to verify the batch receive timeout task will not start if no batch
receive request
codelipenghui added a commit that referenced this pull request Jun 23, 2022
…ving pending batch receives requests (#16160)

### Motivation

The consumer will apply the default batch receive policy even if the user will not use the batch receive API.

https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/BatchReceivePolicy.java#L60-L61

This will consume lots of CPU if the client have many consumers (100k consumers)

The Pulsar perf tool can also reproduce the problem if run the test with many consumers

### Modification

If there is no pending batch receive operation for a consumer, no need to trigger the
batch timeout task periodically. We can only start the timeout check after adding batch
receive request to pending request queue.

Remove the lock in MultiTopicsConsumerImpl as #10352 does

### Verification

Added new test to verify the batch receive timeout task will not start if no batch
receive request
codelipenghui added a commit that referenced this pull request Jun 28, 2022
…ving pending batch receives requests (#16160)

The consumer will apply the default batch receive policy even if the user will not use the batch receive API.

https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/BatchReceivePolicy.java#L60-L61

This will consume lots of CPU if the client have many consumers (100k consumers)

The Pulsar perf tool can also reproduce the problem if run the test with many consumers

If there is no pending batch receive operation for a consumer, no need to trigger the
batch timeout task periodically. We can only start the timeout check after adding batch
receive request to pending request queue.

Remove the lock in MultiTopicsConsumerImpl as #10352 does

Added new test to verify the batch receive timeout task will not start if no batch
receive request

(cherry picked from commit a0ccdc9)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jul 4, 2022
…ving pending batch receives requests (apache#16160)

The consumer will apply the default batch receive policy even if the user will not use the batch receive API.

https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/BatchReceivePolicy.java#L60-L61

This will consume lots of CPU if the client have many consumers (100k consumers)

The Pulsar perf tool can also reproduce the problem if run the test with many consumers

If there is no pending batch receive operation for a consumer, no need to trigger the
batch timeout task periodically. We can only start the timeout check after adding batch
receive request to pending request queue.

Remove the lock in MultiTopicsConsumerImpl as apache#10352 does

Added new test to verify the batch receive timeout task will not start if no batch
receive request

(cherry picked from commit a0ccdc9)
(cherry picked from commit 6ed4ed0)
congbobo184 pushed a commit that referenced this pull request Nov 10, 2022
…ving pending batch receives requests (#16160)

The consumer will apply the default batch receive policy even if the user will not use the batch receive API.

https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/BatchReceivePolicy.java#L60-L61

This will consume lots of CPU if the client have many consumers (100k consumers)

The Pulsar perf tool can also reproduce the problem if run the test with many consumers

If there is no pending batch receive operation for a consumer, no need to trigger the
batch timeout task periodically. We can only start the timeout check after adding batch
receive request to pending request queue.

Remove the lock in MultiTopicsConsumerImpl as #10352 does

Added new test to verify the batch receive timeout task will not start if no batch
receive request

(cherry picked from commit a0ccdc9)
congbobo184 pushed a commit that referenced this pull request Nov 26, 2022
…ving pending batch receives requests (#16160)

The consumer will apply the default batch receive policy even if the user will not use the batch receive API.

https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/BatchReceivePolicy.java#L60-L61

This will consume lots of CPU if the client have many consumers (100k consumers)

The Pulsar perf tool can also reproduce the problem if run the test with many consumers

If there is no pending batch receive operation for a consumer, no need to trigger the
batch timeout task periodically. We can only start the timeout check after adding batch
receive request to pending request queue.

Remove the lock in MultiTopicsConsumerImpl as #10352 does

Added new test to verify the batch receive timeout task will not start if no batch
receive request

(cherry picked from commit a0ccdc9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants