
Publish rate limit on broker to avoid OOM #5513

Closed
jiazhai opened this issue Oct 31, 2019 · 16 comments · Fixed by #5710
@jiazhai
Member

jiazhai commented Oct 31, 2019

Out-of-memory errors are seen frequently when clients produce messages too quickly and use up all the direct memory.
It happens like this:

  1. The client sends messages to the broker;
  2. The broker calls the BookKeeper client to write the messages to the bookies, and holds the direct memory until the writes succeed.

If the direct memory in step 2 is not released in time, the broker eventually runs out of direct memory (OOM).

It would be great to have a way to detect memory pressure and apply a publish rate limit when memory is running low; once memory becomes available again, the rate limit would be lifted.
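For illustration, a minimal sketch of the kind of memory-pressure check this proposal implies, assuming Netty's direct-memory accounting is available; the class name and the 80% threshold are made up for this example, and the result would be used to pause and later resume publishes:

```java
import io.netty.util.internal.PlatformDependent;

// Hypothetical helper, not part of Pulsar: reports whether Netty direct memory
// usage has crossed an assumed threshold.
public final class DirectMemoryPressure {

    private static final double THRESHOLD = 0.8; // assumed: throttle above 80% usage

    private DirectMemoryPressure() {}

    public static boolean underPressure() {
        long max = PlatformDependent.maxDirectMemory();
        long used = PlatformDependent.usedDirectMemory();
        if (max <= 0 || used < 0) {
            // Limit unknown or usage not tracked in this configuration; report no pressure.
            return false;
        }
        return (double) used / max >= THRESHOLD;
    }
}
```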

@jiazhai jiazhai added the type/feature The PR added a new feature or issue requested a new feature label Oct 31, 2019
@jiazhai jiazhai added this to the 2.5.0 milestone Oct 31, 2019
@merlimat
Contributor

There's already a per-connection limit on the maximum outstanding entries between the broker and the bookies. Do you have a precise scenario where this happens?

@rdhabalia
Contributor

@jiazhai have you started seeing the OOM recently in 2.5, or was it there before 2.5?
Are you proposing a publish rate limit similar to the one implemented in #3985?

@sijie
Member

sijie commented Nov 4, 2019

@rdhabalia

We have users doing load tests against the Pulsar broker. The behavior we have observed is as follows:

The disks are overwhelmed, the entries queue up on the broker side, and as the load test continues the broker goes OOM.

The tests were done on 2.4.2. They are upgrading the cluster to 2.5.0 and will run the tests again.

Regarding the feature you introduced in #3985, we have asked them to try it out. However, they can't really predict the traffic and set a good publish rate in advance. Hence we are looking for a better solution at the broker level. We are thinking of reusing the mechanism you introduced in #3985: the idea is to disable autoRead when the number of pending requests exceeds a threshold or the direct memory usage exceeds a threshold.
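A rough sketch of the disable/re-enable autoRead pattern described above (illustrative only, not the actual ServerCnx or #3985 code; the class name, method names, and threshold are assumptions, and every inbound message is treated as a publish for simplicity):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.concurrent.atomic.AtomicInteger;

public class PublishBackpressureHandler extends ChannelInboundHandlerAdapter {

    private static final int MAX_PENDING = 1000;   // assumed threshold
    private final AtomicInteger pendingPublishes = new AtomicInteger();

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        if (pendingPublishes.incrementAndGet() >= MAX_PENDING) {
            // Stop reading from this socket; TCP flow control then pushes back on the producer.
            ctx.channel().config().setAutoRead(false);
        }
        super.channelRead(ctx, msg);
    }

    // To be called when the corresponding bookie write completes (wiring omitted).
    public void onPublishCompleted(ChannelHandlerContext ctx) {
        if (pendingPublishes.decrementAndGet() < MAX_PENDING
                && !ctx.channel().config().isAutoRead()) {
            ctx.channel().config().setAutoRead(true);
        }
    }
}
```

The same toggle could equally be driven by a direct-memory check like the one sketched earlier.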

@jiazhai
Member Author

jiazhai commented Nov 4, 2019

Hi @rdhabalia @merlimat, thanks for your comments, and thanks @rdhabalia for the great feature in #3985.
Recently a user has been trying to use Pulsar together with Flink, producing a significant volume of messages, and they hit direct-memory OOM very often.
#3985 and the setting that Matteo mentioned may solve part of the issue, but the OOM in the broker may still occur.
I think we could leverage part of the change in #3985 and add another broker-scope config to do the publish rate limiting.

@rdhabalia
Contributor

the setting that Matteo mentioned may solve part of the issue, but the OOM in the broker may still occur.

Is it possible for you to reproduce the OOM with the master code-base? Just make sure the broker has the change from #3985; you don't have to enable publish-rate limiting. I know disabling auto-read was not always working with ByteToMessageDecoder for this reason, and I think that's taken care of in #3985.

So if you are still able to reproduce the OOM, can you please provide the steps to reproduce and the error log?

@sijie
Member

sijie commented Nov 20, 2019

@rdhabalia

We are able to reproduce the OOM problem with the master code-base (including your code change and without enabling publish-rate limiting).

The steps to reproduce:

  1. Set up a BookKeeper cluster with 10 nodes. Each node has one SSD disk for both the journal and the ledger directory.
  2. Set up a broker cluster with 10 nodes.
  3. Create a topic with 20 partitions.
  4. Launch a Flink job with a data-generator source (without throttling) and a Flink Pulsar sink (https://github.com/streamnative/pulsar-flink), with parallelism == 20. The producer setting in the Pulsar sink is the default producer setting.
  5. Run the Flink job for a while; you can observe the OOM very quickly.
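For reference, the load in step 4 is roughly equivalent to running one unthrottled producer loop per sink subtask (20 in this test). A minimal stand-alone sketch, with placeholder service URL, topic, and payload size, and the producer left on its defaults as in the test:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

// Rough stand-in for one subtask of the unthrottled data-generator + Pulsar sink.
public class UnthrottledProducerLoop {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/load-test")
                .create();
        byte[] payload = new byte[1024];
        while (true) {
            // Fire-and-forget async sends with no pacing, like the generator without throttling.
            producer.sendAsync(payload);
        }
    }
}
```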

The heap dump of the broker shows a lot of pending entries, because the Flink job is producing more than the BookKeeper cluster can accept; all the entries accumulate at the broker and cause it to crash with OOM.

The expectation is that the broker should degrade gracefully when it reaches its capacity limit and apply backpressure to the clients. The broker shouldn't crash due to OOM.

If we don't provide the capability Jia proposed in #5710, Pulsar can't be used for high-volume ingestion workloads.

@rdhabalia
Contributor

@merlimat
There's already a per-connection limit on the maximum outstanding entries between the broker and the bookies. Do you have a precise scenario where this happens?

@sijie
create a topic with 20 partitions. with parallelism == 20. The producer setting in the pulsar sink is the default producer setting.

I gave it a try with a similar setup and somehow I couldn't reproduce it. I think the reason the broker doesn't go OOM is what @merlimat mentioned: the broker restricts the max pending publish requests per connection.
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L130

Also, introducing a counter across all topics for throttling could become a bottleneck on the publish path for all topics, so this feature might not be recommended for most users.

So I would recommend depending on maxPendingRequestPerConnection rather than adding more complexity, and if that's not working then it's worth investigating why disabling the channel still causes OOM. I have also created #5742, which allows users to configure the max pending requests per connection if needed.

@sijie
Member

sijie commented Nov 25, 2019

@rdhabalia

I gave it a try with a similar setup and somehow I couldn't reproduce it.

I am not sure how you set it up. Are you running the Flink connector, or are you simulating it? Did you run with enough parallelism to stress-test the cluster?

OOM is a common occurrence during our tests.

The broker restricts the max pending publish requests per connection.

But the broker still faces OOM when the number of connections increases, no?

Also, introducing a counter across all topics for throttling could become a bottleneck on the publish path for all topics, so this feature might not be recommended for most users.

The feature we are adding is controlled by a flag. People need to pre-configure the rate that a broker can accept. That rate is typically aligned with your NIC configuration, e.g. 80% of your NIC bandwidth (for a 10 Gbit/s NIC, roughly 1 GB/s). We might make it smarter by automatically adjusting the rate based on NIC bandwidth and memory usage, but we will always make it easy to turn on/off with a flag.

What is the side effect of adding this rate limiter at the broker level?

depending on maxPendingRequestPerConnection rather than adding more complexity

We can't really depend on maxPendingRequestPerConnection, because when you have a lot of concurrent clients (connections), that setting doesn't work. Also, using the number of requests doesn't give an accurate estimate, because the size of each request varies between applications: for example, 1,000 pending 1 KB requests hold about 1 MB, while 1,000 pending 1 MB batched requests hold about 1 GB. A broker-level mechanism based on bytes rather than requests is much more robust than the other approaches.

You mentioned: "I know disabling auto-read was not always working with ByteToMessageDecoder for this reason, and I think that's taken care of in #3985." But the implementation Jia proposed uses your implementation from #3985. If the problem is taken care of by #3985, then there shouldn't be a problem in #5710; if it is not taken care of by #3985, then it is a problem for both the namespace-level rate limiter and the broker-level rate limiter, so I don't see why it would be a problem for adding a broker-level rate limiter but not for the namespace-level one.

@rdhabalia
Contributor

I am not sure how you set it up. Are you running the Flink connector?

No, I have a perf broker setup and am publishing messages with multiple perf-producer processes.

My only point is to figure out the root cause of the OOM and address it. If the broker is going OOM with 20 topics and 20-400 producers, then do we think auto-read is not working as expected, and that the broker keeps accepting messages after auto-read is disabled, holding them in memory and causing the OOM?

@sijie
Member

sijie commented Nov 25, 2019

My only point is to figure out the root cause of the OOM and address it.
do we think auto-read is not working as expected

Yes, we think auto-read is not working as expected, because maxPendingRequestPerConnection is not a useful mechanism: 1) it is not a broker-level mechanism; it only limits per connection, so as the number of connections to a broker grows, the effective limit changes; 2) it is not a bytes-based mechanism; it limits the number of requests, but request size changes when the batching behavior changes on the client side, so the number of bytes pending on the channel is non-deterministic; 3) it is almost impossible to pick a value for maxPendingRequestPerConnection under real production traffic (think about the peak traffic on China's shopping festival day).

Our requirement is dead simple: I have a broker, and no matter how many clients send requests and how the clients batch them, the broker should keep working normally and apply backpressure to the clients when the capacity the broker offers is exceeded. The ideal solution is to disable auto-read automatically when resource usage (memory, CPU, or network) exceeds a threshold. However, achieving a perfect solution like that takes time and is usually complicated. The closest solution we can provide is a broker-level rate limiter: 1) we can limit traffic based on an aggregate throughput (bytes/second); 2) the mechanism is already available from #3985; we just piggyback on it.
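A minimal sketch of what such a broker-level limiter could look like, assuming it piggybacks on the autoRead toggle; none of the names here come from the actual #5710 implementation, and the once-per-tick check is a simplification:

```java
import io.netty.channel.Channel;

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Illustrative broker-wide byte-rate limiter: producer connections record published
// bytes with a cheap add(); a periodic tick aggregates the count and toggles autoRead
// on all registered connections when the per-interval budget is exceeded.
public class BrokerPublishRateLimiter {

    private final long maxBytesPerInterval;
    private final LongAdder bytesInInterval = new LongAdder();
    private final Set<Channel> producerChannels = ConcurrentHashMap.newKeySet();
    private volatile boolean throttled = false;

    public BrokerPublishRateLimiter(long maxBytesPerInterval, ScheduledExecutorService scheduler) {
        this.maxBytesPerInterval = maxBytesPerInterval;
        scheduler.scheduleAtFixedRate(this::onTick, 1, 1, TimeUnit.SECONDS);
    }

    public void register(Channel ch)   { producerChannels.add(ch); }
    public void unregister(Channel ch) { producerChannels.remove(ch); }

    // Hot path: called for every incoming publish with the message size.
    public void recordPublish(long numBytes) {
        bytesInInterval.add(numBytes);
    }

    // Slow path: runs once per tick, so enforcement lags by up to one interval.
    private void onTick() {
        boolean overBudget = bytesInInterval.sumThenReset() > maxBytesPerInterval;
        if (overBudget != throttled) {
            throttled = overBudget;
            boolean autoRead = !overBudget;
            producerChannels.forEach(ch -> ch.config().setAutoRead(autoRead));
        }
    }
}
```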

@sijie
Member

sijie commented Nov 25, 2019

@rdhabalia putting the OOM question aside, what are the concerns about adding a broker-level rate limiter as a feature? Looking at the features Pulsar already has, it usually offers both broker-level and namespace-level settings. As a feature, doesn't it make sense to also have a broker-level rate limiter in addition to the namespace-level one?

@rdhabalia
Contributor

putting the OOM question aside, what are the concerns about adding a broker-level rate limiter

I wanted to figure out the root cause of the OOM, as that is the issue we are mainly trying to target. I had verified the auto-read behavior while implementing #3985, so I was curious about the actual issue. Also, all threads are going to do the rate limiting using one counter, and that could be a bottleneck as well, which one would want to avoid.

@sijie
Member

sijie commented Nov 26, 2019

I wanted to figure out the root cause of the OOM, as that is the issue we are mainly trying to target

What else can we provide to help you figure out the root cause of the OOM?

I had verified the auto-read behavior while implementing #3985

We also verified that #3985 works as expected. As I explained, the auto-read mechanism is not the problem. The problem is that we need a broker-level metric/mechanism to decide when to disable auto-read, rather than a namespace-based or connection-based one. That's why we need a broker-level rate limiter.

also, all threads are going to do the rate limiting using one counter, and that could be a bottleneck as well, which one would want to avoid.

  1. This is a feature that users can turn on or off. If people don't need it, they don't have to turn on broker-level rate limiting.
  2. If the current counter is a bottleneck, we can look for a better counting approach, e.g. an improved counter implementation.

The question here is more about: do we need a broker-level mechanism to meet the requirements of running Pulsar under high load? If we do, then why not adopt the rate-limiter approach? If we don't, then what are the alternatives? (I have tried to explain above why the existing approaches don't meet our requirements.)

@sijie
Member

sijie commented Nov 26, 2019

all threads are going to do the rate limiting using one counter, and that could be a bottleneck as well, which one would want to avoid.

The current implementation of RateLimiter uses a LongAdder to count bytes and messages. A LongAdder maintains multiple cells for its counter; they are only aggregated when sum is called, and sum is called by the monitor, which runs periodically in a background thread. I am not sure how this would be a bottleneck.

LongAdder is also used for other metrics. Shouldn't they all be a concern as well?
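A tiny, self-contained illustration of the pattern described above, with made-up thread counts and sizes: many publisher threads call add() on the striped LongAdder, and only the periodic monitor aggregates the cells (here with sumThenReset):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class LongAdderCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder publishedBytes = new LongAdder();

        // Hot path: eight threads record "published" bytes; add() spreads updates across cells.
        ExecutorService publishers = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            publishers.submit(() -> {
                for (int n = 0; n < 1_000_000; n++) {
                    publishedBytes.add(1024);
                }
            });
        }

        // Monitor: only this periodic task aggregates the cells.
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(
                () -> System.out.println("bytes this interval: " + publishedBytes.sumThenReset()),
                1, 1, TimeUnit.SECONDS);

        publishers.shutdown();
        publishers.awaitTermination(30, TimeUnit.SECONDS);
        monitor.shutdownNow();
    }
}
```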

@rdhabalia
Contributor

(I have tried to explain above why the existing approaches don't meet our requirements.)

Sure. If you feel this will be useful for your use case and for your users, then we can add it. 👍

@jiazhai
Member Author

jiazhai commented Nov 26, 2019

@rdhabalia @sijie thanks for your comments, and many thanks for @rdhabalia's PR #3985; it worked well in our tests.
With the steps @sijie mentioned, the issue can be reproduced easily on the user side, and users have complained a lot about this OOM. This should be a good way to solve it.

jiazhai added a commit that referenced this issue Nov 26, 2019
Fixes #5513 

### Motivation

Through #3985, users can set a publish rate for each topic, but the number of topics per broker is not limited. So there are cases where many topics are served by the same broker, and if each topic receives too many messages, the messages cannot be sent to BookKeeper in time; they are held in the broker's direct memory and eventually cause the broker to run out of direct memory.

### Modifications

- Add a broker-level publish rate limit based on #3985.
- Add unit tests.

### Verifying this change
Unit tests passed.