
m broker <-> n threads #679 (Closed)

Conversation

@janmejay (Contributor) commented Jun 1, 2016

  • create a pre-configured number of broker threads (these threads do everything broker threads used to do in the 1:1 broker-to-thread model). The way they pace computation is different, though, because they are now multiplexing many brokers.
  • new entity called rd_kafka_broker_thread_t (rkbt), which gets brokers assigned in a way that tries to balance the number of brokers assigned per thread.
  • termination and freeing of rkb is performed on the rkbt.
  • rkbt has a separate rkb array owned by the thread, which is re-populated every time a broker is added or removed. Addition and removal take the rkbt broker-assignment lock, but the array itself is exclusively owned by the rkbt: the rkb pointers are copied into this array and then used lock-lessly.
  • introduces a new config parameter, 'broker.thread.count' (defaults to 1), which sets the number of threads the user intends to run for all broker-related work, allowing the user to configure n (thread count) independently of m (broker count).

Note: I have tried to fix Windows support in the transport (the WSAPoll call), but I'm not sure that is the right thing to do. That area needs a very close review.

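A rough sketch of the rkbt-owned broker array described above (hypothetical, heavily simplified types; the real rd_kafka_broker_thread_t and rd_kafka_broker_t carry far more state):

```c
/* Sketch only, not the actual librdkafka code: additions/removals take
 * the broker-assignment lock and repopulate the array, while the owning
 * broker thread iterates the array without locking. */
#include <assert.h>
#include <pthread.h>

#define MAX_BROKERS 64

typedef struct rkb_s { int id; } rkb_t;

typedef struct rkbt_s {
        pthread_mutex_t assign_lock; /* guards assignment changes */
        int rkb_cnt;
        rkb_t *rkbs[MAX_BROKERS];    /* exclusively owned by this thread */
} rkbt_t;

/* Called from any thread: add a broker under the assignment lock and
 * repopulate the thread-owned snapshot array. */
static void rkbt_broker_add(rkbt_t *rkbt, rkb_t *rkb) {
        pthread_mutex_lock(&rkbt->assign_lock);
        rkbt->rkbs[rkbt->rkb_cnt++] = rkb;
        pthread_mutex_unlock(&rkbt->assign_lock);
}

/* Called only from the owning broker thread: iterate the snapshot
 * lock-lessly to serve all assigned brokers. */
static int rkbt_serve_all(rkbt_t *rkbt) {
        int served = 0;
        for (int i = 0; i < rkbt->rkb_cnt; i++)
                served += (rkbt->rkbs[i] != NULL);
        return served;
}
```

The split matters because the serve loop runs on every iteration of the broker thread, while assignment changes are rare; paying the lock cost only on add/remove keeps the hot path cheap.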
@janmejay (Contributor, Author) commented Jun 1, 2016

@edenhill here is the status of tests with this change:

  • all tests pass (bare)
  • all valgrind tests except 0014 pass (0014 is failing on master too, haven't checked why yet; it fails in this PR as well)
  • all valgrind tests pass with this patch over HEAD as of Apr 30 (https://github.com/janmejay/librdkafka/commits/multiple_brokers_per_thread)
  • all helgrind tests pass with this patch over HEAD as of Apr 30 (same branch)
  • some helgrind tests fail as of now on this rebased patch (the same set of tests fails on master too, without this patch)

I haven't tested it on Windows, and because I have changed the WSAPoll call, that area requires a deeper review.

Also found a problem in trivup (kafka_path is hardcoded for your dev env, I think). I had to change it to match my dev env. Other than that, it worked well.

@janmejay (Contributor, Author) commented Jun 1, 2016

TEST 0038 takes the same time on master and on this patch; both take 9 seconds. This is with the default of broker.thread.count = 1 when using this patch.

@janmejay (Contributor, Author) commented Jun 1, 2016

BTW, this is with trivup pointed to 0.9.0.1. I'll do another run with 0.10 and post the results here.

@janmejay (Contributor, Author) commented Jun 1, 2016

0.10.0.0 comes out clean too (both bare and valgrind)

@janmejay (Contributor, Author) commented Jun 2, 2016

Everything comes out clean on 0.8.2.1 as well (haven't tested other 0.8.x.y versions).

BTW, 0014 with valgrind fails intermittently. Haven't had a chance to dig in more yet.

I'll roll this out in a test cluster and plan to perform large-cluster benchmarks with it after a validation run.

Will keep you posted.

@janmejay (Contributor, Author) commented Jun 10, 2016

This has been deployed (0.9.1 + patch) to a production env and has been working well for a few days now. In this scenario it is being used as a publisher that auto-partitions across 49 brokers (each broker hosting tens of partitions). This is with the default broker.thread.count (which is 1). Each producer node is pushing ~100 Mbps, and there are 66 producer nodes.

@janmejay (Contributor, Author) commented Jun 20, 2016

The benchmark with 1 or 2 broker threads does extremely well (no CPU burn, no visible run-queue load). This is with 300 brokers.

Here is a profile with the change (2 broker threads, 300 brokers):

[screenshot from 2016-06-17 17-04-54]

Here is the old one for comparison (using the old thread-per-broker model), which I shared with you over mail when we started talking about this:

[screenshot-20160129 170249]

@janmejay (Contributor, Author) commented Jun 20, 2016

In the second screenshot (the old one), the section marked "Production" was running 49 brokers (which is why the problem does not show up). The one marked "Benchmark" was running 300 brokers (AFAIR), which clearly shows the problem.

@edenhill (Owner) commented Jul 5, 2016

Sorry for not getting back to you sooner, this is a substantial change and I need time to go through it.

I have some questions:

  • How are brokers balanced among the available threads?
  • When are brokers rebalanced?
  • How is throughput affected? (e.g., using rdkafka_performance)

@janmejay (Contributor, Author) commented Jul 5, 2016

Hi Magnus, no problem. Answers:

Q: How are brokers balanced among the available threads?
A: All broker threads start with 0 brokers assigned. When a new broker is added, it is assigned to the broker thread with the lowest number of assigned brokers (tracked by a per-broker-thread counter).

Q: When are brokers rebalanced?
A: As of now, never. But this is fairly simple to do: intuitively, it requires the same level of exclusive access as broker addition. We can basically take all brokers out and add them back according to the balancing criterion.

Q: How is throughput affected? (e.g., using rdkafka_performance)
A: Data from separate load tests:

  • With 2 broker threads, 300 brokers, 150 producers, and an aggregate throughput of 6M messages per second, with each message between 1.5 and 1.8 KB. This was with snappy. Aggregate throughput per producer: 64 Mbps (compressed).
  • With 1 broker thread, 100 Mbps per producer, compressed (snappy), in a cluster of 49 brokers receiving from 66 producers.

Intuitively, because the work dispatched on the broker thread hasn't changed, I don't anticipate any drop in performance in small clusters (small clusters can afford a higher thread:broker ratio too). Large clusters, such as in the first test above, show remarkable improvement (see the screenshots). With the 1:1 model, throughput used to grind to a halt due to CPU burn. Even on a 49-node Kafka cluster (49 brokers), the 1:1 model used to burn at least 30% more CPU, so the lower compute overhead should lead to better throughput.

Every load-testing environment is different, so I think it may be worth running a perf test in a controlled environment. Because the number of broker threads is configurable, smaller clusters can choose the 1:1 model (the balancing criterion will ensure 1:1 if the number of broker threads equals the number of brokers).
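The assignment and rebalance policy described in the first two answers can be sketched as follows (illustrative code; rkbt_pick_least_loaded and rkbt_rebalance are hypothetical names, not functions from the patch):

```c
/* Sketch of least-loaded broker-to-thread assignment and of the
 * proposed rebalance: drain all assignments, then re-add every broker
 * under the same criterion. Hypothetical simplified types. */
#include <assert.h>

typedef struct { int broker_cnt; } rkbt_t;

/* Pick the broker thread with the lowest number of assigned brokers. */
static int rkbt_pick_least_loaded(const rkbt_t *t, int n) {
        int best = 0;
        for (int i = 1; i < n; i++)
                if (t[i].broker_cnt < t[best].broker_cnt)
                        best = i;
        return best;
}

/* Rebalance: take all brokers out and add them back per the
 * balancing criterion (requires the same exclusive access as
 * broker addition). */
static void rkbt_rebalance(rkbt_t *t, int n, int total_brokers) {
        for (int i = 0; i < n; i++)
                t[i].broker_cnt = 0;
        for (int b = 0; b < total_brokers; b++)
                t[rkbt_pick_least_loaded(t, n)].broker_cnt++;
}
```

Note that with thread count equal to broker count, the least-loaded pick degenerates to exactly one broker per thread, which is the 1:1 model mentioned above.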

…o unset/clear it if connect was not successful by the first poll call. So this was a race condition between connect finishing successfully and the first poll call returning. Sometimes the poll call returned first, which led to the POLLOUT event mask being cleared from pollfd, causing the broker to remain in RD_KAFKA_BROKER_STATE_CONNECT perpetually. This fixes it by not trying to set state while connecting, but instead recognizing before poll that the RD_KAFKA_BROKER_STATE_CONNECT state requires POLLOUT enabled for connect completion to be recorded and hence the state transition to UP.

@janmejay (Contributor, Author) commented Aug 24, 2016

Fixed a bug (a race condition between connect and poll) that left brokers in the RD_KAFKA_BROKER_STATE_CONNECT state if establishing the connection took long enough.

This is not related to the m broker <-> n threads patch; it was an existing, unrelated bug.
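The fix can be illustrated with a minimal non-blocking connect sketch in plain POSIX sockets (not the librdkafka transport code; wait_connect_done is a hypothetical helper). The point is that the fd keeps POLLOUT armed while the broker is in the CONNECT state, and POLLOUT firing with SO_ERROR == 0 is what records connect completion and drives the transition to UP:

```c
/* Minimal sketch of non-blocking connect completion via poll():
 * POLLOUT must stay enabled on a connecting fd until the connect
 * finishes, otherwise completion is never observed. */
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait for a non-blocking connect on fd to complete.
 * Returns 0 on success, non-zero on error or timeout. */
static int wait_connect_done(int fd, int timeout_ms) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) != 1)
                return -1;               /* timeout or poll error */
        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
                return -1;
        return err;                      /* 0 => connected (CONNECT -> UP) */
}
```

If POLLOUT is cleared from the pollfd before the connect completes (the race described above), poll() never reports the completion and the fd stays in the connecting state forever.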

@ghost commented Nov 2, 2016

@edenhill Hi Magnus, can this patch get merged into trunk? The Kafka Java client already does this, and it would help us run librdkafka against clusters with a large number of broker machines.

@ljackson commented Feb 7, 2017

Bump? What's the status of this change with respect to master? We may need this optimization soon.

@edenhill (Owner) commented Feb 7, 2017

I really appreciate the effort in this PR, and I'll make use of some parts of it, but the way forward is to split up broker threads into IO threads (low-latency) and partition threads (high concurrency).
The target is the June release.

@ljackson Can you tell me what problems you are seeing?

@ljackson commented Feb 7, 2017

Nothing as of yet, but we are just ramping up the replacement of Sarama with the librdkafka/golang wrapper, and we have large Kafka clusters and will be adding more. Mainly I wanted to understand how you were addressing the potential one-thread-per-broker issues. Thx

@edenhill (Owner) commented Feb 7, 2017

Happy to hear that.
What number of brokers, topics, and typical partition counts do you reckon you'll see over the coming 12 months?

@janmejay (Contributor, Author) commented Feb 8, 2017

@edenhill can you describe the design you have in mind in more detail?

@edenhill (Owner) commented Feb 8, 2017

@janmejay This design stems from solving the latency issue of waiting for condvars and IO simultaneously, which is not possible on most platforms:

  • have one (or more) IO threads that use the most appropriate IO event mechanism for the platform (epoll on Linux, kqueue on OS X, ...). They wait only on IO, not buffer queues; it is up to the other threads to enable POLLOUT on fds when they have something to send, i.e., IO-based wakeups rather than condvar-based. This solves the latency issue: transmits will be immediate. One IO thread should be enough.
  • have a set of partition worker threads that do batching, (de)compression, protocol parsing, etc. This concurrency is for performance. How this thread pool is scaled (automatically, statically, weighted, ...) has not been decided yet.
  • the broker control plane is moved to the main rdkafka thread, with some parts (fetch decisions) moved to partition threads.
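The IO-based wakeup in the first bullet can be sketched with Linux epoll (an illustrative fragment, not the planned implementation; arm_pollout and io_thread_wait are hypothetical helpers). A worker thread that has queued data re-arms EPOLLOUT on the broker's fd, so the IO thread wakes from epoll_wait() without any condvar:

```c
/* Sketch of condvar-free wakeups: the IO thread blocks only in
 * epoll_wait(), and producers wake it by arming EPOLLOUT on the
 * fd they queued data for. Linux-specific; kqueue would be the
 * OS X analogue. */
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Worker side: after queuing data for this broker fd, arm EPOLLOUT so
 * the IO thread's epoll_wait() returns and can transmit immediately. */
static void arm_pollout(int epfd, int fd) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT,
                                  .data = { .fd = fd } };
        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* IO-thread side: wait only on IO; a return here means either readable
 * data arrived or a worker armed EPOLLOUT as a send wakeup. */
static int io_thread_wait(int epfd, struct epoll_event *ev, int timeout_ms) {
        return epoll_wait(epfd, ev, 1, timeout_ms);
}
```

The design point is that a writable socket plus an armed EPOLLOUT is itself the wakeup signal, so waiting on IO and waiting for queued data collapse into a single blocking call.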

@rgerhards (Contributor) commented Jan 4, 2018

Just for the record: we are also getting questions on Kafka tuning from high-performance rsyslog users.

@edenhill (Owner) commented Jan 4, 2018

We're looking to address this issue in Q2.

@edenhill (Owner) commented Jan 4, 2018

@qduyang commented Aug 2, 2018

Hi, there are too many threads if I have multiple consumers, and the thread context switching will cause a performance issue.
May I know when we could have a feature to reduce the thread count?

@rnpridgeon (Collaborator) commented Aug 2, 2018

@qduyang commented Aug 3, 2018

@rnpridgeon Thanks for the information, but that solution isn't going to solve my problem, because I have to consume messages from multiple sources in parallel. For example, if I have 10 separate Kafka message sources (each source having 3 brokers), there would be ~40 threads in the background, which would cause heavy CPU contention.

@edenhill closed this Jul 12, 2019

@rgerhards (Contributor) commented Jul 12, 2019

@edenhill does the "close" mean this issue has now been addressed?

@edenhill (Owner) commented Jul 12, 2019

@rgerhards No, we're still spawning one thread per broker.
We'll eventually look into fixing this, but not in the near term.
