m broker <-> n threads #679
Conversation
- Create a pre-configured number of broker threads (these threads do everything that broker threads used to do in the 1:1 broker-to-thread model). The way they pace computation is different, though, because they are now multiplexing many brokers.
- New entity called rd_kafka_broker_thread_t (rkbt), which gets brokers assigned in a way that tries to balance the number of brokers assigned per thread.
- Termination and freeing of an rkb is performed on its rkbt.
- Each rkbt has a separate rkb array owned by the thread, which is re-populated every time a broker is added or removed. Addition and removal take the rkbt broker-assignment lock, but the array itself is exclusively owned by the rkbt: the rkb pointers are copied into this array and then used lock-lessly.
- Introduces a new config parameter called 'broker.thread.count' (defaults to 1) that specifies the number of threads the user intends to run for all broker-related work, allowing the user to configure n (thread count) independently of m (broker count).

Note: I have tried to fix Windows support in transport (the WSAPoll call), but I am not sure that is the right thing to do. That area needs a very close review.
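To make the balancing criterion above concrete, here is a minimal sketch of least-loaded broker-to-thread assignment. The names (`broker_thread_t`, `broker_thread_pick`) are illustrative, not librdkafka's actual identifiers from this patch:

```c
#include <assert.h>

/* Illustrative stand-in for rd_kafka_broker_thread_t: only the
 * per-thread broker count matters for the balancing decision. */
typedef struct broker_thread_s {
        int rkbt_broker_cnt;   /* number of brokers currently assigned */
} broker_thread_t;

/* Pick the thread with the fewest assigned brokers; a newly added
 * broker is assigned there. With thread_cnt == broker_cnt this
 * degenerates to the old 1:1 model. */
static int broker_thread_pick (broker_thread_t *threads, int thread_cnt) {
        int best = 0;
        for (int i = 1; i < thread_cnt; i++)
                if (threads[i].rkbt_broker_cnt < threads[best].rkbt_broker_cnt)
                        best = i;
        return best;
}

/* Assign one new broker to the least-loaded thread. */
static void broker_assign (broker_thread_t *threads, int thread_cnt) {
        threads[broker_thread_pick(threads, thread_cnt)].rkbt_broker_cnt++;
}
```

After any sequence of assignments the per-thread counts differ by at most one, which is the property the patch description relies on.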
@edenhill here is the status of tests with this change:
I haven't tested it on Windows, and because I have changed the WSAPoll call, that area needs a deeper review. Also found a problem in trivup (the kafka_path is hardcoded for your dev env, I think); I had to change it to match my dev env. Other than that, it worked well.
TEST 0038 takes the same time across master and this patch: both take 9 seconds. This is with the default broker.thread.count of 1 when using this patch.
BTW, this is with trivup pointed at 0.9.0.1. I'll do another run with 0.10 and post the results here.
0.10.0.0 comes out clean too (both bare and valgrind).
Everything comes out clean on 0.8.2.1 as well (haven't tested other 0.8.x.y versions). BTW, test 0014 with valgrind fails intermittently; haven't had a chance to dig in more yet. I'll roll this out in a test cluster and plan to perform large-cluster benchmarks with it after the validation run. Will keep you posted.
This has been deployed (0.9.1 + patch) to a production env and has been working well for a few days now. In this scenario it is being used as a publisher that auto-partitions across 49 brokers (each broker having tens of partitions). This is with the default broker.thread.count (which is 1). Each producer node is pushing at ~100 Mbps and there are 66 producer nodes.
In the second screenshot (old), the section marked "Production" was running 49 brokers (which is why the problem does not show up). The one marked "Benchmark" was running 300 brokers (AFAIR), which clearly shows the problem.
Sorry for not getting back to you sooner, this is a substantial change and I need time to go through it. I have some questions:
Hi Magnus, no problem. Answers to your questions:
- Q: How are brokers balanced among the available threads?
- Q: When are brokers rebalanced?
- Q: How is throughput affected? (e.g., using rdkafka_performance)
Intuitively, because the work dispatched on the broker thread hasn't changed, I don't anticipate any drop in performance in small clusters (small clusters can afford a higher thread:broker ratio, too). Large clusters such as the one in the first test above show remarkable improvement (see screenshots). With the 1:1 model we used to grind to a halt in terms of throughput due to CPU burn. Even on a 49-node Kafka cluster (49 brokers) the 1:1 model burned at least 30% more CPU, so lower compute overhead should lead to better throughput. Every load-testing environment is different, so I think it may be worth running a perf test in a controlled environment. Because the number of broker threads is configurable, smaller clusters can choose the 1:1 model (the balancing criterion will ensure 1:1 if the number of broker threads is the same as the number of brokers).
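To illustrate the last point, here is a hypothetical configuration fragment (broker.thread.count is the property introduced by this patch; it is not in upstream librdkafka, and the cluster sizes are taken from the scenarios discussed above):

```ini
# Small 49-broker cluster: keep the old 1:1 broker-to-thread model
# by matching the thread count to the broker count.
broker.thread.count=49

# Large 300-broker cluster: cap all broker work at a handful of
# threads instead of spawning 300.
# broker.thread.count=4
```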
…o unset/clear it, if the connect was not successful by the first poll call. So this was a race condition between the connect finishing successfully and the first poll call returning. Sometimes the poll call returned first, which led to the POLLOUT event mask being cleared from the pollfd, causing the broker to remain in RD_KAFKA_BROKER_STATE_CONNECT perpetually. This fixes it by not trying to set it while connecting, but instead recognizing before poll that the RD_KAFKA_BROKER_STATE_CONNECT state requires POLLOUT enabled for connect completion to be recorded and hence the state transition to UP.
Fixed a bug (a race condition between connect and poll) that left brokers in the RD_KAFKA_BROKER_STATE_CONNECT state if acquiring the connection took long enough. This is not related to the m broker <-> n threads patch; it was a pre-existing, unrelated bug.
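The shape of the fix described above can be sketched as follows. This is an illustrative sketch, not the patch's actual code: instead of setting POLLOUT once at connect time (where a racing poll() return could clear it), the event mask is re-derived from the broker state before every poll():

```c
#include <poll.h>

/* Illustrative stand-ins for the broker states referenced above. */
enum broker_state {
        STATE_CONNECT,   /* non-blocking connect() outstanding */
        STATE_UP         /* connection established */
};

/* Compute the poll() event mask just before each poll() call.
 * While the connect is outstanding, POLLOUT must be enabled so that
 * connect completion is observed and the broker can transition to UP;
 * deriving the mask here (rather than mutating it at connect time)
 * removes the race between connect completion and the first poll(). */
static short broker_poll_events (enum broker_state state, short base_events) {
        if (state == STATE_CONNECT)
                return base_events | POLLOUT;
        return base_events;
}
```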
@edenhill Hi Magnus, can this patch get merged into trunk? The Kafka Java client already does this, and it would help us run librdkafka against a large number of broker machines.
Bump? What's the status of this change with respect to master? We may need this optimization soon.
I really appreciate the effort in this PR, and I'll make use of some parts of it, but the way forward is to split up broker threads into IO threads (low-latency) and partition threads (high concurrency). @ljackson Can you tell me what problems you are seeing? |
Nothing as of yet, but we are just ramping up the replacement of Sarama with the librdkafka/golang wrapper, and we have large Kafka clusters and will be adding more. Mainly I wanted to understand how you were addressing the potential issues of one thread per broker. Thanks.
Happy to hear that. |
@edenhill can you describe the design you have in mind in more detail? |
@janmejay This design stems from solving the latency issue of waiting on condvars and IO simultaneously, which is not possible on most platforms.
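The classic workaround for that limitation is the self-pipe trick: a thread blocked in poll() on sockets cannot also wait on a condition variable, so queue writers instead write a byte to a pipe whose read end is included in the poll set, waking the IO thread immediately. This is a generic sketch of that technique, not necessarily the design Magnus has in mind:

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

typedef struct wakeup_s {
        int fds[2];   /* fds[0]: read end (polled), fds[1]: write end */
} wakeup_t;

/* Create the wakeup pipe; the read end is made non-blocking so that
 * draining it can never block the IO thread. Returns 0 on success. */
static int wakeup_init (wakeup_t *w) {
        if (pipe(w->fds) == -1)
                return -1;
        return fcntl(w->fds[0], F_SETFL, O_NONBLOCK);
}

/* Called by a producer thread after enqueuing work: one byte on the
 * pipe makes poll() on fds[0] return immediately. */
static void wakeup_signal (wakeup_t *w) {
        char b = 1;
        (void)write(w->fds[1], &b, 1);
}

/* Called by the IO thread once poll() reports the pipe readable:
 * drain all pending wakeup bytes before processing the op queue. */
static void wakeup_drain (wakeup_t *w) {
        char buf[64];
        while (read(w->fds[0], buf, sizeof(buf)) > 0)
                ;
}
```

An IO thread would add `fds[0]` with POLLIN to the same pollfd array as its broker sockets, so enqueued ops and socket readiness wake it through a single poll() call.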
Just for the records: we are also getting user questions on kafka tuning from high-performance rsyslog users. |
We're looking to address this issue in Q2. |
Hi, there are too many threads if I have multiple consumers, and it causes performance issues due to thread context switching.
@rnpridgeon Thanks for the information. But the solution isn't going to solve my problem, because I have to consume messages from multiple sources in parallel. For example, I have 10 separate Kafka message sources (each source has 3 brokers); in this case there would be about 40 threads in the background, which would cause heavy CPU contention.
@edenhill does the "close" mean this issue has now been addressed? |
@rgerhards No, we're still spawning one thread per broker. |
@hardy-luo I am completely out of cycles at this point. I can't hope to look into this. |