Why is sarama so fast compared to segmentio's kafka-go? #1647

Closed
aneeskA opened this issue Mar 19, 2020 · 2 comments

aneeskA commented Mar 19, 2020

This is not an issue, but more of a question:

I am trying to send 100GB of data into a Kafka topic by breaking the 100GB into batches of 100 lines.

Using kafka-go, I see that it writes 1 message per second, as per https://www.gitmemory.com/issue/segmentio/kafka-go/326/519375403 . To overcome this issue, I created a goroutine for each write. This immediately improved the throughput.

But the application was quickly killed by the OOM killer: the data to be written was created faster than it could be written to Kafka, and the data that accumulated with kafka-go exhausted the memory.

But when I did the same experiment using sarama, 100GB of data was moved to Kafka in 2hr10mins!! There were no concurrent goroutines doing the writes. It was done one write after the other, serially.

Why is this so? How come sarama is able to move data so fast? Is there any tradeoff that comes with this? Can anyone please explain?

ilia-b commented May 7, 2020

I've seen this issue in segmentio's kafka-go.
It's something to do with the settings: it keeps the produce buffer for a second before sending to a broker (to accumulate messages).
I could not figure the settings out at the time...
I'm also using Sarama now.

Collaborator

dnwe commented May 7, 2020

Yes, I'd imagine this is somewhat due to different defaults in terms of producer buffering and queue sizes, much like with the Java client, where you can tune linger time and batching sizes to optimise for throughput vs latency and memory usage.

However, obviously we'd recommend you use Sarama rather than segmentio kafka-go anyway 😀
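
To make the linger-time and batch-size trade-off discussed above concrete, here is a minimal stdlib-only sketch. It is not Sarama's or kafka-go's actual implementation, and the names `runBatcher` and `batchSizes` are made up for illustration. A batch is flushed when it reaches `maxSize` messages or when `linger` has elapsed since its first message, whichever comes first. The input channel is bounded, so a fast producer blocks instead of piling data up in memory (the OOM scenario from the question). As far as I know, the corresponding real knobs are `BatchSize` / `BatchTimeout` on the kafka-go writer and `Producer.Flush.Messages` / `Producer.Flush.Frequency` in Sarama's config, but check the library docs for the exact semantics.

```go
package main

import (
	"fmt"
	"time"
)

// runBatcher (hypothetical helper) reads messages from in and passes them to
// flush in batches. A batch is flushed when it holds maxSize messages or when
// linger has elapsed since the first message of the batch arrived.
func runBatcher(in <-chan string, maxSize int, linger time.Duration, flush func([]string)) {
	var batch []string
	var timer *time.Timer
	var timeout <-chan time.Time // nil while the batch is empty, so that case never fires

	emit := func() {
		if len(batch) > 0 {
			flush(batch)
			batch = nil
		}
		if timer != nil {
			timer.Stop()
			timer = nil
		}
		timeout = nil
	}

	for {
		select {
		case msg, ok := <-in:
			if !ok { // input closed: flush what is left and stop
				emit()
				return
			}
			if len(batch) == 0 { // first message starts the linger clock
				timer = time.NewTimer(linger)
				timeout = timer.C
			}
			batch = append(batch, msg)
			if len(batch) >= maxSize {
				emit()
			}
		case <-timeout: // linger expired: send a partial batch
			emit()
		}
	}
}

// batchSizes (hypothetical helper) pushes n messages through a bounded
// channel and returns the size of every batch that was flushed.
func batchSizes(n, maxSize int, linger time.Duration) []int {
	in := make(chan string, 8) // bounded: the sender blocks when the batcher falls behind
	done := make(chan struct{})
	var sizes []int
	go func() {
		defer close(done)
		runBatcher(in, maxSize, linger, func(b []string) {
			sizes = append(sizes, len(b)) // a real producer would write to the broker here
		})
	}()
	for i := 0; i < n; i++ {
		in <- fmt.Sprintf("line %d", i)
	}
	close(in)
	<-done
	return sizes
}

func main() {
	// The exact split can vary with timing when linger is short.
	fmt.Println(batchSizes(250, 100, 10*time.Millisecond))
}
```

A long linger with a large batch size favours throughput at the cost of latency and buffered memory. A short linger sends smaller batches sooner. If each send waits for its batch to be flushed and only one message is in flight at a time, a one-second linger would produce the "1 message per second" behaviour described in the question.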
