Consumer benchmark test for paused partitions #7221
For details about this new Kafka Consumer benchmark test see Jira issue KAFKA-8814. Original PR and Jira:
To recreate the tests from the Jira issue:
hachikuji left a comment
@seglo Thanks, this is pretty cool. I'm kind of debating whether this is a general enough need that it makes sense to add it to the consumer performance tool. It is definitely useful to understand how pause/resume impacts performance, but it feels a bit too tailored to the consumer API. For example, we resume immediately after each poll rather than having a pause duration or something like that. We could also try to tie it to the data more closely. I think in streams, we use the pause API to control the maximum time lag between different partitions. Would it make sense to do something similar so that the benchmark could be more realistic?
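To make the "tie it to the data" suggestion concrete, here is a minimal sketch of one way a benchmark could choose which partitions to pause based on time lag. It is only an interpretation of the idea above, not what this PR or Kafka Streams actually does. The class name `LagBasedPauseSelector`, the method `partitionsToPause`, the string partition labels, and the `maxLagMs` parameter are all hypothetical, and the sketch deliberately avoids any Kafka dependency so it can run on its own.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Kafka or of this PR. */
final class LagBasedPauseSelector {
    private LagBasedPauseSelector() {}

    /**
     * Picks the partitions that have run too far ahead of the slowest one.
     *
     * @param latestTimestampMs timestamp of the last consumed record per partition,
     *                          keyed by an assumed label such as "topic-0"
     * @param maxLagMs          maximum allowed lead (in ms) over the slowest partition
     * @return labels of partitions whose lead over the slowest exceeds maxLagMs
     */
    static Set<String> partitionsToPause(Map<String, Long> latestTimestampMs, long maxLagMs) {
        if (latestTimestampMs.isEmpty()) {
            return Collections.emptySet();
        }
        // The slowest partition is the one with the oldest "last consumed" timestamp.
        long slowest = Collections.min(latestTimestampMs.values());
        Set<String> toPause = new TreeSet<>();
        for (Map.Entry<String, Long> entry : latestTimestampMs.entrySet()) {
            if (entry.getValue() - slowest > maxLagMs) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }
}
```

In a real consumer loop, the returned labels would be mapped to `TopicPartition` instances and passed to `KafkaConsumer.pause(...)` after each `poll`, with `KafkaConsumer.resume(...)` called for any previously paused partition that is no longer in the set.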
@hachikuji Thanks for the reply. When I first started exploring how to benchmark this work, I had some reservations about modifying the consumer performance tool as well. It makes sense that the existing benchmarks use this tool, but it does limit the types of consumer scenarios that can be tested.
There does seem to be precedent to modify the tools for system testing. Some of the apps in
I like your idea about testing how the partition pauses affect Kafka Streams, but I'm not very familiar with the use case or if this fix has much impact for it. I can speak to how the Alpakka Kafka project will benefit from this fix. The consumer
IIRC the original issue was reported by LinkedIn WRT how Samza pauses partitions during its operation, but I'm not familiar with that use case either. I think there's value in demonstrating the performance gain with a low level test like this one because it's simpler to understand, but I agree that maybe it should avoid modifying the consumer performance tool.
Perhaps I could modify
I looked at Kafka Streams partition pausing use cases, but I'm not sure how to use Kafka Streams in a way that would trigger lots of partition pause/resumes to demonstrate the issue like I have in this PR, or with external projects that use the
I looked at
I considered making a copy
I think there is precedent for modifying the public-facing perf tools for system tests, as I mentioned in this comment: #7221 (comment)
@ijuma Do you have any suggestions?