
Consumer benchmark test for paused partitions #7221

Closed
seglo wants to merge 2 commits into apache:trunk from seglo:seglo/KAFKA-8814


@seglo (Member) commented Aug 19, 2019

For details about this new Kafka consumer benchmark test, see Jira issue KAFKA-8814. Original PR and Jira:

To recreate the tests from the Jira issue:

```bash
# Run on trunk
TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py::Benchmark.test_consumer_throughput" bash tests/docker/run_tests.sh
# Rebase onto tag 2.3.0
git rebase --onto 2.3.0 trunk
# Run on 2.3.0
TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py::Benchmark.test_consumer_throughput" bash tests/docker/run_tests.sh
```

@ijuma @hachikuji Please review at your convenience.

seglo force-pushed the seglo/KAFKA-8814 branch from 54b839a to 6b5914c on August 19, 2019 02:45

@seglo (Member, Author) commented Aug 19, 2019

@ijuma I don't see benchmark related test results in the PR-triggered Jenkins build. Is there a benchmark build that can be run with this branch?

@hachikuji (Contributor) left a comment

@seglo Thanks, this is pretty cool. I'm debating whether this is a general enough need to justify adding it to the consumer performance tool. It is definitely useful to understand how pause/resume impacts performance, but it feels a bit too tailored to the consumer API. For example, we resume immediately after each poll rather than using a pause duration or something similar. We could also tie it more closely to the data. I think in Streams we use the pause API to control the maximum time lag between different partitions. Would it make sense to do something similar so that the benchmark is more realistic?
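The idea of tying pauses to the data rather than to every poll can be sketched as below. This is a minimal, hypothetical illustration: `LagBoundedPauser`, `partitionsToPause`, and the `maxLagMs` threshold are invented for this example and are not part of Kafka, partitions are identified by plain strings instead of `TopicPartition`, and it assumes we know the timestamp of each partition's next buffered record. It is not how Kafka Streams actually implements pausing.

```java
import java.util.*;

/**
 * Hypothetical sketch (not Kafka code): choose which partitions to pause so
 * that no partition runs more than maxLagMs ahead of the slowest partition.
 */
public class LagBoundedPauser {

    // Returns the partitions whose next-record timestamp is more than
    // maxLagMs ahead of the smallest next-record timestamp seen.
    public static Set<String> partitionsToPause(Map<String, Long> headTimestamps, long maxLagMs) {
        Set<String> toPause = new TreeSet<>(); // sorted for stable output
        if (headTimestamps.isEmpty()) {
            return toPause;
        }
        long slowest = Collections.min(headTimestamps.values());
        for (Map.Entry<String, Long> entry : headTimestamps.entrySet()) {
            if (entry.getValue() - slowest > maxLagMs) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }

    public static void main(String[] args) {
        Map<String, Long> heads = new HashMap<>();
        heads.put("topic-0", 1_000L);
        heads.put("topic-1", 1_400L); // 400 ms ahead: within the bound
        heads.put("topic-2", 2_500L); // 1500 ms ahead: should be paused
        System.out.println(partitionsToPause(heads, 500L)); // prints [topic-2]
    }
}
```

In a real poll loop, the returned set would correspond to the `TopicPartition`s passed to `KafkaConsumer.pause(...)`, with the remaining assigned partitions passed to `resume(...)` before the next `poll(Duration)`. A benchmark structured this way would pause only when the data calls for it, instead of resuming unconditionally after every poll.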

@seglo (Member, Author) commented Aug 20, 2019

@hachikuji Thanks for the reply. When I first started exploring how to benchmark this work, I also had some reservations about modifying the consumer performance tool. It makes sense that the existing benchmarks use this tool, but it does limit the types of consumer scenarios that can be tested.

There does seem to be precedent for modifying the tools for system testing. Some of the apps in `org.apache.kafka.tools` appear to exist just for this purpose (`VerifiableConsumer`, `VerifiableLog4jAppender`, `VerifiableProducer`). `TransactionalMessageCopier` has an argument called `--enable-random-aborts` which is only used for testing:

> Whether or not to enable random transaction aborts (for system testing)

I like your idea of testing how partition pauses affect Kafka Streams, but I'm not very familiar with that use case or whether this fix has much impact on it. I can speak to how the Alpakka Kafka project will benefit from this fix. The consumer Source (which contains a Kafka consumer) always polls on a set interval, but it pauses partitions when there is no demand for records downstream (via Akka Streams back pressure). The Source still polls regularly to handle any outstanding offset commit acknowledgements, but this causes the consumer to throw away pre-fetched data for partitions that are paused due to back pressure.

IIRC the original issue was reported by LinkedIn regarding how Samza pauses partitions during its operation, but I'm not familiar with that use case either. I think there's value in demonstrating the performance gain with a low-level test like this one because it's simpler to understand, but I agree that it should perhaps avoid modifying the consumer performance tool.

Perhaps I could modify VerifiableConsumer instead to support this use case since it's only used for system testing? I could also create a new tool.

seglo force-pushed the seglo/KAFKA-8814 branch from 6b5914c to 6a6a665 on August 24, 2019 22:31

@seglo (Member, Author) commented Sep 2, 2019

I looked at Kafka Streams partition-pausing use cases, but I'm not sure how to use Kafka Streams in a way that would trigger enough partition pauses and resumes to demonstrate the issue, as I have done in this PR or as external projects that use the KafkaConsumer do. @mjsax @guozhangwang Do you have any ideas on how to structure a Kafka Streams perf test that would demonstrate the performance improvement from #6988?

I looked at `org.apache.kafka.tools.VerifiableConsumer`. It could be modified to support partition pausing as I've done with `ConsumerPerformance`, but it doesn't feel like an appropriate place for it, since it is generally used to assert consumer state rather than measure performance.

I considered making a copy of `ConsumerPerformance` and stripping it down to only support partition pausing so that it isn't exposed to end users through `kafka-consumer-perf-test.sh`, but this wouldn't be a very DRY implementation.

I think there is precedent for modifying the public-facing perf tools for system tests, as I mentioned in this comment: #7221 (comment)

@ijuma Do you have any suggestions?

@github-actions commented

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

github-actions (bot) added the `stale` label (Stale PRs) on Nov 21, 2024

@github-actions commented

This PR has been closed since it has not had any activity in 120 days. If you feel like this
was a mistake, or you would like to continue working on it, please feel free to re-open the
PR and ask for a review.

github-actions (bot) added the `closed-stale` label (PRs that were closed due to inactivity) and closed this on Dec 22, 2024