KAFKA-4000: Collect and record per-topic consumer metrics #1684
@hachikuji Is this fix close to what you had in mind?
@vahidhashemian I'm not sure this quite does it. It looks like this is going to keep aggregating across multiple fetches. I was actually wondering if there was a way to reuse
@hachikuji Thanks for the quick feedback. I believe this line would make it aggregate within the same fetch only. But I'm going to give what you suggested a try and update the PR. Thanks again.
Hmm... that still doesn't seem quite right. That resets the counts on every call to
OK. I think I'm still missing something. Based on what you mentioned, it seems we would want to collect metrics at a higher level (at fetch response levels instead of In your first comment I believe you were referring to this Thanks a lot in advance for clearing this up.
@vahidhashemian Just to clarify, a single fetch response contains fetch data from multiple partitions, some of which may be from the same topic. My interpretation of the topic-level fetch metrics is that they should record the per-topic number of records/bytes fetched from the entire fetch response rather than from each partition separately (does this seem correct?). That means we need to aggregate these stats by topic for each fetch response. The tricky thing is that we now parse the fetch data from each partition separately, so we need somewhere to store the stats so that they can be aggregated incrementally. That is the purpose of
@hachikuji Thanks for explaining in detail how you envision these metrics being collected at the topic level. It makes sense. Other than tracking, I assume we also need to "record" these per-topic metrics, perhaps through per-topic sensors. This can be done at the same time partition metrics are recorded (here). Would you agree?
@vahidhashemian Yep, I think that will work.
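To make the approach agreed on above concrete, here is a minimal sketch of aggregating bytes and records by topic within a single fetch response and flushing them once every partition in that response has reported. It is not the code from this PR: the class name `TopicFetchAggregator`, the `"topic:partition"` string keys (standing in for `TopicPartition`), and the `recorded` map (standing in for the per-topic sensors) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the aggregator discussed above; not the PR's code.
class TopicFetchAggregator {
    /** Running byte/record totals for one topic within one fetch response. */
    static final class FetchMetrics {
        int fetchBytes;
        int fetchRecords;

        void increment(int bytes, int records) {
            fetchBytes += bytes;
            fetchRecords += records;
        }
    }

    // Partitions of this fetch response that have not reported yet.
    // Assumption: "topic:partition" strings stand in for TopicPartition.
    private final Set<String> unrecordedPartitions;
    private final Map<String, FetchMetrics> topicFetchMetrics = new HashMap<>();
    // Assumption: this map stands in for the per-topic sensors.
    private final Map<String, FetchMetrics> recorded = new HashMap<>();

    TopicFetchAggregator(Set<String> partitionsInResponse) {
        this.unrecordedPartitions = new HashSet<>(partitionsInResponse);
    }

    /** Called once per partition as its fetch data is parsed. */
    void record(String topic, int partition, int bytes, int records) {
        unrecordedPartitions.remove(topic + ":" + partition);

        // Single lookup, then create on a miss (the idiom suggested in review).
        FetchMetrics metrics = topicFetchMetrics.get(topic);
        if (metrics == null) {
            metrics = new FetchMetrics();
            topicFetchMetrics.put(topic, metrics);
        }
        metrics.increment(bytes, records);

        // Flush only after every partition in the response has reported,
        // so each topic is recorded once per fetch response.
        if (unrecordedPartitions.isEmpty()) {
            recorded.putAll(topicFetchMetrics);
        }
    }

    Map<String, FetchMetrics> recorded() {
        return recorded;
    }
}
```

In this sketch a new aggregator is created for each fetch response, which is what keeps the totals from accumulating across fetches.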
abb65f6 to fe343bb
Thanks @hachikuji. I hope this better aligns with what you explained above.
@@ -704,14 +712,22 @@ public CompletedFetch(TopicPartition partition,
     */
    private static class FetchResponseMetricAggregator {
        private final FetchManagerMetrics sensors;
        private final Map<String, FetchManagerMetrics> topicSensors;
Is this necessary? There should only ever be one instance of FetchManagerMetrics, and it already exposes a method recordTopicFetchMetrics.
Aah, you're right. I overlooked that existing method. Thanks.
fe343bb to 145b3d7
Thanks again @hachikuji for your feedback. The PR is updated.
One question. I think we may not need this line anymore now that we are properly recording fetch metrics for each topic. What do you think?
@vahidhashemian Yeah, good call. We should definitely remove that line.
145b3d7 to 4db6a15
            if (unrecordedPartitions.isEmpty()) {
                // collect and aggregate per-topic metrics
                String topic = partition.topic();
                if (!this.topicFetchMetrics.containsKey(topic))
This is a micro-optimization, so feel free to ignore. An alternative idiom is to call map.get() and check the result against null. This works because we know the value will never be null and it saves a hash lookup.
Sure, it makes sense. I'll update the PR.
4db6a15 to a790756
                String topic = partition.topic();
                if (this.topicFetchMetrics.get(topic) == null)
                    this.topicFetchMetrics.put(topic, new FetchMetrics());
                FetchMetrics topicFetchMetric = this.topicFetchMetrics.get(topic);
Haha, to clarify, I was trying to suggest moving this line above the null check. Something like this:
    FetchMetrics topicFetchMetric = this.topicFetchMetrics.get(topic);
    if (topicFetchMetric == null) {
        topicFetchMetric = new FetchMetrics();
        this.topicFetchMetrics.put(topic, topicFetchMetric);
    }
Aah, right, sorry I missed the redundant call. Will fix it now. Thanks.
a790756 to 7c0d5b9
@@ -706,8 +705,18 @@ public CompletedFetch(TopicPartition partition,
        private final FetchManagerMetrics sensors;
        private final Set<TopicPartition> unrecordedPartitions;

        private int totalBytes;
        private int totalRecords;
        private static class FetchMetrics {
nitpick: could we move this below the record() function? It would be nice to keep the class fields together.
Left a minor comment, but overall LGTM. Maybe @ijuma can take a look when he has time.
Improve consumer metric collection by collecting and recording metrics per topic.
7c0d5b9 to ef8332d
@hachikuji I moved the class as you suggested. Thanks for all the feedback on this PR.
@hachikuji A while back you gave your blessing on this PR after some reviews, but it was before you had the "commit" superpower. It would be great if we could merge it if you see no issues. This one is also related, but not reviewed yet.
@vahidhashemian Thanks for the reminder (even if I got to it a week late). Took me a while to remember what this issue was about, but the patch still LGTM.
BTW, I made a few minor visibility tweaks before merging.
Refer to this link for build results (access rights to CI server needed):
Improve consumer metric collection by collecting and recording metrics per topic.

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1684 from vahidhashemian/KAFKA-4000