KAFKA-4000: Collect and record per-topic consumer metrics #1684

vahidhashemian · 2016-07-29T22:27:02Z

Improve consumer metric collection by collecting and recording metrics per topic.

vahidhashemian · 2016-07-29T22:28:13Z

@hachikuji Is this fix close to what you had in mind?

hachikuji · 2016-07-29T22:37:27Z

@vahidhashemian I'm not sure this quite does it. It looks like this is going to keep aggregating across multiple fetches. I was actually wondering if there was a way to reuse FetchResponseMetricAggregator to accumulate the metrics for each topic. For example, maybe we just need to move the map from topic to FetchMetrics that you have here into that class. Then we can increment it accordingly in calls to record(). Would that work?

vahidhashemian · 2016-07-29T22:46:48Z

@hachikuji Thanks for the quick feedback. I believe this line would make it aggregate within the same fetch only. But I'm going to give what you suggested a try and update the PR. Thanks again.

hachikuji · 2016-07-29T22:50:55Z

Hmm... that still doesn't seem quite right. That resets the counts on every call to fetchedRecords, but the same fetch response could be split across several calls, right?

vahidhashemian · 2016-07-30T00:14:25Z

OK. I think I'm still missing something. Based on what you mentioned, it seems we would want to collect metrics at a higher level (at the fetch response level rather than per fetchedRecords call).

In your first comment I believe you were referring to this record() call (as a member of FetchResponseMetricAggregator), and not the call to recordTopicFetchMetrics() which I thought was the culprit. That record() method doesn't seem to aggregate metrics per partition. It just aggregates across all partitions in the fetch response. Do we want to change that and collect per topic, or just add per-topic metric aggregation too?

Thanks a lot in advance for clearing this up.

hachikuji · 2016-08-01T16:39:56Z

@vahidhashemian Just to clarify, a single fetch response contains fetch data from multiple partitions, some of which may be from the same topic. My interpretation of the topic-level fetch metrics is that they should be recording the per-topic number of records/bytes fetched from the entire fetch response rather than from each partition separately (does this seem correct?). That means we need to aggregate these stats by topic for each fetch response. The tricky thing is that we now parse the fetch data from each partition separately, so we need somewhere to store the stats so that they can be aggregated incrementally. That is the purpose of FetchResponseMetricAggregator. It contains a set of all the partitions that were contained in the fetch response and as we parse the data from each partition, we remove that partition from the set. Once all partitions have been parsed, we can record the fetch metrics. So my thought was that maybe we only needed to extend that idea. In addition to keeping the total number of bytes/records from the fetch response in FetchResponseMetricAggregator, we can also track the per-topic tallies. Would that make sense or not?
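The incremental aggregation described above can be sketched roughly as follows. This is a simplified illustration, not the actual Fetcher code: `FetchAggregatorSketch` and `MetricsSink` are hypothetical names, partitions are reduced to "topic-partition" strings instead of Kafka's TopicPartition, and the `recordTopicFetchMetrics` signature is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the consumer's metrics object; the real
// recordTopicFetchMetrics signature may differ.
interface MetricsSink {
    void recordTopicFetchMetrics(String topic, int bytes, int records);
}

// Simplified aggregator for ONE fetch response.
class FetchAggregatorSketch {
    // Per-topic running totals for this fetch response.
    static class FetchMetrics {
        int bytes;
        int records;
    }

    private final MetricsSink sensors;
    private final Set<String> unrecordedPartitions;
    private final Map<String, FetchMetrics> topicFetchMetrics = new HashMap<>();

    FetchAggregatorSketch(MetricsSink sensors, Set<String> partitions) {
        this.sensors = sensors;
        this.unrecordedPartitions = new HashSet<>(partitions);
    }

    // Called once per partition as its data is parsed, possibly spread
    // over several separate calls by the consumer.
    void record(String topic, int partition, int bytes, int records) {
        unrecordedPartitions.remove(topic + "-" + partition);

        FetchMetrics tally = topicFetchMetrics.get(topic);
        if (tally == null) {
            tally = new FetchMetrics();
            topicFetchMetrics.put(topic, tally);
        }
        tally.bytes += bytes;
        tally.records += records;

        // Emit one measurement per topic only after every partition of
        // the response has been parsed.
        if (unrecordedPartitions.isEmpty()) {
            for (Map.Entry<String, FetchMetrics> e : topicFetchMetrics.entrySet()) {
                sensors.recordTopicFetchMetrics(
                        e.getKey(), e.getValue().bytes, e.getValue().records);
            }
        }
    }
}
```

Because the tallies live in the aggregator rather than in the method that hands records to the caller, they survive a fetch response being consumed in several steps, and nothing is reported until the last partition has been parsed.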

vahidhashemian · 2016-08-01T20:27:30Z

@hachikuji Thanks for explaining in detail how you envision these metrics being collected at the topic level. It makes sense. Other than tracking, I assume we also need to "record" these per-topic metrics, perhaps through per-topic sensors. This can be done at the same time partition metrics are recorded (here). Would you agree?

hachikuji · 2016-08-01T20:50:30Z

@vahidhashemian Yep, I think that will work.

vahidhashemian · 2016-08-01T23:22:52Z

Thanks @hachikuji. I hope this better aligns with what you explained above.

hachikuji · 2016-08-02T00:17:22Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java

@@ -704,14 +712,22 @@ public CompletedFetch(TopicPartition partition,
     */
    private static class FetchResponseMetricAggregator {
        private final FetchManagerMetrics sensors;
+        private final Map<String, FetchManagerMetrics> topicSensors;


Is this necessary? There should only ever be one instance of FetchManagerMetrics, and it already exposes a method recordTopicFetchMetrics.

Aah, you're right. I overlooked that existing method. Thanks.

vahidhashemian · 2016-08-02T16:40:41Z

Thanks again @hachikuji for your feedback. The PR is updated.

vahidhashemian · 2016-08-02T16:49:12Z

One question. I think we may not need this line anymore now that we are properly recording fetch metrics for each topic. What do you think?

hachikuji · 2016-08-02T17:19:01Z

@vahidhashemian Yeah, good call. We should definitely remove that line.

hachikuji · 2016-08-02T17:40:14Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java

-            if (unrecordedPartitions.isEmpty()) {
+            // collect and aggregate per-topic metrics
+            String topic = partition.topic();
+            if (!this.topicFetchMetrics.containsKey(topic))


This is a micro-optimization, so feel free to ignore. An alternative idiom is to call map.get() and check the result against null. This works because we know the value will never be null and it saves a hash lookup.

Sure, it makes sense. I'll update the PR.

hachikuji · 2016-08-02T17:59:41Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java

+            String topic = partition.topic();
+            if (this.topicFetchMetrics.get(topic) == null)
+                this.topicFetchMetrics.put(topic, new FetchMetrics());
+            FetchMetrics topicFetchMetric = this.topicFetchMetrics.get(topic);


Haha, to clarify, I was trying to suggest moving this line above the null check. Something like this:

    FetchMetrics topicFetchMetric = this.topicFetchMetrics.get(topic);
    if (topicFetchMetric == null) {
        topicFetchMetric = new FetchMetrics();
        this.topicFetchMetrics.put(topic, topicFetchMetric);
    }

Aah, right, sorry I missed the redundant call. Will fix it now. Thanks.
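For comparison, the lookup variants discussed in this review thread can be put side by side in a self-contained sketch. `TallyLookup` and `Tally` are hypothetical names standing in for the Fetcher internals. The third variant uses `Map.computeIfAbsent`, which exists only on Java 8 and later; since the build jobs on this PR include JDK 7, it is shown purely as a point of comparison, not as something the patch could have used.

```java
import java.util.HashMap;
import java.util.Map;

class TallyLookup {
    // Hypothetical per-topic counter, standing in for FetchMetrics.
    static class Tally {
        int records;
    }

    // Original form: containsKey, then put for a new key, then get.
    static Tally withContainsKey(Map<String, Tally> map, String topic) {
        if (!map.containsKey(topic))
            map.put(topic, new Tally());
        return map.get(topic);
    }

    // Suggested form: a single get, plus a put only for a new key.
    // Valid because this map never stores null values.
    static Tally withGet(Map<String, Tally> map, String topic) {
        Tally tally = map.get(topic);
        if (tally == null) {
            tally = new Tally();
            map.put(topic, tally);
        }
        return tally;
    }

    // Java 8+ only: the map handles the create-if-missing step itself.
    static Tally withComputeIfAbsent(Map<String, Tally> map, String topic) {
        return map.computeIfAbsent(topic, k -> new Tally());
    }
}
```

All three return the same stored object for a given topic; they differ only in how many hash lookups they perform and which Java version they require.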

hachikuji · 2016-08-02T19:47:41Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java

@@ -706,8 +705,18 @@ public CompletedFetch(TopicPartition partition,
        private final FetchManagerMetrics sensors;
        private final Set<TopicPartition> unrecordedPartitions;

-        private int totalBytes;
-        private int totalRecords;
+        private static class FetchMetrics {


nitpick: could we move this below the record() function? It would be nice to keep the class fields together.

hachikuji · 2016-08-02T19:48:19Z

Left a minor comment, but overall LGTM. Maybe @ijuma can take a look when he has time.

vahidhashemian · 2016-08-02T19:56:58Z

@hachikuji I moved the class as you suggested. Thanks for all the feedback on this PR.

vahidhashemian · 2016-12-01T23:22:46Z

@hachikuji A while back you gave your blessing on this PR after some reviews, but that was before you had the "commit" superpower. It would be great if we could merge it if you see no issues.

This one is also related, but not reviewed yet.

hachikuji · 2016-12-09T22:52:27Z

@vahidhashemian Thanks for the reminder (even if I got to it a week late). Took me a while to remember what this issue was about, but the patch still LGTM.

hachikuji · 2016-12-09T23:14:26Z

BTW, I made a few minor visibility tweaks before merging.

asfbot · 2016-12-09T23:32:52Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/54/
Test PASSed (JDK 8 and Scala 2.12).

asfbot · 2016-12-10T00:03:46Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/55/
Test FAILed (JDK 8 and Scala 2.11).

asfbot · 2016-12-10T00:42:59Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/53/
Test PASSed (JDK 7 and Scala 2.10).

Improve consumer metric collection by collecting and recording metrics per topic.

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1684 from vahidhashemian/KAFKA-4000

vahidhashemian force-pushed the KAFKA-4000 branch 3 times, most recently from abb65f6 to fe343bb Compare August 1, 2016 23:18

vahidhashemian changed the title ~~KAFKA-4000: Aggregate partitions of each topic for consumer metrics~~ KAFKA-4000: Collect and record per-topic consumer metrics Aug 1, 2016

hachikuji reviewed Aug 2, 2016
vahidhashemian force-pushed the KAFKA-4000 branch from fe343bb to 145b3d7 Compare August 2, 2016 00:36

vahidhashemian force-pushed the KAFKA-4000 branch from 145b3d7 to 4db6a15 Compare August 2, 2016 17:25

hachikuji reviewed Aug 2, 2016
vahidhashemian force-pushed the KAFKA-4000 branch from 4db6a15 to a790756 Compare August 2, 2016 17:49

hachikuji reviewed Aug 2, 2016
vahidhashemian force-pushed the KAFKA-4000 branch from a790756 to 7c0d5b9 Compare August 2, 2016 18:07

hachikuji reviewed Aug 2, 2016
KAFKA-4000: Collect and record per-topic consumer metrics

ef8332d

Improve consumer metric collection by collecting and recording metrics per topic.

vahidhashemian force-pushed the KAFKA-4000 branch from 7c0d5b9 to ef8332d Compare August 2, 2016 19:55

asfgit closed this in 7f8edbc Dec 9, 2016

efeg pushed a commit to efeg/kafka that referenced this pull request May 29, 2024

Bump version for cruise-control-client changes (apache#1684)

f81c81b
