KAFKA-3715: add granular metrics per node #1446
Conversation
Kafka Streams: add granular metrics per node, also expose ability to register non-latency metrics in StreamsMetrics.

From apache#1362 (comment):

We can consider adding metrics for process / punctuate / commit rate at the granularity of each processor node, in addition to the global rate mentioned above. This is very helpful in debugging.

We can consider adding rate / total accumulated metrics for context.forward, indicating how many records were forwarded downstream from this processor node. This is also helpful in debugging.

We can consider adding metrics for each stream partition's timestamp. This is helpful in debugging.

Besides the latency metrics, we can also add throughput metrics in terms of source records consumed.

More discussion here: https://issues.apache.org/jira/browse/KAFKA-3715
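To make the proposal concrete, here is a minimal sketch of what per-node sensors could look like when built directly on Kafka's org.apache.kafka.common.metrics API. This is illustrative only, not the PR's actual implementation; the sensor names, the metric group name, and the "processor-node-id" tag key are assumptions.

```java
import java.util.Collections;
import java.util.Map;

import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Count;
import org.apache.kafka.common.metrics.stats.Rate;

public class PerNodeMetricsSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // Tag each metric with the processor node name so every node gets its own series.
        Map<String, String> tags = Collections.singletonMap("processor-node-id", "KSTREAM-MAP-01");

        // Per-node process rate (records processed per second).
        Sensor processRate = metrics.sensor("KSTREAM-MAP-01.process");
        processRate.add(metrics.metricName("process-rate", "stream-processor-node-metrics",
                "Rate of records processed by this node", tags), new Rate());

        // Per-node forward count (records sent downstream via context.forward).
        Sensor forwarded = metrics.sensor("KSTREAM-MAP-01.forward");
        forwarded.add(metrics.metricName("forward-total", "stream-processor-node-metrics",
                "Total records forwarded downstream by this node", tags), new Count());

        // Inside the node's process() / forward() paths:
        processRate.record();
        forwarded.record();

        metrics.close();
    }
}
```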
@guozhangwang, what do you think? I was able to run the examples and see the metrics per node in a JMX console.
Thanks @aartigupta, @enothereska could you take a look first at this ticket? I have assigned you as the reviewer on the ticket, and please feel free to re-assign to me otherwise.
fetcherThread.shutdown()
}

-    private def allMetricsNames = Metrics.defaultRegistry().allMetrics().asScala.keySet.map(_.getName)
+    private def allMetricNames = Metrics.defaultRegistry().allMetrics().asScala.keySet.map(_.getName)
These name changes are not strictly part of this fix; I'm wondering if we can open a MINOR PR for these while having this PR focus on streams only (to avoid confusion).
Agreed, these were not intended for this fix; they managed to sneak their way in. My bad, fixed it now.
@aartigupta perhaps the PR name should be "KAFKA-3715: add granular metrics per node"? The JIRA number is usually part of the PR name. Minor thing but just for consistency.
public NodeMetricsImpl(StreamsMetrics metrics, String
                       name) {
The line break here seems unnecessary.
Thanks @aartigupta. Two higher-level questions: does it make sense to add a unit test or two for the new metrics? And do we have any overhead measurements, in the sense of how much the new recordings add to the end-to-end latency?
Ran org.apache.kafka.streams.perf.SimpleBenchmark with the following configuration, i.e. without state-store-backed streams (// benchmark.processStreamWithStateStore(); commented out) and with simple print statements indicating which part of the benchmark is being run. Then attached a YourKit profiler and saw the following differences (see attached screenshots): without any changes to the code, using CPU sampling in YourKit, saw 61% CPU contention; the second screenshot shows SimpleBenchmark with node metrics. Ran this multiple times and the results varied between 63% (no changes) and 72% (with per-node metrics). The difference seems to depend on the point at which the YourKit profiler is attached. That said, not sure if this is a valid load-simulating scenario.
@guozhangwang Is SimpleBenchmark a good scenario to be profiling? Still working on the unit tests for per-node metrics.
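For the end-to-end-latency question, one cheap complement to whole-pipeline profiling is a standalone microbenchmark of the sensor recording path itself. This is only a sketch, not part of the PR; the class name, metric names, and iteration count are arbitrary.

```java
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Avg;
import org.apache.kafka.common.metrics.stats.Rate;

public class SensorOverheadMicrobench {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        Sensor sensor = metrics.sensor("node-process");
        sensor.add(metrics.metricName("process-rate", "bench", "records per second"), new Rate());
        sensor.add(metrics.metricName("process-latency-avg", "bench", "average latency"), new Avg());

        final int iterations = 5_000_000;

        // Warm up the JIT before timing.
        for (int i = 0; i < iterations; i++) {
            sensor.record(1.0);
        }

        // Time the recording path in isolation; this bounds the per-record overhead
        // that per-node sensors can add to process() latency.
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sensor.record(1.0);
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("~%.1f ns per sensor.record() call%n", (double) elapsed / iterations);
        metrics.close();
    }
}
```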
hey @aartigupta it's kind of hard to tell based on your screenshots where the time is going since I don't see any drilldown into the call stacks of the StreamThread run loops. It's probably necessary for you to flip things on in the YourKit profiler so you can get the full call stacks and determine if
Thanks @aartigupta, some general comments:
So we need to make sure that the hierarchy is sufficient for different reporters to differentiate these metrics in their own space.
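To illustrate the hierarchy point, a rough sketch of how thread / task / node tags could be layered on one metric so a reporter (JMX here) can separate them in its own namespace. The tag keys, group name, and metric names below are assumptions for illustration, not necessarily what the PR registers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.kafka.common.metrics.JmxReporter;
import org.apache.kafka.common.metrics.MetricConfig;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.MetricsReporter;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Rate;
import org.apache.kafka.common.utils.SystemTime;

public class MetricHierarchySketch {
    public static void main(String[] args) {
        // A JMX reporter exposes the tag hierarchy as MBean key properties.
        Metrics metrics = new Metrics(new MetricConfig(),
                Arrays.<MetricsReporter>asList(new JmxReporter("kafka.streams.sketch")),
                new SystemTime());

        // Scope one metric name by progressively finer tags: client (thread) -> task -> node.
        // A reporter can group, filter, or aggregate at any of these levels.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("client-id", "streams-thread-1");
        tags.put("task-id", "0_0");
        tags.put("processor-node-id", "KSTREAM-MAP-02");

        Sensor processRate = metrics.sensor("thread-1.0_0.KSTREAM-MAP-02.process");
        processRate.add(metrics.metricName("process-rate", "stream-processor-node-metrics",
                "Per-node process rate", tags), new Rate());

        processRate.record();
        metrics.close();
    }
}
```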
@guozhangwang MacBook 12-inch (early 2015), 1.3 GHz dual-core Intel Core M processor (Turbo Boost up to 2.9 GHz) with 4 MB shared L3 cache.
producer
Process finished with exit code 0
Is there a way to register user-defined metrics?
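At the time of this PR, the public hook for user-defined metrics is the StreamsMetrics handle returned by ProcessorContext#metrics(), which only exposes latency sensors; registering non-latency metrics is exactly what this PR adds. A rough sketch against the existing 0.10.0-era interface (the processor class, scope, and operation names are made up):

```java
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.streams.StreamsMetrics;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class TimedProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private Sensor latencySensor;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        StreamsMetrics metrics = context.metrics();
        // Register a custom latency sensor; the scope/entity/operation strings are illustrative.
        this.latencySensor = metrics.addLatencySensor("my-scope", "timed-processor", "my-operation");
    }

    @Override
    public void process(String key, String value) {
        long startNs = System.nanoTime();
        // ... user logic ...
        context.forward(key, value);
        context.metrics().recordLatency(latencySensor, startNs, System.nanoTime());
    }

    @Override
    public void punctuate(long timestamp) { }

    @Override
    public void close() { }
}
```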
@aartigupta would you still have time for this PR or should I have a look? Thanks.
* Record a value with this sensor
* @param value The value to record
* @throws QuotaViolationException if recording this value moves a metric beyond its configured maximum or minimum
* Record a name with this sensor
Is this intentional? Ditto below.
unrelated: kafka.api.SslProducerSendTest.testCloseWithZeroTimeoutFromSenderThread
The old org.apache.kafka.streams.integration.ResetIntegrationTest failure is back, but it shouldn't be related to this PR.
Merged to trunk. Many thanks to @aartigupta and @enothereska!!
… logging levels to Metrics

Kafka Streams: add granular metrics per node and per task, also expose ability to register non-latency metrics in StreamsMetrics. Also added different recording levels to Metrics. This is a joint contribution from Eno Thereska and Aarti Gupta.

From #1362 (comment):

We can consider adding metrics for process / punctuate / commit rate at the granularity of each processor node, in addition to the global rate mentioned above. This is very helpful in debugging.

We can consider adding rate / total accumulated metrics for context.forward, indicating how many records were forwarded downstream from this processor node. This is helpful in debugging.

We can consider adding metrics for each stream partition's timestamp. This is helpful in debugging.

Besides the latency metrics, we can also add throughput metrics in terms of source records consumed.

More discussions here: https://issues.apache.org/jira/browse/KAFKA-3715, KIP-104, KIP-105

Author: Eno Thereska <eno@confluent.io>
Author: Aarti Gupta <aartiguptaa@gmail.com>

Reviewers: Greg Fodor, Ismael Juma, Damian Guy, Guozhang Wang

Closes #1446 from aartigupta/trunk