[kafka] Improve example configuration #2079
Collecting MeanRate from Kafka's Meter JMX MBeans isn't terribly helpful for graphing in DD because the rate is the mean rate for the entire lifetime of the Kafka broker. [1]

Instead of collecting the MeanRate, we could collect the load-average-style rates (the one-, five-, and fifteen-minute rates), or we can simply grab the monotonically increasing counts and calculate our own rate from those raw counters. Since DD captures metrics at intervals shorter than a minute, calculating our own rate gives the user the most granular view of cluster activity, so that's what the example configuration file will contain.

In addition to fixing the collection of these rate metrics, the example configuration is also updated to collect nearly all the useful metrics recommended by Confluent for alerting and trending on a Kafka cluster. [2]

[1] http://metrics.dropwizard.io/3.1.0/manual/core/#meters
[2] http://docs.confluent.io/1.0.1/kafka/monitoring.html
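To illustrate the difference between a lifetime mean rate and a rate derived from raw counts, here is a minimal Python sketch. It is not the agent's implementation; the function names, the 15-second sampling interval, and the sample values are all made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, monotonically increasing count)


def rates_from_counts(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Per-second rate between each pair of consecutive counter samples.

    Pairs where the counter went backwards (e.g. after a broker restart)
    or where the timestamps are not increasing are skipped.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or c1 < c0:
            continue
        rates.append((t1, (c1 - c0) / dt))
    return rates


def lifetime_mean_rate(samples: List[Sample]) -> float:
    """Mean rate over the whole sampled window, analogous to a Meter's MeanRate."""
    (t0, c0), (tn, cn) = samples[0], samples[-1]
    return (cn - c0) / (tn - t0)


# Hypothetical counter readings taken every 15 seconds.
samples = [(0, 0), (15, 1500), (30, 1500), (45, 6000)]

interval_rates = rates_from_counts(samples)
mean_rate = lifetime_mean_rate(samples)

print(interval_rates)  # one rate per sampling interval
print(mean_rate)       # a single averaged value that hides the idle interval and the burst
```

With these made-up samples the per-interval rates show an idle period followed by a burst, while the lifetime mean flattens both into one number, and it flattens more the longer the broker has been running.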
This changes the Chef template to render a file that closely aligns with the example configuration provided with the dd-agent code. [1] More details on the changes in the configuration file can be found in the dd-agent repository. With this change, the rendered file will only correctly collect metrics on brokers that are on 0.8.2 or higher.

[1] DataDog/dd-agent#2079
Tests are passing here. The one flake8 error will be addressed by PR #2080.
Thanks a lot @dougbarth! @DorianZaccaria can you review this one please?
    domain: 'kafka.controller'
    bean: 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
    attribute:
      Count:
It looks like the only attribute available for this bean is Value
(using Kafka 0.8.2.2).
It makes sense to me to use the
Everything was tested on Kafka 0.8.2.2.
Hi @dougbarth, have you been able to look at the comments above? We'd very much like to include these changes in the next release of the agent (5.7.0), which will be out soon. Thanks!
Hi @dougbarth, have you had a chance to review the feedback here? Our merge window for 5.7 is closing soon and we'd love to include your improvements to the Kafka check.
PR catch from @DorianZaccaria. Thank you!
@irabinovitch @olivielpeau this PR is updated. So sorry for the delay! @DorianZaccaria thank you very much for the detailed check against this PR. I appreciate it.
Thanks again @dougbarth! Everything looks good, merging.
Use the `version` attribute in the `kafka` recipe to select the appropriate YAML configuration file. Two versions are currently available:
* `1` (Default): Legacy YAML configuration file, compatible with Kafka < 0.8.2.
* `2`: Required for Kafka > 0.8.2; uses the YAML configuration file introduced by DataDog/dd-agent#2079.