[kafka] Improve example configuration #2079

Merged
merged 2 commits into from Feb 19, 2016

dougbarth
Contributor

Collecting MeanRate from Kafka's Meter JMX MBeans isn't terribly helpful
for graphing in DD because MeanRate is the mean rate over the entire
lifetime of the Kafka broker. [1]

Instead of collecting the MeanRate, we could collect the load-average
style rates (OneMinuteRate, FiveMinuteRate, FifteenMinuteRate), or we
can simply grab the monotonically increasing counts and calculate our
own rate from those raw counters. Since DD captures metrics at intervals
shorter than a minute, calculating our own rate gives the user the most
granular view of cluster activity, so that's what the example
configuration file will contain.

In addition to fixing the collection of these rate metrics, the example
configuration is also updated to collect nearly all the useful metrics
recommended by Confluent for alerting and trending on a Kafka cluster.
[2]

[1] http://metrics.dropwizard.io/3.1.0/manual/core/#meters
[2] http://docs.confluent.io/1.0.1/kafka/monitoring.html
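To illustrate why a lifetime mean hides recent activity, here is a minimal Python sketch that simulates a broker whose message rate jumps tenfold near the end of its first hour. It compares a MeanRate-style lifetime mean with a rate derived from the delta of a monotonically increasing counter between two consecutive samples. The 15-second interval, the traffic numbers, and the helper names are illustrative assumptions, not values or code taken from dd-agent; the agent-side rate computation itself is not shown here.

```python
"""Compare a lifetime mean rate with a rate computed from counter deltas.

Illustrative only: the interval, traffic pattern, and helper names are assumptions.
"""

INTERVAL_S = 15  # assumed collection interval, in seconds


def lifetime_mean_rate(count, uptime_s):
    """Mean events/second since start-up, like a Meter's MeanRate."""
    return count / uptime_s


def delta_rate(prev_count, curr_count, interval_s):
    """Events/second over one interval, from two samples of a monotonic counter."""
    if curr_count < prev_count:
        # Counter went backwards (assumed to mean a broker restart): skip this interval.
        return None
    return (curr_count - prev_count) / interval_s


def simulate(intervals, busy_from, quiet_rate, busy_rate):
    """Yield (uptime_s, count) samples; traffic switches to busy_rate after busy_from."""
    count = 0
    for i in range(1, intervals + 1):
        rate = busy_rate if i > busy_from else quiet_rate
        count += rate * INTERVAL_S
        yield i * INTERVAL_S, count


# One hour of samples: 100 events/s, then 1000 events/s for the last minute.
samples = list(simulate(intervals=240, busy_from=236, quiet_rate=100, busy_rate=1000))
(_, prev_count), (uptime, count) = samples[-2], samples[-1]

print("MeanRate-style:", round(lifetime_mean_rate(count, uptime), 1))
print("Delta rate:    ", delta_rate(prev_count, count, INTERVAL_S))
```

Run as-is, the lifetime mean reads 115.0 events/s even though the last interval ran at 1000.0 events/s, which is the flattening effect described above.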
dougbarth pushed a commit to dougbarth/chef-datadog that referenced this pull request Nov 13, 2015
This changes the Chef template to render a file that closely aligns with
the example configuration provided with the dd-agent code. [1] More
details on the changes in the configuration file can be found in the
dd-agent repository.

With this change, the rendered file will only correctly collect metrics
on brokers that are on 0.8.2 or higher.

[1] DataDog/dd-agent#2079
@irabinovitch
Contributor

Tests are passing here. The one flake8 error will be addressed by PR #2080.

@remh
Contributor

remh commented Nov 18, 2015

Thanks a lot @dougbarth !

@DorianZaccaria, can you review this one, please?

@remh remh added this to the 5.7.0 milestone Nov 18, 2015
domain: 'kafka.controller'
bean: 'kafka.controller:type=KafkaController,name=OfflinePartitionsCount'
attribute:
  Count:
Contributor


It looks like the only attribute available for this bean is Value (using Kafka 0.8.2.2).

@DorianZaccaria
Contributor

It makes sense to me to use the Count as a rate. The only issue I found was that some metrics try to fetch the Count attribute of some beans instead of Value. Here is the list of these metrics:

  • kafka.replication.offline_partitions_count
  • kafka.replication.active_controller_count
  • kafka.replication.partition_count
  • kafka.replication.leader_count

Everything was tested on Kafka 0.8.2.2.
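The mismatch found in review can be checked mechanically. Below is a minimal Python sketch, assuming each include entry is represented as a dict with `bean` and `attribute` keys mirroring the YAML snippet above, and assuming a hand-written table of the attributes each bean exposes. Per the review, gauge beans such as `OfflinePartitionsCount` expose only `Value` on Kafka 0.8.2.2; the other bean names, the meter attribute subset, and the `find_mismatches` helper are illustrative assumptions, not part of dd-agent.

```python
# Hypothetical check: flag include entries that request an attribute the bean
# does not expose. The table below is an assumption based on the review
# comments (gauges expose only Value on Kafka 0.8.2.2).

GAUGE_ATTRS = {"Value"}
# Subset of the attributes a Meter bean exposes.
METER_ATTRS = {"Count", "MeanRate", "OneMinuteRate", "FiveMinuteRate",
               "FifteenMinuteRate"}

EXPOSED = {
    "kafka.controller:type=KafkaController,name=OfflinePartitionsCount": GAUGE_ATTRS,
    "kafka.controller:type=KafkaController,name=ActiveControllerCount": GAUGE_ATTRS,
    "kafka.server:type=ReplicaManager,name=PartitionCount": GAUGE_ATTRS,
    "kafka.server:type=ReplicaManager,name=LeaderCount": GAUGE_ATTRS,
    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec": METER_ATTRS,
}


def find_mismatches(includes, exposed):
    """Return (bean, attribute) pairs whose attribute the bean does not expose."""
    bad = []
    for entry in includes:
        available = exposed.get(entry["bean"], set())
        for attr in entry["attribute"]:  # iterates over the attribute names
            if attr not in available:
                bad.append((entry["bean"], attr))
    return bad


includes = [
    {"bean": "kafka.controller:type=KafkaController,name=OfflinePartitionsCount",
     "attribute": {"Count": {}}},  # wrong: this gauge has no Count
    {"bean": "kafka.server:type=ReplicaManager,name=LeaderCount",
     "attribute": {"Value": {}}},  # correct
    {"bean": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
     "attribute": {"Count": {}}},  # correct: meters expose Count
]

for bean, attr in find_mismatches(includes, EXPOSED):
    print(f"{bean}: attribute {attr!r} is not exposed")
```

Only the `OfflinePartitionsCount` entry is reported; a real check would query the broker over JMX instead of relying on a static table.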

@olivielpeau
Member

Hi @dougbarth, have you been able to look at the comments above?

We'd very much like to include these changes in the next release of the agent (5.7.0), which will be out soon.

Thanks!

@irabinovitch
Contributor

Hi @dougbarth, have you had a chance to review the feedback here? Our merge window for 5.7 is closing soon and we'd love to include your improvements to the Kafka check.

PR catch from @DorianZaccaria. Thank you!
@dougbarth
Contributor Author

@irabinovitch @olivielpeau this PR is updated. So sorry for the delay!

@DorianZaccaria thank you very much for the detailed check against this PR. I appreciate it.

@olivielpeau
Member

Thanks again @dougbarth! Everything looks good, merging.

olivielpeau added a commit that referenced this pull request Feb 19, 2016
[kafka] Improve example configuration
@olivielpeau olivielpeau merged commit 0e5fa01 into DataDog:master Feb 19, 2016
degemer added a commit to DataDog/chef-datadog that referenced this pull request Aug 11, 2016
* Use the `version` attribute in the `kafka` recipe to select the
  appropriate YAML configuration file.
  Two versions are currently available:
  * `1` (Default): Legacy YAML configuration file, compatible with
    Kafka < 0.8.2.
  * `2`: Required for Kafka >= 0.8.2; uses the YAML configuration file
    introduced by DataDog/dd-agent#2079
olivielpeau pushed a commit to DataDog/chef-datadog that referenced this pull request Sep 9, 2016
Use the `version` attribute in the `kafka` recipe to select the
appropriate YAML configuration file.

Two versions are currently available:

* `1` (Default): Legacy YAML configuration file, compatible with
  Kafka < 0.8.2.
* `2`: Required for Kafka >= 0.8.2; uses the YAML configuration file
  introduced by DataDog/dd-agent#2079
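The version split described in these commits can be expressed as a small selection rule. Here is a minimal Python sketch, assuming Kafka versions are given as dotted numeric strings. `config_version_for` is a hypothetical helper; as the commit message describes it, the Chef recipe reads a `version` attribute chosen by the user rather than detecting the broker version. The sketch treats 0.8.2 and later as needing version `2`, matching the earlier note that the new file only collects metrics correctly on brokers at 0.8.2 or higher.

```python
# Hypothetical helper: pick the kafka.yaml template version for a broker.
# Assumes dotted numeric version strings such as "0.8.1.1" or "0.9.0.0".

NEW_CONFIG_MIN = (0, 8, 2)  # first Kafka release the new configuration supports


def parse_version(version):
    """Turn "0.8.2.2" into (0, 8, 2, 2) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def config_version_for(kafka_version):
    """Return 1 (legacy file) for Kafka < 0.8.2, otherwise 2."""
    return 2 if parse_version(kafka_version) >= NEW_CONFIG_MIN else 1


for v in ("0.8.1.1", "0.8.2.0", "0.8.2.2", "0.9.0.0"):
    print(v, "->", config_version_for(v))
```

Tuple comparison handles versions with different numbers of components: `(0, 8, 2, 0)` sorts after `(0, 8, 2)`, so 0.8.2.x releases map to version `2`.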

5 participants