
Update kafka.yaml: proper spacing & recent changes #163

Closed
wants to merge 1 commit into from


donaldguy

With the previous spacing, the check throws an exception:
  InvalidJMXConfiguration("Each configuration must have an 'include' section.")

In particular, this incorporates:
DataDog/dd-agent@75863d8 - counters -> gauges,
DataDog/dd-agent@5382422 - jmxfetch custom tags,
and DataDog/dd-agent@8bac88e - support for jmxfetch 0.4
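A minimal Python sketch of how a spacing error can surface as this exception. It assumes the YAML has already been parsed into plain dicts and lists, and `validate_jmx_conf` is a hypothetical helper that only mirrors the error message; it is not dd-agent's actual implementation. The key point: if `domain:` sits at the same column as `include:` inside a list item, YAML treats it as a sibling key, so `include` parses as null.

```python
from typing import Any, Dict


def validate_jmx_conf(init_config: Dict[str, Any]) -> None:
    """Hypothetical check: every item under 'conf' must be a mapping whose
    'include' section is present and non-empty."""
    for index, entry in enumerate(init_config.get("conf") or []):
        if not isinstance(entry, dict) or not entry.get("include"):
            raise ValueError(
                "conf[%d]: each configuration must have an 'include' section" % index
            )


# conf:
#   - include:
#       domain: kafka.server    # indented past 'include' -> nested under it
nested = {"conf": [{"include": {"domain": "kafka.server"}}]}

# conf:
#   - include:
#     domain: kafka.server      # same column as 'include' -> sibling key
sibling = {"conf": [{"include": None, "domain": "kafka.server"}]}

validate_jmx_conf(nested)  # passes silently
try:
    validate_jmx_conf(sibling)
except ValueError as err:
    print(err)  # conf[0]: each configuration must have an 'include' section
```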

@donaldguy
Author

Confusingly, when using this I am seeing:


  Checks
  ======

    network
    -------
      - instance #0 [OK]
      - Collected 0 metrics, 0 events & 1 service check

    kafka
    -----
      - instance #kafka-kafka-2ceb74c0-9999 [OK] collected 22 metrics
      - Collected 22 metrics, 0 events & 0 service checks

    kafka
    -----
      - initialize check class [ERROR]: InvalidJMXConfiguration("Each configuration must have an 'include' section. See http://docs.datadoghq.com/integrations/java/ for more informa

but the latter kafka error section persists even if I delete the YAML from the config altogether and restart the agent. On one of our brokers I tried uninstalling dd-agent altogether, and it still came back. I don't know whether something is still wrong with the config or whether this is just some weird stickiness from previously loading the bad YAML.

Even so, this PR represents a better state than current master.

@remh
Contributor

remh commented Dec 18, 2014

Hmm, that's weird.
Can you stop the agent, delete the file
/tmp/jmx_status_python.yaml

and then restart the agent? That should fix the issue.
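The stop / delete / restart sequence above, sketched in Python. The status-file path comes from this comment and the `datadog-agent` service name from the `sudo service datadog-agent start` call later in the thread; the helper names and the injectable `run` parameter (so the sequence can be dry-run or tested) are hypothetical. In practice it needs root, like the `sudo` commands in this thread.

```python
import os
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


def reset_jmx_status(
    run: Runner = run_checked,
    status_path: str = "/tmp/jmx_status_python.yaml",
) -> bool:
    """Stop the agent, delete the cached JMX status file, start the agent.

    Returns True if a status file was actually removed. The 'run' argument
    lets the command runner be swapped out (e.g. for a dry run).
    """
    run(["service", "datadog-agent", "stop"])
    try:
        os.remove(status_path)
        removed = True
    except FileNotFoundError:
        removed = False
    run(["service", "datadog-agent", "start"])
    return removed
```

Calling `reset_jmx_status()` as root performs the real sequence; passing `run=print` just echoes the commands it would run.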

@donaldguy
Author

Mhmm, deleting that file cleared out that section, but now I have a

    jmx
    ---
      - instance #0 [ERROR]: "JMXfetch didn't return any metrics during the last minute"
      - Collected 0 metrics, 0 events & 0 service checks

Notably, it does seem like the jmxfetch process is not getting stopped along with the supervisor.
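One way to confirm that more than one jmxfetch JVM survived a supervisor stop is to scan `pgrep -fa datadog` output for the `org.datadog.jmxfetch.App` main class, which appears in the process listing later in this thread. A minimal Python sketch; `jmxfetch_pids` is a hypothetical helper, and it assumes each process starts on a line beginning with its PID (terminal-wrapped continuation lines are skipped).

```python
from typing import List

JMXFETCH_MAIN_CLASS = "org.datadog.jmxfetch.App"


def jmxfetch_pids(pgrep_output: str) -> List[int]:
    """Return the PIDs of JMXFetch JVMs listed in `pgrep -fa` output.

    Each process is assumed to start on a line beginning with its PID;
    wrapped continuation lines (no leading PID) are skipped.
    """
    pids = []
    for line in pgrep_output.splitlines():
        first, _, rest = line.strip().partition(" ")
        if first.isdigit() and JMXFETCH_MAIN_CLASS in rest:
            pids.append(int(first))
    return pids


# Abbreviated from the pgrep output in this thread.
sample = (
    "11730 java -classpath jmxfetch-0.4.0-jar-with-dependencies.jar "
    "org.datadog.jmxfetch.App --check kafka.yaml\n"
    " /etc/dd-agent/conf.d --log_level INFO\n"
    "16882 java -classpath jmxfetch-0.4.0-jar-with-dependencies.jar "
    "org.datadog.jmxfetch.App --check kafka.yaml\n"
)
pids = jmxfetch_pids(sample)
if len(pids) > 1:
    print("more than one jmxfetch running:", pids)  # prints: more than one jmxfetch running: [11730, 16882]
```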

@donaldguy
Author

After cleaning up and starting over, this PR seems to run clean... don't know if it lasts:

[deploy@kafka-c3948729 ~]$ pgrep -fa datadog
11730 java -classpath /opt/datadog-agent/agent/checks/libs/jmxfetch-0.4.0-jar-with-dependencies.jar org.datadog.jmxfetch.App --check kafka.yaml --check_period 15000 --conf_director$
 /etc/dd-agent/conf.d --log_level INFO --log_location /var/log/datadog/jmxfetch.log --reporter statsd:8125 --status_location /tmp/jmx_status.yaml collect
16882 java -classpath /opt/datadog-agent/agent/checks/libs/jmxfetch-0.4.0-jar-with-dependencies.jar org.datadog.jmxfetch.App --check kafka.yaml --check_period 15000 --conf_directory
 /etc/dd-agent/conf.d --log_level INFO --log_location /var/log/datadog/jmxfetch.log --reporter statsd:8125 --status_location /tmp/jmx_status.yaml collect
[deploy@kafka-c3948729 ~]$ kill 11730
kill: kill 11730 failed: operation not permitted
[deploy@kafka-c3948729 ~]$ sudo kill 11730
[deploy@kafka-c3948729 ~]$ sudo kill 16882
[deploy@kafka-c3948729 ~]$ pgrep -fa datadog
[deploy@kafka-c3948729 ~]$ sudo service datadog-agent
[deploy@kafka-c3948729 ~]$ rm /tmp/jmx_status.yaml /tmp/jmxfetch.pid
rm: remove write-protected regular file '/tmp/jmx_status.yaml'? y
rm: cannot remove '/tmp/jmx_status.yaml': Operation not permitted
rm: remove write-protected regular file '/tmp/jmxfetch.pid'? y
rm: cannot remove '/tmp/jmxfetch.pid': Operation not permitted
[deploy@kafka-c3948729 ~]$ sudo rm /tmp/jmx_status.yaml /tmp/jmxfetch.pid
[deploy@kafka-c3948729 ~]$ sudo service datadog-agent start
 * Starting Datadog Agent (using supervisord) datadog-agent
   ...done.

info now shows:

...
  Checks
  ======

    network
    -------
      - instance #0 [OK]
      - Collected 0 metrics, 0 events & 1 service check

    kafka
    -----
      - instance #kafka-kafka-c3948729-9999 [OK] collected 22 metrics
      - Collected 22 metrics, 0 events & 0 service checks


  Emitters
...

@donaldguy
Author

Hmmm... whereas on the two other brokers in the cluster, stopping the supervisor did kill jmxfetch okay (and deleted those files in /tmp to boot). Restarting the agent gave a clean status, whereas before the stop the status was double-kafka-ed and one jmxfetch was running.

I think I used the most problematic of the nodes for interactive testing of the PR; I may have run sudo -u dd-agent dd-agent check kafka or similar. It seems like somewhere along the line two jmxfetch processes ended up running, and I believe the pid file pointed to the orphaned one? ... I already deleted it, though.

Anyway, I think it's a transient issue.
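If the pid file really pointed at only one of two running JVMs, comparing it against the live PIDs makes the mismatch visible. A minimal Python sketch, assuming the `/tmp/jmxfetch.pid` path from the transcript above and a list of running jmxfetch PIDs obtained separately (for example from `pgrep -fa`); the helper names are hypothetical, and this is only a diagnostic idea, not how dd-agent itself manages the process.

```python
from typing import Iterable, List, Optional


def read_pid_file(path: str) -> Optional[int]:
    """Return the PID stored in a pid file, or None if it is missing or garbled."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def untracked_pids(running: Iterable[int], tracked: Optional[int]) -> List[int]:
    """Running jmxfetch PIDs that the pid file does not point at."""
    return sorted(pid for pid in running if pid != tracked)


def pid_file_is_stale(running: Iterable[int], tracked: Optional[int]) -> bool:
    """True if the pid file names a PID that is not among the running ones."""
    return tracked is not None and tracked not in set(running)
```

For example, `untracked_pids([11730, 16882], read_pid_file("/tmp/jmxfetch.pid"))` lists whichever of the two JVMs the pid file does not cover.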

@miketheman miketheman added the bug label Dec 30, 2014
@miketheman
Contributor

Hi @donaldguy,
Thanks for continuing to test this out. I recently pushed a change to the jmx template and associated tests. Would you mind taking a look at that and seeing whether you could add some testing around this particular recipe and template to support this PR?

c2e0dbf
https://github.com/DataDog/chef-datadog/tree/master/test/integration/dd-agent-jmx/serverspec

@donaldguy
Author

I'm a little confused about the appropriate scope.

What in particular do you think should be tested? Should I constrain it to testing the contents of the yaml file or should it encompass setting up a node with a toy kafka broker as well?

Put another way, there are existing bats tests for kafka; why are they insufficient?
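For the narrower option (testing only the contents of the rendered YAML file), a check could target the exact regression this PR fixes. Below is a hypothetical, stdlib-only Python helper that a test could run against the rendered kafka.yaml; its name and behaviour are illustrative assumptions, and it is not part of the existing bats tests or the chef-datadog repo. It is a crude text check, not a YAML parser.

```python
from typing import List

INCLUDE_ITEM = "- include:"


def shallow_include_lines(yaml_text: str) -> List[int]:
    """Return 1-based line numbers of '- include:' items whose body is not
    indented past the 'include' key (so 'include' would parse as null).

    Only this one mis-indentation is targeted; inline values such as
    '- include: {...}' are skipped.
    """
    lines = yaml_text.splitlines()
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.lstrip()
        if not stripped.startswith(INCLUDE_ITEM):
            continue
        if stripped[len(INCLUDE_ITEM):].strip():
            continue  # value given inline on the same line
        key_column = len(line) - len(stripped) + 2  # skip the "- " marker
        # lines[number:] starts at the line after this one (number is 1-based).
        following = next((nxt for nxt in lines[number:] if nxt.strip()), None)
        if following is None or len(following) - len(following.lstrip()) <= key_column:
            problems.append(number)
    return problems
```

A test would then assert that `shallow_include_lines(rendered_text)` returns an empty list.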

Separately, there are probably other JMX metrics worth gathering; in particular, I probably want to alert on under-replicated partitions (and maybe cover everything the LinkedIn folks highlight in https://kafka.apache.org/documentation.html#monitoring). Could I add these here, or do they need to go back to the dd-agent repo? Or should I just aim for an other_beans attribute or something?

CC: @remh @conorbranagan on the latter
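A rough Python sketch of the other_beans idea: user-supplied bean filters merged into the generated `conf` list without duplicating entries. `other_beans` is only the hypothetical attribute floated in the comment above, and `merge_other_beans` is likewise hypothetical. The bean name is the UnderReplicatedPartitions MBean as recalled from the Kafka monitoring docs (an assumption; verify it for your broker version), and the alias is made up for illustration.

```python
import copy
from typing import Any, Dict, List

BeanFilter = Dict[str, Any]


def merge_other_beans(conf: List[BeanFilter], other_beans: List[BeanFilter]) -> List[BeanFilter]:
    """Return a new conf list with extra 'include' entries appended.

    Entries already present (compared by equality) are not duplicated,
    and the inputs are left unmodified.
    """
    merged = copy.deepcopy(conf)
    for bean in other_beans:
        entry = {"include": copy.deepcopy(bean)}
        if entry not in merged:
            merged.append(entry)
    return merged


# Hypothetical extra bean: name assumed from the Kafka monitoring docs,
# alias invented for illustration.
under_replicated = {
    "bean": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
    "attribute": {
        "Value": {"metric_type": "gauge", "alias": "kafka.replication.under_replicated_partitions"}
    },
}
```

Because the merge skips entries that are already present, re-running it (e.g. on every chef run) leaves the list unchanged.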

@miketheman miketheman added this to the Next minor milestone Jan 26, 2015
@miketheman
Copy link
Contributor

Hi @donaldguy!
Thanks for the contribution - as you can see, your patch was merged into master. I've added serverspec-style tests that pass in the minimal test-kitchen environment; that's what I was referring to. bats doesn't always cut it and is harder to maintain.

3 participants