Diamond Ceph Stats not received in calamari #384

Open
drolfe opened this issue Jan 18, 2016 · 7 comments

drolfe commented Jan 18, 2016

Everything is working except the Ceph and pool graph stats in the Calamari GUI; the host stats are working fine.

[screenshots of the Calamari GUI omitted]

root@calamari:~# dpkg -l | egrep -i "calamari|salt" | awk '{print $2 "\t\t" $3}'
calamari-clients        1.3.1.1-1trusty
calamari-server         1.3.0.1-11-g9fb65ae
salt-common             2014.7.5+ds-1ubuntu1
salt-master             2014.7.5+ds-1ubuntu1
salt-minion             2014.7.5+ds-1ubuntu1
root@calamari:~#
root@ceph1:~# dpkg -l | egrep -i "ceph|salt|diamond" | awk '{print $2 "\t\t" $3}'
ceph                    9.2.0-1trusty
ceph-common             9.2.0-1trusty
ceph-mds                9.2.0-1trusty
diamond                 3.4.67
libcephfs1              9.2.0-1trusty
python-cephfs           9.2.0-1trusty
python-rados            9.2.0-1trusty
python-rbd              9.2.0-1trusty
salt-common             0.17.5+ds-1
salt-minion             0.17.5+ds-1
root@ceph1:~#

Let me know what more I should be checking

Regards, Daniel

drolfe commented Jan 24, 2016

Note: I've built the latest stable deb packages from git via Vagrant, and still hit the same issue.

root@calamari:~# dpkg -l | egrep -i "calamari|salt|romana" | awk '{print $2 "\t\t" $3}'
calamari-server         1.3.1.1-105-g79c8df2-1trusty
romana                  1.2.2-36-gc62bb5b
salt-common             2014.7.5+ds-1ubuntu1
salt-master             2014.7.5+ds-1ubuntu1
salt-minion             2014.7.5+ds-1ubuntu1
root@calamari:~#

Also, on the client I've matched the salt versions, as recommended:

root@ceph1:~# dpkg -l | egrep -i "salt|diamond" | awk '{print $2 "\t\t" $3}'
diamond                 3.4.67
salt-common             2014.7.5+ds-1ubuntu1
salt-minion             2014.7.5+ds-1ubuntu1
root@ceph1:~#

drolfe commented Jan 24, 2016

Restarting diamond on the server shows the following:

root@ceph1:~# tail -f /var/log/diamond/diamond.log
[2016-01-24 04:19:21,039] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,043] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,044] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,046] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,056] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,074] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:22,252] [Thread-1] Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/diamond/collector.py", line 412, in _run
    self.collect()
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 464, in collect
    self._collect_service_stats(path)
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 450, in _collect_service_stats
    self._publish_stats(counter_prefix, stats, schema, GlobalName)
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 305, in _publish_stats
    assert path[-1] == 'type'
AssertionError
^C

root@ceph1:~# md5sum /usr/lib/pymodules/python2.7/diamond/collector.py
08bb05a483fa3d1d64c0ebf690259a05  /usr/lib/pymodules/python2.7/diamond/collector.py
root@ceph1:~# md5sum /usr/share/diamond/collectors/ceph/ceph.py
aeb3915f8ac7fdea61495805d2c99f33  /usr/share/diamond/collectors/ceph/ceph.py
root@ceph1:~#

drolfe commented Jan 24, 2016

Looking at calamari.log, I can see it's requesting graphite metric data that is missing:

root@calamari:/var/log/calamari# tail -f calamari.log
2016-01-23 22:44:54,040 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:44:54,041 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
2016-01-23 22:44:58,560 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:44:58,561 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:44:58,835 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used_bytes
2016-01-23 22:44:58,835 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used
2016-01-23 22:44:58,836 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:44:58,836 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
2016-01-23 22:44:58,893 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:44:58,894 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:45:14,440 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used_bytes
2016-01-23 22:45:14,441 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used
2016-01-23 22:45:14,442 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:45:14,442 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
2016-01-23 22:45:18,373 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:45:18,377 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:45:18,878 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:45:18,879 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:45:19,269 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used_bytes
2016-01-23 22:45:19,270 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used
2016-01-23 22:45:19,275 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:45:19,276 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
^C
root@calamari:/var/log/calamari#

drolfe commented Feb 1, 2016

I can see the .asok files are there:

root@ceph1:/var/run/ceph# ls -la
total 0
drwxrwx---  2 ceph ceph  80 Feb  1 10:51 .
drwxr-xr-x 18 root root 640 Feb  1 10:52 ..
srwxr-xr-x  1 ceph ceph   0 Feb  1 10:51 ceph-mon.ceph1.asok
srwxr-xr-x  1 root root   0 Jan 27 15:08 ceph-osd.0.asok
root@ceph1:/var/run/ceph#
root@ceph1:/var/run/ceph#
root@ceph1:/var/run/ceph#

Running diamond in debug mode shows the following:

[2016-02-01 10:55:23,774] [Thread-1] Collecting data from: NetworkCollector
[2016-02-01 10:56:23,484] [Thread-1] Collecting data from: CPUCollector
[2016-02-01 10:56:23,487] [Thread-6] Collecting data from: MemoryCollector
[2016-02-01 10:56:23,489] [Thread-7] Collecting data from: SockstatCollector
[2016-02-01 10:56:23,768] [Thread-1] Collecting data from: CephCollector
[2016-02-01 10:56:23,768] [Thread-1] gathering service stats for /var/run/ceph/ceph-mon.ceph1.asok
[2016-02-01 10:56:24,094] [Thread-1] Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/diamond/collector.py", line 412, in _run
    self.collect()
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 464, in collect
    self._collect_service_stats(path)
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 450, in _collect_service_stats
    self._publish_stats(counter_prefix, stats, schema, GlobalName)
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 305, in _publish_stats
    assert path[-1] == 'type'
AssertionError

[2016-02-01 10:56:24,096] [Thread-8] Collecting data from: LoadAverageCollector
[2016-02-01 10:56:24,098] [Thread-1] Collecting data from: VMStatCollector
[2016-02-01 10:56:24,099] [Thread-1] Collecting data from: DiskUsageCollector
[2016-02-01 10:56:24,104] [Thread-9] Collecting data from: DiskSpaceCollector

Checking the md5 of the file returns the following:

root@ceph1:/var/run/ceph# md5sum /usr/share/diamond/collectors/ceph/ceph.py
aeb3915f8ac7fdea61495805d2c99f33  /usr/share/diamond/collectors/ceph/ceph.py
root@ceph1:/var/run/ceph#

I've found that replacing the ceph.py file with the one below stops the diamond error.

Diamond version 3.4.67

https://raw.githubusercontent.com/BrightcoveOS/Diamond/master/src/collectors/ceph/ceph.py

root@ceph1:/usr/share/diamond/collectors/ceph# md5sum ceph.py
13ac74ce0df39a5def879cb5fc530015  ceph.py


[2016-02-01 11:14:33,116] [Thread-42] Collecting data from: MemoryCollector
[2016-02-01 11:14:33,117] [Thread-1] Collecting data from: CPUCollector
[2016-02-01 11:14:33,123] [Thread-43] Collecting data from: SockstatCollector
[2016-02-01 11:14:35,453] [Thread-1] Collecting data from: CephCollector
[2016-02-01 11:14:35,454] [Thread-1] checking /var/run/ceph/ceph-mon.ceph1.asok
[2016-02-01 11:14:35,552] [Thread-1] checking /var/run/ceph/ceph-osd.0.asok
[2016-02-01 11:14:35,685] [Thread-44] Collecting data from: LoadAverageCollector
[2016-02-01 11:14:35,686] [Thread-1] Collecting data from: VMStatCollector
[2016-02-01 11:14:35,687] [Thread-1] Collecting data from: DiskUsageCollector
[2016-02-01 11:14:35,692] [Thread-45] Collecting data from: DiskSpaceCollector

But after all that, it's still not working.

drolfe commented Feb 4, 2016

OK, thanks to the reply below on the mailing list:

John Spray Mon, 01 Feb 2016 04:23:24 -0800

The "assert path[-1] == 'type'" is the error you get when using the calamari diamond branch with a >= infernalis version of Ceph (where new fields were added to the perf schema output). No idea if anyone has worked on updating Calamari+Diamond for latest ceph.

John
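
To illustrate John's explanation (a toy sketch, not calamari's actual ceph.py code): the collector flattens the JSON returned by the admin socket's `perf schema` command and assumes every leaf path ends in 'type', which is what the assert at ceph.py:305 enforces. Once Infernalis adds extra per-counter fields, that assumption breaks:

```python
# Toy reproduction of the failing schema walk (a sketch under assumed
# schema shapes, not the real calamari ceph.py collector).

def flatten(d, path=()):
    """Yield (path, value) for every leaf of a nested dict."""
    for key in sorted(d):
        value = d[key]
        if isinstance(value, dict):
            for leaf in flatten(value, path + (key,)):
                yield leaf
        else:
            yield path + (key,), value

def publish(schema):
    """Old-style walk: assumes each counter entry holds only 'type'."""
    for path, value in flatten(schema):
        assert path[-1] == 'type'  # the assert that raises in diamond.log

# Hammer-era style: each counter carries only a 'type' field.
hammer = {'mon': {'num_sessions': {'type': 2}}}

# Infernalis added fields such as 'description' and 'nick' to each
# counter, so not every leaf path ends in 'type' any more.
infernalis = {'mon': {'num_sessions': {'type': 2,
                                       'description': 'open sessions',
                                       'nick': ''}}}

publish(hammer)          # passes
try:
    publish(infernalis)  # raises, matching the traceback above
except AssertionError:
    print('AssertionError')
```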

I've downgraded to hammer, now everything is working

I've built the latest calamari server, diamond, and the new calamari clients (now called romana).

Feel free to use them on your trusty deployments

http://bladeservers.net.au/calamari-server_1.3.1.1-105-g79c8df2-1trusty_amd64.deb
http://bladeservers.net.au/romana_1.2.2-36-gc62bb5b_all.deb
http://bladeservers.net.au/diamond_3.4.725_all.deb

Calamari: all working.


kaazoo commented Feb 12, 2016

@drolfe Thanks a lot for your packages. In order to make IOPS / usage data appear in graphite / calamari when running Ceph Infernalis, a small change to the ceph.py collector script is required. See luinnar/Diamond@a9fcc62
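
In the spirit of that commit (a hedged sketch, not the patch's exact code): instead of asserting, the walk can skip schema leaves whose last path element isn't 'type', so the extra 'description'/'nick' fields introduced in Infernalis are ignored rather than tripping the assert:

```python
# Sketch of a tolerant walk over the flattened perf schema (an
# illustration in the spirit of the linked patch, not its exact code).

def iter_counter_types(flattened_schema):
    """Yield (counter_path, type_value), ignoring non-'type' leaves
    such as the 'description' and 'nick' fields added in Infernalis."""
    for path, value in flattened_schema:
        if path[-1] != 'type':
            continue  # skip extra per-counter fields instead of asserting
        yield path[:-1], value

# Example flattened leaves from an Infernalis-style schema.
leaves = [
    (('mon', 'num_sessions', 'type'), 2),
    (('mon', 'num_sessions', 'description'), 'open sessions'),
    (('mon', 'num_sessions', 'nick'), ''),
]
print(list(iter_counter_types(leaves)))
# → [(('mon', 'num_sessions'), 2)]
```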

@ChristinaMeno (Contributor) commented

I've been waiting to get this upstream: python-diamond/Diamond#321. That will get you a newer Diamond 4.x and fix Infernalis.
