Temperature reported twice in metrics for some drives #41
I have some drives (Seagate Nytro XF1230) that report
This is what these drives return in smartctl:
A line from
This has become a problem after I upgraded my monitoring hosts from Debian stretch to buster. The Icinga 2 version 2.6 in stretch fed the first occurrence of the attribute in the performance data to Graphite, whereas the newer version 2.10 from buster seems to use the second. Therefore, all my disks now show a temperature of 99 or 100°C in the database.
As far as I understand it, labels in the performance data should be unique and the order of the label/value pairs in the performance data is irrelevant, so I think Icinga is not at fault here as the behavior in case of non-unique labels is undefined.
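To illustrate why the result depends on the consumer, here is a minimal Python sketch (not taken from the plugin or from Icinga) that parses a simple performance data string, reports non-unique labels, and shows how a "first occurrence wins" reader and a "last occurrence wins" reader disagree. The label names and values in `sample` are made up for illustration, and the parser deliberately ignores quoted labels containing spaces.

```python
from collections import Counter


def parse_perfdata(perfdata):
    """Split a simple perfdata string into (label, value) pairs.

    Simplified: assumes labels contain no spaces or quotes and keeps
    only the value part before any ';warn;crit;min;max' fields.
    """
    pairs = []
    for token in perfdata.split():
        label, _, rest = token.partition("=")
        value = rest.split(";")[0]
        pairs.append((label, value))
    return pairs


def duplicate_labels(pairs):
    """Return the sorted list of labels that occur more than once."""
    counts = Counter(label for label, _ in pairs)
    return sorted(label for label, n in counts.items() if n > 1)


# Hypothetical sample: the same label appears twice with different values.
sample = "Temperature=31 Power_On_Hours=1200 Temperature=100"
pairs = parse_perfdata(sample)

# A consumer that keeps the first occurrence of each label.
first_wins = {}
for label, value in pairs:
    first_wins.setdefault(label, value)

# A consumer that keeps the last occurrence (later keys overwrite earlier ones).
last_wins = dict(pairs)

print(duplicate_labels(pairs))
print(first_wins["Temperature"], last_wins["Temperature"])
```

Both readers are defensible, which is why a plugin emitting duplicate labels can change its graphed values after an upgrade without either side being strictly wrong.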
Since I deploy
and then a few lines later I added:
Maybe I'm not the only one with this or a similar issue, and maybe there is a more generic way to do this (e.g. adding the attribute type to the label in the performance data, or implementing more flexible excludes?), so I'll just leave this here.
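One possible shape of the "more generic" idea, sketched in Python purely as an illustration: append the SMART attribute ID to a label only when the attribute name collides, so existing graphs for unique labels keep their names. The function name `disambiguate`, the tuple layout, and the sample IDs and values are all hypothetical and not part of the plugin.

```python
from collections import Counter


def disambiguate(attributes):
    """Build a perfdata string from (attr_id, name, value) tuples.

    Labels whose name occurs more than once get the attribute ID
    appended; unique names are left unchanged.
    """
    counts = Counter(name for _, name, _ in attributes)
    parts = []
    for attr_id, name, value in attributes:
        label = f"{name}_{attr_id}" if counts[name] > 1 else name
        parts.append(f"{label}={value}")
    return " ".join(parts)


# Hypothetical attributes: two share a name but have different IDs.
attrs = [
    (194, "Temperature", 31),
    (9, "Power_On_Hours", 1200),
    (231, "Temperature", 100),
]
print(disambiguate(attrs))
```

Suffixing only on collision is a design choice: it avoids renaming metrics that are already unambiguous, at the cost of label names that can change if a drive starts reporting a second attribute with the same name.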
Hi @der-michik, thanks for reporting!
Your SSD reports the value 100 for this attribute, which indicates a perfectly healthy drive.
Can you please check the smartctl/smartmontools version on this particular host? We should probably report this upstream.
Update: Seems already fixed in smartmontools, check out: https://github.com/smartmontools/smartmontools/blob/master/smartmontools/drivedb.h#L4082 and smartmontools/smartmontools@160ecb1#diff-5c51af8dba19f3a4f4187af4b46e415f
And the ultimate finding: smartmontools/smartmontools#4
Ah, interesting, thanks for your research! That explains a lot. I did not think about having a detailed look at smartmontools as upgrading that on the affected systems is not really an option for me anyway whereas patching the script was an easy workaround.
Nevertheless, maybe we should think about a more flexible exclude option. Currently,
Maybe I will have a look at it and do a pull request tomorrow.
That was actually my intended answer here (to use -e attribute_id) :D