[dev.icinga.com #6450] ipmi-sensors segfault due to stack size #1674

Closed
icinga-migration opened this Issue Jun 10, 2014 · 38 comments

icinga-migration commented Jun 10, 2014

This issue has been migrated from Redmine: https://dev.icinga.com/issues/6450

Created by dennisp on 2014-06-10 12:27:36 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2014-07-21 11:35:16 +00:00)
Target Version: 2.0.2
Last Update: 2014-07-28 07:15:34 +00:00 (in Redmine)

Icinga Version: 2.0.0~icingaautorelease201406060907~trusty

I added a command with:

object CheckCommand "check_ipmi_dell" {
  import "plugin-check-command"

  command = [
    PluginDir + "/check_ipmi_sensor",
    "-H", "$address$",
    "-T", "$sensor$",
    "-U", "$user$",
    "-P", "$password$",
    "-L", "$privilege$",
  ],
}

apply Service "FAN" {
  import "generic-service",

  check_command = "check_ipmi_dell",
  vars += {
    "sensor" = "fan",
  },

  assign where "idrac-server" in host.templates,
  ignore where !host.vars.address
}

object Host "192.168.12.100" {
  import "idrac-server",

  display_name = "test",

  /* check values */

  vars = {
    address = "192.168.12.100",
    user = "monitoring",
    password = "xxxx",
    privilege = "user",
  },

}

The command does not work as it should. The debug log shows:

 long_output = 'Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.\\nUse of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 336.\\nUse of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 404.\\nUse of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 404.\\nUse of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 404.\\nUse of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 409.\\nUse of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 409.\\nSensor Type(s) fan Status: \\n FreeIPMI returned an empty header map (first line) FreeIPMI could not find any sensors for the given sensor type (option \'-T\').', 

When I add -vvv to the IPMI check command, the correct arguments are shown, and the command works when I execute it directly in the shell.

What can I do to get this to work?

Changesets

2014-07-21 11:33:01 +00:00 by gbeutner 5dcf1a7

Fix stack rlimit problem

fixes #6450

icinga-migration commented Jun 10, 2014

Updated by mfriedrich on 2014-06-10 12:34:07 +00:00

  • Status changed from New to Rejected

What does the executed command look like in the logs ('notice' severity)? To me this doesn't sound like a bug, but like a (plugin) configuration issue.

icinga-migration commented Jun 10, 2014

Updated by mfriedrich on 2014-06-10 12:34:45 +00:00

  • Status changed from Rejected to Feedback

icinga-migration commented Jun 10, 2014

Updated by mfriedrich on 2014-06-10 12:34:58 +00:00

  • Assigned to set to dennisp

icinga-migration commented Jun 10, 2014

Updated by dennisp on 2014-06-10 12:40:03 +00:00

[2014-06-10 14:39:19 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ipmi_sensor', '-H', '192.168.12.100', '-T', 'temperature', '-U', 'monitoring', '-P', 'xxx', '-L', 'user': PID 25529

icinga-migration commented Jun 10, 2014

Updated by dennisp on 2014-06-10 12:44:28 +00:00

When I run the command directly in the shell, it works:
/usr/lib/nagios/plugins/check_ipmi_sensor -H 192.168.12.100 -T fan -U monitoring -P xxx -L user
Sensor Type(s) fan Status: OK | 'FAN 1 RPM'=3600.00 'FAN 2 RPM'=3600.00 'FAN 3 RPM'=3600.00 'FAN 4 RPM'=3600.00 'FAN 5 RPM'=3600.00

But when Icinga 2 runs it, I get the error from my report.

icinga-migration commented Jun 10, 2014

Updated by mfriedrich on 2014-06-10 13:00:00 +00:00

  • Description updated

I'm not sure why you're adding the new variable like so. Obviously your macro value is resolved to null. Try

apply Service "FAN" {
  import "generic-service",

  check_command = "check_ipmi_dell",
  vars += {
    sensor = "fan",
  },

  assign where "idrac-server" in host.templates,
  ignore where !host.vars.address
}

or direct access

apply Service "FAN" {
  import "generic-service",

  check_command = "check_ipmi_dell",
  vars.sensor = "fan",

  assign where "idrac-server" in host.templates,
  ignore where !host.vars.address
}

Oh, and omit the commas at line end. You'll only need them as array separators.

icinga-migration commented Jun 10, 2014

Updated by dennisp on 2014-06-10 13:16:22 +00:00

I tried what you wrote and I get the same problem:

template Host "idrac7-server" {
  import "idrac-server",
}

apply Service "MEMORY" {
  import "generic-service",

  check_command = "check_ipmi_dell",
  vars += {
    sensor = "Memory",
  },

  assign where "idrac7-server" in host.templates,
  ignore where !host.vars.address
}

[2014-06-10 15:12:38 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ipmi_sensor', '-H', '192.168.x.xx', '-T', 'Memory', '-U', 'monitoring', '-P', 'xxx', '-L', 'user': PID 32152

long_output = 'Sensor Type(s) Memory Status: \\n FreeIPMI returned an empty header map (first line) FreeIPMI could not find any sensors for the given sensor type (option \'-T\').',

icinga-migration commented Jun 10, 2014

Updated by mfriedrich on 2014-06-10 14:25:54 +00:00

  • Status changed from Feedback to Rejected
  • Assigned to deleted dennisp

Hm. Ok. Then everything is working as expected with regard to Icinga 2 executing the command from your configuration. I'd rather check whether the sensor type "Memory" really exists.

Since this now really sounds like a configuration or plugin problem, please continue on the mailing lists or forums, where other users might read and help as well. Provide your manual tests and outputs over there too; most likely your tests and configs do not match.

icinga-migration commented Jul 9, 2014

Updated by WhoCares on 2014-07-09 13:54:06 +00:00

Since I ran into the very same problem today, I'd like to point out that I don't think that "everything is working as expected in regards of Icinga 2 executing the command from your configuration". The initial problem is a direct result from the binary "ipmimonitoring" or "ipmi-sensors" segfaulting when being called from the check_ipmi_sensor script when run from within Icinga 2. Thus, an empty output is returned to the check_ipmi_sensor script which in turn leads to the errors listed above. When check_ipmi_sensor is run from the command line everything is fine.

So I believe that there's something with the environment executed by Icinga 2 that doesn't play well with the "ipmi-sensors" binary. Probably just some memory limit that may need to be raised but I haven't had time to dig into the code and take a look.

Here's some excerpt from my syslog:
Jul 9 14:22:36 deb-adm kernel: [9548484.022968] ipmi-sensors[17520]: segfault at 7fffa9690978 ip 0000000000404f89 sp 00007fffa9690980 error 6 in ipmi-sensors[400000+32000]
Jul 9 14:22:38 deb-adm kernel: [9548485.563229] ipmi-sensors[17696]: segfault at 7fffcd73cb38 ip 0000000000404f89 sp 00007fffcd73cb40 error 6 in ipmi-sensors[400000+32000]
Jul 9 14:22:40 deb-adm kernel: [9548487.497158] ipmi-sensors[17858]: segfault at 7fffb18ebcd8 ip 0000000000404f89 sp 00007fffb18ebce0 error 6 in ipmi-sensors[400000+32000]

Are there any settings or known limitations when running check_commands? Or any major differences compared to running under plain old "/bin/sh"?
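
One way to approach that question, sketched here under the assumption of a Linux system with /proc mounted, is to compare the resource limits of the interactive shell with those of the running daemon. The helper name `stack_limit_of_pid` and the `pidof icinga2` lookup are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Print the soft and hard "Max stack size" limits of a process, as the
# kernel reports them in /proc/<pid>/limits (Linux-specific).
# stack_limit_of_pid is a hypothetical helper name used for illustration.
stack_limit_of_pid() {
    awk '/^Max stack size/ { print "soft=" $4, "hard=" $5, $6 }' "/proc/$1/limits"
}

# Limits of the current shell. To compare against the monitoring daemon,
# pass its PID instead, e.g. "$(pidof icinga2)" (process name is an assumption).
stack_limit_of_pid "$$"
```

If the two soft limits differ, child processes started by the daemon inherit the daemon's value, not the one seen in the login shell.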

icinga-migration commented Jul 9, 2014

Updated by gvegidy on 2014-07-09 14:09:59 +00:00

  • Status changed from Rejected to Feedback

Do you have selinux enabled on your system? If yes, please try either with permissive mode or have a look at the audit-log.

I remember having a similar problem where the called program did not correctly handle the permission denied response given by selinux.
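
A quick way to check this, assuming the standard SELinux userland tools are installed (`getenforce` ships with them; on systems without SELinux the command is typically absent). The helper name `selinux_mode` is made up for illustration.

```shell
#!/bin/sh
# Report the current SELinux mode, or say so if the tools are missing.
# selinux_mode is a hypothetical helper name.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce          # prints Enforcing, Permissive, or Disabled
    else
        echo "getenforce not found"
    fi
}

selinux_mode
# If the mode is Enforcing, "setenforce 0" (as root) switches to permissive
# mode until the next reboot, which is enough for a quick retest.
```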

icinga-migration commented Jul 10, 2014

Updated by WhoCares on 2014-07-10 05:50:58 +00:00

Nope, no selinux on that system. Standard Debian Wheezy with the backports repo active and used.
In the meantime I already updated the freeipmi stuff from version 1.1.5 (standard Debian packages) to 1.4.4 to make sure it isn't a problem with a quite dated version of ipmi-sensors. Unfortunately no luck there: the same problem persists, with segfaults when executing the command from within Icinga 2.

icinga-migration commented Jul 10, 2014

Updated by dennisp on 2014-07-10 07:45:22 +00:00

Good to hear I am not alone. Same here: no SELinux or AppArmor.

icinga-migration commented Jul 10, 2014

Updated by WhoCares on 2014-07-10 07:48:44 +00:00

Same segfaults?

icinga-migration commented Jul 10, 2014

Updated by dennisp on 2014-07-10 07:50:39 +00:00

Yes, on Ubuntu 14 LTS. When I run them from the command line, everything is fine.

icinga-migration commented Jul 10, 2014

Updated by mfriedrich on 2014-07-10 07:59:06 +00:00

WhoCares wrote:

Since I ran into the very same problem today, I'd like to point out that I don't think that "everything is working as expected in regards of Icinga 2 executing the command from your configuration".

According to the original check output, it rather looks like an issue with the plugin itself.

'Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.\\nUse of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 336.\\nUse of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 404.\\nUse of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 404.\\nUse of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 404.\\nUse of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 409.\\nUse of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 409.\\nSensor Type(s) fan Status: \\n FreeIPMI returned an empty header map (first line) FreeIPMI could not find any sensors for the given sensor type (option \'-T\').', 

I'm not sure how this check plugin handles "uninitialized values", causing the ipmi binary to segfault. Segfaults shouldn't happen at all, and therefore I would rather dive into debugging that plugin and its binary calls.

The initial problem is a direct result from the binary "ipmimonitoring" or "ipmi-sensors" segfaulting when being called from the check_ipmi_sensor script when run from within Icinga 2. Thus, an empty output is returned to the check_ipmi_sensor script which in turn leads to the errors listed above. When check_ipmi_sensor is run from the command line everything is fine.

I'd like to see your manual tests including users and environment.

So I believe that there's something with the environment executed by Icinga 2 that doesn't play well with the "ipmi-sensors" binary. Probably just some memory limit that may need to be raised but I haven't had time to dig into the code and take a look.

Here's some excerpt from my syslog:
Jul 9 14:22:36 deb-adm kernel: [9548484.022968] ipmi-sensors[17520]: segfault at 7fffa9690978 ip 0000000000404f89 sp 00007fffa9690980 error 6 in ipmi-sensors[400000+32000]
Jul 9 14:22:38 deb-adm kernel: [9548485.563229] ipmi-sensors[17696]: segfault at 7fffcd73cb38 ip 0000000000404f89 sp 00007fffcd73cb40 error 6 in ipmi-sensors[400000+32000]
Jul 9 14:22:40 deb-adm kernel: [9548487.497158] ipmi-sensors[17858]: segfault at 7fffb18ebcd8 ip 0000000000404f89 sp 00007fffb18ebce0 error 6 in ipmi-sensors[400000+32000]

Sounds like one needs a gdb wrapper for the ipmi-sensors call to get a full backtrace (bt) and an idea of what's wrong here.

icinga-migration commented Jul 10, 2014

Updated by WhoCares on 2014-07-10 08:12:52 +00:00

dnsmichi wrote:

According to the original check output, it rather looks like an issue with the plugin itself.

You got it backwards ;) Let me explain:
The "check_ipmi_sensor" script as distributed with the Nagios Plugins package runs (depending on the system) either one of the following two commands:

ipmimonitoring -V

or

ipmi-sensors -V

to determine the version of FreeIPMI installed on the system. On Debian 7/ Ubuntu 14 LTS the latter will be used. Here's the whole function from the perl script where it is failing:

sub get_ipmi_version{
  my @ipmi_version_output = '';
  my $ipmi_version = '';
  @ipmi_version_output = `$IPMICOMMAND -V`;
  $ipmi_version = shift(@ipmi_version_output);
  $ipmi_version =~ /(\d*)\.(\d*)\.(\d+)/;
  @ipmi_version_output = ();
  push @ipmi_version_output,$1,$2,$3;
  return @ipmi_version_output;
}

Now, the segfault of the external call on the 4th line will result in the variable `@ipmi_version_output` being empty which in turn leads to the error in line 6 where the regex splitting of $ipmi_version is normally about to happen. At least that's what I determined when actually debugging the perl stuff.

I'm not sure how this check plugin handles "uninitialized values", causing the ipmi binary to segfault.

You're mixing up cause and result here. It's not the script that sends an uninitialized output to the binary. It's the binary that doesn't send anything back (due to the segfault) which then causes the "uninitialized variable" error message.
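
The cause-and-effect chain described here can be reproduced without FreeIPMI. The following sketch (a stand-in, not the plugin's code) lets a child shell kill itself with SIGSEGV and then inspects what the caller gets back: nothing on stdout and, on Linux, an exit status of 128 + 11 = 139. The function name `describe_exit` is made up for this illustration.

```shell
#!/bin/sh
# Run a command, capture its stdout, and report how it ended.
# describe_exit is a hypothetical helper name.
describe_exit() {
    output=$("$@" 2>/dev/null)
    status=$?
    if [ "$status" -gt 128 ]; then
        # Shells report death by signal N as exit status 128 + N.
        echo "killed by signal $((status - 128)), stdout: '${output}'"
    else
        echo "exit status ${status}, stdout: '${output}'"
    fi
}

# Stand-in for a crashing ipmi-sensors: the child shell sends SIGSEGV to itself.
describe_exit sh -c 'kill -s SEGV "$$"'
```

A caller that only looks at the captured output, as the version check does, sees an empty string and carries on with undefined values.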

Segfaults shouldn't happen at all, and therefore I would rather dive into debugging that plugin and its binary calls.

You're very welcome. If you want me to, I'll set up a test system with ssh access so you can have a look for yourself. The plugin itself is pretty straightforward. It could be simplified a fair bit, since it still checks for FreeIPMI versions from the stone age, but apart from that there's nothing spectacular or fancy in there.

Sounds like one needs a gdb wrapper for the ipmi-sensors call to get a full backtrace (bt) and an idea of what's wrong here.

I could provide that as well, but I still think it's got something to do with the environment opened by Icinga 2. As a preliminary test, I'll try to have Icinga 2 call the binary directly. While this will most likely result in unparseable output, it should at least show whether the segfaults happen then as well. Does that sound like an easy way to go before heading full steam into gdb debugging?

icinga-migration commented Jul 10, 2014

Updated by WhoCares on 2014-07-10 08:40:42 +00:00

Ok, here we go. First the various config snippets:

Command definition:

object CheckCommand "check_ipmi_direct" {
  import "plugin-check-command"

  command = [ "/usr/sbin/ipmi-sensors" ]

  arguments = {
    "-h" = "$host_ipmi_address$",
    "-u" = "ipmi",
    "-p" = "",
    "-l" = "user"
  }

  vars.host_ipmi_address = "$host.vars.ipmi_address$"
}

Service Application:

apply Service "ipmi" {
  import "generic-service"
  check_command = "check_ipmi_direct"
  vars.sla = "24x7"
  assign where host.vars.ipmi_address
  ignore where host.vars.active == "no"
}

Example Host Definition:

object Host "kvm001" {
  import "generic-host"

  address = "10.0.0.23"
  address6 = ""

  vars.os = "Linux"
  vars.sys = "Proxmox"
  vars.sla = "24x7"

  vars.ipmi_address = "10.0.12.10"
}

This should result in a command like:

ipmi-sensors -h 10.0.12.10 -u ipmi -p <redacted> -l user

Running said command directly on the command line gives me:

deb-adm:/etc/icinga2# /usr/sbin/ipmi-sensors -h 10.0.12.10 -u ipmi -p  -l user
ID  | Name             | Type                     | Reading    | Units | Event
1   | Temp             | Temperature              | N/A        | C     | N/A
2   | Temp             | Temperature              | N/A        | C     | N/A
3   | Temp             | Temperature              | N/A        | C     | N/A
...
[shortened for brevity]
...
123 | ROMB Battery     | Battery                  | N/A        | N/A   | 'OK'
125 | vFlash           | Module/Board             | N/A        | N/A   | 'OEM Event = 0000h'
deb-adm:/etc/icinga2# 

Doesn't matter whether I run this as root or as the nagios user:

deb-adm:/etc/icinga2# su - nagios
nagios@deb-adm:~$ id
uid=108(nagios) gid=113(nagios) groups=113(nagios)
nagios@deb-adm:~$ /usr/sbin/ipmi-sensors -h 10.0.12.10 -u ipmi -p  -l user
ID  | Name             | Type                     | Reading    | Units | Event
1   | Temp             | Temperature              | N/A        | C     | N/A
2   | Temp             | Temperature              | N/A        | C     | N/A
3   | Temp             | Temperature              | N/A        | C     | N/A
...
125 | vFlash           | Module/Board             | N/A        | N/A   | 'OEM Event = 0000h'
nagios@deb-adm:~$ 

Now using the Icinga 2 config as given above, I see this in the syslog:

deb-adm:/etc/icinga2# date
Thu Jul 10 10:37:57 CEST 2014
deb-adm:/etc/icinga2# /etc/init.d/icinga2 restart
[ ok ] checking Icinga2 configuration.
...
[2014-07-10 10:38:06 +0200] information/ConfigItem: Checked 1 IcingaApplication(s).
. ok 
deb-adm:/etc/icinga2# tail -f /var/log/syslog
Jul 10 10:38:10 deb-adm kernel: [69349.417894] ipmi-sensors[8173]: segfault at 7fff9ff44658 ip 0000000000407418 sp 00007fff9ff44660 error 6 in ipmi-sensors[400000+32000]
Jul 10 10:38:13 deb-adm kernel: [69353.192740] ipmi-sensors[8473]: segfault at 7fff8beea518 ip 0000000000407418 sp 00007fff8beea520 error 6 in ipmi-sensors[400000+32000]
^C
deb-adm:/etc/icinga2# 

So I think we can now at least agree that the "check_ipmi_sensor" script is not the problem.

icinga-migration commented Jul 10, 2014

Updated by mfriedrich on 2014-07-10 08:49:14 +00:00

Ok, thanks. That sounds like a similar issue to #6588 - could you test the current snapshot builds where a fixed stack size is applied already?

icinga-migration commented Jul 10, 2014

Updated by WhoCares on 2014-07-10 09:00:35 +00:00

Updated Icinga 2 to this:

deb-adm:/etc/icinga2# dpkg -l | grep icinga2
ii  icinga2                                    2.0.0+icingasnap201407091522.d56da32~wheezy amd64        host and network monitoring system
ii  icinga2-bin                                2.0.0+icingasnap201407091522.d56da32~wheezy amd64        host and network monitoring system - daemon
ii  icinga2-classicui                          2.0.0+icingasnap201407091522.d56da32~wheezy all          host and network monitoring system - classic ui integration
ii  icinga2-common                             2.0.0+icingasnap201407091522.d56da32~wheezy all          host and network monitoring system - common files
ii  icinga2-doc                                2.0.0+icingasnap201407091522.d56da32~wheezy all          host and network monitoring system - documentation
deb-adm:/etc/icinga2# 

But I still get this:

Jul 10 10:58:47 deb-adm kernel: [70586.502317] ipmi-sensors[5784]: segfault at 7fff1b9bdf00 ip 00000000004125ca sp 00007fff1b9bdec0 error 6 in ipmi-sensors[400000+32000]
Jul 10 10:58:47 deb-adm kernel: [70586.902624] ipmi-sensors[5841]: segfault at 7fffbee3f1e0 ip 00000000004125ca sp 00007fffbee3f1a0 error 6 in ipmi-sensors[400000+32000]
Jul 10 10:58:53 deb-adm kernel: [70592.673489] ipmi-sensors[6281]: segfault at 7ffff9380dd0 ip 00000000004125ca sp 00007ffff9380d90 error 6 in ipmi-sensors[400000+32000]
Jul 10 10:58:53 deb-adm kernel: [70592.884064] ipmi-sensors[6389]: segfault at 7fff20de8720 ip 00000000004125ca sp 00007fff20de86e0 error 6 in ipmi-sensors[400000+32000]
Jul 10 10:58:53 deb-adm kernel: [70593.156981] ipmi-sensors[6454]: segfault at 7fff3e366650 ip 00000000004125ca sp 00007fff3e366610 error 6 in ipmi-sensors[400000+32000]
Jul 10 10:58:54 deb-adm kernel: [70593.411595] ipmi-sensors[6520]: segfault at 7fffb716d240 ip 00000000004125ca sp 00007fffb716d200 error 6 in ipmi-sensors[400000+32000]

icinga-migration commented Jul 11, 2014

Updated by gbeutner on 2014-07-11 07:52:04 +00:00

  • Target Version set to 2.0.2

icinga-migration commented Jul 11, 2014

Updated by WhoCares on 2014-07-11 07:54:21 +00:00

Good timing ;)

I just updated to 2.0.1+icingasnap201407110715.b80c3b2~wheezy and I'm still seeing the segfaults there.

icinga-migration commented Jul 11, 2014

Updated by gbeutner on 2014-07-11 09:20:51 +00:00

  • Subject changed from check_ipmi_sensor command bug to ipmi-sensors segfault due to stack size
  • Priority changed from Normal to High
  • Estimated Hours set to 8

icinga-migration commented Jul 11, 2014

Updated by gbeutner on 2014-07-11 09:21:29 +00:00

Required changes:

  • Reproduce the bug
  • Figure out what to do with the stack rusage code

icinga-migration commented Jul 17, 2014

Updated by tobiasvdk on 2014-07-17 14:26:08 +00:00

I also get these segfaults. I'm running icinga v2.0.1-11-g263f198 with 247 ipmi checks on Debian 7.6. Although these segfaults are occurring, some (?) of these checks are working, because I see valid (OK) output in the web UI.

icinga-migration commented Jul 17, 2014

Updated by mfriedrich on 2014-07-17 14:35:32 +00:00

You could try setting the stack size manually and then calling the ipmi-sensors binary. Posting your results and the value which then works will help find a better solution.

# ulimit -s 1024
# /usr/sbin/ipmi-sensors -h 10.0.12.10 -u ipmi -p  -l user
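
A variation on this test, offered as a sketch rather than a recommended procedure: running the command in a subshell and changing only the soft limit (`ulimit -S -s`) keeps the lowered value from sticking to the interactive shell, so several values can be tried in any order and without root. `probe_stack_limit` is a hypothetical helper name; the commented ipmi-sensors call reuses the arguments shown in this thread.

```shell
#!/bin/bash
# Run a command with a given soft stack limit (in KB) and report its
# exit status. probe_stack_limit is a hypothetical helper name.
probe_stack_limit() {
    local kb=$1
    shift
    (
        # Soft limit only; exit 125 if the shell refuses the value.
        ulimit -S -s "$kb" || exit 125
        "$@" >/dev/null 2>&1
    )
    echo "stack_kb=${kb} status=$?"
}

# Example (not executed here):
# for kb in 1024 2048 4096 8192; do
#     probe_stack_limit "$kb" /usr/sbin/ipmi-sensors -h 10.0.12.10 -u ipmi -p <redacted> -l user
# done
```

A status of 139 (128 + SIGSEGV) at a given value would point to the stack limit being too small for the binary.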

icinga-migration commented Jul 18, 2014

Updated by tobiasvdk on 2014-07-18 09:31:13 +00:00

dnsmichi wrote:

You could try setting the stack size manually and then calling the ipmi-sensors binary. Posting your results and the value which then works will help find a better solution.

[...]

It's the same situation as dennisp ... running the command on the shell works without segfault. Having only one check works.

nagios@icinga21-ka:~$ ulimit -s
16384

I will try to figure out how many checks I can have configured...

icinga-migration commented Jul 18, 2014

Updated by WhoCares on 2014-07-18 09:57:59 +00:00

Just ran this:

deb-adm:~# for ul in 128 256 512 1024 2048 4096 8192 16384; do ulimit -s ${ul}; ulimit -s; ipmimonitoring -h 10.0.12.10 -u ipmi -p  -l user; done

And came to this:

128
Segmentation fault
256
Segmentation fault
512
Segmentation fault
1024
Segmentation fault
2048
ID  | Name             | Type                     | State    | Reading    | Units | Event
...
125 | vFlash           | Module/Board             | N/A      | N/A        | N/A   | 'OEM Event = 0000h'
4096
ID  | Name             | Type                     | State    | Reading    | Units | Event
...
125 | vFlash           | Module/Board             | N/A      | N/A        | N/A   | 'OEM Event = 0000h'
8192
ID  | Name             | Type                     | State    | Reading    | Units | Event
...
125 | vFlash           | Module/Board             | N/A      | N/A        | N/A   | 'OEM Event = 0000h'
16384
ID  | Name             | Type                     | State    | Reading    | Units | Event
...
125 | vFlash           | Module/Board             | N/A      | N/A        | N/A   | 'OEM Event = 0000h'
deb-adm:~#

This was directly from the command line. If time permits I'm going to try from within Icinga later today.
However, since 8192 is the default for Linux at least, I don't think going beyond that makes any sense. It also increases the danger of the system swapping itself to death when running many checks in parallel, since with a 16384 KB limit each of those would allocate 16 MB of RAM per thread for stack size alone.
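
The worst-case arithmetic behind that concern can be sketched as follows. It treats the stack limit as if every concurrent check used it in full, which is an upper bound rather than a measurement (stack pages are normally only committed once touched). The function name is illustrative; 247 is the number of IPMI checks reported earlier in this thread.

```shell
#!/bin/sh
# Upper bound, in MiB, on stack space if every concurrent check used its
# full stack limit. stack_budget_mib is a hypothetical helper name.
stack_budget_mib() {
    checks=$1
    stack_kb=$2
    echo $(( checks * stack_kb / 1024 ))
}

stack_budget_mib 247 16384   # 247 parallel checks at 16384 KB: prints 3952
stack_budget_mib 247 8192    # the same checks at 8192 KB: prints 1976
```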

icinga-migration commented Jul 18, 2014

Updated by tobiasvdk on 2014-07-18 12:28:17 +00:00

root@icinga21-ka:~# for ul in 128 256 512 1024 2048 4096 8192 16384; do ulimit -s ${ul}; ulimit -s; ipmimonitoring -h host -u user -p password -l user; sleep 1; done
128
Segmentation fault
256
Segmentation fault
512
Segmentation fault
1024
Segmentation fault
2048
Segmentation fault
4096
ID | Name        | Type              | State    | Reading    | Units | Event
...
8192
ID | Name        | Type              | State    | Reading    | Units | Event
...
16384
ID | Name        | Type              | State    | Reading    | Units | Event
...

root@icinga21-ka:~# for ul in 128 256 512 1024 2048 4096 8192 16384; do ulimit -s ${ul}; ulimit -s; /usr/lib/nagios/plugins/check_ipmi_sensor -H host -f /etc/icinga2/ipmi.cfg -T "Fan,Temperature,Current,Processor,Power_Supply"; sleep 1; done
128
Use of uninitialized value $ipmi_version in pattern match (m//) at /usr/lib/nagios/plugins/check_ipmi_sensor line 168.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 266.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 270.
Use of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 337.
Use of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 337.
Can't use an undefined value as an ARRAY reference at /usr/lib/nagios/plugins/check_ipmi_sensor line 408.
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: 256
Use of uninitialized value $ipmi_version in pattern match (m//) at /usr/lib/nagios/plugins/check_ipmi_sensor line 168.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 266.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 270.
Use of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 337.
Use of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 337.
Can't use an undefined value as an ARRAY reference at /usr/lib/nagios/plugins/check_ipmi_sensor line 408.
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: 512
Use of uninitialized value $ipmi_version in pattern match (m//) at /usr/lib/nagios/plugins/check_ipmi_sensor line 168.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 266.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 270.
Use of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[0] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 332.
Use of uninitialized value $ipmi_version[0] in numeric eq (==) at /usr/lib/nagios/plugins/check_ipmi_sensor line 337.
Use of uninitialized value $ipmi_version[1] in numeric gt (>) at /usr/lib/nagios/plugins/check_ipmi_sensor line 337.
Can't use an undefined value as an ARRAY reference at /usr/lib/nagios/plugins/check_ipmi_sensor line 408.
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: 1024
Can't use an undefined value as an ARRAY reference at /usr/lib/nagios/plugins/check_ipmi_sensor line 408.
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: 2048
Can't use an undefined value as an ARRAY reference at /usr/lib/nagios/plugins/check_ipmi_sensor line 408.
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: 4096
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: OK | 'System Temp'=28.00 'Fan1'=4185.00 'Fan2'=4320.00 'Fan3'=4320.00 'Fan4'=4320.00 'Fan5'=4320.00 'Fan6'=4320.00 'Fan10'=6480.00 'Fan11'=6480.00
8192
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: OK | 'System Temp'=28.00 'Fan1'=4185.00 'Fan2'=4320.00 'Fan3'=4320.00 'Fan4'=4320.00 'Fan5'=4320.00 'Fan6'=4320.00 'Fan10'=6480.00 'Fan11'=6480.00
16384
Sensor Type(s) Fan, Temperature, Current, Processor, Power_Supply Status: OK | 'System Temp'=28.00 'Fan1'=4185.00 'Fan2'=4320.00 'Fan3'=4320.00 'Fan4'=4320.00 'Fan5'=4320.00 'Fan6'=4320.00 'Fan10'=7020.00 'Fan11'=6480.00

icinga-migration commented Jul 18, 2014

Updated by WhoCares on 2014-07-18 12:33:06 +00:00

Strange. Makes me wonder why it is working for me at 2048K and for you at 4096K. Here's what I have:

deb-adm:~# uname -a
Linux deb-adm 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.7-1~bpo70+1 (2014-06-21) x86_64 GNU/Linux

deb-adm:~# dpkg -l | grep freeipmi
ii  freeipmi-common                    1.4.4-1                     all          GNU implementation of the IPMI protocol - common files
ii  freeipmi-tools                     1.4.4-1                     amd64        GNU implementation of the IPMI protocol - tools
ii  libfreeipmi16                      1.4.4-1                     amd64        GNU IPMI - libraries
deb-adm:~# 

icinga-migration commented Jul 18, 2014

Updated by tobiasvdk on 2014-07-18 12:38:20 +00:00

The "strange" thing is that, although these segfaults occur, the Icinga checks (or only some of them, randomly?) return correct output, as seen in the web UI.

icinga-migration commented Jul 18, 2014

Updated by WhoCares on 2014-07-18 12:40:39 +00:00

I would think it's either random or pure luck on your side. Never returned anything useful for me for > 75 checks running at any given time.

icinga-migration commented Jul 18, 2014

Updated by tobiasvdk on 2014-07-18 12:46:20 +00:00

WhoCares wrote:

Strange. Makes me wonder why it is working for me at 2048K and for you at 4096K. Here's what I have:

[...]
[...]

root@icinga21:~# uname -a
Linux icinga21 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u1 x86_64 GNU/Linux

root@icinga21:~# dpkg -l | grep freeipmi
ii  freeipmi-common                    1.1.5-3                                     all          GNU implementation of the IPMI protocol - common files
ii  freeipmi-tools                     1.1.5-3                                     amd64        GNU implementation of the IPMI protocol - tools
ii  libfreeipmi12                      1.1.5-3                                     amd64        GNU IPMI - libraries

icinga-migration commented Jul 18, 2014

Updated by WhoCares on 2014-07-18 12:56:39 +00:00

Thought as much. So my manually built 1.4.4 seems to reduce the needed stack size.

icinga-migration commented Jul 21, 2014

Updated by gbeutner on 2014-07-21 08:22:47 +00:00

  • Assigned to set to gbeutner

icinga-migration commented Jul 21, 2014

Updated by gbeutner on 2014-07-21 11:35:16 +00:00

  • Status changed from Feedback to Resolved
  • Done % changed from 0 to 100

Applied in changeset 5dcf1a7.

icinga-migration commented Jul 21, 2014

Updated by gbeutner on 2014-07-21 11:36:03 +00:00

  • Category set to libbase

Please recheck if my latest patch fixes this issue.

icinga-migration commented Jul 21, 2014

Updated by WhoCares on 2014-07-21 11:59:42 +00:00

Works fine for me, no more segfaults.

icinga-migration commented Jul 28, 2014

Updated by tobiasvdk on 2014-07-28 07:15:34 +00:00

Same for me ... works fine.
