[feature-request] Service (or host) checks should allow SOFT status for OK as well #46

kali-hernandez · 2015-06-11T17:31:47Z

So we have the concept of a SOFT state when a check enters a non-OK state. We can also make a check go HARD immediately by requiring only one failed check (max_check_attempts set to 1) to trigger an alert.

However, the return to an OK state is effectively HARD only. Any check returning OK is treated by Nagios as a HARD state and therefore immediately triggers a notification marking the problem as resolved.

I think there are some cases where we want the return to OK to also be able to use the SOFT state, requiring, for example, 5 OK checks before the service is considered back in an OK state.
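To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is not Nagios code: the `Service` class and the `recovery_attempts` setting are hypothetical names, and the model is simplified to two results (OK and CRITICAL) with no flap detection and none of Nagios' existing soft-recovery handling.

```python
from dataclasses import dataclass

OK, CRITICAL = 0, 2  # standard plugin return codes


@dataclass
class Service:
    max_check_attempts: int = 3  # existing directive: failures needed for a HARD problem
    recovery_attempts: int = 1   # hypothetical: OK results needed for a HARD recovery
    hard_state: int = OK         # last confirmed (HARD) state
    pending: int = OK            # state seen in the current run of results
    streak: int = 0              # consecutive results equal to `pending`

    def process(self, result: int) -> str:
        """Feed one check result; return 'SOFT', 'HARD' or 'HARD+NOTIFY'."""
        if result == self.hard_state:
            # Back in the confirmed state: any pending transition is cancelled.
            self.pending, self.streak = result, 0
            return "HARD"
        if result != self.pending:
            # A new kind of result starts a fresh run.
            self.pending, self.streak = result, 0
        self.streak += 1
        needed = self.recovery_attempts if result == OK else self.max_check_attempts
        if self.streak >= needed:
            # Enough consecutive results: confirm the new state and notify.
            self.hard_state, self.streak = result, 0
            return "HARD+NOTIFY"
        return "SOFT"


svc = Service(max_check_attempts=1, recovery_attempts=3)
for result in [CRITICAL, OK, OK, CRITICAL, OK, OK, OK]:
    print(result, svc.process(result))
```

With `recovery_attempts=1` the sketch behaves like the situation described above: the first OK after a HARD problem is itself HARD and notifies. With a larger value, a brief dip back to OK stays SOFT, so the problem (and, in this model, anything attached to it such as an acknowledgement) is never cleared.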

I have heard all sorts of solutions, with flap detection probably being the most sensible one. I still dislike doing that, since acknowledgements (with non-permanent notes) will be lost when the service flaps to OK and back to critical.

For my users in particular this is extremely inconvenient, and they complain all the time, to the point that they stop using the acknowledgement functionality and even ignore Nagios alerts completely.

It would be great to have this functionality which I believe makes complete sense since it is already present for transitions from OK to NON-OK status.

kali-hernandez · 2015-06-11T17:42:03Z

Actually, reading the code (https://github.com/NagiosEnterprises/nagioscore/blob/master/base/checks.c#L713) it seems like there IS already some functionality like what I am looking for. Is this a new feature not yet in the latest stable version, or a setting I am missing?

kali-hernandez · 2015-06-11T17:47:43Z

Just to answer my own question:

http://nagios.manubulon.com/traduction/docs14en/statetypes.html

Soft recovery seems to be just a recovery from a soft non-OK state.

kali-hernandez · 2015-06-15T18:13:46Z

As described by eloyd in the forums (http://support.nagios.com/forum/viewtopic.php?f=7&t=32959&p=141961#p141810), the actual feature request would be better explained as:

However, the OP [...] is looking for a recovery version of max_check_attempts to delay recovery notification by two, three, or more iterations without engaging flap detection (meaning, without it having to go non-OK at some point first).

I see this as being something that has potential for use within the Nagios framework, even though it is creatable with an event handler, so I second the nomination for a feature request. Just because event handlers can be used to do almost every form of notification that Nagios does now, doesn't mean it's the easiest path forward, so I can foresee some circumstances when this might be preferred over writing an event handler.

jfrickson · 2017-02-15T20:54:18Z

I think we will probably not do this, but we will review it and make a decision.

ericloyd · 2017-02-15T21:03:27Z

Sad. I think it would be fairly straightforward to implement the equivalent of a max_retries for recoveries as well. Like, recovery_max_retries. Then the users can make use of it (or not) as they see fit, without worrying about event handlers triggering with every service check.

tmcnag · 2017-02-15T21:15:32Z

@ericloyd - Can always write it and file a pull request :)

We haven't discounted the idea entirely, it just isn't a priority right now and we need to discuss the merits of it before making a decision. Personally, I feel like enabling flapping is the correct course of action here but the acknowledgement issue is valid. This will be discussed as well.

ericloyd · 2017-02-15T21:18:16Z

Yes. Yes, I can. In all my spare time, I'll do that. :-)

tmcnag · 2017-02-15T22:21:41Z

clicks dropdown
assigns to @eloyd
pats self on back for dutiful delegation
#JustManagementThings

ericloyd · 2017-02-16T14:26:06Z

Insert pithy comment here.

hedenface · 2017-06-18T21:04:32Z

I don't think it's a bad idea, but I don't think it's a good idea, either. Regardless: we'll discuss it at the next review meeting.

wolfen351 · 2017-10-03T07:56:13Z

Any feedback as to what happened at that meeting? I'm finding myself needing this functionality too.

hedenface · 2017-10-03T10:20:10Z

It actually fell off the radar a bit. I'll add it to the list for 4.4 and see if it's quick to throw in - if it isn't, I'll push it back to 5.0

hedenface · 2017-11-26T03:53:09Z

Just perusing the code a bit... looks like this can make it into 4.4.0. I'll try to have a test for it very soon, and then you all can let me know how well it works, hopefully.

…le_async_service_check_results - this only works for services currently #46

hedenface · 2017-11-30T02:03:42Z

Got a good start in 14c514b for services anyway.

hedenface · 2017-12-02T17:47:00Z

Well, I got a lot further with this, and unfortunately it is going to have to wait until 5.0. I can't see a good reason to go in and update all the documentation for 4.4 to reflect a change this different from the rest of the 4.x series.

I'm changing the milestone to 5, and will try to get it in that branch soon for testing. This will be implemented with a host/service recovery_retries and recovery_interval. I had initially made a global flag that allowed the existing max_attempts and retry_interval to be reused for recoveries, but this wouldn't make much sense, as we would potentially want them to be different values.
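As a rough illustration of why separate values are useful, the sketch below picks the delay before the next check from three independent intervals. Only the names recovery_retries and recovery_interval come from the comment above; the function `next_check_delay`, its parameters and the exact semantics are assumptions for illustration, not the code in the 5.0 branch.

```python
OK = 0  # standard plugin return code for an OK result


def next_check_delay(hard_state, last_result,
                     check_interval, retry_interval, recovery_interval):
    """Return the delay before the next check, in interval units.

    hard_state  -- last confirmed (HARD) state of the host or service
    last_result -- state returned by the most recent check
    """
    if last_result == hard_state:
        # Nothing is pending confirmation: use the normal schedule.
        return check_interval
    if last_result == OK:
        # Assumed semantics: a recovery is being confirmed over
        # recovery_retries checks spaced recovery_interval apart.
        return recovery_interval
    # A problem is being confirmed over the existing retry schedule.
    return retry_interval


# Retry a suspected problem every minute, but confirm a recovery more slowly.
print(next_check_delay(hard_state=2, last_result=OK,
                       check_interval=5, retry_interval=1, recovery_interval=2))
```

Keeping the recovery interval separate lets an administrator confirm problems quickly while waiting longer before declaring a recovery, which a single shared flag could not express.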

* Initial commit
* Initial import of macro files, .gitignore, and README.md
* Styles not supported by github. Removed.
* Removed item about using an already existing directory. This would have (in some cases) made things easier, but I tried it, and it doesn't work.
* Added the `--squash` parameter to the `pug subtree pull` command
* Added the script to create groups and users
* Changes for SSL portability. Also uses pkg-config if available.
* Having trouble getting gnutls/compat to work. Backing off for now.
* Forgot to reset a switch
* Change to say ndo2db needs to know the NEB directory. Also add some comments about the `need_*` defines
* Missed a comma
* localstatedir was not being eval'd
* Package name and the program name are not always the same. For example: PKG_NAME=ndoutils but the daemon is ndo2db. Things like the startup file names, pid file, etc., need to have the daemon name. So an INIT_PROG variable was added, that should be set in configure.ac. If set, it will be used for those objects.
* Changes to init stuff for AIX
* Typo in ax_nagios_get_paths
* HPUX - set init to "unknown". Let admin do it.
* daemon-init.in: Nagios exit codes do not comply with the Linux Standard Base: https://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/iniscrptact.html
* Fix for Solaris, AIX and HP-UX
* Compiler Warnings using Oracle Developer Studio on Solaris. Fixes for issue #75.
* Export SSL_TYPE variable
* Allow more flexible requirements for comments

  Fix for issues #82 and #180

  A section named `COMMAND COMMENTS` has been added to the end of `cgi.cfg`. The configuration variable name is the command name, with `CMD_` replaced with `CMT_` (for CoMmenT). So, for example, if you want to modify the comment requirements for the `CMD_ACKNOWLEDGE_HOST_PROBLEM`, the name in the `cgi.cfg` file would be `CMT_ACKNOWLEDGE_HOST_PROBLEM`.

  The value part has two parameters. The first is a number that determines if a comment is required for that command:

  A `0` (zero) means a comment is not allowed, and the `comment` fields on the form will not be displayed.

  A `1` (one) means that a comment is optional. The `comment` fields on the form will be displayed, but will not be marked as `required` (i.e. printed in red). If a comment is entered, it will be processed, but if no comment is entered, it will not be an error.

  A `2` means that a comment is required. The `comment` fields on the form will be displayed, and will be marked as `required` (i.e. printed in red). If a comment is entered, it will be processed, but if no comment is entered, it will be an error.

  The second parameter is optional, and separated from the `required` parameter by a comma. Everything after the comma is considered a default comment for that command, and will be pre-loaded into the comment field on the form.

  Here are a couple of examples of things you can do:

  `CMT_ACKNOWLEDGE_HOST_PROBLEM=2,This problem is being looked into by [name]`
  This makes comments for host problem acknowledgements required (which is the current behavior) but additionally, pre-loads the comment field on the form with the value `This problem is being looked into by [name]`.

  `CMT_SCHEDULE_SVC_CHECK=1`
  This makes comments for rescheduling a service check optional. The current behavior is that a comment is required.

  `CMT_SCHEDULE_HOST_DOWNTIME=0`
  This makes comments not allowed for scheduling downtime for a host. The comment field on the form will not be displayed, where before, it was displayed, and a comment was required.

  `CMT_DISABLE_SVC_CHECK=2`
  Normally, comments can not be entered when you disable active checks for a service. This makes comments required, displays the comment fields on the form, and gives an error if a comment has not been entered.

  If no `COMMAND COMMENTS` configuration values are entered in `cgi.cfg`, the defaults are the same as the current behavior.

  Nagios core currently only has comments for host or service related commands. But comment overrides for every command can be entered in `cgi.cfg`. That means, if you put in `CMD_DISABLE_FLAP_DETECTION=2`, disabling flap detection system-wide will have comment fields on the form and will require a comment. But the comment will just be thrown away. Some non-host/service commands may or may not allow and process comments in the future.
* new status for check dependencies

  Enhancement for issue #229

  There is a new `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances.

  Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that.

  If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked.

  If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future.

  In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN).
* Update to Changelog
* fix build error when there is more than one xinetd running. https://github.com/NagiosEnterprises/nrpe/pull/105 was merged into NRPE instead of autoconf-macros so I'm updating it here manually.
* Check for inetd/xinetd before checking if init is upstart
* Update Changelog
* Prep for release 4.3.2-rc1
* Add a `statusCRITICALACK` class for the status column. Resolves enhancement issue #166
* CSV output for any selection in the Availability Report

  Fix for issue #169

  The Availability Report can already output CSV for any selection (hostgroups, hosts, servicegroups, services) but only if you manually add the `&csvoutput=` parameter at the end of the URL. So the only change I had to make was to show the checkbox for CSV output all the time.
* New Macro(s) to generate URL for host / service object to be used in notifications Feature implementing issue #316 The new macros are $HOSTINFOURL$ and $SERVICEINFOURL$ * Allow more flexible requirements for comments Fix for issues #82 and #180 A section named `COMMAND COMMENTS` has been added to the end of `cgi.cfg`. The configuration variable name is the command name, with `CMD_` replaced with `CMT_` (for CoMmenT). So, for example, if you want to modify the comment requirements for the `CMD_ACKNOWLEDGE_HOST_PROBLEM`, the name in the `cgi.cfg` file would be `CMT_ACKNOWLEDGE_HOST_PROBLEM`. The value part has two parameters. The first is a number that determines if a comment is required for that command: A `0` (zero) means a comment is not allowed, and the `comment` fields on the form will not be displayed. A `1` (one) means that a comment is optional. The `comment` fields on the form will be displayed, but will not be marked as `required` (i.e. printed in red). If a comment is entered, it will be processed, but if no comment is entered, it will not be an error. A `2` means that a comment is required. The `comment` fields on the form will be displayed, and will be marked as `required` (i.e. printed in red). If a comment is entered, it will be processed, but if no comment is entered, it will be an error. The second parameter is optional, and separated from the `required` parameter by a comma. Everything after the comma is considered a default comment for that command, and will be pre-loaded into the comment field on the form. Here are a couple of examples of things you can do: CMT_ACKNOWLEDGE_HOST_PROBLEM=2,This problem is being looked into by [name] This makes comments for host problem acknowledgements required (which is the current behavior) but additionally, pre-loads the comment field on the form with the value `This problem is being looked into by [name]`. CMT_SCHEDULE_SVC_CHECK=1 This makes comments for rescheduling a service check optional. 
The current behavior is that a comment is required. CMT_SCHEDULE_HOST_DOWNTIME=0 This makes comments not allowed for scheduling downtime for a host. The comment field on the form will not be displayed, where before, it was displayed, and a comment was required. CMT_DISABLE_SVC_CHECK=2 Normally, comments can not be entered when you disable active checks for a service. This makes comments required, displays the comment fields on the form, and gives an error if a comment has not been entered. If no `COMMAND COMMENTS` configuration values are entered in `cgi.cfg`, the defaults are the same as the current behavior. Nagios core currently only has comments for host or service related commands. But comment overrides for every command can be entered in `cgi.cfg`. That means, if you put in `CMD_DISABLE_FLAP_DETECTION=2`, disabling flap detection system-wide will have comment fields on the form and will require a comment. But the comment will just be thrown away. Some non-host/service commands may or may not allow and process comments in the future. * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. 
If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * Update to Changelog * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. 
For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * Update to Changelog * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. 
In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * Update to Changelog * Update to Changelog * Update Changelog * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. 
There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. 
So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. 
If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * new status for check dependencies Enhancement for issue #229 There is a newo `SET SERVICE/HOST STATUS WHEN SERVICE CHECK SKIPPED` section in `nagios.cfg`. There are four new settings that control what status hosts and services should get under various circumstances. Normally, if a check is not performed because of a failed dependency, parent status, or host down status, the skipped check causes the status of the host or service to stay the same. So if a host crashes, the services on that host will never be checked until the host comes back up. So the services could remain `STATE_OK`. The new configuration settings will allow you to change that. If a service check is skipped because of a failed dependency, the `service_skip_check_dependency_status` setting is checked. If a parent status causes a check to be skipped, the `service_skip_check_parent_status` setting is checked. If the host for a service is down, the `service_skip_check_host_down_status` is checked. And if a host check is skipped because of a failed dependency, the `host_skip_check_dependency_status` setting is checked. 
If the setting is `-1` (the default), nothing is done. Otherwise, the status of the host or service is set to the value of the setting. For services, value options are: 0 - STATE_OK, 1 - STATE_WARNING, 2 - STATE_CRITICAL, 3 - STATE_UNKNOWN. For hosts, value options are: 0 - STATE_UP, 1 - STATE_DOWN, 2 - STATE_UNREACHABLE. One or more other settings may be added in the future. In most cases, if this is going to be used at all, the various `service_skip_check_*_status` settings would probably be set to `3` (STATE_UNKNOWN). * Update to Changelog * Update to Changelog * Update to Changelog * Update to Changelog * Add a `statusCRITICALACK` class for the status column Resolves enhancement issue #166 * Update Changelog * CSV output for any selection in the Availability Report Fix for issue #169 The Availability Report can already output CSV for any selection (hostgroups, hosts, servicegroups, services) but only if you manually add the `&csvoutput=` parameter at the end of the URL. So the only change I had to make, was to show the checkbox for CSV output all the time. * New Macro(s) to generate URL for host / service object to be used in notifications Feature implementing issue #316 The new macros are $HOSTINFOURL$ and $SERVICEINFOURL$ * Clean up possible `/` at the end of `website_url` config val * update readme, update license, add changelog, fix issue with inetd,xinetd detection if neither are running * base/utils.c: calculate_time_from_day_of_month(): initialize "struct tm t" completely for negative day offsets * making checks.c more readable (shortening some lines) and added a calculate_execution_time for the handle_async_check_result family of functions * disable NERD by default. re-enable with --enable-nerd during ./configure * fixed content-disposition and filename in header for avail.cgi #452 * start of soft recovery. 
there are some known issues in a todo in handle_async_service_check_results - this only works for services currently #46 * remove test for enable_*_soft_recovery - pushed back to 5.0. updated handle_async_service_check_result function to remove duplicate logic blocks as well * forgot a paren or two * and another paren. probably more important than the last ones :) * cleaned up checks quite a bit, added inline functions for most of the logic blocks in an easy-to-digest fashion for code maintainability * better gcc options for makefile CFLAGs options * attempting to debug application for memory leaks clutters valgrind. got most of the memory errors out of the way for the pre-daemon fork, and some of the error producing conditional blocks as well * moving forward with memory improvements. parent process (pre-daemon_init) and main nagios process no longer have "memory leaks" [when running for a tiny amount of time for testing purposes] * these debug logs should be in the function they're wrapped around * added some additional logging (check result active/passive processing), cleaned up some my_strtok memory leakage and some code #414 * added a service file. renamed the openrc file for consistency * stupid newline * fixed the daemon-service * #352 added an enable_page_tour interface option * Command line macro detection skips potential macros with no ending dollar sign (#459) * Merge for fix #443, PR #448 into 4.4.0 * fixed the quick macro check for the matching $ * better fix for #459 i think - the issue is really that it is defined behavior to specify a lone $ as $$ - but it was placing the end of the text with a $ as it thought it was a macro. 
either way - it is fixed * make this look a bit neater * Fixed reloads causing defunct (zombie) processes (#441), Reverted some of the memory cleanup in base/workers.c (from 0e1b0f and 5b4f38) regarding the static kvvec structure (was segfaulting after my "fixes"), cleaned up some comments and define statements and sample config lines, also fixed kill() command in lib/wproc -BH * include config.h in lib/worker as partial fix for #326 -bh i feel like this may break something down the road - it would be nicer to have the library have its own contained config.h - maybe break iobroker.h out into a larger configurable include file to get config options? will wait to see if anything breaks before doing that though. * removed wait3 and wait4 deprecated functions. replaced with waitpid(). this was specifically done for #326, and should fix that issue, but in general is a good idea as well. need to remove all of the `struct rusage` that are cluttered in the header files, but will wait until core5 to do that -bh * fix the systemd service daemon file * prep for core to use autoconf-macros -bh * first round of adding autoconf-macros into core for 4.4.0. testing would appreciated by anyone who receives these messages :) -bh * added an upstart configuration for core. also updated some more of the AC_SUBST for the main service file * prefix and exec_prefix are no longer necessary in default-init * Finishing touches for CVE-2016-10089 (#454). Adjusted the RAMDISK creation in default-init, and cleaned up the check_config() calls a bit -BH * Fixed additive inheritance not testing for duplicates in hosts/services/(+escalations) #392 Added a bit of logic after object resolution, during object registration to loop through the existing contacts and contact_groups and compare the one we're about to add * Update default-init.in Suppress error output during ramdisk detection/creation (#454) * added this for #357. will revert. 
this needs to be visited for 5.0 instead, as there are some struct changes that need to occur for the host and service -bh * Revert "added this for #357. will revert. this needs to be visited for 5.0 instead, as there are some struct changes that need to occur for the host and service -bh" This reverts commit 16b10b6208ac9d19e41ab35b09d828c46e0acb34. * added macros HOSTNOTIFICATIONENABLED and SERVICENOTIFICATIONENABLED to see if notifications are enabled for either host or service (#419) -BH * forgot the macro count increment * Added system limit detection (RLIMIT_NPROC) to check for anticipated fork() failures (#434) (Bryan Heden) * New Macro(s) for obtaining the host/service notification periods (#350) (Bryan Heden) * whoopsie. fix a temp_host to a temp_service because of my copy pasta * Added stalking on notifications (`N` or `notifications` option when specifying `stalking_options`) (#342) (Bryan Heden) Also fixed a notification glitch I introduced while making the handle_async_*_check_result functions And updated configure with autoconf while I was at it * attempting to fix memory leak (#455) - looks like it was introduced in (ad9d52) based on internal testing (Bryan Heden) * fixing an incredibly silly blunder on my part * changed iocache_capacity return values for determining non-positive returns (#432) * adjust buffer size of iocache if its calculating a read size of 0 or less than in the instances where the data that comes in is larger than the current total buffer size (#432) * Fix for #432 - but larger than that, allows for iocache to grow during a read if a read is attempted and moved into the iocache, it will detect the incoming length properly and attempt to grow as required. can likely change the `iocache_create()`s everywhere to be a bit lower, since the adjustment would allow for them to grow on the fly. might test that eventually. 
* fix for `check_workers=` after introducing bug in commit: bdd7b3 * Pretty sure @box293 figured out the issue was related to the newly introduced host/service info macros (#316). This should solve the issue presented in #455 * reverting individual changes at a time to identify the leak * fix memory leak resulting from commit e7a17b which was intended to clean up memory leaks. irony: a state of affairs or an event that seems deliberately contrary to what one expects and is often amusing as a result (bryan heden) * fix for #463 - with the addition of the autoconf macros, it gets all happy expecting to have the same amount of startup files that nrpe has. core needs a lot of community love if we're ever going to get there (bryan heden) * update configure file to reflect changes in configure.ac for #463 * clean up a function with proper braces. debugging this function was horrible
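Two of the configuration rules described in the commit messages above (the `*_skip_check_*_status` settings in `nagios.cfg` and the `CMT_*` values in `cgi.cfg`) can be illustrated with a small sketch. This is not Nagios code: the function and class names below are hypothetical, and the logic only mirrors the rules as the commit text states them (`-1` leaves the status alone; a `CMT_*` value is a requirement number, optionally followed by a comma and a default comment).

```python
from typing import NamedTuple, Optional

# Hypothetical illustration only; none of these names exist in Nagios Core.

# Service states as listed in the commit message.
SERVICE_STATES = {0: "STATE_OK", 1: "STATE_WARNING",
                  2: "STATE_CRITICAL", 3: "STATE_UNKNOWN"}
SKIP_KEEP_CURRENT = -1  # the default: a skipped check changes nothing


def status_after_skipped_check(current: int, skip_setting: int) -> int:
    """Return the service status to record when a check was skipped."""
    if skip_setting == SKIP_KEEP_CURRENT:
        return current
    if skip_setting not in SERVICE_STATES:
        raise ValueError(f"invalid skip-check status: {skip_setting}")
    return skip_setting


class CommentRule(NamedTuple):
    requirement: int                 # 0 = not allowed, 1 = optional, 2 = required
    default_comment: Optional[str]   # text pre-loaded into the form, if any


def parse_comment_rule(value: str) -> CommentRule:
    """Parse the value part of a CMT_* line, e.g. '2,Being looked into'."""
    # Everything after the first comma is the default comment.
    number, sep, default = value.partition(",")
    requirement = int(number)
    if requirement not in (0, 1, 2):
        raise ValueError(f"invalid comment requirement: {requirement}")
    return CommentRule(requirement, default if sep else None)
```

For example, `status_after_skipped_check(0, 3)` models a service that was `STATE_OK` being moved to `STATE_UNKNOWN` when its host goes down, while a setting of `-1` keeps the old status.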

kali-hernandez changed the title ~~Service (or host) checks should allow SOFT status for OK as well~~ [feature-request] Service (or host) checks should allow SOFT status for OK as well Jun 17, 2015

tmcnag added the Enhancement label Aug 20, 2015

icinga-migration mentioned this issue Jan 17, 2017

[dev.icinga.com #10114] Service (or host) checks should allow SOFT status for OK as well Icinga/icinga-core#1561

Closed

jfrickson added the Need Review label Feb 15, 2017

hedenface self-assigned this Jun 18, 2017

hedenface added this to the 4.4.0 milestone Oct 3, 2017

hedenface mentioned this issue Nov 27, 2017

Nagios RESTART_PROGRAM external command defunct processes #441

Closed

hedenface added a commit that referenced this issue Nov 30, 2017

start of soft recovery. there are some known issues in a todo in handle_async_service_check_results - this only works for services currently #46

14c514b

hedenface modified the milestones: 4.4.0, 5.0.0 Dec 2, 2017

hedenface added Approved and removed Need Review labels Dec 2, 2017
