In 4.4.2 services go directly into HARD state if host is SOFT DOWN #584

scottwilkerson · 2018-09-28T19:09:30Z

From this thread, and reproduced internally
https://support.nagios.com/forum/viewtopic.php?f=7&t=50384

If a host goes down and the service cannot be reached, and the service results are processed after the host has been determined to be not up, the host goes into a SOFT DOWN state while the service goes straight into HARD CRITICAL, even if it has not reached its max check attempts.

This causes the service to send a notification prematurely, even though the host is only in a SOFT DOWN state.

ziggimon · 2018-10-23T09:31:55Z

This bug has been keeping me from upgrading to the latest Core and picking up the patches for the current CVEs. Is a release that includes a fix for this issue coming soon?

hedenface · 2018-10-23T15:56:01Z

I'm going to be back in the full swing of things here directly. I plan on getting the important/critical bugs fixed in the next release. I can't put a date on it, but it will be a priority until released.

jomann09 · 2019-01-03T03:07:05Z

I did a little bit of digging on this because I remember talking about what was supposed to happen when a host was in a DOWN state. This document is what was followed in 4.4 https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/statetypes.html

The doc says that a HARD state occurs:

When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE.

Now, this doesn't actually say whether the host needs to be SOFT or HARD down, but from testing 4.2 and 4.3 it seems like Core never actually did this. It looks like services went into a SOFT state and normal service checks continued, but no notification was sent.
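To make the difference concrete, here is a minimal sketch of the two behaviours for a non-OK service result. This is a simplified model and not Nagios Core code: the function names, the `HostState` enum, and the string return values are all hypothetical, and the "observed" variant only reflects what the testing described above suggests.

```python
from enum import Enum


class HostState(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


def documented_state_type(attempt, max_attempts, host_state):
    """State type for a non-OK service result under the rule quoted from the docs.

    Hypothetical helper: a host that is DOWN or UNREACHABLE forces HARD,
    regardless of how many check attempts have been made.
    """
    if host_state is not HostState.UP:
        return "HARD"
    return "HARD" if attempt >= max_attempts else "SOFT"


def observed_42_43_state_type(attempt, max_attempts, host_state):
    """State type as 4.2/4.3 appeared to behave in testing (assumption).

    The host state is ignored here, so the service retries through SOFT
    states until max_attempts is reached.
    """
    return "HARD" if attempt >= max_attempts else "SOFT"


if __name__ == "__main__":
    # First failed check, max_check_attempts of 5, host already DOWN.
    print(documented_state_type(1, 5, HostState.DOWN))
    print(observed_42_43_state_type(1, 5, HostState.DOWN))
```

The two functions only disagree when the host is not UP and the service has not yet used all of its check attempts, which is exactly the case reported in this issue.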

Reading through the logic, it looks like the notifications are getting triggered for this state change and that is likely a separate bug. I will have to consult with a few people to determine if this should be an intended feature (direct to HARD) or if we should stick with how it's worked. I also need to do a little bit of testing on other past Core systems to see how it handles this and if it follows this logic or not.

…nreachable would cause the service to send out a notification (and eventually a recovery)

jomann09 · 2019-01-03T03:37:13Z

I found where the logic was automatically sending a notification on a hard state change; with the patch above it should no longer do that. I will test this in the maint branch. The service will still go directly into a HARD state if the host is DOWN or UNREACHABLE.

… notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.

jomann09 · 2019-01-03T05:30:20Z

Okay, I think this is properly fixed now. States do turn into HARD states after 1 check attempt, and they do not notify while the host is DOWN. I need to do a bit more testing, but this seems to be what is intended.
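As a rough illustration of the behaviour described in this comment, the sketch below models a single non-OK service result. It is not the actual Core logic: `Result`, `process_non_ok_result`, and the `notification_allowed` field are hypothetical names, and all other notification filters (notification options, timeperiods, downtime, and so on) are ignored.

```python
from dataclasses import dataclass


@dataclass
class Result:
    state_type: str             # "SOFT" or "HARD"
    notification_allowed: bool  # whether this result may trigger a notification


def process_non_ok_result(attempt, max_attempts, host_up):
    """Model one non-OK service result (simplified, hypothetical).

    - Host DOWN/UNREACHABLE: the service goes HARD on the first attempt,
      but no notification is allowed while the host is not up.
    - Host UP: the service stays SOFT until max_attempts is reached,
      then goes HARD and a notification is allowed.
    """
    if not host_up:
        return Result(state_type="HARD", notification_allowed=False)
    if attempt >= max_attempts:
        return Result(state_type="HARD", notification_allowed=True)
    return Result(state_type="SOFT", notification_allowed=False)


if __name__ == "__main__":
    # Host rebooting: first failed service check with max_check_attempts of 5.
    print(process_non_ok_result(attempt=1, max_attempts=5, host_up=False))
    # Host up, service failing on its final attempt.
    print(process_non_ok_result(attempt=5, max_attempts=5, host_up=True))
```

In this model a host reboot produces a HARD service state immediately, but without the premature notification that the original report describes.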

ziggimon · 2019-01-03T09:08:45Z

I really appreciate this. The main pain was the notifications that the entire on-call staff got when hosts were rebooted, and the fact that the 5 down ticks were ignored before services sent notifications. Looking forward to testing this out soon.

jomann09 · 2019-01-04T03:43:25Z

This should be good to go. We're testing it internally, and you can test it too using the maint branch if you'd like.

jomann09 added the Bug label Jan 2, 2019

jomann09 added this to the 4.4.3 milestone Jan 3, 2019

jomann09 added the Need Review label Jan 3, 2019

jomann09 added a commit that referenced this issue Jan 3, 2019

Fixed #584 where services going into hard state while host was down/u…

5c5e3ad

…nreachable would cause the service to send out a notification (and eventually a recovery)

jomann09 added a commit that referenced this issue Jan 3, 2019

Change to fixes for #584 to make HARD states work properly and do not…

fb8fd5a

… notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.

jomann09 mentioned this issue Jan 3, 2019

nagios 4.4.2 recovery notifications from UNREACHABLE/UNKNOWN for hosts/services are sent even if the contact didn't receive the UNREACHABLE/UNKNOWN notification #580

Closed

jomann09 self-assigned this Jan 3, 2019

jomann09 removed the Need Review label Jan 4, 2019

jomann09 closed this as completed Jan 4, 2019

msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 24, 2023

Fixed NagiosEnterprises#584 where services going into hard state whil…

439da55

…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)

msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 28, 2023

Fixed NagiosEnterprises#584 where services going into hard state whil…

0a82db2

…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)

msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Mar 1, 2023

Fixed NagiosEnterprises#584 where services going into hard state whil…

1e57009

…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)

msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Mar 1, 2023

Fixed NagiosEnterprises#584 where services going into hard state whil…

d83e523

…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)
