In 4.4.2 services go directly into HARD state if host is SOFT DOWN #584

Closed
scottwilkerson opened this issue Sep 28, 2018 · 7 comments

@scottwilkerson
Contributor

From this thread, and reproduced internally
https://support.nagios.com/forum/viewtopic.php?f=7&t=50384

If a host goes down and a service cannot be reached, and the service result is processed after the host has been determined to be not up, the host goes into a SOFT DOWN state while the service goes into a HARD CRITICAL state, even if it has not reached its max check attempts.

This causes the service to send a notification prematurely, even though the host is only in a SOFT DOWN state.
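
For reference, the retry counting the report expects can be sketched as below. This is a minimal illustration, not Nagios Core's code: the `Service` class, the `process_result` function and the attempt handling are assumptions made for the example. It shows a service staying SOFT until it has failed `max_check_attempts` times, with no shortcut based on host state.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a monitored service (not a Nagios Core type)."""
    max_check_attempts: int
    current_attempt: int = 0
    state: str = "OK"
    state_type: str = "HARD"


def process_result(svc: Service, result: str) -> str:
    """Apply one check result using plain retry counting and return the state type."""
    if result == "OK":
        # Simplification: any OK result resets the service to a HARD OK state.
        svc.state, svc.state_type, svc.current_attempt = "OK", "HARD", 1
        return svc.state_type
    if svc.state == "OK":
        # First failure after OK starts a new retry sequence.
        svc.current_attempt = 1
    else:
        svc.current_attempt = min(svc.current_attempt + 1, svc.max_check_attempts)
    svc.state = result
    # The problem only becomes HARD once every retry has been used.
    reached_max = svc.current_attempt >= svc.max_check_attempts
    svc.state_type = "HARD" if reached_max else "SOFT"
    return svc.state_type


svc = Service(max_check_attempts=3)
history = [process_result(svc, "CRITICAL") for _ in range(3)]
print(history)  # ['SOFT', 'SOFT', 'HARD']
```

Under the behavior reported here, the first CRITICAL result would instead come back as HARD whenever the host is already SOFT DOWN.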

@ziggimon

This bug has been keeping me from upgrading to the latest Core and picking up the patches for the current CVEs. Is there an upcoming release that includes a fix for this issue?

@hedenface
Contributor

I'm going to be back in the full swing of things here directly. I plan on getting the important/critical bugs fixed in the next release. I can't put a date on it, but it will be priority until released.

@jomann09 jomann09 added the Bug label Jan 2, 2019
@jomann09 jomann09 added this to the 4.4.3 milestone Jan 3, 2019
@jomann09
Contributor

jomann09 commented Jan 3, 2019

I did a little bit of digging on this because I remember talking about what was supposed to happen when a host was in a DOWN state. This is the document that was followed in 4.4: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/statetypes.html

The doc says that a HARD state occurs:

When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE.

This doesn't say whether the host needs to be in a SOFT or HARD DOWN state, but from testing 4.2 and 4.3 it seems Core never actually did this. In those versions, services go into a SOFT state and normal service checks continue, but no notification is sent.
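
As a rough sketch of the documented rule quoted above (not the actual Core logic), the decision for a non-OK service result could look like the following. The function name `service_state_type` and its parameters are hypothetical, and the host's own SOFT or HARD state type is deliberately ignored because the doc does not mention it.

```python
HOST_PROBLEM_STATES = {"DOWN", "UNREACHABLE"}


def service_state_type(service_state: str, host_state: str,
                       attempt: int, max_attempts: int) -> str:
    """Return 'SOFT' or 'HARD' for a non-OK service check result."""
    if service_state == "OK":
        raise ValueError("this sketch only covers non-OK results")
    # Documented rule: a non-OK service on a DOWN/UNREACHABLE host is HARD at once.
    if host_state in HOST_PROBLEM_STATES:
        return "HARD"
    # Otherwise the service must use up its retries before it turns HARD.
    return "HARD" if attempt >= max_attempts else "SOFT"


print(service_state_type("CRITICAL", "DOWN", 1, 5))  # HARD
print(service_state_type("CRITICAL", "UP", 1, 5))    # SOFT
```

With this rule, the first failed check against a host that is already DOWN skips the remaining retries, which is the behavior seen in 4.4.2.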

Reading through the logic, it looks like the notifications are getting triggered for this state change and that is likely a separate bug. I will have to consult with a few people to determine if this should be an intended feature (direct to HARD) or if we should stick with how it's worked. I also need to do a little bit of testing on other past Core systems to see how it handles this and if it follows this logic or not.

jomann09 added a commit that referenced this issue Jan 3, 2019
…nreachable would cause the service to send out a notification (and eventually a recovery)
@jomann09
Contributor

jomann09 commented Jan 3, 2019

I found where the logic was automatically sending a notification on the HARD state change; with the patch above it should no longer do that. I will test this in the maint branch. The service will still go directly into a HARD state if the host is DOWN or UNREACHABLE.

jomann09 added a commit that referenced this issue Jan 3, 2019
… notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.
@jomann09
Contributor

jomann09 commented Jan 3, 2019

Okay, I think this is properly fixed now. Services do turn HARD after 1 check attempt, and they do not notify while the host is DOWN. I need to do a bit more testing, but this seems like what is intended.
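
The intended outcome described in this comment can be illustrated with a small notification gate. This is only a sketch under stated assumptions: `should_notify` is a hypothetical helper, recovery notifications are left out for brevity, and the real checks in Core involve more conditions than shown here.

```python
def should_notify(service_state: str, service_state_type: str, host_state: str) -> bool:
    """Decide whether a service problem notification would go out (simplified)."""
    if service_state == "OK":
        # Recovery notifications are outside the scope of this sketch.
        return False
    if host_state in ("DOWN", "UNREACHABLE"):
        # The service may already be HARD, but it stays quiet while the host is down.
        return False
    # With the host UP, only HARD problem states notify.
    return service_state_type == "HARD"


# A service that went HARD on its first failed check during a host reboot.
print(should_notify("CRITICAL", "HARD", "DOWN"))  # False
print(should_notify("CRITICAL", "HARD", "UP"))    # True
```

The point of the gate is that the early HARD state alone no longer pages anyone; the host has to be UP for a service problem notification to be sent.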

@ziggimon

ziggimon commented Jan 3, 2019

I really appreciate this. The main pain was the notifications the entire on-call staff got whenever hosts were rebooted, and the fact that the 5 down ticks were ignored before services sent notifications. Looking forward to testing this out soon.

@jomann09
Contributor

jomann09 commented Jan 4, 2019

This should be good to go. We're testing it internally, and you can test it too using the maint branch if you'd like.

@jomann09 jomann09 closed this as completed Jan 4, 2019
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 24, 2023
…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 24, 2023
…operly and do not notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 28, 2023
…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 28, 2023
…operly and do not notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Mar 1, 2023
…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Mar 1, 2023
…operly and do not notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.