In 4.4.2 services go directly into HARD state if host is SOFT DOWN #584
Comments
This bug has been keeping me from upgrading to the latest core and patching for the current CVEs. Is there a release coming soon that includes a fix for this issue?
I'm going to be back in the full swing of things here directly. I plan on getting the important/critical bugs fixed in the next release. I can't put a date on it, but it will be priority until released.
I did a little bit of digging on this because I remember talking about what was supposed to happen when a host was in a DOWN state. This document is what was followed in 4.4: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/statetypes.html The doc says that a HARD state occurs:
Now, this doesn't actually say whether the host needs to be SOFT or HARD down, but from testing 4.2 and 4.3 it seems that Core never actually did this. It looks like services there do go into a SOFT state and normal service checks happen, but no notification is sent. Reading through the logic, it looks like notifications are being triggered by this state change, and that is likely a separate bug. I will have to consult with a few people to determine whether this should be an intended feature (direct to HARD) or whether we should stick with how it has worked. I also need to do a bit of testing on other past Core versions to see how they handle this and whether they follow this logic.
…nreachable would cause the service to send out a notification (and eventually a recovery)
I found where the logic was automatically sending a notification on a hard state change; with the patch above it should no longer do that. I will test this in the maint branch. A service will still go directly into a HARD state if the host is DOWN or UNREACHABLE.
… notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.
Okay, I think this is properly fixed now. Service states do turn into HARD states after 1 check attempt, and they do not notify while the host is DOWN. I need to do a bit more testing, but this seems to be what is intended.
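The behavior described after the fix can be sketched as a small decision function. This is a simplified illustration only, not Nagios Core's actual code: the function name `service_state_after_fix` and its parameters are hypothetical, and it assumes the service check itself returned a non-OK result.

```python
from typing import Tuple


def service_state_after_fix(host_state: str, attempt: int,
                            max_attempts: int) -> Tuple[str, bool]:
    """Return (state_type, notify) for a non-OK service check result.

    Hypothetical model of the behavior described in this thread,
    not the real Nagios Core implementation.
    """
    if host_state in ("DOWN", "UNREACHABLE"):
        # Host is not up: the service goes straight to HARD,
        # but no notification is sent while the host is down.
        return ("HARD", False)
    if attempt >= max_attempts:
        # Host is up and retries are exhausted: HARD, and notify.
        return ("HARD", True)
    # Host is up and retries remain: stay SOFT, no notification.
    return ("SOFT", False)
```

For example, with `max_attempts` set to 5, a first failed check against a host that is DOWN yields a HARD state without a notification, while the same failed check against a host that is UP stays SOFT.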
I really appreciate this. The main pain was the notifications that the entire on-call staff got when hosts were rebooted, along with the 5 down ticks being ignored before services sent notifications. Looking forward to testing this out soon.
This should be good to go. We're testing it internally, and you can test it too if you'd like using the maint branch.
…e host was down/unreachable would cause the service to send out a notification (and eventually a recovery)
…operly and do not notify on next check. Also fixes the check_interval used when the state turns HARD when hosts are down or unreachable.
From this thread, and reproduced internally:
https://support.nagios.com/forum/viewtopic.php?f=7&t=50384
If a host goes down and the service cannot be reached, then once the service results are processed and the host is determined not to be up, the host goes into a SOFT DOWN state while the service goes into a HARD CRITICAL state, even if it has not reached its max check attempts.
This causes the service to send a notification prematurely, even though the host is only in a SOFT DOWN state.
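As a rough illustration of the reported behavior, the sketch below models what the service does on a failed check. It is a toy model built only from the description above, not Nagios Core's code; `ServiceResult` and `reported_behavior` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServiceResult:
    state_type: str  # "SOFT" or "HARD"
    notify: bool     # whether a notification is sent


def reported_behavior(host_up: bool, attempt: int,
                      max_attempts: int) -> ServiceResult:
    """Toy model of the behavior reported for 4.4.2 on a failed check."""
    if not host_up:
        # Reported bug: the service jumps to HARD and notifies,
        # regardless of how many check attempts have been made.
        return ServiceResult("HARD", True)
    # Host is up: the service goes HARD only once retries are exhausted.
    hard = attempt >= max_attempts
    return ServiceResult("HARD" if hard else "SOFT", hard)
```

With 5 max check attempts, the first failed check while the host is not up already produces a HARD state and a notification, which matches the premature notification described above.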