Describe the bug
When the pping daemon detects a box down event (i.e. a number of ICMP echo replies are missing), it both dispatches a boxDown event and immediately sets netbox.up to 'n' (the value that indicates the device is down).
However, NAV's state machinery (through eventengine) will not actually give the netbox a down state until it has been unresponsive for more than 4 minutes (the default value), and no alerts are sent until it has been unresponsive for at least 1 minute.
The net effect is that a short-term packet loss will cause the netbox.up database attribute to flip back and forth before anyone notices.
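To make the timing mismatch concrete, here is a minimal Python sketch that classifies an outage by its duration. It assumes the default delays quoted above (alerts no earlier than 1 minute, down-state after more than 4 minutes) and measures the outage from the moment pping sets netbox.up to 'n'. The names `outcome` and `OutageOutcome` are hypothetical, and this is a simplification, not eventengine's actual logic.

```python
from dataclasses import dataclass

# Default delays as described in this report; both are assumptions here,
# since NAV lets you configure them.
ALERT_DELAY_S = 60        # no alerts before ~1 minute of unresponsiveness
DOWN_STATE_DELAY_S = 240  # down-state only after more than 4 minutes


@dataclass
class OutageOutcome:
    up_flag_is_n: bool  # pping has set netbox.up = 'n'
    alert_sent: bool    # the outage lasted long enough for an alert
    state_down: bool    # eventengine would consider the box down


def outcome(outage_s: float) -> OutageOutcome:
    """Simplified model of what an outage lasting `outage_s` seconds leads to.

    pping flips netbox.up to 'n' as soon as it detects the outage, no matter
    how short the outage turns out to be; the other effects are delayed.
    """
    return OutageOutcome(
        up_flag_is_n=True,
        alert_sent=outage_s >= ALERT_DELAY_S,
        state_down=outage_s > DOWN_STATE_DELAY_S,
    )


# A 20-second blip already flips the flag, but nobody is notified.
for seconds in (20, 90, 300):
    print(seconds, outcome(seconds))
```

The point of the model is the first field: it is unconditionally true, while everything a human would notice depends on the duration.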
However, there is a database rule that will forcibly close all ARP records associated with this netbox as soon as netbox.up is set to the down-state. This rule was introduced in 3e6f2df as a result of #596 (i.e. the rule is about 13 years old by now).
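The effect of the rule can be modelled with an in-memory SQLite database. This is only a sketch: NAV runs on PostgreSQL and uses a rule rather than a trigger, and the schema below is a stripped-down, hypothetical version of the netbox and arp tables (open ARP records are assumed to be marked with end_time = 'infinity'). It shows that a single down/up flap leaves the ARP records closed even though the box is up again.

```python
import sqlite3

# Hypothetical, simplified schema with only the columns needed here.
SCHEMA = """
CREATE TABLE netbox (netboxid INTEGER PRIMARY KEY, up TEXT NOT NULL);
CREATE TABLE arp (
    arpid INTEGER PRIMARY KEY,
    netboxid INTEGER REFERENCES netbox(netboxid),
    ip TEXT,
    mac TEXT,
    end_time TEXT NOT NULL DEFAULT 'infinity'
);
-- SQLite stand-in for the PostgreSQL rule: close every open ARP record
-- of a netbox as soon as its up column is set to 'n'.
CREATE TRIGGER close_arp_on_down
AFTER UPDATE OF up ON netbox
WHEN NEW.up = 'n'
BEGIN
    UPDATE arp SET end_time = datetime('now')
    WHERE netboxid = NEW.netboxid AND end_time = 'infinity';
END;
"""


def open_arp_count(db: sqlite3.Connection, netboxid: int) -> int:
    """Count ARP records for a netbox that are still open."""
    (count,) = db.execute(
        "SELECT COUNT(*) FROM arp WHERE netboxid = ? AND end_time = 'infinity'",
        (netboxid,),
    ).fetchone()
    return count


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO netbox VALUES (42, 'y')")
db.executemany(
    "INSERT INTO arp (netboxid, ip, mac) VALUES (42, ?, ?)",
    [("10.0.0.1", "00:00:5e:00:53:01"), ("10.0.0.2", "00:00:5e:00:53:02")],
)
before = open_arp_count(db, 42)

# A brief ICMP outage: the flag goes down and, shortly after, up again.
db.execute("UPDATE netbox SET up = 'n' WHERE netboxid = 42")
db.execute("UPDATE netbox SET up = 'y' WHERE netboxid = 42")
after = open_arp_count(db, 42)

print(before, after)  # the records stay closed after the box comes back
```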
The rule may have been well-intentioned. It was likely intended to close ARP records for a device that went "permanently" offline (since NAV cannot collect from the device while it is offline, it cannot reliably decide if ARP records should remain open or closed). However, using netbox.up for this is unreliable, since this flag may flap without signifying any kind of "permanence" of the down-state.
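By contrast, a decision that reflects "permanence" would look at how long the box has been down rather than at the current value of netbox.up. The following sketch assumes the start of the down-state is known and uses a hypothetical one-hour threshold; neither `should_close_arp` nor the threshold exists in NAV, and where the timestamp would come from is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold, not a NAV setting: how long a box must have been
# continuously down before its ARP records are treated as stale.
ARP_CLOSE_AFTER = timedelta(hours=1)


def should_close_arp(down_since: Optional[datetime], now: datetime,
                     threshold: timedelta = ARP_CLOSE_AFTER) -> bool:
    """Decide on the duration of the down-state, not on a flag that may flap.

    `down_since` is None while the box is considered up.
    """
    if down_since is None:
        return False
    return now - down_since >= threshold


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(should_close_arp(now - timedelta(seconds=30), now))  # False: short flap
print(should_close_arp(now - timedelta(hours=2), now))     # True: long outage
```

A short flap like the one in the previous sketch would leave the records open under this kind of check.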
To Reproduce
Do not attempt to reproduce in a production environment.
Steps to reproduce the behavior:
- Find any netbox (router) that has any number of open ARP records in the arp table, e.g. the netbox with id 42.
- Issue the following SQL:
  UPDATE netbox SET up='n' WHERE netboxid=42;
- Observe that all ARP records for netbox 42 have now been closed.
Expected behavior
A netbox's ARP records should not be closed as a consequence of short-lived ICMP packet loss.
Environment:
- NAV version installed: 5.9.1
- Method of installation: Any