Agent fty-outage produces pure alerts on _ALERTS_SYS when no data are coming from the device.
How to build
To build the fty-outage project, run:
./autogen.sh
./configure
make
make check # to run self-test
How to run
To run the fty-outage project:
- from within the source tree, run:
For the other options available, refer to the manual page of fty-outage
- from an installed base, using systemd, run:
systemctl start fty-outage
Configuration file - fty-outage.cfg - is currently ignored.
The agent reads the environment variable BIOS_LOG_LEVEL, which controls the verbosity level.
State file for fty-outage is stored in /var/lib/fty/fty-outage.zpl.
fty-outage is composed of 1 actor and 2 timers.
- fty-outage-server: main actor
The first timer is implemented by checking zclock and saves the state of the agent every SAVE_INTERVAL_MS milliseconds (default value 45 minutes).
The second timer is implemented via the zpoller timeout and publishes outage alerts for dead devices every TIMEOUT_MS milliseconds (default value 30 seconds), unless such an alert is already active.
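The interplay of the two timers can be sketched as follows. This is a minimal Python model, not the agent's code: the class `OutageTimers`, its method names, and the idea that the caller passes in the set of dead devices are assumptions made for illustration. Only the two intervals are taken from the description above.

```python
SAVE_INTERVAL_MS = 45 * 60 * 1000  # 45 minutes, as stated above
TIMEOUT_MS = 30 * 1000             # 30 seconds, as stated above


class OutageTimers:
    """Toy model of the save timer and the alert timer (hypothetical names)."""

    def __init__(self, now_ms):
        self.last_save_ms = now_ms
        self.active_alerts = set()  # devices whose outage alert is already active
        self.saves = 0              # number of times the state was "saved"

    def on_poll_timeout(self, now_ms, dead_devices):
        """Model one poller timeout, which fires roughly every TIMEOUT_MS.

        Returns the devices for which a new outage alert would be published.
        """
        # Alert timer: publish only for dead devices without an active alert.
        new_alerts = [d for d in sorted(dead_devices)
                      if d not in self.active_alerts]
        self.active_alerts.update(new_alerts)

        # Save timer: compare the clock against the time of the last save.
        if now_ms - self.last_save_ms >= SAVE_INTERVAL_MS:
            self.saves += 1
            self.last_save_ms = now_ms
        return new_alerts
```

In this model, calling `on_poll_timeout` twice in a row with the same dead device yields an alert only on the first call, and the state is saved only once the clock has advanced by SAVE_INTERVAL_MS.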
Agent doesn't publish any metrics.
Agent publishes alerts on _ALERTS_SYS stream.
It is possible to request the agent fty-outage for:
- putting devices into or returning devices from maintenance mode: this is used to temporarily ignore outages on assets that are known to not be currently serving data (for example, due to a FW upgrade).
Putting devices into or returning devices from maintenance mode
The USER peer sends the following message using MAILBOX SEND to the FTY-OUTAGE-AGENT ("fty-outage") peer:
- REQUEST/'correlation_ID'/MAINTENANCE_MODE/'mode'/asset1/.../assetN/expiration_ttl - switch 'asset1' to 'assetN' into or out of maintenance mode
- '/' indicates a multipart string message
- 'correlation_ID' is a zuuid identifier provided by the caller
- 'mode' MUST be 'enable' or 'disable'
- 'asset1', ..., 'assetN' MUST be the asset names of the devices
- 'expiration_ttl' (optional) is the number of seconds after which the asset(s) will be automatically returned from maintenance mode. If 'expiration_ttl' is not provided, the default value ('maintenance_expiration') from the agent configuration file is used
- the subject of the message is discarded
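As an illustration, the request frames could be assembled like this. It is a sketch only: the helper `build_maintenance_request` is hypothetical, it assumes that the frame following MAINTENANCE_MODE carries the 'enable' or 'disable' value, it assumes at least one asset name is required, and it uses a Python UUID as a stand-in for a zuuid. Actually sending the frames through a mailbox is left out.

```python
import uuid


def build_maintenance_request(mode, assets, expiration_ttl=None,
                              correlation_id=None):
    """Return the list of string frames for a MAINTENANCE_MODE request."""
    if mode not in ("enable", "disable"):
        raise ValueError("mode must be 'enable' or 'disable'")
    if not assets:
        # Assumption: a request without any asset name makes no sense.
        raise ValueError("at least one asset name is required")
    if correlation_id is None:
        # Stand-in for a zuuid: any unique identifier chosen by the caller.
        correlation_id = uuid.uuid4().hex
    frames = ["REQUEST", correlation_id, "MAINTENANCE_MODE", mode, *assets]
    if expiration_ttl is not None:
        frames.append(str(int(expiration_ttl)))  # optional TTL, in seconds
    return frames
```

For example, `build_maintenance_request("enable", ["ups-1", "epdu-2"], 3600, "abc")` produces the frames REQUEST, abc, MAINTENANCE_MODE, enable, ups-1, epdu-2, 3600, where the asset names and the identifier are made up.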
The FTY-OUTAGE-AGENT peer MUST respond with one of the messages back to USER peer using MAILBOX SEND.
- '/' indicates a multipart frame message
- 'correlation_ID' is a zuuid identifier provided by the caller
- 'reason' is a string detailing the reason for the error. Possible values are:
  - Invalid command,
  - Invalid message type,
  - Command failed,
  - Missing maintenance mode,
  - Unsupported maintenance mode.
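A rough idea of how a request might map to these reasons is sketched below. The mapping is an assumption: this document lists the reason strings but not the conditions that trigger them, so the function `check_request` and the order of its checks are hypothetical. "Command failed" is not modelled, since it presumably reports a failure while executing an otherwise valid request.

```python
def check_request(frames):
    """Return None if the frames look like a valid request, else a reason.

    Assumed frame layout:
    REQUEST / correlation_ID / MAINTENANCE_MODE / mode / asset1 ...
    """
    if len(frames) < 2 or frames[0] != "REQUEST":
        return "Invalid message type"
    if len(frames) < 3 or frames[2] != "MAINTENANCE_MODE":
        return "Invalid command"
    if len(frames) < 4:
        return "Missing maintenance mode"
    if frames[3] not in ("enable", "disable"):
        return "Unsupported maintenance mode"
    return None
```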
The agent is subscribed to the streams METRICS, METRICS_UNAVAILABLE, METRICS_SENSOR and ASSETS.
If it gets a METRICS_UNAVAILABLE message, it resolves all the stored alerts for the specified device.
If it gets a METRICS or METRICS_SENSOR message from a device, it resolves all the stored alerts for that device and marks the device as active.
If it gets an ASSETS message, it updates the asset cache. If the message is for the operation DELETE or RETIRE, it resolves all the alerts for the specified device.
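The stream handling described above can be modelled roughly as follows. This is a sketch under assumptions, not the agent's implementation: the class `OutageState`, its fields, and the simplification of a message to a (stream, device, operation) triple are hypothetical, and removing a deleted or retired asset from the cache is a guess.

```python
class OutageState:
    """Toy model of how incoming stream messages change the agent state."""

    def __init__(self):
        self.assets = {}     # asset cache: device name -> last seen operation
        self.active = set()  # devices currently marked as active
        self.alerts = set()  # devices with a stored outage alert

    def _resolve(self, device):
        # Resolve the stored outage alert for this device, if any.
        self.alerts.discard(device)

    def on_message(self, stream, device, operation=None):
        if stream == "METRICS_UNAVAILABLE":
            self._resolve(device)
        elif stream in ("METRICS", "METRICS_SENSOR"):
            self._resolve(device)
            self.active.add(device)
        elif stream == "ASSETS":
            if operation in ("DELETE", "RETIRE"):
                # Assumption: the asset also leaves the cache and active set.
                self.assets.pop(device, None)
                self.active.discard(device)
                self._resolve(device)
            else:
                self.assets[device] = operation
        else:
            raise ValueError("unexpected stream: " + stream)
```

Feeding the model a METRICS message for a device with a stored alert clears the alert and marks the device active, while an ASSETS message with operation DELETE clears the alert and drops the device from the cache.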