Alerts on restored agent's health #1826

Closed
tnesztler opened this issue Oct 22, 2018 · 1 comment
tnesztler commented Oct 22, 2018

Description of Issue

The next section assumes you're subscribed to the alerts topics (see here).
For an agent using the health subsystem (such as all historian agents based on the BaseHistorian class), if the health goes from GOOD to BAD (for example due to this case), a message is published as expected.
However, whenever the agent starts publishing again, the health is restored to GOOD but no message is sent stating so.

One use case is an agent publishing the status of selected agents to a Slack channel. It is impossible to know if the agent's status is still BAD or was restored.
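
To illustrate the consumer side, here is a minimal sketch of a status board that only learns about health through alert messages. The `StatusBoard` class, its `on_alert` callback, and the agent identity used below are hypothetical and not part of the VOLTTRON API; the sketch only shows that, without a GOOD message, the last known state stays BAD.

```python
class StatusBoard:
    """Tracks the last health status reported per agent (hypothetical helper)."""

    def __init__(self):
        self.last_status = {}

    def on_alert(self, agent_id, status):
        # Called for every alert message received; stores the newest status.
        self.last_status[agent_id] = status

    def status_of(self, agent_id):
        # Agents that never reported are treated as unknown.
        return self.last_status.get(agent_id, "UNKNOWN")


board = StatusBoard()
board.on_alert("platform.historian", "BAD")  # degraded health is reported

# The historian recovers here, but no GOOD alert arrives,
# so the board keeps showing the stale status.
stale = board.status_of("platform.historian")
```

With this setup, `stale` remains `"BAD"` until a restart or manual check, which is exactly the ambiguity described above.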

Affected Version

5.1 and up

Screenshots

[Screenshot: 2018-10-22, 4:07:45 PM]

In this screenshot, data is published every 5 minutes.

Expected

A message is published to the alerts base topic or its "subtopics" on every health change of an agent that uses the health subsystem.
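
As a rough sketch of this expected behavior, the snippet below publishes on every transition rather than only on degradation. `HealthTracker` and the `publish` callback are illustrative assumptions, not the actual health subsystem code.

```python
class HealthTracker:
    """Publishes a message whenever the status changes (hypothetical sketch)."""

    def __init__(self, publish, initial="GOOD"):
        self._publish = publish  # assumed callback taking one message dict
        self._status = initial

    def set_status(self, status, context=""):
        # Ignore repeats so subscribers only see real transitions.
        if status == self._status:
            return False
        previous, self._status = self._status, status
        self._publish({"previous": previous, "status": status, "context": context})
        return True


messages = []
tracker = HealthTracker(messages.append)
tracker.set_status("BAD", "cannot write to database")
tracker.set_status("BAD", "still failing")       # no change, nothing published
tracker.set_status("GOOD", "backlog processed")  # recovery is published too
```

Here `messages` ends up with two entries, one for GOOD to BAD and one for BAD to GOOD, so a subscriber can always tell the current state.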

Actual

Only degraded health is reported to the alerts topic.

Steps to Reproduce

  1. Use an agent that is based on the BaseHistorian agent and subscribe to alerts.
  2. Prevent the historian from publishing data. Wait for its health to go from GOOD to BAD. A new message is published.
  3. Let the historian publish again. The health should go from BAD back to GOOD after the data backlog has been processed. No message is published once the health is restored.

schandrika self-assigned this Nov 21, 2018

schandrika (Contributor) commented

Fixed as part of #1846. If the status becomes BAD after the initial setup phase, say because of a connection failure when trying to write device data to the database, then once the connection is back up the database write will happen and the status of the agent will change to GOOD. If the connection failure happens at startup (agent init, startup), an alert is sent to the user, and the user needs to fix the issue and restart the agent.
