the concept of 'maintenance mode'? #111
Comments
This has been on the backlog for a while, waiting for someone with a real use case to request it. Suggestions on how it might work in practice are welcome. I'm looking at using a plugin to suppress the alert if it meets certain criteria -- but what criteria, and how they are determined (and updated), is open to debate.
I was thinking about this. The first thing that came to mind is a list of regular expressions: if an alert matches, note that it's in maintenance mode instead of doing the normal behavior. alerta-api gets CRUD for maintenance-mode expressions, and the Python client and web UI get enhancements to deal with them.
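The regex-list idea above could be sketched roughly like this. All names here are illustrative, not Alerta's actual API; assume each rule is matched against an alert's resource:

```python
import re

# Hypothetical maintenance rules: a list of compiled regular expressions.
# An alert whose resource matches any rule is treated as in maintenance.
maintenance_rules = [
    re.compile(r"hostname\d+\.company\.com"),
    re.compile(r"db-.*"),
]

def in_maintenance(resource):
    """Return True if any maintenance regex fully matches the resource."""
    return any(rule.fullmatch(resource) for rule in maintenance_rules)
```

Note that every incoming alert has to be checked against every rule in turn, which is the performance concern raised in the reply below.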
I'm reluctant to support a list of rules based on regular expressions because it then becomes a case of iterating through the list of rules, running each regex and seeing if any match. This could potentially be very bad for performance. Alternatively, you use an event stream processor, but that is on another level of complexity. The simple alternative, as far as I can tell, is to use equality match rules that work as a simple query. Alerta defines many different alert attributes that can be used to group alerts, and it is these attributes that I have used to define the rules. I have only done the work on the API at present. Feedback on this would be welcome before I start work on adding support to the web UI and CLI tool.
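The equality-match approach described above can be sketched as a plain attribute lookup rather than a regex scan. This is an illustrative sketch, not Alerta's actual schema -- a rule is a mapping of alert attribute to required value, and an alert is suppressed when every attribute in some rule matches:

```python
# Each rule lists the alert attributes that must match exactly.
# In a real database-backed implementation this becomes a simple
# indexed query instead of iterating regexes.
blackout_rules = [
    {"environment": "Production", "service": "web"},
    {"resource": "host55"},
]

def is_blacked_out(alert):
    """Return True if the alert satisfies every attribute of any rule."""
    return any(
        all(alert.get(attr) == value for attr, value in rule.items())
        for rule in blackout_rules
    )
```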
That makes sense. I think if my users could say something like, "mute all alerts where resource=hostname", that would work. Or "service=some-service".
The reason I initially thought of regexes is that the users might not be certain what the 'resource' was going to look like, i.e. did they correctly use the FQDN or the short name? Or to make it easier to do something like mute hostname[1-30].company.com.
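As an aside, a shell-style range like hostname[1-30].company.com does not translate directly into a regex: in regex syntax `[1-30]` is a character class, not the numbers 1 through 30. A pattern that actually matches hosts 1-30 needs alternation, which illustrates how awkward the regex approach gets (hostname and domain here are just the example names from the comment above):

```python
import re

# [1-9] covers 1-9, [12][0-9] covers 10-29, and 30 is matched literally.
pattern = re.compile(r"hostname([1-9]|[12][0-9]|30)\.company\.com")
```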
I completely understand the requirement for something like that. Using tags is much more flexible as well -- what if you add hostname31.company.com? Then you'd need to update your regex.
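The tag-based idea amounts to a simple membership test rather than pattern matching -- a minimal sketch, assuming alerts carry a list of tags and a blackout is defined by a tag name (both names here are made up for illustration):

```python
def is_suppressed(alert_tags, blackout_tags=frozenset({"maintenance"})):
    # Suppress when the alert carries any blackout tag. Adding a new
    # host means tagging it, not rewriting a regex.
    return bool(set(alert_tags) & blackout_tags)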
Right now, I'm roughly achieving this by manually editing the pagerduty.py script which sends pages out. If I know something is going into maintenance and I don't want it to page out, I just pop a line into the script to catch whatever it is (a host, a service, an environment, etc.) and not continue. Obviously this is a horrible way of accomplishing it, and it also requires a restart of the alerta server, but it works until we get something real into Alerta.

I think tags are a good way of looking at this for putting certain services into maintenance. But what about a specific host that runs multiple services, when those services are also part of a cluster with other nodes? I don't want to put the whole cluster into maintenance, just the service with that name on a specific node, or possibly anything coming from that specific node.
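The manual hack described above boils down to an early-return guard at the top of the notification path. A hypothetical sketch (this is not the real pagerduty.py, and the attribute names are just the ones discussed in this thread):

```python
# Hard-coded suppressions, edited by hand before maintenance windows --
# exactly the workflow being replaced by proper blackout rules.
SUPPRESSED = {
    ("resource", "db01"),        # a specific host
    ("service", "Checkout"),     # a whole service
    ("environment", "Staging"),  # a whole environment
}

def should_page(alert):
    """Return False if the alert matches any hard-coded suppression."""
    for attr, value in SUPPRESSED:
        if alert.get(attr) == value:
            return False  # in maintenance: swallow the page
    return True
```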
Actually, it looks like your PR #112 pretty much knocks out everything I'd personally need. 👍 |
@bcwilsondotcom the only combination you mention that is not currently supported by #112 is putting into maintenance only certain services from a specific node. Something like that could be added if there is no other way to match that category of alerts with the currently proposed rules.
This is now available for use as version 4.5.0 (both server and client versions). The web UI has also been updated -- the "blackouts" page is a menu option under "Configuration". If you have any problems with it or would like changes/enhancements please raise a new issue. |
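For reference, a blackout can be created directly against the REST API. This is a sketch only: the `/blackout` route matches the feature shipped in 4.5.0, but the server URL, API key, field values, and duration here are placeholders -- check your own server's documentation:

```python
import json
from urllib import request

# Scope the blackout by alert attributes; environment plus any of
# service/resource/event/group/tags narrows what gets suppressed.
payload = {
    "environment": "Production",
    "service": ["web"],
    "duration": 3600,  # seconds
}

req = request.Request(
    "http://localhost:8080/blackout",          # placeholder server URL
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Key demo-key",       # placeholder API key
    },
)
# request.urlopen(req)  # not executed here; requires a running server
```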
Does Alerta have a mechanism for the concept of 'maintenance mode'?
i.e. I'm going to do work on a server, which would cause alerts, so I want to temporarily mute alerts generated from it.