the concept of 'maintenance mode'? #111
Comments
This has been on the backlog for a while, waiting for someone with a real use case to request it. Suggestions on how it might work in practice are welcome. I'm looking at using a plugin to suppress the alert if it meets certain criteria -- but what criteria, and how they are determined (and updated), is open to debate.
I was thinking about this. The first thing that came to mind is a list of regular expressions: if an alert matches, note that it's in maintenance mode instead of doing the normal behavior. alerta-api gets CRUD for maintenance-mode expressions, and the Python client and web UI get enhancements to deal with them.
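The regex-list idea above could be sketched roughly like this. All names here are illustrative, not Alerta's actual API; assume each rule is matched against an alert's resource:

```python
import re

# Hypothetical maintenance rules: a list of compiled regular expressions.
# An alert whose resource matches any rule is treated as in maintenance.
maintenance_rules = [
    re.compile(r"hostname\d+\.company\.com"),
    re.compile(r"db-.*"),
]

def in_maintenance(resource):
    """Return True if any maintenance regex fully matches the resource."""
    return any(rule.fullmatch(resource) for rule in maintenance_rules)
```

Note that every incoming alert has to be checked against every rule in turn, which is the performance concern raised in the reply below.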
I'm reluctant to support a list of rules based on regular expressions because it then becomes a case of iterating through the list of rules, running each regex and seeing if any match. This could potentially be very bad for performance. Alternatively, you use an event stream processor, but that is on another level of complexity. The simple alternative, as far as I can tell, is to use equality match rules that work as a simple query. Alerta defines many different alert attributes that can be used to group alerts, and it is these attributes that I have used to define the rules. I have only done the work on the API at present. Feedback on this would be welcome before I start work on adding support to the web UI and CLI tool.
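The equality-match approach described above can be sketched as a plain attribute lookup rather than a regex scan. This is an illustrative sketch, not Alerta's actual schema -- a rule is a mapping of alert attribute to required value, and an alert is suppressed when every attribute in some rule matches:

```python
# Each rule lists the alert attributes that must match exactly.
# In a real database-backed implementation this becomes a simple
# indexed query instead of iterating regexes.
blackout_rules = [
    {"environment": "Production", "service": "web"},
    {"resource": "host55"},
]

def is_blacked_out(alert):
    """Return True if the alert satisfies every attribute of any rule."""
    return any(
        all(alert.get(attr) == value for attr, value in rule.items())
        for rule in blackout_rules
    )
```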
That makes sense. I think if my users could say something like, "mute all alerts where resource=hostname", that would work. Or "service=some-service".
The reason I initially thought of regexes is that the users might not be certain what the 'resource' was going to look like, i.e. did they correctly use the FQDN or the short name? Or to make it easier to do something like mute hostname[1-30].company.com.
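As an aside, a shell-style range like hostname[1-30].company.com does not translate directly into a regex: in regex syntax `[1-30]` is a character class, not the numbers 1 through 30. A pattern that actually matches hosts 1-30 needs alternation, which illustrates how awkward the regex approach gets (hostname and domain here are just the example names from the comment above):

```python
import re

# [1-9] covers 1-9, [12][0-9] covers 10-29, and 30 is matched literally.
pattern = re.compile(r"hostname([1-9]|[12][0-9]|30)\.company\.com")
```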
I completely understand the requirement for something like that. Using tags is much more flexible as well -- what if you add hostname31.company.com? Then you'd need to update your regex.
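The tag-based idea amounts to a simple membership test rather than pattern matching -- a minimal sketch, assuming alerts carry a list of tags and a blackout is defined by a tag name (both names here are made up for illustration):

```python
def is_suppressed(alert_tags, blackout_tags=frozenset({"maintenance"})):
    # Suppress when the alert carries any blackout tag. Adding a new
    # host means tagging it, not rewriting a regex.
    return bool(set(alert_tags) & blackout_tags)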
Right now, I'm roughly achieving this by manually editing the pagerduty.py script which sends pages out. If I know something is going into maintenance and I don't want it to page out, I just pop a line into the script to catch whatever it is (a host, a service, an environment, etc.) and not continue. Obviously this is a horrible way of accomplishing it, and it also requires a restart of the alerta server, but it works until we get something real into Alerta.

I think tags are a good way of looking at this for putting certain services into maintenance. But what about a specific host that runs multiple services, when those services are also part of a cluster with other nodes? I don't want to put the whole cluster into maintenance, just the service with that name on a specific node, or possibly anything coming from that specific node.
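The manual hack described above boils down to an early-return guard at the top of the notification path. A hypothetical sketch (this is not the real pagerduty.py, and the attribute names are just the ones discussed in this thread):

```python
# Hard-coded suppressions, edited by hand before maintenance windows --
# exactly the workflow being replaced by proper blackout rules.
SUPPRESSED = {
    ("resource", "db01"),        # a specific host
    ("service", "Checkout"),     # a whole service
    ("environment", "Staging"),  # a whole environment
}

def should_page(alert):
    """Return False if the alert matches any hard-coded suppression."""
    for attr, value in SUPPRESSED:
        if alert.get(attr) == value:
            return False  # in maintenance: swallow the page
    return True
```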
Actually, it looks like your PR #112 pretty much knocks out everything I'd personally need. 👍 |
@bcwilsondotcom the only combination you mention that is not currently supported by #112 is putting into maintenance only certain services from a specific node. Something like that could be added if there is no other way to match that category of alerts with the currently proposed rules.
This is now available for use as version 4.5.0 (both server and client versions). The web UI has also been updated -- the "blackouts" page is a menu option under "Configuration". If you have any problems with it or would like changes/enhancements please raise a new issue. |
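For reference, a blackout can be created directly against the REST API. This is a sketch only: the `/blackout` route matches the feature shipped in 4.5.0, but the server URL, API key, field values, and duration here are placeholders -- check your own server's documentation:

```python
import json
from urllib import request

# Scope the blackout by alert attributes; environment plus any of
# service/resource/event/group/tags narrows what gets suppressed.
payload = {
    "environment": "Production",
    "service": ["web"],
    "duration": 3600,  # seconds
}

req = request.Request(
    "http://localhost:8080/blackout",          # placeholder server URL
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Key demo-key",       # placeholder API key
    },
)
# request.urlopen(req)  # not executed here; requires a running server
```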
Does Alerta have a mechanism for the concept of 'maintenance mode'?
i.e. I'm going to do work on a server, which would cause alerts, so I want to temporarily mute alerts generated from it.