Emergency contact (for senior folk) #1358

Closed
PatReynolds opened this issue Dec 1, 2017 · 4 comments

Comments

@PatReynolds

PatReynolds commented Dec 1, 2017

This is a way for senior people to alert those who can act on it that there is a major incident (e.g. a website is down).
Create a Slack list and a protocol for its use
Create an emergency contact list (emails) and a protocol for its use
Possibly involve Google Analytics to send an automatic alert to Slack
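
As a rough illustration of the automatic alert to Slack, here is a minimal sketch that assumes a Slack incoming webhook is used. The webhook URL, the function names, and the message wording are all hypothetical placeholders, and whatever detects the incident (Google Analytics or anything else) is left out entirely.

```python
import json
import urllib.request

# Hypothetical placeholder: a real incoming-webhook URL would be issued
# by the Slack workspace and should be kept out of source control.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def build_alert(site: str, detail: str) -> bytes:
    """Return the JSON body for a Slack incoming-webhook message."""
    text = f"MAJOR INCIDENT: {site} - {detail}"
    return json.dumps({"text": text}).encode("utf-8")


def send_alert(site: str, detail: str, url: str = WEBHOOK_URL) -> int:
    """POST the alert to Slack and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_alert(site, detail),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_alert("main website", "website is down")` would post one message to whichever channel the webhook is tied to; the protocol for who reacts to it still has to be agreed separately.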

@PatReynolds PatReynolds added this to the Non Milestone/Phase Specific milestone Dec 1, 2017
@Captainkirkdawson
Member

Deeply puzzled by this story. There is already an automatic system that monitors all of the servers and apps in quite a bit of detail. Reports of failures are sent to the Slack alerts channel. They are also sent to Lemon and, I believe, to Barry; Lemon offered to have them come to me but I declined. Lemon will normally take action to resolve the issue. If it is significant, he posts to the tech list.

The last reported alert was November 23
Nagios APP [7:34 AM]
[PROBLEM] brazza/root disk WARNING
Nagios APP [7:59 AM]
[PROBLEM] brazza/root disk CRITICAL
[7:59]
[PROBLEM] brazza/mongodb disk WARNING
[8:04]
[RECOVERY] brazza/root disk OK
[8:04]
[RECOVERY] brazza/mongodb disk OK
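
For anyone who wants to act on these alert lines programmatically, the following is a small sketch of a parser. It assumes, from the sample above only, that each alert reads `[KIND] host/service STATE`; the `Alert` type and `parse_alert` function are hypothetical names, not part of Nagios or Slack.

```python
import re
from typing import NamedTuple, Optional


class Alert(NamedTuple):
    kind: str     # "PROBLEM" or "RECOVERY"
    host: str     # e.g. "brazza"
    service: str  # e.g. "root disk"
    state: str    # "OK", "WARNING", "CRITICAL" or "UNKNOWN"


# Assumed layout, inferred from the pasted alerts: [KIND] host/service STATE
ALERT_RE = re.compile(
    r"^\[(PROBLEM|RECOVERY)\]\s+([^/\s]+)/(.+)\s+(OK|WARNING|CRITICAL|UNKNOWN)$"
)


def parse_alert(line: str) -> Optional[Alert]:
    """Parse one alert line; return None for timestamps and other lines."""
    match = ALERT_RE.match(line.strip())
    return Alert(*match.groups()) if match else None
```

Lines such as `Nagios APP [7:34 AM]` or `[8:04]` do not match the pattern and come back as `None`, so a whole pasted log can be fed through line by line.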

@PatReynolds
Author

This is something different. The automatic system is needed and works really well (if the absence of downtime on our websites is anything to go by). What we need is a way to communicate with the people who can put in place the softer side of solutions should anything go wrong: for example, putting a notice on the login page so that, if one server is down, volunteers understand why they can sometimes log in and sometimes can't, or putting a message on social media if a website is down.

@Captainkirkdawson
Member

That is a very different story. It is one of the roles that I used to play.

@Captainkirkdawson
Member

I continue to find no specific system requirement.

@ghost ghost removed the ready label Aug 12, 2018