Automated issue-frequency HipChat notifier
latteier Raise threshold for paging
Summary:
Recently a false-positive beep boop alert triggered PagerDuty.

I reviewed beep boop alerts for this year:

  date, probability, num tickets, false or true alert
- feb 15, 1.0000, 41, true
- feb 15, 0.9999, 6, false
- feb 11, 0.9999, 6, false
- feb 06, 0.9991, 9, false (no page)
- feb 05, 0.9991, 9, true (no page)
- feb 05, 0.9991, 9, true (no page)
- feb 05, 0.9992, 9, true (no page)
- feb 03, 1.0000, 7, true
- feb 02, 0.9993, 5, true (no page)
- feb 02, 0.9992, 9, true (no page)
- jan 23, 1.0000, 12, false?
- jan 22, 0.9999, 6, true
- jan 22, 0.9997, 9, true
- jan 16, 0.9997, 9, true?
- jan 10, 1.0000, 13, true
- jan 07, 0.9995, 5, false?

Based on these numbers I don't feel comfortable changing the probability
cutoffs (0.9999 for an alert, 0.9995 for a page).

But I do see an opportunity to raise the required number of tickets
for paging. Currently all alerts must have at least 5 tickets. This diff
raises the threshold to 7 tickets for paging. The effect on this
year's pages would be to remove 3 false pages, and exclude one true
one (which would still create an alert, simply not a page).

Test Plan: - Fingers crossed.

Reviewers: jacqueline, csilvers

Reviewed By: jacqueline, csilvers

Subscribers: csilvers

Differential Revision: https://phabricator.khanacademy.org/D42455
Latest commit 3e73c77 Feb 16, 2018

README.md

beep-boop

This script monitors our Zendesk and Jira accounts and notifies us in a Slack room when the bug report rates are far enough above the mean rate to have a very high probability of being due to an abnormal event (think a newly introduced bug).

"Far enough above" is given by a Poisson distribution with a certain probability threshold -- if the probability of seeing at least this number of bug reports due to random chance is low enough, we send a notification, because it's likely that this elevated rate is due to a bug.

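As a rough illustration of the Poisson check described above, here is a minimal Python sketch. It is not the script's actual code: the names `poisson_cdf` and `is_abnormal` are hypothetical, the default `threshold` and `min_count` are illustrative values only, and the sketch assumes the mean report rate for the time window is already known.

```python
import math


def poisson_cdf(count, mean):
    """Return P(X <= count) for X ~ Poisson(mean), summed term by term."""
    if count < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for k in range(1, count + 1):
        term *= mean / k  # P(X = k) derived from P(X = k - 1)
        total += term
    return min(total, 1.0)  # guard against floating-point overshoot


def is_abnormal(count, mean, threshold=0.9999, min_count=5):
    """Hypothetical check: is `count` reports too many to be random chance?

    P(X < count) >= threshold is the same as saying the chance of seeing
    at least `count` reports is at most 1 - threshold.
    The defaults are illustrative assumptions, not the script's settings.
    """
    if count < min_count:
        return False  # too few reports to act on, however unlikely
    return poisson_cdf(count - 1, mean) >= threshold
```

With a mean of 5 reports per window, `is_abnormal(41, 5.0)` is true while `is_abnormal(6, 5.0)` is not. The minimum count matters when the mean is very small: `is_abnormal(4, 0.1)` is false even though four reports would be extremely unlikely by chance.
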
This uses alertlib (a sub-repo) to talk to Slack. alertlib requires being able to import a file called secrets.py with the contents:

slack_alertlib_webhook_url = "<slack url value>"

The script runs as a cron job on toby; see the aws-config repo for the crontab.