Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address the sensitive error rate check #476

Merged
merged 2 commits into from Oct 20, 2017
Merged

Conversation

sjparkinson
Copy link
Contributor

Looking at preflight with these values during a 10 minute period when the alert was going off.

A baseline period of 7 days, a sample period of 10 minutes.

Over 7 days, the baseline error rate for next-preflight was 0.00001.

Over the last 10 minutes, the sample error rate was 0.00000571.

For graphiteSpike we take these values and compute the following...

const ok = sample / baseline < threshold;

https://github.com/Financial-Times/n-health/blob/master/src/checks/graphiteSpike.check.js#L85

With a threshold of 0.04, we find that sample / baseline is 0.571 and so we're not OK.

Using the default threshold of 3 in this case I believe will catch major spikes in errors introduced by a release without being too sensitive.

I think the confusion was in thinking that threshold is a percentage value.

A dashboard for next-preflight I used to work out what Graphite was returning, http://grafana.ft.com/dashboard/db/next-n-express-error-rate.

 🐿 v2.5.16
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.07%) to 88.401% when pulling b1c5dfb on error-rate-patch into 79b1b63 on master.

Copy link
Contributor

@geek-caroline geek-caroline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you understand the maths properly :)

I'll probably pop by on Monday for a full explanation but I think your comments are pretty clear. Thank you for looking into this properly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants