Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NOPS-1034 added handling of summarize(sumSeries) healthchecks #164

Merged
merged 1 commit into from
Oct 26, 2021

Conversation

AniaMakes
Copy link
Contributor

@AniaMakes AniaMakes commented Oct 25, 2021

Following #158 we've had a bunch of results come up on the dashboard. One thing that we noticed is that all of the problematic health checks included sumSeries graphite functionality.

There are three various functions that use that:
summarize (which adds the series), asPercent (which adds all of series 1, then all of series 2, then displays them as a percentage of one of the other), and divideSeries (presumably adds all of series 1, then all of series 2 and divides one by the other).

The graphite render API does not, in fact, return the result of the query / equation if there is more than one datapoint in the datapoints array - and we need to add handling for each of the cases in n-health. In fact even in Grafana the "single integer view" is generated from the points underneath.

This PR adds handling for summarize(sumSeries).

An example response (from next-retention), https://graphitev2-api.ft.com/render/?format=json&from=-12hour&target=summarize(sumSeries(next.heroku.retention.*.prod.manageSubscriptionConfirmationHandler.transition-applyTransition.success),%20%221hour%22,%20%22sum%22,%20true)

[{
"datapoints": [[null, 1635125460], [null, 1635129060], [1.0, 1635132660], [1.0, 1635136260], [null, 1635139860], [3.0, 1635143460], [1.0, 1635147060], [1.0, 1635150660], [3.0, 1635154260], [3.0, 1635157860], [null, 1635161460], [2.0, 1635165060]], 
"target": "summarize(sumSeries(next.heroku.retention.*.prod.manageSubscriptionConfirmationHandler.transition-applyTransition.success), \"1hour\", \"sum\", true)", 
"tags": {"aggregatedBy": "sum", "summarize": "1hour", "name": "sumSeries(next.heroku.retention.*.prod.manageSubscriptionConfirmationHandler.transition-applyTransition.success)", "summarizeFunction": "sum"}}]

In the original code we have result.datapoints.some(value => { if (value[0] === null) {... - what happened in the example above is that for each tuple array in the datapoints array the code was checking for value[0] === null, and tripping over each null for no reason - because in the case of sumSeries, the individual nulls don't matter - it is the sums of value[0] that does.

An alternative solution which would fix next-retention but is likely to trip people up again int he future would be to replace
https://graphitev2-api.ft.com/render/?format=json&from=-12hour&target=summarize(sumSeries(next.heroku.retention.*.prod.manageSubscriptionConfirmationHandler.transition-applyTransition.success),%20%221hour%22,%20%22sum%22,%20true)
with
https://graphitev2-api.ft.com/render/?format=json&from=-12hour&target=summarize(sumSeries(next.heroku.retention.*.prod.manageSubscriptionConfirmationHandler.transition-applyTransition.success),%20%2212hour%22,%20%22sum%22,%20true)
(from 1hour to 12hour). Then we'd only get a single datapoint.

@AniaMakes AniaMakes requested a review from a team as a code owner October 25, 2021 15:34
Copy link
Contributor

@serena97 serena97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some minor suggestions but it looks really good to me! It makes sense to handle the summarise here

test/graphiteThreshold.check.spec.js Outdated Show resolved Hide resolved
threshold: 1,
direction: 'below'
}));
check.start();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: to make the test more targeted to what we're testing, we can call the tick() function specifically instead of check.start() and assert the result of tick()
eg.

const result = await tick();

expect(result.isFailing).to.be.false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very neat idea, but a bit out of scope - if I change it here, then I'd need to change it throughout this file - and possibly the repo. Maybe something to add to Platforms' Gardening Day?

@AniaMakes AniaMakes merged commit d177efb into main Oct 26, 2021
@AniaMakes AniaMakes deleted the sum-series-zero branch October 26, 2021 10:45
JSRedondo pushed a commit that referenced this pull request Jun 20, 2022
NOPS-1034 added handling of summarize(sumSeries) healthchecks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants