[IA-1326] adding metric for async cluster creation failure #1081

Merged
jdcanas merged 4 commits into develop from jc-failure-notification on Oct 15, 2019

@jdcanas jdcanas commented Oct 1, 2019

Will add alert after this is in.

Have you read CONTRIBUTING.md lately? If not, do that first.

I, the developer opening this PR, do solemnly pinky swear that:

  • I've documented my API changes in Swagger

In all cases:

  • Get a thumbsworth of review and PO signoff if necessary
  • Verify all tests go green
  • Squash and merge; you can delete your branch after this
  • Test this change deployed correctly and works on dev environment after deployment

@@ -210,6 +210,8 @@ class ClusterMonitorActor(val cluster: Cluster,
    for {
      // update the cluster status to Error
      _ <- dbRef.inTransaction { _.clusterQuery.updateClusterStatus(cluster.id, ClusterStatus.Error) }
      _ <- Metrics.newRelic.incrementCounterFuture("asyncClusterCreationFailure")
Collaborator

It would be nice to have this count broken down by error case (some general cases: initialization action failed, Google error, internal server error, etc.). I ran select * from CLUSTER_ERROR limit 100 to gather some ideas about what those categories could be.

Collaborator Author

While I agree this would be useful and could easily be accomplished by including the error code in the metric name, it would make alerting on it a challenge, as New Relic does not support wildcards in this fashion: https://discuss.newrelic.com/t/feature-idea-custom-metric-wildcards/23751

Collaborator

Can we have a few major categories like the ones I pointed out (@rtitle might have a better idea of which categories we should monitor)? There shouldn't be many, and we could create an alert for each of them.
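A minimal sketch of what a handful of major categories might look like, assuming failures are bucketed by naive substring matching on the error message. The category names, the matching rules, and the `/` separator in the metric name are all hypothetical and not taken from this PR; real categories would come from inspecting CLUSTER_ERROR.

```scala
// Hypothetical failure categories, for illustration only.
sealed trait FailureCategory { def label: String }

object FailureCategory {
  case object InitActionFailed extends FailureCategory { val label = "initActionFailed" }
  case object GoogleError      extends FailureCategory { val label = "googleError" }
  case object InternalError    extends FailureCategory { val label = "internalError" }
  case object Other            extends FailureCategory { val label = "other" }

  // Naive, case-insensitive substring matching on the error message.
  // The rules are assumptions; anything unmatched falls into Other.
  def fromMessage(message: String): FailureCategory = {
    val m = message.toLowerCase
    if (m.contains("initialization action")) InitActionFailed
    else if (m.contains("internal")) InternalError
    else if (m.contains("google")) GoogleError
    else Other
  }

  // One fixed metric name per category, so each can be alerted on
  // individually without needing wildcard support.
  def metricName(category: FailureCategory): String =
    s"asyncClusterCreationFailure/${category.label}"
}
```

With a small, closed set of categories, each resulting name could get its own alert condition, and the `Other` bucket would show how much is still uncategorized.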

Collaborator Author

So I added the error code to the metric name. There are only 17 distinct codes. We can alert on whichever ones we find relevant and change that at will.
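As a rough illustration of the idea (not the code actually merged), the metric name could be derived from the error code as sketched below. This assumes the error code is an optional `Int`; the `/` separator, the `unknown` fallback, and the example codes in `alertedCodes` are hypothetical.

```scala
object ClusterFailureMetrics {
  // Base name taken from the diff above.
  val baseName = "asyncClusterCreationFailure"

  // Append the error code to the base name; fall back to "unknown"
  // when no code is available (an assumption, not from the PR).
  def metricName(errorCode: Option[Int]): String =
    errorCode.fold(s"$baseName/unknown")(code => s"$baseName/$code")

  // Hypothetical subset of codes to alert on. Because the set of
  // distinct codes is small, this list can be edited at will.
  val alertedCodes: Set[Int] = Set(3, 13)

  // The exact metric names an alert would need to reference,
  // since wildcard matching is not available.
  def alertedMetricNames: List[String] =
    alertedCodes.toList.sorted.map(code => metricName(Some(code)))
}
```

The resulting string would then be passed to the counter call in place of the fixed "asyncClusterCreationFailure" name.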

@jdcanas jdcanas merged commit c0009cb into develop Oct 15, 2019
@jdcanas jdcanas deleted the jc-failure-notification branch October 15, 2019 15:52