[IA-1326] adding metric for async cluster creation failure #1081
@@ -210,6 +210,8 @@ class ClusterMonitorActor(val cluster: Cluster,
for {
  // update the cluster status to Error
  _ <- dbRef.inTransaction { _.clusterQuery.updateClusterStatus(cluster.id, ClusterStatus.Error) }
  _ <- Metrics.newRelic.incrementCounterFuture("asyncClusterCreationFailure")
It would be nice to have this count broken down into different error cases. Some general cases might be "Initialization action failed", "Google error", "Internal server error", etc. (I ran select * from CLUSTER_ERROR limit 100 to gather some ideas about what those categories could be.)
I agree this would be useful, and it could easily be done by including the error code in the metric name. However, it would make alerting on it a challenge, because New Relic does not support wildcards in custom metric names in this fashion: https://discuss.newrelic.com/t/feature-idea-custom-metric-wildcards/23751
Can we have a few major categories like the ones I pointed out? (@rtitle might have a better idea of which categories we should monitor.) There shouldn't be many of them, and we can create an alert for each.
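For illustration, a minimal sketch of what "a few major categories" could look like. The category names come from the examples mentioned in this thread; the `FailureCategory` type, the message-matching rules, and the `/` separator in the metric name are all assumptions made for this sketch, not code from the PR.

```scala
// Hypothetical categories, based on the examples named in the review comments.
sealed trait FailureCategory { def metricSuffix: String }
case object InitActionFailed extends FailureCategory { val metricSuffix = "initActionFailed" }
case object GoogleError extends FailureCategory { val metricSuffix = "googleError" }
case object InternalServerError extends FailureCategory { val metricSuffix = "internalServerError" }
case object Other extends FailureCategory { val metricSuffix = "other" }

object FailureCategory {
  // Assumption: the stored error message text is enough to pick a category.
  def fromMessage(message: String): FailureCategory = {
    val m = message.toLowerCase
    if (m.contains("initialization action")) InitActionFailed
    else if (m.contains("internal server error")) InternalServerError
    else if (m.contains("google")) GoogleError
    else Other
  }

  // One fixed metric name per category, so each can have its own alert
  // without needing wildcard support.
  def metricName(message: String): String =
    s"asyncClusterCreationFailure/${fromMessage(message).metricSuffix}"
}
```

Because the set of categories is closed, the set of metric names is small and known up front, which is what makes per-category alerts practical.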
So I added the error code to the metric name. There are only 17 distinct codes. We can choose which ones to alert on and change that at will.
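A rough sketch of the idea, not the actual change in this PR: the `ClusterFailureMetric` helper, the `Option[Int]` type for the error code, and the `/` separator are assumptions made for illustration.

```scala
object ClusterFailureMetric {
  val Base = "asyncClusterCreationFailure"

  // Hypothetical helper: appends the error code to the base metric name.
  // Assumption: the code is an integer and may be absent.
  def name(errorCode: Option[Int]): String =
    errorCode.fold(s"$Base/unknown")(code => s"$Base/$code")
}
```

The resulting string would then be passed where the diff currently passes the literal "asyncClusterCreationFailure" to `incrementCounterFuture`. With a small, fixed set of codes, each resulting name can be targeted by its own alert.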
0de737e to 9b83db9
Will add alert after this is in.
Have you read CONTRIBUTING.md lately? If not, do that first.
I, the developer opening this PR, do solemnly pinky swear that:
In all cases: