Gauge type metric stops populating in AWS CloudWatch after some period of time #713
Comments
Hi @peterlitvak, that sounds frustrating! Would you be able to post your full configuration and more details of your setup? Some questions that would be useful to know the answer to:
Thank you for the quick response, here are the details in order:
I also added
One more piece of information: when the issue occurs, if I restart the statsd service on the host that publishes the actual values, CloudWatch starts displaying data again.
Thanks for the info, especially the config and plugin link. So in theory, to duplicate your setup I could set up an app which updates a gauge sporadically and pumps the metrics into statsd and then into CloudWatch using that plugin? How sporadic is the issue: once every few days, or once every few hours?
Right, so it sounds like it's probably not related to your application but to something that happens after you've published the metrics. I'm guessing you're using UDP and not TCP to send into statsd? If we know the rough interval, I can set up a local demo, leave it running overnight or so, and see if I can re-create the issue. I run a bunch of statsd systems with heavy reliance on gauges, but none of them go into CloudWatch, so my gut tells me it's probably something to do with the plugin, though I have no evidence of that. I see you've opened an issue on the plugin as well. Linking here for future reference: dylanmei/statsd-cloudwatch-backend#5
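For anyone who wants to try a local reproduction, here is a minimal sketch of a sender that pushes a gauge into statsd over UDP at a fixed interval. The metric name `demo.gauge`, the host, the interval and the values are placeholders I made up, not anything from the reporter's setup. Port 8125 is the statsd default, and `name:value|g` is the statsd gauge line format.

```python
import socket
import time
from typing import Tuple


def format_gauge(name: str, value: float) -> bytes:
    """Build a statsd gauge datagram of the form <name>:<value>|g."""
    return f"{name}:{value}|g".encode("ascii")


def send_gauge(sock: socket.socket, addr: Tuple[str, int], name: str, value: float) -> int:
    """Send one gauge sample. UDP is fire-and-forget, so delivery is not confirmed."""
    return sock.sendto(format_gauge(name, value), addr)


def run(host: str = "127.0.0.1", port: int = 8125,
        interval_s: float = 10.0, samples: int = 3) -> None:
    """Emit a few samples of a hypothetical gauge, one every interval_s seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(samples):
            # Placeholder metric name and values, for illustration only.
            send_gauge(sock, (host, port), "demo.gauge", 40 + i)
            time.sleep(interval_s)
```

Calling `run()` against a local statsd instance and leaving it going with a larger `samples` value would approximate the "leave it running overnight" test. It does not exercise the CloudWatch backend on its own.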
It was happening every 2-3 days. As of now, metrics have been published normally for about 2.5 days, so if the pattern holds it should stop within the next day or so.
Okay great, thanks for the info @peterlitvak. I'd be interested to see what the cloudwatch plugin maintainers say. If statsd itself had this problem, I'd expect to see a lot of issues very quickly, for example with our graphite backend. I'm personally running one with a graphite backend, and it has gone a few months without a restart and without this problem. Are you on the latest version of statsd? I could try to set up a proof, but it'll mean spinning up some infra on AWS. I'll give the cloudwatch-backend maintainers a bit of time to respond, since they've not made any updates since 2015. If they don't, then it might be a case of debugging their module. It totally could be statsd, but that would impact every backend, which doesn't seem to be the case at the moment.
We are at v0.8.6 of statsd. I understand it could be a number of things and greatly appreciate you looking into this. It is especially hard to troubleshoot since it is pretty sporadic. For example, everything has been working fine for 4 days in a row now.
No worries. I've had something running for the last 24h and haven't had the issue. I'll keep it running for a bit longer, but since I've not seen this with non-AWS backends, I'm inclined to say it's unlikely to be the statsd daemon right now. Any logs you do manage to get would be fantastic, but I understand that's tough given the scenario. I'm wondering if we could make a logging change to use log file rotation. It's not something we've needed in the past, but it could be the time for it if this issue persists.
Appreciate your attention to the issue. I've changed our staging code to report the same value for the gauge from all of the nodes; we will see if that positively affects the stability.
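Some background on why reporting the same value from every node might matter: a statsd gauge keeps only the most recent value received for a given name and emits that value on each flush. The sketch below is a toy model of that last-write-wins behaviour, not statsd's actual code. It assumes that samples from several nodes end up aggregated under one metric name at some stage, which has not been established for this setup. The class name `GaugeStore` and the metric name `queue.depth` are made up for illustration, and signed delta values (`+n` / `-n`) are deliberately not modelled.

```python
from typing import Dict


class GaugeStore:
    """Toy model of last-write-wins gauge aggregation (not statsd's real code)."""

    def __init__(self) -> None:
        self.gauges: Dict[str, float] = {}

    def ingest(self, line: str) -> None:
        """Parse a '<name>:<value>|g' line and overwrite the stored value."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type != "g":
            raise ValueError(f"not a gauge line: {line!r}")
        # Plain assignment: whichever sample arrives last replaces earlier ones.
        self.gauges[name] = float(value)

    def flush(self) -> Dict[str, float]:
        """Return a snapshot of the current values, as a flush would report them."""
        return dict(self.gauges)


store = GaugeStore()
store.ingest("queue.depth:17|g")  # hypothetical node publishing the real value
store.ingest("queue.depth:0|g")   # hypothetical node publishing 0 afterwards
snapshot = store.flush()          # the 0 has replaced the 17
```

Under that assumption, the value that reaches the backend depends on arrival order within a flush interval, so a real value can be masked by a later 0. Whether this is what happens here is still an open question.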
I have a gauge type metric that is sent from multiple nodes to AWS CloudWatch. Only one node sends the actual value and the rest send 0. After some time AWS CloudWatch stops updating the metric in a dashboard (0 is shown).
While troubleshooting the issue I can see in the logs that the correct metric values are reaching the statsd service while AWS CloudWatch is showing 0.
Looking for any clues as to what the issue could be here.
Statsd version: 0.8.6