Rescue all errors when sending/emitting from the handler #98
I decided to go with the blanket rescue after thinking about this more.
@olivielpeau: This is rolled out in our environment as Not sure what the protocol is here around releasing versions, but I added a changelog entry and left the version as 0.11.1. Happy to modify as needed.
Dang. After a quiet period following the rollout, we ended up getting more Net::ReadTimeout cron emails, so this didn't fix it. I'll keep digging.
It turns out this line was the culprit: https://github.com/DataDog/dogapi-rb/blob/13444cf/lib/dogapi/common.rb#L106
To prevent the cron emails, we simply redirected STDERR to a file.
Thanks for your investigation, @borgstrom. This does sound like an error the handler should handle, unless the workaround you found is good enough for your use case (though I wonder whether redirecting stderr to a file could also silence other, more important errors).
Hi @olivielpeau (👋), we did some restructuring of the way we execute Chef (no more dumping STDERR to a file), and this came back on our radar. After some quick poking around this morning I found that since Can you take another look at this and let me know if you have any feedback? I'll keep an eye on our Chef error logs and cron output for the next couple of days and report back if this isn't working (but things have stopped this morning, and it has been particularly noisy today).
Sorry this fell off my radar, @borgstrom. Your PR is still relevant; I've made a few minor adjustments and will merge it shortly so it can go out with the next release of this handler. Thanks for your contribution!
When running Chef (zero) via cron, we see sporadic cron emails that contain just:
Net::ReadTimeout
Also, we have seen one email that contains:
This error stems from the base Ruby API library (dogapi-rb): https://github.com/DataDog/dogapi-rb/blob/ca998bc9aea7a067dae3ea3fae58db6591315f11/lib/dogapi/common.rb#L54-L58
I would like to avoid a storm of cron emails if Datadog is actually down, but am unsure of the best way to handle this. One thought was to make the existing rescue block that I added for Net::ReadTimeout simply rescue ALL errors; otherwise, I believe the best option would be to have the base API raise an error class instead of a plain string, so that we can rescue it specifically here.
This makes the handler rescue all errors and log them as Chef errors.
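A minimal sketch of the blanket-rescue idea, assuming each send/emit call to the Datadog API is wrapped in a helper. ReportingHandler and safely_emit are hypothetical names for illustration, not the handler's or dogapi's actual API, and Ruby's stdlib Logger stands in for Chef::Log.

```ruby
require "logger"
require "net/http" # loads net/protocol, which defines Net::ReadTimeout

# Hypothetical stand-in for the handler's reporting step; the real handler
# calls into dogapi inside the block.
class ReportingHandler
  def initialize(logger)
    @logger = logger
  end

  # Runs the block that talks to the API. Any StandardError (including
  # Net::ReadTimeout and errors raised with a plain string) is logged
  # instead of escaping the handler and ending up in cron's output.
  # Returns true on success, false if an error was rescued.
  def safely_emit(what)
    yield
    true
  rescue StandardError => e
    @logger.error("Could not send/emit #{what} to Datadog: #{e.class}: #{e.message}")
    false
  end
end

# Usage: each failure is logged rather than raised.
handler = ReportingHandler.new(Logger.new($stdout))
handler.safely_emit("metrics") { raise Net::ReadTimeout }       # logged, returns false
handler.safely_emit("event") { raise "API returned an error" }  # RuntimeError, logged
handler.safely_emit("tags") { :sent }                           # returns true
```

Rescuing StandardError (what a bare rescue catches) covers Net::ReadTimeout, which descends from Timeout::Error and therefore RuntimeError, as well as errors raised with a plain string, which become RuntimeError. Interrupt and SystemExit sit outside StandardError, so Ctrl-C and explicit exits still propagate.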
Let me know what your thoughts are. Happy to modify as needed.
Thanks!