Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rescue all errors when sending/emitting from the handler #98

Merged
merged 10 commits into from Oct 22, 2019

Conversation

borgstrom
Copy link
Contributor

@borgstrom borgstrom commented Mar 21, 2017

When running Chef (zero) via Cron we see sporadic cron emails that contain just Net::ReadTimeout:

From: Cron Daemon <root@stage-disco-006386fd6518ce570>
To: xxx@yyy.zzz
Subject: Cron <root@stage-disco-006386fd6518ce570> /bin/sleep 569.0; /usr/local/bin/exclusive_cron.sh /tmp/cron-chef-client.lock /usr/bin/run-chef-client
Date: Tue, 21 Mar 2017 17:39:41 +0000 (UTC)

Net::ReadTimeout

Also, we have seen one email that contains:

Invalid JSON Response: <!DOCTYPE html>
<html lang="en">
<head>
        <meta charset="utf-8" />
        <title>Datadog is Down</title>
...

This error stems from the base ruby API library: https://github.com/DataDog/dogapi-rb/blob/ca998bc9aea7a067dae3ea3fae58db6591315f11/lib/dogapi/common.rb#L54-L58

I would like to avoid a storm of cron emails if Datadog is actually down, but am unsure of the best way to handle this. One thought was to just make the existing rescue block that I added Net::ReadTimeout to simply rescue from ALL errors, otherwise I believe the best case would be to have the base API raise an error class instead of a simple string so we can rescue from it here.

This makes the handler rescue all errors and log them as Chef errors.

Let me know what your thoughts are. Happy to modify as needed.

Thanks!

@borgstrom borgstrom changed the title Rescue Net::ReadTimeout in handler Rescue all errors when sending/emitting from the handler Mar 21, 2017
@borgstrom
Copy link
Contributor Author

I decided to go with the blanket rescue after thinking about this more.

@borgstrom
Copy link
Contributor Author

@olivielpeau: This is rolled out in our environment as 0.11.1.pre, I documented how to do it for future contributors -- for the time being DataDog/chef-datadog#413 isn't released so I had to pull a git snapshot with Berkshelf, but I assume the main cookbook will be released soon enough that these are valid as-is.

Not sure what protocol is here around releasing versions but I added a changelog entry and left the version as 0.11.1. Happy to modify as needed.

@borgstrom
Copy link
Contributor Author

Dang. After a quiet period following rolling this out we ended up more Net::ReadTimeout cron emails, so this didn't fix it. I'll keep digging.

@borgstrom borgstrom changed the title Rescue all errors when sending/emitting from the handler [WIP] Rescue all errors when sending/emitting from the handler Mar 24, 2017
@borgstrom
Copy link
Contributor Author

It turns out this line was the culprit: https://github.com/DataDog/dogapi-rb/blob/13444cf/lib/dogapi/common.rb#L106

To prevent the cron emails we simply redirected STDERR out to a file.

@borgstrom borgstrom closed this Mar 29, 2017
@borgstrom borgstrom deleted the rescue-more branch March 29, 2017 05:04
@olivielpeau
Copy link
Member

Thanks for your investigation @borgstrom, this error may sound like something the handler should handle, unless the workaround you found is good enough for your use case (I'm wondering if redirecting stderr out to a file could also silence other more important errors)

@borgstrom borgstrom reopened this Sep 5, 2017
@borgstrom
Copy link
Contributor Author

Hi @olivielpeau ( 👋 ),

We did some restructuring to the way we execute chef (no more dumping STDERR to a file) and this came back on our radar.

After some quick poking around this morning I found that since suppress_error_if_silent was using warn since it had been initialized with silent=true, the easiest thing was just to make sure we set silent=false when we prepare_the_pack.

Can you take another look at this and let me know if you have any feedback.

I'll keep my eye on our chef error logs and cron output for the next couple days and report back if this isn't working (but things have stopped this morning, and it has been particularly noisy today).

@borgstrom borgstrom changed the title [WIP] Rescue all errors when sending/emitting from the handler Rescue all errors when sending/emitting from the handler Sep 5, 2017
@olivielpeau olivielpeau added this to the 0.12.2 milestone Oct 22, 2019
@olivielpeau
Copy link
Member

Sorry this fell off my radar @borgstrom, your PR is still relevant, I've made a few minor adjustments and will merge it shortly so it can be released with the next release of this handler. Thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants