failure with Airbrake takes down entire app #23

Closed
skippy opened this Issue Oct 11, 2011 · 4 comments

Comments

Projects
None yet
4 participants
@skippy

skippy commented Oct 11, 2011

related to: #17

when the SSL misconfiguration error is thrown for that hoptoad, it bubbles up the exception. We use hoptoad in a lot of cases to notify us when something happened but we then swallow the error and continue on...and right now it is brining things to a sudden halt.

The SSL misconfiguration issue aside, this concerns me because it means the app is dependent upon a 3rd party system to function; this isn't fire and forget but fire and hope Airbrake doesn't fail or take 5 seconds to return. I would be very interested in working with you all to convert this over to a separate thread that doesn't get in the way of the application. If you are open to that let me know!

thanks

@airbrakeben

This comment has been minimized.

Show comment
Hide comment
@airbrakeben

airbrakeben Oct 14, 2011

Hi Skippy,

We are in the process of moving to http://www.airbrake.io and we have added a new SSL certificate. Can you update the sender test to https://airbrake.io on line 114 & 130.

Can you test this and let me know if your still seeing this issue?

Hi Skippy,

We are in the process of moving to http://www.airbrake.io and we have added a new SSL certificate. Can you update the sender test to https://airbrake.io on line 114 & 130.

Can you test this and let me know if your still seeing this issue?

@skippy

This comment has been minimized.

Show comment
Hide comment
@skippy

skippy Oct 19, 2011

still failing:

https://gist.github.com/1299824

this is a slightly unrelated item though; the way the airbrake code is currently structured, if anything happens it can take down the entire host site, or delay it by up to 5 seconds. That is the issue I'm hoping to raise in this ticket.

thanks!

skippy commented Oct 19, 2011

still failing:

https://gist.github.com/1299824

this is a slightly unrelated item though; the way the airbrake code is currently structured, if anything happens it can take down the entire host site, or delay it by up to 5 seconds. That is the issue I'm hoping to raise in this ticket.

thanks!

@skippy

This comment has been minimized.

Show comment
Hide comment
@skippy

skippy Oct 21, 2011

hey @airbrakeben,

so now that #17 is on its way to being resolved, what do you think about this one? I think there are three levels of fixes, and at least one of them should be done to increase robustness:

  • have a global rescue clause for the send method. Right now you only wrap the http call, and only for some exception. While this covers most cases it doesn't catch all (like #17) and a rescue for the entire method is an easy and safe way to make sure hoptoad doesn't bring down an app again
  • wrap the send call in a simple thread. This is the route we ended up doing, along with a method scoped rescue, because we didn't like the notion of hoptoad being able to hold up the main thread for upto 5 seconds. I know it has gotten a lot faster and that that is an edge case, but by putting it into its own thread it guarantees it won't be an issue. This is a simple and clean implementation but the catch is that we create and then destroy a thread on every Hoptoad call. We don't make that many calls a day (just a few dozen, and mostly for things that are in a 'warning' state), but it will get expensive if called in a tight loop.
  • go a more complicated thread-pooled approach. I haven't fully reviewed the call but it looks like there is no shared state, so it wouldn't be too difficult to do, but it will be more complicated and require thorough testing. This is the approach other API's like NewRelic take.

thoughts?

skippy commented Oct 21, 2011

hey @airbrakeben,

so now that #17 is on its way to being resolved, what do you think about this one? I think there are three levels of fixes, and at least one of them should be done to increase robustness:

  • have a global rescue clause for the send method. Right now you only wrap the http call, and only for some exception. While this covers most cases it doesn't catch all (like #17) and a rescue for the entire method is an easy and safe way to make sure hoptoad doesn't bring down an app again
  • wrap the send call in a simple thread. This is the route we ended up doing, along with a method scoped rescue, because we didn't like the notion of hoptoad being able to hold up the main thread for upto 5 seconds. I know it has gotten a lot faster and that that is an edge case, but by putting it into its own thread it guarantees it won't be an issue. This is a simple and clean implementation but the catch is that we create and then destroy a thread on every Hoptoad call. We don't make that many calls a day (just a few dozen, and mostly for things that are in a 'warning' state), but it will get expensive if called in a tight loop.
  • go a more complicated thread-pooled approach. I haven't fully reviewed the call but it looks like there is no shared state, so it wouldn't be too difficult to do, but it will be more complicated and require thorough testing. This is the approach other API's like NewRelic take.

thoughts?

indirect added a commit to indirect/airbrake that referenced this issue Nov 3, 2011

@dvdplm

This comment has been minimized.

Show comment
Hide comment
@dvdplm

dvdplm Dec 5, 2011

Contributor

I like the method-level catch-all rescue as a first&easy way to counter this.

Am curious about your threaded posting solution. Do you mind sharing a gist of your implementation? Would like to run some benchmarks and see what the mem/cpu impact would be for a few scenarios.

Thanks!

Contributor

dvdplm commented Dec 5, 2011

I like the method-level catch-all rescue as a first&easy way to counter this.

Am curious about your threaded posting solution. Do you mind sharing a gist of your implementation? Would like to run some benchmarks and see what the mem/cpu impact would be for a few scenarios.

Thanks!

@ghost ghost assigned dvdplm Dec 5, 2011

dvdplm added a commit that referenced this issue Dec 5, 2011

Moves http setup code to own method
Reorganizes unit tests in contexts
Airbrake#setup_http_connection rescues exceptions, logs them and re-raises
Airbrake#send_to_airbrake rescues exceptions, logs and swallows (returns nil). Fixes issue #23

@shime shime closed this Apr 26, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment