-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(daemon): turn off net/http behaviour of recovering from panics #350
base: master
Are you sure you want to change the base?
Conversation
defer func() { | ||
err := recover() | ||
if err != nil { | ||
buf := make([]byte, 1024*1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1MB is plenty huge. We could pass runtime.Stack
ever-increasing buffer sizes till it fits, like the code for debug.Stack
does, but that seems overkill. For reference, the large MAAS goroutine stack dump in this comment is about 100KB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use debug.Stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because debug.Stack
only includes the current goroutine, whereas runtime.Stack
takes an all
parameter which we're setting to true to get the stack trace from all goroutines.
By default when Go exits due to a panic it just prints the current goroutine's stack trace (ala debug.Stack
), but that's not particularly helpful for debugging deadlock issues like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall this is an improvement, as we will now see panics as they happen, but it may lead to some unexpected panics in production uses of pebble until we identify the cause of the deadlock that spawned this line of inquiry.
defer func() { | ||
err := recover() | ||
if err != nil { | ||
buf := make([]byte, 1024*1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use debug.Stack?
@hpidcock The thing is, those "unexpected panics in production uses" have a decent chance of deadlocking the entire Pebble process (if a lock is held with |
Given we now know panic-recovery isn't the cause of the issue at hand, I'm going to come back to this after reproducing and fixing #314, but I'll leave this PR open -- I think it's probably a good idea in any case. |
TODO ben: 1) add tests of this, 2) use something more like
debug.Stack
to avoid always allocating 1MB on panic when this occurs.By default, Go's
net/http
recovers panics in theServeHTTP
goroutine and keeps serving. However, if a lock is held when the panic happens, this will mean a complete deadlock and Pebble will go unresponsive. This is almost certainly what's happening in #314.There's no direct way of opting out of this
net/http
behaviour, so you have torecover()
from panics yourself, print the goroutine dump, and exit.I've tested this locally, and it prints the goroutine dump and exits with code 1 on a panic as expected, instead of locking up. It handles any panic in the handler or middleware, eg in
logit
).