prevent "use of closed network connection" in graphite GET's #908
Conversation
I would sometimes see errors like:

graphiteBand: graphite RequestError (http://....): Get failed: Get http://... : read tcp 10.90.128.100:80: use of closed network connection

This kind of error is not something that should bubble up to the caller of an HTTP client library, but it does. See also golang/go#8946 and golang/go#9424; there are a bunch more issues about the broken state of error handling in net/http.

So anyway: the HTTP client tries to reuse an existing connection which has broken. Somehow this is the caller's problem, so we address it by not keeping any idle connections and opening a new connection each time. This should get rid of these errors without adding much overhead. Note that the HTTP transport used is, other than the MaxIdleConnsPerHost setting, the same as the default transport.
We have this issue with OpenTSDB as well. That indicates to me that we have […]
the bug is not really in bosun itself. i've been talking to some people on #go-nuts and everyone seems to agree it's an issue with http.Client, hence the many tickets on the golang issue tracker.
```diff
@@ -100,6 +101,15 @@ func readTraceback(resp *http.Response) (*[]string, error) {
 // DefaultClient is the default HTTP client for requests.
 var DefaultClient = &http.Client{
 	Timeout: time.Minute,
+	Transport: &http.Transport{
+		MaxIdleConnsPerHost: 0,
```
I'm not sure how this is different from the default. http.DefaultTransport doesn't initialize MaxIdleConnsPerHost, which means it's already 0. And:

```go
// MaxIdleConnsPerHost, if non-zero, controls the maximum idle
// (keep-alive) connections to keep per-host. If zero,
// DefaultMaxIdleConnsPerHost is used.
const DefaultMaxIdleConnsPerHost = 2
```
http://golang.org/src/net/http/transport.go?s=1324:4038#L370

```go
360 func (t *Transport) putIdleConn(pconn *persistConn) bool {
361 	if t.DisableKeepAlives || t.MaxIdleConnsPerHost < 0 {
362 		pconn.close()
363 		return false
364 	}
365 	if pconn.isBroken() {
366 		return false
367 	}
368 	key := pconn.cacheKey
369 	max := t.MaxIdleConnsPerHost
370 	if max == 0 {
371 		max = DefaultMaxIdleConnsPerHost
372 	}
```
I guess I don't see it. I still think what I said holds.
oops, you're right. I guess we should just set http.DefaultMaxIdleConnsPerHost to 0 for the proper fix, which also means we don't need the custom transport/client.
Do you speculate that setting it to 0 will fix these problems, or have you run a test to verify it? I see no mention of MaxIdle in those go issues.
It appears you can also set it to < 0 with the same effect, without changing the global setting. This may need some thought.
i don't really have time any time soon to try to make it reproducible and verify the fix.
my admittedly crummy plan is to just set http.DefaultMaxIdleConnsPerHost to 0, run that in prod, and check if the errors disappear.
I think that's a fine plan. We've tried many things to fix these errors, and that would make it pretty easy. I'm worried about performance, though, since we do a lot of connections between bosun and tsdb.
i can confirm that even with MaxIdleConnsPerHost: -1 i'm still seeing:

graphite RequestError (...): read tcp ...:80: use of closed network connection
😢
Closing because apparently this doesn't fix it.
shouldn't we have a (new) ticket open so that at least we keep tabs on the issue?