Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent "use of closed network connection" in graphite GET's #908

Conversation

Dieterbe
Copy link
Contributor

I would sometimes see errors like:
graphiteBand: graphite RequestError (http://....): Get failed: Get
http://... : read tcp 10.90.128.100:80: use of closed network connection

This kind of error is not something that should bubble up to the caller
of a http client library, but it does.
see also:
golang/go#8946
golang/go#9424

there's a bunch more issues about the broken state of error handling in
net/http.
So anyway the http client tries to reuse an existing connection which
has broken. Somehow this is the caller's problem, so we address it by
not keeping any idle connection and opening a new connection each time.
This should get rid of these errors without adding much overhead.

Note that the used http transport is, other than the
MaxIdleConnsPerHost setting, the same as the default transport.

I would sometimes see errors like:
graphiteBand: graphite RequestError (http://....): Get failed: Get
http://... : read tcp 10.90.128.100:80: use of closed network connection

This kind of error is not something that should bubble up to the caller
of a http client library, but it does.
see also:
golang/go#8946
golang/go#9424

there's a bunch more issues about the broken state of error handling in
net/http.
So anyway the http client tries to reuse an existing connection which
has broken.  Somehow this is the caller's problem, so we address it by
not keeping any idle connection and opening a new connection each time.
This should get rid of these errors without adding much overhead.

Note that the used http transport is, other than the
MaxIdleConnsPerHost setting, the same as the default transport.
@kylebrandt
Copy link
Member

We have this issue with OpenTSDB as well. That indicates to me that we have
a bug in the way we do connection pooling with Bosun. I think we should fix
that across the board, or change to not pooled connections across the board
for now.

On Fri, Apr 17, 2015 at 12:40 PM, Dieter Plaetinck <notifications@github.com

wrote:

I would sometimes see errors like:
graphiteBand: graphite RequestError (http://....): Get failed: Get
http://... : read tcp 10.90.128.100:80: use of closed network connection

This kind of error is not something that should bubble up to the caller
of a http client library, but it does.
see also:
golang/go#8946 golang/go#8946
golang/go#9424 golang/go#9424

there's a bunch more issues about the broken state of error handling in
net/http.
So anyway the http client tries to reuse an existing connection which
has broken. Somehow this is the caller's problem, so we address it by
not keeping any idle connection and opening a new connection each time.
This should get rid of these errors without adding much overhead.

Note that the used http transport is, other than the

MaxIdleConnsPerHost setting, the same as the default transport.

You can view, comment on, or merge this pull request online at:

#908
Commit Summary

  • prevent "use of closed network connection" in graphite GET's

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#908.

@Dieterbe
Copy link
Contributor Author

the bug is not really in bosun itself. i've been talking to some people on #go-nuts and everyone seems to agree it's an issue with http.Client, hence the many tickets on the golang issue tracker

@@ -100,6 +101,15 @@ func readTraceback(resp *http.Response) (*[]string, error) {
// DefaultClient is the default HTTP client for requests.
var DefaultClient = &http.Client{
Timeout: time.Minute,
Transport: &http.Transport{
MaxIdleConnsPerHost: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this is different from the default. http.DefaultTransport doesn't initialize MaxIdleConnsPerHost, which means it's already 0. And:

    // MaxIdleConnsPerHost, if non-zero, controls the maximum idle
    // (keep-alive) to keep per-host.  If zero,
    // DefaultMaxIdleConnsPerHost is used.

const DefaultMaxIdleConnsPerHost = 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://golang.org/src/net/http/transport.go?s=1324:4038#L370

   360  func (t *Transport) putIdleConn(pconn *persistConn) bool {
   361      if t.DisableKeepAlives || t.MaxIdleConnsPerHost < 0 {
   362          pconn.close()
   363          return false
   364      }
   365      if pconn.isBroken() {
   366          return false
   367      }
   368      key := pconn.cacheKey
   369      max := t.MaxIdleConnsPerHost
   370      if max == 0 {
   371          max = DefaultMaxIdleConnsPerHost
   372      }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't see it. I still think what I said holds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops you're right. I guess we should just set http.DefaultMaxIdleConnsPerHost to 0 for the proper fix. which also means we don't need the custom transport/client

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you speculate that setting to 0 will fix these problems or have you run a test to verify it? I see no mention of MaxIdle in those go issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears you can also set it to < 0 with the same effect, without changing the global setting. This may need some thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really have time any time soon to try to make it reproducable and verify the fix.
my admittedly crummy plan is to just set http.DefaultMaxIdleConnsPerHost to 0 , run that in prod, and check if errors disappear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a fine plan. We've tried many things to fix these errors, and that would make it pretty easy. I'm worried about performance, though, since we do a lot of connections between bosun and tsdb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can confirm that even with MaxIdleConnsPerHost: -1 i'm still seeing
graphite RequestError (...): read tcp ...:80: use of closed network connection

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

@maddyblue
Copy link
Contributor

Closing because apparently this doesn't fix it.

@maddyblue maddyblue closed this Apr 28, 2015
@Dieterbe
Copy link
Contributor Author

shouldn't we have a (new) ticket open so that at least we keep tabs on the issue?

@maddyblue
Copy link
Contributor

#426

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants