Allow tuning net/http MaxIdleConnsPerHost for high throughput #98

Closed
bbrowning opened this issue Apr 3, 2019 · 8 comments
Comments

@bbrowning
Contributor

While load-testing some parts of Knative, I ran into an issue where the cloudevents client with HTTP transport can end up with a lot of sockets in TIME_WAIT when sending events to a single host at high throughput.

golang/go#16012 (comment) describes both the issue I hit and its solution, which is to allow tuning the MaxIdleConnsPerHost value of the underlying net/http transport. While I can work around this as shown in that comment, I opened this issue to discuss surfacing this in cloudevents/transport/http.
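
For reference, a minimal sketch of that kind of workaround, assuming the client ends up using the shared http.DefaultTransport; the value 100 is purely illustrative:

    package main

    import "net/http"

    func main() {
        // http.DefaultTransport is declared as an http.RoundTripper, so a type
        // assertion is needed to reach the *http.Transport pool settings. The
        // default MaxIdleConnsPerHost is 2, which is far too low when every
        // event goes to the same host.
        if t, ok := http.DefaultTransport.(*http.Transport); ok {
            t.MaxIdleConnsPerHost = 100 // illustrative value; tune for your load
        }
        // ... then construct the cloudevents client as usual.
    }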

@bbrowning
Contributor Author

The section "Repro #2: Create excessive TIME_WAIT connections by exceeding connection pool" at http://tleyden.github.io/blog/2016/11/21/tuning-the-go-http-client-library-for-load-testing/ has a nice description of what's happening here as well. And, it hints that we may need to allow tuning both MaxIdleConns and MaxIdleConnsPerHost. Or, if the Knative use-case of sending lots of CloudEvents to a single host is too specific, we can tune it there instead of here. It just requires some internal knowledge of how cloudevents sdk-go works to tune that in Knative.
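
One way to tune both of those knobs without touching the process-wide default transport is a dedicated client with its own transport. A rough sketch; the numbers are only illustrative:

    package main

    import (
        "net/http"
        "time"
    )

    func main() {
        // MaxIdleConns caps the idle-connection pool across all hosts, while
        // MaxIdleConnsPerHost caps it for each individual host, so both usually
        // need to be raised together for sustained traffic to a single target.
        // All values here are illustrative only.
        tunedClient := &http.Client{
            Transport: &http.Transport{
                MaxIdleConns:        200,
                MaxIdleConnsPerHost: 200,
                IdleConnTimeout:     90 * time.Second,
            },
        }
        _ = tunedClient // hand this to whatever sends the CloudEvents
    }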

@matzew
Member

matzew commented Apr 4, 2019

I think the Knative use-case is a good one; it's still generic enough, and useful for a general "from x to CloudEvent" bridge that may invoke (public) cloud services, distributing CEs derived from internal events (wrapped in CEs).

@n3wscott
Member

n3wscott commented Apr 8, 2019

The solution for this would be to configure those settings on the HTTP client used by the cloudevents HTTP transport object, like:

    // myHttpClient is a pre-configured *http.Client carrying the desired
    // transport settings (e.g. a raised MaxIdleConnsPerHost).
    t, err := cloudevents.NewHTTPTransport(
        cloudevents.WithTarget(env.Target),
        cloudevents.WithEncoding(encoding),
    )
    if err != nil {
        // handle the error
    }
    t.Client = myHttpClient

@bbrowning
Contributor Author

Thanks - I'll give that a try and close this issue if that works as expected.

@bbrowning
Contributor Author

So, with release 0.4.0 I was able to get acceptable throughput here by tuning the DefaultTransport to raise MaxIdleConnsPerHost. The ability to set a Client (and have that Client actually used for requests) appears to have landed around 0.4.2. However, after upgrading to that or anything newer, I can no longer get acceptable throughput here.

The issue is that the transport now explicitly asks each outgoing client request to close. Telling the client request to close like that means the client is forced to constantly tear down and create new connections, which leads to resource exhaustion under even moderate load when sending all CloudEvents to a single host. From the net/http docs:

        // Close indicates whether to close the connection after
        // replying to this request (for servers) or after sending this
        // request and reading its response (for clients).
        //
        // For server requests, the HTTP server handles this automatically
        // and this field is not needed by Handlers.
        //
        // For client requests, setting this field prevents re-use of
        // TCP connections between requests to the same hosts, as if
        // Transport.DisableKeepAlives were set.

So, we'll need to track down why we started explicitly asking client requests to close. I'm working on an automated test for this in Knative, but it's more of an integration test since it needs to spin up a webserver and run sustained throughput against it for at least 30 seconds.
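
To make the effect concrete, here is a minimal sketch (not the SDK's actual code) of where the flag bites; client, target, and payload are placeholders:

    package main

    import (
        "bytes"
        "io"
        "log"
        "net/http"
    )

    // send posts one payload. With req.Close left at its default (false), the
    // transport returns the TCP connection to its idle pool once the response
    // body is drained and closed, so the next request to the same host can
    // reuse it. Setting req.Close = true tears the connection down after every
    // request, as if Transport.DisableKeepAlives were set, which is what piles
    // up TIME_WAIT sockets under sustained load against a single target.
    func send(client *http.Client, target string, payload []byte) error {
        req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(payload))
        if err != nil {
            return err
        }
        // req.Close = true // <- the behavior being questioned here

        resp, err := client.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        _, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can go back to the pool
        return nil
    }

    func main() {
        // Hypothetical local target; the point is only to illustrate the request path.
        if err := send(http.DefaultClient, "http://localhost:8080/", []byte(`{}`)); err != nil {
            log.Println(err)
        }
    }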

@matzew
Member

matzew commented Apr 11, 2019 via email

@n3wscott
Member

Perhaps the issue is single-host vs. multi-host targets? You want to close for multi-host, and leave connections open for a single target?

@n3wscott
Member

Fixed via #110
