basic/digest authentication causes connection pool leaks #785
Comments
I agree, the code is wrong. Offering the channel back to the pool only happens when the drain callback is called, which never happens when no more chunks will be received (regular transfer-encoding). Also, I wonder why we don't always reuse the current channel for sending the next request with the credentials. I think I recall asking @jfarcand about this, and I think his answer had something to do with performance. @jfarcand: WDYT? I'm in favor of always trying to reuse.
@slandelle thanks for jumping on this so quickly. I'm meeting some resistance in my own efforts at a fix because I'm having trouble grokking the lifecycle of the request (e.g. the relationship between …). My naive first attempt here: https://github.com/chrisrhut/async-http-client/compare/AsyncHttpClient:1.9.x...chrisrhut:1.9.x?expand=1 introduced a unit test failure in …, so it feels like I'm not familiar enough with the inner workings to contribute code safely. But please let me know if I can help further on this!
No problem :)
BTW, you can use preemptive auth as a workaround so you won't get a 401.
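To make the workaround concrete: with preemptive Basic auth the client sends the Authorization header on the very first request, so the server never answers 401 and the second request (and second channel) never happens. The sketch below does not use AsyncHttpClient's API; it only shows what that header contains, built with the JDK's java.util.Base64 and assuming UTF-8 credentials. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveBasicAuth {
    // Builds an HTTP Basic Authorization header value (RFC 7617):
    // "Basic " followed by base64("user:password").
    // Assumption: credentials are encoded as UTF-8.
    public static String basicAuthorization(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Example credentials from RFC 7617, section 2.
        System.out.println(basicAuthorization("Aladdin", "open sesame"));
        // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```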
Thanks for the reminder about preemptive auth. Our service actually uses Digest auth, and it's a bit non-trivial to implement a request facility that captures and caches the nonce after a successful request, particularly with the limited access afforded by the Play Framework layer. But that is definitely something worth investigating.
@slandelle - I'm exploring the preemptive auth option. Is there any way to retrieve the "filled-in" request realm post-response so that I can cache the digest nonce? So far it seems to be inaccessible (the only realm you can access is still the original empty one). Thanks again!
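For Digest, preemptive auth means reusing the nonce from an earlier challenge and computing the Authorization response locally. A minimal sketch, assuming algorithm=MD5 and qop=auth as described in RFC 2617; the class and method names are illustrative and not part of AsyncHttpClient. The realm, nonce and cnonce would come from the cached challenge, and nc must increase on every request that reuses the same nonce. A server may still reject a stale nonce with a fresh 401, so a real client would need to fall back to the normal challenge flow.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestResponse {
    // Lowercase hex MD5 of a string, as RFC 2617 requires.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // RFC 2617 "response" value for algorithm=MD5, qop=auth.
    // nc is the nonce count: increment it for each request reusing a cached nonce.
    public static String response(String user, String realm, String password,
                                  String method, String uri,
                                  String nonce, int nc, String cnonce) {
        String ha1 = md5Hex(user + ":" + realm + ":" + password);
        String ha2 = md5Hex(method + ":" + uri);
        String ncHex = String.format("%08x", nc); // e.g. 1 -> "00000001"
        return md5Hex(ha1 + ":" + nonce + ":" + ncHex + ":" + cnonce + ":auth:" + ha2);
    }

    public static void main(String[] args) {
        // Worked example from RFC 2617, section 3.5.
        System.out.println(response("Mufasa", "testrealm@host.com", "Circle Of Life",
                "GET", "/dir/index.html",
                "dcd98b7102dd2f0e8b11d0f600bfb0c093", 1, "0a4f113b"));
        // prints 6629fae49393a05397450978507c4ef1
    }
}
```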
@chrisrhut I've pushed a fix in the … I'm actually not sure you'll be able to test it in your app: AFAIK, Play 2 still uses 1.8, not 1.9; those are not compatible, and I don't know whether they plan to upgrade.
I don't think so.
@jfarcand Generally speaking, I think we should reuse the current channel for other cases too (redirect, 407, 100, etc). The only exception is when redirecting to another host. WDYT? |
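The rule proposed in this comment can be written down as a small predicate. This is a hypothetical sketch, not code from AsyncHttpClient: it assumes the connection is keep-alive (passed in as a flag) and treats "another host" as a different scheme, host or port, with default ports 80 and 443.

```java
import java.net.URI;

public class ChannelReusePolicy {
    // Hypothetical predicate: may the follow-up request go out on the
    // channel that received this response?
    public static boolean canReuseChannel(int status, URI current, URI next, boolean keepAlive) {
        if (!keepAlive) {
            return false; // the connection is being closed anyway
        }
        switch (status) {
            case 100: // 100 Continue: keep sending on the same connection
            case 401: // auth challenge: retry with credentials on the same channel
            case 407: // proxy auth challenge: same idea
                return true;
            case 301:
            case 302:
            case 303:
            case 307:
                // Redirects: reuse only when staying on the same scheme/host/port.
                return sameOrigin(current, next);
            default:
                return false; // no internal follow-up request in this sketch
        }
    }

    // Assumes absolute http/https URIs (getScheme() and getHost() non-null).
    static boolean sameOrigin(URI a, URI b) {
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b);
    }

    // Fills in the default port when the URI does not specify one.
    static int effectivePort(URI u) {
        if (u.getPort() != -1) {
            return u.getPort();
        }
        return "https".equalsIgnoreCase(u.getScheme()) ? 443 : 80;
    }
}
```

For example, a 401 from http://httpbin.org is retried on the same channel, while a 302 from http://httpbin.org to http://example.com would need a new one.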
Netty: Reuse channel on 401, close #785
Implemented in 1.9.x in 5c501f4.
@chrisrhut Could you please give it a try? |
@slandelle sorry for the delay, the day job interfered :) Test cases run with 1.9.4 show that this has been fixed. Thank you!
Great, thanks for your help!
Discovered this while load testing a service (built on the Play 2 framework) which makes async HTTP calls to a backend database: repeated concurrent requests eventually cause the OS to run out of file handles and fail due to too many open sockets.
netstat revealed several thousand open HTTP connections. I have traced the issue to a problem in how AsyncHttpClient handles 401 responses.
Reproducing / Test Case
A simple app that demonstrates the bug is available here: https://gist.github.com/chrisrhut/74988852e5313c3d613b (specify the VM argument -Dorg.slf4j.simpleLogger.log.com.ning.http=debug to enable log output).

In a terminal window, keep this running in the background to see the issue at hand:
while true; do netstat -an | grep "50.16.189.35"; echo ""; sleep 3; done
IMPORTANT: because of the way it's hosted, the IP address of httpbin.org changes fairly often, so make sure to set it accordingly.

Example terminal output after 20 loops:
A key observation is that even with all these established connections, the log output for the code displays:
[Hashed wheel timer #1] DEBUG com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool - Entry count for : http://httpbin.org:80 : 1
Investigation / Root Cause
Further investigation of the log output yields more clues. Each GET request (after the very first one) displays:
So we (correctly) took a connection from the connection pool ("Using cached Channel") to make the initial request, but received a 401 and authentication is required. At this point a single thread should be able to just re-use the existing connection to make the authenticated request, but the log output continues:
So, the authenticated request uses a "non cached" channel and after the 200 OK this new channel is added to the connection pool.
Tracing into the code, I confirmed that for each call to execute(), ChannelPool.poll() is called twice but ChannelPool.offer() is called only once. So one channel is orphaned per request (left open but never returned to the pool), and a new channel has to be opened internally for every request.
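The poll/offer imbalance can be reproduced with a toy pool. This is a hypothetical model, not AsyncHttpClient code: each execute() polls once for the initial request and, unless the channel is reused, polls again for the authenticated retry, but only the second channel is ever offered back.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PoolLeakDemo {
    // Hypothetical stand-in for a pooled socket; not an AsyncHttpClient type.
    static final class Channel { }

    // Hypothetical toy pool: poll() hands out an idle channel if there is one,
    // otherwise it opens a new ("non cached") channel.
    public static final class ToyPool {
        private final Deque<Channel> idle = new ArrayDeque<>();
        private int opened = 0;

        Channel poll() {
            Channel c = idle.pollFirst();
            if (c == null) {
                c = new Channel();
                opened++;
            }
            return c;
        }

        void offer(Channel c) { idle.offerFirst(c); }

        public int opened() { return opened; }
        public int idleCount() { return idle.size(); }
    }

    // One execute() that gets a 401 and retries with credentials.
    // reuseOn401 == false mirrors the reported behaviour: poll() twice, offer() once.
    static void execute(ToyPool pool, boolean reuseOn401) {
        Channel first = pool.poll();                        // initial request -> 401
        Channel second = reuseOn401 ? first : pool.poll();  // authenticated request
        pool.offer(second);                                 // after 200 OK
        // When !reuseOn401, 'first' is dropped here: never offered, never closed.
    }

    public static ToyPool simulate(int requests, boolean reuseOn401) {
        ToyPool pool = new ToyPool();
        for (int i = 0; i < requests; i++) {
            execute(pool, reuseOn401);
        }
        return pool;
    }

    public static void main(String[] args) {
        ToyPool leaky = simulate(20, false);
        System.out.println("opened=" + leaky.opened() + " idle=" + leaky.idleCount());
        // prints "opened=21 idle=1"
    }
}
```

With 20 requests on the leaky path the toy pool opens 21 channels while holding only one idle entry, which is the same shape as the "Entry count ... : 1" log line shown earlier next to thousands of established sockets. Passing reuseOn401 = true keeps it at a single channel.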
Possible solution
I believe the issue is rooted in HttpProtocol.exitAfterHandling401(). The code first checks for NTLM, followed by SPNEGO/Kerberos; in those two cases the future's reuse flag is set to true via future.setReuseChannel(). In the else case (including Basic/Digest) this flag is not set. Furthermore, in the callback, we don't close the channel when it is not going to be reused. I noticed that exitAfterHandlingRedirect() in Protocol.java does not exhibit this bug; it avoids it by calling ChannelManager.tryToOfferChannelToPool() in its callback. Perhaps a similar step should be carried out here.

So I believe the solution is two-fold (I will submit a pull request after filing this issue):
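A minimal sketch of the two changes the analysis above points to: reuse the current channel for Basic/Digest 401s, as the NTLM and SPNEGO/Kerberos branches already do, and, when the channel is not reused, offer it back to the pool or close it rather than dropping it. Everything here (Channel, Pool, channelForRetry, offerOrClose) is a hypothetical toy model, not AsyncHttpClient's actual types or the eventual pull request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class Handle401FixSketch {
    // Hypothetical stand-in for a pooled socket; not an AsyncHttpClient type.
    public static final class Channel {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    // Hypothetical toy pool that remembers every channel it ever opened,
    // so leaks (open but not idle in the pool) can be counted.
    public static final class Pool {
        private final Deque<Channel> idle = new ArrayDeque<>();
        private final List<Channel> all = new ArrayList<>();

        Channel poll() {
            Channel c = idle.pollFirst();
            if (c == null) {
                c = new Channel();
                all.add(c);
            }
            return c;
        }

        // Loosely modeled on the idea behind tryToOfferChannelToPool():
        // reusable channels go back to the pool, everything else is closed.
        void offerOrClose(Channel c, boolean keepAlive) {
            if (keepAlive && c.isOpen()) {
                idle.offerFirst(c);
            } else {
                c.close();
            }
        }

        public int opened() { return all.size(); }

        // Channels that are still open but can no longer be polled by anyone.
        public int leaked() {
            int n = 0;
            for (Channel c : all) {
                if (c.isOpen() && !idle.contains(c)) n++;
            }
            return n;
        }
    }

    // Channel for the authenticated retry after a Basic/Digest 401.
    // Part 1: reuse the current channel when the connection is keep-alive.
    // Part 2: otherwise offer it back or close it instead of dropping it.
    static Channel channelForRetry(Pool pool, Channel current, boolean reuse, boolean keepAlive) {
        if (reuse && keepAlive) {
            return current;
        }
        pool.offerOrClose(current, keepAlive);
        return pool.poll();
    }

    // Runs request -> 401 -> retry -> 200 cycles and returns the pool for inspection.
    public static Pool simulate(int requests, boolean reuse, boolean keepAlive) {
        Pool pool = new Pool();
        for (int i = 0; i < requests; i++) {
            Channel first = pool.poll();                          // initial request -> 401
            Channel retry = channelForRetry(pool, first, reuse, keepAlive);
            pool.offerOrClose(retry, keepAlive);                  // after the 200 OK
        }
        return pool;
    }
}
```

In this model simulate(20, false, true) and simulate(20, true, true) each open a single channel, and simulate(20, false, false) opens 40 but closes all of them; in every case leaked() is 0.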
Thanks!