Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic/digest authentication causes connection pool leaks #785

Closed
chrisrhut opened this issue Dec 26, 2014 · 13 comments
Closed

basic/digest authentication causes connection pool leaks #785

chrisrhut opened this issue Dec 26, 2014 · 13 comments
Assignees
Milestone

Comments

@chrisrhut
Copy link

Discovered this while load testing a service (built on the Play 2 framework) which makes async HTTP calls to a backend database: repeated concurrent requests eventually cause the OS to run out of file handles and fail due to too many open sockets. netstat revealed several thousand open HTTP connections.

I have traced the issue to a problem in how AsyncHttpClient handles 401 responses.

Reproducing / Test Case

A simple app that demonstrates the bug is available here: https://gist.github.com/chrisrhut/74988852e5313c3d613b (specify the VM argument -Dorg.slf4j.simpleLogger.log.com.ning.http=debug to enable log output).

In a terminal window, have this running in the background to see the issue at hand: while true; do netstat -an | grep "50.16.189.35"; echo ""; sleep 3; done IMPORTANT: because of the way it's hosted, the IP address of httpbin.org changes with some frequency, so make sure to set it accordingly.

Example terminal output after 20 loops:

tcp4       0      0  192.168.1.139.51857    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51856    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51853    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51852    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51849    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51846    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51845    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51841    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51838    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51837    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51834    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51832    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51829    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51822    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51821    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51817    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51814    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51813    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51810    50.16.189.35.80        ESTABLISHED
tcp4       0      0  192.168.1.139.51807    50.16.189.35.80        ESTABLISHED

A key observation is that even with all these established connections, the log output for the code displays: [Hashed wheel timer #1] DEBUG com.ning.http.client.providers.netty.channel.pool.DefaultChannelPool - Entry count for : http://httpbin.org:80 : 1

Investigation / Root Cause

Further investigation of the log output yields more clues. Each GET request (after the very first one) displays:

[main] DEBUG com.ning.http.client.providers.netty.request.NettyRequestSender - Using cached Channel [id: 0xcfd11ef5, /192.168.1.139:51930 => httpbin.org/50.16.189.35:80]
 for request 
DefaultHttpRequest(chunked: false)
GET /basic-auth/user/passwd HTTP/1.1
...
Response DefaultHttpResponse(chunked: false)
HTTP/1.1 401 UNAUTHORIZED
Www-Authenticate: Basic realm="Fake Realm"

So we (correctly) took a connection from the connection pool ("Using cached Channel") to make the initial request, but received a 401 and authentication is required. At this point a single thread should be able to just re-use the existing connection to make the authenticated request, but the log output continues:

[New I/O worker #2] DEBUG com.ning.http.client.providers.netty.handler.HttpProtocol - Sending authentication to http://httpbin.org/basic-auth/user/passwd
[New I/O worker #3] DEBUG com.ning.http.client.providers.netty.request.NettyConnectListener - Request using non cached Channel '[id: 0x872c7990, /192.168.1.139:51932 => httpbin.org/50.16.189.35:80]':
DefaultHttpRequest(chunked: false)
GET /basic-auth/user/passwd HTTP/1.1
...
Response DefaultHttpResponse(chunked: false)
HTTP/1.1 200 OK
...
[New I/O worker #3] DEBUG com.ning.http.client.providers.netty.channel.ChannelManager - Adding key: http://httpbin.org:80 for channel [id: 0x872c7990, /192.168.1.139:51932 => httpbin.org/50.16.189.35:80]

So, the authenticated request uses a "non cached" channel and after the 200 OK this new channel is added to the connection pool.

Tracing into the code I discovered that sure enough, for each run of execute(), ChannelPool.poll() is called twice but ChannelPool.offer() is only called once. So a channel is orphaned, requiring the process to internally spawn a new one for every request.

Possible solution

I believe the issue is rooted in HttpProtocol.exitAfterHandling401(). The code first checks for NTLM followed by SPEGNO/Kerberos; in those two cases future.setReuseChannel() is set to true. In the else case (including Basic/Digest) this flag is not set. Futhermore, in the callback:

Callback callback = new Callback(future) {
    public void call() throws IOException {
        channelManager.drainChannel(channel, future);
        requestSender.sendNextRequest(nextRequest, future);
    }
};

we don't close the channel in the case that it's not going to be reused. I noticed that exitAfterHandlingRedirect() in Protocol.java does not exhibit this bug; it's avoided by calling ChannelManager.tryToOfferChannelToPool() in its callback. Perhaps a similar step should be carried out here.

So I believe the solution is two-fold (I will submit a pull request after filing this issue):

  • enable "reuse channel" for Basic/Digest authentication (someone with a more intricate knowledge of the subtleties of HTTP auth, please correct me if I'm wrong)!
  • in the case where we will NOT reuse the channel, make sure it's returned to the connection pool. This will avoid the orphaned connection issue if I'm wrong about the first bullet and/or in other cases not specifically covered by NTLM/Kerberos.

Thanks!

@slandelle
Copy link
Contributor

I agree, the code is wrong. Offering back to the pool only happens when the drain callback is called, which never happens when there won't be any more chunks received (regular transfer-encoding).

Then, I wonder why we don't always reuse the current channel for sending the next request with the credentials. I think I recall asking @jfarcand about this, and I think his answer had something to do with performance.

@jfarcand: WDYT? I'm in favor of always trying to reuse.

@slandelle slandelle added this to the 1.9.4 milestone Dec 26, 2014
@slandelle slandelle self-assigned this Dec 26, 2014
@chrisrhut
Copy link
Author

@slandelle thanks for jumping on this so quickly. I'm meeting some resistance in my own efforts at a fix because I'm having trouble grokking the lifecycle of the request (e.g. the relationship between drainChannel, markAsDone, finishChannel, etc).

My naive first attempt here: https://github.com/chrisrhut/async-http-client/compare/AsyncHttpClient:1.9.x...chrisrhut:1.9.x?expand=1 introduced a unit test failure in NettyBasicAuthTest.basicAuthFileNoKeepAliveTest():

java.util.concurrent.ExecutionException: java.io.IOException: Remotely closed

so it's feeling like I'm not familiar enough with the inner workings to contribute code safely. But please let me know if I can help further on this!

@slandelle
Copy link
Contributor

No pro :)
I have a fix that needs further testing but I'd like some feedback from JFA as I'm not sure about the original intent of not reusing the connection.

@slandelle
Copy link
Contributor

BTW, you can use preemptive auth as a workaround so you won't get a 401.

@chrisrhut
Copy link
Author

Thanks for the reminder about preemptive auth. Our service actually uses Digest auth and it's a bit non-trivial to implement a request facility that captures and cache the nonce after a successful request, particularly with the limited access afforded by the Play Framework layer. But that is definitely something worth investigating.

@chrisrhut
Copy link
Author

@slandelle - I'm exploring the preemptive auth option. Is there any way to retrieve the "filled-in" request realm post-response so that I can cache the digest nonce? So far it seems like it's inaccessible (the only realm you can access is still the original empty one). Thanks again!

slandelle pushed a commit that referenced this issue Dec 28, 2014
@slandelle
Copy link
Contributor

@chrisrhut I've pushed a fix in the fix-785 branch.

I'm actually not sure you'll be able to test it in your app: AFAIK, Play 2 still uses 1.8 and not 1.9, those are not compatible, and I don't know if they plan to upgrade.

@slandelle
Copy link
Contributor

Is there any way to retrieve the "filled-in" request realm post-response so that I can cache the digest nonce?

I don't think so

@slandelle
Copy link
Contributor

@jfarcand Generally speaking, I think we should reuse the current channel for other cases too (redirect, 407, 100, etc). The only exception is when redirecting to another host. WDYT?

slandelle pushed a commit that referenced this issue Jan 7, 2015
Netty: Reuse channel on 401, close #785
@slandelle
Copy link
Contributor

implemented in 1.9.x in 5c501f4

@slandelle
Copy link
Contributor

@chrisrhut Could you please give it a try?

@chrisrhut
Copy link
Author

@slandelle sorry for the delay - the day job interfered :) Test cases run with 1.9.4 and show that this has been fixed. Thank you!

@slandelle
Copy link
Contributor

Great, thanks for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants