Connections in CLOSE_WAIT are not closed and hang around seemingly forever #2169
|
According to http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html I suppose the problem is that the AWS Application Load Balancer Target Group's health check has sent the FIN, and the Jetty server may have sent the ACK, but nothing ever gets the Jetty socket out of CLOSE_WAIT again. If I understand things correctly, it would be up to the server (Jetty in this case) to call close() on the socket after having received the FIN and having sent the ACK. Only then would it send its own FIN to the health checker, and the socket could reach the CLOSED state. Under these assumptions this really sounds like a Jetty issue to me. How come Jetty wouldn't close() the CLOSE_WAIT sockets after having received the FIN from the health checker?
|
@axeluhl perform a server dump (see jmx) when the server is in this state. Are you using SSL? |
|
I agree it sounds like a Jetty issue, but we've never been able to reproduce it. Typically we'd not look at a 9.2.2 issue without commercial support, but I have also seen reports of this on later versions, so I would really like to get to the bottom of it. We need a reproduction so we can verify what is going on, or at least a bit more info.

Is there any chance that you can get a tcpdump/wireshark trace of the health check conversations that end up in CLOSE_WAIT? I know it sounds like it should be obvious (i.e. AWS sends a FIN, we send an ACK, but then never send a FIN), but it is key to understand exactly where in the conversation this is happening. Jetty should read -1 when the FIN is received and close the connection, thus sending the FIN; obviously we are not doing this for these particular requests, so it would be good to get a trace so that we can try playing back a near-identical conversation and see if we can reproduce.

Edit: the server dump that @joakime asked for would also be good, although I think it has less info in 9.2.2 than it does now.
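To illustrate the close path I mean, here is a simplified sketch in plain NIO terms (this is not Jetty's actual connection code; the class and method names are just for illustration): a read that returns -1 means the peer's FIN has arrived, and closing our side is what sends our FIN back and lets the socket leave CLOSE_WAIT.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

class EofHandlingSketch
{
    // Called when the selector reports the channel as readable.
    static void onReadable(SocketChannel channel) throws IOException
    {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        int read = channel.read(buffer);
        if (read == -1)
        {
            // The peer has half-closed the connection (its FIN was received and
            // ACKed by the kernel). Closing here sends our FIN; if this never
            // happens, the socket stays in CLOSE_WAIT.
            channel.close();
            return;
        }
        // Otherwise, handle the 'read' bytes in 'buffer' as usual.
    }
}
```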
|
We're using the following VM (in reply to @joakime's question):

I also managed to connect to the JVM using JConsole, navigate to the org.eclipse.jetty.jmx/mbeancontainer/0/Operations/dump node and create a dump output. What should I look for in the output? I cannot find anything about "selector" or "Selector" in the dump.
|
And no, our Jetty installation doesn't use SSL. It runs in an Equinox OSGi environment, together with several web bundles that it serves. The load balancer's target group was configured to obtain a static HTML resource from one of these web bundles in its health check using HTTP against our Jetty port (8888 in this case). |
|
Ah, now I think I see. org.eclipse.jetty.server/serverconnector/HTTP/1.1@1e35587a/0/Operations has the dump() method you were probably looking for. I've attached the output. Here is what the listening socket looks like:

tcp        0      0 :::8888                 :::*                    LISTEN      22664/java

The output for the java process (PID 22664) shows a connection stuck in CLOSE_WAIT:

java    22664  sailing   79u  IPv6  57717081  0t0  TCP  localhost:ddi-tcp-1->localhost:56842 (CLOSE_WAIT)
|
@axeluhl thanks for that, it does confirm that this is a Jetty bug, as you can see from the state of the endpoints in the dump.

I'll do you a deal! If you can get me a tcpdump or wireshark capture that gives me enough information to reproduce on 9.2.2 (so I can then test whether 9.4.x is vulnerable), I'll produce a patch for 9.2.2 (even though it is end-of-life) for you, even if 9.4.x is not vulnerable to this.
|
@gregw, cool stuff. Are you saying 9.4.x wouldn't be susceptible to this type of issue anymore? That would be reason enough for us to upgrade. |
|
Here's a tcpdump of what the load balancer tries when health-checking the broken instance: |
|
@axeluhl I would like to think that 9.4.x is not vulnerable to this, but I cannot say for sure: I can't reproduce it on 9.2.2, so I can't test the same scenario on 9.4.x. However, a lot of the work we have done in 9.4 has been on the whole close conversation, so there is a good chance it is fixed. 9.4.x is also going to be maintained for several years to come, while we are trying to drop 9.2.x as best we can. So if you can upgrade, do upgrade, for many reasons, not least that it may fix this bug.

Thanks for the trace dump, but I'd like to get that as a pcap file. Can you use the -w option to write it to a file and then attach the file here? Thanks.
|
There you go. HTH |
|
@axeluhl thanks for that... but a small complication is that I go on vacation for 2 weeks from tomorrow, so I won't get a chance to look at it in detail until mid-February. Sorry for the delay.
|
Looks like I'm also running into this issue on Jetty 9.3.14 as packaged into Solr 6.6.2:

solr [ /opt/solr ]$ netstat -ptan | awk '{print $6 " " $7 }' | sort | uniq -c
   8425 CLOSE_WAIT -
     92 ESTABLISHED -
      1 FIN_WAIT2 -
      1 Foreign Address
      6 LISTEN -
    712 TIME_WAIT -
      1 established)

solr [ /opt/solr ]$ echo "run -b org.eclipse.jetty.server:context=HTTP/1.1@63e2203c,id=0,type=serverconnector dump" | java -jar jmxterm-1.0.0-uber.jar -l localhost:18983 -v silent -n > jettyJmx.out

solr [ /opt/solr ]$ netstat -anp | grep CLOSE_WAIT | head -n1
tcp        1      0 10.xxx.x.xx:8983        10.xxx.x.xxx:53873      CLOSE_WAIT  1/java

solr [ /opt/solr ]$ grep "10.xxx.x.xxx:53873" jettyJmx.out
|   |   +- SelectionKey@5ee4ef9f{i=0}->SelectChannelEndPoint@69feb476{/10.xxx.x.xxx:53873<->8983,Open,in,out,-,-,1712/5000,HttpConnection@c93d7fa}{io=0/0,kio=0,kro=1}

solr [ /opt/solr ]$ cat jettyJmx.out | grep 8983 | grep SelectChannelEndPoint | grep "{io=0/0,kio=0,kro=1}" | wc -l
8220
|
@axeluhl I'm looking at this issue. From the pcap I can see the same pattern repeated over and over, and Jetty typically does not reply this abruptly. Can you enable the server DEBUG logs? What we need to understand here is whether it is Jetty that sends that reply.
|
@sbordet - is there something I can provide since this issue is impacting our production systems ? |
|
@mohsinbeg we need to verify that your case is exactly the same as reported by @axeluhl. If so, what are your answers to my questions above ? |
Yes.
Yes, Solr is logging all POST requests in its log.
It's running inside a Kubernetes cluster in Docker on an AWS EC2 instance.
No.
A tcpdump is attached, along with netstat output from before and after the tcpdump capture.
|
At least in our case we saw a lot of EOFExceptions when traffic was going through Netscalers (NS). From the NS end it seemed that the NS closed idling connections and sent a RST to the socket client, but that connection was then used over and over again by Jetty, as there is no default idle connection timeout set. When we changed the client-side idle connection timeout to a value smaller than the NS idle connection timeout, the issue was gone. This happened with jetty-client 9.4.7.v20170914.
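Roughly what we changed, as a sketch (the 20-second value is only illustrative; anything below the intermediary's idle timeout should do):

```java
import org.eclipse.jetty.client.HttpClient;

public class ClientIdleTimeoutExample
{
    public static void main(String[] args) throws Exception
    {
        HttpClient httpClient = new HttpClient();
        // Close pooled connections that have been idle for more than 20 s,
        // i.e. before the Netscaler's own idle timeout kicks in and resets them.
        httpClient.setIdleTimeout(20_000);
        httpClient.start();
        // ... use the client, then httpClient.stop() on shutdown
    }
}
```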
|
@mohsinbeg the TCP dump (well it's missing 55 frames) shows that there are 3 FIN sent by the client that are immediately acknowledged by the server at around 2:22:15.8959. |
|
@sbordet - this issue was related to solr v6.6.2 indexing settings that were impacting jetty. Once solr's solrconfig.xml was tuned for the ingestion load, jetty issues went away. http://lucene.472066.n3.nabble.com/9000-CLOSE-WAIT-connections-in-solr-v6-2-2-causing-it-to-quot-die-quot-td4373765.html has the details.
|
@mohsinbeg thanks for the feedback on this issue. @axeluhl do you have more information about this issue ? |
|
We've upgraded our Jetty installation to 9.4.8 now. I'll report back here how it goes the next time our server instances come under a lot of pressure. Thanks for now!
|
@axeluhl we have got the same issue; could you please report back on yours now that you have upgraded? Thank you.
|
The issue hasn't occurred again since we upgraded. However, we've also been careful to route the health checks through the Apache reverse proxy. Since we never ran into the issue when the health check went through Apache, I cannot really say whether the Jetty upgrade fixed it.
|
I am facing the above issue on Jetty 9.3.14. Has this issue been resolved, and if so, in which version of Jetty? If not, is there any workaround? It's been a blocker in my production environment.
|
@hagrrwal so far this issue has been proven to not be a Jetty problem. Please update to the latest Jetty version (9.4.11 at this time) and report back if you still have the issue. |
|
@sbordet I have updated Jetty to 9.4.11 and deployed it; however, the issue still persists. TCP connections go into CLOSE_WAIT forever and are never closed.
|
@hagrrwal how do you know it's Jetty that generates those CLOSE_WAIT? I ask because in all other cases, it turned out to be some other library used by the application deployed in Jetty. |
|
@sbordet Embedded jetty is used with Glassfish for container & dependency injection. |
|
@hagrrwal in that dependency tree, you have 2 servers, and 1 client. |
|
Make sure your nginx configuration does not use any sort of connection |
|
OP has filed 2 stackoverflow issues around this ... |
|
@sbordet and @joakime thank you for looking into it. I have found the cause of my issue. By analysing a thread dump I figured out that application threads were going into a waiting state (java.lang.Thread.State: WAITING (parking)), causing a total lockup of the entire application, and of Jetty as well. Basically, some other library used by the application deployed in Jetty caused the threads to wait. The server is Jetty and the client is nginx.
|
Closing the issue as the 3 reports turned out to not be Jetty issues. |
|
Hi, I am running the latest Jetty 9.4.14.v20181114 on Java 11. It's possible that something has gone wrong in my application code leading to requests never returning, but even if that were the case, Jetty could probably do something better about it. You could have a sweeper or something that kills off connections after a certain timeout, logging the stack trace that led to the connection being stuck? (Not sure if that's possible.) Unfortunately, right now I can't even get a thread dump out of the process, as all its file descriptors are exhausted. Thanks!
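Jetty's connectors do already have something along these lines: the connector idle timeout closes endpoints that see no activity for the configured period (it does not log the originating stack trace, though). A minimal embedded-server sketch, where the port and the 30-second value are placeholder assumptions:

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class IdleTimeoutServer
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        // Endpoints idle for more than 30 s are closed by the server,
        // which also clears connections a peer has abandoned.
        connector.setIdleTimeout(30_000);
        server.addConnector(connector);
        server.start();
        server.join();
    }
}
```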
|
@mrossi975 please capture a Jetty Server dump and report back. We'll be happy to reopen this issue once the dump has been provided. |
|
@mrossi975 Did you fix your issue? We are having the same problem in our production. Any help is appreciated. |
Alex Uhl recently commented on the old Jetty issue https://bugs.eclipse.org/bugs/show_bug.cgi?id=482117, which was closed in 2015 as WONT_FIX, or really as a "can't fix in Java". Here are his comments copied from Bugzilla, but the reader should follow the link above to the original issue to obtain all the necessary context: