
Getting random too many CLOSE_WAIT states on heavy load #1473

Closed
bhaveshmaniya opened this issue Apr 14, 2017 · 24 comments
Labels: More Info Required · Stale · Unable To Replicate

@bhaveshmaniya

We have developed a Jersey web application (REST APIs); below are the details of the libraries/technologies used:

  • Jersey - v2.23.1
  • OpenJPA - v2.4.1 (MySQL - 5.5.50)
  • Jetty - v9.3.18 (JDK 8+)
  • OS - Ubuntu 14.04

Basically, we are randomly getting a "too many CLOSE_WAIT sockets" issue. We've tried to figure out a solution and have worked through a number of references.

As suggested in those references, we've updated the /etc/sysctl.conf, /opt/jetty9/etc/jetty.xml and /etc/security/limits.conf files with the details below:

/etc/sysctl.conf

net.ipv4.tcp_fin_timeout=20
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 16384 16777216
net.core.somaxconn=4096
net.core.netdev_max_backlog=16384
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.tcp_syncookies=1
net.ipv4.ip_local_port_range=1024 65535
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_congestion_control=cubic

Then execute:

sysctl -p

/opt/jetty9/etc/jetty.xml

<!-- Server Thread Pool -->
<Set name="ThreadPool">
  <!-- Default queued blocking threadpool -->
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Arg>
      <New class="java.util.concurrent.ArrayBlockingQueue">
        <Arg type="int">6000</Arg>
      </New>
    </Arg>
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">200</Set>
    <Set name="detailedDump">false</Set>
  </New>
</Set>
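
For comparison, the same thread pool shape can be set up programmatically when Jetty is run embedded (as some later commenters do). The sketch below is illustrative only: the pool sizes and queue length mirror the XML above, while the 60000 ms thread idle timeout and the port are assumptions, not part of the original configuration.

import java.util.concurrent.ArrayBlockingQueue;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class ThreadPoolSetup
{
    public static void main(String[] args) throws Exception
    {
        // Bounded job queue of 6000 and 10..200 threads, mirroring the jetty.xml above.
        QueuedThreadPool threadPool =
            new QueuedThreadPool(200, 10, 60000, new ArrayBlockingQueue<Runnable>(6000));
        threadPool.setDetailedDump(false);

        Server server = new Server(threadPool);

        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080); // the port used later in this thread
        server.addConnector(connector);

        server.start();
        server.join();
    }
}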

/etc/security/limits.conf

root hard nofile 40000
root soft nofile 40000

We've also gone through code optimization steps, still with no luck. I have gone through the fundamentals of TCP connection states and the causes of the CLOSE_WAIT state, then tried the approaches mentioned above. I have also gone through many CLOSE_WAIT-related questions on Stack Overflow and tried the solutions people posted there, but without any success.

Has anyone faced the same issue and found a solution?

@sbordet
Contributor

sbordet commented Apr 14, 2017

Where do you get the CLOSE_WAITs, on the server or on the client ?

@bhaveshmaniya
Author

We are getting it on the server. Basically, the server stops responding when there are too many CLOSE_WAITs. By executing the command below we can get the total number of CLOSE_WAITs:

lsof | grep CLOSE_WAIT | wc -l

@sbordet
Contributor

sbordet commented Apr 14, 2017

What is your configuration for the ServerConnector idleTimeout ?

When the idleTimeout expires, do you see the number of CLOSE_WAITs reduce ?

Can you take a network dump with Wireshark that shows the problem (client closing, but the server not closing), and attach it here ?
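
For reference, in embedded Jetty the connector idle timeout is set directly on the ServerConnector (the standalone distribution exposes the same setting through its XML/ini configuration). A minimal sketch; the port is illustrative and the 30000 ms value matches the default mentioned in the next comment:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class IdleTimeoutSetup
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();

        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        // Connections idle longer than this are closed by Jetty; an expiring
        // idle timeout is also what slowly clears CLOSE_WAIT sockets that the
        // application never closed, which matches the observation below.
        connector.setIdleTimeout(30000);
        server.addConnector(connector);

        server.start();
        server.join();
    }
}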

@sbordet
Contributor

sbordet commented Apr 18, 2017

@bhaveshmaniya point being we cannot reproduce this.

So we need a reproducible case from you, or a network dump that shows the problem, or in general further details on what is going on.

Also, are you sure that the CLOSE_WAITs are related to Jetty ? Do you have, in your server application, a client that connects to somewhere else that may generate the CLOSE_WAITs ?

@bhaveshmaniya
Author

bhaveshmaniya commented Apr 18, 2017

What is your configuration for the ServerConnector idleTimeout ?

The idleTimeout for the ServerConnector is 30000 milliseconds (which is the default).

When the idleTimeout expires, do you see the number of CLOSE_WAITs reduce ?

Yes, we can see it reducing, e.g. from 400 to 380, 350, etc.

Here I've attached the network dumps taken with Wireshark when the CLOSE_WAITs reach around 800 and the server takes too long to respond (effectively isn't responding).
output.pcapng.zip

Regarding whether the CLOSE_WAITs are related to Jetty:

I am not sure about it. We've also deployed the same application on CentOS 7 (the other software is the same), and we are facing the same issue there as well.

Do you have, in your server application, a client that connects to somewhere else that may generate the CLOSE_WAITs ?

No

@sbordet
Contributor

sbordet commented Apr 18, 2017

On what port is your server listening in the pcap you have attached ?

@bhaveshmaniya
Author

The server is listening on port 8080.

@sbordet
Contributor

sbordet commented Apr 19, 2017

Had a look at the pcap, but unfortunately it does not contain useful information for this issue.

The server also appears to send back either random, gzipped, or encrypted data, so there is no way to figure out whether the framing is ok.

You also appear to be using Apache Bench, probably for some load testing.
This would open another discussion on whether you are doing the load test properly, so it is entirely possible that the CLOSE_WAITs you see are an artifact of incorrect load testing.

CLOSE_WAIT is the state a server socket enters when the client has sent a TCP FIN and the server application has not yet closed its own side of the connection.

I need a pcap that shows an HTTP request, followed by a client TCP FIN to which the server does not respond with a corresponding TCP FIN, causing the CLOSE_WAIT.
The HTTP request must be there to figure out whether the framing is correct. It may be that the server does not close because the client did not send all the data, or because the server replied but the connection got congested, etc.

I would start by tuning down the load testing to a rate that is very light, and see if I still get the CLOSE_WAITs. If not, raise the rate but keep an eye on the client to detect when it is maxing out (which will happen way before the server does).

@sbordet
Contributor

sbordet commented Apr 19, 2017

How sure are you that this is not caused by the JDBC driver to MySQL? Can you actually pinpoint that the CLOSE_WAIT sockets belong to Jetty, by looking at the ports?

@bhaveshmaniya
Author

Hi @sbordet,

Regarding the use of Apache Bench: basically we would like to increase the load on the server, as we face the CLOSE_WAIT issue when too many simultaneous requests come to the server.

I've captured the CLOSE_WAIT status by executing the netstat -nat | grep CLOSE_WAIT command; please refer to the attached 'close_wait_status.txt' file.
close_wait_status.txt

Also, when the CLOSE_WAITs increase and the server takes too long to respond, we get a 'java.lang.OutOfMemoryError: GC overhead limit exceeded' exception; see the attached 'jetty_log.txt' file.
jetty_log.txt

Let me know if you need more information.

@sbordet
Contributor

sbordet commented Apr 21, 2017

@bhaveshmaniya we don't see this in our load tests, so it must be something peculiar with your setup.
We need a way to reproduce, or a pcap that shows the problem clearly as explained above.

If you run with a very light load, do you see the problem ?

@bhaveshmaniya
Author

@sbordet you might be correct, there must be something peculiar with our setup. I'll check it again and try to get a pcap generated from normal requests rather than from Apache Bench.

With a light load we don't see the problem; it works perfectly.

Thank you!

@joakime
Contributor

joakime commented Jun 19, 2017

No update. Closing as invalid.

@joakime joakime closed this as completed Jun 19, 2017
@Mayur1995

@bhaveshmaniya are you using Apache's CXF library anywhere? If yes, then you can try changing the log level from ERROR to INFO, which might solve the problem of the CLOSE_WAIT socket count... it worked for me.

@caimite

caimite commented May 29, 2018

Hi guys,
I have been facing this CLOSE_WAIT problem with an embedded Jetty web server app for a while and I am searching for a way to resolve it. I am able to replicate the CLOSE_WAIT states consistently by using Apache Benchmark to simulate concurrent connections to my Jetty server app. I set the same configuration as reported originally. Please can you guys help me with this issue? Thanks.

@caimite

caimite commented May 29, 2018

I would like this case to be reopened because this is something replicable.

@gregw
Contributor

gregw commented May 29, 2018

@caimite can you open a new issue with:

  • the version of jetty you are using?
  • JVM version?
  • what OS and what version?
  • A minimal server that can be used to reproduce.

We do have occasional reports of this problem, but unless we can reproduce it ourselves we cannot debug it.
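
For anyone preparing such a report, a minimal embedded server of the kind requested can be as small as the sketch below; the port and the trivial handler are placeholders, and the reproducer should of course use whatever handler actually triggers the problem:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.AbstractHandler;

public class MinimalServer
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server(8080);

        // Trivial placeholder handler that answers every request with plain text.
        server.setHandler(new AbstractHandler()
        {
            @Override
            public void handle(String target, Request baseRequest, HttpServletRequest request, HttpServletResponse response)
                throws IOException, ServletException
            {
                response.setContentType("text/plain");
                response.getWriter().println("hello");
                baseRequest.setHandled(true);
            }
        });

        server.start();
        server.join();
    }
}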

@caimite

caimite commented May 29, 2018

@gregw thanks for replying to my post.
Well, it's been around 5 months that I have been trying to resolve the CLOSE_WAIT issue with Jetty.
I will create a new issue and post the number in this thread.
Thanks again.

@caimite

caimite commented May 29, 2018

@gregw I've created the new issue; can you tell me if everything is okay?

#2591

@whuxiari

I have encountered the same problem. The Jetty version is 9.4.16. It's OK to connect to the Jetty server just after starting it, but connections fail after the server has been running for about twenty minutes. After analyzing the network traffic, we found that the server did not return the "server hello" after the client sent the "client hello". By chance, we found that the server listens on both the external IP and 127.0.0.1. So we tried to remove the 127.0.0.1 listener, and found that the problem no longer appeared.
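
If the workaround above amounts to binding the listener to a single address instead of all interfaces, in embedded Jetty that is done on the connector. A sketch, with a placeholder address and port:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class SingleAddressBinding
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();

        ServerConnector connector = new ServerConnector(server);
        // Bind only the external address (placeholder) instead of 0.0.0.0,
        // so there is no additional listener on 127.0.0.1.
        connector.setHost("192.0.2.10");
        connector.setPort(8080);
        server.addConnector(connector);

        server.start();
        server.join();
    }
}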

@joakime
Contributor

joakime commented Jul 24, 2019

@whuxiari please open a new issue.
Include details about the version of jetty, how you started jetty, what jvm, what os, etc ...

@whuxiari

whuxiari commented Jul 25, 2019

Jetty is embedded, the OS is SUSE 12 SP2.
JVM information:
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

Besides Jetty, we have integrated Jersey, HK2...

@whuxiari

whuxiari commented Aug 6, 2019

We found a new phenomenon: the problem appears only if the machine has multiple IPs and we are listening on two or more of them.

@gregw gregw reopened this Aug 8, 2019
@joakime joakime removed the Invalid label Aug 8, 2019
@stale

stale bot commented Aug 8, 2020

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale For auto-closed stale issues and pull requests label Aug 8, 2020
@sbordet sbordet closed this as completed Aug 12, 2020