TLS Server stops responding after several days #1143
Comments
The first thing to check would be the version of BlueSSLService you are using. There was a bug introduced briefly in release 0.12.42 (fixed in 0.12.44) that could hang the listener if non-HTTPS connections were attempted (Kitura/BlueSSLService#36). If you have 0.12.44 or newer, you're okay. Assuming that isn't it, could you try enabling verbose logging and see whether anything interesting is logged just prior to the unresponsiveness? Ideally we'd be able to capture the request data that triggers the hang, but that may require some additional work, as AFAIK we do not have a facility to log raw request data at present.
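For reference, a minimal sketch of enabling verbose logging, assuming HeliumLogger is the LoggerAPI backend (as it is in most Kitura apps):

```swift
import HeliumLogger
import LoggerAPI

// Route LoggerAPI output through HeliumLogger at a detailed level.
// In LoggerAPI's ordering .debug sits below .verbose, so this captures
// verbose messages too, including Kitura-net's socket handling.
HeliumLogger.use(.debug)

Log.info("Verbose logging enabled")
```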
@djones6 I am also facing the same problem. BlueSSLService version is 0.12.48.
I'm seeing this on 0.12.47, and from what I could see there is nothing special in the logs: from time to time an error message indicating a malformed request of one kind or another, but nothing more. I wonder if there is an easy way to grab a stack trace from within the docker container that would help us figure out what the threads are doing when the Kitura part gets hung. Installing …
How much memory is the process using when it starts slowing down? Run … to check. It's quite possible that memory growth has made the container begin to swap, which would cause poor (and unpredictable) performance.
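As a hedged sketch of one way to watch this from inside the process on Linux (residentSetSizeKB is a hypothetical helper; host-side tools such as top or docker stats work equally well):

```swift
import Foundation

// Hypothetical helper: report this process's resident set size on Linux
// by parsing /proc/self/status. A steadily growing VmRSS would point at
// memory growth (and eventual swapping) rather than a stuck accept loop.
func residentSetSizeKB() -> Int? {
    guard let status = try? String(contentsOfFile: "/proc/self/status", encoding: .utf8) else {
        return nil
    }
    for line in status.split(separator: "\n") where line.hasPrefix("VmRSS:") {
        // Line format: "VmRSS:     12345 kB"
        let fields = line.split(whereSeparator: { $0 == " " || $0 == "\t" })
        if fields.count >= 2, let kb = Int(String(fields[1])) {
            return kb
        }
    }
    return nil
}

if let rss = residentSetSizeKB() {
    print("Resident set size: \(rss) kB")
}
```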
Nope, it's all stable: the process is serving other TCP/UDP connections on other, non-Kitura sockets just fine. I'm waiting for the Kitura part to hang so I can grab a stack.
I managed to catch one process where Kitura is hanging, and here is some output I dug out of gdb. I need to restart the process now, but hopefully it can help figure out where the problem lies:
I don't know what thread 1 is doing or if it is correct, but here is the backtrace:
Thread 2 seems similarly mysterious to me:
Thread 3 seems to be BlueSSLService doing crypto:
Thread 4 looks like libdispatch stuff:
Thread 5 Kitura socket manager:
Thread 6 Kitura socket manager again:
Threads 7, 8, 9, and 10 seem to be idle libdispatch workers:
Several of my instances on different machines, with different IP addresses from the same class C segment, got the hang today at around the same time. Unfortunately I haven't managed to grab the stack this time, but it leads me to believe it is some kind of bot/malware/scanner trying various types of payloads, one of which causes Kitura to hang.
Another hang today; here are the stacks for three Kitura-related threads. Thread 3 is stuck in …
I think I've recreated this. I believe it's actually quite a simple problem, but it took the clue in @mman's previous comment to figure out what's happening: it's not a hang or failure as such, but a weakness in the design of the socket accept loop. An incoming connection that establishes a TCP session but then sends no data will block the server from accepting further connections indefinitely. Steps to recreate:
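In essence, the recreate is just holding open a silent TCP connection to the TLS port. A minimal sketch of such a misbehaving client, assuming BlueSocket (host and port are placeholders; nc against the port works just as well):

```swift
import Foundation
import Socket   // IBM BlueSocket

// Deliberately misbehaving client: establish the TCP session, then send
// nothing. The server accepts the connection and the SSL layer blocks
// waiting for a ClientHello that never arrives, so no further clients
// are serviced until this one goes away.
do {
    let badClient = try Socket.create()
    try badClient.connect(to: "localhost", port: 8443)
    print("TCP session established; sending no data")
    sleep(600)   // hold the connection open for ten minutes
    badClient.close()
} catch {
    print("Connection failed: \(error)")
}
```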
If this really is the case, then at the point where the server has hung, you should be able to do a … and find the lingering connection on the listening port.
If I now try to connect another client (e.g. a well-behaved …), I can compare the two connections:
In my case, the 'bad' connection stands out because there is no data on the receive queue (nothing to read). The connection has been established, and accepted, but SSL now expects to be able to perform the initial handshake, with data the client may never send. In contrast, the well-behaved connection has 269 bytes of data waiting. Regardless of whether this is the same problem, it's definitely a DoS issue which we'll need to address.
I believe this is an issue that will need addressing in BlueSSLService; I raised Kitura/BlueSSLService#40.
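For reference, the shape of the eventual fix is to take the blocking handshake off the accept loop. A sketch, assuming the acceptClientConnection(invokeDelegate:) / invokeDelegateOnAccept(for:) split that BlueSocket gained around this time, with a hypothetical handle(_:) standing in for the real request handling:

```swift
import Dispatch
import Socket   // IBM BlueSocket

// Hypothetical request handler, standing in for Kitura-net's own.
func handle(_ clientSocket: Socket) {
    // ... read the request, write a response ...
    clientSocket.close()
}

func listen(on listenSocket: Socket) {
    let clientQueue = DispatchQueue(label: "clientHandler", attributes: .concurrent)
    while true {
        do {
            // Accept the TCP connection only; skip the delegate's SSL work
            // so the accept loop is never blocked by a silent client.
            let clientSocket = try listenSocket.acceptClientConnection(invokeDelegate: false)
            clientQueue.async {
                do {
                    // Perform the (blocking) SSL handshake off the accept
                    // loop; a silent client now ties up one work item at most.
                    try listenSocket.invokeDelegateOnAccept(for: clientSocket)
                    handle(clientSocket)
                } catch {
                    clientSocket.close()
                }
            }
        } catch {
            break   // listener socket closed or failed
        }
    }
}
```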
@djones6 Correction: I did indeed find the lingering inactive connections to the mentioned port. It was only a bit more complex, since my app is running inside docker containers, and thus the connections were not visible on the main host but rather inside the container. But they are there. I have also reproduced the problem that you described. Using …
Linked commits: "… a separate thread", "…Accept on a separate thread", "…ptClientConnection and delegate.onAccept"
Thanks, guys, for fixing this issue. Could you please advise when it will be safe to rebuild my app, which depends on Kitura, using …?
Apologies for the delay in tagging this fix. It appears to have introduced an intermittent test failure which I am debugging. |
It turned out that the test for this issue had exposed a problem with the way the OpenSSL error queue was being handled, which I've resolved in …
@mman Kitura-net has now been tagged with a fix for this issue: …
So if I now rebuild, will I pick up the fix? Thanks for all your help,
@djones6 Answering myself: yes, a simple … picks up the fix.
I have a simple Kitura-based web server with TLS and basic auth running on multiple hosts inside docker, using the latest Swift 3.x image, which is itself based on Ubuntu 16.04 IIRC. All built using the latest packages from SPM.
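For context, a minimal sketch of such a setup; the certificate paths and credentials are placeholders, and the SSLConfig initializer shown is the Linux (OpenSSL) variant:

```swift
import Kitura
import Credentials
import CredentialsHTTP

let router = Router()

// Basic auth via Kitura's CredentialsHTTP plugin (placeholder credentials).
let credentials = Credentials()
credentials.register(plugin: CredentialsHTTPBasic(verifyPassword: { user, password, callback in
    if user == "admin" && password == "secret" {
        callback(UserProfile(id: user, displayName: user, provider: "HTTPBasic"))
    } else {
        callback(nil)
    }
}))
router.all(middleware: credentials)

router.get("/status") { _, response, next in
    response.send("OK")
    next()
}

// TLS configuration (certificate paths are placeholders).
let sslConfig = SSLConfig(withCACertificateFilePath: nil,
                          usingCertificateFile: "/path/to/cert.pem",
                          withKeyFile: "/path/to/key.pem",
                          usingSelfSignedCerts: true)

Kitura.addHTTPServer(onPort: 8443, with: router, withSSL: sslConfig)
Kitura.run()
```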
This Kitura server successfully serves under 1 request per minute from valid clients that know the basic auth params. It is publicly reachable on the internet, and thus subject to random traffic from bots and malware.
After several days the Kitura server randomly hangs, stops responding to requests, and needs to be restarted.
The Kitura web server is part of a bigger system that also handles traffic on other TCP and UDP sockets using my own libdispatch-based code. When Kitura hangs, the process itself is still alive and continues to handle traffic on the other network ports successfully.
I'd like to learn how to best troubleshoot this and what information would be useful to diagnose this problem. Please advise...
thanks
Martin