Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS Server stops responding after several days #1143

Closed
mman opened this issue Aug 22, 2017 · 16 comments · Fixed by Kitura/Kitura-net#219
Closed

TLS Server stops responding after several days #1143

mman opened this issue Aug 22, 2017 · 16 comments · Fixed by Kitura/Kitura-net#219
Assignees

Comments

@mman
Copy link

mman commented Aug 22, 2017

I have a simple Kitura based web server with TSL and basic auth running on multiple hosts inside docker with latest Swift 3.x image which is itself based on Ubuntu 16.04 IIRC. All built using the latest packages from SPM.

This Kitura server is successfully serving under 1 req per minute from valid endpoints on the internet that know the basic auth params and is publicly available on the internet thus being subject to random web traffic requests from bots and malware.

After several days the Kitura server hangs randomly and stops responging to requests and needs to be restarted.

The Kitura webserver is part of a bigger system that handles traffic on other TCP and UDP sockets using my own code (based on libdispatch) at the same time and when Kitura hangs the process itself is still alive and handles traffic on other network ports successfully.

I'd like to learn how to best troubleshoot this and what information would be useful to diagnose this problem. Please advise...

thanks
Martin

@djones6
Copy link
Collaborator

djones6 commented Aug 22, 2017

One first thing to check would be the version of BlueSSLService you are using. There was a bug introduced briefly in level 0.12.42 (fixed in 0.12.44) that could hang the listener if non-HTTPS connections were attempted (Kitura/BlueSSLService#36). If you have 0.12.44 or newer, you're okay.

Assuming that isn't it, could you try enabling verbose logging and see whether anything interesting is logged just prior to the unresponsiveness? Ideally we'd be able to capture the request data that triggers the hang, but that may require some additional work as AFAIK we do not have a facility to log raw request data at present.

@keshavakarthik
Copy link

@djones6 I am also facing the same problem. BlueSSLService Version is 0.12.48.

@mman
Copy link
Author

mman commented Aug 24, 2017

I'm seeing this on 0.12.47 and from what I could see there is nothing special in the logs, from time to time an error message indicating a wrong request of this or that but nothing special.

I wonder if there is an easy way to grab a stack from within the docker container that would helps us figure out what the threads are doing when the Kitura part gets hung. Installing lldb inside the container and doing bt could do it, right?

@carlbrown
Copy link
Contributor

How much memory is the process using when it starts slowing down? Run ps aux and look for the RSS column.

It's quite possible that memory growth has made the container begin to swap, which would cause poor (and unpredictable) performance.

@mman
Copy link
Author

mman commented Aug 30, 2017

Nope, it's all stable as it is serving other TCP/UDP connections on other non-Kitura sockets that are part of the same process just fine, I'm waiting for the Kitura part to hang to grab a stack.

@mman
Copy link
Author

mman commented Sep 1, 2017

I managed to find out one process where Kitura is hanging and here is some output I managed to dig out from gdb, I need to restart the process now but hopefully it can help figure out where the problem lies:

(gdb) info threads
  Id   Target Id         Frame 
  1    Thread 0x7f2dce1537c0 (LWP 2510) "rendezvous" (Exiting) 0x00007f2dcad37826 in __GI___sigsuspend (set=0x7f2dce1df0b0 <_dispatch_sigsuspend.mask>)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:30
  2    Thread 0x7f2dc2fee700 (LWP 2539) "rendezvous" 0x00007f2dcbff2a15 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f2dc2fede48, expected=0, 
    futex_word=0x7f2dce20eba0 <scoreboard+16>) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
  3    Thread 0x7f2dc1e04700 (LWP 2541) "rendezvous" 0x00007f2dcbff351d in read () at ../sysdeps/unix/syscall-template.S:84
* 4    Thread 0x7f2dce151700 (LWP 2542) "rendezvous" 0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
  5    Thread 0x7f2dc1603700 (LWP 2543) "rendezvous" 0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
  6    Thread 0x7f2dc0e02700 (LWP 2544) "rendezvous" 0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
  7    Thread 0x7f2dab7fe700 (LWP 2571) "rendezvous" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
  8    Thread 0x7f2d9bfff700 (LWP 14040) "rendezvous" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  9    Thread 0x7f2da9ffb700 (LWP 13692) "rendezvous" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  10   Thread 0x7f2dabfff700 (LWP 530) "rendezvous" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225

I don't know what thread 1 is doing or if it is correct, but here is the backtrace:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f2dce1537c0 (LWP 2510))]
#0  0x00007f2dcad37826 in __GI___sigsuspend (set=0x7f2dce1df0b0 <_dispatch_sigsuspend.mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:30
30	../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
(gdb) bt
#0  0x00007f2dcad37826 in __GI___sigsuspend (set=0x7f2dce1df0b0 <_dispatch_sigsuspend.mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:30
#1  0x00007f2dce1be1d8 in _dispatch_sigsuspend () from target:/usr/lib/swift/linux/libdispatch.so
#2  0x00007f2dce1bc6c6 in _dispatch_sig_thread () from target:/usr/lib/swift/linux/libdispatch.so
#3  0x00007f2dcbfe9439 in __nptl_deallocate_tsd () at pthread_create.c:175
#4  0x00007f2dcad22854 in __libc_start_main (main=0x468bd0 <main>, argc=3, argv=0x7ffc81c21ef8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffc81c21ee8) at ../csu/libc-start.c:297
#5  0x0000000000420cb9 in _start ()

thread 2 seems similarly mysterious to me:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f2dc2fee700 (LWP 2539))]
#0  0x00007f2dcbff2a15 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f2dc2fede48, expected=0, futex_word=0x7f2dce20eba0 <scoreboard+16>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205	../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0  0x00007f2dcbff2a15 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f2dc2fede48, expected=0, futex_word=0x7f2dce20eba0 <scoreboard+16>)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f2dce20eba0 <scoreboard+16>, abstime=abstime@entry=0x7f2dc2fede48) at sem_waitcommon.c:111
#2  0x00007f2dcbff2adf in __new_sem_wait_slow (sem=0x7f2dce20eba0 <scoreboard+16>, abstime=0x7f2dc2fede48) at sem_waitcommon.c:181
#3  0x00007f2dcbff2b92 in sem_timedwait (sem=<optimized out>, abstime=<optimized out>) at sem_timedwait.c:36
#4  0x00007f2dce1dc13c in manager_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:620
#5  0x00007f2dcbfea6ba in start_thread (arg=0x7f2dc2fee700) at pthread_create.c:333
#6  0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 seems to be the BlueSSL doing crypto:

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f2dc1e04700 (LWP 2541))]
#0  0x00007f2dcbff351d in read () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f2dcbff351d in read () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f2dcd211abc in ?? () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#2  0x00007f2dcd20fa5c in BIO_read () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3  0x00007f2dcd56ea95 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.0.0
#4  0x00007f2dcd56d0b2 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.0.0
#5  0x00007f2dcd56d814 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.0.0
#6  0x00000000005c3fe9 in _TFC10SSLService10SSLService8onAcceptfzT6socketC6Socket6Socket_T_ (socket=..., self=..., $error=...)
    at /root/rendezvous-swift/.build/checkouts/BlueSSLService.git-2005963651182040591/Sources/SSLService.swift:493
#7  0x00000000005d0762 in _TTWC10SSLService10SSLService6Socket18SSLServiceDelegateS_FS2_8onAcceptfzT6socketCS1_6Socket_T_ ()
#8  0x00000000005c3ec0 in _TFC10SSLService10SSLService8onAcceptfzT6socketC6Socket6Socket_T_ (socket=..., self=..., $error=...)
    at /root/rendezvous-swift/.build/checkouts/BlueSSLService.git-2005963651182040591/Sources/SSLService.swift:483
#9  0x00000000005d0762 in _TTWC10SSLService10SSLService6Socket18SSLServiceDelegateS_FS2_8onAcceptfzT6socketCS1_6Socket_T_ ()
#10 0x0000000000583c98 in _TFC6Socket6Socket22acceptClientConnectionfzT_S0_ (self=..., $error=...)
    at /root/rendezvous-swift/.build/checkouts/BlueSocket.git-7749210418216336129/Sources/Socket.swift:1462
#11 0x000000000061d712 in _TFC9KituraNet10HTTPServerP33_92AA53CEEC3B42FE7AE426A27AF9C2E06listenfT12listenSocketC6Socket6Socket13socketManagerCS_21IncomingSocketManager_T_ (listenSocket=..., socketManager=..., self=...)
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git--5198026081281008802/Sources/KituraNet/HTTP/HTTPServer.swift:188
#12 0x000000000061ca48 in _TFFC9KituraNet10HTTPServer6listenFzT2onSi_T_U_FT_T_ ()
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git--5198026081281008802/Sources/KituraNet/HTTP/HTTPServer.swift:125
#13 0x0000000000454bf7 in _TTRXFo___XFdCb___ ()
#14 0x00007f2dce1b9260 in _dispatch_block_async_invoke_and_release () from target:/usr/lib/swift/linux/libdispatch.so
#15 0x00007f2dce1bd6ed in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#16 0x00007f2dce1dc9af in worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:506
#17 0x00007f2dcbfea6ba in start_thread (arg=0x7f2dc1e04700) at pthread_create.c:333
#18 0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 looks like a libdispatch stuff:

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f2dce151700 (LWP 2542))]
#0  0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f2dce1c5705 in _dispatch_event_loop_drain () from target:/usr/lib/swift/linux/libdispatch.so
#2  0x00007f2dce1c14b8 in _dispatch_mgr_invoke () from target:/usr/lib/swift/linux/libdispatch.so
#3  0x00007f2dce1c127b in _dispatch_mgr_thread () from target:/usr/lib/swift/linux/libdispatch.so
#4  0x00007f2dce1bd5f4 in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#5  0x00007f2dce1bd3f8 in _dispatch_worker_thread () from target:/usr/lib/swift/linux/libdispatch.so
#6  0x00007f2dcbfea6ba in start_thread (arg=0x7f2dce151700) at pthread_create.c:333
#7  0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 Kitura socket manager:

(gdb) thread 5
[Switching to thread 5 (Thread 0x7f2dc1603700 (LWP 2543))]
#0  0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84	in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000006044f8 in _TFC9KituraNet21IncomingSocketManagerP33_684BF7E5C6D71BC96F4DF81B5FFCF4847processfT15epollDescriptorVs5Int3220runRemoveIdleSocketsSb_T_ (epollDescriptor=..., runRemoveIdleSockets=..., self=...)
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git--5198026081281008802/Sources/KituraNet/IncomingSocketManager.swift:148
#2  0x000000000060321c in _TFFC9KituraNet21IncomingSocketManagercFT_S0_U_FT_T_ ()
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git--5198026081281008802/Sources/KituraNet/IncomingSocketManager.swift:84
#3  0x0000000000454bf7 in _TTRXFo___XFdCb___ ()
#4  0x00007f2dce1af177 in _dispatch_call_block_and_release () from target:/usr/lib/swift/linux/libdispatch.so
#5  0x00007f2dce1baedd in _dispatch_queue_serial_drain () from target:/usr/lib/swift/linux/libdispatch.so
#6  0x00007f2dce1bb890 in _dispatch_queue_invoke () from target:/usr/lib/swift/linux/libdispatch.so
#7  0x00007f2dce1bd5f4 in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#8  0x00007f2dce1dba18 in overcommit_worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:315
#9  0x00007f2dcbfea6ba in start_thread (arg=0x7f2dc1603700) at pthread_create.c:333
#10 0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 Kitura socket manager again:

(gdb) thread 6
[Switching to thread 6 (Thread 0x7f2dc0e02700 (LWP 2544))]
#0  0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84	in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007f2dcae099d3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000006044f8 in _TFC9KituraNet21IncomingSocketManagerP33_684BF7E5C6D71BC96F4DF81B5FFCF4847processfT15epollDescriptorVs5Int3220runRemoveIdleSocketsSb_T_ (epollDescriptor=..., runRemoveIdleSockets=..., self=...)
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git--5198026081281008802/Sources/KituraNet/IncomingSocketManager.swift:148
#2  0x000000000060321c in _TFFC9KituraNet21IncomingSocketManagercFT_S0_U_FT_T_ ()
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git--5198026081281008802/Sources/KituraNet/IncomingSocketManager.swift:84
#3  0x0000000000454bf7 in _TTRXFo___XFdCb___ ()
#4  0x00007f2dce1af177 in _dispatch_call_block_and_release () from target:/usr/lib/swift/linux/libdispatch.so
#5  0x00007f2dce1baedd in _dispatch_queue_serial_drain () from target:/usr/lib/swift/linux/libdispatch.so
#6  0x00007f2dce1bb890 in _dispatch_queue_invoke () from target:/usr/lib/swift/linux/libdispatch.so
#7  0x00007f2dce1bd5f4 in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#8  0x00007f2dce1dba18 in overcommit_worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:315
#9  0x00007f2dcbfea6ba in start_thread (arg=0x7f2dc0e02700) at pthread_create.c:333
#10 0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7, 8, 9, 10 seem to be idle libdispatch workers:

(gdb) thread 7
[Switching to thread 7 (Thread 0x7f2dab7fe700 (LWP 2571))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
225	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f2dce1dba64 in overcommit_worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:326
#2  0x00007f2dcbfea6ba in start_thread (arg=0x7f2dab7fe700) at pthread_create.c:333
#3  0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) thread 8
[Switching to thread 8 (Thread 0x7f2d9bfff700 (LWP 14040))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f2dce1dca9b in wqlist_scan_wait (queue_priority=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:451
#2  worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:493
#3  0x00007f2dcbfea6ba in start_thread (arg=0x7f2d9bfff700) at pthread_create.c:333
#4  0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f2da9ffb700 (LWP 13692))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	in ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f2dce1dca9b in wqlist_scan_wait (queue_priority=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:451
#2  worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:493
#3  0x00007f2dcbfea6ba in start_thread (arg=0x7f2da9ffb700) at pthread_create.c:333
#4  0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) thread 10
[Switching to thread 10 (Thread 0x7f2dabfff700 (LWP 530))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
225	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f2dce1dba64 in overcommit_worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:326
#2  0x00007f2dcbfea6ba in start_thread (arg=0x7f2dabfff700) at pthread_create.c:333
#3  0x00007f2dcae093dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@mman
Copy link
Author

mman commented Sep 3, 2017

Several of my instances on different machines with different IP addresses from the same C segment got the hang today at around the same time. Unfortunately I haven't managed to grab the stack this time but it leads me to believe that it was some kind of bot/malware/scanner thing that is trying various different types of payload which causes Kitura to hang.

@mman
Copy link
Author

mman commented Sep 4, 2017

Ahother hang today, here are the stacks for three Kitura related threads:

Thread 3 stuck in BIO_read looks really suspicious to me, either we have a accepted client and socket and data ready so that we can perform BIO_read, or we don't and we must timeout.

(gdb) thread 3
[Switching to thread 3 (Thread 0x7ff90b355700 (LWP 2525))]
#0  0x00007ff91554351d in read () from target:/lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007ff91554351d in read () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ff916761abc in ?? () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#2  0x00007ff91675fa5c in BIO_read () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3  0x00007ff916abea95 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.0.0
#4  0x00007ff916abd0b2 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.0.0
#5  0x00007ff916abd814 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.0.0
#6  0x00000000005d0b79 in _TFC10SSLService10SSLService8onAcceptfzT6socketC6Socket6Socket_T_ (socket=..., self=..., $error=...)
    at /root/rendezvous-swift/.build/checkouts/BlueSSLService.git-501062253649323540/Sources/SSLService.swift:493
#7  0x00000000005dd2f2 in _TTWC10SSLService10SSLService6Socket18SSLServiceDelegateS_FS2_8onAcceptfzT6socketCS1_6Socket_T_ ()
#8  0x00000000005d0a50 in _TFC10SSLService10SSLService8onAcceptfzT6socketC6Socket6Socket_T_ (socket=..., self=..., $error=...)
    at /root/rendezvous-swift/.build/checkouts/BlueSSLService.git-501062253649323540/Sources/SSLService.swift:483
#9  0x00000000005dd2f2 in _TTWC10SSLService10SSLService6Socket18SSLServiceDelegateS_FS2_8onAcceptfzT6socketCS1_6Socket_T_ ()
#10 0x0000000000597218 in _TFC6Socket6Socket22acceptClientConnectionfzT_S0_ (self=..., $error=...)
    at /root/rendezvous-swift/.build/checkouts/BlueSocket.git--8953993203035625848/Sources/Socket.swift:1464
#11 0x000000000063d0e2 in _TFC9KituraNet10HTTPServerP33_92AA53CEEC3B42FE7AE426A27AF9C2E06listenfT12listenSocketC6Socket6Socket13socketManagerCS_21IncomingSocketManager_T_ (listenSocket=..., socketManager=..., self=...)
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git-375811198261420595/Sources/KituraNet/HTTP/HTTPServer.swift:188
#12 0x000000000063c418 in _TFFC9KituraNet10HTTPServer6listenFzT2onSi_T_U_FT_T_ ()
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git-375811198261420595/Sources/KituraNet/HTTP/HTTPServer.swift:125
#13 0x000000000042bb77 in _TTRXFo___XFdCb___ ()
#14 0x00007ff917709260 in _dispatch_block_async_invoke_and_release () from target:/usr/lib/swift/linux/libdispatch.so
#15 0x00007ff91770d6ed in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#16 0x00007ff91772c9af in worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:506
#17 0x00007ff91553a6ba in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007ff91435982d in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7ff90ab54700 (LWP 2527))]
#0  0x00007ff914359e23 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ff914359e23 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000619be8 in _TFC9KituraNet21IncomingSocketManagerP33_684BF7E5C6D71BC96F4DF81B5FFCF4847processfT15epollDescriptorVs5Int3220runRemoveIdleSocketsSb_T_ (epollDescriptor=..., runRemoveIdleSockets=..., self=...)
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git-375811198261420595/Sources/KituraNet/IncomingSocketManager.swift:148
#2  0x000000000061891c in _TFFC9KituraNet21IncomingSocketManagercFT_S0_U_FT_T_ ()
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git-375811198261420595/Sources/KituraNet/IncomingSocketManager.swift:84
#3  0x000000000042bb77 in _TTRXFo___XFdCb___ ()
#4  0x00007ff9176ff177 in _dispatch_call_block_and_release () from target:/usr/lib/swift/linux/libdispatch.so
#5  0x00007ff91770aedd in _dispatch_queue_serial_drain () from target:/usr/lib/swift/linux/libdispatch.so
#6  0x00007ff91770b890 in _dispatch_queue_invoke () from target:/usr/lib/swift/linux/libdispatch.so
#7  0x00007ff91770d5f4 in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#8  0x00007ff91772ba18 in overcommit_worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:315
#9  0x00007ff91553a6ba in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ff91435982d in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) thread 6
[Switching to thread 6 (Thread 0x7ff90a353700 (LWP 2533))]
#0  0x00007ff914359e23 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ff914359e23 in epoll_wait () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000619be8 in _TFC9KituraNet21IncomingSocketManagerP33_684BF7E5C6D71BC96F4DF81B5FFCF4847processfT15epollDescriptorVs5Int3220runRemoveIdleSocketsSb_T_ (epollDescriptor=..., runRemoveIdleSockets=..., self=...)
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git-375811198261420595/Sources/KituraNet/IncomingSocketManager.swift:148
#2  0x000000000061891c in _TFFC9KituraNet21IncomingSocketManagercFT_S0_U_FT_T_ ()
    at /root/rendezvous-swift/.build/checkouts/Kitura-net.git-375811198261420595/Sources/KituraNet/IncomingSocketManager.swift:84
#3  0x000000000042bb77 in _TTRXFo___XFdCb___ ()
#4  0x00007ff9176ff177 in _dispatch_call_block_and_release () from target:/usr/lib/swift/linux/libdispatch.so
#5  0x00007ff91770aedd in _dispatch_queue_serial_drain () from target:/usr/lib/swift/linux/libdispatch.so
#6  0x00007ff91770b890 in _dispatch_queue_invoke () from target:/usr/lib/swift/linux/libdispatch.so
#7  0x00007ff91770d5f4 in _dispatch_root_queue_drain () from target:/usr/lib/swift/linux/libdispatch.so
#8  0x00007ff91772ba18 in overcommit_worker_main (unused=<optimized out>)
    at /home/buildnode/disk2/workspace/oss-swift-3.1-package-linux-ubuntu-16_04/swift-corelibs-libdispatch/libpwq/src/posix/manager.c:315
#9  0x00007ff91553a6ba in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ff91435982d in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

@djones6
Copy link
Collaborator

djones6 commented Sep 8, 2017

I think I've recreated this.

I believe that it's actually quite a simple problem, but it took the clue in @mman 's previous comment to figure out what's happening: it's not a hang or failure as such, but a weakness in the design of the socket accept loop. An incoming connection that establishes a TCP session but then sends no data will block the server from accepting connections indefinitely.

Steps to recreate:

  • Start the server, make some SSL requests (demonstrate it is functional)
  • Connect a rogue client, eg: telnet localhost 8443 and leave it doing nothing
  • Try to make further requests. The server appears to be unresponsive

If this really is the case, then at the point where the server has hung, you should be able to do a netstat -an | grep 8443 (substitute your server's port) and find a connection open to the server. In the example above, my telnet looks like this:

djones6@needletail:~$ netstat -an | grep -e'Address' -e'8443'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN
tcp        0      0 192.168.1.39:8443       192.168.1.40:40034      ESTABLISHED

If I now try to connect another client (eg. a well-behaved curl request), I'll see another established connection, but curl won't complete:

djones6@needletail:~$ netstat -an | grep -e'Address' -e'8443'
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN
tcp      269      0 192.168.1.39:8443       192.168.1.40:40036      ESTABLISHED
tcp        0      0 192.168.1.39:8443       192.168.1.40:40034      ESTABLISHED

In my case, the 'bad' connection stands out because there is no data on the receive queue (nothing to read). The connection has been established - and accepted - but SSL now expects to be able to perform the initial handshake, with data the client may never send. In contrast, the well-behaved connection has 269 bytes of data waiting.

Regardless of whether this is the same problem, it's definitely a DOS issue which we'll need to address.

@djones6
Copy link
Collaborator

djones6 commented Sep 8, 2017

I believe this is an issue that will need addressing in BlueSSLService, I raised Kitura/BlueSSLService#40

@mman
Copy link
Author

mman commented Sep 8, 2017

@djones6 Correction: I did indeed find the lingering inactive connections to the mentioned port, it was only a bit more complex since my app is running inside of docker conteiners and thus the necessary connections were not visible on the main host but rather inside of the container. But they are there.

I have also reproduced the problem that you described. Using telnet or nc to the listening HTTPS port and doing nothing blocks Kitura completely as you are describing. Sounds like a rather big issue to me. The logic should probably be the traditional accept and fork and then perform the SSL handshake with timeout, since waiting for the (potentially malicious) SSL handshake to timeout will still block the server from accepting new incoming connections during the timeout window.

djones6 added a commit to Kitura/Kitura-net that referenced this issue Sep 14, 2017
djones6 added a commit to Kitura/Kitura-net that referenced this issue Sep 14, 2017
djones6 added a commit to Kitura/BlueSocket that referenced this issue Sep 14, 2017
…ptClientConnection and delegate.onAccept
tunniclm pushed a commit to Kitura/Kitura-net that referenced this issue Oct 9, 2017
@mman
Copy link
Author

mman commented Oct 9, 2017

Thanks guys for fixing this issue. Could you please advise when it will be safe to rebuild my app depending on Kitura using swift build? Looks like the necessary fixes are in in Kitura-net and BlueSSLService master branch already but while BlueSSLService had the new version tagged recently, the Kitura-net will need a new release to get pulled in properly as a Kitura dependency. Correct? Any plan as to when this is planned? Thanks.

@djones6
Copy link
Collaborator

djones6 commented Oct 10, 2017

Apologies for the delay in tagging this fix. It appears to have introduced an intermittent test failure which I am debugging.

@djones6
Copy link
Collaborator

djones6 commented Oct 13, 2017

It turned out that the test for this issue had exposed a problem with the way the OpenSSL error queue was being handled, which I've resolved in BlueSSLService 0.12.62.

@mman Kitura-net has now been tagged with a fix for this issue: 1.7.17

@mman
Copy link
Author

mman commented Oct 13, 2017

So If I now swift build my project against latest stable kitura, it should just work, correct?

thanks for all your help,
Martin

@mman
Copy link
Author

mman commented Oct 13, 2017

@djones6 Answering to myself, yes, simple swift build now fetches all updated dependencies correctly...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants