openssl ssl_accept sometimes does not return, causes server to hang permanently #8108

rdp · 2019-08-22T07:07:02Z

Requisite code sample:

require "socket"
require "openssl"

tcp_server = TCPServer.new(8081)

ssl_context = OpenSSL::SSL::Context::Server.new
ssl_context.certificate_chain = "cert.pem"
ssl_context.private_key = "_key.pem"
ssl_server = OpenSSL::SSL::Server.new(tcp_server, ssl_context)

puts "SSL Server listening on #{tcp_server.local_address}"
while true
    begin
      connection = ssl_server.accept
      connection.close rescue nil
    rescue ex
       print "unable to accept", ex
    end
end

I discovered that my server "in production", with similar code underneath, will sometimes hang and stop accepting connections seemingly randomly (incoming connections enter SYN_RECV [TCP backlog]) but no new connections are accepted by the crystal app.

My hypothesis from adding some puts calls is that it enters the call to SSL_Accept and never returns. Turns out there there are several reads and writes within SSL_Accept so if you have a "rogue connection" (or malicious?) that does the TLS handshake "half way" (or even, just connects and does nothing), SSL_Accept won't exit (it's waiting on reads that never come). since the servers are single threaded through the "wrapped accept" portion, this effectively hangs the server. This affects the HTTP server if it uses SSL, etc. HTTP works fine.

Seemingly related discussion here: https://openssl-users.openssl.narkive.com/5QNG9OWX/ssl-accept-hang

AFAICT you can repro it by running the snippet above and connecting and sending nothing, ex: this:

require "socket"
TCPSocket.open("localhost", 8081) { |socket|
  puts "connected #{socket}, sleeping"
  sleep
}

After running both then all future connections to the server "aren't accepted", ex: openssl s_client -connect localhost:8081 works before you run that, and afterward hangs on connect.

crystal 0.30.1

Ubuntu 19,04

The text was updated successfully, but these errors were encountered:

rdp · 2019-08-22T07:08:20Z

current workaround, in case interesting:

diff --git a/src/openssl/ssl/server.cr b/src/openssl/ssl/server.cr
index b31e53090..1280bc786 100644
--- a/src/openssl/ssl/server.cr
+++ b/src/openssl/ssl/server.cr
@@ -60,7 +60,10 @@ class OpenSSL::SSL::Server
   #
   # This method calls `@wrapped.accept` and wraps the resulting IO in a SSL socket (`OpenSSL::SSL::Socket::Server`) with `context` configu
ration.
   def accept : OpenSSL::SSL::Socket::Server
-    OpenSSL::SSL::Socket::Server.new(@wrapped.accept, @context, sync_close: @sync_close)
+    socket = @wrapped.accept
+    socket.as(TCPSocket).read_timeout = 60.seconds
+    socket.as(TCPSocket).write_timeout = 60.seconds
+    OpenSSL::SSL::Socket::Server.new(socket, @context, sync_close: @sync_close)
   end
 
   # Implements `::Socket::Server#accept?`.
@@ -68,6 +71,9 @@ class OpenSSL::SSL::Server
   # This method calls `@wrapped.accept?` and wraps the resulting IO in a SSL socket (`OpenSSL::SSL::Socket::Server`) with `context` configuration.
   def accept? : OpenSSL::SSL::Socket::Server?
     if socket = @wrapped.accept?
+      socket.as(TCPSocket).read_timeout = 60.seconds
+      socket.as(TCPSocket).write_timeout = 60.seconds
+
       OpenSSL::SSL::Socket::Server.new(socket, @context, sync_close: @sync_close)
     end
   end

There are some other questions at play, like the fact that it can only accept one client "at a time" so if one client has a slow TLS handshake, the others have to wait for that handshake to fully complete before being accepted (sometimes takes seconds)?

RX14 · 2019-09-05T21:32:51Z

This is actually a case of over-abstraction harming us

SSL_Accept goes through a BIO which uses normal crystal IO. So if the SSL::*::Socket.new is called in a fiber per-request, then I'm sure that a single slow client could not block the server, since other fibers could continue working - just like normal crystal IO.

The problem is that the ssl_server.accept is called from the server fiber, not the per-request fiber. Since we have to accept the socket before having a connection to spawn the fiber.

So the only correct way to have a SSL::Server is to accept a TCP socket, spawn a fiber for the connection, then handle TLS.

There's no way to do that with a SSL::Server API which is compatible with TCPServer.

jhass · 2019-09-05T22:39:59Z

Otoh I guess we don't want to accept an infinite amount of hanging clients anyway, so we don't exhaust our FDs or something else. So the solution seems two fold, first we want to make sure to not hang indefinitely by making sure the IO timeout is set and works. Second, can we just spawn a pool of fibers to accept and send the socket back through a channel?

clients = Channel(IO).new(MAX_PENDING_CONNECTIONS)
MAX_PENDING_CONNECTIONS.times do
  spawn do
    loop do
      channel.send accept
    end
  end
end

loop do
  spawn handle_client(clients.receive)
end

Idk, seems too easy, I'm probably missing something.

RX14 · 2019-09-16T13:04:41Z

@jhass the number of pending clients can be controlled through read_timeout and write_timeout, if these are set before passing the socket to SSL::*::Socket.new.

I don't like the idea of adding all these fibers and complexity just to maintain the API style. To me, it seems much simpler to embrace the fact that SSL::*::Socket.new is a blocking call and remove SSL::Server.

rdp · 2020-02-21T13:58:10Z

unrelated: Crystal maybe should set so_keepalive by default so that people who "forget" to set a read timeout (and the connection is lost somehow) don't accidentally orphan fibers...forever...just an idea.

This was referenced Aug 22, 2019

Server automatic shutdown kemalcr/kemal#541

Closed

clear OpenSSL thread error queue each time, might help with certain f… #8103

Closed

jhass added kind:bug topic:stdlib topic:stdlib:networking labels Aug 22, 2019

rdp mentioned this issue Nov 12, 2019

Need to prevent OpenSSL HTTP::Server Slow read Attack #7794

Open

rdp mentioned this issue Dec 28, 2019

Some OpenSSL connections will cause Other OpenSSL connections (hangs | waiting)? #8626

Closed

straight-shoota mentioned this issue Apr 11, 2020

Add Log handler for HTTP::Server #9034

Merged

waj mentioned this issue Apr 26, 2020

HTTP server do SSL handshake in client fiber #9177

Merged

waj closed this as completed in #9177 Apr 27, 2020

atlantis mentioned this issue Jan 13, 2023

Add read_timeout to TCPSocket? spider-gazelle/crystal-mqtt#8

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

openssl ssl_accept sometimes does not return, causes server to hang permanently #8108

openssl ssl_accept sometimes does not return, causes server to hang permanently #8108

rdp commented Aug 22, 2019 •

edited by jhass

Loading

rdp commented Aug 22, 2019 •

edited by jhass

Loading

RX14 commented Sep 5, 2019

jhass commented Sep 5, 2019 •

edited

Loading

RX14 commented Sep 16, 2019

rdp commented Feb 21, 2020

openssl ssl_accept sometimes does not return, causes server to hang permanently #8108

openssl ssl_accept sometimes does not return, causes server to hang permanently #8108

Comments

rdp commented Aug 22, 2019 • edited by jhass Loading

rdp commented Aug 22, 2019 • edited by jhass Loading

RX14 commented Sep 5, 2019

jhass commented Sep 5, 2019 • edited Loading

RX14 commented Sep 16, 2019

rdp commented Feb 21, 2020

rdp commented Aug 22, 2019 •

edited by jhass

Loading

rdp commented Aug 22, 2019 •

edited by jhass

Loading

jhass commented Sep 5, 2019 •

edited

Loading