
Cherokee fails to handle additional connections after a time #952

Closed
davidjb opened this Issue · 9 comments

3 participants

David Beitey Stefan de Konink Daniel Niccoli
David Beitey

My installation of Cherokee has been working fine, but within the last few days it has started failing to process incoming connections after some amount of time. The seemingly relevant part of the error log looks like this:

...
[02/04/2013 20:41:27.603] (error) cryptor_libssl.c:856 - SSL_write: unknown
    errno: Connection timed out | The issue seems to be related to your system.

[02/04/2013 20:41:39.565] (error) cryptor_libssl.c:856 - SSL_write: unknown
    errno: Connection timed out | The issue seems to be related to your system.

[02/04/2013 20:42:06.569] (error) cryptor_libssl.c:856 - SSL_write: unknown
    errno: Connection timed out | The issue seems to be related to your system.

As best I can ascertain, on or shortly after this last message is logged, Cherokee cannot process any further connections. When this happened during testing yesterday, new connections simply hung until they timed out. Interestingly, Cherokee still answers requests for URLs that should be reverse proxied, returning a 504 Gateway Timeout error.

After the above happens, the only way to get the server back online is to forcibly kill the process (e.g. kill -9) and restart it.
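One way to tell this kind of hang (connections accepted but never answered) apart from a 504 for proxied paths is a small probe with a hard timeout. Below is a minimal C sketch. It assumes the server also listens on a plain-HTTP IPv4 address and port, which an HTTPS-only setup would not do without OpenSSL. probe_http and parse_status_code are hypothetical helpers, not part of Cherokee. The probe returns the HTTP status code, or -1 if connecting, sending or waiting for a reply fails or exceeds the timeout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0 /* not available on every platform */
#endif

/* Extract the status code from a line like "HTTP/1.1 504 Gateway Timeout".
 * Returns -1 if the text does not start with an HTTP status line. */
static int parse_status_code(const char *resp)
{
    int major, minor, code;
    if (sscanf(resp, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* poll() a single descriptor; returns >0 if ready, 0 on timeout, <0 on error. */
static int wait_fd(int fd, short events, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Hypothetical health probe: connect to ipv4:port, send a HEAD request and
 * return the HTTP status code, or -1 if any step fails or exceeds timeout_ms. */
static int probe_http(const char *ipv4, int port, int timeout_ms)
{
    static const char req[] = "HEAD / HTTP/1.0\r\n\r\n";
    struct sockaddr_in addr;
    char buf[256];
    ssize_t n;
    int fd, err = 0, result = -1;
    socklen_t len = sizeof err;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;

    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    /* Non-blocking, so a wedged listener cannot stall the probe itself. */
    if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK) < 0)
        goto out;

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (errno != EINPROGRESS || wait_fd(fd, POLLOUT, timeout_ms) <= 0)
            goto out;
        /* The connect finished; check whether it actually succeeded. */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
            goto out;
    }

    if (send(fd, req, sizeof req - 1, MSG_NOSIGNAL) != (ssize_t)(sizeof req - 1))
        goto out;

    /* The kernel may complete the TCP handshake even if the server is wedged,
     * so waiting for the reply is where a hang shows up. */
    if (wait_fd(fd, POLLIN, timeout_ms) <= 0)
        goto out;
    n = recv(fd, buf, sizeof buf - 1, 0);
    if (n <= 0)
        goto out;
    buf[n] = '\0';
    result = parse_status_code(buf);

out:
    close(fd);
    return result;
}
```

A cron job could call probe_http("127.0.0.1", 80, 5000) and restart Cherokee only when it returns -1, rather than restarting blindly on a schedule. The sketch reads a single recv() chunk, which is enough for a status line in practice but is not a full HTTP parser.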

Technical details about this Cherokee build are as follows. This is on a RHEL 6.4 machine. I'm aware the version was compiled some time ago (from the GitHub master at the time); I'm loath to update the server continually since it is in production and this version has been fine until now. However, if there's a suggestion that a recent change may help fix this issue, I'll update accordingly.

Compilation
 Version: 1.2.102
 Compiled on: Sep 21 2012 18:53:23
 Arguments to configure:  '--host=x86_64-unknown-linux-gnu' '--build=x86_64-unknown-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-wwwroot=/var/www/cherokee' '--with-libssl' '--disable-static' 'build_alias=x86_64-unknown-linux-gnu' 'host_alias=x86_64-unknown-linux-gnu' 'CFLAGS=-O2 -g'

Installation
 Deps dir: /usr/share/cherokee/deps
 Data dir: /usr/share/cherokee
 Icons dir: /usr/share/cherokee/icons
 Themes dir: /usr/share/cherokee/themes
 Plug-in dir: /usr/lib64/cherokee
 Temporal dir: /tmp

Plug-ins
 Built-in: 

Support
 IPv6: yes
 Pthreads: yes
 Tracing: no
 sendfile(): yes
 syslog(): yes
 Polling methods: epoll poll select 
 SSL/TLS: libssl
 TLS SNI: yes
Stefan de Konink
Collaborator
David Beitey

Hi Stefan, no it wouldn't contain this. Seems like that could well be a fix. Is there some test associated with that original bug that I can hit my server with to see if it causes the hang? At present, the hanging seems random and intermittent and only just started 5 days ago -- no issues up until then.

David Beitey

Actually, I misspoke. Yes, my version does include that fix. That change was on 24 June 2012 and my version was built using the latest master in September.

David Beitey

Some more information: not very helpful, but perhaps relevant all the same. Cherokee continues to hang every few hours or so, with no discernible increase in memory or CPU use when it does. Oddly, it hung at almost exactly 2am last night, despite my putting a cron workaround in place to restart the server every 30 minutes. This seems to indicate that Cherokee isn't running out of resources or workers; perhaps some sort of request causes the server to lock up. In addition, the error log I mentioned above seems not to be relevant: there were no error log entries within hours of the hang, and the same SSL write errors haven't happened since.

Looking back through the logs, I have replayed - to the best of my ability - all of the final requests before the server hung, and all of them complete without issue.

So, there is seemingly nothing to go on as to why this is happening. I've also since restarted the entire server in the hope this might resolve it; waiting to see what happens.

Stefan de Konink
Collaborator

Could you:
1) update to the latest master, configured with CFLAGS="-O0 -ggdb" and --enable-trace --enable-back-trace, and check whether it occurs at the same rate?
2) find a way to reproduce it, or attach gdb to the running process and see in which part of the code it happens? (i.e. a bt for each individual thread)

I'm currently running a website over HTTPS that handles many requests per second, so I know Cherokee is capable of it. But I am obviously eager to help you find your problem.

David Beitey

I think the issue may have had something to do with DNS: our network's DNS had been playing up around the time the issue was happening. All the backend sources configured in the web server use host names rather than IPs, so perhaps that was it. As previously mentioned, the original SSL errors in the logs were a red herring. Either way, it's no longer reproducible.
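To test the DNS theory, one could time how long each backend hostname takes to resolve on the server itself. Below is a minimal C sketch using POSIX getaddrinfo(). resolve_ms is a hypothetical helper. The sketch only measures lookup time. It does not show how Cherokee performs its own lookups, and the link between slow DNS and the hang remains the hypothesis from the comment above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: resolve `host` with the system resolver and return how
 * long the lookup took in milliseconds, or -1 if resolution failed.
 * getaddrinfo() blocks the calling thread for the whole lookup, which is why a
 * misbehaving DNS server can turn into long stalls. */
static long resolve_ms(const char *host)
{
    struct addrinfo hints, *res = NULL;
    struct timespec start, end;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* backends are reached over TCP */

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return -1;
    }
    freeaddrinfo(res);
    return (long)(end.tv_sec - start.tv_sec) * 1000L
         + (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

Calling resolve_ms() periodically for each backend hostname in the Cherokee configuration and logging the results would show whether lookups that take seconds, or fail outright, line up with the times the server hangs.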

davidjb closed this
Stefan de Konink
Collaborator

I do consider this a big relief.

David Beitey

Well, perhaps. If Cherokee's internals happen to be choking on something like a failed DNS request and not recovering down the track, then it could be a real problem. Worth keeping in mind in case you see something similar in the future.

Daniel Niccoli
Collaborator

Removed label t:bug and added t:cannot-reproduce.
