Radicale repeatedly hanging / locking #266

Closed
TheFabe opened this Issue Feb 28, 2015 · 50 comments

@TheFabe

TheFabe commented Feb 28, 2015

I am running Radicale 0.10, and the radicale process repeatedly hangs after a varying amount of time.
I have clients on iOS and Android (davdroid) and Thunderbird with Lightning accessing it for calendars and addressbooks.
After some time radicale does not react any more and clients report problems when trying to sync.
sudo lsof -c radicale -a -i
lists not only the listening socket on my radicale port, but also a socket in status ESTABLISHED in cases when radicale is "hanging".
pkill radicale
does not help; only
pkill -9 radicale
gets rid of the process.
Because I have enabled port forwarding from my router to my Raspberry Pi, in several cases the foreign IP of that ESTABLISHED socket was probably one of "my" devices from when it was last "in the wild" (my GSM provider's address space). The iOS and Android devices access radicale both from my internal WLAN and from the Internet.

Could the problem be that radicale does not "detect" a client that "just drops dead"?

Is it correct that the "timeout" is disabled in this simple setup using "daemon = True" because of serve_forever?

Can I set a specific timeout at the TCP level or for the SSL negotiation?

radicale is running as "standalone" with daemon = True in the config file and is launched in rc.local.
I use radicale on a Raspberry Pi under Raspbian, and I have not set up a "real" HTTP server like nginx or Apache, to save resources.
I have found the following message http://librelist.com/browser//radicale/2014/3/2/bug-in-radicale-0-8-locks-up/ and I am wondering whether changing to nginx will really help me.
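A minimal sketch of the timeout idea, not Radicale's actual code: with Python's built-in WSGI server, a per-connection socket timeout can be applied in get_request(), so a client that goes silent raises socket.timeout instead of blocking forever. The names and the 30-second value below are illustrative only.

from wsgiref.simple_server import make_server, WSGIServer, WSGIRequestHandler

class TimeoutWSGIServer(WSGIServer):
    request_timeout = 30  # seconds; illustrative value

    def get_request(self):
        # Apply the timeout to every accepted connection so that reads (and an
        # SSL handshake performed on this socket) raise socket.timeout instead
        # of blocking the single-threaded server indefinitely.
        conn, addr = super().get_request()
        conn.settimeout(self.request_timeout)
        return conn, addr

def app(environ, start_response):
    # Stand-in WSGI application; Radicale's own application would go here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

if __name__ == "__main__":
    make_server("", 5232, app, server_class=TimeoutWSGIServer,
                handler_class=WSGIRequestHandler).serve_forever()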

@liZe


Member

liZe commented Mar 2, 2015

I have found the following message http://librelist.com/browser//radicale/2014/3/2/bug-in-radicale-0-8-locks-up/ and I am wondering whether changing to nginx will really help me.

Yes. I've never found out why the Python server hangs, but I'm sure that the problem doesn't exist with a "real" HTTP server. By the way, I'd be really happy to know what the cause of the problem is, and whether there's a way to fix it.

@hieronymousch


hieronymousch commented Mar 5, 2015

Same here when running the Python server. No indication in the log files why it hangs. The PID file remains and the process seems to be running.

@liZe liZe changed the title from radicale repeatedly hanging / locking - no timeout to Radicale repeatedly hanging / locking Mar 5, 2015

@TheFabe


TheFabe commented Mar 6, 2015

My current guess is that the problem is linked to radicale being reachable from the Internet.
I assume that radicale can only handle one request at a time, because it uses a single Python thread to handle requests? Can someone confirm this?

If that is the case, then because radicale uses ssl.wrap_socket and has no timeout, there could be a hang whenever someone (a script kiddie?) opens a TCP session to my radicale and then silently "drops dead" at some stage of the SSL handshake or key exchange; the "simple" ssl.wrap_socket would wait forever for the handshake to complete before ever passing the actual request to radicale.

This could explain why I see an open session to some strange IP in lsof....

Can you comment on this idea?
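A hedged reproduction sketch of this theory, assuming the server wraps accepted sockets with ssl.wrap_socket and no timeout; the hostname and port are placeholders. A client that connects and then stays silent leaves the server blocked waiting for handshake bytes.

import socket

# Open a plain TCP connection to the TLS port and send nothing: no ClientHello,
# no FIN. A single-threaded server that wraps the accepted socket with
# ssl.wrap_socket and no timeout now sits in the handshake read until this
# connection finally goes away.
conn = socket.create_connection(("radicale.example.org", 5232))
input("Holding the connection open; press Enter to release it.")
conn.close()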

@TheFabe


TheFabe commented Mar 6, 2015

I removed the port forwarding from my router two days ago, and my radicale seems more stable. That may be because only "well-behaving" clients now connect to it via the LAN and WLAN.

@hieronymousch: Have you also exposed your radicale to the internet?

@hieronymousch


hieronymousch commented Mar 6, 2015

Hi,

Mine was also exposed to the Internet. I have now put it behind Apache and hope this will make it more stable.


@halo


halo commented Mar 7, 2015

My current guess is that the problem is linked to radicale being reachable from the Internet.

Mine is solely used on the Intranet (without SSL just for testing) and I have the same issue. I'm running it on a Raspberry Pi 2 using Minibian and it happens when I run radicale in the foreground. I will go for nginx now to see if it helps.

@untitaker


Contributor

untitaker commented Mar 7, 2015

I think this is because the server is single-threaded and Radicale hangs itself up on connections that clients keep open (intentionally, for reuse).
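A hedged illustration of that effect, using the stdlib http.server rather than Radicale's handler: disabling keep-alive hands the single worker back after every response, so a client holding its connection open for reuse cannot monopolize the server. The port is a placeholder.

from http.server import HTTPServer, BaseHTTPRequestHandler

class OneShotHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.0"  # no implicit keep-alive

    def do_GET(self):
        self.send_response(200)
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(b"ok\n")
        self.close_connection = True  # hand the single worker back immediately

if __name__ == "__main__":
    # A single-threaded HTTPServer: with keep-alive enabled it would stay inside
    # handle() as long as one client keeps its connection open for reuse.
    HTTPServer(("", 5232), OneShotHandler).serve_forever()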

@deronnax


Contributor

deronnax commented Mar 10, 2015

Radicale behind nginx + uwsgi. No hanging problem.

@svrnwnsch


svrnwnsch commented May 17, 2015

I am having the same problem. How can I stop the radicale daemon safely so that I can restart it every day?

@daks


daks commented May 18, 2015

Radicale behind nginx+gunicorn (managed by supervisord). No problem.
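A hedged sketch of such a setup; the module name radicale_wsgi and the assumption that this Radicale version exposes radicale.Application as its WSGI entry point (as the radicale.wsgi file shipped at the time did) are illustrative only.

# radicale_wsgi.py
import radicale

# WSGI callable for gunicorn; adjust to the entry point your Radicale version provides.
application = radicale.Application()

# Launched with a single worker (e.g. from supervisord), with nginx proxying to it:
#   gunicorn --bind 127.0.0.1:5232 --workers 1 radicale_wsgi:application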

@tmst


tmst commented Jul 13, 2015

I've been seeing this for some time, now. At first I suspected it had to do with the log file growing without bounds, but then I noticed that the server would be unresponsive even with a newly rotated logfile. I'm not sure how the logfile problem might have affected the server, but I do know that it brought the entire system to a halt when the disk space ran out.

Anyway, it would be good to know what's going on here. I'd rather not configure another server (nginx) if I don't have to. If it's the case that a session left open is blocking the server, could it be closed after a certain time? It probably wouldn't work to kill any existing thread when a client tries to connect, as it might kill an active transfer (?) and leave the server or client in an undesirable state.

So the solution would be to close any session left open longer than a certain time, measured from the initial connection time. I can't think of any other way to do that than forking another process that has a handle to the thread, kills it after a certain time, and then exits. Any interesting ideas?
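A hedged sketch of that watchdog idea, run as a separate process rather than a forked thread-killer: probe the server and restart it when a request exceeds a deadline. The URL, deadline, and restart command are placeholders.

import socket
import subprocess
import time
import urllib.error
import urllib.request

URL = "http://localhost:5232/"                   # any URL the server should answer quickly
RESTART = ["systemctl", "restart", "radicale"]   # or your own pkill -9 plus relaunch

def alive():
    try:
        urllib.request.urlopen(URL, timeout=30)
        return True
    except urllib.error.HTTPError:
        return True   # a 401/404 still proves the server answered
    except (urllib.error.URLError, socket.timeout):
        return False  # no answer within the deadline: treat the server as hung

while True:
    if not alive():
        subprocess.call(RESTART)
    time.sleep(300)   # probe every five minutes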

@halo


halo commented Jul 13, 2015

I gather from the comments that nobody using a web server has the problem. But is it consistent behavior that spinning up the Python daemon causes the hang every time with two clients, even without SSL? Does it only happen on Raspberry Pis?

I really wish that we could achieve what the website says:

Works out-of-the-box, no installation nor configuration required

Because it's not everyone's cup of tea to set up uwsgi etc. Especially if people just want to quickly try it out to see how robust it is compared to similar tools written in other languages (which is how I ran into the problem).

EDIT @tmst The disk space issue is known from #106

@tmst


tmst commented Jul 14, 2015

@halo Yes, the dreaded "IOError: [Errno 28] No space left on device". I think I finally figured out how to configure logrotate so it just truncates the logfile in place and doesn't require restarting the server. Something about the restart was wrong as I wasn't sure how to get the sudo command correct.

@chris5560


chris5560 commented Jul 14, 2015

Here is my logging config so Radicale does log rotation by itself:

[loggers]
keys = root

[handlers]
keys = console,file,syslog

[formatters]
keys = simple,full,syslog

[logger_root]
level = DEBUG
handlers = console,file,syslog

[handler_console]
level = WARNING
class = StreamHandler
args = (sys.stdout,)
formatter = simple

# HERE IS THE MAGIC: 3 files of 32k each
[handler_file]
args = ('/var/log/radicale/radicale','a',32768,3)
level = INFO
class = handlers.RotatingFileHandler
formatter = full

[handler_syslog]
level = WARNING
class = handlers.SysLogHandler
args  = ('/dev/log', handlers.SysLogHandler.LOG_DAEMON)
formatter = syslog

[formatter_simple]
format = %(message)s

[formatter_full]
format = %(asctime)s - %(levelname)s: %(message)s

[formatter_syslog]
format = radicale [%(process)d]: %(message)s

Better than using logrotate.
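For reference, fileConfig passes that args tuple straight to the handler class constructor; the equivalent direct call to the stdlib logging.handlers.RotatingFileHandler is:

from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    "/var/log/radicale/radicale",  # filename
    "a",                           # mode: append
    32768,                         # maxBytes: rotate once the file exceeds 32 KiB
    3,                             # backupCount: keep three rotated files
)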

@TheFabe


TheFabe commented Jul 14, 2015

I am quite sure the problem is not really caused by the "not enough space" issue, even though that will also bring processing to a halt. I repeatedly experienced the hanging although I have never had any out-of-space issues.
Since I moved my radicale behind an nginx server I have never had this problem again.
I still restart my radicale every few days, so I can make sure no changes are lost while I make a backup. This stopping and starting has never caused any issues.
The "difficult part" of setting up nginx was creating the "radicale" site in the sites-enabled directory. I created a specific passwd file for it and added several "location" directives so I can set up a different auth_basic "realm" for authentication.

snippet from radicale file in /etc/nginx/sites-enabled
server {
    listen 12345; ## listen for ipv4; this line is default and implied (port I use for clients with SSL)

    ssl on;
    # I share my certificate with my dovecot running in the same raspi host
    ssl_certificate /etc/dovecot/dovecot.pem;
    ssl_certificate_key /etc/dovecot/private/dovecot.pem;

    ssl_session_timeout 5m;

    ssl_protocols SSLv3 TLSv1;
    ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv3:+EXP;
    ssl_prefer_server_ciphers on;

    root /usr/share/nginx/www;
    index index.html index.htm;

    # Make site accessible from http://localhost/
    server_name localhost;

    location / {
            auth_basic "Radicale - Password Required";
            auth_basic_user_file radicale.passwd;
            proxy_pass http://localhost:5432;
            # This is the port radicale will "listen" on internally
    }
    location /user1 {
            auth_basic "Radicale for user1";
            auth_basic_user_file radicale.passwd;
            proxy_pass http://localhost:5432;
    }
    location /user2 {
            auth_basic "Radicale for user2";
            auth_basic_user_file radicale.passwd;
            proxy_pass http://localhost:5232;
    }

}
I only have a handful of users, and one "shared" user with well-known credentials.

@TheFabe TheFabe closed this Jul 14, 2015

@TheFabe TheFabe reopened this Jul 14, 2015

@TheFabe


TheFabe commented Jul 14, 2015

Sorry, closing this was not what I wanted...

@tmst


tmst commented Jul 16, 2015

On Tuesday 14 July 2015, Christian Schoenebeck wrote:

Here is my logging config so Radicale does log rotation by itself:

Interesting. I would have never guessed:
args = ('/var/log/radicale/radicale','a',32768,3)

How did you determine the semantics of that args tuple?

@chris5560


chris5560 commented Jul 16, 2015

@halo


halo commented Jul 16, 2015

Please keep the logfile stuff in the logfile issue; people will look for answers over there, not here :)

So nobody can confirm my questions about a pattern for when the locking occurs? Does it always lock, everywhere, as soon as multiple clients are involved and radicale is not behind a proxy?

If yes, then the built-in Python server is useless (though I suspect more people would have this problem if it always occurred). If no, then I'm wondering whether it could be related to Raspbian/Minibian. Maybe Python is too old there?

(And I don't mean the out-of-disk issue. I would almost expect it to hang in that case, plus it has been confirmed that this is not related to the hanging described in this thread)

@tmst


tmst commented Jul 20, 2015

@halo Sorry. I haven't discovered a pattern for the hanging. I'm connecting from several devices (Android running CalDAV-sync, KDE, and iCal on Mac), both over the local network and the Internet using a no-ip hostname updated by my DSL modem's software.

When it all works, it's sweet. But there are way too many points of potential failure.

@halo


halo commented Jul 20, 2015

Well, I'm currently spawning Radicale with Passenger (nginx) and haven't experienced any hanging yet. But what I'm really after (given we cannot figure out any pattern) is whether the out-of-the-box server works completely problem-free for anyone at all.

If it does, we're just unlucky (and more unlucky people might follow our lead, hah). Plus I'd like to hear one lucky person speak up and confirm that it does work at all times :)

If it does not, one might even consider ripping it out of Radicale (think how simple the configuration would be without TLS) and forcing everyone to set up something like uwsgi to increase general reliability. At the very least, the documentation should make it clear that this is not a working setup.

@liZe


Member

liZe commented Jul 24, 2015

I gather from the comments that nobody using a web server has the problem. But is it consistent behavior that spinning up the Python daemon causes the hang every time with two clients, even without SSL? Does it only happen on Raspberry Pis?

From my long experience using Radicale:

  • The default HTTP server included in Python works until it doesn't. Sometimes it works for months, sometimes it hangs after a couple of minutes. With or without TLS.
  • Using a "real" HTTP server fixes the problem.

A lot of people have been complaining about this Python HTTP server hanging, with Radicale but with other projects too. I tried hard to fix this bug, but unfortunately I didn't even find out how to reproduce it.

I'd be really happy to see this bug fixed, but I personally gave up a long time ago.

I really wish that we could achieve what the website says:

Works out-of-the-box, no installation nor configuration required

Because it's not everyone's cup of tea to set up uwsgi etc. Especially if people just want to quickly try it out to see how robust it is compared to similar tools written in other languages (which is how I ran into the problem).

If anyone needs help, I'll be happy to give hints.

@daks


daks commented Jul 24, 2015

My point of view (but I may be wrong) is that it should be behind a dedicated web server.

I first tested Radicale with the built-in server, with a few clients and configurations. But once I wanted to put it into 'production' (for family use only) I automatically put it behind nginx+gunicorn (as I do for other Python apps like Flask or Django ones).
It's true that I did this because I know how to; that's not the case for every potential Radicale user.

Maybe the documentation should indicate that for testing it's ok to run the process alone but for real use the setup is more complex.
It could also be possible to include some files (wsgi, gunicorn...) to ease deployment.

I can share my setup if needed.

@flusi100


flusi100 commented Feb 28, 2016

I was playing with radicale and put it into production for my family with the default server. I only used it in my private network, so it was not a security issue. After hours of wondering what was going on (the logs don't help here) I found this bug. I regret putting trust in the already mentioned statement on the front page: easy to install. Now I have to bother with nginx and WSGI, which I wanted to avoid for my setup. After some hours of searching the web for a working configuration (on a widespread Ubuntu server) I still have no working setup. Really frustrating. For me it is now getting very complicated and I am thinking of switching to another solution, because bringing up the feature-rich SOGo was easier...

@deronnax


Contributor

deronnax commented Feb 28, 2016

Hi.
I'm not affiliated with Radicale. I am very sorry for what you're experiencing. You can replace Nginx and uwsgi with Gunicorn, which is much, much easier to configure (it almost works out of the box).

@liZe: this hanging problem seems to be a really big one that a lot of people encounter. Maybe the Radicale web page should recommend a default deployment using Gunicorn. What do you think?

@flusi100


flusi100 commented Mar 4, 2016

I really would like to stay with radicale, but this bug makes it hard for me (and most probably for many others). If it cannot be fixed, maybe a readme.txt for typical setups would help. Other projects could serve as a model.

@danmcd


danmcd commented Mar 7, 2016

If you run it on illumos (or even Oracle Solaris), the pstack(1) command is your new best friend ("pstack $(pgrep -f radicale)"). Mine locks up, and it is NOT open to the public Internet. It hangs in read() on a socket. If a device (typically a phone, sometimes a laptop) goes off the net before radicale thinks it's done, it just hangs in read() forever.

I currently hack around this with a cron(1) job that checks pstack every five minutes, and if it's in read(), it kills -9 radicale. I do not run it behind a real web server, but I suspect the read() hang has to do with the Python web server, as others have noted. Next time I get a hit from my cron jobs, I'll share the pstack output here.

@untitaker


Contributor

untitaker commented Mar 7, 2016

What would be interesting is: if Radicale hangs, what happens if you shut down (really, shut down) all possible clients? This would help identify whether Radicale hangs up over a non-existent connection, or whether clients leave one open for further use (which does hang up Radicale and is already a known issue).

@danmcd


danmcd commented Mar 7, 2016

ALL clients? The locks I'm seeing are on a read to a specific open connection. I can imagine shutting down (as in powering off) the client in question will send FIN or RST if its side of the connection exists and is open, but that seems like an unnecessary burden on the client(s).

@untitaker


Contributor

untitaker commented Mar 7, 2016

@danmcd If it does close the connection properly, we can conclude that neither side has bugs:

  • Clients assume the server can handle more than one connection (they usually do) and keep theirs open to speed up further requests.
  • Radicale can only handle one connection at a time. This is necessary right now because handling multiple requests would lead to data races in its internal storage.
@liZe


Member

liZe commented Mar 7, 2016

@deronnax You must not simply replace the built-in server with Gunicorn; that leads to data races! See #276.

That's true: I tried it here, no problem for one year, and then suddenly one calendar disappeared from the disk. When you read the code and see how write works, it's easy to understand how a race condition can happen.

If it does close the connection properly, we can conclude that neither side has bugs.

True. We could set a configurable timeout on the socket as a quick and dirty solution; what do you think?

@danmcd


danmcd commented Mar 7, 2016

A modest timeout on read() would be most useful, I think.

@untitaker


Contributor

untitaker commented Mar 7, 2016

@liZe We could also make Radicale free of data races. E.g. we could have a lock for the whole server and acquire it for each actual request (instead of per open connection, which isn't exposed in WSGI anyway). Then you could use gunicorn to safely work around this issue.

For this, we need https://pypi.python.org/pypi/filelock/ (thread locks will not suffice because gunicorn spawns multiple processes).
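A hedged sketch of this idea, not Radicale code: WSGI middleware that takes a cross-process file lock around every request, so multiple gunicorn workers never touch the storage concurrently. The lock path and the wiring to Radicale's application object are assumptions; this needs the third-party filelock package.

from filelock import FileLock

class SerializeRequests:
    """WSGI middleware: hold a cross-process file lock for every request."""

    def __init__(self, app, lock_path="/var/lock/radicale.lock"):
        self.app = app
        self.lock = FileLock(lock_path)  # works across processes, unlike threading.Lock

    def __call__(self, environ, start_response):
        with self.lock:
            # Consume the response body while the lock is held so that no
            # storage access happens after the lock is released.
            return list(self.app(environ, start_response))

# application = SerializeRequests(radicale.Application())  # hypothetical wiring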

@liZe


Member

liZe commented Mar 7, 2016

@untitaker Yes, I've already seen this project, looks like a very good solution.

By the way, we need a timeout even with a multithread/multiprocess solution.

@untitaker


Contributor

untitaker commented Mar 7, 2016

@liZe Yes, gunicorn already provides all of that. I know you don't want to introduce any dependencies, but I don't think the alternative of vendoring everything is viable.

You could also bundle both filelock and some pure-Python server like e.g. waitress.
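A hedged sketch of serving a WSGI app with waitress, a pure-Python multithreaded server; the stand-in app below would be replaced by Radicale's own WSGI application object.

from waitress import serve

def app(environ, start_response):
    # Stand-in for Radicale's WSGI application object.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

serve(app, host="127.0.0.1", port=5232, threads=4)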

@liZe


Member

liZe commented Mar 7, 2016

@untitaker I'm now open to good ideas, even with dependencies: http://librelist.com/browser//radicale/2015/8/21/radicale-1-0-is-coming-what-s-next/

I'll open a ticket to talk about this.

@liZe


Member

liZe commented Mar 7, 2016

See #364.

@liZe liZe modified the milestone: 2.0 Mar 14, 2016

@liZe liZe referenced this issue Apr 4, 2016

Closed

Follow roadmap for 2.0 #372

36 of 39 tasks complete
@eatdust


eatdust commented May 19, 2016

Hi there,
radicale 1.1.1 on Mageia 5. It hangs at least once every 24 hours; I am considering changing to another CalDAV server, as this makes it unusable for me professionally.
Hope I can help.
cheers.

@liZe


Member

liZe commented May 19, 2016

@eatdust As explained in this ticket, using the single-threaded server built into Python will always lead to hangs, AFAIK. The only reliable solution for now is to put Radicale behind a "real" HTTP server (I use nginx, but it works with all WSGI-compatible servers, including the Python-based ones).

@tmst


tmst commented May 20, 2016

I think you intended to put a link to the thread in here. Yes?

Say, I wonder how many of us are using Radicale behind a KDE client.
I've had considerably less (or no) trouble after switching to
Thunderbird. I'm still using Radicale 0.7.


@untitaker


Contributor

untitaker commented May 20, 2016

It is the thread you are replying to.


@eatdust


eatdust commented May 20, 2016

@liZe Got it.
Indeed, I am now running it behind nginx with just proxying and it seems to work fine. Sorry about this; maybe it should be explained in the docs. As this was my first attempt to use radicale, I just followed the docs to set up the standalone Python server. Maybe the docs should instead have a tutorial that points new users toward setting up radicale behind an HTTP server.
Thanks.

This was referenced Oct 13, 2016

@liZe


Member

liZe commented Mar 4, 2017

Now that Radicale is data-race free, I think that we can safely close this issue. See #197 and #372 for the documentation part.

@tmst


tmst commented Apr 20, 2017

@a3nm


a3nm commented Apr 21, 2017

Some more feedback about this issue: I worked around the problem by running Radicale behind Apache2 as a WSGI application. There don't seem to be any more problems when doing this.

@tmst


tmst commented Apr 26, 2017
