The commit 2e251ce seems to introduce problems for gunicorn.
If I run gunicorn with more than 1 worker, I get sockets with bad file descriptors and everything seems to fall apart.
Tested with gunicorn 0.13 and 0.17.
Running the commit before that one seems to work fine.
@balboah I had issues with that commit as well, but I thought we had fixed it since. I'll do some testing.
Yeah, multiple workers are broken.
Any updates on this? If anyone has a clue which change could have caused this, I can take a look at it.
Just pinging to say I'm having the same issue here.
@sontek: which commits try to fix this issue?
I seem to have the same problems as well. Websockets sometimes work and sometimes don't; generally speaking, everything is quite buggy. I'd also like to investigate this but don't know where to start.
More generally, how do you recommend running it in production?
I think I have tracked down the issue to these two added lines in socketio.server.SocketIOServer.get_socket():
2e251cec socketio/server.py (Alexandre Bourget 2012-11-15 17:29:02 -0500 131) if sessid and not socket:
2e251cec socketio/server.py (Alexandre Bourget 2012-11-15 17:29:02 -0500 132) return None # you ask for a session that doesn't exist!
To elaborate, it looks like each worker process is given its own SocketIOServer, and each SocketIOServer maintains its own dictionary (self.sockets) mapping sessids to Sockets.
During the connection process, two separate requests are made. First, a request is made to /socket.io/1/ (which I believe is the handshake), followed by another call to something like /socket.io/1/websocket/419838463361 (which actually establishes the socket).
If one worker handles the first request, and then another worker handles the second request, the second request is going to fail because it won't be able to find the sessid saved by the first worker on its instance of SocketIOServer.
It works sporadically because sometimes the same worker happens to handle both requests, so it can find the sessid it saved. Of course, the more workers you have, the less likely that is, which explains the behavior reported in issue #125.
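The failure mode described above can be reproduced with a small simulation. This is a hypothetical model, not gevent-socketio code: each `Worker` stands in for one gunicorn process with its own `sockets` dict, `handshake()` plays the role of `/socket.io/1/`, and `connect()` mimics the `get_socket()` check that returns `None` for an unknown sessid. Requests are assigned round-robin purely to make the result deterministic; in reality the OS decides which worker accepts each connection.

```python
import itertools
import uuid


class Worker:
    """Hypothetical stand-in for one gunicorn worker process."""

    def __init__(self, name):
        self.name = name
        self.sockets = {}  # per-process table: sessid -> socket object

    def handshake(self):
        # /socket.io/1/ : create a session and remember it locally only
        sessid = uuid.uuid4().hex
        self.sockets[sessid] = object()
        return sessid

    def connect(self, sessid):
        # /socket.io/1/websocket/<sessid> : mirrors the two lines from
        # 2e251ce, which refuse a sessid this process has never seen
        socket = self.sockets.get(sessid)
        if sessid and not socket:
            return None
        return socket


def run_clients(n_workers, n_clients):
    """Send each request to the next worker in turn; count successful connects."""
    workers = [Worker(f"w{i}") for i in range(n_workers)]
    next_worker = itertools.cycle(workers)
    ok = 0
    for _ in range(n_clients):
        sessid = next(next_worker).handshake()
        if next(next_worker).connect(sessid) is not None:
            ok += 1
    return ok
```

With strict round-robin every client fails as soon as there are two workers, while a single worker always succeeds. With random assignment the success rate would instead be roughly 1/N for N workers, which matches the sporadic behavior.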
Commenting out those 2 lines of code resolves the issue, but I'm not familiar enough with the project to know whether that's a suitable long-term solution. (It will end up creating a duplicate Socket on the second worker. I haven't seen where they are being cleaned up, but I guess it could cause a leak or be problematic in some other way?)
After reviewing the code in socketio.handler.SocketIOHandler._do_handshake and the socket.io spec, I don't see any reason for the handshake request to actually create a Socket: the only thing it uses the Socket for is socket.sessid, which is a randomly generated number. It seems like you could generate that random number directly for the handshake and wait to create the Socket until the client makes the follow-up 'transport connection' request.
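That suggestion can be sketched as a handshake that only mints a session id and formats the reply, without creating or storing a Socket. The reply layout `sessid:heartbeat_timeout:close_timeout:transports` follows the socket.io 0.9 protocol spec; the function name, its parameters and the use of `secrets` are assumptions for illustration, not gevent-socketio's actual handler.

```python
import secrets


def make_handshake_response(heartbeat_timeout, close_timeout, transports):
    """Hypothetical handshake: mint a random sessid, but create no Socket.

    The Socket would only be created later, when the client makes its
    transport connection request carrying this sessid.
    """
    sessid = str(secrets.randbelow(10**15))  # random numeric session id
    # socket.io 0.9 handshake body: sid:heartbeat:close:transport1,transport2
    body = "%s:%s:%s:%s" % (
        sessid,
        heartbeat_timeout or "",
        close_timeout or "",
        ",".join(transports),
    )
    return sessid, body
```

As with commenting out the check, the worker that handles the transport request must then be willing to create a Socket for a sessid it has never seen before.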
I think that would resolve the connectivity issue for the websocket transport when running multiple workers, but as related issue #112 mentions, it will still be problematic for the polling-based transports.
workaround for #132
Please read the invitation to the sprint on Wednesday, September 18th: https://groups.google.com/forum/#!topic/gevent-socketio/2OIRKA8M2uE
We are facing the same issue. Is there a solution to this problem yet? Has the workaround proven to be sustainable? I am unwilling to just comment out some lines on the production machine.
Commenting that out just means you don't care whether the socket existed before. If you don't save anything in the session, it might not change anything for your use case, but it breaks statefulness. It is quite tricky to handle socket.io requests on two workers: some messages could be pending on worker 1 while you connect to worker 2, potentially relaunching background jobs, etc.
To keep statefulness, ideally one client should always connect to the same worker. Even with front-end load balancers, the same path through the reverse proxies should be kept for the duration of the socket.io connection.
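One common way to keep a client on the same backend is to route by a stable hash of something the client always sends, such as its IP address. Gunicorn's workers all accept from one shared listening socket, so this kind of pinning usually has to happen in front of gunicorn, for example several single-worker processes on separate ports behind a proxy doing IP-hash balancing (nginx's `ip_hash` directive is one example). The function below is a hypothetical illustration of the routing rule, not part of gevent-socketio or gunicorn.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Hypothetical sticky routing: the same client IP always maps to the same backend.

    hashlib is used instead of the built-in hash(), which is salted per
    process for str and would give different answers in different
    proxy processes.
    """
    digest = hashlib.sha1(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

Two trade-offs of this simple modulo scheme: all clients behind one NAT land on the same backend, and changing the number of backends reshuffles existing clients, breaking their sessions.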
It involves devops work, and I'm not sure exactly which part of that problem should be dealt with directly in gevent-socketio.
Any suggestions ?
Our application is not stateful, so no problems there. I installed the latest version of gevent-socketio and applied the patch. Unfortunately, the patch made no difference at all.
Our application (it's a game) is pretty simple: the website regularly polls data on two different paths (at different frequencies). The server pushes changes that occur on the server side to all connected clients. Up to that point we did not notice any problems (but maybe they were already there).
We have now added a simple chat feature. The client sends a message to the server, which broadcasts it to all clients (sender included). The server receives the message from the client without problems, but the broadcast reaches the clients in only about a third of cases.
I am not even sure whether our problem is related to the one outlined in this issue (and the others), but we noticed that it works 100% of the time when we reduce the gunicorn workers to 1.
@abourget Let me know if I can help or test in any way. I can't really suggest a solution, as we are only users and not at all familiar with the architecture of gevent.
NB: Until now we were using the pip release, which did not include the lines in question. I installed the latest version from git, but I am not sure whether this is advisable. Which version should we use on a production system?
About the chat issue:
See, having 2 or more workers means 2 or more processes. Each process is independent and holds its own Python objects. The socket.io lib keeps a list of open sockets within its own process, so if 2 users happen to be connected to this process, "broadcasting" will send messages to those connected locally. It doesn't know about users connected to another worker; nothing is shared between them.
In order to build something that broadcasts to all connected users, you'd need an external system, like Redis PubSub, 0mq or RabbitMQ. Every worker would connect to that central element, subscribe, and publish; it would be the glue between the isolated workers.
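The role of that central element can be shown with an in-process stand-in. `Broker` below is a hypothetical substitute for Redis PubSub, 0mq or RabbitMQ (in production it would be a separate service that every worker connects to); `Worker.local_broadcast` reproduces the per-process behavior just described, while `Worker.global_broadcast` publishes through the broker so that each worker fans the message out to its own clients.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory stand-in for Redis PubSub / 0mq / RabbitMQ."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


class Worker:
    """One worker process: it only knows the clients connected to it."""

    def __init__(self, broker):
        self.inboxes = {}  # client id -> list of messages delivered to it
        self.broker = broker
        broker.subscribe("chat", self.deliver_locally)

    def connect(self, client_id):
        self.inboxes[client_id] = []

    def deliver_locally(self, message):
        for inbox in self.inboxes.values():
            inbox.append(message)

    def local_broadcast(self, message):
        # per-process broadcast: clients on other workers never see it
        self.deliver_locally(message)

    def global_broadcast(self, message):
        # publish once; every subscribed worker delivers to its own clients
        self.broker.publish("chat", message)
```

With Alice connected to one worker and Bob to another, `local_broadcast` reaches only Alice, while `global_broadcast` reaches both.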
Does that make sense ?
OK, that makes a lot of sense. One question though: the sender usually doesn't see his own message. Does that just mean that he switched to another worker in the meantime?
Thanks for the replies, and sorry that this went a bit off-topic.
It depends on which mixin you're using. There are two broadcast functions in BroadcastMixin: one that echoes the message back to the sender and one that doesn't.
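Both variants boil down to a loop over the current process's socket table, which is also why neither can reach clients held by another worker. As far as I recall, in gevent-socketio's BroadcastMixin these are `broadcast_event` and `broadcast_event_not_me`, iterating over `self.socket.server.sockets`. The standalone `broadcast` function and `FakeSocket` class below are hypothetical simplifications, so check the mixin source for your version.

```python
class FakeSocket:
    """Hypothetical socket that just records the packets sent to it."""

    def __init__(self):
        self.sent = []

    def send_packet(self, pkt):
        self.sent.append(pkt)


def broadcast(sockets, me, pkt, include_sender=True):
    """Send pkt to every socket in this process's table.

    include_sender=False skips the sending client, like a "not me" variant.
    Sockets held by other worker processes are never in this table.
    """
    for socket in sockets.values():
        if not include_sender and socket is me:
            continue
        socket.send_packet(pkt)
```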
Thanks for your comments. The nondeterministic behavior was definitely due to some mix-up between worker instances and a lack of basic understanding on our part. We rewrote the code using Redis pub/sub and everything works as expected!
I have the same problem here. I already tried using Redis, but it's still buggy: sometimes messages reach the browser and sometimes they don't. Any updates?
Going back to statelessness and sessions: am I correct to assume that the request is sent only once for socket communication purposes? I am currently using Pyramid with a Redis-backed session. Is that a good way to maintain state? I am using the request.session dict for authentication purposes.
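The general idea behind an externally backed session is that any worker can look up the same state by session key. Below is a minimal sketch with a plain dict standing in for Redis; `SessionStore` and `Worker` are hypothetical names, and this is not Pyramid's or Beaker's API. Sessions are stored as JSON strings to mimic the fact that a real external store holds serialized bytes, not live Python objects.

```python
import json
import secrets


class SessionStore:
    """Hypothetical shared store; a dict stands in for Redis here."""

    def __init__(self):
        self._data = {}  # session key -> JSON-encoded session dict

    def save(self, key, session):
        self._data[key] = json.dumps(session)

    def load(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else {}


class Worker:
    """A worker keeps no session state of its own, only a handle to the store."""

    def __init__(self, store):
        self.store = store

    def login(self, user_id):
        key = secrets.token_hex(16)  # would be sent to the browser as a cookie
        self.store.save(key, {"user_id": user_id})
        return key

    def current_user(self, key):
        return self.store.load(key).get("user_id")
```

This keeps authentication working whichever worker a request lands on, but it only covers data you put in the session; socket.io's own per-connection objects (the sockets table) still live inside one process.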
Workaround for #132
I have the same problem. Is Beaker a decent solution?