How many websocket connections can one daphne handle at the same time #67

Closed · ostcar opened this issue Jan 14, 2017 · 6 comments

ostcar commented Jan 14, 2017

I am experiencing strange problems when I open a lot of websocket connections to one daphne instance at (nearly) the same time.

I wrote this Go program to connect a specific number of websocket clients to daphne. Then I started the liveblog example with python manage.py runserver, created a blog and pointed the Go script at it:

./testWsConnections -url ws://localhost:8000/liveblog/foo/stream/ --clients 400

Of course, there are a lot of 503 responses (channel layer full). In this case the program ignores the error and retries the connection after 100ms. But there are also a lot of cases in which the connection is closed without any response at all. The program counts these cases and likewise retries after 100ms. In the end, all clients are connected and all of them receive messages when the liveblog changes.
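
The actual tool is written in Go; the logic is roughly the following (a sketch in Python using the websockets library, so the exception classes and names are illustrative, not the real code):

```python
import asyncio
import websockets
from websockets.exceptions import InvalidHandshake

URL = "ws://localhost:8000/liveblog/foo/stream/"

async def connect_one(stats):
    # Retry until the handshake succeeds, as the Go tool does.
    while True:
        try:
            ws = await websockets.connect(URL)
            stats["connected"] += 1
            return ws  # keep the socket open to receive liveblog messages
        except InvalidHandshake:
            # Rejected with an HTTP response, e.g. 503 "channel layer full".
            stats["rejected"] += 1
        except OSError:
            # TCP-level failure ("connection reset by peer"): the
            # connection was closed without any response.
            stats["lost"] += 1
        await asyncio.sleep(0.1)  # retry after 100ms

async def main(clients=400):
    stats = {"connected": 0, "rejected": 0, "lost": 0}
    await asyncio.gather(*(connect_one(stats) for _ in range(clients)))
    print(stats)

asyncio.run(main())
```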

The numbers scale badly:

- 200 clients: connecting takes around 2 seconds and no connections are lost.
- 400 clients: around 5 seconds, only a few lost connections.
- 600 clients: around 10 seconds, around 70 lost connections.
- 800 clients: around 60 seconds, around 200 lost connections.

Is it a bug that daphne closes a websocket handshake without any response? And why does the time to connect the clients increase exponentially?

The lost connections only happen when there are a lot of unhandled websocket handshakes at the same time. When I connect 100 clients at once, wait until all of them are connected and only then connect the next 100, it takes just 14 seconds until all 800 clients are connected and no connections are lost.
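
Reusing connect_one from the sketch above, that batched approach looks like this:

```python
async def main_batched(clients=800, batch=100):
    stats = {"connected": 0, "rejected": 0, "lost": 0}
    for _ in range(clients // batch):
        # Let each batch of handshakes finish completely before
        # starting the next one.
        await asyncio.gather(*(connect_one(stats) for _ in range(batch)))
    print(stats)
```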

@proofit404 (Member)

I guess this is a result of the connection handling mechanics. A websocket handshake stays in the open state until the websocket.connect consumer returns {'accept': True} or a similar result. If the websocket.connect channel is full, the incoming connection is closed with a 503 error code. I don't know why the time to connect increases exponentially, but the default channel capacity is 100, which would explain the last situation. Try to increase the websocket.connect capacity with the channel_capacity argument of the channel layer.
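
With the Channels 1.x Redis backend, that would look something like this in settings.py (the capacity value and the ROUTING path here are placeholders, not recommendations):

```python
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgi_redis.RedisChannelLayer",
        "CONFIG": {
            "hosts": ["redis://localhost:6379"],
            # Raise websocket.connect above the default capacity of 100.
            "channel_capacity": {
                "websocket.connect": 1000,
            },
        },
        "ROUTING": "liveblog.routing.channel_routing",
    },
}
```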

@andrewgodwin (Member)

Daphne shouldn't be closing connections without any response - is there any console logging from it when this is happening? Do you know what WebSocket close code you get in the browser?

ostcar (Author) commented Jan 16, 2017

There is no console logging at all (except for the CONNECT, HANDSHAKING and DISCONNECT messages). I also don't get any WebSocket close code. It seems as if the TCP socket is closed before the websocket handshake is finished. The Go error message is: connection reset by peer

I ran my tests again and got much better results (but there are still lost connections). It "feels" like it gets better when I flush the redis db, but I am not sure. I also get the same errors when I switch to the asgi in-memory backend.

I looked into the redis db and it seems that there are as many keys of the form "asgi:websocket.send!PVPpOgmJFeV" as there are lost connections.
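
They can be counted with something like this (a sketch using the Python redis client, assuming the channel layer uses the default localhost instance):

```python
import redis

r = redis.Redis()  # default localhost:6379, db 0
# Each lost connection seems to leave one per-connection send channel behind.
print(len(r.keys("asgi:websocket.send!*")))
```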

I tried for some time to reproduce the error with only a few clients by altering some parts of daphne or the asgi_redis backend, but could not manage it.

I have no more ideas how to debug this.

@andrewgodwin (Member)

No error at all is unexpected; even if it errors out somewhere in the Python processing you should get a Twisted traceback at minimum. I'm not sure there's much I can do at this point, but any more information might help!

ostcar (Author) commented Jan 16, 2017

You can get the same behaviour when you lower the open file limit for daphne (with ulimit -n 20). Then you also get closed connections without any error message or traceback.

But that is not the same problem as in this issue. With this issue, all connections are open in the end. With a low open file limit, there can never be more than $(ulimit -n) - 10 open connections.
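
If it is easier to script, the same limit can also be set from Python before starting daphne in-process (a sketch using the stdlib resource module):

```python
import resource

# Equivalent of `ulimit -n 20`: cap the soft and hard open-file
# limits for this process and its children. Illustration only.
resource.setrlimit(resource.RLIMIT_NOFILE, (20, 20))
```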

@andrewgodwin (Member)

The entire websocket handling code has been rewritten to not be as nasty, and I haven't been able to replicate this, so I'm closing.
