
Make the connection more reliable on bad condition networks #96


@awesomebytes (Collaborator, author)

This also adds parameters to tune the timeout used when checking whether the data was sent, and the maximum size of the pending-messages buffer.

With values around 10 on bad networks, the connection dropped much more frequently, which in turn triggered renewConnection much more often.
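The two tunables described above can be sketched roughly like this. This is a minimal illustration, not remi's actual code; the names `SEND_TIMEOUT` and `PENDING_BUFFER_SIZE` are hypothetical, not remi's real parameter names:

```python
import collections

# Hypothetical tunables matching the two parameters described above.
SEND_TIMEOUT = 1.0          # seconds to wait when checking that data was sent
PENDING_BUFFER_SIZE = 1000  # max messages kept while the socket is unusable

# A bounded buffer: once full, the oldest pending message is discarded.
pending = collections.deque(maxlen=PENDING_BUFFER_SIZE)

def queue_message(msg):
    """Buffer a message until the websocket confirms delivery."""
    pending.append(msg)
```

A small `PENDING_BUFFER_SIZE` makes overflow (and therefore visible connection trouble) show up sooner, which matches the observation that values around 10 surfaced drops more often.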

dddomodossola added a commit referencing this pull request on Dec 3, 2015.

@dddomodossola merged commit 68e8670 into rawpython:master on Dec 3, 2015.
@nzjrs (Collaborator) commented Dec 3, 2015

I don't think adding magic tuning parameters here is a good idea.

We either make it reliable or we don't. Asking the user to select a random timeout to maybe fix things is not really sustainable.

@awesomebytes (Collaborator, author)

I didn't change the timeout; I made it configurable. It was hardcoded to 1 s.

The reliability comes from capturing an error I managed to reproduce consistently on a stressed Wi-Fi network, and reconnecting when that error happens.

To reproduce the errors commented in the code relating to error code 1006, I needed to generate 5 Mbps of traffic, use a high-traffic remi app (moving a joystick like crazy), and physically move one of the Wi-Fi-connected devices.

Sorry if my PR/commit was not very clear; I spent too many hours in a row fighting this.
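The capture-and-reconnect approach described above can be sketched as follows. This is a generic illustration under stated assumptions, not remi's actual implementation; `connect` and `send_raw` are stand-ins for the real websocket calls:

```python
import time

class ReconnectingSender:
    """Sketch: try to send; if the send fails (e.g. the socket died with
    close code 1006), renew the connection and retry."""

    def __init__(self, connect, send_raw, max_retries=3, retry_delay=0.0):
        self.connect = connect        # stand-in for renewConnection
        self.send_raw = send_raw      # stand-in for the websocket send
        self.max_retries = max_retries
        self.retry_delay = retry_delay

    def send(self, msg):
        for _attempt in range(self.max_retries):
            try:
                self.send_raw(msg)
                return True
            except OSError:           # the kind of failure seen under stress
                time.sleep(self.retry_delay)
                self.connect()        # renew the connection and retry
        return False
```

The key point is that the error is caught at the send site, so a flaky network triggers a reconnect instead of a silent loss.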

@dddomodossola (Collaborator)

@nzjrs Yeah, I think you are right, but I can't figure out the best way. Should we have a communication buffer, or should we drop pending messages? Should we have a communication timeout value, or should we consider the connection lost when messages stay pending?
I'm sure about only one thing: the communication is unstable on the browser side because of the incomplete WebSocket API. We don't get any feedback about disconnection.
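One common workaround for this missing disconnect feedback is an application-level heartbeat: consider the peer gone if no pong arrives within a timeout. A minimal sketch, generic and not remi code:

```python
import time

class Heartbeat:
    """Treat the peer as disconnected if no pong arrived within `timeout`.
    `now` is injectable so the logic can be tested with a fake clock."""

    def __init__(self, timeout=3.0, now=time.monotonic):
        self.timeout = timeout
        self.now = now
        self.last_pong = self.now()

    def on_pong(self):
        # Called whenever the browser answers a ping.
        self.last_pong = self.now()

    def is_alive(self):
        return (self.now() - self.last_pong) < self.timeout
```

This turns the silent-failure problem into an explicit, testable liveness check, at the cost of a little extra traffic.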

@dddomodossola (Collaborator)

With these last changes I found that the communication is stable and, in case of a fault, it reconnects without problems.
But I have a doubt. At reconnection, all pending messages are sent to the server again, and the program hangs until all of them have been transferred. Furthermore, these messages can be old and may no longer be applicable. I'm not sure about this; maybe it's better to drop all pending messages. What do you think?

@awesomebytes (Collaborator, author)

It depends on the use case; make it a parameter and let the user choose.

The default behaviour is up to you to decide. Since the library is focused on providing a user interface, a fast response is usually preferable, so dropping these messages sounds more appealing.

I don't know whether dropping messages could cause synchronization problems between what the GUI is showing and what the Python server thinks is represented (I don't fully understand the internals of how it works).


@dddomodossola (Collaborator)

Ok. The risk of buffering is that, after losing the connection, the user keeps interacting with the GUI (without knowing the actual server state), and at reconnection the messages sent from the GUI are inconsistent with the server state. I think that in all cases, all pending messages have to be dropped.

Just to explain with an example: @awesomebytes, you are working with a joystick driving a robot. If you lose the connection during interaction, at reconnection the robot will move without control. Is that right?

@awesomebytes (Collaborator, author)

In my example, indeed, I get all the pending messages in a row. However, I have a separate loop that just sends the last command I received at a fixed rate (10 Hz), so I'm not badly affected by this behaviour. But anyway, for this app I would definitely want to get rid of the buffering.
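The fixed-rate "latest command only" pattern described here can be sketched like this; it is a generic illustration under assumed names, not the actual app code:

```python
import threading
import time

class LatestCommandSender:
    """New commands overwrite the previous one, and a separate loop sends
    only the most recent command at a fixed rate, so a burst of replayed
    stale messages has little effect on the robot."""

    def __init__(self, send, rate_hz=10):
        self._send = send
        self._period = 1.0 / rate_hz
        self._latest = None
        self._lock = threading.Lock()

    def update(self, command):
        with self._lock:
            self._latest = command   # older unsent commands are overwritten

    def run_once(self):
        with self._lock:
            cmd = self._latest
        if cmd is not None:
            self._send(cmd)

    def run(self, iterations):
        for _ in range(iterations):
            self.run_once()
            time.sleep(self._period)
```

Because only the newest command survives, a reconnect that delivers many buffered joystick events in a row is harmless: the next tick sends the current command anyway.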


@nzjrs (Collaborator) commented Dec 4, 2015

I'm not suggesting the fix is not an improvement; it is just not sustainable to ask the user to select an arbitrary parameter that changes REMI's behaviour from 'works most of the time' to 'works a little more than most of the time, but certainly not all of the time'.

@dddomodossola (Collaborator)

@nzjrs I think I understand what you're saying, John. What I don't understand is what you suggest we do instead. Maybe a statistical determination of the timeout at runtime?

@nzjrs (Collaborator) commented Dec 4, 2015

Unless there are some reasonable unit tests or ways to reproduce this, I can't really offer a prediction.

I'm also starting to think this homegrown real-time messaging stack is never going to be as well tested as a more robust, established one (Autobahn, maybe?).

@dddomodossola (Collaborator)

Using external communication libraries could be a solution, but you know my aversion to that. I am worried about compromising portability and about the installation overhead. Furthermore, I think this is a conceptual problem rather than an implementation issue, and I am not sure that other libraries would solve it.
However, I will look deeper into this question and try to analyze the impact of using Autobahn or something similar.

@nzjrs (Collaborator) commented Dec 4, 2015

It's a JavaScript library, so we can just ship it with the library; we don't have to use the server-side components if we don't want to.

@awesomebytes (Collaborator, author)

I would prefer not to depend on other "big" libraries like Autobahn, but if that will ensure good communication, I'm all in.

Also, I'm super happy with the current state of the library's communication. (And I'm probably the one testing it in the worst possible conditions.)

@awesomebytes (Collaborator, author)

Didn't know that, sounds nice!

@nzjrs (Collaborator) commented Dec 4, 2015

I did have some other Autobahn-like libraries (the JS side only) bookmarked; I will have to try to find them.

I'm sure there are many other websocket wrappers that do things like message queuing, etc.

@dddomodossola (Collaborator)

Ok, perfect. If it's only JavaScript, we can try to use it! ;-) @nzjrs Thank you for your patience; I know it sometimes seems like I want to reinvent the wheel, but you already know my motivations...

@nzjrs (Collaborator) commented Dec 4, 2015

socket.io is another option.

@nzjrs (Collaborator) commented Dec 4, 2015

It also handles reconnection logic, etc. If you just want the websocket abstraction in socket.io, that is apparently called engine.io.

I misspoke earlier; I think socket.io is the project I was remembering, and the one I'd recommend.

@nzjrs (Collaborator) commented Dec 4, 2015

The pure-Python server side is here: https://github.com/miguelgrinberg/python-socketio (if you want to use it; if not, just copy the code you need here, it's MIT).

@awesomebytes (Collaborator, author)

I took a real quick look at socket.io and it looks great!
I don't know how difficult it would be to integrate, though.

