Memory exhaustion on server with high rate large blob data #212

Closed
jihoonl opened this issue Feb 22, 2016 · 9 comments


@jihoonl
Member

jihoonl commented Feb 22, 2016

This is a follow-up issue to #192.

Questions

  • Should rosbridge have intelligence or configuration to adapt to slow and fast clients seamlessly?
  • The rosbridge server will suffer memory exhaustion if a slow client subscribes to high-rate, large-chunk data anyway. How should rosbridge react to this memory exhaustion?
@mvollrath
Contributor

Adaptation sounds complicated, but it seems like an additional per-client queue is needed in the outgoing implementation to prevent clients from killing the server with greedy subscriptions. This queue would be configured by the server's parameters, outside of the protocol's throttle and queue options. There should probably be a warning message the first time a message is dropped from a client's queue.

Additionally, a status message (as described in the protocol) should be sent to warn the client that they are dropping messages and need to add throttling.
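
A rough sketch of such a per-client queue, using a bounded deque that evicts the oldest message when full (all names here are illustrative, not rosbridge's actual API):

    import threading
    from collections import deque

    class ClientQueue(object):
        """Bounded outgoing queue for one client; drops the oldest message when full."""

        def __init__(self, maxlen=10):
            self.queue = deque(maxlen=maxlen)
            self.lock = threading.Lock()
            self.warned = False

        def push(self, message):
            with self.lock:
                if len(self.queue) == self.queue.maxlen and not self.warned:
                    # Warn once, the first time a message is dropped.
                    print('warning: dropping messages for slow client')
                    self.warned = True
                self.queue.append(message)  # deque(maxlen=...) evicts the oldest

        def pop(self):
            with self.lock:
                return self.queue.popleft() if self.queue else None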

@sevenbitbyte
Member

sevenbitbyte commented Feb 24, 2017

Should rosbridge have intelligence or configuration to adapt to slow and fast clients seamlessly?

Yes, rosbridge should serve all clients as well as the available resources allow. I think dynamically throttling a slow client may be a good direction to go in, if it can be done in a way that also detects when the client's connection becomes faster.

The rosbridge server will suffer memory exhaustion if a slow client subscribes to high-rate, large-chunk data anyway. How should rosbridge react to this memory exhaustion?

A client should not be able to impact the server's performance so easily. Server development should expect clients not to be well behaved, and expect that they are not ideally optimized. If greater server stability means dropping packets in new ways, then by all means please do. The protocol does appear to have all the features needed to inform the client that they are being throttled, by way of the status messages.

The incentive for app developers is to optimize their clients (use throttling) so that users get the expected performance, but when writing apps I don't always know a priori the exact network conditions my apps will be used in. So it's just not always possible to get throttling settings right for all users on all networks. Getting hints from the server about performance would make it easier from the app-development perspective.

@mvollrath
Contributor

The rosbridge_library protocol has a throttle and queue system, but since it dumps messages into the Tornado queue asynchronously, the backpressure from the client never makes it back to the protocol's queue. Instead, Tornado queues the backlog without bound.

If we extend the RosbridgeWebSocket handler to hand off outgoing messages to an intermediate per-connection queue instead of pushing straight to Tornado, we can make synchronous Tornado writes from per-connection threads. This way we gain control of the backpressure on the server side and clients don't slow each other down.
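
Roughly, the per-connection writer could look like this (a sketch; `outgoing`, `handler.connected`, and `write_message_sync` are all hypothetical stand-ins, not existing rosbridge or Tornado API):

    import threading
    try:
        import queue            # Python 3
    except ImportError:
        import Queue as queue   # Python 2

    def writer_loop(handler, outgoing):
        """Per-connection thread: drain this client's queue with blocking writes."""
        while handler.connected:
            message = outgoing.get()  # blocks until a message is queued
            # A synchronous write means a slow client only stalls its own
            # thread and queue, not the subscriber thread or other clients.
            handler.write_message_sync(message)

    # One bounded queue and one writer thread per connection:
    # outgoing = queue.Queue(maxsize=10)
    # threading.Thread(target=writer_loop, args=(handler, outgoing)).start()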

@viktorku
Member

viktorku commented Nov 7, 2018

This backpressure in the server from slow clients can be solved with a simple circular buffer which drops stale messages after a certain cap limit. Wouldn't that be the simplest to implement? Does Tornado offer this out of the box in some way?
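
As far as I know Tornado doesn't hand us that directly, but Python's collections.deque with a maxlen already behaves as exactly that kind of circular buffer:

    from collections import deque

    buf = deque(maxlen=100)  # cap limit of 100 messages

    for i in range(150):
        buf.append(i)  # once full, each append silently evicts the oldest entry

    print(len(buf))  # 100
    print(buf[0])    # 50 -- the 50 stalest messages were dropped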

@mvollrath
Contributor

Tornado gives us a Future when we write to the socket, so we can either ignore it (as we do currently) or wait for it to finish to effectively make the write synchronous. Making the write synchronous would be the simplest way to connect the backpressure.

However, since all the web socket writes happen sequentially in the subscriber thread upon receiving a message, the last client would be waiting for the synchronous writes of all the other clients before it. This is why we would need a thread per connection, to decouple the clients' synchronous writes and give them their own queues.
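
For example, a per-connection thread could block on the write Future like this (a sketch against Tornado 4.x; it must be called from a thread other than the IOLoop's, or it will deadlock):

    import threading
    from tornado.ioloop import IOLoop

    def write_sync(handler, message, binary=False):
        """Block the calling (non-IOLoop) thread until Tornado finishes the write."""
        done = threading.Event()

        def start_write():
            future = handler.write_message(message, binary)
            # Unblock the caller once Tornado resolves the write Future.
            future.add_done_callback(lambda f: done.set())

        IOLoop.instance().add_callback(start_write)
        done.wait()  # backpressure: the caller can't send another message yet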

@mvollrath
Contributor

Or better yet, decouple the threads further up the chain where MultiSubscriber is iterating over the callbacks. This way the queue handler works as intended and we aren't adding a queue after a queue.
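
Something like this at the fan-out point (illustrative names only; `handle_message` stands in for the existing queue-handler interface):

    def fanout(message, client_handlers):
        """Hand the message to each client's own queue handler instead of
        writing inline, so one slow client can't stall the others."""
        for handler in client_handlers:
            handler.handle_message(message)  # returns immediately; the
                                             # handler's thread drains its queue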

@mvollrath
Contributor

I did some more looking around and the answer might be simpler than I thought.

Adding a small blurb to RosbridgeWebSocket.send_message lets us identify which thread is sending the message to Tornado:

    def send_message(self, message):
        """..."""
        import threading
        from functools import partial
        from tornado.ioloop import IOLoop
        # Debug: report which thread hands this message off to Tornado.
        print('writing from {}'.format(threading.current_thread()))
        # `binary` is determined elsewhere in the real handler.
        IOLoop.instance().add_callback(partial(self.write_message, message, binary))

Now the results depend completely on the settings the client used to subscribe to the topic. I tested two clients running identical applications subscribed to the same topic.

With queue_length unset or set to 0, both send from the subscriber thread:

writing from <Thread(/bench/pcl, started daemon 140434852521728)>
writing from <Thread(/bench/pcl, started daemon 140434852521728)>

With queue_length set to any value > 0, the sends are decoupled by QueueMessageHandler:

writing from <QueueMessageHandler(Thread-7, started daemon 140427042748160)>
writing from <QueueMessageHandler(Thread-15, started daemon 140427034355456)>

So if we can figure out how to make writes synchronous and default queue_length to some non-zero value, we hook up backpressure to the QueueMessageHandler like users would expect.
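
For reference, queue_length is set per subscription in the protocol's subscribe op, e.g.:

    import json

    subscribe_msg = json.dumps({
        'op': 'subscribe',
        'topic': '/bench/pcl',
        'queue_length': 1,   # > 0 routes sends through QueueMessageHandler
        'throttle_rate': 0,  # minimum milliseconds between messages
    })
    # ws.send(subscribe_msg)  # ws being whatever WebSocket client is in use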

I also found Queue support in Tornado which might be useful, but haven't looked at it closely.
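
That's presumably tornado.queues.Queue; with a maxsize it suspends producer coroutines when full, which is the backpressure shape we want. A quick sketch:

    from tornado import gen
    from tornado.ioloop import IOLoop
    from tornado.queues import Queue

    q = Queue(maxsize=2)

    @gen.coroutine
    def producer():
        for i in range(5):
            yield q.put(i)  # suspends while the queue is full -- backpressure

    @gen.coroutine
    def consumer():
        while True:
            item = yield q.get()
            # ... write item to the socket ...
            q.task_done()

    @gen.coroutine
    def main():
        IOLoop.current().spawn_callback(consumer)
        yield producer()
        yield q.join()  # wait until the consumer has drained everything

    IOLoop.current().run_sync(main)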

mvollrath added a commit to EndPointCorp/rosbridge_suite that referenced this issue Dec 11, 2018
Instead of piling outgoing messages onto Tornado's infinite callback queue, block until previous messages have been written.

This change connects web socket backpressure to rosbridge_library and rospy queues.

Fixes RobotWebTools#212
@mvollrath
Contributor

I've had some success with making web socket writes synchronous in Tornado 4.5.3 by locking both the callback addition and the write itself (with a future callback releasing the Lock). Tornado is still invisibly queuing some unknown amount of data, maybe down in the socket. If the write_message Future ever fails to run callbacks (which is not uncommon), the server's IOLoop is completely frozen. Unless newer Tornado versions are more reliable, this is on the shelf for now.
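
For the record, the locking looked roughly like this (a method sketch on the handler; `self._write_lock` is a hypothetical threading.Lock created per connection):

    from tornado.ioloop import IOLoop

    def send_message_sync(self, message, binary=False):
        """Hold a lock across both the callback addition and the write itself."""
        self._write_lock.acquire()  # blocks until the previous write has landed

        def do_write():
            future = self.write_message(message, binary)
            # Release only when Tornado reports the write finished. If this
            # done-callback never fires, the lock is held forever and the
            # IOLoop appears frozen -- the failure mode described above.
            future.add_done_callback(lambda f: self._write_lock.release())

        IOLoop.instance().add_callback(do_write)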

@mvollrath
Contributor

mvollrath added a commit to EndPointCorp/roslibjs that referenced this issue Jan 29, 2019
Decouple rendering from WebSocket backpressure.

A client-side solution for RobotWebTools/rosbridge_suite#212
mvollrath added a commit to EndPointCorp/rosbridge_suite that referenced this issue Jan 31, 2019
jihoonl pushed a commit that referenced this issue Feb 1, 2019
jihoonl pushed a commit to RobotWebTools/roslibjs that referenced this issue Mar 12, 2019
* Remove unused WebSocket import from SocketAdapter

* Add background WebSocket transport

Decouple rendering from WebSocket backpressure.

A client-side solution for RobotWebTools/rosbridge_suite#212

* Upgrade Karma

Fixes my issues with running the tests.

* Default to websocket transport

"workersocket" doesn't work in Node, so it's not a good default.

* Fix karma configs for 3.x

* Add close() method for WorkerSocket

* Add pubsub test for workersocket transport

Sanity check.
MatthijsBurgh pushed a commit to RobotWebTools/roslibjs that referenced this issue Jun 20, 2021