Memory exhaustion on server with high rate large blob data #212

Closed
jihoonl opened this issue Feb 22, 2016 · 9 comments


@jihoonl
Member

jihoonl commented Feb 22, 2016

This is a follow-up issue to #192.

Questions

  • Should rosbridge have intelligence or configuration to adapt to slow and fast clients seamlessly?
  • The rosbridge server will suffer memory exhaustion if a slow client subscribes to high-rate, large-chunk data anyway. How should rosbridge react to this memory exhaustion?
@mvollrath
Contributor

Adaptation sounds complicated, but it seems like an additional per-client queue is needed in the outgoing implementation to prevent clients from killing the server with greedy subscriptions. This queue would be configured by the server's parameters, outside of the protocol's throttle and queue options. There should probably be a warning message the first time a message is dropped from a client's queue.

Additionally, a status message (as described in the protocol) should be sent to warn the client that they are dropping messages and need to add throttling.
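
A rough sketch of such a per-client queue, using a bounded deque that evicts the oldest message when full (all names here are illustrative, not rosbridge's actual API):

    import threading
    from collections import deque

    class ClientQueue(object):
        """Bounded outgoing queue for one client; drops the oldest message when full."""

        def __init__(self, maxlen=10):
            self.queue = deque(maxlen=maxlen)
            self.lock = threading.Lock()
            self.warned = False

        def push(self, message):
            with self.lock:
                if len(self.queue) == self.queue.maxlen and not self.warned:
                    # Warn once, the first time a message is dropped.
                    print('warning: dropping messages for slow client')
                    self.warned = True
                self.queue.append(message)  # deque(maxlen=...) evicts the oldest

        def pop(self):
            with self.lock:
                return self.queue.popleft() if self.queue else None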

@sevenbitbyte
Member

sevenbitbyte commented Feb 24, 2017

Should rosbridge have intelligence or configuration to adapt to slow and fast clients seamlessly?

Yes, rosbridge should serve all clients as well as the available resources allow. I think dynamically throttling a slow client may be a good direction to go in, if it can be done in a way that also detects when the client's connection becomes faster.

The rosbridge server will suffer memory exhaustion if a slow client subscribes to high-rate, large-chunk data anyway. How should rosbridge react to this memory exhaustion?

A client should not be able to impact the server's performance so easily. Server development should expect clients not to be well behaved, and expect that they are not ideally optimized. If greater server stability means dropping packets in new ways, then by all means please do. The protocol does appear to have all the features needed to inform the client that they are being throttled, by way of the status messages.

The incentive for app developers is to optimize their clients (use throttling) so that users get the expected performance, but when writing apps I don't always know a priori the exact network conditions my apps will be used in. So it's just not always possible to get throttling settings right for all users on all networks. Getting hints from the server about performance would make it easier from the app-development perspective.

@mvollrath
Contributor

The rosbridge_library protocol has a throttle and queue system, but since it dumps messages into the Tornado queue asynchronously, the backpressure from the client never makes it back to the protocol's queue. Instead, Tornado queues the backlog without bound.

If we extend the RosbridgeWebSocket handler to hand off outgoing messages to an intermediate per-connection queue instead of pushing straight to Tornado, we can make synchronous Tornado writes from per-connection threads. This way we gain control of the backpressure on the server side and clients don't slow each other down.
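
Roughly, the per-connection writer could look like this (a sketch; `outgoing`, `handler.connected`, and `write_message_sync` are all hypothetical stand-ins, not existing rosbridge or Tornado API):

    import threading
    try:
        import queue            # Python 3
    except ImportError:
        import Queue as queue   # Python 2

    def writer_loop(handler, outgoing):
        """Per-connection thread: drain this client's queue with blocking writes."""
        while handler.connected:
            message = outgoing.get()  # blocks until a message is queued
            # A synchronous write means a slow client only stalls its own
            # thread and queue, not the subscriber thread or other clients.
            handler.write_message_sync(message)

    # One bounded queue and one writer thread per connection:
    # outgoing = queue.Queue(maxsize=10)
    # threading.Thread(target=writer_loop, args=(handler, outgoing)).start()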

@viktorku
Member

viktorku commented Nov 7, 2018

This backpressure in the server from slow clients can be solved with a simple circular buffer which drops stale messages after a certain cap limit. Wouldn't that be the simplest to implement? Does Tornado offer this out of the box in some way?
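
As far as I know Tornado doesn't hand us that directly, but Python's collections.deque with a maxlen already behaves as exactly that kind of circular buffer:

    from collections import deque

    buf = deque(maxlen=100)  # cap limit of 100 messages

    for i in range(150):
        buf.append(i)  # once full, each append silently evicts the oldest entry

    print(len(buf))  # 100
    print(buf[0])    # 50 -- the 50 stalest messages were dropped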

@mvollrath
Contributor

Tornado gives us a Future when we write to the socket, so we can either ignore it (as we do currently) or wait for it to finish to effectively make the write synchronous. Making the write synchronous would be the simplest way to connect the backpressure.

However, since all the web socket writes happen sequentially in the subscriber thread upon receiving a message, the last client would be waiting for the synchronous writes of all the other clients before it. This is why we would need a thread per connection, to decouple the clients' synchronous writes and give them their own queues.
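
For example, a per-connection thread could block on the write Future like this (a sketch against Tornado 4.x; it must be called from a thread other than the IOLoop's, or it will deadlock):

    import threading
    from tornado.ioloop import IOLoop

    def write_sync(handler, message, binary=False):
        """Block the calling (non-IOLoop) thread until Tornado finishes the write."""
        done = threading.Event()

        def start_write():
            future = handler.write_message(message, binary)
            # Unblock the caller once Tornado resolves the write Future.
            future.add_done_callback(lambda f: done.set())

        IOLoop.instance().add_callback(start_write)
        done.wait()  # backpressure: the caller can't send another message yet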

@mvollrath
Contributor

Or better yet, decouple the threads further up the chain where MultiSubscriber is iterating over the callbacks. This way the queue handler works as intended and we aren't adding a queue after a queue.
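
Something like this at the fan-out point (illustrative names only; `handle_message` stands in for the existing queue-handler interface):

    def fanout(message, client_handlers):
        """Hand the message to each client's own queue handler instead of
        writing inline, so one slow client can't stall the others."""
        for handler in client_handlers:
            handler.handle_message(message)  # returns immediately; the
                                             # handler's thread drains its queue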

@mvollrath
Contributor

I did some more looking around and the answer might be simpler than I thought.

Adding a small blurb to RosbridgeWebSocket.send_message lets us identify which thread is sending the message to Tornado:

    def send_message(self, message):
        """..."""
        import threading
        from functools import partial
        from tornado.ioloop import IOLoop
        # Debug: report which thread hands this message off to Tornado.
        print('writing from {}'.format(threading.current_thread()))
        # `binary` is determined elsewhere in the real handler.
        IOLoop.instance().add_callback(partial(self.write_message, message, binary))

Now the results depend completely on the settings the client used to subscribe to the topic. I tested two clients running identical applications subscribed to the same topic.

With queue_length unset or set to 0, both send from the subscriber thread:

writing from <Thread(/bench/pcl, started daemon 140434852521728)>
writing from <Thread(/bench/pcl, started daemon 140434852521728)>

With queue_length set to any value > 0, the sends are decoupled by QueueMessageHandler:

writing from <QueueMessageHandler(Thread-7, started daemon 140427042748160)>
writing from <QueueMessageHandler(Thread-15, started daemon 140427034355456)>

So if we can figure out how to make writes synchronous and default queue_length to some non-zero value, we hook up backpressure to the QueueMessageHandler like users would expect.
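
For reference, queue_length is set per subscription in the protocol's subscribe op, e.g.:

    import json

    subscribe_msg = json.dumps({
        'op': 'subscribe',
        'topic': '/bench/pcl',
        'queue_length': 1,   # > 0 routes sends through QueueMessageHandler
        'throttle_rate': 0,  # minimum milliseconds between messages
    })
    # ws.send(subscribe_msg)  # ws being whatever WebSocket client is in use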

I also found Queue support in Tornado which might be useful, but haven't looked at it closely.
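
That's presumably tornado.queues.Queue; with a maxsize it suspends producer coroutines when full, which is the backpressure shape we want. A quick sketch:

    from tornado import gen
    from tornado.ioloop import IOLoop
    from tornado.queues import Queue

    q = Queue(maxsize=2)

    @gen.coroutine
    def producer():
        for i in range(5):
            yield q.put(i)  # suspends while the queue is full -- backpressure

    @gen.coroutine
    def consumer():
        while True:
            item = yield q.get()
            # ... write item to the socket ...
            q.task_done()

    @gen.coroutine
    def main():
        IOLoop.current().spawn_callback(consumer)
        yield producer()
        yield q.join()  # wait until the consumer has drained everything

    IOLoop.current().run_sync(main)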

mvollrath added a commit to EndPointCorp/rosbridge_suite that referenced this issue Dec 11, 2018
Instead of piling outgoing messages onto Tornado's infinite callback queue, block until previous messages have been written.

This change connects web socket backpressure to rosbridge_library and rospy queues.

Fixes RobotWebTools#212
@mvollrath
Contributor

I've had some success with making web socket writes synchronous in Tornado 4.5.3 by locking both the callback addition and the write itself (with a future callback releasing the Lock). Tornado is still invisibly queuing some unknown amount of data, maybe down in the socket. If the write_message Future ever fails to run callbacks (which is not uncommon), the server's IOLoop is completely frozen. Unless newer Tornado versions are more reliable, this is on the shelf for now.
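
For the record, the locking looked roughly like this (a method sketch on the handler; `self._write_lock` is a hypothetical threading.Lock created per connection):

    from tornado.ioloop import IOLoop

    def send_message_sync(self, message, binary=False):
        """Hold a lock across both the callback addition and the write itself."""
        self._write_lock.acquire()  # blocks until the previous write has landed

        def do_write():
            future = self.write_message(message, binary)
            # Release only when Tornado reports the write finished. If this
            # done-callback never fires, the lock is held forever and the
            # IOLoop appears frozen -- the failure mode described above.
            future.add_done_callback(lambda f: self._write_lock.release())

        IOLoop.instance().add_callback(do_write)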

@mvollrath
Contributor

mvollrath added a commit to EndPointCorp/roslibjs that referenced this issue Jan 29, 2019
Decouple rendering from WebSocket backpressure.

A client-side solution for RobotWebTools/rosbridge_suite#212
mvollrath added a commit to EndPointCorp/rosbridge_suite that referenced this issue Jan 31, 2019
jihoonl pushed a commit that referenced this issue Feb 1, 2019
jihoonl pushed a commit to RobotWebTools/roslibjs that referenced this issue Mar 12, 2019
* Remove unused WebSocket import from SocketAdapter

* Add background WebSocket transport

Decouple rendering from WebSocket backpressure.

A client-side solution for RobotWebTools/rosbridge_suite#212

* Upgrade Karma

Fixes my issues with running the tests.

* Default to websocket transport

"workersocket" doesn't work in Node, so it's not a good default.

* Fix karma configs for 3.x

* Add close() method for WorkerSocket

* Add pubsub test for workersocket transport

Sanity check.
MatthijsBurgh pushed a commit to RobotWebTools/roslibjs that referenced this issue Jun 20, 2021