
Eliminate memory exhaustion on webserver with high rate, LARGE binary data #192

Status: Closed (wants to merge 1 commit)

Conversation
Conversation

@pbeeson commented Aug 4, 2015

In using rosbridge, I was passing VERY dense, LARGE pointclouds at 20 Hz. I noticed that after a while, the tornado write buffer for the websocket was monotonically increasing because the web client wasn't pulling data as fast as it was written.

This simple change addresses that by blocking further binary writes on a topic until the tornado websocket buffer has been flushed. This keeps the tornado buffer from growing monotonically and filling up system memory until the entire machine crashes.

Commit: …rver buffer from filling up when attached to slow clients.
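A minimal sketch of the idea (not the exact patch): assuming Tornado >= 4.3, where write_message returns a Future, the sending thread can wait for the flush before handling the next message.

    import threading
    from tornado.ioloop import IOLoop

    def send_blocking(handler, message, binary):
        """Queue a websocket write, then block until Tornado has flushed it."""
        flushed = threading.Event()

        def _write():
            # write_message returns a Future in Tornado >= 4.3 (assumption).
            future = handler.write_message(message, binary=binary)
            future.add_done_callback(lambda _f: flushed.set())

        IOLoop.instance().add_callback(_write)
        flushed.wait()  # back-pressure: the subscriber thread stalls here

The effect is that a slow client implicitly rate-limits the server instead of letting the write buffer grow without bound.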
@T045T (Contributor) commented Aug 5, 2015

Wouldn't this be just as big (if not bigger) a problem for high-frequency non-BSON data?

I'm not quite comfortable with adding another place where messages can be dropped (there are already the send and receive queues; this adds another one), but I'm not sure there is a better way - it would be a lot more work to not read new messages from the topic as long as tornado is blocked, right?

@pbeeson (Author) commented Aug 5, 2015

> Wouldn't this be just as big (if not bigger) a problem for high-frequency non-BSON data?

Possibly. Right now I know that many people might not be using the BSON binary encoding, so I wanted to leave their transmissions alone. (I also figured that if data is large, it probably uses a uint8 array, like images and point clouds. I'm not sure what non-binary structures would be large and high rate.)

> I'm not quite comfortable with adding another place where messages can be dropped (there are already the send and receive queues; this adds another one), but I'm not sure there is a better way - it would be a lot more work to not read new messages from the topic as long as tornado is blocked, right?

Maybe not a lot more work, but I'm not sure it's functionally different. I use this and I don't see "dropped" packets, because if your client can't consume the websocket data as fast as the server is writing it, you are going to start getting delays on the client and you'll run out of server memory before long. So it's best for the client to get packets "on demand", and this pull request was the easiest modification I could make to have that happen. It doesn't affect services or parameters, only topics, and only when using BSON for binary (though that could change to any topic if desired).



Inline review comment on these lines of the patch:

    binary = type(message) == bson.BSON
    IOLoop.instance().add_callback(partial(self.write_message, message, binary))
    if topic == None or not binary:

Forgive me if these comments are too pedantic. Relatively new to community PRs and code reviews =)

if topic is None or not binary would be more Pythonic.
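For anyone new to the idiom, a quick illustration of why the reviewer's suggestion matters: == dispatches to __eq__, which a class may override, while `is` checks object identity and cannot be fooled. (Hypothetical class, purely for demonstration.)

    class AlwaysEqual:
        def __eq__(self, other):
            return True  # claims equality with everything, including None

    obj = AlwaysEqual()
    print(obj == None)  # True  - __eq__ hijacks the comparison
    print(obj is None)  # False - identity check is reliable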

@mvollrath (Contributor) commented

Please see if this helps your problem: #203

@pbeeson (Author) commented Jan 19, 2016

While #203 may help prolong runtime, having to set this manually on the server side does not completely fix the problem. In a case where a client still cannot consume the websocket data fast enough, the server-side buffer will grow without bound. A skilled programmer could fiddle with delay_between_messages, but that shouldn't be required. The simple "queue size 1" solution that I proposed in #192 "adapts" throughput to the client's consumption speed.
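To make the "queue size 1" behavior concrete, here is an illustrative sketch (not code from this PR): the producer overwrites any unconsumed message, so a slow consumer always receives the newest data and nothing accumulates.

    import threading

    class LatestOnly:
        """Single-slot queue: the newest message silently replaces the old."""
        def __init__(self):
            self._cond = threading.Condition()
            self._msg = None

        def put(self, msg):
            # Called by the fast producer (e.g. a ROS topic callback).
            with self._cond:
                self._msg = msg  # overwrite any message not yet consumed
                self._cond.notify()

        def get(self):
            # Called by the slow consumer (e.g. the websocket writer).
            with self._cond:
                while self._msg is None:
                    self._cond.wait()
                msg, self._msg = self._msg, None
                return msg

A fast client still sees every message; a slow one sees the latest at whatever rate it can manage.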

@mvollrath (Contributor) commented

This would result in the client losing data without any warning on either end.

@pbeeson (Author) commented Jan 28, 2016

Agreed. I preferred that for my streaming application over coming back the next day to a core dump on my server.

@jihoonl (Member) commented Feb 10, 2016

rosbridge should not assume that binary == large data. There are corner cases in which this would drop or block the wrong messages.

For example, assume that rosbridge is streaming both uuid_msgs/UniqueID (small, but binary format) and visualization_msgs/Marker (large, but non-binary format) to the same client. It would then start to block uuid messages even though the socket is actually stuck behind marker messages.

To solve the problem, how about using depthcloud_encoder and web_video_server to stream the pointcloud data? I think it makes more sense to separate the large, dense messages out onto another channel.

@pbeeson (Author) commented Feb 10, 2016

depthcloud_encoder seems to be hard-coded for the Kinect. It doesn't support the denser, larger clouds from stereo devices, and it is not lossless.

I agree with all your arguments against my fix as a general solution, but currently, when sending 250 MB or more per second, if the client does not consume quickly enough (perhaps because of slow 3D rendering of large data), it does not take long before the server fills up the machine's memory and crashes it. That was not acceptable in my application, whereas dropping some frames when attached to a slow client was acceptable (since the client couldn't consume them anyway).
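To put numbers on the failure mode (the 250 MB/s figure is from the thread; the client's consumption rate is an assumed example):

    rate = 250e6                 # bytes/s written by the server (from the thread)
    client_rate = rate / 2       # assume the client keeps up with only half
    backlog_per_min = (rate - client_rate) * 60
    print(f"buffer growth: {backlog_per_min / 1e9:.1f} GB per minute")  # 7.5

At that rate, even a modest mismatch exhausts typical server memory in minutes.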

@jihoonl (Member) commented Feb 11, 2016

@pbeeson can you confirm that the server memory overflows if and only if there is a slow client?

If so, I would suggest utilising throttle_rate and queue_length to throttle down the message sending to a slow client.

See the rosbridge specification, section 3.4.4 (Subscribe); it describes how to use them. You can also check the ThrottleMessageHandler and QueueMessageHandler implementations to see how they manage message sending to the client. throttle_rate and queue_length are configurable in roslibjs.Topic.subscribe (see the sketch below).

If the server memory overflows even when there is no slow client, let me know. I will check the subscriber logic in rosbridge_library to see if something is handled inappropriately.
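A sketch of the subscribe request that carries those fields, per the rosbridge v2 protocol (section 3.4.4); the topic name and type are placeholders, and ws is an assumed open websocket connection:

    import json

    subscribe_msg = {
        "op": "subscribe",
        "topic": "/camera/points",           # placeholder topic
        "type": "sensor_msgs/PointCloud2",   # placeholder message type
        "throttle_rate": 100,   # minimum milliseconds between messages
        "queue_length": 1,      # buffer only the newest throttled message
    }
    ws.send(json.dumps(subscribe_msg))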

@pbeeson (Author) commented Feb 11, 2016

  1. I don't want to throttle. If a client is fast, it should get all the data; if it is slow, I want the server to adapt seamlessly to the client.

  2. Queue size only affects the server, not the client, and does not help efficiently stream data to a slow client.

@jihoonl (Member) commented Feb 22, 2016

This cannot be merged anyway. I have created issue #212 to follow up on the problem. We will continue the discussion there. Closing this PR.
