
Eliminate memory exhaustion on webserver with high rate, LARGE binary data #192

Status: Closed (wants to merge 1 commit)

Conversation
Conversation

@pbeeson commented Aug 4, 2015

In using rosbridge, I was passing VERY dense, LARGE pointclouds at 20 Hz. I noticed that after a while, the tornado write buffer for the websocket was monotonically increasing because the web client wasn't pulling data as fast as it was written.

This simple change addresses that by blocking further binary writes on a topic until the tornado websocket buffer has been flushed. This keeps the tornado buffer from growing monotonically and filling up system memory until the entire machine crashes.

Commit: …rver buffer from filling up when attached to slow clients.
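A minimal sketch of the idea (not the exact patch): assuming Tornado >= 4.3, where write_message returns a Future, the sending thread can wait for the flush before handling the next message.

    import threading
    from tornado.ioloop import IOLoop

    def send_blocking(handler, message, binary):
        """Queue a websocket write, then block until Tornado has flushed it."""
        flushed = threading.Event()

        def _write():
            # write_message returns a Future in Tornado >= 4.3 (assumption).
            future = handler.write_message(message, binary=binary)
            future.add_done_callback(lambda _f: flushed.set())

        IOLoop.instance().add_callback(_write)
        flushed.wait()  # back-pressure: the subscriber thread stalls here

The effect is that a slow client implicitly rate-limits the server instead of letting the write buffer grow without bound.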
@T045T (Contributor) commented Aug 5, 2015

Wouldn't this be just as big (if not bigger) a problem for high-frequency non-BSON data?

I'm not quite comfortable with adding another place where messages can be dropped (there are already the send and receive queues; this adds another one), but I'm not sure there is a better way - it would be a lot more work to not read new messages from the topic as long as tornado is blocked, right?

@pbeeson (Author) commented Aug 5, 2015

> Wouldn't this be just as big (if not bigger) a problem for high-frequency non-BSON data?

Possibly. Right now I know that many people might not be using the BSON binary encoding, so I wanted to leave their transmissions alone. (I also figured that if data is large, it probably uses a uint8 array, like images and point clouds. I'm not sure what non-binary structures would be large and high rate.)

> I'm not quite comfortable with adding another place where messages can be dropped (there are already the send and receive queues; this adds another one), but I'm not sure there is a better way - it would be a lot more work to not read new messages from the topic as long as tornado is blocked, right?

Maybe not a lot more work, but I'm not sure it's functionally different. I use this and I don't see "dropped" packets, because if your client can't consume the websocket data as fast as the server is writing it, you are going to start getting delays on the client and you'll run out of server memory before long. So it's best for the client to get packets "on demand", and this pull request was the easiest modification I could make to have that happen. It doesn't affect services or parameters, only topics, and only when using BSON for binary (though that could change to any topic if desired).



Inline review comment on these lines of the patch:

    binary = type(message) == bson.BSON
    IOLoop.instance().add_callback(partial(self.write_message, message, binary))
    if topic == None or not binary:

Forgive me if these comments are too pedantic. Relatively new to community PRs and code reviews =)

if topic is None or not binary would be more Pythonic.
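For anyone new to the idiom, a quick illustration of why the reviewer's suggestion matters: == dispatches to __eq__, which a class may override, while `is` checks object identity and cannot be fooled. (Hypothetical class, purely for demonstration.)

    class AlwaysEqual:
        def __eq__(self, other):
            return True  # claims equality with everything, including None

    obj = AlwaysEqual()
    print(obj == None)  # True  - __eq__ hijacks the comparison
    print(obj is None)  # False - identity check is reliable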

@mvollrath (Contributor) commented

Please see if this helps your problem: #203

@pbeeson (Author) commented Jan 19, 2016

While #203 may help prolong runtime, having to set this manually on the server side does not completely fix the problem. In a case where a client still cannot consume the websocket data fast enough, the server-side buffer will grow without bound. A skilled programmer could fiddle with delay_between_messages, but that shouldn't be required. The simple "queue size 1" solution that I proposed in #192 "adapts" throughput to the client's consumption speed.
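To make the "queue size 1" behavior concrete, here is an illustrative sketch (not code from this PR): the producer overwrites any unconsumed message, so a slow consumer always receives the newest data and nothing accumulates.

    import threading

    class LatestOnly:
        """Single-slot queue: the newest message silently replaces the old."""
        def __init__(self):
            self._cond = threading.Condition()
            self._msg = None

        def put(self, msg):
            # Called by the fast producer (e.g. a ROS topic callback).
            with self._cond:
                self._msg = msg  # overwrite any message not yet consumed
                self._cond.notify()

        def get(self):
            # Called by the slow consumer (e.g. the websocket writer).
            with self._cond:
                while self._msg is None:
                    self._cond.wait()
                msg, self._msg = self._msg, None
                return msg

A fast client still sees every message; a slow one sees the latest at whatever rate it can manage.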

@mvollrath (Contributor) commented

This would result in the client losing data without any warning on either end.

@pbeeson (Author) commented Jan 28, 2016

Agreed. I preferred that for my streaming application over coming back the next day to a core dump on my server.

@jihoonl (Member) commented Feb 10, 2016

rosbridge should not assume that binary == large data. There are corner cases in which this would drop or block the wrong messages.

For example, assume that rosbridge is streaming both uuid_msgs/UniqueID (small, but binary format) and visualization_msgs/Marker (large, but non-binary format) to the same client. It would then start to block uuid messages even though the socket is actually stuck behind marker messages.

To solve the problem, how about using depthcloud_encoder and web_video_server to stream the pointcloud data? I think it makes more sense to separate the large, dense messages out onto another channel.

@pbeeson (Author) commented Feb 10, 2016

depthcloud_encoder seems to be hard-coded for the Kinect. It doesn't support the denser, larger clouds from stereo devices, and it is not lossless.

I agree with all your arguments against my fix as a general solution, but currently, when sending 250 MB or more per second, if the client does not consume quickly enough (perhaps because of slow 3D rendering of large data), it does not take long before the server fills up the machine's memory and crashes it. That was not acceptable in my application, whereas dropping some frames when attached to a slow client was acceptable (since the client couldn't consume them anyway).
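To put numbers on the failure mode (the 250 MB/s figure is from the thread; the client's consumption rate is an assumed example):

    rate = 250e6                 # bytes/s written by the server (from the thread)
    client_rate = rate / 2       # assume the client keeps up with only half
    backlog_per_min = (rate - client_rate) * 60
    print(f"buffer growth: {backlog_per_min / 1e9:.1f} GB per minute")  # 7.5

At that rate, even a modest mismatch exhausts typical server memory in minutes.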

@jihoonl (Member) commented Feb 11, 2016

@pbeeson can you confirm that the server memory overflows if and only if there is a slow client?

If so, I would suggest utilising throttle_rate and queue_length to throttle down the message sending to a slow client.

See the rosbridge specification, section 3.4.4 (Subscribe); it describes how to use them. You can also check the ThrottleMessageHandler and QueueMessageHandler implementations to see how they manage message sending to the client. throttle_rate and queue_length are configurable in roslibjs.Topic.subscribe (see the sketch below).

If the server memory overflows even when there is no slow client, let me know. I will check the subscriber logic in rosbridge_library to see if something is handled inappropriately.
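A sketch of the subscribe request that carries those fields, per the rosbridge v2 protocol (section 3.4.4); the topic name and type are placeholders, and ws is an assumed open websocket connection:

    import json

    subscribe_msg = {
        "op": "subscribe",
        "topic": "/camera/points",           # placeholder topic
        "type": "sensor_msgs/PointCloud2",   # placeholder message type
        "throttle_rate": 100,   # minimum milliseconds between messages
        "queue_length": 1,      # buffer only the newest throttled message
    }
    ws.send(json.dumps(subscribe_msg))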

@pbeeson (Author) commented Feb 11, 2016

  1. I don't want to throttle. If a client is fast, it should get all the data; if it is slow, I want the server to adapt seamlessly to the client.

  2. Queue size only affects the server, not the client, and does not help efficiently stream data to a slow client.

@jihoonl (Member) commented Feb 22, 2016

This cannot be merged anyway. I have created issue #212 to follow up on the problem. We will continue the discussion there. Closing this PR.
