High CPU load when services advertised by clients are called but don't deliver service_response #265

Closed
cjue opened this issue Feb 28, 2017 · 4 comments · Fixed by #312

cjue commented Feb 28, 2017

Hi everyone,

Maybe this is working as intended, but I noticed that rosbridge shows permanently high CPU load after forwarding call_service messages that are never answered.

My test client sends the following messages and then disconnects from rosbridge_websocket:

# Advertise service
{"type": "std_srvs/SetBool", "service": "/horse/test_service", "op": "advertise_service"}

# call the service
{"args": {"data": true}, "service": "/horse/test_service", "op": "call_service"}

# rosbridge forwards the message like this:
{"args": {"data": true}, "id": "service_request:/horse/test_service:1", "service": "/horse/test_service", "op": "call_service"}

# client unadvertises the service
{"service": "/horse/test_service", "op": "unadvertise_service"}
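
For anyone who wants to script the sequence above, here is a minimal sketch. It only builds the three client-side messages and passes them to a caller-supplied `send` function. The names `build_repro_messages`, `run_repro` and `send` are made up for illustration and are not part of rosbridge; the WebSocket connection itself is left to whatever client library you use.

```python
import json

SERVICE = "/horse/test_service"


def build_repro_messages(service=SERVICE):
    """Return the client-side messages from the report, in order, as JSON strings."""
    messages = [
        {"op": "advertise_service", "type": "std_srvs/SetBool", "service": service},
        {"op": "call_service", "service": service, "args": {"data": True}},
        # The client never sends a service_response before unadvertising.
        {"op": "unadvertise_service", "service": service},
    ]
    return [json.dumps(m) for m in messages]


def run_repro(send, service=SERVICE):
    """Send each message through `send`, a hypothetical callable that takes a str."""
    for payload in build_repro_messages(service):
        send(payload)
```

Calling `run_repro(print)` prints the payloads instead of sending them, which is a quick way to check them against the messages listed above.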

Afterwards, rosbridge_websocket keeps one CPU core fully busy, presumably by busy-waiting:

ps aux | grep rosbridge
juelg    30485 81.6  0.1 741984 58468 ?        Ssl  19:20   0:59 python /home/juelg/robot_folders/checkout/horse_rosbridge_suite/catkin_workspace/src/rosbridge_suite/rosbridge_server/scripts/rosbridge_websocket __name:=rosbridge_websocket __log:=/home/juelg/.ros/log/fed3a90e-f85d-11e6-9556-c85b766c783b/rosbridge_websocket-1.log

Is this the expected behavior or a bug?
I tested against the bleeding edge "develop" branch as well as against 0.7.16.

T045T added a commit to T045T/rosbridge_suite that referenced this issue Jan 8, 2018
graceful_shutdown() gives the handler time to error out of any existing service requests.
This is important because we busy-wait for a rosbridge response for service calls and those threads do not get stopped otherwise.
Also, rospy service clients do not currently support timeouts, so any clients would be stuck too.

fixes RobotWebTools#265
@T045T T045T mentioned this issue Jan 8, 2018
T045T added a commit to T045T/rosbridge_suite that referenced this issue Jan 9, 2018
This gives the service a bit of time to cancel any in-flight service requests (which should fix RobotWebTools#265).
This is important because we busy-wait for a rosbridge response for service calls and those threads do not get stopped otherwise.
Also, rospy service clients do not currently support timeouts, so any clients would be stuck too.

A new test case in test_service_capabilities.py verifies the fix works
Behery pushed a commit that referenced this issue Jan 10, 2018
This gives the service a bit of time to cancel any in-flight service requests (which should fix #265).
This is important because we busy-wait for a rosbridge response for service calls and those threads do not get stopped otherwise.
Also, rospy service clients do not currently support timeouts, so any clients would be stuck too.

A new test case in test_service_capabilities.py verifies the fix works
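
The commit messages above mention that rosbridge busy-waits for a service response. As a rough illustration of the general alternative (not the code in #312 and not rosbridge's actual implementation), a waiting thread can block on a `threading.Event` with a timeout instead of spinning. `PendingCall` is a hypothetical helper name.

```python
import threading


class PendingCall:
    """Response slot for one in-flight service call (hypothetical helper)."""

    def __init__(self):
        self._done = threading.Event()
        self._response = None

    def resolve(self, response):
        """Store the response and wake up the waiting thread."""
        self._response = response
        self._done.set()

    def wait(self, timeout):
        """Block without spinning; raise TimeoutError if nothing arrives in time."""
        if not self._done.wait(timeout):
            raise TimeoutError(f"no service_response within {timeout} s")
        return self._response
```

With this shape, a caller whose remote side disappears gets an error after `timeout` seconds, and the waiting thread sleeps in the meantime instead of consuming a core.
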
T045T (Contributor) commented Jan 10, 2018

@cjue Please check whether #312 actually fixes your issue, and feel free to reopen this if it doesn't!

dhirajdhule commented Feb 3, 2021

@cjue Did you confirm if this fixed the issue for you?

@T045T I am facing the same issue. Here are the steps to reproduce it.

  1. A client connects to the rosbridge websocket server.
  2. The client advertises a service on the server (op code: advertise_service).
  3. The client disconnects.
  4. rosservice (command line) is called on the machine running the rosbridge websocket server.
  5. The rosbridge process occupies ~98% of a CPU core.
  6. The (failure) response for the service call is never sent to the rosservice caller; it hangs until rosbridge is closed.

Referring to your comment in the other thread:
"Primarily, this should fix #265 (but only if a service is actually unadvertised. If it just loses connection and does not answer, the client will still busy-wait)."

The client may not be able to send the 'unadvertise_service' message before disconnecting (for example, because of a network-level disconnect). When a client disconnects, does the rosbridge server automatically unadvertise the services advertised by that client?
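
To make the question concrete, here is a minimal sketch of the behavior being asked about: remember which client advertised which service, and unadvertise everything a client owns when its connection goes away. This is an assumption about how such cleanup could look, not a description of what rosbridge does. `ServiceRegistry`, `on_disconnect` and the `unadvertise` callback are hypothetical names.

```python
from collections import defaultdict


class ServiceRegistry:
    """Tracks advertised services per client (hypothetical helper)."""

    def __init__(self, unadvertise):
        # `unadvertise` is a caller-supplied callable taking a service name.
        self._unadvertise = unadvertise
        self._by_client = defaultdict(set)

    def advertise(self, client_id, service):
        self._by_client[client_id].add(service)

    def unadvertise(self, client_id, service):
        """Handle an explicit unadvertise_service from a client."""
        services = self._by_client.get(client_id, set())
        if service in services:
            services.discard(service)
            self._unadvertise(service)

    def on_disconnect(self, client_id):
        """Clean up every service the client still owns, even without a clean unadvertise."""
        for service in sorted(self._by_client.pop(client_id, set())):
            self._unadvertise(service)
```

With something like this in place, a crash or network drop on the client side would take the same cleanup path as an explicit unadvertise_service message.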

I am using release 0.11.12-1focal.20201201.165130 for ROS Noetic.

Thanks in advance.

cjue (Author) commented Feb 3, 2021

Sorry for not following up after #312. From its description by @T045T:

"Primarily, this should fix #265 (but only if a service is actually unadvertised. If it just loses connection and does not answer, the client will still busy-wait)."

This is of course relevant. But it does not fix the issue described by @dhirajdhule, which is also what I had in mind. This scenario could also occur when the client advertising the service is killed or crashes for some reason, without a clean unadvertisement.

Of course it would be best practice to try to handle any shutdowns and unadvertise the service, but it's unfortunate that bad code in one client program can cause permanent ~100% CPU usage in the unrelated rosbridge process.

@dhirajdhule

Another observation.

In a custom application, I closed the rosbridge_protocol instance by calling its finish() method. However, the outgoing() callback is still invoked when the local ROS system calls a service that was advertised through rosbridge before finish() was called.

The code comments say that a protocol instance should not be used again after finish() has been called. However, it seems this advice is not followed internally. I also couldn't find any variable that tells me programmatically whether finish() has been called on a protocol.
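
As a possible application-side workaround, a thin wrapper can record whether finish() has been called and drop outgoing messages afterwards. The sketch below assumes only that the wrapped object exposes finish() and outgoing(message), as described above. `FinishGuard` and its `finished` attribute are hypothetical names, and the wrapper only affects calls routed through it; it does not stop any threads inside rosbridge.

```python
class FinishGuard:
    """Wraps a protocol-like object and remembers whether finish() was called."""

    def __init__(self, protocol):
        self._protocol = protocol
        self.finished = False

    def finish(self):
        """Finish the wrapped protocol at most once."""
        if not self.finished:
            self.finished = True
            self._protocol.finish()

    def outgoing(self, message):
        """Forward the message unless finish() was already called; return True if sent."""
        if self.finished:
            return False
        self._protocol.outgoing(message)
        return True
```

The `finished` flag then provides the programmatic check that seems to be missing, at least for code paths the application controls.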

Thanks.
