High CPU load when services advertised by clients are called but don't deliver service_response #265
Comments
graceful_shutdown() gives the handler time to error out of any existing service requests. This is important because we busy-wait for a rosbridge response for service calls and those threads do not get stopped otherwise. Also, rospy service clients do not currently support timeouts, so any clients would be stuck too. fixes RobotWebTools#265
This gives the service a bit of time to cancel any in-flight service requests (which should fix RobotWebTools#265). This is important because we busy-wait for a rosbridge response for service calls and those threads do not get stopped otherwise. Also, rospy service clients do not currently support timeouts, so any clients would be stuck too. A new test case in test_service_capabilities.py verifies the fix works
This gives the service a bit of time to cancel any in-flight service requests (which should fix #265). This is important because we busy-wait for a rosbridge response for service calls and those threads do not get stopped otherwise. Also, rospy service clients do not currently support timeouts, so any clients would be stuck too. A new test case in test_service_capabilities.py verifies the fix works
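To illustrate the idea in the commit message above, here is a minimal sketch of waiting for a service response without busy-waiting, and of cancelling in-flight requests on shutdown. It is not the rosbridge implementation: `PendingServiceCall` and `cancel_pending_calls` are hypothetical names, and the sketch uses only the standard library's `threading.Event` instead of any rospy or rosbridge API.

```python
import threading


class PendingServiceCall:
    """Hypothetical holder for one in-flight service request (not rosbridge code)."""

    def __init__(self):
        self._done = threading.Event()
        self.response = None
        self.error = None

    def resolve(self, response):
        # Called when the service_response arrives from the client.
        self.response = response
        self._done.set()

    def cancel(self, reason):
        # Called on shutdown or disconnect so the waiting thread is released.
        self.error = reason
        self._done.set()

    def wait(self, timeout=None):
        # Event.wait() blocks the thread instead of spinning in a loop,
        # so an unanswered call does not consume CPU while it waits.
        if not self._done.wait(timeout):
            raise TimeoutError("no service_response within timeout")
        if self.error is not None:
            raise RuntimeError(self.error)
        return self.response


def cancel_pending_calls(pending_calls, reason="service was unadvertised"):
    """Release every waiter; assumed to run when the service handler shuts down."""
    for call in list(pending_calls):
        call.cancel(reason)
```

With this shape, a caller blocked in `wait()` either receives the response, gets a `RuntimeError` when the handler shuts down, or gets a `TimeoutError` if a timeout was given, rather than waiting forever.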
@cjue Did you confirm if this fixed the issue for you? @T045T I am facing the same issue. Here are the steps to reproduce it.
Referring to your comments in the other thread: the client may not be able to send the 'unadvertise_service' message before disconnecting (due to network-level disconnects). When the client disconnects, does the rosbridge server automatically unadvertise the services advertised by that client? I am using release 0.11.12-1focal.20201201.165130 for ROS Noetic. Thanks in advance.
Sorry for not following up after #312. From its description by @T045T :
This is of course relevant. But it does not fix the issue described by @dhirajdhule, which is also what I had in mind. This scenario could also occur when the client advertising the service is killed or crashes for some reason, without a clean unadvertisement. Of course it would be best practice to try to handle any shutdowns and unadvertise the service, but it's unfortunate that bad code in one client program can cause permanent ~100% CPU usage in the unrelated rosbridge process.
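One way to picture the cleanup discussed here is a server-side record of which client advertised which service, so that a disconnect without a clean unadvertisement can still be handled. The following is only a sketch under that assumption: `ServiceRegistry`, `client_id`, and `on_disconnect` are hypothetical names, not part of rosbridge.

```python
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical bookkeeping of services advertised per client (not rosbridge code)."""

    def __init__(self):
        self._by_client = defaultdict(set)

    def advertise(self, client_id, service_name):
        self._by_client[client_id].add(service_name)

    def unadvertise(self, client_id, service_name):
        # discard() is a no-op if the service was never advertised.
        self._by_client[client_id].discard(service_name)

    def on_disconnect(self, client_id):
        """Drop the client and return the services it left advertised."""
        return sorted(self._by_client.pop(client_id, set()))
```

The server would then unadvertise each returned service and cancel any calls still waiting on it, whether the client exited cleanly, crashed, or lost its network connection.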
Another observation. In a custom application, I closed the rosbridge_protocol instance by calling the .finish() method on it. However, it still calls the outgoing() callback when the local ROS system calls a service advertised by rosbridge (before the finish method was called). The code comments mention that after calling finish() on the protocol instance, it should not be used again. However, it seems this advice is not followed internally. I also couldn't find any variable that tells me programmatically whether the protocol's finish() method has been called. Thanks.
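As an illustration of the missing piece described above, this is a sketch of a protocol-like object that records whether finish() has been called and stops forwarding outgoing messages afterwards. `GuardedProtocol`, its `finished` property, and the `send` callback are hypothetical; the real rosbridge protocol class is not assumed to expose any of them.

```python
class GuardedProtocol:
    """Hypothetical wrapper that remembers finish() and then drops outgoing messages."""

    def __init__(self, send):
        self._send = send          # callback that delivers a message to the client
        self._finished = False

    @property
    def finished(self):
        # Lets application code check whether finish() has already been called.
        return self._finished

    def finish(self):
        self._finished = True

    def outgoing(self, message):
        # After finish(), the message is dropped instead of being forwarded.
        if self._finished:
            return False
        self._send(message)
        return True
```

A caller can check `finished` before using the instance, and any late service call that reaches `outgoing()` after `finish()` is ignored rather than sent to a client that is gone.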
Hi everyone,
maybe this is working as intended, but I noticed that rosbridge has permanently high CPU load after forwarding call_service messages that are not answered.
My test client sends the following messages and then disconnects from rosbridge_websocket:
In the end rosbridge_websocket continues to block one CPU core, presumably with busy waiting:
Is this the expected behavior or a bug?
I tested against the bleeding edge "develop" branch as well as against 0.7.16.