Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incoming service calls #669

Merged

Conversation

DomenicP
Copy link
Contributor

Public API Changes
N/A

Description
Not to be confused with #650 (although possibly related?).

After porting the IncomingQueue class from the ROS1 branch in #663, I discovered that incoming service calls from rosbridge clients were no longer working. The rosbridge server would receive the service call and run until this line in the call_service() function where it would get stuck:

spin_until_future_complete(node_handle, future)

On the client side the connection would sit open ("pending" according to the Chrome debugger) waiting for a response that would never arrive.

After doing some digging I believe the problem is that we are now attempting to call rclpy.spin_until_future_complete() on a background thread instead of the main thread. I'm not super familiar with the details, but doing some quick research seems to indicate that at least historically trying to call spin() from multiple threads can cause problems. For example:

(These links are not directly related to the issue at hand but are semi-related problems I've seen regarding spinning and multiple threads.)

Since this code is running in the IncomingQueue thread instead of the main thread, it should be safe to use a synchronous service call here without blocking the main event loop. Making this change fixed the service calls and the client received responses again.

One question I have is if we are going to switch back to a synchronous service call, do we want to run it in the IncomingQueue thread? Or do we want to create a new thread for each service call so that the IncomingQueue is not blocked? The ServiceCaller class inherits from threading.Thread and historically under the ROS1 branch it looks like it was run with .start() instead of .run() like it is now.

Perhaps someone with a better understanding of Tornado and the event loop and threads used in rosbridge has a better idea of how this situation could be handled without adding more threads?

@jtbandes
Copy link
Member

jtbandes commented Nov 5, 2021

Hi, thanks for your contributions! I've just refactored some of the test code (#675) to allow us to more easily write tests which use the WebSocket server. Could you please update this PR to add a test — ideally one which fails without the proposed fixes, and passes with them? Check out the tests I'm adding in #666 for an example.

@jmarsik
Copy link

jmarsik commented Nov 21, 2021

+1

Please fix this, rosbridge is currently unusable (at least in Foxy) with this bug :( I've spent some time debugging why Foxglove Studio tool does not display anything when connecting to our ROS2 instance through rosbridge, then I found this...

Maybe I will find some time later to write that test, but not right now.

@DomenicP
Copy link
Contributor Author

Just pushed a commit that contains a test for the service call fix. With the patch from the first commit reverted, the test stalls until it times out due to the previously described issue. With the proposed fixes the test passes successfully. I borrowed the expect_messages() function from common.py since #666 has not been merged yet.

@jmarsik
Copy link

jmarsik commented Nov 24, 2021

Oh, perfect, you were faster than me :) Thank you ... Now let's get it merged and released!

@jtbandes
Copy link
Member

I'll take a look at this after the holiday weekend!

@jtbandes jtbandes self-assigned this Nov 25, 2021
@jtbandes
Copy link
Member

historically under the ROS1 branch it looks like it was run with .start() instead of .run() like it is now.

Thanks for pointing that out, I might not have noticed it otherwise. I was going to say it looks like the service call is happening in a background thread anyway, so changing it back to synchronous is fine, but now I see it's not actually running in a separate thread anymore (and the fact that it inherits from Thread is just tech debt now).

Perhaps someone with a better understanding of Tornado and the event loop and threads used in rosbridge has a better idea of how this situation could be handled without adding more threads?

I haven't followed all the IncomingQueue changes and am still learning this codebase, but I would generally say that no blocking calls should be happening in the websocket message handler if possible. It seems rclpy provides a decent number of APIs that are async-friendly, which is how I fixed #650 without adding more threads. I wonder if we used this async approach consistently for all message handlers, perhaps we wouldn't need the IncomingQueue thread at all?

In the meantime this small fix seems useful and is something we can iterate on.

@jtbandes
Copy link
Member

If you enable allow edits from maintainers then I can help resolve the merge conflict. It's a pretty minimal conflict though.

@DomenicP
Copy link
Contributor Author

If you enable allow edits from maintainers then I can help resolve the merge conflict. It's a pretty minimal conflict though.

I'd be glad to do so, except unfortunately I don't see the option available on the PR. Might be a parent org setting preventing me from enabling that option. Regardless, I'll do a quick rebase to resolve the merge conflict in just a minute.

* After porting `IncomingQueue` from the ros1 branch, service calls from rosbridge clients are not receiving any response.
* Traced the problem back to trying to call `spin_until_future_complete()` from a background thread.
* Previously this function would have been called from the same thread that `rclpy.spin()` is called on.

Add a websocket test for service call
@jtbandes jtbandes merged commit c624f97 into RobotWebTools:ros2 Nov 29, 2021
@DomenicP DomenicP deleted the fix-incoming-service-calls branch November 29, 2021 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants