Enable multi-threaded execution of MonteCarloSimulation launched from Python #17363

Open

RussTedrake opened this issue Jun 9, 2022 · 6 comments

Labels: component: pydrake (Python API and its supporting Starlark macros), priority: medium, type: feature request

@RussTedrake
Contributor

This is a follow-up to the conversation started in this Slack thread: https://drakedevelopers.slack.com/archives/C2CHRT98E/p1653658311307059

The monte-carlo simulation tools have become a useful model for multi-process executions in Drake. We currently disallow multi-threaded execution for MonteCarloSimulation at the binding layer:

// Note: parallel simulation must be disabled in the binding via
// num_parallel_executions=kNoConcurrency, since parallel execution of
// Python systems in multiple threads is not supported.

According to @calderpg-tri ... running the current Python tests for Monte Carlo with parallel execution enabled will cause the system to hang. My current understanding of the problem is that the C++ code needs to deal with the Python GIL (see, for instance, pybind/pybind11#1087). The big question is whether we can handle this at the level of the bindings, or whether every potential C++ caller must deal with managing the lock.

Enabling this for the Monte Carlo tools will pave the way for many other relevant use cases (such as trajectory optimization and DrakeGym) where users would benefit substantially from parallel execution while still being able to pass Python callbacks into the C++ code.

+@EricCousineau-TRI as pydrake owner.

@EricCousineau-TRI
Contributor

I liked the original intent of the thread - to explicitly fail if attempting to mix C++ multithreading with Python.

For concurrent execution in Python involving CPython API extensions, I think multiprocessing is perhaps the simplest / best way forward. Fixing the pydrake bindings to manage the GIL is feasible, but would require some design thought, especially when interacting with other CPython API extensions such as OpenCV.

I will probably be unable to dedicate any real time to this in the near future myself. Am not entirely sure how to delegate.

@jwnimmer-tri
Collaborator

For concurrent execution in Python involving CPython API extensions, I think multiprocessing is perhaps the simplest / best way forward.

Yes, I tend to agree with this.

Am not entirely sure how to delegate.

For my own part, I don't see any particular urgency here. I believe the resolution would be to create an illustration / example / tutorial showing how to set up concurrent simulations using multiprocessing.

@RussTedrake
Contributor Author

Two thoughts:

  1. Python multiprocessing normally gets stuck because our Diagram and Context objects aren't picklable. In DrakeGym I had to work around that by deferring the construction of the diagrams/contexts to the subprocess (which is not ideal).

  2. An example showing multiprocessing would be good, but I'm pretty sure it is not the resolution I seek. I want to run concurrent C++ code (like Monte Carlo simulations) in workflows that can be invoked from Python. A better resolution from my perspective would be to enable parallelism in MonteCarloSimulation, but have it throw if it encounters a Python callback in the parallel execution. This is where my questions started in the Slack thread linked above.

@jwnimmer-tri
Collaborator

jwnimmer-tri commented Jun 28, 2022

(2) Sure. If we add a new function bool System::IsThreadSafe() const then we can have Monte Carlo fail-fast in case there is a PyLeafSystem or PyVectorSystem or similar in play when num_threads > 1.

This should catch >90% of the user errors. (I'm not confident that we can plug all of the callback holes. The Monte Carlo documentation should directly explain to Python users that they cannot have any Python code in their Diagram in any way.)
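A rough Python-level sketch of that fail-fast idea, purely for illustration: the proposed `System::IsThreadSafe()` does not exist yet, and the heuristic below for spotting Python-defined systems is my own assumption, not a Drake API guarantee.

```python
from pydrake.systems.framework import Diagram

def assert_no_python_systems(diagram: Diagram) -> None:
    # Heuristic stand-in for the proposed System::IsThreadSafe(): treat any
    # subsystem whose class is not defined in a compiled pydrake module as a
    # Python system, and refuse to run it in parallel.
    # Assumes Diagram.GetSystems() is bound; it only lists direct children,
    # so nested diagrams would need a recursive walk.
    for system in diagram.GetSystems():
        if not type(system).__module__.startswith("pydrake"):
            raise RuntimeError(
                f"Subsystem '{system.get_name()}' looks like a Python system; "
                "parallel Monte Carlo execution would not be safe.")
```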

The main challenge will be with the ScalarSystemFunction output callback. It is called on multiple threads, but is a Python functor. We could add a lock there in the pydrake bindings so that output calculations happen one at a time, blocking the worker threads in the meantime; the same lock would also need to interpose itself on the simulator factory callback -- the Python code can either be making a new simulator or summarizing an output, never both concurrently.
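A minimal sketch of that serialization idea, written here at the Python level only for illustration (the actual proposal is to interpose the lock inside the pydrake bindings); the user functions are hypothetical:

```python
import threading

# One shared lock so the simulator-factory callback and the output callback
# never run concurrently, even if worker threads request them at
# overlapping times.
_callback_lock = threading.Lock()

def serialized(fn):
    def wrapper(*args, **kwargs):
        with _callback_lock:
            return fn(*args, **kwargs)
    return wrapper

@serialized
def make_simulator(generator):
    # Hypothetical user-provided SimulatorFactory; only runs while holding
    # the lock.
    ...

@serialized
def output(system, context):
    # Hypothetical user-provided ScalarSystemFunction; likewise serialized.
    return 0.0
```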

Another approach would be to add a "bool use_serial_callbacks" flag to Monte Carlo itself, so that it internally governs the make vs output callbacks -- it might be able to keep the thread schedule more full if we give it direct control. It might also end up being useful for C++ code that can't easily make a thread safe output calculator.

(1) I think pickling the functor that creates the diagram, instead of the diagram itself, should be a fairly easy and plausible work-around for many cases. It also makes it possible to scale the concurrency beyond a single machine. The functor can have bound arguments (functools.partial) for any extra configuration data; it doesn't always need to be a zero-argument function. It will still scale in case the user needs to add a Python System (e.g., looping in their controller). It is a minor hurdle to carve out the factory functor so it can be pickled, but it has the upside that you don't hit a brick wall anytime you need to go beyond pydrake's C++-bound code.
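A sketch of that work-around under multiprocessing; the pydrake module names are real, but the factory, its bound `mass` argument, and the worker function are illustrative rather than an established recipe:

```python
import functools
import multiprocessing

from pydrake.systems.analysis import Simulator
from pydrake.systems.framework import DiagramBuilder
from pydrake.systems.primitives import ConstantVectorSource

def make_diagram(mass):
    # Module-level factory so it pickles cleanly; configuration data such as
    # `mass` is bound with functools.partial rather than baked into a Diagram.
    builder = DiagramBuilder()
    builder.AddSystem(ConstantVectorSource([mass]))
    return builder.Build()

def run_one_sample(make_diagram_fn):
    # Runs in the worker process: the Diagram and Context are constructed
    # locally, so no unpicklable Drake object crosses the process boundary.
    simulator = Simulator(make_diagram_fn())
    simulator.AdvanceTo(1.0)
    return simulator.get_context().get_time()

if __name__ == "__main__":
    factory = functools.partial(make_diagram, mass=2.0)
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(run_one_sample, [factory] * 8)
```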

@calderpg-tri
Contributor

(2) For the case of the ScalarSystemFunction output, another approach would be to restructure the dispatch loop so that output is only called serially on completed simulations, with no locks necessary. It doesn't solve potential callbacks inside the system, but it does ensure that the user-provided methods are never called in parallel.
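A Python-flavored sketch of that dispatch structure, purely to illustrate the idea (the real change would be in the C++ Monte Carlo loop; the names below are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def monte_carlo_with_serial_output(run_one_simulation, output, num_samples,
                                   num_threads):
    # `run_one_simulation` stands in for the simulator-factory + AdvanceTo
    # step, and `output` for the ScalarSystemFunction. Simulations run on
    # worker threads, but `output` is invoked only here, on the dispatching
    # thread, as each simulation completes -- so it never runs in parallel
    # and needs no lock.
    results = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(run_one_simulation, i)
                   for i in range(num_samples)]
        for future in as_completed(futures):
            system, context = future.result()
            results.append(output(system, context))
    return results
```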

@jwnimmer-tri
Collaborator

FYI we have this comment now:

// Note: This hard-codes `parallelism` to be off, since parallel execution
// of Python systems on multiple threads was thought to be unsupported. It's
// possible that with `py::call_guard<py::gil_scoped_release>` it would
// actually be fine, so we could revisit that decision at some point.

Adding the parallelism=... arg to the bindings is probably easy.
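For reference, a hypothetical usage sketch if that argument were exposed; the keyword names below assume the binding mirrors the C++ signature, and `parallelism=2` is a placeholder for whatever type the binding ends up accepting:

```python
from pydrake.common import RandomGenerator
from pydrake.systems.analysis import MonteCarloSimulation, Simulator
from pydrake.systems.framework import DiagramBuilder
from pydrake.systems.primitives import ConstantVectorSource

def make_simulator(generator):
    # All-C++ diagram, so the parallel path never calls back into Python
    # during simulation; only this factory and `output` are Python functors.
    builder = DiagramBuilder()
    builder.AddSystem(ConstantVectorSource([1.0]))
    return Simulator(builder.Build())

def output(system, context):
    # Scalar summary of a completed simulation.
    return context.get_time()

results = MonteCarloSimulation(
    make_simulator=make_simulator, output=output, final_time=1.0,
    num_samples=10, generator=RandomGenerator(),
    parallelism=2)  # hypothetical argument; not exposed at time of writing
```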
