The solver hangs if I compile with threads and run it from python. #8

Closed
stschubert opened this issue Jan 10, 2020 · 5 comments

@stschubert

Is there any news on the issue reported below? I'm asking because I have the same problem.

Update: Looks like the solver hangs if I compile with threads and run it from python. I removed the cmake flags above.

Originally posted by @NikolausDemmel in #7 (comment)

stschubert changed the title from "**Update:** Looks like the solver hangs if I compile with threads and run it from python. I removed the cmake flags above." to "The solver hangs if I compile with threads and run it from python." on Jan 10, 2020
@stschubert
Author

So far I have traced the problem to minisam/python/minisam_wrapper/factor.cpp, at the call PYBIND11_OVERLOAD_PURE(Eigen::VectorXd, Factor, error, variables); (line 70). However, I don't know how pybind works...

@dongjing3309
Owner

Hi @stschubert, my feeling is that I have a bug in the multi-threading code that causes a deadlock. Do you have an example that reproduces it? That would help me a lot with debugging. Thanks!

dongjing3309 self-assigned this on Jan 21, 2020
dongjing3309 added the "bug" (Something isn't working) label on Jan 21, 2020
@NikolausDemmel

I just came across this discussion about python bindings and multi-threading, and thought that it might be related.

@Edwinem
Contributor

Edwinem commented Feb 25, 2020

I was able to reproduce the issue with gps_factor_example.py. I can also confirm that it has to do with Python's global interpreter lock (GIL).

Running the code through a debugger shows that it currently deadlocks on

linthreads[i].join();

and

https://github.com/pybind/pybind11/blob/5b4751af269afab9a2e40a92b486bd29214f352b/include/pybind11/pybind11.h#L1911

It can be fixed by releasing the GIL before the threads are joined, as seen below.

  // release the GIL so worker threads that call back into Python can acquire it
  pybind11::gil_scoped_release release;
  // wait for the threads to finish
  for (int i = 0; i < MINISAM_WITH_MULTI_THREADS_NUM; i++) {
    linthreads[i].join();
  }

This has to be done in both FactorGraph.cpp and linearization.cpp.
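
To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the same deadlock pattern and of the release-before-join fix. It is only a model and not minisam or pybind11 code: `fake_gil` is a plain `std::mutex` standing in for the GIL, and `worker`, `linearizeAll`, and `callFromPython` are hypothetical names chosen for illustration.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the Python GIL. This is a plain mutex, NOT the real interpreter lock.
std::mutex fake_gil;

// Hypothetical worker: mimics a factor whose error() is implemented in Python,
// so it has to take the "GIL" before it can do its work.
void worker(int& slot, int value) {
  std::lock_guard<std::mutex> hold(fake_gil);
  slot = value * value;
}

// Hypothetical stand-in for the multi-threaded linearization step.
// Precondition: `gil` owns fake_gil, as native code entered from Python would.
std::vector<int> linearizeAll(std::unique_lock<std::mutex>& gil, int num_threads) {
  std::vector<int> results(num_threads, 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++) {
    threads.emplace_back(worker, std::ref(results[i]), i + 1);
  }
  // Joining while still holding fake_gil would hang: every worker is blocked
  // waiting for the lock that this thread owns.
  gil.unlock();  // plays the role of pybind11::gil_scoped_release
  for (auto& t : threads) {
    t.join();
  }
  gil.lock();  // re-acquire before returning to the "Python" caller
  return results;
}

// Hypothetical entry point: the caller holds the "GIL" for the whole call.
std::vector<int> callFromPython(int num_threads) {
  std::unique_lock<std::mutex> gil(fake_gil);
  return linearizeAll(gil, num_threads);
}
```

In this model, `callFromPython(4)` returns the squares 1, 4, 9, 16. Commenting out `gil.unlock()` (and the matching `gil.lock()`) should make the join loop block forever, which mirrors the hang described above.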

Note that this creates a dependency on Python and pybind11 in minisam itself. You would probably have to add a new compile-time flag so that this line is only included when the Python package is enabled.
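
One way such a compile-time guard could look is sketched below. The macro name `MINISAM_RELEASE_GIL_ON_JOIN` and the helpers `joinWorkers` and `countWithWorkers` are assumptions made up for this example, not existing minisam options or functions. When the macro is undefined, the code builds without any Python or pybind11 headers.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical flag: would be defined by the build system only when the
// Python wrapper is enabled.
#ifdef MINISAM_RELEASE_GIL_ON_JOIN
#include <pybind11/pybind11.h>
#endif

// Join all worker threads. With the flag set, the GIL is released for the
// duration of the joins; this assumes the calling thread holds the GIL.
void joinWorkers(std::vector<std::thread>& threads) {
#ifdef MINISAM_RELEASE_GIL_ON_JOIN
  pybind11::gil_scoped_release release;
#endif
  for (auto& t : threads) {
    if (t.joinable()) {
      t.join();
    }
  }
}

// Small driver for illustration: each worker bumps a shared counter once.
int countWithWorkers(int num_threads) {
  std::atomic<int> counter{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++) {
    threads.emplace_back([&counter] { counter.fetch_add(1); });
  }
  joinWorkers(threads);
  return counter.load();
}
```

Keeping the guard inside a single helper means the conditional appears in one place instead of at every join loop, although the core library would still need to know about the flag.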

Update:

A better fix is provided in #10. It is cleaner and does not require a Python dependency.

@dongjing3309
Owner

should be fixed by PR1
