Implement CPU multithreaded version (pthreads/TBB/OpenCL/MPI etc) #126
Right now Bullet 2 is single threaded and we discarded the obsolete BulletMultiThreaded. Bullet 3 OpenCL is mainly designed for GPU and OpenCL drivers are unreliable. It would be good to create a new multithreaded version of Bullet keeping in mind Bullet 2 and Bullet 3 OpenCL.
Hi Erwin, I am also very much interested in CPU support and may be able to help develop a solution.
My major concerns in developing a CPU MultiThreaded version are the following:
What were the exact reasons for dropping CPU multithreaded support in Bullet? I was able to reactivate the code in 2.82, and it seemed to scale linearly with the number of CPUs available, at least with the Many Box demos. How would Gimpact concave meshes behave in this demo? Scaling like that with the number of CPUs would be very useful for our applications.
Could you please give me some direction on where to start looking for a solution? Thank you, Georg.
I'm working on getting Bullet 2 running on multiple CPU threads. So far, I've got the collision detection (dispatchAllPairs) part working. I had to add a couple of locks to the collision dispatcher (around the manifold pool and collision algorithm pool allocs and deallocs). I also had to get rid of the persistent btVoronoiSimplexSolver that was being shared across threads and replace it with local versions. So far the changes to Bullet have been pretty minimal.
I'm using TBB as the task scheduler, but I've avoided introducing any TBB-dependency into Bullet libraries. All of the TBB-specific code (which is very little) is residing in the MultiThreadedDemo app (which I've dusted off and stripped out the BulletMultiThreaded parts of).
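One way to keep a scheduler dependency out of the library is to have the library call through a small abstract interface and let the application supply the implementation. The sketch below is only an illustration of that idea, not the code from the fork: `ITaskScheduler`, `SerialScheduler`, `ThreadScheduler`, and `processAllPairs` are hypothetical names, and `std::thread` stands in for TBB so the example has no external dependency.

```cpp
#include <algorithm>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// "Library side" (hypothetical): an interface with no dependency on any
// particular threading library.
class ITaskScheduler {
public:
    virtual ~ITaskScheduler() = default;
    // Calls body(i) once for every i in [begin, end).
    virtual void parallelFor(int begin, int end,
                             const std::function<void(int)>& body) = 0;
};

// Library-side default: runs everything on the calling thread.
class SerialScheduler : public ITaskScheduler {
public:
    void parallelFor(int begin, int end,
                     const std::function<void(int)>& body) override {
        for (int i = begin; i < end; ++i) body(i);
    }
};

// "Application side" (hypothetical): std::thread is an assumption standing in
// for TBB. Only the demo app would need to know about this class.
class ThreadScheduler : public ITaskScheduler {
public:
    explicit ThreadScheduler(int numThreads)
        : m_numThreads(std::max(1, numThreads)) {}

    void parallelFor(int begin, int end,
                     const std::function<void(int)>& body) override {
        const int count = end - begin;
        if (count <= 0) return;
        // Split the range into at most m_numThreads contiguous chunks.
        const int chunk = (count + m_numThreads - 1) / m_numThreads;
        std::vector<std::thread> workers;
        for (int start = begin; start < end; start += chunk) {
            const int stop = std::min(start + chunk, end);
            workers.emplace_back([start, stop, &body] {
                for (int i = start; i < stop; ++i) body(i);
            });
        }
        for (std::thread& worker : workers) worker.join();
    }

private:
    int m_numThreads;
};

// Library code sees only the interface. Here it just counts the pairs it
// was asked to process, using an atomic so concurrent calls are safe.
int processAllPairs(ITaskScheduler& scheduler, int numPairs) {
    std::atomic<int> processed{0};
    scheduler.parallelFor(0, numPairs,
                          [&processed](int) { processed.fetch_add(1); });
    return processed.load();
}
```

With this split, swapping schedulers is a change in the application only: the same `processAllPairs` call works with either `SerialScheduler` or `ThreadScheduler`.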
Also, in order to avoid adding any overhead to the single-threaded version, the locks that I added get compiled out unless a BT_THREADSAFE macro is set to 1.
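As a rough illustration of locks that compile out, the sketch below switches a lock class between a real mutex and an empty no-op based on the BT_THREADSAFE macro. Everything else here is an assumption made for the example: `PoolLock` and `ManifoldPool` are hypothetical names, and `std::mutex` is used in place of whatever platform-specific lock the actual changes use.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Off by default, so the single-threaded build pays nothing for locking.
#ifndef BT_THREADSAFE
#define BT_THREADSAFE 0
#endif

#if BT_THREADSAFE
// Thread-safe build: a real lock (std::mutex is an assumption here).
class PoolLock {
public:
    void lock() { m_mutex.lock(); }
    void unlock() { m_mutex.unlock(); }
private:
    std::mutex m_mutex;
};
#else
// Single-threaded build: empty inline functions the compiler can remove.
class PoolLock {
public:
    void lock() {}
    void unlock() {}
};
#endif

// Hypothetical fixed-size pool that hands out slot indices. The lock guards
// the free list during allocation and release.
class ManifoldPool {
public:
    explicit ManifoldPool(std::size_t capacity) {
        for (std::size_t i = 0; i < capacity; ++i) {
            m_free.push_back(static_cast<int>(i));
        }
    }

    // Returns a free slot index, or -1 if the pool is exhausted.
    int allocate() {
        std::lock_guard<PoolLock> guard(m_lock);
        if (m_free.empty()) return -1;
        const int slot = m_free.back();
        m_free.pop_back();
        return slot;
    }

    // Returns a slot to the pool.
    void release(int slot) {
        std::lock_guard<PoolLock> guard(m_lock);
        m_free.push_back(slot);
    }

    // Not locked: intended for use when no other thread touches the pool.
    std::size_t freeCount() const { return m_free.size(); }

private:
    PoolLock m_lock;
    std::vector<int> m_free;
};
```

Because `std::lock_guard` only requires `lock()` and `unlock()`, the calling code is identical in both builds; only the definition of `PoolLock` changes.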
If anyone is interested I can put this up on GitHub.
I'll put up a fork of it shortly. It is still a work in progress. The lock is only implemented for Win32 at the moment. Also, Bullet's built-in profiling isn't threadsafe, so I've disabled it. And I haven't tried to set up CMake to work with TBB yet.
I did a bit of profiling on my 4-core machine. My test scene has about 1300 capsules on a triangle mesh terrain. Single-threaded, dispatchAllPairs was taking about 2.2 ms, whereas multi-threaded it was taking 0.9 ms, a speedup of about 2.4x. It seems that quite a lot of collision algorithm objects are being allocated and freed each simulation step (about 1300, in fact). So I think there is contention for the lock on the collision algorithm pool allocator, which may account for the lackluster speedup. I need to look into this further.
My changes are here: https://github.com/lunkhound/bullet3
Now you should be able to run the demo. Pay attention to the numbers after "collision detection time" on screen. Those are the numbers that should be improved compared to a single-threaded version. To compile a single-threaded version for comparison, edit MultiThreadedDemo.cpp, and change USE_PARALLEL_DISPATCHER to 0.
Oh, and keep in mind that in this particular demo, most of the time is going towards solving constraints, and that part is still completely single threaded.
Dear Erwin, Dear Lunkhound, great that this is finally integrated. Thank you!
One question: in your experience, what differences do you see between using OpenMP, ThreadSupport, and TBB? Will there be any differences?
I am working on a little logger to do more testing on different laptops and workstations with the different options.
Maybe I can share some results on this if you like.
You may have to fix things to get the various multi-threading options working. Good luck!
If you have comparisons or numbers for the different schedulers, please share them (for example, from running some of the benchmarks in the Bullet/ExampleBrowser BenchmarkDemo).
The PPL option should compile for MSVC 2013 and 2015 as I recall. I never tried it with VC 2010, so it sounds like it only works with newer versions of PPL. I don't really recommend using PPL. It's mainly there as an example of how to implement a task scheduler for Bullet. Performance wasn't great compared to the other options.
The ThreadSupport option is probably the best choice for most cases. It's built-in (to Bullet) so there's no extra setup, and it has the best performance (on the Windows platform at least, according to my testing).
I found Intel TBB to also have good performance, but it's a bit of a hassle to set up since it's an external library.
OpenMP on Windows with MSVC works and is very easy to set up since it is built into the compiler, but it doesn't perform that well. Hopefully the performance is better with other compilers.
Please feel free to share your results with CPU multithreading on this Forum thread