PoC: multithreaded chunked model CPU computation #1837
Force-pushed from d250969 to 8c6aa41.
@slipher here is my chunked variant, but it doesn't spawn more than one thread. Actually, it could be possible to write a custom dispatch function (I don't know how to do it).
I added a commit that uses standard threading functions instead of […]. I added a logger to check things, and they look correct. I don't know what is missing.
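For reference, a minimal sketch of what chunked dispatch with standard threading functions can look like. This is not the code from the commit: `RunChunked` and its parameters are hypothetical names, and it assumes the work can be expressed as a callback over an index range.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not from the PR): splits [0, count) into contiguous
// chunks and runs work(begin, end) on one freshly spawned thread per chunk.
void RunChunked(std::size_t count, unsigned numThreads,
                const std::function<void(std::size_t, std::size_t)>& work)
{
    if (count == 0) return;
    numThreads = std::max(1u, numThreads);

    // Round up so that the last chunk absorbs the remainder.
    const std::size_t chunkSize = (count + numThreads - 1) / numThreads;

    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < count; begin += chunkSize) {
        const std::size_t end = std::min(begin + chunkSize, count);
        threads.emplace_back([&work, begin, end] { work(begin, end); });
    }

    // Threads are created and destroyed on every call; nothing is reused.
    for (std::thread& t : threads) t.join();
}
```

Each chunk writes to a disjoint index range, so the callback needs no locking as long as it only touches its own `[begin, end)` slice.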
Hmm, one drawback of doing it this way is that, unlike with OpenMP, threads are not reused, so profilers like Orbit list thousands of threads. Not only is that a crazy number to list, but profiling is also just meh because statistics are computed for each thread separately. It would be cool to be able to reuse those threads.
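One way to reuse threads is a small persistent pool: workers are started once, sleep on a condition variable, and wake up for each batch. The sketch below is only an illustration under that assumption; `ChunkPool` and all of its members are hypothetical names, and it supports a single caller at a time.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool (not from the PR): the same worker threads handle every batch.
class ChunkPool {
public:
    using Work = std::function<void(std::size_t, std::size_t)>;

    explicit ChunkPool(unsigned numThreads)
        : numThreads_(std::max(1u, numThreads))
    {
        for (unsigned i = 0; i < numThreads_; i++)
            workers_.emplace_back([this, i] { WorkerLoop(i); });
    }

    ~ChunkPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    // Splits [0, count) into one chunk per worker and blocks until all are done.
    void Run(std::size_t count, Work work)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        work_ = std::move(work);
        count_ = count;
        pending_ = numThreads_;
        generation_++;  // a new generation number tells workers there is a batch
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void WorkerLoop(unsigned index)
    {
        std::size_t seen = 0;
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [&] { return quit_ || generation_ != seen; });
            if (quit_) return;
            seen = generation_;

            // Each worker derives its own slice from its index.
            const std::size_t chunkSize = (count_ + numThreads_ - 1) / numThreads_;
            const std::size_t begin = std::min(index * chunkSize, count_);
            const std::size_t end = std::min(begin + chunkSize, count_);

            lock.unlock();  // run the chunk without holding the lock
            if (begin < end) work_(begin, end);
            lock.lock();

            if (--pending_ == 0) done_.notify_one();
        }
    }

    const unsigned numThreads_;
    std::mutex mutex_;
    std::condition_variable wake_, done_;
    Work work_;
    std::size_t count_ = 0;
    std::size_t generation_ = 0;
    unsigned pending_ = 0;
    bool quit_ = false;
    std::vector<std::thread> workers_;  // declared last so other members exist first
};
```

With a pool like this, a profiler would see a fixed set of threads for the whole run instead of a new set per call, so per-thread statistics accumulate across batches.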
But at least, if we could get the current implementation working, that would be a start.
Force-pushed from 50dccf8 to f976903.
Hmm, actually, it seems to work with my custom thread start. It's just so inefficient that performance drops as if nothing had been done. When switching from 1 thread to 2 I do see a performance difference; it's just so bad compared to OpenMP. But now I don't get why this chunked implementation doesn't work with OpenMP.
I got it working with OpenMP; I had to use another syntax (which in fact skips my useless vector trick). I now get
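For readers unfamiliar with OpenMP, one common way to parallelize a chunked loop is a `parallel for` over the chunk index. This is a generic sketch, not necessarily the syntax used in the branch; `ScaleChunked` and its workload are hypothetical and stand in for the per-chunk model computation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical workload (not from the PR): scales every value in place,
// with one chunk of values handled per loop iteration.
void ScaleChunked(std::vector<float>& values, float factor, std::size_t chunkSize)
{
    if (chunkSize == 0) chunkSize = 1;
    const long numChunks =
        static_cast<long>((values.size() + chunkSize - 1) / chunkSize);

    // The OpenMP runtime distributes chunk indices across its own threads.
    // If the compiler is not run with OpenMP enabled (e.g. -fopenmp), the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(static)
    for (long chunk = 0; chunk < numChunks; chunk++) {
        const std::size_t begin = static_cast<std::size_t>(chunk) * chunkSize;
        const std::size_t end = std::min(begin + chunkSize, values.size());
        for (std::size_t i = begin; i < end; i++) values[i] *= factor;
    }
}
```

The loop variable is a signed integer because older OpenMP implementations only accept signed loop counters in `parallel for`.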
With this implementation, the CPU code is now as fast as the GPU code running on the CPU with the llvmpipe software renderer. With
The experiment was a success. I'm closing this and will submit a completed and cleaned-up branch later.
Chunked version of:
For unknown reasons, only one thread is spawned.