Parallelized AdjArrayBQM constructor from dense array #725
    sum_counters += prev_counter;
}

// TODO : This is the bottleneck for moderately dense input arrays.
@arcondello @shpface, I could not find a straightforward way to remove this bottleneck. I tried some approaches that work for vectors of primitive elements but not for vectors of pairs. Let me know if you have any ideas. Note that I cannot reserve and push_back in parallel in the code below this TODO. This is a problem for very large dense matrices; otherwise it is fine. With this bottleneck we get a 10x speedup on 10 cores for dense matrices. Without it we would get a 20-40x speedup.
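For reference, one commonly used workaround for the `resize()` initialization cost is an allocator adaptor whose no-argument `construct` default-initializes instead of value-initializing. The sketch below is not code from this PR; `default_init_allocator`, `Neighbor`, and `make_uninitialized` are hypothetical names. It also suggests why such approaches help for primitives but not for pairs: the default constructor of `std::pair` value-initializes both members, so the zero-fill still happens unless the element type is a trivially default-constructible struct.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor (hypothetical name): construct() with no arguments
// default-initializes, so vector::resize(n) skips the zero-fill for trivial types.
template <class T, class A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <class U>
    struct rebind {
        using other = default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    template <class U>
    void construct(U* ptr) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(ptr)) U;  // default-init: no write for trivial U
    }

    template <class U, class... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

// Hypothetical stand-in for std::pair<index, bias> with a trivial default constructor.
struct Neighbor {
    std::size_t index;
    double bias;
};

static_assert(std::is_trivially_default_constructible<Neighbor>::value,
              "Neighbor can be left uninitialized by resize()");
static_assert(!std::is_trivially_default_constructible<std::pair<std::size_t, double>>::value,
              "std::pair's default constructor still initializes its members");

using NeighborVec = std::vector<Neighbor, default_init_allocator<Neighbor>>;

// Allocates n elements without initializing them. Every element must be
// written (for example by the parallel fill phase) before it is read.
NeighborVec make_uninitialized(std::size_t n) {
    NeighborVec v;
    v.resize(n);
    return v;
}
```

Adopting something like this would change the element type and the vector's allocator type, which is the kind of data structure change the comment above is trying to avoid.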
@shpface @arcondello This is ready for review.
Codecov Report
@@           Coverage Diff           @@
##           master     #725   +/-   ##
=======================================
  Coverage   91.93%   91.93%
=======================================
  Files          63       63
  Lines        4552     4552
=======================================
  Hits         4185     4185
  Misses        367      367
784e82c to b38bb26
f1f95ba to d6252cc
}

// The aligned_malloc and aligned_free functions were written with the help of this link:
// https://stackoverflow.com/questions/38088732/explanation-to-aligned-malloc-implementation
No longer relevant, see #788.
Parallelized the AdjArrayBQM constructor from a dense array. In the first phase, threads compute in
parallel the size of the resulting array and the index at which each of them should start writing. In
the second phase, they write into the BQM. The bottleneck in this method is the call to resize() on
the BQM's vector, because resize() also initializes the data. This cannot be circumvented without
changing the data structure or using custom vectors/allocators. It is marked as a TODO in case the
standard vector::resize function is improved.
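The two-phase scheme described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: it counts per row rather than per thread, assumes a row-major `n x n` input, skips the diagonal, and keeps every nonzero off-diagonal entry as a neighbor. `DenseToAdj` and `build_from_dense` are hypothetical names. The OpenMP pragmas are ignored when OpenMP is not enabled, in which case the code runs serially.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CSR-like result: the neighbors of row r are stored in
// outvars[offsets[r] .. offsets[r + 1]).
struct DenseToAdj {
    std::vector<std::size_t> offsets;
    std::vector<std::pair<std::size_t, double>> outvars;
};

DenseToAdj build_from_dense(const std::vector<double>& dense, std::size_t n) {
    DenseToAdj out;
    out.offsets.assign(n + 1, 0);
    const std::ptrdiff_t rows = static_cast<std::ptrdiff_t>(n);

    // Phase 1: each row counts its nonzero off-diagonal entries in parallel.
    #pragma omp parallel for
    for (std::ptrdiff_t r = 0; r < rows; ++r) {
        const std::size_t row = static_cast<std::size_t>(r);
        std::size_t count = 0;
        for (std::size_t col = 0; col < n; ++col) {
            if (col != row && dense[row * n + col] != 0.0) ++count;
        }
        out.offsets[row + 1] = count;
    }

    // Serial prefix sum turns the counts into start indices for each row.
    for (std::size_t row = 0; row < n; ++row) {
        out.offsets[row + 1] += out.offsets[row];
    }

    // resize() value-initializes every pair; this is the serial bottleneck.
    out.outvars.resize(out.offsets[n]);

    // Phase 2: each row writes into its own disjoint slice, so no locking is needed.
    #pragma omp parallel for
    for (std::ptrdiff_t r = 0; r < rows; ++r) {
        const std::size_t row = static_cast<std::size_t>(r);
        std::size_t pos = out.offsets[row];
        for (std::size_t col = 0; col < n; ++col) {
            const double bias = dense[row * n + col];
            if (col != row && bias != 0.0) {
                out.outvars[pos++] = std::make_pair(col, bias);
            }
        }
    }
    return out;
}
```

Because the offsets are fixed before any thread writes, `push_back` is not needed in the fill phase. That is also why the output has to be sized up front with `resize()`.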