This is regarding the TBB multithreading enhancement.
While working on the ITKRANSAC module, I found that a locally built package (on an AMD system)
without TBB module support is around 10 times faster than the pip package built from the GitHub repo.
The difference between the two is the presence of the TBB module.
When I build the module locally with TBB support, I see similarly slow results.
Is there a recommended way to handle this so that the faster multithreading back end is selected?
I am using MultiThreaderBase as follows:
itk::MultiThreaderBase::SetGlobalDefaultNumberOfThreads(this->numberOfThreads);
itk::MultiThreaderBase::Pointer threader = itk::MultiThreaderBase::New();
From what I have read, the TBB module should be independent of the processor, but I am getting slow results on my system.
Another observation: with the non-TBB package, all CPU cores are utilized even when I request fewer threads, whereas with the TBB module only the requested number of cores is used.
Is some optimization happening under the hood on AMD systems?
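To make the comparison between the two builds reproducible, the timings can be collected with a small harness like the one below. This is only a sketch using the C++ standard library, not ITK code: `bestWallTime`, `sweepThreadCounts`, and the `workload` callable are hypothetical names introduced for illustration. The assumption is that `workload` sets the requested thread count and runs the RANSAC step once.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Runs workload(numThreads) `repeats` times and returns the best wall-clock
// time in seconds. `workload` is a hypothetical caller-supplied callable; in
// the ITK case it would set the thread count and run the algorithm once.
// With repeats == 0 nothing is run and the largest double is returned.
double bestWallTime(const std::function<void(unsigned int)> & workload,
                    unsigned int numThreads,
                    unsigned int repeats)
{
  double best = std::numeric_limits<double>::max();
  for (unsigned int r = 0; r < repeats; ++r)
  {
    const auto start = std::chrono::steady_clock::now();
    workload(numThreads);
    const auto stop = std::chrono::steady_clock::now();
    best = std::min(best, std::chrono::duration<double>(stop - start).count());
  }
  return best;
}

// Times the same workload for each requested thread count and returns
// (thread count, best seconds) pairs in the order given.
std::vector<std::pair<unsigned int, double>>
sweepThreadCounts(const std::function<void(unsigned int)> & workload,
                  const std::vector<unsigned int> & threadCounts,
                  unsigned int repeats)
{
  std::vector<std::pair<unsigned int, double>> results;
  results.reserve(threadCounts.size());
  for (unsigned int n : threadCounts)
  {
    results.emplace_back(n, bestWallTime(workload, n, repeats));
  }
  return results;
}
```

Running the same sweep (for example with thread counts 1, 2, 4, 8, 16) against the TBB and non-TBB builds would show whether the slowdown is constant or grows with the thread count.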
Environment
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
AMD Ryzen 7 5800 8-Core Processor
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8