OpenBLAS DGEMM is highly efficient with a single thread, for example over 90% of peak performance on Graviton3E, but the efficiency drops to about 73% when DGEMM runs with 64 threads.
It is well known that maintaining high efficiency in multi-threaded execution is becoming difficult on recent many-core CPUs, even when the kernels are highly optimized for single-threaded execution.
I am considering adjusting the shape of the submatrix handled by each thread by modifying the 2D thread distribution.
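To make the idea concrete, here is a minimal sketch in Python of how the choice of 2D thread grid changes the per-thread submatrix shape. It is not OpenBLAS code: the function names (`grid_candidates`, `block_shape`, `choose_grid`) are hypothetical, and the cost model (minimize block rows plus block columns, a rough proxy for the A and B panel traffic per thread at fixed K) is an assumption chosen only for illustration.

```python
from math import ceil


def grid_candidates(nthreads):
    """Yield every 2D grid (pm, pn) with pm * pn == nthreads."""
    for pm in range(1, nthreads + 1):
        if nthreads % pm == 0:
            yield pm, nthreads // pm


def block_shape(m, n, pm, pn):
    """Shape of the largest C submatrix a thread owns on a pm x pn grid."""
    return ceil(m / pm), ceil(n / pn)


def choose_grid(m, n, nthreads):
    """Pick the grid whose per-thread block is closest to square.

    Assumed cost model (not taken from OpenBLAS): each thread reads an
    mb x K panel of A and a K x nb panel of B, so for a fixed K the
    traffic per thread grows with mb + nb.
    """
    def cost(grid):
        mb, nb = block_shape(m, n, *grid)
        return mb + nb

    return min(grid_candidates(nthreads), key=cost)


if __name__ == "__main__":
    for m, n in [(8192, 8192), (16384, 1024)]:
        pm, pn = choose_grid(m, n, 64)
        print(m, n, (pm, pn), block_shape(m, n, pm, pn))
```

With 64 threads, this sketch picks an 8 x 8 grid for a square 8192 x 8192 output and a 32 x 2 grid for a tall 16384 x 1024 output, giving 512 x 512 blocks per thread in the latter case. A real heuristic would also need to account for cache sizes, kernel register blocking, and NUMA placement, none of which are modeled here.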
I would appreciate any suggestions you may have.