Use mkl gemm batch in CPU backend when appropriate #2206
Cool, thanks. Have you run at least a minimal benchmark to check that there is some speedup for the use case Intel is advertising?
@WilliamTambellini Yes, I am working on benchmarks. Once I have them in a presentable format, I will share them here.
Sorry about the delay, but we wanted to make sure the changes indeed give the performance improvements we expected. It turns out that if we use Intel TBB as the threading solution, the MKL batch call doesn't give any performance improvement for some reason. Once we switched the threading to be [...] Given below is the link to the interactive chart that shows the numbers we have for batched operations. It is by no means a comprehensive set of benchmarks, but it serves as a confirmation of the potential performance improvements and of the issue with TBB. Update:
Looks like batched matmul failed on linux-e5; I will look into it.
Hmm, very interesting. Congrats.
We will be defaulting to
The current changes to FindMKL seem to work on both Linux and Windows. However, OSX still can't find OpenMP; looking into it. Update: Fixed this on OSX too. It turns out the Intel MKL installation on OSX doesn't have
Update:
@umar456 I am not sure how to proceed; the blas_cpu test is failing via CI on
Default to Intel OpenMP on Linux, OSX and Windows

Intel TBB is not giving the expected speedup as of now. Hence, switching to OpenMP as the thread layer.

* GNU OpenMP causes some OpenCL tests to fail on certain Debian configurations. Hence, choosing Intel OpenMP.
* The GNU OpenMP related mkl_gnu_thread library is not installed with Intel MKL on OSX, so Intel OpenMP is the only option there.
* OpenMP support on Windows lags behind by many versions, so we are using Intel OpenMP on Windows too.
@@ -22,6 +22,9 @@ include(CheckCXXCompilerFlag)

arrayfire_set_cmake_default_variables()

#Set Intel OpenMP as default MKL thread layer
set(MKL_THREAD_LAYER "Intel OpenMP" CACHE STRING "The thread layer to choose for MKL")
This should go in the FindMKL.cmake file.
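For illustration, a minimal sketch of how the cache variable might look once moved into FindMKL.cmake, next to the logic that consumes it. Only the `set(MKL_THREAD_LAYER ...)` line comes from the diff above. The list of allowed values and the `_mkl_thread_lib` variable are assumptions made for this sketch, not the actual contents of the module. The library names are the standard MKL threading libraries.

```cmake
# Sketch only: declare the thread-layer option inside FindMKL.cmake.
# The default and doc string are taken from the reviewed diff.
set(MKL_THREAD_LAYER "Intel OpenMP" CACHE STRING "The thread layer to choose for MKL")

# Assumed set of choices; shown as a drop-down in cmake-gui and ccmake.
set_property(CACHE MKL_THREAD_LAYER PROPERTY STRINGS
  "Intel OpenMP" "GNU OpenMP" "TBB" "Sequential")

# Map the choice to an MKL threading library name.
# _mkl_thread_lib is a hypothetical variable used only in this sketch.
if(MKL_THREAD_LAYER STREQUAL "Intel OpenMP")
  set(_mkl_thread_lib mkl_intel_thread)
elseif(MKL_THREAD_LAYER STREQUAL "GNU OpenMP")
  set(_mkl_thread_lib mkl_gnu_thread)
elseif(MKL_THREAD_LAYER STREQUAL "TBB")
  set(_mkl_thread_lib mkl_tbb_thread)
elseif(MKL_THREAD_LAYER STREQUAL "Sequential")
  set(_mkl_thread_lib mkl_sequential)
else()
  message(FATAL_ERROR "Unknown MKL_THREAD_LAYER: ${MKL_THREAD_LAYER}")
endif()
```

Keeping the option in the find module would let any project that includes the module pick up the same default, and a user could still override it at configure time with `-DMKL_THREAD_LAYER=TBB`.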
Resolves #2146