
Conversation

Member

@9prady9 9prady9 commented Jun 20, 2018

Resolves #2146

@9prady9 9prady9 requested a review from umar456 June 20, 2018 10:49
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from 27c7ba0 to ed7c846 Compare June 20, 2018 10:54
Contributor

@WilliamTambellini WilliamTambellini left a comment


Cool, thanks. Have you run at least a minimal benchmark to check that there is some speedup for the use case Intel is advertising?

@9prady9
Member Author

9prady9 commented Jun 20, 2018

@WilliamTambellini Yes, I am working on benchmarks. Once I have them in a presentable format, I will share them here.

@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from ed7c846 to 0859249 Compare July 11, 2018 04:29
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from 0859249 to 57eb26d Compare July 21, 2018 05:37
@9prady9
Member Author

9prady9 commented Jul 21, 2018

Sorry about the delay, but we wanted to make sure the changes were indeed giving the performance improvements we expected. It turns out that with Intel TBB as the threading solution, the MKL batch call wasn't giving any performance improvement for some reason. Once we switched the threading to GNU OpenMP, we started seeing good performance improvements.
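For reference, the batched call in question is MKL's `cblas_?gemm_batch` family. The sketch below shows the semantics only, using numpy's batched `matmul` as a stand-in for MKL; numpy and the sizes chosen are assumptions for illustration, not part of this PR:

```python
import numpy as np

# Illustrative batched GEMM: numpy's batched matmul stands in for
# MKL's cblas_?gemm_batch, which this PR wires up in afcpu.
rng = np.random.default_rng(0)
batch, m, k, n = 8, 4, 5, 3
A = rng.standard_normal((batch, m, k)).astype(np.float32)
B = rng.standard_normal((batch, k, n)).astype(np.float32)

# Looped version: one GEMM per slice, as afcpu did before this change.
looped = np.stack([A[i] @ B[i] for i in range(batch)])

# Batched version: one call over all slices, so the library (not the
# caller) can parallelize across the batch dimension.
batched = A @ B

assert looped.shape == (batch, m, n)
assert np.allclose(looped, batched)
```

The point of the batched API is that the library can parallelize across the batch dimension itself, which is exactly where the thread-layer choice (TBB vs. OpenMP) shows up in the benchmarks.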

Given below is the link to the interactive chart that shows the numbers we have for batched operations.
https://docs.google.com/spreadsheets/d/e/2PACX-1vRfKVz-VV9qGae7fLC3wYHtvNROwgfZ9yI7mCDTvxXJ2wqV8ibtrp1BVxykIz9nTVlCg5ouRvfd1hFN/pubchart?oid=1109307214&format=interactive

It is by no means a comprehensive set of benchmarks, but it serves as a confirmation of the potential performance improvements and of the issue with TBB.

Update:
Raised an issue on the Intel community forums about this:
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/783212

@9prady9
Member Author

9prady9 commented Jul 21, 2018

Looks like batched matmul failed on linux-e5; I will look into it.

@WilliamTambellini
Contributor

Hum, very interesting. Congrats.
@umar456 I am aware of another drawback with the TBB backend: it seems impossible to finely select the number of CPU cores used by afcpu. With OpenMP, just setting OMP_NUM_THREADS was at least working with AF 3.5. I know afcpu 3.6.1 has been built with the TBB backend, but would it be possible for afcpu 3.6.2 to switch back to OpenMP (GNU or Intel, your choice)?

@9prady9
Member Author

9prady9 commented Jul 24, 2018

We will be defaulting to GNU OpenMP on non-Windows platforms and Intel OpenMP on Windows.

@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch 6 times, most recently from 48159fc to 67900f5 Compare July 30, 2018 07:42
@9prady9
Member Author

9prady9 commented Jul 30, 2018

The current changes to FindMKL seem to work on both Linux and Windows. However, OSX still can't find OpenMP; looking into it.

Update: Fixed this on OSX too. It turns out the Intel MKL installation on OSX doesn't ship mkl_gnu_thread; it has only mkl_intel_thread and mkl_tbb_thread. Therefore, I have made changes so that threading defaults to Intel OpenMP on Windows and Apple, and to GNU OpenMP on Linux.

@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch 3 times, most recently from 998426f to a334db2 Compare July 31, 2018 04:54
@9prady9
Member Author

9prady9 commented Jul 31, 2018

linux-cpu-e5 (CPU details below) seems to be giving a slightly higher error value of 9.15527e-05 compared to the expected 1e-05. I have changed the expected value to 1e-04; @umar456, let me know if you don't like that.

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel Core Processor (Broadwell, IBRS)
siblings	: 4
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch arat xsaveopt invpcid_single tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap

Update:
Quadro M6000 is also generating error values in the 1e-04 range, compared to the earlier expected 1e-05.
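For context on why 1e-05 is a tight tolerance for float32 results: a GEMM accumulates k products per output element, so the rounding error grows roughly with k times the float32 machine epsilon. A small sketch (numpy assumed; k = 1024 is an illustrative size, not the test's actual dimension):

```python
import numpy as np

# float32 machine epsilon is 2**-23 (~1.19e-07). A GEMM accumulates k
# products per output element, so a rough error scale of k * eps can
# already exceed 1e-05 for moderate k, making 1e-04 a safer tolerance.
eps = np.finfo(np.float32).eps
k = 1024
bound = k * eps
print(eps)    # 1.1920929e-07
print(bound)  # ~1.22e-04, above both 1e-05 and the observed 9.15527e-05
```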

@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch 5 times, most recently from 10e3611 to e5e11d0 Compare August 7, 2018 10:54
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from e5e11d0 to 6b2669f Compare August 9, 2018 07:12
@9prady9
Member Author

9prady9 commented Aug 9, 2018

@umar456 I am not sure how to proceed: the blas_cpu test is failing in CI on linux-cpu-e5 with Intel OpenMP, but it doesn't fail when I run it manually from a shell (the exit code is also zero). This is the only blocker on this PR at the moment. All other jobs seem to be running fine with Intel OpenMP as the thread layer.

@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from 6b2669f to febc154 Compare August 26, 2018 14:43
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch 2 times, most recently from caba038 to 3d93e83 Compare October 24, 2018 04:46
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from 3d93e83 to a6e9bfb Compare November 11, 2018 01:56
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch 2 times, most recently from 2d304c1 to 7171bc8 Compare November 24, 2018 03:32
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from 7171bc8 to 36f7120 Compare November 29, 2018 06:45
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from 36f7120 to 566fc3b Compare December 12, 2018 12:37
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch 2 times, most recently from 7ddd69f to c1fdc7f Compare December 24, 2018 18:06
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from c1fdc7f to bae596d Compare December 26, 2018 06:40
@9prady9 9prady9 dismissed umar456’s stale review December 26, 2018 06:46

Addressed feedback

Default to Intel OpenMP on Linux, OSX and Windows

Intel TBB is not giving the expected speedup as of now. Hence, switching
to OpenMP as the thread layer.

* GNU OpenMP was causing some OpenCL tests to fail on certain Debian
  configurations. Hence, choosing Intel OpenMP.
* The mkl_gnu_thread library needed for GNU OpenMP is not installed by
  OSX's Intel MKL installation, so on OSX the only option is Intel OpenMP.
* Windows OpenMP support lags behind by many versions, so we are using
  Intel OpenMP on Windows too.
@9prady9 9prady9 force-pushed the cpu_batched_matmul_mkl branch from bae596d to 8c06d90 Compare December 27, 2018 09:42
@9prady9 9prady9 requested a review from umar456 December 27, 2018 09:46
@@ -22,6 +22,9 @@ include(CheckCXXCompilerFlag)

arrayfire_set_cmake_default_variables()

#Set Intel OpenMP as default MKL thread layer
set(MKL_THREAD_LAYER "Intel OpenMP" CACHE STRING "The thread layer to choose for MKL")


This should go in the FindMKL.cmake file.
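Since `MKL_THREAD_LAYER` is declared as a `CACHE STRING`, it can also be overridden at configure time without editing any file. A hypothetical invocation (the value string follows the options discussed in this PR; run from a build directory of an ArrayFire checkout):

```shell
# Hypothetical configure-time override of the MKL thread layer
# cache variable, e.g. to switch back to GNU OpenMP.
cmake -DMKL_THREAD_LAYER="GNU OpenMP" ..
```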

@9prady9 9prady9 merged commit 5cc9967 into arrayfire:master Dec 27, 2018
@9prady9 9prady9 deleted the cpu_batched_matmul_mkl branch December 27, 2018 15:31