MXNet slowdown -- OpenBLAS hidden by another BLAS library again? #5528

Closed
hqucms opened this issue Feb 10, 2020 · 11 comments · Fixed by #5540

@hqucms
Contributor

hqucms commented Feb 10, 2020

Similar to cms-sw/cmssw#25230 (comment): MXNet is slow in the latest IB, while using LD_PRELOAD to load OpenBLAS recovers the speed.

A quick search in cmsdist seems to point to https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_11_1_X/master/photospline.spec#L19-L20 and https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_11_1_X/master/suitesparse.spec#L10. Could we change them to build w/ OpenBLAS?

Actually should we consider removing lapack? OpenBLAS contains a full lapack package w/ some of the functions optimized, so it seems that the plain lapack package should not be needed anymore.
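
For reference, the LD_PRELOAD workaround mentioned above can be scripted. This is only a minimal sketch: the helper names `preload_env` and `run_with_preload` are hypothetical, and the path to the OpenBLAS shared library is an assumption that has to be filled in for the actual installation.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path placed first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # glibc accepts a colon- or space-separated list in LD_PRELOAD.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_with_preload(command, library_path):
    """Run command (a list of arguments) with the given library preloaded."""
    return subprocess.run(command, env=preload_env(library_path), check=False)


# Hypothetical library location, shown only to illustrate the resulting variable.
example = preload_env("/path/to/libopenblas.so", base_env={})
print(example["LD_PRELOAD"])
```

Preloading only changes symbol resolution order for that one process, so it is useful for confirming the diagnosis but not as a fix for the release.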

@cmsbuild
Contributor

A new Issue was created by @hqucms Huilin Qu.

@Dr15Jones, @smuzaffar, @silviodonato, @makortel, @davidlange6, @fabiocos can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mrodozov
Contributor

I'll remove lapack but would you write me a short example of how to test the performance ?

@smuzaffar
Contributor

@hqucms , both of these tools (photospline and suitesparse) are not part of cmssw distribution. None of our externals are using lapack so yes we can drop it. I think the issue is with GSL BLAS which is linked in many externals and CMSSW libs.

@mrodozov , please make a PR to drop photospline and suitesparse

@hqucms
Contributor Author

hqucms commented Feb 12, 2020

> I'll remove lapack but would you write me a short example of how to test the performance ?

@mrodozov Thank you very much!

Currently we don't have any module in CMSSW using MXNet. For a performance test, you could check out cms-sw/cmssw#28902 + cms-data/RecoBTag-Combined#26 and run:

cmsRun $CMSSW_BASE/src/RecoBTag/MXNet/test/test_particle_net_cfg.py

When OpenBLAS is used one should get something similar to

TimeReport   0.029008     0.029008     0.029008  pfParticleNetJetTags

instead of

TimeReport   0.139752     0.139752     0.139752  pfParticleNetJetTags
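
To compare the two runs without reading the numbers by eye, the TimeReport lines can be parsed and divided. This is a rough sketch that assumes the line layout shown above (three numeric columns followed by the module label) and simply takes the first column; the helper name `module_time` is hypothetical.

```python
import re

# "TimeReport", three numeric columns, then the module label.
TIME_REPORT = re.compile(r"^TimeReport\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\S+)\s*$")


def module_time(report_text, module):
    """Return the first numeric column of the TimeReport line for module, or None."""
    for line in report_text.splitlines():
        match = TIME_REPORT.match(line.strip())
        if match and match.group(4) == module:
            return float(match.group(1))
    return None


# The two lines quoted in this comment.
fast_report = "TimeReport   0.029008     0.029008     0.029008  pfParticleNetJetTags"
slow_report = "TimeReport   0.139752     0.139752     0.139752  pfParticleNetJetTags"

fast = module_time(fast_report, "pfParticleNetJetTags")
slow = module_time(slow_report, "pfParticleNetJetTags")
slowdown = slow / fast
print(f"slowdown factor: {slowdown:.2f}")  # roughly 4.8 for the numbers above
```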

@hqucms
Contributor Author

hqucms commented Feb 12, 2020

> @hqucms , both of these tools (photospline and suitesparse) are not part of cmssw distribution. None of our externals are using lapack so yes we can drop it. I think the issue is with GSL BLAS which is linked in many externals and CMSSW libs.

@smuzaffar You are right:

ld.log.816:       816:  file=libgslcblas.so.0 [0];  needed by /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libMathMore.so [0]

ld.log.816:       816:  file=/cvmfs/cms-ib.cern.ch/nweek-02614/slc7_amd64_gcc820/external/gsl/2.2.1-bcolbf/lib/libgslcblas.so.0.0.0 [0];  dynamically loaded by /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libCling.so [0]

ld.log.816:       816:  binding file /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libmxnet.so [0] to /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libgslcblas.so.0 [0]: normal symbol `cblas_sgemm'

And

> ldd /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libMathMore.so
        linux-vdso.so.1 =>  (0x00007ffe313ea000)
        libMathCore.so => /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libMathCore.so (0x00007f488617c000)
        libgsl.so.19 => /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libgsl.so.19 (0x00007f4885f1a000)
        libgslcblas.so.0 => /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc820/cms/cmssw-patch/CMSSW_11_1_X_2020-02-07-1100/external/slc7_amd64_gcc820/lib/libgslcblas.so.0 (0x00007f48864cf000)
...

It looks like we need to make ROOT use OpenBLAS instead of the GSL CBLAS?
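
The same check can be repeated after any rebuild by scanning the loader log for the binding of `cblas_sgemm`. The sketch below assumes the log has the `binding file ... to ...: normal symbol` layout shown in the excerpt above; that it was produced with glibc's `LD_DEBUG=files,bindings` and `LD_DEBUG_OUTPUT=ld.log` is a guess based on the file name. The helper name `providers` and the shortened `/x/lib/...` paths are hypothetical.

```python
import re

# Matches a dynamic-loader binding line such as:
#   binding file A [0] to B [0]: normal symbol `name'
BINDING = re.compile(
    r"binding file (?P<user>\S+) \[\d+\] to (?P<provider>\S+) \[\d+\]: "
    r"normal symbol `(?P<symbol>[^']+)'"
)


def providers(log_text, symbol):
    """Map each library that uses symbol to the library the loader bound it to."""
    found = {}
    for line in log_text.splitlines():
        match = BINDING.search(line)
        if match and match.group("symbol") == symbol:
            found[match.group("user")] = match.group("provider")
    return found


# Shortened, hypothetical paths standing in for the long /cvmfs ones.
sample = (
    "ld.log.816:       816:  binding file /x/lib/libmxnet.so [0] "
    "to /x/lib/libgslcblas.so.0 [0]: normal symbol `cblas_sgemm'"
)
result = providers(sample, "cblas_sgemm")
for user, provider in result.items():
    print(user, "->", provider)
```

If libmxnet.so shows up bound to libgslcblas rather than to an OpenBLAS library, the slow reference implementation is the one being called.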

@smuzaffar
Contributor

The problem is now the CMake GSL module, which explicitly searches for gslcblas [a]. I have fixed it by setting GSL_CBLAS_LIBRARY to point to the OpenBLAS library. I will make a PR with this change plus the lapack cleanup.

[a]

find_library( GSL_CBLAS_LIBRARY
  NAMES gslcblas cblas
  HINTS ${GSL_ROOT_DIR}/lib ${GSL_LIBDIR}
  PATH_SUFFIXES Release Debug
)

@smuzaffar
Contributor

#5540 should fix the issue

@hqucms
Contributor Author

hqucms commented Feb 12, 2020

Thank you very much, @smuzaffar !
Can we have this backported to 10_2_X and 10_6_X once this is successfully integrated for 11_1_X?

@smuzaffar
Contributor

@mrodozov , can you please backport #5540 for 10.2.X, 10.6.X and 11.0.X?

@slava77
Contributor

slava77 commented Mar 4, 2020

> @mrodozov , can you please backport #5540 for 10.2.X, 10.6.X and 11.0.X?

I wanted to check if this was done.
I see that #5549 tagged this issue and was merged in 11_0_X.
What about the older releases?

@mrodozov
Contributor

mrodozov commented Mar 6, 2020

@slava77
The backport PRs for 10.6 and 10.2 are listed here:
https://github.com/cms-sw/cmsdist/pulls?q=is%3Apr+is%3Aclosed+author%3Amrodozov+Use+OpenBlas+instead+of+GSL+Blas
