Add steps to install multi-threaded OpenBLAS on Ubuntu #80
Conversation
Are you sure that, when using boost-eigen, you are compiling with multi-threading enabled? boost-eigen naturally comes with multithreaded gemm, which would probably account for most of the gain you are observing.
To make clear whether OpenBLAS or Eigen contributed the performance improvements in the boost-eigen branch, three groups of benchmark experiments with different compilation flags were conducted using the lenet*.prototxt files. To check the effect of the number of threads, three combinations of runtime environment variables were also tested. In all experiments, max_iter is set to 3,000 and solver_mode to 0 in lenet_solver.prototxt.
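As a sketch, a runtime combination like those described above can be expressed through environment variables. `OPENBLAS_NUM_THREADS` and `OMP_NUM_THREADS` are the standard OpenBLAS and OpenMP controls; the exact values tested are not stated in the thread, so the numbers below are illustrative, and the training command is shown only as a comment:

```shell
# Illustrative thread-count settings (values are assumptions, not the thread's exact numbers):
export OPENBLAS_NUM_THREADS=8   # threads for OpenBLAS's own thread pool
export OMP_NUM_THREADS=8        # threads for OpenMP regions (e.g. Eigen's parallel gemm)
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
# A training run would then pick these up, e.g.:
# ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
```

Repeating the run with each combination (e.g. 1/1, 8/1, 8/8) isolates how much of the speed-up comes from each library's threading.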
Comparing the results of compilation flags 1 and 3, it is evident that multi-threaded OpenBLAS runs about 5 times faster than stock ATLAS. The near-identical performance of compilation flags 2 and 3 shows that enabling OpenMP for Eigen does not help at all in this setting.
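For context, the BLAS backend compared above is selected at compile time through the `BLAS` switch in Caffe's Makefile.config (values as in BVLC Caffe's build files; this is a config sketch, not the exact flags used in these experiments):

```make
# Makefile.config: choose the BLAS backend Caffe links against.
# BLAS := atlas   # stock ATLAS (Ubuntu's libatlas packages)
# BLAS := mkl     # Intel MKL
BLAS := open      # OpenBLAS; multi-threaded if the library itself was built with threading
```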
I still do not think you are using the multithreaded version of eigen3: see https://plafrim.bordeaux.inria.fr/doku.php?id=people:guenneba. It would be extremely unlikely that eigen itself is bad at multithreading. Again, using lenet is not a good idea to benchmark things.

Yangqing
I'd like to make my arguments clear: (1) I am not comparing ATLAS with OpenBLAS; it is known that ATLAS is inherently single-threaded. (2) Small datasets like MNIST do not reflect actual use cases.

Yangqing
I looked at the code more closely and now have a clearer picture of what caused this. In caffe/util/math_functions.cpp the gemm calls are still made through cblas_gemm instead of the Eigen functions, so the framework is effectively still using ATLAS rather than Eigen to carry out gemm. I will close this issue and open a separate issue indicating this necessary change for boost-eigen. If you would like to do a more detailed comparison, please feel free to. Thanks for finding this bug!
Thank you for all this benchmarking work!
INSTALL.md has been replaced with a pointer to the online installation documentation to avoid the overhead of duplication; see #81.
This statement is categorically false: "it is known that ATLAS is inherently single-threaded." ATLAS has been threaded for more than five years: http://math-atlas.sourceforge.net/faq.html#tnum
Add cudnn v4 batch normalization integration
* Fix boost shared_ptr issue in python interface
* Default output model name for bn convert style script
* Fix bugs in generating bn inference model
* Script to convert inner product to convolution
* Script to do polyak averaging
standardize memory optimization configurations

Commits from yjxiong/fix/mem_config:
* take care of share data with excluded blob
* improvise memory opt configs
* fix cudnn conv legacy bug (BVLC#96)
* add TOC
* Update README.md
* Update README.md (BVLC#95)
* Update README.md
* Improve the python interface (BVLC#80)
* Update README.md
…caffe into imagenet_vid_2016

Commits from 'imagenet_vid_2016' of https://github.com/myfavouritekk/caffe:
* take care of share data with excluded blob
* Revert "Fix a but when setting no_mem_opt: true for layers near in-place layers."
* improvise memory opt configs
* fix cudnn conv legacy bug (BVLC#96)
* add TOC
* Update README.md
* Update README.md (BVLC#95)
* Update README.md
* Improve the python interface (BVLC#80)
* Update README.md
Multi-threaded OpenBLAS makes a huge performance difference. The benchmarks with and without it in the comments on #16 demonstrated a more than 5x speed-up for boost-eigen and MKL on a machine with 4 Hyper-Threading CPU cores (supporting 8 threads).
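Since the issue title asks for install steps, a minimal sketch for Ubuntu follows. The package name is the standard Ubuntu OpenBLAS development package, and the thread-count variable follows OpenBLAS conventions; exact package availability varies by release:

```shell
# Install the packaged OpenBLAS (the Ubuntu build is compiled with threading):
sudo apt-get update
sudo apt-get install -y libopenblas-dev

# Point Caffe's build at it in Makefile.config:  BLAS := open

# At runtime, choose the number of OpenBLAS threads, e.g.:
export OPENBLAS_NUM_THREADS=8
```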
This fixes #79.