Replace atlas/cblas routines with Eigen in the math functions #85
Conversation
- examples, tests, and pycaffe compile without problem (matcaffe not tested)
- tests show some errors (on CPU gradient tests), to be investigated
- random generators need to be double-checked
- commented-out MKL code needs to be removed
Replace MKL with Boost+Eigen3

* commit '70c4320e436f92d0963b2622d20c7435b2f07f30':
  - Fix test_data_layer segfault by adding destructor to join pthread
  - Fix math funcs, add tests, change Eigen Map to unaligned for lrn_layer
  - Fix test stochastic pooling stepsize/threshold to be same as max pooling
  - Fixed FlattenLayer Backward_cpu/gpu have no return value
  - Fixed uniform distribution upper bound to be inclusive
  - Add python scripts to install dependent development libs
* commit '9a7d022652d65f44bebc97576a3b4f1b5e559748':
  - Fix test_data_layer segfault by adding destructor to join pthread
  - Fix math funcs, add tests, change Eigen Map to unaligned for lrn_layer
  - Fix test stochastic pooling stepsize/threshold to be same as max pooling
  - Fixed FlattenLayer Backward_cpu/gpu have no return value
  - Fixed uniform distribution upper bound to be inclusive
* commit '958f038e9e0b1b1c0c62b9119b323f4d62a3832a':
  - Fix test_data_layer segfault by adding destructor to join pthread
  - Fix math funcs, add tests, change Eigen Map to unaligned for lrn_layer
  - Fix test stochastic pooling stepsize/threshold to be same as max pooling
  - Fixed FlattenLayer Backward_cpu/gpu have no return value
  - Fixed uniform distribution upper bound to be inclusive
- Compile errors in boost-eigen branch
- make compatible with boost 1.46 and 1.55
- fix bernoulli random number generation (previously filled in all NaNs for me, making many tests fail)
Consider the discussion on removing Eigen and relying on OpenBLAS alone (#81 (comment)). If this PR is still desired, please rebase for a clean merge.
If you guys don't mind, please hold off on changes regarding #84 and #85 and kindly join the discussion in #81 - let's collectively decide which way to go for removing the MKL dependency. My personal feeling, based on @kloudkl's analysis, is that having Caffe depend simply on BLAS (with the vsl functions custom-written) and then linking against any of several backend libraries (ATLAS, OpenBLAS, MKL) is the right way to go.
I agree that it is more beneficial to unify the code base. After running the benchmarks, I gained a better understanding of why the standard BLAS interface was created. Issue #54 should be closed too.
Great work @kloudkl!
This pull request meets the requirement of issue #84. Layerwise runtime analysis of imagenet.prototxt, using @sguada's detailed net_speed_benchmark from #83, was a little disappointing in terms of Eigen's performance.
Table 1. Training time analysis of different BLAS libraries in CPU mode and in GPU mode. All times are in seconds. The batch size in CPU mode is 256. The GPU-mode data is included for illustrative purposes for issue #3 only.
Even when running with the maximum number of physical cores (4 on my machine), Eigen is still much slower than OpenBLAS and MKL. Fully exploiting Eigen's performance requires expert knowledge of how it evaluates expressions. In contrast, OpenBLAS is very low-hanging fruit: install the multi-threaded package (#80, #81), link it into your application (#82), and everything works like a charm. MKL also does a good job, but at a high price.
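Switching to multi-threaded OpenBLAS is mostly a build-configuration change. A hedged sketch of what the relevant Makefile fragment might look like (paths and variable names are assumptions for illustration, not Caffe's actual Makefile variables):

```makefile
# Assumed fragment: point the build at a multi-threaded OpenBLAS install.
BLAS_INCLUDE := /usr/include/openblas
BLAS_LIB     := /usr/lib/openblas

CXXFLAGS += -I$(BLAS_INCLUDE)
LDFLAGS  += -L$(BLAS_LIB) -lopenblas

# At run time, cap the thread count to the number of physical cores, e.g.:
#   OPENBLAS_NUM_THREADS=4 ./build/examples/net_speed_benchmark.bin ...
```

OpenBLAS reads its thread count from the `OPENBLAS_NUM_THREADS` environment variable (falling back to `OMP_NUM_THREADS` in OpenMP builds), which is why the benchmark commands below pin the thread count explicitly.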
I would appreciate any other independent benchmark very much.
Eigen (boost-eigen branch head + this PR)
OMP_NUM_THREADS=4 ../build/examples/net_speed_benchmark.bin local_imagenet.prototxt 1 CPU
OpenBLAS (boost-eigen branch head & git cherry-pick 969d0ab)
MKL (master head)
GPU mode
../build/examples/net_speed_benchmark.bin local_imagenet.prototxt 256 GPU