FFT-based convolutional layer #544
Conversation
Thanks for your work exploring FFT convolution for CNNs @borisgin. We've been meaning to do this for a while. @forresti, since you had an earlier interest in this, perhaps you could do a review. To integrate this, a complete implementation including GPU should match or improve on the performance reported here: http://benanne.github.io/2014/05/12/fft-convolutions-in-theano.html
p.s. Please rebase against BVLC/dev. Also note your …
Sure, I will add an fft = on/off parameter to the layer description. Currently, by default, FFT is switched on dynamically when kernel_size / stride > 5. It can also be turned on and off manually on the fly using FFT_on() and FFT_off(). REBASE ISSUE:
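A minimal sketch of that switching logic, with hypothetical names (the exact predicate and the FFT_on() / FFT_off() plumbing in the PR may differ):

// Illustrative only: default dynamic switch with a manual override,
// as described above. All names here are assumptions, not the PR's code.
enum class FFTMode { kAuto, kOn, kOff };  // kOn/kOff set via FFT_on()/FFT_off()

bool UseFFT(FFTMode mode, int kernel_size, int stride) {
  if (mode == FFTMode::kOn) return true;    // forced on
  if (mode == FFTMode::kOff) return false;  // forced off
  return kernel_size / stride > 5;          // default dynamic heuristic
}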
We added new functionality and now you need to add the lmdb library.

Sergio
Thanks! It worked. Did you observe any speed-up from replacing leveldb with lmdb?
Hi, I added the lmdb lib and rebased the fft branch against BVLC/dev.
How is the performance compared to your OpenMP version?
All data above is for Forward(), CPU only. The comparison is (MKL BLAS + OpenMP) vs. (FFT + OpenMP). You can disable OpenMP in Makefile.config.
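For reference, a guess at the relevant Makefile.config lines (the PR's exact variable names may differ):

# Hypothetical Makefile.config fragment; comment these out to build
# without OpenMP. The actual flag names in the PR may differ.
CXXFLAGS += -fopenmp
LINKFLAGS += -fopenmp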
@@ -195,16 +196,17 @@ ifeq ($(BLAS), mkl)
 COMMON_FLAGS += -DUSE_MKL
 MKL_DIR = /opt/intel/mkl
 BLAS_INCLUDE ?= $(MKL_DIR)/include
-BLAS_LIB ?= $(MKL_DIR)/lib $(MKL_DIR)/lib/intel64
+BLAS_LIB ?= $(MKL_DIR)/lib $(MKL_DIR)/lib/intel64 /opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64
It is more convenient to configure MKL_DIR as /opt/intel/composer_xe_2013_sp1.2.144/compiler in Makefile.config, as sketched below.
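A minimal sketch of that suggestion (the path is taken from the diff above; treat the line as illustrative):

# Hypothetical Makefile.config line implementing the reviewer's suggestion:
MKL_DIR := /opt/intel/composer_xe_2013_sp1.2.144/compiler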
Pushed the code with the changes suggested by kloudkl (many thanks for the code review :)).
@bhack I'd be happy to do so when it's ready.
Commits updated from 4278286 to c01f07a.
We restructured the FFT convolutional layer as a new engine for the convolution layer, similar to cuDNN. File structure: the FFT-based convolutional layer implementation is in the following two files: ./src/caffe/layers/conv_layer_fft.cpp and ./src/caffe/layers/conv_layer_fft.cu. To switch a convolutional layer to FFT, one should set engine = FFT in the layer description, for example as in the sketch below.
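A hedged sketch of such a layer definition in the dev-branch prototxt style of the time (the layer name and parameter values are illustrative, not from the PR):

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    engine: FFT        # select the FFT-based engine for this layer
    num_output: 96
    kernel_size: 11
    stride: 1
  }
}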
@borisgin the FFT engine for convolution will be a great feature, but this is not properly rebased. The diff shows 196 files changed, and you are missing the Travis CI directory for the automatic PR testing and other changes. Please rebase on the latest dev, or simply cherry-pick your additions as listed, and then push to this PR for review and merge.

These should be rolled into the convolution layer tests, as was done with cuDNN. As a last note, the FFT engine should be optional just as cuDNN is, through the use of a compile-time flag and …
removed reshape() in ForwardFromTo in net.cpp
added #ifdef USE_FFT into all fft-related files
cleaned lint
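A hedged illustration of the USE_FFT compile-time guard mentioned in the commits above, mirroring how Caffe gates cuDNN behind a compile-time flag (the types and factory shape are assumptions, not the PR's code):

// Build with -DUSE_FFT to compile the FFT engine in; without it, asking
// for FFT fails cleanly. Illustrative only.
#include <memory>
#include <stdexcept>

struct ConvImpl { virtual ~ConvImpl() {} };
struct CaffeConv : ConvImpl {};
#ifdef USE_FFT
struct FFTConv : ConvImpl {};  // compiled only when USE_FFT is defined
#endif

std::unique_ptr<ConvImpl> MakeConv(bool want_fft) {
  if (want_fft) {
#ifdef USE_FFT
    return std::unique_ptr<ConvImpl>(new FFTConv);
#else
    throw std::runtime_error("FFT engine requested but not compiled in");
#endif
  }
  return std::unique_ptr<ConvImpl>(new CaffeConv);
}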
The FFT PR was successfully merged with the dev branch.
The FFT engine for the convolutional layer is implemented similarly to cuDNN, but it can be turned on for an individual layer by setting engine: FFT in the network configuration file.
Perhaps it's worth taking a look at Facebook's FFT-based CNN implementation, released as part of https://github.com/facebook/fbcunn.
Same suggestion as @emasa.
Hi, our implementation is straightforward: an FFT of the input maps and kernels, an element-wise multiply-accumulate in the frequency domain, and an inverse FFT of the result (see the sketch below).
Facebook's FFT-based implementation is similar, except that they replaced the element-wise multiply-accumulate …
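For concreteness, a minimal sketch of that forward path, assuming FFTW3 in single precision (link with -lfftw3f): one input channel, one kernel, stride 1, "valid" output. This is illustrative, not the PR's actual code. Note that Caffe's "convolution" is cross-correlation, hence the conjugate on the kernel spectrum; a real layer would accumulate the frequency-domain products over input channels and cache the FFTW plans.

#include <fftw3.h>
#include <complex>
#include <cstring>
#include <vector>

void fft_correlate2d_valid(const float* image, const float* kernel,
                           float* out, int H, int W, int kH, int kW) {
  const int cW = W / 2 + 1;             // complex width of the r2c output
  std::vector<float> img(image, image + H * W);
  std::vector<float> ker(H * W, 0.0f);  // kernel zero-padded to image size
  for (int y = 0; y < kH; ++y)
    std::memcpy(&ker[y * W], &kernel[y * kW], kW * sizeof(float));

  std::vector<std::complex<float> > Fimg(H * cW), Fker(H * cW);
  fftwf_plan pi = fftwf_plan_dft_r2c_2d(
      H, W, img.data(), reinterpret_cast<fftwf_complex*>(Fimg.data()),
      FFTW_ESTIMATE);
  fftwf_plan pk = fftwf_plan_dft_r2c_2d(
      H, W, ker.data(), reinterpret_cast<fftwf_complex*>(Fker.data()),
      FFTW_ESTIMATE);
  fftwf_execute(pi);
  fftwf_execute(pk);

  // Element-wise multiply-accumulate in the frequency domain; with
  // multiple input channels, the products would be summed here.
  for (int i = 0; i < H * cW; ++i) Fimg[i] *= std::conj(Fker[i]);

  // Inverse transform, then crop the "valid" region and undo FFTW's
  // unnormalized scaling (r2c followed by c2r multiplies by H*W).
  std::vector<float> full(H * W);
  fftwf_plan pb = fftwf_plan_dft_c2r_2d(
      H, W, reinterpret_cast<fftwf_complex*>(Fimg.data()), full.data(),
      FFTW_ESTIMATE);
  fftwf_execute(pb);
  const int oH = H - kH + 1, oW = W - kW + 1;
  const float scale = 1.0f / (H * W);
  for (int y = 0; y < oH; ++y)
    for (int x = 0; x < oW; ++x)
      out[y * oW + x] = full[y * W + x] * scale;

  fftwf_destroy_plan(pi);
  fftwf_destroy_plan(pk);
  fftwf_destroy_plan(pb);
}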
Why was this PR not merged?
I think it was not merged because it needs the MKL library (it does not work with OpenBLAS or ATLAS). Edit: as was pointed out, the code does not require MKL; I had the wrong branch...
@melgor the code does not require MKL. It works with OpenBLAS and cuBLAS.
Scheduled for the next release?
Happy pull request birthday @borisgin :)
While CPU execution can be further optimized, this PR is closed since it is against the deprecated dev branch. The branch was not merged at the time due to concerns about further complexity and dependencies. Thanks for your work @borisgin. Note that cuDNN v3 includes an FFT convolution on the GPU.
The original Caffe takes 2000 ms for the forward computation, but the FFT-based Caffe takes 98000 ms (the kernel sizes are mostly 3x3; only one convolution kernel is 7x7, though with stride 2).
I implemented a convolutional layer with an FFT-based Forward(). There is no FFT support in Backward() yet. The implementation is based on the FFTW3 library and was tested both with native FFTW3 and with MKL. In addition, it supports OpenMP to utilize all cores; this was tested with native gcc OpenMP and with MKL.
The current version is CPU-only. Is anybody interested in doing a CUDA version?
My impression, based on the current CPU implementation (FFT + OpenMP), is that an FFT-based convolutional layer makes sense only for large kernels (kernel_size / stride >= 7). There are more details on the benchmark below:
I modified net_speed_benchmark to test Forward() only, then took the "examples/imagenet" topology and modified the first two convolutional layers:
batch = 128
stride = 1
kernel = {5, 7, 9, 11, 13, 15}
10 forward iterations
For each kernel, I slightly changed the crop size in the data layers to make the map size FFT-friendly (128, 256, ...). The results are below (time is in seconds for 10 forward iterations):