
Fft based convolutional layer #544

Closed
wants to merge 9 commits into from

Conversation

borisgin

I implemented a convolutional layer with an FFT-based Forward(). There is no FFT support in Backward() yet. The implementation is based on the FFTW3 library and was tested both with native FFTW3 and with MKL. In addition, it supports OpenMP to utilize all cores; this was tested with native gcc OpenMP and with MKL.

The current version is CPU-only. Is anybody interested in doing a CUDA version?

My impression, based on the current CPU implementation (FFT + OpenMP), is that an FFT-based convolutional layer makes sense only for large kernels (kernel_size / stride >= 7). There are more details on the benchmark below:
I modified net_speed_benchmark to test Forward() only, then I took the "examples/imagenet" topology and modified the first two convolutional layers:

  • batch = 128
  • stride = 1
  • kernel = {5, 7, 9, 11, 13, 15}
  • 10 forward iterations

For each kernel I slightly changed the crop size in the data layers to make the map size FFT-friendly (128, 256, ...). The results are below (time is seconds for 10 forward iterations):

    Layer  Kernel  Input            Output            base, sec  fft, sec
    conv1  15      128x3x242x242    128x96x228x228           79        28
    conv2  15      128x96x114x114   128x256x104x104         549       168
    conv1  13      128x3x244x244    128x96x232x232           58        30
    conv2  13      128x96x116x116   128x256x108x108         431       170
    conv1  11      128x3x246x246    128x96x236x236           44        28
    conv2  11      128x96x118x118   128x256x112x112         314       168
    conv1   9      128x3x248x248    128x96x240x240           33        29
    conv2   9      128x96x120x120   128x256x116x116         230       170
    conv1   7      128x3x250x250    128x96x244x244           23        29
    conv2   7      128x96x122x122   128x256x122x122         152       170
    conv1   5      128x3x252x252    128x96x248x248           16        28
    conv2   5      128x96x124x124   128x256x120x120          83       167
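
A back-of-the-envelope cost model consistent with these numbers (my framing, not part of the original benchmark): per image, direct convolution costs about

    T_direct ~ C_in * C_out * H_out * W_out * K^2

multiply-adds, while the FFT path costs about

    T_fft ~ (C_in + C_out) * O(H*W * log(H*W)) + C_in * C_out * O(H*W)

for the transforms plus the element-wise spectrum products, with no K^2 term. That is why the fft column stays nearly flat (~28 s for conv1, ~170 s for conv2) while the base column grows roughly quadratically with kernel size.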

@shelhamer
Member

Thanks for your work exploring FFT convolution for CNNs @borisgin. We've been meaning to do this for a while. @forresti, since you had an earlier interest in this, perhaps you could do a review.

To integrate this, a conv layer TYPE parameter should first be added for selecting BLAS vs. FFT convolution.
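
For context, a sketch of the kind of selector being asked for here, modeled on the Engine enum that Caffe's ConvolutionParameter later gained for cuDNN; the FFT value and the field number are illustrative, not from this PR:

message ConvolutionParameter {
  // ...
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;  // im2col + BLAS path
    FFT = 2;    // illustrative value for an FFT engine
  }
  optional Engine engine = 15 [default = DEFAULT];
}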

A complete implementation including GPU should match or improve on the performance reported here: http://benanne.github.io/2014/05/12/fft-convolutions-in-theano.html

p.s. Please rebase against BVLC/dev. Also note your Makefile.config should not be versioned.

@borisgin
Author

Sure, I will add an fft = on/off parameter to the layer description. Currently, by default, fft is switched on dynamically when kernel_size/stride > 5. It can be turned on and off manually on the fly using FFT_on() and FFT_off().
No problem with adding CUDA FFT as described in the link.

REBASE ISSUE:
I tried to pull BVLC/dev ( git clone https://github.com/BVLC/caffe/ -b dev ),
but during compilation I got the following error:
In file included from ./include/caffe/vision_layers.hpp:15:0,
from src/caffe/layer_factory.cpp:9:
./include/caffe/data_layers.hpp:11:18: fatal error: lmdb.h: No such file or directory
compilation terminated
When I use borisgin/dev everything is ok. Any suggestion for a resolution?

@sguada
Contributor

sguada commented Jun 26, 2014

We added new functionality and now you need to add the lmdb library.
Sergio

@borisgin
Author

Thanks! It worked. Did you observe any speed-up from replacing leveldb with lmdb?

@borisgin
Author

Hi, I added the lmdb lib and rebased the fft branch against BVLC/dev.

@luotao1

luotao1 commented Jul 1, 2014

How is the performance compared to your OpenMP version?

@borisgin
Author

borisgin commented Jul 1, 2014

All the data above is for Forward(), CPU only. The comparison is (MKL BLAS + OpenMP) vs. (FFT + OpenMP). You can disable OpenMP in Makefile.config.
I also added an "fft=0/1" parameter to the convolutional layer parameters so you can turn FFT on/off for each layer.
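
For illustration, a sketch of how such a per-layer switch might look in a prototxt; the exact field spelling in this branch may differ, and later in the thread this flag is superseded by the engine: FFT setting:

layers {
  name: "conv2"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    kernel_size: 15
    fft: 1  # hypothetical spelling of the per-layer on/off switch
  }
}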

@@ -195,16 +196,17 @@ ifeq ($(BLAS), mkl)
 COMMON_FLAGS += -DUSE_MKL
 MKL_DIR = /opt/intel/mkl
 BLAS_INCLUDE ?= $(MKL_DIR)/include
-BLAS_LIB ?= $(MKL_DIR)/lib $(MKL_DIR)/lib/intel64
+BLAS_LIB ?= $(MKL_DIR)/lib $(MKL_DIR)/lib/intel64 /opt/intel/composer_xe_2013_sp1.2.144/compiler/lib/intel64
@kloudkl
Contributor
It is more convenient to configure MKL_DIR as /opt/intel/composer_xe_2013_sp1.2.144/compiler in the Makefile.config.

@borisgin
Author

borisgin commented Jul 2, 2014

Pushed the code with the changes suggested by @kloudkl (many thanks for the code review :) ).
I accepted all the suggested changes except the FFT flag in Makefile.config: I think it is simpler to add FFTW3 as a dependency and always include fft.hpp and fft.cpp in the project.

@bhack
Contributor

bhack commented Jul 31, 2014

@soumith @borisgin It could be interesting to benchmark Caffe with this PR once there is GPU support.

@soumith

soumith commented Jul 31, 2014

@bhack I'd be happy to do so when it's ready.

@borisgin
Author

We restructured the FFT convolutional layer as a new engine for the convolutional layer, similar to cuDNN. To switch a convolutional layer to FFT, set engine: FFT in the layer description. For example:

layers {
  name: "conv1"
  type: CONVOLUTION
  ...
  convolution_param {
    num_output: 20
    ...
    engine: FFT
    ...
  }
}

File structure:

  • The FFT-based convolutional layer implementation is in the following 2 files: ./src/caffe/layers/conv_layer_fft.cpp and ./src/caffe/layers/conv_layer_fft.cu.
  • 3 additional files are added to connect Caffe to fftw3 and cufft: ./src/caffe/util/fft.cpp, ./src/caffe/util/fft.cu, and ./include/util/fft.hpp.
  • The corresponding tests are in ./src/caffe/test/test_convolution_layer_FFT.cpp.

The code is rebased and ready for merge.
Boris and Lior

@shelhamer
Member

@borisgin the FFT engine for convolution will be a great feature, but this is not properly rebased. The diff shows 196 files changed, and you are missing the Travis CI directory for the automatic PR testing as well as other changes.

Please rebase on the latest dev, or simply cherry-pick your additions as listed:

    FFT-based convolutional layer implementation is in 2 following files: ./src/caffe/layers/conv_layer_fft.cpp and ./src/caffe/layers/conv_layer_fft.cu;
    3 additional files are added to connect caffe to fftw3 and cufft: ./src/caffe/util/fft.cpp, ./src/caffe/util/fft.cu, ./include/util/fft.hpp
    corresponding tests are in ./src/caffe/test/test_convolution_layer_FFT.cpp.

and then push to this PR for review and merge.

    corresponding tests are in ./src/caffe/test/test_convolution_layer_FFT.cpp.

These should be rolled into the convolution layer tests, as was done with cuDNN.

As a last note, the FFT engine should be optional, just as cuDNN is, through the use of a compile-time flag and #ifdef guards.
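
A minimal sketch of the guard pattern being asked for, mirroring the cuDNN integration; the flag name USE_FFT and the -DUSE_FFT define are my assumptions, not something confirmed by this PR:

// Set from Makefile.config, e.g. COMMON_FLAGS += -DUSE_FFT (assumed name).
#ifdef USE_FFT
#include "caffe/util/fft.hpp"  // wrappers around fftw3/cufft

namespace caffe {
// FFT-engine declarations (e.g. a ConvolutionLayerFFT subclass overriding
// Forward_cpu/Forward_gpu) compile only when FFT support is built in.
}  // namespace caffe
#endif  // USE_FFT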

@borisgin
Author

borisgin commented Nov 4, 2014

The FFT PR was successfully merged with the dev branch.

@borisgin
Author

borisgin commented Nov 5, 2014

The FFT engine for the convolutional layer is implemented similarly to cuDNN, but it can be turned on for an individual layer by setting engine: FFT in the network configuration file.
To build Caffe with FFT, please uncomment the corresponding (FFT) line in Makefile.config.
For better performance on CPU we also added an OPENMP flag in Makefile.config.

INSTALLATION:
This branch requires cufft (if you use CUDA) and FFTW3. To install FFTW3:

  1. download the sources: http://www.fftw.org/download.html
  2. unzip
  3. ./configure --enable-float --disable-long-double --disable-quad-precision --enable-openmp --disable-fortran --disable-debug-malloc --enable-avx --enable-shared
  4. make
  5. sudo make install
  6. if there are problems with float128, just comment out the corresponding line in fftw3.h
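
After step 5, a quick sanity check that the single-precision library installed and links; this snippet is illustrative, not from the PR (sudo ldconfig refreshes the shared-library cache so the freshly installed libfftw3f is found):

sudo ldconfig
echo 'int main(void) { return 0; }' > fftw_check.c
gcc fftw_check.c -lfftw3f -o fftw_check && ./fftw_check && echo "FFTW3 float OK"

For the OpenMP-enabled build, link with -lfftw3f_omp -lfftw3f -fopenmp instead.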

@emasa

emasa commented Jan 23, 2015

Perhaps it is worth taking a look at Facebook's FFT-based CNN implementation, released as part of https://github.com/facebook/fbcunn.

@sunbaigui

Same suggestion as @emasa.
@borisgin @shelhamer
Recently Facebook open-sourced a project for speeding up convolution using FFT; it's much faster than cuFFT. Related information is below:
paper: http://arxiv.org/pdf/1412.7580.pdf
open source: https://github.com/facebook/fbcunn

@borisgin
Author

Hi,
We have looked at the FB paper, but we haven't compared the performance of our method vs. Facebook's yet. At the algorithm level they are very similar.

Our implementation is straightforward:

  1. do FFT of weights: weightFFT(f,g)                    // once per batch
  2. for all images n in the batch {
  3.    for all input features f {
  4.       inFFT(n,f) = FFT(input(n,f))
  5.       for all output features g {
  6.          outFFT(n,g) += inFFT(n,f) .* weightFFT(f,g)  // element-wise multiply-accumulate
  7.       }
  8.    }
  9.    for all output features g {
  10.      out(n,g) = IFFT(outFFT(n,g))
  11.   }
  12. } // end of n

Facebook's FFT-based implementation is similar, except that they replaced the element-wise multiply-accumulate with a transpose and SGEMM (cool trick!): after transposing the spectra, the accumulation over input features in step 6 becomes, for each frequency bin, a (batch x num_input) by (num_input x num_output) complex matrix product, so it maps onto batched GEMM. A self-contained sketch of steps 4, 6, and 10 for a single image/feature pair follows.
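
For readers who want the concrete form of steps 4, 6, and 10, here is a minimal single-image, single channel-pair sketch against the FFTW3 single-precision API. This is illustrative, not the PR's code: the kernel is assumed already zero-padded to the H x W map size, the result is a circular convolution (crop the valid region afterwards, and conjugate one spectrum if you want cross-correlation, as CNNs usually define convolution), and real code would create plans once and reuse them across the batch.

#include <fftw3.h>
#include <vector>

// Circular 2D convolution of an HxW input with an HxW zero-padded kernel.
void fft_conv2d(const float* input, const float* kernel_padded,
                float* output, int H, int W) {
  const int n_spec = H * (W / 2 + 1);  // size of the r2c half-spectrum
  fftwf_complex* in_spec =
      (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex) * n_spec);
  fftwf_complex* ker_spec =
      (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex) * n_spec);
  std::vector<float> in(input, input + H * W);
  std::vector<float> ker(kernel_padded, kernel_padded + H * W);

  // Steps 1 and 4: forward real-to-complex FFTs of kernel and input.
  fftwf_plan p_in =
      fftwf_plan_dft_r2c_2d(H, W, in.data(), in_spec, FFTW_ESTIMATE);
  fftwf_plan p_ker =
      fftwf_plan_dft_r2c_2d(H, W, ker.data(), ker_spec, FFTW_ESTIMATE);
  fftwf_execute(p_in);
  fftwf_execute(p_ker);

  // Step 6: element-wise complex multiply (the real layer accumulates this
  // across input features; a single pair is shown here).
  for (int i = 0; i < n_spec; ++i) {
    const float re =
        in_spec[i][0] * ker_spec[i][0] - in_spec[i][1] * ker_spec[i][1];
    const float im =
        in_spec[i][0] * ker_spec[i][1] + in_spec[i][1] * ker_spec[i][0];
    in_spec[i][0] = re;
    in_spec[i][1] = im;
  }

  // Step 10: inverse FFT; FFTW's c2r transform is unnormalized, so scale by H*W.
  fftwf_plan p_out =
      fftwf_plan_dft_c2r_2d(H, W, in_spec, output, FFTW_ESTIMATE);
  fftwf_execute(p_out);
  for (int i = 0; i < H * W; ++i) output[i] /= (float)(H * W);

  fftwf_destroy_plan(p_in);
  fftwf_destroy_plan(p_ker);
  fftwf_destroy_plan(p_out);
  fftwf_free(in_spec);
  fftwf_free(ker_spec);
}

Link with -lfftw3f; to use FFTW's OpenMP threads, additionally call fftwf_init_threads() and fftwf_plan_with_nthreads() before planning.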

@bhack
Contributor

bhack commented Jan 26, 2015

Why was this PR not merged?

@melgor
Copy link

melgor commented Feb 12, 2015

I think it is not merged because it needs the MKL library (it does not work with OpenBLAS or ATLAS).
I would like to use the CUDA version, but I do not have MKL, so it does not compile.

Edit: As was pointed out, the code does not require MKL. I had the wrong branch...

@ducha-aiki
Contributor

@melgor
It works with OpenBLAS as well.

@borisgin
Author

The code does not require MKL. It works with OpenBLAS and cuBLAS.

@dxj19831029

Scheduled for the next release?

@bhack
Contributor

bhack commented Jun 15, 2015

Happy pull request birthday @borisgin :)

@shelhamer
Member

While CPU execution can be further optimized, this PR is closed since it is against the deprecated dev branch. This branch was not merged at the time due to concerns about further complexity and dependencies. Thanks for your work @borisgin.

Note that cuDNN v3 includes an FFT convolution on the GPU.

@shelhamer closed this Aug 26, 2015
@bhack
Contributor

bhack commented Aug 26, 2015

@naibaf7 It also works on CPU; see #2610.

@outgrid

outgrid commented Jun 12, 2016

The original Caffe takes 2000 ms for forward computation, but the FFT-based Caffe takes 98000 ms (the kernel sizes are mostly 3x3; only one convolution kernel is 7x7, with stride 2).
It seems fbfft can properly handle the small-kernel situations. Did anyone implement fbfft or Winograd in Caffe?
