ND convolution with im2col #2049
Conversation
shelhamer
added the
ES
label
Mar 7, 2015
barhomi
commented
Mar 31, 2015
|
Is this using cuDNN v2 as a backend for the ND conv? If that's the case, I think NVIDIA's ND conv (only 3D for now) is not as well tuned as their 2D conv; from the release notes: "As a BETA preview in this release, the convolution forward, convolution |
|
No, this doesn't touch CuDNN, it only generalizes the im2col convolution implementation (which predates CuDNN). |
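To illustrate what generalizing im2col to N spatial dimensions means, here is a minimal NumPy sketch (a hypothetical helper for illustration only, not the PR's actual C++ code): it unrolls every kernel-sized patch into a column so that convolution reduces to a single matrix product.

```python
import itertools
import numpy as np

def im2col_nd(data, kernel, stride=None, pad=None):
    """Unroll an array of shape (C, D1, ..., Dn) into columns.

    Illustrative sketch: after this, convolution is just
    weights.reshape(num_output, -1) @ cols.
    """
    c = data.shape[0]
    spatial = data.shape[1:]
    n = len(spatial)
    stride = stride or (1,) * n
    pad = pad or (0,) * n
    # zero-pad every spatial axis (channel axis is never padded)
    padded = np.pad(data, [(0, 0)] + [(p, p) for p in pad])
    out_dims = [(spatial[i] + 2 * pad[i] - kernel[i]) // stride[i] + 1
                for i in range(n)]
    cols = np.empty((c * int(np.prod(kernel)), int(np.prod(out_dims))))
    # one column per output location, looping over all N output axes
    for col, out_idx in enumerate(itertools.product(*[range(d) for d in out_dims])):
        starts = [out_idx[i] * stride[i] for i in range(n)]
        patch = padded[(slice(None),) + tuple(
            slice(s, s + k) for s, k in zip(starts, kernel))]
        cols[:, col] = patch.ravel()
    return cols, out_dims
```

For a 2-channel 3x3 input with a 2x2 kernel this yields an 8x4 column matrix; the same function handles 1D or 3D inputs unchanged, which is the generalization this PR performs in C++/CUDA.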
barhomi
commented
Mar 31, 2015
|
Ah, sorry! My bad. |
wkal
commented
Apr 1, 2015
|
Could you provide a demo showing how to use it? Otherwise there will be a steep learning curve to test and use your work. Thanks! |
This was referenced Apr 8, 2015
avalada
commented
May 12, 2015
|
@jeffdonahue Does this also support 1D conv? |
|
@avalada yes, any N >= 0 should theoretically be supported. In practice, 0D convolution -- scalar multiplication -- probably doesn't work, but should and would make a great unit test. I expect 1-10D convolution to work out of the box with this; >10 won't work on GPU -- you'd have to add your case to the switch statements in 718802e. Also, 1D convolution is supported by the current implementation as well; just set either the width or height to 1. Theoretically, doing 1D convolution using an ND implementation could/should be more efficient than using a 2D implementation with a singleton dim, but with the apparently large overhead in the 2D case, I would be surprised if that's the case here -- you're probably better off sticking with the existing 2D implementation. (But I'd be very interested to know the comparison if you decide to benchmark.) |
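Jeff's point that 1D convolution can be expressed either natively or as 2D with a singleton axis can be checked with a small NumPy sketch (plain valid cross-correlation as Caffe computes it, no padding or stride; all values invented):

```python
import numpy as np

# A 1D convolution and a "2D" convolution with height 1 give identical
# results -- the singleton-dim trick the existing 2D implementation uses.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])

# native 1D: slide the kernel over the signal
out_1d = np.array([np.dot(signal[i:i + 3], kernel) for i in range(3)])

# same computation with an explicit singleton height axis
sig_2d = signal.reshape(1, -1)   # shape (1, 5)
ker_2d = kernel.reshape(1, -1)   # shape (1, 3)
out_2d = np.array([(sig_2d[:, j:j + 3] * ker_2d).sum() for j in range(3)])

assert np.allclose(out_1d, out_2d)  # both formulations agree
```

The numerics are identical either way; as Jeff notes, the open question is only which implementation is faster.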
|
Hey Jeff, Is there any chance you could link to an example prototxt making use of this pull request? It would be nice to have that to get started. |
jmerkow
commented
May 17, 2015
|
I don't think any special changes are needed in the prototxt to use this PR. Just set your dims using repeated values in the prototxt, i.e.:
The channel axis defaults to 1 (in this net there are 3 channels). If you want ND kernels, just repeat the kernel size for each dim instead of using kernel_h/kernel_w. The notes in caffe.proto describe it pretty well. |
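As a concrete illustration of the repeated-field style (a made-up fragment for a 3D kernel, not jmerkow's original snippet; layer names and sizes are invented):

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"   # e.g. a 5D blob: N x C x D x H x W
  top: "conv1"
  convolution_param {
    num_output: 16
    # one repeated kernel_size per spatial axis, replacing kernel_h/kernel_w
    kernel_size: 3
    kernel_size: 3
    kernel_size: 3
    stride: 1
    stride: 1
    stride: 1
  }
}
```

A single `kernel_size` value would apply to all spatial axes; repeating it gives per-axis control.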
|
Thanks @jmerkow -- there's a slight correction as
Or a full version with DummyData that you should be able to run (didn't test but it should work, possibly needing minor typo fixing):
|
|
@jeffdonahue thanks for the reference. Here is a debugged version of Jeff's prototxt if anyone else is interested (the layers needed names and the SoftmaxWithLoss layer doesn't like >4d blobs):
|
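In the spirit of the runnable DummyData version mentioned above, a hedged sketch of a minimal ND net (all shapes and names invented; per the thread, SoftmaxWithLoss rejects >4D blobs, so no loss layer is included):

```protobuf
layer {
  name: "data"
  type: "DummyData"
  top: "data"
  dummy_data_param {
    shape { dim: 1 dim: 3 dim: 8 dim: 8 dim: 8 }  # N x C x D x H x W
  }
}
layer {
  name: "conv"
  type: "Convolution"
  bottom: "data"
  top: "conv"
  convolution_param {
    num_output: 4
    kernel_size: 3  # repeated once per spatial axis
    kernel_size: 3
    kernel_size: 3
  }
}
```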
jmerkow
commented
May 20, 2015
|
@jeffdonahue , If you change Line 150 in filler.hpp to remove legacy calls (i.e. use |
tomdeschamps
commented
May 29, 2015
|
Thanks for sharing the prototxt @Russell91. I'm trying to use this with ND data (N>=3). |
jmerkow
commented
Jun 1, 2015
|
@tomdeschamps I would try a hdf5 data layer. I believe those can be used to load ND images. |
tomdeschamps
commented
Jun 2, 2015
|
Thanks @jmerkow. Yes, I'm trying to load it as ND using the hdf5 data layer, but I have an error in LegacyShape() in blob.hpp:141 "Cannot use legacy accessors on Blobs with > 4 axes". |
|
Sorry for the trouble -- there are indeed a lot of places in the code that still use the legacy blob dim accessors ( The legacy accessors should be removed from most places, definitely in |
tomdeschamps
commented
Jun 3, 2015
|
Yes, that seems to work with the check removed. I used @jmerkow's nd-pooling branch. However, HDF5Data seems to be handled very differently (no scaling, not sure whole batches are written in each .h5 file, etc.). Is there any documentation on how Caffe deals with this format? |
dzhwinter
commented
Jun 13, 2015
|
@jeffdonahue @tomdeschamps Tested; with the check removed it compiles successfully. I'm wondering whether the ND convolution here and @jmerkow's ND pooling can be used together now? |
dzhwinter
commented
Jun 15, 2015
|
Hi, is it possible to implement a cuDNN 3D convolution, e.g. by flattening the 3D volume into 2D or something like that? |
Tgaaly
commented
Jun 25, 2015
|
@jeffdonahue @Russell91 This is great work! Thanks. Today is my first day brewing Caffe! I'm trying to use this ND conv with a "Flatten" layer. The "Flatten" layer does not work on blobs with > 4 axes, so it fails when there are more than 2 spatial dimensions. Is there a fix/solution for this? Would appreciate any advice/help.
|
Tgaaly
commented
Jun 25, 2015
|
I disabled the check on line 141 of blob.hpp (shown below) and it ran. Will that cause any problems? Actually, looking back at the thread, this seems to be the consensus of others and it makes things work.
|
ToruHironaka
commented
Jun 29, 2015
|
@tomdeschamps, @jmerkow, you mentioned loading ND images with HDF5 earlier. I am trying to load 3D volume images (width, height, and depth) too, and I want to make sure of a couple of things. I followed your conversation above. I do not think I can convert my 3D image files into lmdb or leveldb; am I right? It looks like the HDF5 data layer is the only way to load my 3D images into an ND blob. Have you successfully loaded 3D images? If so, please give me some advice on how to do it. |
tomdeschamps
commented
Jun 30, 2015
|
I think the nd-pooling branch is based on the nd-convolution branch. |
|
tomdeschamps
commented
Jun 30, 2015
|
HDF5 has been the only way for me to do it so far. |
|
ToruHironaka
commented
Jul 1, 2015
|
@tomdeschamps, thanks for the information. |
Tgaaly
commented
Jul 1, 2015
|
@ToruHironaka I was able to load 3D data using HDF5 as follows:
To create the data I used the example in caffe/matlab/hdf5creation. Despite that, I was not able to output any data using HDF5; I'm still stuck on this. The following:
gives the following error:
I found a workaround for this error -> found here: #1189 |
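For anyone following along, a typical HDF5Data layer definition looks roughly like this (the source path and batch size are placeholders, not taken from this thread):

```protobuf
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    # text file listing one .h5 path per line (placeholder path)
    source: "train_h5_list.txt"
    batch_size: 20
  }
  include { phase: TRAIN }
}
```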
ToruHironaka
commented
Jul 1, 2015
|
@Tgaaly Thanks! I will try this. Just out of curiosity, what kind of 3D image data did you use? |
Tgaaly
commented
Jul 4, 2015
|
When is this pull request going to be merged into master?
Tgaaly
commented
Jul 5, 2015
|
Has anyone verified that this branch works at all? I'm trying a 3D CNN on 3D synthetic data and my network does not converge. My data consists of 2 classes of binary voxels: class 1 is a box-like object in the center of the voxel grid and class 2 is two separated square-like objects. I have about 1920 training samples with labels. This is synthetic data with no noise. Despite what appears to be an easy task, the CNN is not able to learn anything: the accuracy is stuck at 0.5 and the loss goes up and down. See below.
|
jmerkow
commented
Jul 9, 2015
|
I can verify that it works. |
|
Yea, it worked great for me too.
|
Tgaaly
commented
Jul 9, 2015
|
Can you share your prototxt scripts? |
This was referenced Jul 14, 2015
YutingZhang
commented
Aug 12, 2015
|
@jeffdonahue Do you know why the ND implementation is slower than the 2D conv? Is it due to the new version of im2col? If so, the fix could be straightforward, say, just using the 2D version of im2col. |
|
I re-implemented this and ND-Convolution in #2610 for OpenCL and CUDA. |
|
I just spent an absurd amount of time debugging this, so just a note to possibly save someone else a lot of time: if you happen to be using this PR with the static |
|
@jeffdonahue |
|
@naibaf7 right, that's a less hacky way to do it (wish I had known two days ago that you'd independently found and fixed this bug!). I was thinking of getting rid of |
xjtuljy
commented
Aug 25, 2015
|
Thank you very much for developing this code! It works on my data for training! But I found that the saved snapshot is empty (the saved caffemodel file is an empty file) in my case; could anyone help with this? |
jeffdonahue
added the
ready for review
label
Aug 26, 2015
|
This is rebased & ready for review. I've restored the original 2D If we want to merge this, I can prepare another commit to add to #2016 to address the compatibility issue discussed above. I cherry-picked my commit in #2959 to make the Python tests pass -- if we want to merge this but not #2959, I'll go back and change all the |
|
@jeffdonahue The kernel stride is a feature that allows having a stride within the kernel rather than just a continuous kernel. This gives rise to very interesting network architectures: the parameter can be included at little to no extra cost in the existing im2col/col2im kernels for higher dimensions. |
|
@naibaf7 sounds like a nice feature, but I'd rather restrict this PR to just generalizing existing functionality to N dimensions -- it's big enough as is. I'd be willing to review a separate PR for internally strided kernels, though. BTW, in case anyone was relying on it or wants to refer to it for any reason, I've left the previous version of this PR available at a new |
|
@jeffdonahue |
shelhamer
added the
focus
label
Aug 31, 2015
|
I updated the PR with fixes for |
shelhamer
referenced
this pull request
Sep 3, 2015
Merged
NetSpec: don't require lists to specify single-element repeated fields #2959
|
Rebased to remove extraneous |
Agreed. They make the 4D layers more clear, although they might not stay 4D for long since there is an ND pooling follow up already #2442. |
|
@jeffdonahue nice indexing logic and tests (in particular the 0D multiplication test)! It took me a while to read, but this looks good to me save my one minor comment about the legacy accessors in a test. Merge as you like. |
|
Thanks for reviewing this massive PR @shelhamer! I updated the 0D test per your comment. I decided to time this on Caffe reference model, and I found that it was still 20-30% slower... It turned out that I wasn't actually reading the value of I'm going to let this sit for a little while with the two new commits before squashing and merging. |
|
@jeffdonahue thanks for looping back and catching those issues! I did a time comparison between @ master:
@ nd-convolution:
I think this can be squashed and merged.
|
jeffdonahue
added some commits
Mar 5, 2015
|
Thanks for the extensive benchmarking @shelhamer! I'll merge as suggested. Feel free to comment if you suspect any performance or other issues related to this PR. |
jeffdonahue
added a commit
that referenced
this pull request
Sep 19, 2015
|
|
jeffdonahue |
2e1c1cb
|
jeffdonahue
merged commit 2e1c1cb
into
BVLC:master
Sep 19, 2015
1 check passed
jeffdonahue
deleted the
jeffdonahue:nd-convolution branch
Sep 20, 2015
This was referenced Sep 20, 2015
ToruHironaka
commented
Nov 11, 2015
|
@Tgaaly, I assume that you stacked a series of 2D images to make a 3D image dataset in HDF5 and listed the HDF5 files in train.txt and test.txt for each class. Is that so? Sorry, I have been working on something else; it's been a while since I was last in touch. |
albenoit
referenced
this pull request
Nov 12, 2015
Closed
removed 4dimensions constraints on blob shape #3325
ToruHironaka
commented
Nov 20, 2015
|
Hi all, I converted 3D images (2 CT-scanned human brain images) into HDF5 datasets with shape (number of images, width, length, channel [not specified for grayscale]) in Python. I created an HDF5 dataset file for each 3D volume image and listed these HDF5 files in train.txt and test.txt.
Then I defined a net with the code from caffe/examples/02-brewing-logreg.ipynb, i.e. the logreg(hdf5, batch_size) function that writes examples/hdf5_classification/logreg_auto_train.prototxt and logreg_auto_test.prototxt. (Those are the original codes from the example; I modified them for my dataset and net.) After that, I ran the test code from the same example: caffe.set_mode_gpu(), the accuracy loop starting from accuracy = 0, and print("Accuracy: {:.3f}".format(accuracy)).
I got results like the one below and it looks alright. Can anyone confirm whether my way of building a Caffe 3D (depth, width, height, channel [channel ignored for grayscale]) model is correct or not?
Results: I1120 12:43:45.839637 28983 solver.cpp:734] Snapshotting solver state to binary proto file/hdf_FT_iter_10000.solverstate |
The 'channels' axis should be right after the batch axis, so the shape should be |
ToruHironaka
commented
Nov 21, 2015
|
@jeffdonahue, thanks for answering my question; you stated this earlier in this pull request and I missed it, sorry. I set my dimensions as follows: depth (stack of 3D volumes), channel (RGB), width, length. Caffe ran okay and the result was better; I was probably training on width and channel instead of height in my last run. Just one more question: can I use the same method for multi-channel or multi-spectrum images in HDF5? |
jmerkow
commented
Nov 21, 2015
|
You can load whatever you want with HDF5 as long as it sticks to N x C x S x S x ... (where S is a spatial dim). You can have multiple channels in each image, or multiple batches in each file. I typically stick to a batch of 1 and increase/decrease with the batch_size param, but I don't think you need to; for example, you may want images grouped into pre-determined batches. |
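The N x C x S x S x ... layout described above can be sketched in a few lines of Python (array names and sizes are invented for illustration; the h5py write step is shown only as a comment):

```python
import numpy as np

# A toy batch of 4 single-channel 3D volumes of size 8^3, plus labels.
# The channel axis sits right after the batch axis, as required.
volumes = np.zeros((4, 1, 8, 8, 8), dtype=np.float32)  # N, C, D, H, W
labels = np.array([0, 1, 0, 1], dtype=np.float32)      # one label per volume

# Writing the actual file would use h5py, e.g.:
#   with h5py.File('train_000.h5', 'w') as f:
#       f.create_dataset('data', data=volumes)
#       f.create_dataset('label', data=labels)
# and 'train_000.h5' would then be listed in the HDF5Data layer's source file.

print(volumes.shape)
```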
ToruHironaka
commented
Dec 3, 2015
|
@jeffdonahue @jmerkow @Tgaaly, I previously forgot to set convolution_param, so my last training run was not 3D but 2D, because I did not add kernel_size. I referred to Tgaaly's model layer above but I got the error below. [libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 32:16: Non-repeated field "kernel_size" is specified multiple times. I think my HDF5 dataset's dimensions are wrong: my dataset is (number of files, channel, width, length), and I think I am supposed to set my dataset dimensions to (batch_size, channel, width, length, depth) and then add 3 kernel_size values in convolution_param. Do you have any suggestions? |
|
@ToruHironaka Branch: https://github.com/naibaf7/caffe (includes OpenCL support and 3D max pooling support) it's still work in progress but if you can provide me your network and how the data is formatted I might be able to prepare a working python script for you. |
ToruHironaka
commented
Dec 3, 2015
|
@naibaf7 |
anewlearner
commented
Jan 27, 2016
|
@ToruHironaka
Does anybody have an example of this? PS: I don't know anything about Python. |
ToruHironaka
commented
Jan 27, 2016
You have to use the HDF5 data format for 3D convolution in this PR. I wrote a Python script to convert my CT image files (width, height, depth) into HDF5, and I could then train my 3D HDF5 datasets with this PR. It worked, but I have not gotten good results yet: my accuracy was very low, around 0.4~0.6, and the loss stayed high, around 1.5 or 1.6. I am now troubleshooting my image-to-HDF5 Python script. I tested that script by creating a 2D dataset and training in the official Caffe, and got an accuracy of about 0.87 and a loss of about 0.62. Then I used another person's image-to-HDF5 MATLAB script to create datasets from the same images and trained exactly the same way; that got an accuracy of about 0.88 and a loss of about 0.2. I also created lmdb datasets from the same images using the Caffe conversion command for 2D images, and got an accuracy of about 0.93 and a loss of 0.35. So my image-to-HDF5 Python conversion script is obviously the worst, and I am tracking down the conversion problem now. This PR accepted 3D datasets in HDF5 and it worked; many people have confirmed it, so you should try it out. If you need my help, let me know, because it helps with my data conversion problem. If your HDF5 datasets work, you have my answer. |
anewlearner
commented
Jan 28, 2016
|
@ToruHironaka |
christianpayer
referenced
this pull request
Apr 12, 2016
Open
nd convolution and pooling with cuDNN #3983
SiddhantRanade
commented
Jun 8, 2016
|
Has anyone successfully gotten ND-Pooling to work? ND Convolution works without issues (from the master branch of BVLC-Caffe) |
|
@pietromaximoff Code is here: |
aliimran9010
commented
Jun 20, 2016
|
@jeffdonahue Hi, I am new to Caffe. I used the "nd-convolution" branch. It gives me the How can I resolve it? |
chuckcho
added a commit
to chuckcho/video-caffe
that referenced
this pull request
Jul 13, 2016
|
|
chuckcho |
dd0d374
|
SiddhantRanade
commented
Jul 15, 2016
|
@naibaf7 I get the following error when I try to use the OpenCL branch of Caffe (with Python): pycaffe.py:13: RuntimeWarning: to-Python converter for std::vector<int, std::allocator<int> > already registered; second conversion method ignored. I have no idea whatsoever what this error is or how to resolve it. Do you know how I can fix this? |
paulcx
commented
Nov 7, 2016
|
@ToruHironaka Do you have an example of how to train such data format of CT images on Caffe with 3D convolution? |
ToruHironaka
commented
Nov 7, 2016
|
@paulcx I wrote a Python script for converting image files into HDF5 format, following this PR's thread above. I could train models, but I did not get good results, so I must have done something wrong. |
xjtuljy
commented
Nov 8, 2016
|
@ToruHironaka Is HDF5 the only format that works with this PR? Did you try N-D max pooling together with N-D convolution? |
ToruHironaka
commented
Nov 8, 2016
|
@xjtuljy Yes, HDF5 is the only format for this PR. I tried to train my 3D CNN with the ND pooling from PRs 2442 and 2824. They ran and seemed to be working, but my results were bad, so I think I am doing something wrong in my training. |
This was referenced Feb 12, 2017
romangrothausmann
added a commit
to romangrothausmann/caffe
that referenced
this pull request
Jun 29, 2017
|
|
romangrothausmann |
badd205
|
jeffdonahue commented Mar 6, 2015
This PR extends convolution to N spatial axes, where Caffe's current convolution supports only 2D convolution (with 2 spatial axes: height and width). For 2D convolution, this implementation doesn't compare favorably with the existing one -- I haven't done much benchmarking, but I believe it's 25-75% slower on both CPU and GPU. So before this could be merged, I'd need to restore the existing implementation and use it as the default "engine" for 2D convolutions (but this more destructive version makes it easier to tell what I was thinking from looking at the diff). If anyone has any suggestions on improving the performance or thoughts on why it might be so much slower, I'd love to hear them.
Edit: benchmarking this on alexnet, it's about 33% slower:
@ master:
@ nd-convolution: