ND convolution with im2col #2049

Merged: 3 commits into BVLC:master from jeffdonahue:nd-convolution, Sep 19, 2015
Contributor

jeffdonahue commented Mar 6, 2015

This PR extends convolution to N spatial axes, whereas Caffe's current convolution supports only 2D convolution (2 spatial axes: height and width). For 2D convolution, this implementation doesn't compare favorably with the existing one -- I haven't done much benchmarking, but I believe it's 25-75% slower on both CPU and GPU. So before this could be merged, I'd need to restore the existing implementation and use it as the default "engine" for 2D convolutions (but this more destructive version makes it easier to tell what I was thinking from looking at the diff). If anyone has suggestions on improving the performance, or thoughts on why it might be so much slower, I'd love to hear them.

Edit: benchmarking this on alexnet, it's about 33% slower:

@ master:

I0305 21:07:25.042047 22060 caffe.cpp:271] Average Forward pass: 486.327 ms.
I0305 21:07:25.042064 22060 caffe.cpp:273] Average Backward pass: 824.147 ms.
I0305 21:07:25.042079 22060 caffe.cpp:275] Average Forward-Backward: 1310.68 ms.

@ nd-convolution:

I0305 21:02:03.827594 12909 caffe.cpp:271] Average Forward pass: 681.38 ms.
I0305 21:02:03.827608 12909 caffe.cpp:273] Average Backward pass: 1068.98 ms.
I0305 21:02:03.827623 12909 caffe.cpp:275] Average Forward-Backward: 1750.56 ms.
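For readers unfamiliar with the approach being generalized here: im2col rearranges every receptive field into a column of a matrix, so convolution over any number of spatial axes reduces to a single GEMM against a flattened weight matrix. The following numpy sketch (not the PR's C++/CUDA code; names are illustrative) shows the ND generalization:

```python
import numpy as np
from itertools import product

def im2col_nd(data, kernel_shape, stride, pad):
    # data: (channels, d1, ..., dn) array with n spatial axes.
    C, spatial = data.shape[0], data.shape[1:]
    n = len(spatial)
    padded = np.pad(data, [(0, 0)] + [(p, p) for p in pad])
    out_shape = [(spatial[i] + 2 * pad[i] - kernel_shape[i]) // stride[i] + 1
                 for i in range(n)]
    # One column per output location; each column is a flattened
    # C x k1 x ... x kn receptive field, ready for one GEMM against a
    # (num_output, C * prod(kernel_shape)) weight matrix.
    col = np.empty((C * int(np.prod(kernel_shape, dtype=int)),
                    int(np.prod(out_shape, dtype=int))), dtype=data.dtype)
    for j, out_idx in enumerate(product(*map(range, out_shape))):
        window = padded[(slice(None),) + tuple(
            slice(out_idx[i] * stride[i],
                  out_idx[i] * stride[i] + kernel_shape[i]) for i in range(n))]
        col[:, j] = window.ravel()
    return col
```

Convolution output is then `weights.reshape(num_output, -1) @ col`. Note the loop also degenerates correctly for n = 0 (no spatial axes): the result is a single column holding the raw channels.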

shelhamer added the ES label Mar 7, 2015

barhomi commented Mar 31, 2015

Is this using cuDNN v2 as a backend for the ND conv? If that's the case, I think NVIDIA's ND conv (only 3D for now) is not as tuned as their 2D conv. From the release notes:

"As a BETA preview in this release, the convolution forward, convolution
weight and data gradient, and cudnnSetTensor/cudnnScaleTensor routines
now support 3D datasets through the “Nd” interface. Note, however, that
these 3D routines have not yet been tuned for optimal performance. This will
be improved in future releases."

Contributor

jeffdonahue commented Mar 31, 2015

No, this doesn't touch CuDNN, it only generalizes the im2col convolution implementation (which predates CuDNN).

barhomi commented Mar 31, 2015

Ah, sorry! My bad.


wkal commented Apr 1, 2015

Could you provide a demo showing how to use it? Otherwise there will be a steep learning curve to test and use your work. Thanks!

avalada commented May 12, 2015

@jeffdonahue Does this also support 1D conv?

Contributor

jeffdonahue commented May 16, 2015

@avalada yes, any N >= 0 should theoretically be supported. In practice, 0D convolution -- scalar multiplication -- probably doesn't work, but should and would make a great unit test. I expect 1-10D convolution to work out of the box with this; >10 won't work on GPU -- you'd have to add your case to the switch statements in 718802e.

Also, 1D convolution is supported by the current implementation as well; just set either the width or height to 1. Theoretically, doing 1D convolution using an ND implementation could/should be more efficient than using a 2D implementation with a singleton dim, but with the apparently large overhead in the 2D case, I would be surprised if that's the case here -- you're probably better off sticking with the existing 2D implementation. (But I'd be very interested to know the comparison if you decide to benchmark.)
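For concreteness, the two routes for 1D convolution look like this in prototxt (a sketch; layer and blob names are illustrative): the ND route takes a 3-axis bottom blob (N x C x W), while the legacy 2D route takes a 4-axis blob with a singleton height:

```
# ND route: bottom "data" shaped N x C x W (3 axes)
layer {
  name: "conv1d"
  type: "Convolution"
  bottom: "data"
  top: "conv1d"
  convolution_param {
    num_output: 16
    kernel_size: 5   # single spatial axis
  }
}
# legacy 2D route: bottom "data_4d" shaped N x C x 1 x W, kernel 1 x 5
layer {
  name: "conv1d_legacy"
  type: "Convolution"
  bottom: "data_4d"
  top: "conv1d_legacy"
  convolution_param {
    num_output: 16
    kernel_h: 1
    kernel_w: 5
  }
}
```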

Contributor

Russell91 commented May 17, 2015

Hey Jeff,

Is there any chance you could link to an example prototxt making use of this pull request? It would be nice to have that to get started.

jmerkow commented May 17, 2015

I don't think any changes are needed in the prototxt to use this PR. Just set your dims using repeated values in the prototxt, e.g.:

name: "4d-net"
input: "data"
input_dim: 1
input_dim: 3 
input_dim: 5
input_dim: 5
input_dim: 5
input_dim: 5

The channel axis defaults to 1 (in this net there are 3 channels). If you want ND kernels, just repeat kernel_size for each dim instead of using kernel_h/kernel_w.

The notes in caffe.proto describe it pretty well.

Contributor

jeffdonahue commented May 17, 2015

Thanks @jmerkow; one slight correction: input_dim is deprecated in favor of input_shape, as input_dim only works for 4D blobs. It should be:

name: "4d-net"
input: "data"
input_shape { dim: 1 dim: 3 dim: 5 dim: 5 dim: 5 dim: 5 }

Or a full version with DummyData that you should be able to run (untested, but it should work, possibly needing minor typo fixes):

name: "4d-net"
# -> data: 10 x 3 x 2 x 3 x 4 x 5
# -> label: 10
layer {
  type: "DummyData"
  top: "data"
  top: "label"
  dummy_data_param {
    shape { dim: 10 dim: 3 dim: 2 dim: 3 dim: 4 dim: 5 }
    shape { dim: 10 }
  }
}
# -> conv-out: 10 x 8 x 1 x 1 x 1 x 1
layer {
  type: "Convolution"
  bottom: "data"
  top: "conv-out"
  convolution_param {
    num_output: 8
    # specifies the index of the "channels" axis --
    # may be omitted as 1 is the default
    axis: 1
    kernel_size: 2
    kernel_size: 3
    kernel_size: 4
    kernel_size: 5  # I could just specify "kernel_size: 2" if I wanted 2x2x2x2 filters
  }
}
layer {
  type: "SoftmaxWithLoss"
  bottom: "conv-out"
  bottom: "label"
  top: "loss"
}
Contributor

Russell91 commented May 19, 2015

@jeffdonahue thanks for the reference. Here is a debugged version of Jeff's prototxt if anyone else is interested (the layers needed names and the SoftmaxWithLoss layer doesn't like >4d blobs):

name: "4d-net"
# -> data: 10 x 3 x 2 x 3 x 4 x 5
# -> label: 10
layer {
  name: "dummy"
  type: "DummyData"
  top: "data"
  top: "label"
  dummy_data_param {
    shape { dim: 10 dim: 3 dim: 2 dim: 3 dim: 4 dim: 5 }
    shape { dim: 10 dim: 1 }
  }
}
# -> conv-out: 10 x 8 x 1 x 1 x 1 x 1
layer {
  name: "conv-out"
  type: "Convolution"
  bottom: "data"
  top: "conv-out"
  convolution_param {
    num_output: 8
    # specifies the index of the "channels" axis --
    # may be omitted as 1 is the default
    axis: 1
    kernel_size: 2
    kernel_size: 3
    kernel_size: 4
    kernel_size: 5  # I could just specify "kernel_size: 2" if I wanted 2x2x2x2 filters
  }
}
layer {
  name: "ip-out"
  type: "InnerProduct"
  bottom: "conv-out"
  top: "ip-out"
  inner_product_param {
    num_output: 1
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip-out"
  bottom: "label"
  top: "loss"
}

jmerkow commented May 20, 2015

@jeffdonahue, if you change line 150 in filler.hpp to remove the legacy calls (i.e. use blob->shape(0)), you can fill your weights with xavier for ND convolution. You'd probably want to change the sqrt(3) to account for the extra dimensions too. There are a number of legacy calls in that file that need to be updated; I'm happy to make a PR to update them.

https://github.com/jeffdonahue/caffe/blob/718802ef46045c3efe6ef6b5035702d692138010/include/caffe/filler.hpp#L150
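To make the scaling point concrete: the xavier filler draws weights uniformly from [-s, s] with s = sqrt(3 / fan_in), and for ND the fan-in should be the product of all axes past the first (channels times all kernel dims), generalizing the legacy channels * height * width. A hedged numpy sketch (the function name is illustrative, not Caffe's API):

```python
import numpy as np

def xavier_fill(shape, rng=None):
    """Fill a weight blob of shape (num_output, channels, k1, ..., kn)
    with xavier-scaled uniform noise; fan_in generalizes to N dims."""
    rng = rng or np.random.default_rng(0)
    fan_in = int(np.prod(shape[1:]))  # channels * k1 * ... * kn
    scale = np.sqrt(3.0 / fan_in)
    return rng.uniform(-scale, scale, size=shape)
```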

Thanks for sharing the prototxt @Russell91. I'm trying to use this with ND data (N >= 3). Do you have an idea how I can create an lmdb database from a bunch of ND datasets? The convert_imageset example uses caffe::Datum, which is limited to height/width/channel.

jmerkow commented Jun 1, 2015

@tomdeschamps I would try a hdf5 data layer. I believe those can be used to load ND images.

Thanks @jmerkow. Yes, I'm trying to load it as ND using the HDF5 data layer, but I get an error in LegacyShape() at blob.hpp:141: "Cannot use legacy accessors on Blobs with > 4 axes". My HDF5Data dimensions are typically (batch size, 1, Z, Y, X): batches of grayscale volumes of size X x Y x Z. Any idea what I should do? Has anybody actually tried to load ND data on this branch?

Contributor

jeffdonahue commented Jun 2, 2015

Sorry for the trouble -- there are indeed a lot of places in the code that still use the legacy blob dim accessors (num, channels, height, width). As a quick fix, you might be able to remove that check, and depending on what layers you're using, things might be fine...

The legacy accessors should be removed from most places, definitely in HDF5DataLayer -- PRs welcome. I'm slightly hesitant to say they should be removed everywhere as they can make the code clearer where they're correct -- I introduced the shape call with ND blobs in #1970, and decided against a global find/replace of the legacy calls for that reason. Maybe they should only remain in the still-4D-only vision layers (PoolingLayer, LRNLayer). Any thoughts from other Caffe devs?

Yes, that seems to work with the check removed. I used @jmerkow's nd-pooling branch. However, HDF5Data seems to be handled very differently (no scaling, not sure whole batches are written to each .h5 file, etc.). Is there any documentation on how Caffe deals with this format?

@jeffdonahue @tomdeschamps: tested; with the check removed, it compiles successfully. I'm wondering whether ND convolution and @jmerkow's ND pooling can be used together now?

Hi, is it possible to implement a cuDNN 3D convolution, e.g. by flattening the 3D volume into 2D or something like that? The lack of cuDNN support really hurts speed, and 3D convolution usually means processing videos, which needs even more compute.

Tgaaly commented Jun 25, 2015

@jeffdonahue @Russell91 This is great work! Thanks. Today is my first day brewing Caffe!

I'm trying to use this nd-conv with a "Flatten" layer. The "Flatten" layer does not work on blobs with > 4 axes, so it fails when there are more than 2 spatial dimensions. Is there a fix or workaround for this? I would appreciate any advice/help.

I0625 16:20:14.583077  4782 net.cpp:380]  flatdata <- data_dummy_0_split_1
I0625 16:20:14.583091  4782 net.cpp:338] flatdata -> flatdata
I0625 16:20:14.583103  4782 net.cpp:113] Setting up flatdata
F0625 16:20:14.583122  4782 blob.hpp:141] Check failed: num_axes() <= 4 (5 vs. 4) Cannot use legacy accessors on Blobs with > 4 axes.

Tgaaly commented Jun 25, 2015

I disabled the check on line 141 of blob.hpp (shown below) and it ran. Will that cause any problems? Looking back at the thread, this seems to be the consensus of others, and it makes things work.

CHECK_LE(num_axes(), 4)<< "Cannot use legacy accessors on Blobs with > 4 axes.";

@tomdeschamps, @jmerkow, you mentioned loading ND images with HDF5 earlier. I'm trying to load 3D volume images (width, height, and depth) too, and I want to confirm a couple of things. Following your conversation above, I don't think I can convert my 3D image files into lmdb or leveldb; am I right? It looks like the HDF5 data layer is the only way to load my 3D images into an ND blob. Have you successfully loaded 3D images? If so, please give me some advice on how to do it.

I think the nd-pooling branch is based on the nd-convolution branch; see its description.

HDF5 has been the only way for me to do it so far.

@tomdeschamps, thanks for the information.

Tgaaly commented Jul 1, 2015

@ToruHironaka I was able to load 3D data using HDF5 as follows:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param{
    source: "<path>/list.txt"
    batch_size: 10
  }
  include{
    phase: TRAIN
  }
}

To create the data I used the example in caffe/matlab/hdf5creation
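A Python equivalent of that MATLAB example using h5py (a sketch; file names and shapes are illustrative, and as far as I understand HDF5DataLayer looks up datasets by the names of its top blobs):

```python
import h5py
import numpy as np

# Illustrative shapes: 20 grayscale volumes of size 21x21x21, laid out as
# (num, channels, z, y, x) -- a 5-axis blob for the ND layers to consume.
volumes = np.random.rand(20, 1, 21, 21, 21).astype(np.float32)
labels = np.random.randint(0, 2, size=(20, 1)).astype(np.float32)

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=volumes)    # matches top: "data"
    f.create_dataset('label', data=labels)    # matches top: "label"

# The file referenced by hdf5_data_param's "source" is just a text file
# listing one .h5 path per line.
with open('list.txt', 'w') as f:
    f.write('train.h5\n')
```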

Despite that, I was not able to output any data using HDF5; I'm still stuck on this. The following:

layer {
  name: "output"
  type: "HDF5Output"
  bottom: "ip1"
  bottom: "label"
  hdf5_output_param {
    file_name: "<path>/output.h5"
  }
}

gives the following error:

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 140533133920256:
  #000: ../../../src/H5F.c line 1504 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: ../../../src/H5F.c line 1307 in H5F_open(): unable to truncate a file which is already open
    major: File accessibilty
    minor: Unable to open file
F0630 20:28:44.699677 18387 hdf5_output_layer.cpp:20] Check failed: file_id_ >= 0 (-1 vs. 0) Failed to open HDF5 file/home/.../Code_Research/Caffe_data_prep/output_cnn.h5

I found a workaround for this error -> found here: #1189

@Tgaaly Thanks! I will try this. Just out of curiosity, what kind of 3D image data did you use?

Tgaaly commented Jul 4, 2015

When is this pull request going to be merged into master?

Tgaaly commented Jul 5, 2015

Has anyone verified that this branch works at all? I'm trying a 3D CNN on 3D synthetic data and my network does not converge.

My data consists of 2 classes of binary voxels: class 1 is a box-like object in the center of the voxel grid, and class 2 is two separated square-like objects. I have about 1920 labeled training samples. This is synthetic data with no noise, yet despite what appears to be an easy task, the CNN is not able to learn anything: the accuracy is stuck at 0.5 and the loss goes up and down. See below.

name: "3DCNN"
state {
  phase: TEST
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "/home/.../list.txt"
    batch_size: 100
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 48
    kernel_size: 7
    kernel_size: 7
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 160
    kernel_size: 5
    kernel_size: 5
    kernel_size: 5
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "conv2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "ip3"
  type: "InnerProduct"
  bottom: "ip2"
  top: "ip3"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "ip3"
  bottom: "label"
  top: "cross_entropy_loss"
  loss_weight: 1
}
I0704 20:49:41.932796  9560 layer_factory.hpp:74] Creating layer data
I0704 20:49:41.932816  9560 net.cpp:84] Creating Layer data
I0704 20:49:41.932829  9560 net.cpp:338] data -> data
I0704 20:49:41.932847  9560 net.cpp:338] data -> label
I0704 20:49:41.932862  9560 net.cpp:113] Setting up data
I0704 20:49:41.932874  9560 hdf5_data_layer.cpp:66] Loading list of HDF5 filenames from: /home/.../list.txt
I0704 20:49:41.932909  9560 hdf5_data_layer.cpp:80] Number of HDF5 files: 1
I0704 20:49:41.983906  9560 net.cpp:120] Top shape: 100 1 21 21 21 (926100)
I0704 20:49:41.983955  9560 net.cpp:120] Top shape: 100 2 (200)
I0704 20:49:41.983973  9560 layer_factory.hpp:74] Creating layer conv1
I0704 20:49:41.984004  9560 net.cpp:84] Creating Layer conv1
I0704 20:49:41.984055  9560 net.cpp:380] conv1 <- data
I0704 20:49:41.984074  9560 net.cpp:338] conv1 -> conv1
I0704 20:49:41.984098  9560 net.cpp:113] Setting up conv1
I0704 20:49:41.984284  9560 net.cpp:120] Top shape: 100 48 8 8 8 (2457600)
I0704 20:49:41.984310  9560 layer_factory.hpp:74] Creating layer conv2
I0704 20:49:41.984328  9560 net.cpp:84] Creating Layer conv2
I0704 20:49:41.984340  9560 net.cpp:380] conv2 <- conv1
I0704 20:49:41.984355  9560 net.cpp:338] conv2 -> conv2
I0704 20:49:41.984375  9560 net.cpp:113] Setting up conv2
I0704 20:49:41.993021  9560 net.cpp:120] Top shape: 100 160 2 2 2 (128000)
I0704 20:49:41.993072  9560 layer_factory.hpp:74] Creating layer ip1
I0704 20:49:41.993093  9560 net.cpp:84] Creating Layer ip1
I0704 20:49:41.993105  9560 net.cpp:380] ip1 <- conv2
I0704 20:49:41.993121  9560 net.cpp:338] ip1 -> ip1
I0704 20:49:41.993142  9560 net.cpp:113] Setting up ip1
I0704 20:49:42.004967  9560 net.cpp:120] Top shape: 100 1000 (100000)
I0704 20:49:42.005023  9560 layer_factory.hpp:74] Creating layer relu1
I0704 20:49:42.005044  9560 net.cpp:84] Creating Layer relu1
I0704 20:49:42.005056  9560 net.cpp:380] relu1 <- ip1
I0704 20:49:42.005071  9560 net.cpp:327] relu1 -> ip1 (in-place)
I0704 20:49:42.005087  9560 net.cpp:113] Setting up relu1
I0704 20:49:42.005103  9560 net.cpp:120] Top shape: 100 1000 (100000)
I0704 20:49:42.005115  9560 layer_factory.hpp:74] Creating layer ip2
I0704 20:49:42.005131  9560 net.cpp:84] Creating Layer ip2
I0704 20:49:42.005143  9560 net.cpp:380] ip2 <- ip1
I0704 20:49:42.005157  9560 net.cpp:338] ip2 -> ip2
I0704 20:49:42.005174  9560 net.cpp:113] Setting up ip2
I0704 20:49:42.009666  9560 net.cpp:120] Top shape: 100 500 (50000)
I0704 20:49:42.009723  9560 layer_factory.hpp:74] Creating layer ip3
I0704 20:49:42.009744  9560 net.cpp:84] Creating Layer ip3
I0704 20:49:42.009757  9560 net.cpp:380] ip3 <- ip2
I0704 20:49:42.009771  9560 net.cpp:338] ip3 -> ip3
I0704 20:49:42.009788  9560 net.cpp:113] Setting up ip3
I0704 20:49:42.009820  9560 net.cpp:120] Top shape: 100 2 (200)
I0704 20:49:42.009840  9560 layer_factory.hpp:74] Creating layer loss
I0704 20:49:42.009857  9560 net.cpp:84] Creating Layer loss
I0704 20:49:42.009868  9560 net.cpp:380] loss <- ip3
I0704 20:49:42.009879  9560 net.cpp:380] loss <- label
I0704 20:49:42.009893  9560 net.cpp:338] loss -> cross_entropy_loss
I0704 20:49:42.009908  9560 net.cpp:113] Setting up loss
I0704 20:49:42.009927  9560 net.cpp:120] Top shape: (1)
I0704 20:49:42.009938  9560 net.cpp:122]     with loss weight 1
I0704 20:49:42.009958  9560 net.cpp:167] loss needs backward computation.
I0704 20:49:42.009969  9560 net.cpp:167] ip3 needs backward computation.
I0704 20:49:42.009979  9560 net.cpp:167] ip2 needs backward computation.
I0704 20:49:42.009990  9560 net.cpp:167] relu1 needs backward computation.
I0704 20:49:42.010000  9560 net.cpp:167] ip1 needs backward computation.
I0704 20:49:42.010011  9560 net.cpp:167] conv2 needs backward computation.
I0704 20:49:42.010021  9560 net.cpp:167] conv1 needs backward computation.
I0704 20:49:42.010032  9560 net.cpp:169] data does not need backward computation.
I0704 20:49:42.010042  9560 net.cpp:205] This network produces output cross_entropy_loss
I0704 20:49:42.010057  9560 net.cpp:447] Collecting Learning Rate and Weight Decay.
I0704 20:49:42.010071  9560 net.cpp:217] Network initialization done.
I0704 20:49:42.010082  9560 net.cpp:218] Memory required for data: 15048404
I0704 20:49:42.010154  9560 solver.cpp:42] Solver scaffolding done.
I0704 20:49:42.010198  9560 solver.cpp:222] Solving 3DCNN
I0704 20:49:42.010210  9560 solver.cpp:223] Learning Rate Policy: inv
I0704 20:49:42.010222  9560 solver.cpp:266] Iteration 0, Testing net (#0)
I0704 20:50:45.217939  9560 solver.cpp:315]     Test net output #0: cross_entropy_loss = 1.35162 (* 1 = 1.35162 loss)
I0704 20:50:45.375299  9560 solver.cpp:189] Iteration 0, loss = 1.38567
I0704 20:50:45.375416  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:50:45.375435  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 1.38567 (* 1 = 1.38567 loss)
I0704 20:50:45.375468  9560 solver.cpp:464] Iteration 0, lr = 0.01
I0704 20:51:10.861009  9560 solver.cpp:189] Iteration 100, loss = -5.12955e-08
I0704 20:51:10.861086  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:51:10.861106  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 0 (* 1 = 0 loss)
I0704 20:51:10.861124  9560 solver.cpp:464] Iteration 100, lr = 0.00992565
I0704 20:51:35.468628  9560 solver.cpp:189] Iteration 200, loss = -5.12955e-08
I0704 20:51:35.468865  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:51:35.468895  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 0 (* 1 = 0 loss)
I0704 20:51:35.468914  9560 solver.cpp:464] Iteration 200, lr = 0.00985258
I0704 20:52:00.172819  9560 solver.cpp:189] Iteration 300, loss = -5.12955e-08
I0704 20:52:00.172878  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:52:00.172899  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 0 (* 1 = 0 loss)
I0704 20:52:00.172916  9560 solver.cpp:464] Iteration 300, lr = 0.00978075
I0704 20:52:24.744035  9560 solver.cpp:189] Iteration 400, loss = -2.03011e-08
I0704 20:52:24.744181  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:52:24.744223  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 3.09944e-08 (* 1 = 3.09944e-08 loss)
I0704 20:52:24.744251  9560 solver.cpp:464] Iteration 400, lr = 0.00971013
I0704 20:52:49.456743  9560 solver.cpp:334] Snapshotting to CNN3D__iter_500.caffemodel
I0704 20:52:49.498097  9560 solver.cpp:342] Snapshotting solver state to CNN3D__iter_500.solverstate
I0704 20:52:49.523550  9560 solver.cpp:266] Iteration 500, Testing net (#0)
I0704 20:53:49.756580  9560 solver.cpp:315]     Test net output #0: cross_entropy_loss = 1.6098e-08 (* 1 = 1.6098e-08 loss)
I0704 20:53:49.886836  9560 solver.cpp:189] Iteration 500, loss = -1.9109e-08
I0704 20:53:49.886880  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:53:49.886899  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 3.21865e-08 (* 1 = 3.21865e-08 loss)
I0704 20:53:49.886915  9560 solver.cpp:464] Iteration 500, lr = 0.00964069
I0704 20:54:14.471463  9560 solver.cpp:189] Iteration 600, loss = -5.12955e-08
I0704 20:54:14.471525  9560 solver.cpp:204]     Train net output #0: accuracy = 0.5
I0704 20:54:14.471546  9560 solver.cpp:204]     Train net output #1: cross_entropy_loss = 0 (* 1 = 0 loss)
I0704 20:54:14.471565  9560 solver.cpp:464] Iteration 600, lr = 0.0095724

jmerkow commented Jul 9, 2015

I can verify that it works.

Contributor

Russell91 commented Jul 9, 2015

Yea, it worked great for me too.

Tgaaly commented Jul 9, 2015

Can you share your prototxt scripts?


@jeffdonahue Do you know why the ND implementation is slower than the 2D conv? Is it due to the new version of im2col? If so, the fix could be straightforward, say, just using the 2D version of im2col.

Member

naibaf7 commented Aug 20, 2015

I re-implemented this ND convolution in #2610 for OpenCL and CUDA. It also supports strided kernels over there (ND-SK convolutions and max pooling).

Contributor

jeffdonahue commented Aug 21, 2015

I just spent an absurd amount of time debugging this, so just a note to possibly save someone else a lot of time: if you happen to be using this PR with the static col_buffer_ to save memory (#2016), things are likely going to be broken in the backward pass. This is because the GPU implementation relies on col_buffer_.gpu_shape() which will likely have been changed by other layers during the forward pass, which runs Reshape -- during backward the shape will probably be incorrect for all layers except the last one in the forward pass (and first one in the backward pass). (For my own work I've hackily fixed this by inserting col_buffer_.Reshape(col_buffer_shape_) into all of the BaseConvolutionLayer methods that use col_buffer_.)

Member

naibaf7 commented Aug 21, 2015

@jeffdonahue
I think I found another fix for this on #2610:
Keep a non-initialized col_buffer_ with gpu_shape() locally in the layer (this doesn't waste memory unless .gpu_data() or .mutable_gpu_data() is called) while using a device-context-bound col_buffer() for the actual storage (which is shared by all layers). I think this fixes the issue well enough for now.

Contributor

jeffdonahue commented Aug 21, 2015

@naibaf7 right, that's a less hacky way to do it (wish I had known two days ago that you'd independently found and fixed this bug!). I was thinking of getting rid of gpu_shape altogether and just using an extra Blob to store the shape itself, which would be a little more consistent than the current gpu_shape way, as that's what's done for the kernel/stride/pad shapes.

xjtuljy commented Aug 25, 2015

Thank you very much for developing this code! It works on my data for training, but I found that the saved snapshot is empty (the saved caffemodel is an empty file) in my case. Could anyone help with this?

Contributor

jeffdonahue commented Aug 26, 2015

This is rebased & ready for review. I've restored the original 2D im2col implementations and added a few tests, including modifying @shelhamer's explicit 2D conv implementation for 3D convolution (by adding another 2 levels of loop nesting...). Besides the tests, I'm reasonably confident it works, now that I've used it myself.

If we want to merge this, I can prepare another commit to add to #2016 to address the compatibility issue discussed above.

I cherry-picked my commit in #2959 to make the Python tests pass -- if we want to merge this but not #2959, I'll go back and change all the kernel_size=x, stride=x, pad=x to kernel_size=[x], stride=[x], pad=[x] in the NetSpec tests (not wanting to do that everywhere in my own work was what prompted me to create PR #2959).

Member

naibaf7 commented Aug 26, 2015

@jeffdonahue
Would you be interested to have a look at my ND-SK kernels (ConvolutionND) in #2610? They are based on your work.
They add support for another parameter: the kernel stride.

The kernel stride allows a stride within the kernel rather than just a contiguous kernel, which gives rise to very interesting network architectures.
Reference: http://arxiv.org/abs/1412.4526

The parameter can be included at little to no extra cost in the existing im2col/col2im kernels for higher dimensions.
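The geometry of a kernel stride (later widely known as dilation) is simple: taps sit kernel_stride apart, so a size-k kernel covers a wider input extent without more parameters. A tiny illustrative sketch (names are mine, not from #2610):

```python
def dilated_extent(kernel_size, kernel_stride):
    # Effective spatial extent of a kernel with an internal stride:
    # taps sit kernel_stride apart, so a size-k kernel spans
    # (k - 1) * kernel_stride + 1 input positions.
    return (kernel_size - 1) * kernel_stride + 1

assert dilated_extent(3, 1) == 3   # ordinary dense kernel
assert dilated_extent(3, 2) == 5   # 3 taps spread over 5 inputs
```

In an im2col kernel this only changes the input index computation from `start + k` to `start + k * kernel_stride`, which is why the cost is essentially unchanged.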

Contributor

jeffdonahue commented Aug 26, 2015

@naibaf7 sounds like a nice feature, but I'd rather restrict this PR to just generalizing existing functionality to N dimensions -- it's big enough as is. I'd be willing to review a separate PR for internally strided kernels, though.

BTW, in case anyone was relying on it or wants to refer to it for any reason, I've left the previous version of this PR available at a new nd-convolution-old branch. It could be helpful for understanding the ND implementations to look at the 2D vs. ND diff which is no longer in the diff for this PR since I went back and restored the 2D im2col implementations.

Member

naibaf7 commented Aug 26, 2015

@jeffdonahue
OK, cool, you're right; I just wanted to point you at it, since it seems special convolutions are your thing :)
Let me know if you need assistance when you tackle it someday.

shelhamer added the focus label Aug 31, 2015

Contributor

jeffdonahue commented Sep 2, 2015

I updated the PR with fixes for CuDNNConvolutionLayer. (Previously it didn't compile.)

Contributor

jeffdonahue commented Sep 3, 2015

Rebased to remove extraneous NetSpec commit after merge of #2959.

Owner

shelhamer commented Sep 4, 2015

The legacy accessors should be removed from most places [...]
Maybe they should only remain in the still-4D-only vision layers (PoolingLayer, LRNLayer).

Agreed. They make the 4D layers more clear, although they might not stay 4D for long since there is an ND pooling follow up already #2442.

Owner

shelhamer commented Sep 4, 2015

@jeffdonahue nice indexing logic and tests (in particular the 0D multiplication test)! It took me a while to read, but this looks good to me save my one minor comment about the legacy accessors in a test.

Merge as you like.

Contributor

jeffdonahue commented Sep 4, 2015

Thanks for reviewing this massive PR @shelhamer! I updated the 0D test per your comment.

I decided to time this on Caffe reference model, and I found that it was still 20-30% slower... It turned out that I wasn't actually reading the value of force_nd_im2col so everything was using the ND version (d'oh). I fixed that bug in the first fixup commit above, and that brought it back up nearly to master speed. But it still seemed to be slower by something like 5-15 ms (1 or 2%ish) per iteration (on a Titan X). I suspected that gpu_shape might be de/allocating memory every iteration, and that might be the issue, and confirmed as much by adding print statements to CaffeMallocHost -- there were dozens of 4-16 byte cudaMallocs every training iteration. I changed Blob::Reshape to fix that with the last commit above (which allocates only when resizing to more axes, the way data and diff already work), and I think that may have improved the performance by 5-10 ms, but the caffe time variance makes it hard to tell... I need to write a script to alternately check out this branch and master N times and run and parse caffe time, then run some t tests, I guess.

I'm going to let this sit for a little while with the two new commits before squashing and merging.
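The allocate-only-when-growing behavior described above can be sketched as a toy model (this is not Caffe code; names are illustrative):

```python
class GrowOnlyBuffer:
    """Toy model of the Blob::Reshape fix: keep capacity separate from
    size and reallocate the backing store only when a reshape needs more
    room, so repeated small reshapes stop triggering per-iteration
    allocations (the 4-16 byte cudaMallocs observed above)."""

    def __init__(self):
        self.capacity = 0
        self.size = 0
        self.data = None
        self.allocations = 0  # instrumentation, like printing in CaffeMallocHost

    def reshape(self, count):
        if count > self.capacity:       # grow: reallocate
            self.data = bytearray(count)
            self.capacity = count
            self.allocations += 1
        self.size = count               # shrink/equal: reuse storage

buf = GrowOnlyBuffer()
for count in [16, 4, 16, 8, 16]:
    buf.reshape(count)
# only the first reshape allocates; the rest reuse the 16-byte store
```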

Owner

shelhamer commented Sep 19, 2015

@jeffdonahue thanks for looping back and catching those issues! I did a time comparison between master and the latest nd-convolution in 2D mode for VGG with batch size 64 over 20 iterations (thinking that more convolution should yield less variance) and the times come out essentially the same.

@ master:

Average Forward-Backward: 2257.18 ms.
Average Forward-Backward: 2540.17 ms.
Average Forward-Backward: 2539.6 ms.
Average Forward-Backward: 2539.05 ms.
Average Forward-Backward: 2537.91 ms.
Average Forward-Backward: 2539.19 ms.
Average Forward-Backward: 2538.15 ms.
Average Forward-Backward: 2538.36 ms.
Average Forward-Backward: 2537.73 ms.
Average Forward-Backward: 2537.24 ms.

Grand Average*: 2538.6 ms

@ nd-convolution:

Average Forward-Backward: 2276.38 ms.
Average Forward-Backward: 2539.79 ms.
Average Forward-Backward: 2540.43 ms.
Average Forward-Backward: 2539.31 ms.
Average Forward-Backward: 2539.07 ms.
Average Forward-Backward: 2539.14 ms.
Average Forward-Backward: 2539.24 ms.
Average Forward-Backward: 2539.32 ms.
Average Forward-Backward: 2539.09 ms.
Average Forward-Backward: 2539.75 ms.

Grand Average*: 2539.46 ms

I think this can be squashed and merged.

* The grand average excludes the first round of 20 iterations because of the weird "burn-in" variance in timing.

Contributor

jeffdonahue commented Sep 19, 2015

Thanks for the extensive benchmarking @shelhamer! I'll merge as suggested.

Feel free to comment if you suspect any performance or other issues related to this PR.

@jeffdonahue jeffdonahue added a commit that referenced this pull request Sep 19, 2015

@jeffdonahue jeffdonahue Merge pull request #2049 from jeffdonahue/nd-convolution
ND convolution with im2col
2e1c1cb

@jeffdonahue jeffdonahue merged commit 2e1c1cb into BVLC:master Sep 19, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

jeffdonahue deleted the jeffdonahue:nd-convolution branch Sep 20, 2015

@Tgaaly, I assume that you stacked a series of 2D images to make a 3D image dataset in hdf5, and listed the hdf5 files in train.txt and test.txt for each class. Is that so? Sorry, I have been working on something else; it's been a while since I was last in touch with you.

Hi, All

I converted 3D images (2 CT-scanned human brain volumes) into HDF5 datasets with shape (number of images, width, length, channel [not specified for grayscale]) in python. I created an HDF5 dataset file for each 3D volume image and listed these HDF5 files in train.txt and test.txt. Then I defined a net with the code below, from caffe/examples/02-brewing-logreg.ipynb.

Note: the code below is the original code from the example; I modified it for my dataset and net.

def logreg(hdf5, batch_size):
    # logistic regression: data, matrix multiplication, and 2-class softmax loss
    n = caffe.NetSpec()
    n.data, n.label = L.HDF5Data(batch_size=batch_size, source=hdf5, ntop=2)
    n.ip1 = L.InnerProduct(n.data, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy = L.Accuracy(n.ip1, n.label)
    n.loss = L.SoftmaxWithLoss(n.ip1, n.label)
    return n.to_proto()

with open('examples/hdf5_classification/logreg_auto_train.prototxt', 'w') as f:
    f.write(str(logreg('examples/hdf5_classification/data/train.txt', 10)))

with open('examples/hdf5_classification/logreg_auto_test.prototxt', 'w') as f:
    f.write(str(logreg('examples/hdf5_classification/data/test.txt', 10)))

After that, I ran the test with the code below, from the same example:

caffe.set_mode_gpu()
solver = caffe.get_solver('examples/hdf5_classification/solver.prototxt')
solver.solve()

accuracy = 0
batch_size = solver.test_nets[0].blobs['data'].num
test_iters = int(len(Xt) / batch_size)
for i in range(test_iters):
    solver.test_nets[0].forward()
    accuracy += solver.test_nets[0].blobs['accuracy'].data
accuracy /= test_iters

print("Accuracy: {:.3f}".format(accuracy))

I got the results below, and they look alright. Can anyone confirm whether my way of building a caffe 3D (depth, width, height, channel [channel ignored for grayscale images]) model is correct?

Results:

I1120 12:43:45.839637 28983 solver.cpp:734] Snapshotting solver state to binary proto file/hdf_FT_iter_10000.solverstate
I1120 12:43:45.846904 28983 solver.cpp:326] Iteration 10000, loss = 0
I1120 12:43:45.847026 28983 solver.cpp:346] Iteration 10000, Testing net (#0)
I1120 12:43:45.847059 28983 net.cpp:781] Copying source layer data
I1120 12:43:45.847087 28983 net.cpp:781] Copying source layer label_data_1_split
I1120 12:43:45.847115 28983 net.cpp:781] Copying source layer ip1
I1120 12:43:45.847146 28983 net.cpp:781] Copying source layer ip1_ip1_0_split
I1120 12:43:45.847172 28983 net.cpp:781] Copying source layer accuracy
I1120 12:43:45.847198 28983 net.cpp:781] Copying source layer loss
I1120 12:43:45.847236 28983 hdf5_data_layer.cu:33] Looping around to first file.
I1120 12:43:45.847262 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTBraintest.h5
I1120 12:43:45.856613 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:45.862017 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTImagetest.h5
I1120 12:43:45.874867 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:45.879843 28983 hdf5_data_layer.cu:33] Looping around to first file.
I1120 12:43:45.879860 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTBraintest.h5
.
.
I1120 12:43:46.574106 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:46.579458 28983 solver.cpp:414] Test net output #0: accuracy = 0.9
I1120 12:43:46.579494 28983 solver.cpp:414] Test net output #1: loss = 8.73365 (* 1 = 8.73365 loss)
I1120 12:43:46.579507 28983 solver.cpp:331] Optimization Done.
I1120 12:43:46.580368 28983 hdf5_data_layer.cu:33] Looping around to first file.
I1120 12:43:46.580379 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTBraintest.h5
I1120 12:43:46.589470 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:46.598145 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTImagetest.h5
I1120 12:43:46.606251 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:46.615324 28983 hdf5_data_layer.cu:33] Looping around to first file.
I1120 12:43:46.615339 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTBraintest.h5
I1120 12:43:46.624083 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:46.632520 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTImagetest.h5
I1120 12:43:46.641490 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
I1120 12:43:46.649895 28983 hdf5_data_layer.cu:33] Looping around to first file.
I1120 12:43:46.649909 28983 hdf5_data_layer.cpp:29] Loading HDF5 file: /CTBraintest.h5
I1120 12:43:46.657821 28983 hdf5_data_layer.cpp:68] Successully loaded 20 rows
Accuracy: 0.880

Contributor

jeffdonahue commented Nov 20, 2015

(Number of images, Width, Length, channel [not specified for grayscale])

The 'channels' axis should be right after the batch axis, so the shape should be (batch_size, channels, *spatial_axes). The ordering of spatial axes is arbitrary. (The only reason channels are treated separately from spatial axes is to support groups, which make the layer act like a locally-connected layer along the channel axis.)
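For instance, a 3-D volume batch in that layout can be built like this (sizes are illustrative; the 'data'/'label' dataset names match the tops used in the HDF5Data example above):

```python
import numpy as np

# ND blob layout: channels come right after the batch axis, then the spatial axes.
batch_size, channels = 10, 1
depth, height, width = 16, 64, 64  # spatial-axis order is arbitrary, but keep it consistent

data = np.random.rand(batch_size, channels, depth, height, width).astype(np.float32)
label = np.random.randint(0, 2, size=(batch_size,)).astype(np.float32)

# These arrays would then be written as the 'data' and 'label' datasets of an
# HDF5 file (e.g. with h5py) and the file path listed in train.txt.
print(data.shape)  # (10, 1, 16, 64, 64) -> N x C x D x H x W
```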

@jeffdonahue, thanks for answering my question; you stated it earlier in this pull request and I missed it, sorry. I set my dimensions as follows: depth (stack of the 3D volume), channel (RGB), width, length. Caffe ran okay and the result was better; I was probably training on width and channel instead of height in my last run. Just one more question: can I use the same method for multi-channel or multi-spectrum images in hdf5?

jmerkow commented Nov 21, 2015

You can load whatever you want with hdf5 as long as it sticks to NxCxSxSx... (where S is a spatial dim). You can have multiple channels in each image, or multiple batches in each file. I typically stick to a batch of 1 and adjust with the batch_size param, but I don't think you need to; for example, you may want images grouped into pre-determined batches.
You can also put multiple images into each hdf5 file and reference them with different tops. These can be concatenated/silenced if you want to try different combinations.
--Jameson
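A hedged prototxt sketch of the multiple-tops idea (top and dataset names are illustrative; each top of an HDF5Data layer is read from the dataset of the same name in the listed files):

```
layer {
  name: "data"
  type: "HDF5Data"
  top: "image_a"
  top: "image_b"
  top: "label"
  hdf5_data_param { source: "train.txt" batch_size: 1 }
}
layer {
  name: "concat"
  type: "Concat"
  bottom: "image_a"
  bottom: "image_b"
  top: "data"
  concat_param { axis: 1 }  # stack along the channel axis
}
```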

@jeffdonahue @jmerkow @Tgaaly, I previously forgot to set convolution_param, so my last train was not 3D; it was 2D because I did not add kernel_size. I referred to Tgaaly's model layer above, but I got the error below.

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 32:16: Non-repeated field "kernel_size" is specified multiple times.

I think my hdf5 dataset's dimensions are wrong. My dataset is (number of files, channel, width, length). I think I am supposed to set my dataset dimensions to (batch_size, channel, width, length, depth) and then add kernel_size three times in convolution_param.

Do you have any suggestion?
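For what it's worth, that parse error is what a pre-PR caffe.proto (where kernel_size was not a repeated field) produces; after this PR, kernel_size is repeated, so a 3-D kernel is declared once per spatial axis. A sketch, with names and sizes illustrative:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 3  # one value per spatial axis: D, H, W
    kernel_size: 3
    kernel_size: 3
    weight_filler { type: "xavier" }
  }
}
```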

Member

naibaf7 commented Dec 3, 2015

@ToruHironaka
If you are interested, I am currently working on a python interface for custom network training/processing in 3D.
It should simplify dataset loading because you can manipulate/preprocess in python and then load it to Caffe via MemoryDataLayers.

Branch: https://github.com/naibaf7/caffe (includes OpenCL support and 3D max pooling support)
Interface: https://github.com/naibaf7/PyGreentea

It's still a work in progress, but if you can tell me your network and how the data is formatted, I might be able to prepare a working python script for you.

@naibaf7
Thanks, I will check your Branch out.

@ToruHironaka
I'm new to caffe and have just run the mnist and cifar10 examples. Now I want to train my own network on 3D CT slices, based on caffe-windows. The data I have is in the form of mat (N_C_Sx_Sy_Sz, S: spatial). I read the discussion above but am still confused.
The process to train on 2D images is as follows:

  1. Change the data format into LevelDB or LMDB (convert_imageset.cpp).
  2. Compute the mean (compute_image_mean.cpp).
  3. Train (train_net.cpp).

My understanding of training on 3D images is as follows; I'm not sure whether it is correct:

  1. Use jeffdonahue's project (changing caffe.proto, convolution, and blobs).
  2. Change the data format into hdf5 (I have the original format of mat, so no change).
  3. Mean (the input of compute_image_mean should be leveldb, so do I need to write a new function to compute the mean?).
  4. Train (can I use train_net.cpp here?).

Does anybody have an example of this?
Thanks!

Ps: I don't know anything about python.

@anewlearner

  1. Using jeffdonahue's project (changing caffe.proto, convolution, and blobs): yes.
  2. Changing the data format into hdf5 (I have the original format of mat, so no change): yes.
  3. Mean (do I need to write a new function to compute the mean?): I did not use a mean.
  4. Train (can I use train_net.cpp here?): I could use the caffe train command with this PR's branch of caffe.

You have to use the hdf5 data format for 3D convolution in this PR's branch of caffe. I wrote a python script to convert my CT image files (width, height, depth) into hdf5 data files, and then I could train my 3D hdf5 datasets with this branch. It worked, but I have not gotten good results yet: my accuracy was very low, around 0.4-0.6, and the loss stayed high, around 1.5 or 1.6.

I am now troubleshooting my image-to-hdf5 python script. I tested it by creating a 2D dataset and training in the official caffe, and got an accuracy of about 0.87 and a loss of about 0.62. Then I used another person's image-to-hdf5 matlab script to create hdf5 datasets from the same images and trained exactly the same way; it got an accuracy of about 0.88 and a loss of about 0.2. I also created lmdb datasets from the same images using the caffe conversion command you used for 2D images, and got an accuracy of about 0.93 and a loss of 0.35. So my image-to-hdf5 python conversion script is clearly the worst of the three, and I am tracking down the conversion problem now.

This PR's branch of caffe accepts 3D datasets in hdf5, and it works; many people have confirmed it. You should try it out. If you need my help, let me know, since that also helps with my data conversion problem. If your hdf5 datasets work, you have your answer.
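One way to corner a conversion bug like this is to round-trip-check the converted arrays against the source volumes before training. A minimal sketch, assuming the intended preprocessing is a float32 cast plus a 1/255 rescale:

```python
import numpy as np

def check_conversion(source, converted):
    """Sanity-check a converted 'data' array against the raw uint8 source volume.

    Typical conversion bugs: a missing float32 cast, a forgotten 0-255 -> 0-1
    rescale, and transposed axes (which show up as a shape mismatch here
    whenever the axis sizes differ).
    """
    expected = source.astype(np.float32) / 255.0  # assumed preprocessing
    return (converted.dtype == np.float32
            and converted.shape == expected.shape
            and np.allclose(converted, expected))
```

Running this on a few volumes right after conversion would separate a data-layout bug from a genuine training problem.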

@ToruHironaka
Thanks for your answer!
I plan to convert the data with the matlab script store2hdf5 provided by caffe.
Subtracting the mean may help the net converge faster; you can try it later.

Has anyone successfully gotten ND-Pooling to work? ND Convolution works without issues (from the master branch of BVLC-Caffe)

Member

naibaf7 commented Jun 8, 2016 edited

@pietromaximoff
https://github.com/BVLC/caffe/tree/opencl
In this branch I have implemented MaxPooling for ND, and it works successfully.
It would be great, though, if someone had the time to complete this for the other pooling types and make a PR to master.
Feel free to use the code there as a starting point.

Code is here:
https://github.com/BVLC/caffe/blob/opencl/src/caffe/layers/pooling_layer.cu
(MaxPoolingNDForward/MaxPoolingNDBackward)
and the Reshape + constructor from here:
https://github.com/BVLC/caffe/blob/opencl/src/caffe/layers/pooling_layer.cpp

@jeffdonahue Hi, I am new to caffe. I used the "nd-convolution" branch, and it gives me:

Check failed: num_kernel_dims == 1 || num_kernel_dims == num_spatial_axes_ kernel_size must be specified once, or once per spatial dimension (kernel_size specified 3 times; 2 spatial dims).

How can I resolve it?
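That check failure means the bottom blob only has two spatial axes, i.e. the input is a standard 4-D (N, C, H, W) blob; the data needs five axes before a three-entry kernel_size is legal. A sketch of how the layer counts spatial axes (channel axis 1 is Caffe's default; array contents are dummies):

```python
import numpy as np

def num_spatial_axes(blob, channel_axis=1):
    # Spatial axes are everything after the channel axis, so kernel_size must
    # be given once, or once per spatial axis.
    return blob.ndim - channel_axis - 1

data_2d = np.zeros((10, 1, 64, 64))      # N x C x H x W
data_3d = np.zeros((10, 1, 16, 64, 64))  # N x C x D x H x W

print(num_spatial_axes(data_2d))  # 2 -> a 3-entry kernel_size trips the check
print(num_spatial_axes(data_3d))  # 3 -> matches kernel_size given 3 times
```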

@chuckcho chuckcho added a commit to chuckcho/video-caffe that referenced this pull request Jul 13, 2016

@chuckcho chuckcho WIP for N-d pooling (based on dd0d374)

@naibaf7 I get the following error when I try to use the opencl branch of caffe (with python):

pycaffe.py:13: RuntimeWarning: to-Python converter for std::vector<int, std::allocator > already registered; second conversion method ignored.
from ._caffe import
Floating point exception

I have no idea what this error means or how to resolve it. Do you know how I can fix this?

pcmoritz referenced this pull request in pcmoritz/Strada.jl Aug 5, 2016

Open

support for 3D? #1

paulcx commented Nov 7, 2016

@ToruHironaka Do you have an example of how to train such data format of CT images on Caffe with 3D convolution?

@paulcx I wrote a python script for converting image files into hdf5 format, following this PR's thread above. I could train models, but I did not get good results, so I did something wrong.

xjtuljy commented Nov 8, 2016

@ToruHironaka Is hdf5 the only format that works with this PR? Did you try N-D max pooling together with N-D convolution?

@xjtuljy Yes, hdf5 is the only format for this PR. I tried to train my 3D-CNN with the ND pooling from PRs #2442 and #2824. They ran and seemed to be working, but my results were bad, so I think I am doing something wrong in my training.
