What does InsertSplit() / SplitLayer do? #767

Closed
lim0606 opened this issue Jul 23, 2014 · 6 comments

@lim0606 commented Jul 23, 2014

Hi! I'm learning Caffe to develop some new functionality in it, including new solvers.
However, I'm stuck on understanding the initialization of the Net class (the network architecture), especially the purpose of InsertSplits().
I've tried to understand its role in Net::Init(), but I still have no clue. Can anyone give some advice on what it is for?
That would be a great help! Thanks :)

@research2010

Hi, could you run a test without the InsertSplits(in_param, &param); call in the Net::Init() function?

@lim0606 (Author) commented Jul 23, 2014

I'm testing it in accordance with your advice.
I commented out InsertSplits(in_param, &param); and added the following: param.CopyFrom(in_param);
~~However, it seems to make no difference in the LeNet case (for MNIST data).
I think LeNet's network architecture is not complex enough for InsertSplits to do anything meaningful.~~ There was no difference in the LeNet case (for MNIST data, supervised). In the case of the MNIST autoencoder example, it gives an error, `Unknown blob input data to layer0`, from net.cpp:88.

@research2010

In lines 52-53 of the file caffe/util/insert_splits.cpp, there is a comment as follows:

     // Create split layer for any input blobs used by other layers as bottom
     // blobs more than once.

I think that's the reason. I wrote the following test code to check the difference. Because LeNet's network doesn't need any split layers, there was no difference in your test above.
Maybe the split layer also serves other purposes.

The following file test_net_init.cpp is the testing code:

    #include <cstring>
    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <utility>
    #include <vector>

    #include "caffe/common.hpp"
    #include "caffe/net.hpp"
    #include "caffe/util/insert_splits.hpp"
    #include "caffe/util/upgrade_proto.hpp"

    using namespace caffe;

    int main(int argc, char **argv)
    {
        if (argc != 2)
        {
            printf("usage: ./test_net_init net_proto_file\n");
            return 0;
        }

        string proto_filename = string(argv[1]);

        // Parse the net definition from the prototxt file.
        NetParameter param1;
        caffe::ReadNetParamsFromTextFileOrDie(proto_filename, &param1);

        printf("params before InsertSplits():\n");
        for (int layer_id = 0; layer_id < param1.layers_size(); layer_id++)
        {
            const LayerParameter& layer_param = param1.layers(layer_id);
            printf("%d: %s\n", layer_id, layer_param.name().c_str());
        }

        // Run InsertSplits: it copies param1 into param2, inserting a split
        // layer wherever a blob is used as a bottom by more than one layer.
        NetParameter param2;
        caffe::InsertSplits(param1, &param2);

        printf("\nparams after InsertSplits():\n");
        for (int layer_id = 0; layer_id < param2.layers_size(); layer_id++)
        {
            const LayerParameter& layer_param = param2.layers(layer_id);
            printf("%d: %s\n", layer_id, layer_param.name().c_str());
        }
        return 0;
    }

The Makefile is:

    CC=g++

    all:
        $(CC) -I../include -I/usr/local/cuda/include \
        test_net_init.cpp \
        -L/usr/local/lib/ -lprotobuf \
        -L/usr/local/cuda/lib64 -lcudart -lcublas -lcurand \
        -L../build/lib/ -lcaffe \
        -o test_net_init

The net1.prototxt is:

    name: "CIFAR10_quick_train"
    layers {
      name: "cifar"
      type: DATA
      top: "data"
      top: "label"
      data_param {
        source: "cifar10-leveldb/cifar-train-leveldb"
        mean_file: "mean.binaryproto"
        batch_size: 100
      }
    }
    layers {
      name: "conv1"
      type: CONVOLUTION
      bottom: "data"
      top: "conv1"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "pool1"
      type: POOLING
      bottom: "conv1"
      top: "pool1"
      pooling_param {
        pool: MAX
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "relu1"
      type: RELU
      bottom: "pool1"
      top: "pool1"
    }
    layers {
      name: "conv2"
      type: CONVOLUTION
      bottom: "pool1"
      top: "conv2"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu2"
      type: RELU
      bottom: "conv2"
      top: "conv2"
    }
    layers {
      name: "pool2"
      type: POOLING
      bottom: "conv2"
      top: "pool2"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "conv3"
      type: CONVOLUTION
      bottom: "pool2"
      top: "conv3"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 64
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu3"
      type: RELU
      bottom: "conv3"
      top: "conv3"
    }
    layers {
      name: "pool3"
      type: POOLING
      bottom: "conv3"
      top: "pool3"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "ip1"
      type: INNER_PRODUCT
      bottom: "pool3"
      top: "ip1"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 64
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "ip2"
      type: INNER_PRODUCT
      bottom: "ip1"
      top: "ip2"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "ip2"
      bottom: "label"
    }

and the net2.prototxt is:

    name: "CIFAR10_quick_train"
    layers {
      name: "cifar"
      type: DATA
      top: "data"
      top: "label"
      data_param {
        source: "cifar10-leveldb/cifar-train-leveldb"
        mean_file: "mean.binaryproto"
        batch_size: 100
      }
    }
    layers {
      name: "conv1"
      type: CONVOLUTION
      bottom: "data"
      top: "conv1"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "conv11"
      type: CONVOLUTION
      bottom: "data"
      top: "conv11"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "conv12"
      type: CONVOLUTION
      bottom: "data"
      top: "conv12"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "pool1"
      type: POOLING
      bottom: "conv1"
      top: "pool1"
      pooling_param {
        pool: MAX
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "relu1"
      type: RELU
      bottom: "pool1"
      top: "pool1"
    }
    layers {
      name: "conv2"
      type: CONVOLUTION
      bottom: "pool1"
      top: "conv2"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu2"
      type: RELU
      bottom: "conv2"
      top: "conv2"
    }
    layers {
      name: "pool2"
      type: POOLING
      bottom: "conv2"
      top: "pool2"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "conv3"
      type: CONVOLUTION
      bottom: "pool2"
      top: "conv3"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 64
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu3"
      type: RELU
      bottom: "conv3"
      top: "conv3"
    }
    layers {
      name: "pool3"
      type: POOLING
      bottom: "conv3"
      top: "pool3"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "ip1"
      type: INNER_PRODUCT
      bottom: "pool3"
      top: "ip1"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 64
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "ip2"
      type: INNER_PRODUCT
      bottom: "ip1"
      top: "ip2"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "ip2"
      bottom: "label"
    }

The attached images net1.png and net2.png visualize the two network graphs. [figures omitted]

The output of ./test_net_init net1.prototxt is:

    params before InsertSplits():
    0: cifar
    1: conv1
    2: pool1
    3: relu1
    4: conv2
    5: relu2
    6: pool2
    7: conv3
    8: relu3
    9: pool3
    10: ip1
    11: ip2
    12: loss

    params after InsertSplits():
    0: cifar
    1: conv1
    2: pool1
    3: relu1
    4: conv2
    5: relu2
    6: pool2
    7: conv3
    8: relu3
    9: pool3
    10: ip1
    11: ip2
    12: loss

and the output of ./test_net_init net2.prototxt is:

    params before InsertSplits():
    0: cifar
    1: conv1
    2: conv11
    3: conv12
    4: pool1
    5: relu1
    6: conv2
    7: relu2
    8: pool2
    9: conv3
    10: relu3
    11: pool3
    12: ip1
    13: ip2
    14: loss

    params after InsertSplits():
    0: cifar
    1: data_cifar_0_split
    2: conv1
    3: conv11
    4: conv12
    5: pool1
    6: relu1
    7: conv2
    8: relu2
    9: pool2
    10: conv3
    11: relu3
    12: pool3
    13: ip1
    14: ip2
    15: loss
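For reference, the generated data_cifar_0_split layer in param2 should look roughly like the following (a sketch based on the comment and naming convention in insert_splits.cpp; I have not pasted the exact generated prototxt):

    layers {
      name: "data_cifar_0_split"
      type: SPLIT
      bottom: "data"
      top: "data_cifar_0_split_0"
      top: "data_cifar_0_split_1"
      top: "data_cifar_0_split_2"
    }

The bottoms of conv1, conv11, and conv12 are then rewired to data_cifar_0_split_0, _1, and _2 respectively, so each consumer of data reads from its own top blob.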

@lim0606 (Author) commented Jul 23, 2014

Thank you so much!! :) It helps me a lot!
I may need more time to understand its behavior, since InsertSplits() inserts only one split layer (with multiple tops) even when an input blob is used as a bottom three or more times.
If I understand the issue further, I will update here.

@shelhamer (Member)

The SplitLayer is for DAG models: it allows a layer output / top to be used as the input / bottom of several following layers. This lets the model branch without any layer other than SplitLayer needing to know anything about it.
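For example, a fork like the one below is written with no explicit SPLIT layer; InsertSplits() adds one (named pool1_pool1_0_split under the naming scheme above, if I recall correctly) during Net::Init(), so pool1 can feed both branches (a minimal sketch; the branch layers here are hypothetical):

    # pool1 is consumed by two layers; a split layer is inserted automatically.
    layers {
      name: "branch_a"
      type: CONVOLUTION
      bottom: "pool1"
      top: "branch_a"
      convolution_param { num_output: 32 kernel_size: 3 }
    }
    layers {
      name: "branch_b"
      type: CONVOLUTION
      bottom: "pool1"
      top: "branch_b"
      convolution_param { num_output: 32 kernel_size: 3 }
    }

The reason a dedicated layer is needed at all is the backward pass: SplitLayer accumulates the diffs from all of its tops into its single bottom blob, so the producing layer never has to know how many consumers it has.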

Please continue the discussion on the caffe-users mailing list. As of the latest release we prefer to keep issues reserved for Caffe development. Thanks!

@shelhamer changed the title from "I have a question about the purpose of InsertSplit class." to "What does InsertSplit() / SplitLayer do?" Aug 12, 2014
@gmlyytt-YANG

That's right. It creates one additional split layer, with as many top blobs as there are layers consuming the shared bottom blob. @lim0606
