What does InsertSplit() / SplitLayer do? #767

Closed
lim0606 opened this issue Jul 23, 2014 · 6 comments

@lim0606 commented Jul 23, 2014

Hi! I'm learning Caffe to develop some new functionality in it, including new solvers.
However, I'm stuck on understanding the initialization of the Net class (the network architecture), especially the purpose of InsertSplits().
I've tried to understand its role in Net::Init(), but I still have no clue. Can anyone give some advice on what it is for?
That would be a great help! Thanks :)

@research2010

Hi, could you run a test without the InsertSplits(in_param, &param); call in the Net::Init() function?

@lim0606 (Author) commented Jul 23, 2014

I'm testing it in accordance with your advice.
I commented out InsertSplits(in_param, &param); and added the following: param.CopyFrom(in_param);
~~However, it seems to make no difference in the LeNet case (for MNIST data).
I think LeNet's network architecture is not complex enough for InsertSplits to do anything meaningful.~~ There was no difference in the LeNet case (for MNIST data, supervised). In the case of the MNIST autoencoder example, it gives an error, `Unknown blob input data to layer0`, from net.cpp:88.

@research2010

In lines 52-53 of the file caffe/util/insert_splits.cpp, there is a comment as follows:

     // Create split layer for any input blobs used by other layers as bottom
     // blobs more than once.

I think that's the reason. I wrote the following test code to check the difference. Because LeNet's network doesn't need any split layers, there was no difference in your test above.
Maybe the split layer also serves other purposes.

The following file test_net_init.cpp is the testing code:

    #include <cstring>
    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include <utility>
    #include <vector>

    #include "caffe/common.hpp"
    #include "caffe/net.hpp"
    #include "caffe/util/insert_splits.hpp"
    #include "caffe/util/upgrade_proto.hpp"

    using namespace caffe;

    int main(int argc, char **argv)
    {
        if (argc != 2)
        {
            printf("usage: ./test_net_init net_proto_file\n");
            return 0;
        }

        string proto_filename = string(argv[1]);

        // Parse the net definition from the prototxt file.
        NetParameter param1;
        caffe::ReadNetParamsFromTextFileOrDie(proto_filename, &param1);

        printf("params before InsertSplits():\n");
        for (int layer_id = 0; layer_id < param1.layers_size(); layer_id++)
        {
            const LayerParameter& layer_param = param1.layers(layer_id);
            printf("%d: %s\n", layer_id, layer_param.name().c_str());
        }

        // Run InsertSplits: it copies param1 into param2, inserting a split
        // layer wherever a blob is used as a bottom by more than one layer.
        NetParameter param2;
        caffe::InsertSplits(param1, &param2);

        printf("\nparams after InsertSplits():\n");
        for (int layer_id = 0; layer_id < param2.layers_size(); layer_id++)
        {
            const LayerParameter& layer_param = param2.layers(layer_id);
            printf("%d: %s\n", layer_id, layer_param.name().c_str());
        }
        return 0;
    }

The Makefile is:

    CC=g++

    all:
        $(CC) -I../include -I/usr/local/cuda/include \
        test_net_init.cpp \
        -L/usr/local/lib/ -lprotobuf \
        -L/usr/local/cuda/lib64 -lcudart -lcublas -lcurand \
        -L../build/lib/ -lcaffe \
        -o test_net_init

The net1.prototxt is:

    name: "CIFAR10_quick_train"
    layers {
      name: "cifar"
      type: DATA
      top: "data"
      top: "label"
      data_param {
        source: "cifar10-leveldb/cifar-train-leveldb"
        mean_file: "mean.binaryproto"
        batch_size: 100
      }
    }
    layers {
      name: "conv1"
      type: CONVOLUTION
      bottom: "data"
      top: "conv1"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "pool1"
      type: POOLING
      bottom: "conv1"
      top: "pool1"
      pooling_param {
        pool: MAX
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "relu1"
      type: RELU
      bottom: "pool1"
      top: "pool1"
    }
    layers {
      name: "conv2"
      type: CONVOLUTION
      bottom: "pool1"
      top: "conv2"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu2"
      type: RELU
      bottom: "conv2"
      top: "conv2"
    }
    layers {
      name: "pool2"
      type: POOLING
      bottom: "conv2"
      top: "pool2"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "conv3"
      type: CONVOLUTION
      bottom: "pool2"
      top: "conv3"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 64
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu3"
      type: RELU
      bottom: "conv3"
      top: "conv3"
    }
    layers {
      name: "pool3"
      type: POOLING
      bottom: "conv3"
      top: "pool3"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "ip1"
      type: INNER_PRODUCT
      bottom: "pool3"
      top: "ip1"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 64
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "ip2"
      type: INNER_PRODUCT
      bottom: "ip1"
      top: "ip2"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "ip2"
      bottom: "label"
    }

and the net2.prototxt is:

    name: "CIFAR10_quick_train"
    layers {
      name: "cifar"
      type: DATA
      top: "data"
      top: "label"
      data_param {
        source: "cifar10-leveldb/cifar-train-leveldb"
        mean_file: "mean.binaryproto"
        batch_size: 100
      }
    }
    layers {
      name: "conv1"
      type: CONVOLUTION
      bottom: "data"
      top: "conv1"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "conv11"
      type: CONVOLUTION
      bottom: "data"
      top: "conv11"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "conv12"
      type: CONVOLUTION
      bottom: "data"
      top: "conv12"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.0001
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "pool1"
      type: POOLING
      bottom: "conv1"
      top: "pool1"
      pooling_param {
        pool: MAX
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "relu1"
      type: RELU
      bottom: "pool1"
      top: "pool1"
    }
    layers {
      name: "conv2"
      type: CONVOLUTION
      bottom: "pool1"
      top: "conv2"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 32
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu2"
      type: RELU
      bottom: "conv2"
      top: "conv2"
    }
    layers {
      name: "pool2"
      type: POOLING
      bottom: "conv2"
      top: "pool2"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "conv3"
      type: CONVOLUTION
      bottom: "pool2"
      top: "conv3"
      blobs_lr: 1
      blobs_lr: 2
      convolution_param {
        num_output: 64
        pad: 2
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "gaussian"
          std: 0.01
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "relu3"
      type: RELU
      bottom: "conv3"
      top: "conv3"
    }
    layers {
      name: "pool3"
      type: POOLING
      bottom: "conv3"
      top: "pool3"
      pooling_param {
        pool: AVE
        kernel_size: 3
        stride: 2
      }
    }
    layers {
      name: "ip1"
      type: INNER_PRODUCT
      bottom: "pool3"
      top: "ip1"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 64
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "ip2"
      type: INNER_PRODUCT
      bottom: "ip1"
      top: "ip2"
      blobs_lr: 1
      blobs_lr: 2
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "gaussian"
          std: 0.1
        }
        bias_filler {
          type: "constant"
        }
      }
    }
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "ip2"
      bottom: "label"
    }

The attached images net1.png and net2.png visualize the two network graphs. [figures omitted]

The output of ./test_net_init net1.prototxt is:

    params before InsertSplits():
    0: cifar
    1: conv1
    2: pool1
    3: relu1
    4: conv2
    5: relu2
    6: pool2
    7: conv3
    8: relu3
    9: pool3
    10: ip1
    11: ip2
    12: loss

    params after InsertSplits():
    0: cifar
    1: conv1
    2: pool1
    3: relu1
    4: conv2
    5: relu2
    6: pool2
    7: conv3
    8: relu3
    9: pool3
    10: ip1
    11: ip2
    12: loss

and the output of ./test_net_init net2.prototxt is:

    params before InsertSplits():
    0: cifar
    1: conv1
    2: conv11
    3: conv12
    4: pool1
    5: relu1
    6: conv2
    7: relu2
    8: pool2
    9: conv3
    10: relu3
    11: pool3
    12: ip1
    13: ip2
    14: loss

    params after InsertSplits():
    0: cifar
    1: data_cifar_0_split
    2: conv1
    3: conv11
    4: conv12
    5: pool1
    6: relu1
    7: conv2
    8: relu2
    9: pool2
    10: conv3
    11: relu3
    12: pool3
    13: ip1
    14: ip2
    15: loss
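For reference, the generated data_cifar_0_split layer in param2 should look roughly like the following (a sketch based on the comment and naming convention in insert_splits.cpp; I have not pasted the exact generated prototxt):

    layers {
      name: "data_cifar_0_split"
      type: SPLIT
      bottom: "data"
      top: "data_cifar_0_split_0"
      top: "data_cifar_0_split_1"
      top: "data_cifar_0_split_2"
    }

The bottoms of conv1, conv11, and conv12 are then rewired to data_cifar_0_split_0, _1, and _2 respectively, so each consumer of data reads from its own top blob.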

@lim0606 (Author) commented Jul 23, 2014

Thank you so much!! :) It helps me a lot!
I may need more time to understand its behavior, since InsertSplits() inserts only one split layer (with multiple tops) even when an input blob is used as a bottom three or more times.
If I understand the issue further, I will update here.

@shelhamer (Member)

The SplitLayer is for DAG models: it allows a layer output / top to be used as the input / bottom of several following layers. This lets the model branch without any layer other than SplitLayer needing to know anything about it.
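For example, a fork like the one below is written with no explicit SPLIT layer; InsertSplits() adds one (named pool1_pool1_0_split under the naming scheme above, if I recall correctly) during Net::Init(), so pool1 can feed both branches (a minimal sketch; the branch layers here are hypothetical):

    # pool1 is consumed by two layers; a split layer is inserted automatically.
    layers {
      name: "branch_a"
      type: CONVOLUTION
      bottom: "pool1"
      top: "branch_a"
      convolution_param { num_output: 32 kernel_size: 3 }
    }
    layers {
      name: "branch_b"
      type: CONVOLUTION
      bottom: "pool1"
      top: "branch_b"
      convolution_param { num_output: 32 kernel_size: 3 }
    }

The reason a dedicated layer is needed at all is the backward pass: SplitLayer accumulates the diffs from all of its tops into its single bottom blob, so the producing layer never has to know how many consumers it has.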

Please continue the discussion on the caffe-users mailing list. As of the latest release we prefer to keep issues reserved for Caffe development. Thanks!

@shelhamer changed the title from "I have a question about the purpose of InsertSplit class." to "What does InsertSplit() / SplitLayer do?" Aug 12, 2014
@gmlyytt-YANG

That's right. It creates one additional split layer, with as many top blobs as there are layers consuming the shared bottom blob. @lim0606
