Consolidate network definitions #57

Closed
shelhamer opened this issue Jan 26, 2014 · 13 comments

@shelhamer
Member

Right now, a model typically has three separate CaffeNet definitions for training, validation, and deployment (imagenet.prototxt, imagenet_val.prototxt, and imagenet_deploy.prototxt, respectively, in the ImageNet example). These protobufs are full of redundancy, and tweaking a network requires a lot of copy-and-paste.

Could a single, unified protocol buffer describe the inputs/outputs for all of these cases together?

@kloudkl
Contributor

kloudkl commented Jan 26, 2014

In principle, it is. Cuda-convnet uses one config file for each model.

diff imagenet.prototxt imagenet_deploy.prototxt
< name: "CaffeNet"
< layers {
<   layer {
<     name: "data"
<     type: "data"
<     source: "/home/jiayq/Data/ILSVRC12/train-leveldb"
<     meanfile: "/home/jiayq/Data/ILSVRC12/image_mean.binaryproto"
<     batchsize: 256
<     cropsize: 227
<     mirror: true
<   }
<   top: "data"
<   top: "label"
< }
---
> input: "data"
> input_dim: 10
> input_dim: 3
> input_dim: 227
> input_dim: 227
359,360c350,351
<     name: "loss"
<     type: "softmax_loss"
---
>     name: "prob"
>     type: "softmax"
363c354
<   bottom: "label"
---
>   top: "prob"

softmax_loss may need to be split into a softmax layer plus an independent loss layer.
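
A rough prototxt sketch of what such a split could look like; the separate loss layer type shown here (multinomial_logistic_loss) is an assumption for illustration:

layers {
  layer {
    # softmax only: turns fc8 scores into probabilities
    name: "prob"
    type: "softmax"
  }
  bottom: "fc8"
  top: "prob"
}
layers {
  layer {
    # independent loss layer consuming the probabilities and the labels
    name: "loss"
    type: "multinomial_logistic_loss"
  }
  bottom: "prob"
  bottom: "label"
}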

diff imagenet.prototxt imagenet_val.prototxt 
6c6
<     source: "/home/jiayq/Data/ILSVRC12/train-leveldb"
---
>     source: "/home/jiayq/Data/ILSVRC12/val-leveldb"
8c8
<     batchsize: 256
---
>     batchsize: 50
10c10
<     mirror: true
---
>     mirror: false
22,33d21
<     weight_filler {
<       type: "gaussian"
<       std: 0.01
<     }
<     bias_filler {
<       type: "constant"
<       value: 0.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
84,95d71
<     weight_filler {
<       type: "gaussian"
<       std: 0.01
<     }
<     bias_filler {
<       type: "constant"
<       value: 1.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
145,156d120
<     weight_filler {
<       type: "gaussian"
<       std: 0.01
<     }
<     bias_filler {
<       type: "constant"
<       value: 0.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
185,196d148
<     weight_filler {
<       type: "gaussian"
<       std: 0.01
<     }
<     bias_filler {
<       type: "constant"
<       value: 1.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
225,236d176
<     weight_filler {
<       type: "gaussian"
<       std: 0.01
<     }
<     bias_filler {
<       type: "constant"
<       value: 1.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
265,276d204
<     weight_filler {
<       type: "gaussian"
<       std: 0.005
<     }
<     bias_filler {
<       type: "constant"
<       value: 1.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
303,314d230
<     weight_filler {
<       type: "gaussian"
<       std: 0.005
<     }
<     bias_filler {
<       type: "constant"
<       value: 1.
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
341,352d256
<     weight_filler {
<       type: "gaussian"
<       std: 0.01
<     }
<     bias_filler {
<       type: "constant"
<       value: 0
<     }
<     blobs_lr: 1.
<     blobs_lr: 2.
<     weight_decay: 1.
<     weight_decay: 0.
359,360c263,264
<     name: "loss"
<     type: "softmax_loss"
---
>     name: "prob"
>     type: "softmax"
362a267,274
>   top: "prob"
> }
> layers {
>   layer {
>     name: "accuracy"
>     type: "accuracy"
>   }
>   bottom: "prob"
363a276
>   top: "accuracy"

imagenet_val.prototxt has the same layers as imagenet.prototxt but does not use the optimization parameters blobs_lr, weight_decay, weight_filler, and bias_filler. It is fine for the test phase to ignore these fields.

The fields source, batchsize, and mirror have conflicting values. We could simply add a train_ or test_ prefix to each of them, as sketched below.

The softmax_loss vs. softmax issue already appeared in the output of "diff imagenet.prototxt imagenet_deploy.prototxt".
imagenet_val.prototxt also has an extra accuracy layer. It would be necessary to add a field indicating that specific layers are used only in one or two of the training, testing, and deployment stages.
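
A hypothetical illustration of the prefix idea; the train_/test_ field names below do not exist in the current proto and are made up for discussion:

layers {
  layer {
    name: "data"
    type: "data"
    # stage-prefixed fields resolve the conflicting train/test values
    train_source: "/home/jiayq/Data/ILSVRC12/train-leveldb"
    test_source: "/home/jiayq/Data/ILSVRC12/val-leveldb"
    meanfile: "/home/jiayq/Data/ILSVRC12/image_mean.binaryproto"
    train_batchsize: 256
    test_batchsize: 50
    cropsize: 227
    train_mirror: true
    test_mirror: false
  }
  top: "data"
  top: "label"
}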

@sguada
Contributor

sguada commented Jan 27, 2014

Although it would be possible to define one prototxt for both, I think the current separation allows more flexibility and keeps the code that reads and processes the network definitions simple. That said, a network verification step (making sure the training and test definitions are compatible) would be nice. If there were a joint prototxt, the code would need to interpret it differently for each case; the deploy case in particular would become more difficult to handle.

@Yangqing what do you think about this?

@Yangqing
Member

I actually like the consolidation idea - having a way to consolidate multiple protobuf files would let us reduce redundancy, since many of them are actually pretty similar. That being said, I don't have a good idea of how to do this within the scope of protobuf (they do not allow, e.g., anything like #include). Any suggestions are welcome.

@jeffdonahue
Contributor

One idea could be to add to either the LayerConnection or LayerParameter proto (not sure which would be more natural) a field "repeated string phase" (or maybe an enum). If empty, the layer is used in all phases; if specified, the layer is ignored in all phases except the specified ones. Then in imagenet.prototxt we would specify, for example, two data layers with different phases:

layers {
  layer {
    name: "data"
    type: "data"
    source: "/home/jiayq/Data/ILSVRC12/train-leveldb"
    meanfile: "/home/jiayq/Data/ILSVRC12/image_mean.binaryproto"
    batchsize: 256
    cropsize: 227
    mirror: true
  }
  top: "data"
  top: "label"
  phase: "train"
}
layers {
  layer {
    name: "data"
    type: "data"
    source: "/home/jiayq/Data/ILSVRC12/val-leveldb"
    meanfile: "/home/jiayq/Data/ILSVRC12/image_mean.binaryproto"
    batchsize: 50
    cropsize: 227
    mirror: false
  }
  top: "data"
  top: "label"
  phase: "val"
}

...

layers {
  layer {
    name: "loss"
    type: "softmax_loss"
  }
  bottom: "fc8"
  bottom: "label"
  phase: "train"
}
layers {
  layer {
    name: "prob"
    type: "softmax"
  }
  bottom: "fc8"
  top: "prob"
  phase: "val"
  phase: "deploy"
}
layers {
  layer {
    name: "accuracy"
    type: "accuracy"
  }
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  phase: "val"
}
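
For concreteness, a minimal sketch of how the field might be declared in caffe.proto, assuming it is added to the LayerConnection message; the existing field numbers are from memory and purely illustrative:

// Sketch only - not the actual caffe.proto.
message LayerConnection {
  optional LayerParameter layer = 1;  // parameters of this layer
  repeated string bottom = 2;         // input blob names
  repeated string top = 3;            // output blob names
  // New field: phases in which this layer is active ("train", "val",
  // "deploy"); an empty list means the layer is used in every phase.
  repeated string phase = 4;
}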

@shelhamer
Member Author

I like @jeffdonahue's proposal a lot. It's concise, the meaning is clear, and I don't think it would complicate net construction much. I could try a PR next week.

@sguada
Contributor

sguada commented Jan 29, 2014

Another possibility would be to separate the network architecture from the training and testing parameters/layers and then do an explicit include and merge, which would read the file named in the include_net field and merge the protobufs together.

# network.prototxt
layers {
  layer {
    name: "data"
    type: "data"
    meanfile: "/home/jiayq/ilsvrc2012_mean.binaryproto"
  }
  top: "data"
  top: "label"
}
layers {
  layer {
    name: "conv1"
    type: "conv"
    num_output: 96
    kernelsize: 11
    stride: 4
  }
  bottom: "data"
  top: "conv1"
}
layers {
  layer {
    name: "relu1"
    type: "relu"
  }
  bottom: "conv1"
  top: "conv1"
}
layers {
  layer {
    name: "pool1"
    type: "pool"
    pool: MAX
    kernelsize: 3
    stride: 2
  }
  bottom: "conv1"
  top: "pool1"
}
layers {
  layer {
    name: "norm1"
    type: "lrn"
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
  bottom: "pool1"
  top: "norm1"
}
...
layers {
  layer {
    name: "fc8"
    type: "innerproduct"
    num_output: 1000
  }
  bottom: "fc7"
  top: "fc8"
}

Then define the network_train.prototxt as:

# network_train.prototxt
include_net: "network.prototxt"
layers {
  layer {
    name: "data"
    source: "/home/jiayq/caffe-train-leveldb/"
    batchsize: 256
    cropsize: 227
    mirror: true
  }
}
layers {
  layer {
    name: "conv1"
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.
    }
    blobs_lr: 1.
    blobs_lr: 2.
    weight_decay: 1.
    weight_decay: 0.
  }
}
layers {
  layer {
    name: "conv2"
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1.
    }
    blobs_lr: 1.
    blobs_lr: 2.
    weight_decay: 1.
    weight_decay: 0.
  }
 }
...
layers {
  layer {
    name: "loss"
    type: "softmax_loss"
  }
  bottom: "fc8"
  bottom: "label"
}

And define the network_test.prototxt as:

# network_test.prototxt
include_net: "network.prototxt"
layers {
  layer {
    name: "data"
    source: "/home/jiayq/caffe-val-leveldb/"
    batchsize: 50
    cropsize: 227
    mirror: false
  }
}
layers {
  layer {
    name: "prob"
    type: "softmax"
  }
  bottom: "fc8"
  top: "prob"
}
layers {
  layer {
    name: "accuracy"
    type: "accuracy"
  }
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
}
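
Under this scheme the loader would merge the included network.prototxt with the stage-specific overrides. As a sketch, assuming layers are matched by name and singular fields in the override replace the base values, the merged data layer for training would look like:

layers {
  layer {
    name: "data"
    type: "data"
    # from network.prototxt
    meanfile: "/home/jiayq/ilsvrc2012_mean.binaryproto"
    # from network_train.prototxt
    source: "/home/jiayq/caffe-train-leveldb/"
    batchsize: 256
    cropsize: 227
    mirror: true
  }
  top: "data"
  top: "label"
}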

Additionally, we could define a default layer that contains a set of default values for a certain kind of layer; for instance, it could be used to set blobs_lr and weight_decay.

In this case a more compact definition of network_train.prototxt would be:

# network_train.prototxt
include_net: "network.prototxt"
layers {
  layer {
    name: "data"
    source: "/home/jiayq/caffe-train-leveldb/"
    batchsize: 256
    cropsize: 227
    mirror: true
  }
}
layers {
  layer {
    name: "default"
    type: "conv"
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.
    }
    blobs_lr: 1.
    blobs_lr: 2.
    weight_decay: 1.
    weight_decay: 0.
  }
}
layers {
  layer {
    name: "conv2"
    bias_filler {
       value: 1.
    }
  }
}
layers {
  layer {
    name: "conv4"
    bias_filler {
       value: 1.
    }
  }
}
layers {
  layer {
    name: "conv5"
    bias_filler {
       value: 1.
    }
  }
}
layers {
  layer {
    name: "fc6"
    weight_filler {
      std: 0.005
    }
    bias_filler {
       value: 1.
    }
  }
}
layers {
  layer {
    name: "loss"
    type: "softmax_loss"
  }
  bottom: "fc8"
  bottom: "label"
}

@shelhamer
Member Author

To me, part of the point of consolidation is to have a single definition file, as in @jeffdonahue's proposal. I want as little redundancy as possible. Includes also seem like more difficult logic to implement with protobuf. I am going to hack on a single-file definition with phases.

@sguada
Contributor

sguada commented Jan 29, 2014

@shelhamer you could still put what I described in one file, with one part that defines the architecture, another that defines the things specific to the training phase, and another that defines the things specific to the test phase.

I'm looking forward to seeing your proposal.

@sguada
Contributor

sguada commented Feb 5, 2014

We should probably use [packed=true] for all repeated fields with basic numeric types:
https://developers.google.com/protocol-buffers/docs/encoding#optional

There is actually a way to import other proto definitions; we may want to consider this:
https://developers.google.com/protocol-buffers/docs/proto#other

There is also a simple way to merge messages, which we could use to merge partial definitions of networks:
https://developers.google.com/protocol-buffers/docs/encoding#optional
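
A small sketch of the first two points, with hypothetical file names and simplified message contents (the import and [packed=true] syntax is standard proto2):

// common.proto (hypothetical file): shared message definitions.
message FillerParameter {
  optional string type = 1;
  optional float std = 2;
}

// caffe.proto (sketch): pulls in the shared definitions and packs a
// repeated numeric field so it is encoded more compactly on the wire.
import "common.proto";

message LayerParameter {
  optional string name = 1;
  repeated float blobs_lr = 2 [packed = true];
}

As for merging, the standard protobuf behavior when the same message is parsed or merged twice is that singular fields from the later source overwrite the earlier values and repeated fields are concatenated, which is roughly what merging partial network definitions would rely on.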

@shelhamer shelhamer self-assigned this Feb 10, 2014
@mavenlin
Contributor

Even though the definitions are consolidated, in the current implementation two networks are initialized in the code, one for training and one for testing, and the test net copies its parameters from the train net.

I think they should be consolidated as well. With the help of a split layer, both the softmax_loss and accuracy layers can be put in the same model. In that case the phase parameter would not be used at model initialization but in the forward and backward functions, to decide whether a computation is needed. This would save the extra memory used during the test phase.
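
A rough prototxt sketch of the idea; the explicit "split" layer and the split blob names below are illustrative:

layers {
  layer {
    # duplicate fc8 so both the loss and accuracy paths can consume it
    name: "fc8_split"
    type: "split"
  }
  bottom: "fc8"
  top: "fc8_split_0"
  top: "fc8_split_1"
}
layers {
  layer {
    # training loss, only needed in the train phase
    name: "loss"
    type: "softmax_loss"
  }
  bottom: "fc8_split_0"
  bottom: "label"
}
layers {
  layer {
    # probabilities for evaluation and deployment
    name: "prob"
    type: "softmax"
  }
  bottom: "fc8_split_1"
  top: "prob"
}
layers {
  layer {
    # test-time accuracy
    name: "accuracy"
    type: "accuracy"
  }
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
}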

@shelhamer shelhamer assigned shelhamer and unassigned shelhamer Feb 16, 2014
@kloudkl
Contributor

kloudkl commented Feb 17, 2014

@mavenlin, if your proposal were implemented, it would also speed up Solver::Test() by eliminating the time spent on the memory copy. Why don't you create an issue?

  CHECK_NOTNULL(test_net_.get())->CopyTrainedLayersFrom(net_param);

@kloudkl
Contributor

kloudkl commented Feb 21, 2014

Now that the amazing SplitLayer #129 has been merged into the dev branch, is there anyone working on a solution based on it?

@shelhamer
Member Author

To be resolved by #734.

naibaf7 added a commit that referenced this issue Feb 7, 2017
OpenCl kernel compilation errors for android #51