
How to Finetune Individual Layers and Change Model Structure #186

Closed
ghost opened this issue Mar 4, 2014 · 10 comments

@ghost

ghost commented Mar 4, 2014

Hi

I would like to use the pretrained model as a reference, plug in my own cost function at the end of the sixth layer, and update layer six alone (using gradient descent).

Right now, I am at a point where I can extract features successfully from any layer.
I dug into the C++ code and found that a Forward_cpu and a Backward_cpu is written for each layer. But to the best of my knowledge, these only compute the gradients; the ComputeUpdateValue and Update functions update all the layers.

To apply updates to the sixth layer alone, or generally to any single layer, is it possible to reuse these functions, or do they have to be rewritten?

Any suggestions/help much appreciated, Thanks

@Yangqing
Member

Yangqing commented Mar 4, 2014

I assume that setting the learning rate to zero for all layers you do not want to update will suffice in this case.

@shelhamer
Member

Right, to expand on Yangqing's comment: you can zero the per-layer learning rates for the layers below the sixth that you do not want to train. See the imagenet example prototxt for setting the layer learning rates.
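
For illustration, a minimal prototxt sketch of a frozen convolution layer. This uses the current lr_mult syntax; the imagenet example prototxt from this era expressed the same idea with per-blob blobs_lr fields, and the layer parameters here are only placeholders.

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 }  # freeze the weights
  param { lr_mult: 0 }  # freeze the bias
  convolution_param { num_output: 96 kernel_size: 11 stride: 4 }
}

Parameters with a zero learning-rate multiplier keep their pretrained values, while layers with nonzero multipliers continue to train.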

@ghost
Author

ghost commented Mar 8, 2014

Hi Yangqing and Shelhamer

Thank you so much for the suggestions.

I took a look at the imagenet.prototxt and imagenet_solver.prototxt files. But my issue is that:

  1. I need to start from the pretrained model. While continuing from the pretrained model, is it possible to alter the learning rates?
  2. Even if it is possible, the learned model has 8 layers, including the last fully connected and Softmax layers, which I don't want. So I guess I will have to modify the Backward_cpu, ComputeUpdateValue and Update functions. Am I right?
  3. My cost function is written in Python. If I am right, the calculated gradient has to be written into the layer six diff variable. Is it really possible to copy the gradients calculated by my Python function into that variable?

Thanks once again

@shelhamer
Member

For 1: What you're describing is finetuning, which is a key task in Caffe. It's as simple as

finetune_net model_solver.prototxt pretrained_model_weights

once you have edited the model definitions as described in the slides. You can absolutely start from the learned model; that's the point.

For 2: No, a nice point about Caffe is that you define and adjust models without having to code. Just edit the prototxt definition to remove the layers you do not want. They will be ignored in creating the new model by finetuning. As a simple example, try editing imagenet.prototxt to remove the softmax layer; you'll see the output is then the raw scores from the fully-connected innerproduct layer.

For 3: To use all the Caffe training machinery, you should write your own loss layer in C++ and define your model with it. Then you can simply call train_net. However, it is possible to manually train a network through Python by calling Forward and Backward and setting the inputs and diffs manually. This isn't yet documented or really recommended, however, so you are on your own for that approach. All I can suggest is to read pycaffe.cpp with particular attention to CaffeBlob and get_diff(). Note that blobs, params, and diffs retrieved in this way are assignable.
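
For anyone attempting the manual route today, here is a rough sketch using the current Python interface rather than the 2014 pycaffe API discussed above; my_loss_gradient is a hypothetical user-supplied function and the model file names are placeholders.

import caffe

def my_loss_gradient(act):
    # Hypothetical placeholder: gradient of 0.5 * ||act||^2 w.r.t. act.
    return act.copy()

net = caffe.Net('deploy.prototxt', 'pretrained.caffemodel', caffe.TEST)

# Forward pass to obtain the activations that feed the custom loss.
net.forward()
fc6_out = net.blobs['fc6'].data

# Compute d(loss)/d(fc6) outside Caffe; same shape as fc6_out.
grad = my_loss_gradient(fc6_out)

# Diffs are assignable: write the gradient into the fc6 blob and
# backpropagate starting from that layer.
net.blobs['fc6'].diff[...] = grad
net.backward(start='fc6')

# Apply a plain SGD step to the fc6 parameters only.
lr = 0.001
for param in net.params['fc6']:
    param.data[...] -= lr * param.diff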

Good luck!

@ghost
Author

ghost commented Mar 10, 2014

Hi Shelhamer, Thanks a lot for all the info and support. This is amazing. Thanks once again.

@wendlerc

A general question about finetuning:
Do layers with unchanged names also get affected by the finetuning process? In the code I observed that the unchanged layers get copied into the new network, and afterwards Solver.solve() gets called. Therefore I assume all layers are affected during finetuning, but only the renamed ones start with random weight initialization. Is that correct?

I am asking since I did a small experiment where I visualized the filters of the first convolutional layer of my modified version of imagenet after finetuning, and they looked exactly like the picture in the tutorial (http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb).

@avishayzanbar

Hi,
I am a new user of Caffe. I need to get the vectors after the fully connected layers (in my case, after the fc6, fc7 and fc8 layers) in C++.
In Python, for example, this is done with: net.blobs['fc7'].data
Unfortunately, I couldn't find the answer so far.
I would very much appreciate it if anyone could help me with that or refer me to a relevant answer.
Thank you very much.

@wendlerc

wendlerc commented Apr 3, 2016

Judging from the header files:
https://github.com/BVLC/caffe/blob/master/include/caffe/net.hpp
https://github.com/BVLC/caffe/blob/master/include/caffe/blob.hpp
it should be something like: net.blob_by_name("fc7")->data()
However, I did not try it.

@avishayzanbar

Thank you very much!
It works with the given classification.cpp file using the following lines:
const boost::shared_ptr<Blob<float> >& test = net_->blob_by_name("fc7");
const float* test_out = test->cpu_data();

@ProGamerGov

ProGamerGov commented Mar 3, 2018

A nice point about Caffe is that you define and adjust models without having to code. Just edit the prototxt definition to remove the layers you do not want. They will be ignored in creating the new model by finetuning.

@shelhamer To ensure that nothing changes in the model, would using base_lr: 0.000000 and max_iter: 1 in the solver.prototxt do the trick?
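
For reference, the solver settings in question would look roughly like this; whether a zero base learning rate plus a single iteration truly leaves every parameter untouched is exactly what is being asked, and the net and snapshot fields are placeholders.

net: "train_val.prototxt"
base_lr: 0.0
lr_policy: "fixed"
max_iter: 1
snapshot: 1
snapshot_prefix: "finetuned_unchanged"
solver_mode: GPU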

This issue was closed.