
How to Finetune Individual Layers and Change Model Structure #186

Closed
ghost opened this issue Mar 4, 2014 · 10 comments

@ghost

ghost commented Mar 4, 2014

Hi

I would like to use the pretrained model as a reference, plug in my own cost function at the end of the sixth layer, and update layer six alone (using gradient descent).

Right now, I am at a point where I can extract features successfully from any layer.
I dug into the C++ code and found that a Forward_cpu and a Backward_cpu is written for each layer. But to the best of my knowledge, these only compute the gradients; the ComputeUpdateValue and Update functions update all the layers.

To apply updates to the sixth layer alone, or generally to any single layer, is it possible to reuse these functions, or do they have to be rewritten?

Any suggestions/help much appreciated, Thanks

@Yangqing
Member

Yangqing commented Mar 4, 2014

I assume that setting the learning rate to zero for all layers you do not want to update will suffice in this case.

@shelhamer
Member

Right, to expand on Yangqing's comment: you can zero the per-layer learning rates for the layers below the sixth that you do not want to train. See the imagenet example prototxt for setting the layer learning rates.
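
For illustration, a minimal prototxt sketch of a frozen convolution layer. This uses the current lr_mult syntax; the imagenet example prototxt from this era expressed the same idea with per-blob blobs_lr fields, and the layer parameters here are only placeholders.

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 }  # freeze the weights
  param { lr_mult: 0 }  # freeze the bias
  convolution_param { num_output: 96 kernel_size: 11 stride: 4 }
}

Parameters with a zero learning-rate multiplier keep their pretrained values, while layers with nonzero multipliers continue to train.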

@ghost
Author

ghost commented Mar 8, 2014

Hi Yangqing and Shelhamer

Thank you so much for the suggestions.

I took a look at the imagenet.prototxt and imagenet_solver.prototxt files. But my issue is that:

  1. I need to start from the pretrained model. While continuing from the pretrained model, is it possible to alter the learning rates?
  2. Even if it is possible, the learned model has 8 layers, including the last fully connected and Softmax layers, which I don't want. So I guess I will have to modify the Backward_cpu, ComputeUpdateValue and Update functions. Am I right?
  3. My cost function is written in Python. If I am right, the calculated gradient has to be written into the layer six diff variable. Is it really possible to copy the gradients calculated by my Python function into that variable?

Thanks once again

@shelhamer
Member

For 1: What you're describing is finetuning, which is a key task in Caffe. It's as simple as

finetune_net model_solver.prototxt pretrained_model_weights

once you have edited the model definitions as described in the slides. You can absolutely start from the learned model; that's the point.

For 2: No, a nice point about Caffe is that you define and adjust models without having to code. Just edit the prototxt definition to remove the layers you do not want. They will be ignored in creating the new model by finetuning. As a simple example, try editing imagenet.prototxt to remove the softmax layer; you'll see the output is then the raw scores from the fully-connected innerproduct layer.

For 3: To use all the Caffe training machinery, you should write your own loss layer in C++ and define your model with it. Then you can simply call train_net. However, it is possible to manually train a network through Python by calling Forward and Backward and setting the inputs and diffs manually. This isn't yet documented or really recommended, however, so you are on your own for that approach. All I can suggest is to read pycaffe.cpp with particular attention to CaffeBlob and get_diff(). Note that blobs, params, and diffs retrieved in this way are assignable.
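
For anyone attempting the manual route today, here is a rough sketch using the current Python interface rather than the 2014 pycaffe API discussed above; my_loss_gradient is a hypothetical user-supplied function and the model file names are placeholders.

import caffe

def my_loss_gradient(act):
    # Hypothetical placeholder: gradient of 0.5 * ||act||^2 w.r.t. act.
    return act.copy()

net = caffe.Net('deploy.prototxt', 'pretrained.caffemodel', caffe.TEST)

# Forward pass to obtain the activations that feed the custom loss.
net.forward()
fc6_out = net.blobs['fc6'].data

# Compute d(loss)/d(fc6) outside Caffe; same shape as fc6_out.
grad = my_loss_gradient(fc6_out)

# Diffs are assignable: write the gradient into the fc6 blob and
# backpropagate starting from that layer.
net.blobs['fc6'].diff[...] = grad
net.backward(start='fc6')

# Apply a plain SGD step to the fc6 parameters only.
lr = 0.001
for param in net.params['fc6']:
    param.data[...] -= lr * param.diff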

Good luck!

@ghost
Author

ghost commented Mar 10, 2014

Hi Shelhamer, Thanks a lot for all the info and support. This is amazing. Thanks once again.

@wendlerc

A general question about finetuning:
Do layers with unchanged names also get affected by the finetuning process? In the code I observed that the unchanged layers get copied into the new network, and afterwards Solver.solve() gets called. Therefore I assume all layers are affected during finetuning, but only the renamed ones start with random weight initialization. Is that correct?

I am asking since I did a small experiment where I visualized the filters of the first convolutional layer of my modified version of imagenet after finetuning, and they looked exactly like the picture in the tutorial (http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb).

@avishayzanbar

Hi,
I am a new user of Caffe. I need to get the vectors after the fully connected layers (in my case, after the fc6, fc7 and fc8 layers) in C++.
In Python, for example, this is done with: net.blobs['fc7'].data
Unfortunately, I couldn't find the answer so far.
I would very much appreciate it if anyone could help me with that or refer me to a relevant answer.
Thank you very much.

@wendlerc

wendlerc commented Apr 3, 2016

Judging from the header files:
https://github.com/BVLC/caffe/blob/master/include/caffe/net.hpp
https://github.com/BVLC/caffe/blob/master/include/caffe/blob.hpp
it should be something like: net.blob_by_name("fc7")->data()
However, I did not try it.

@avishayzanbar

Thank you very much!
It works with the given classification.cpp file using the following lines:
const boost::shared_ptr<Blob<float> >& test = net_->blob_by_name("fc7");
const float* test_out = test->cpu_data();

@ProGamerGov

ProGamerGov commented Mar 3, 2018

A nice point about Caffe is that you define and adjust models without having to code. Just edit the prototxt definition to remove the layers you do not want. They will be ignored in creating the new model by finetuning.

@shelhamer To ensure that nothing changes in the model, would using base_lr: 0.000000 and max_iter: 1 in the solver.prototxt do the trick?
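
For reference, the solver settings in question would look roughly like this; whether a zero base learning rate plus a single iteration truly leaves every parameter untouched is exactly what is being asked, and the net and snapshot fields are placeholders.

net: "train_val.prototxt"
base_lr: 0.0
lr_policy: "fixed"
max_iter: 1
snapshot: 1
snapshot_prefix: "finetuned_unchanged"
solver_mode: GPU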

This issue was closed.