
How to update layer parameters from python? #1855

Closed
timothy-shields opened this issue Feb 10, 2015 · 28 comments
@timothy-shields

I have a sequence of Caffe layers with no loss layer in a Caffe network. In my python code, I want to repeatedly take the following steps:

  1. Do a forward pass through the network.
  2. Compute my own losses and gradients based on the output data blob of the final layer of the network.
  3. Set the diff blob of the final layer of the network.
  4. Do a backward pass through the network, updating layer parameters.
  5. (Occasionally) Save the network to persist the learned parameters.

The problem I've encountered is that the backward pass (step 4) is not actually updating layer parameters. How can I modify my approach to get Caffe to update the layer parameters when I perform the backward pass?

(I'm aware of the SGDSolver but am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial, such that fitting them into that framework seems difficult.)
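Concretely, the loop I'm trying to write looks something like this; a rough sketch where the model path, the output blob name fc9, and my_loss_gradient are placeholders:

import numpy as np
import caffe

def my_loss_gradient(output):
    # placeholder: gradient of my custom loss w.r.t. the network output
    return np.ones_like(output)

net = caffe.Net('net.prototxt', caffe.TRAIN)  # placeholder model definition

for it in range(1000):
    out = net.forward()                    # 1. forward pass
    grad = my_loss_gradient(out['fc9'])    # 2. my own loss/gradient on the final output
    net.blobs['fc9'].diff[...] = grad      # 3. set the diff of the final blob
    net.backward()                         # 4. backward pass (where I expected the parameters to update)
    if it % 100 == 0:
        net.save('snapshot.caffemodel')    # 5. occasionally persist the parameters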

@timothy-shields (Author)

After further inspection, it appears that the Update method on the Net is what needs to be called in my Step 4. The problem is that this is not available on the Python Net object and is only callable via the SGDSolver. Would it be possible to expose the Update method on the Net directly, so that a custom solver in Python is made possible?

@shelhamer (Member)

> Do a backward pass through the network, updating layer parameters.

The backward pass computes the gradients, not the updates, which are left as the responsibility of the solver. However, the parameters are exposed on a Python caffe.Net as the net.params dictionary. These are mutable; you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them with net.params['layer_name'].data[...] += weight_update or the like.
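For instance, a bare-bones gradient step over every parameter blob could look like this, assuming net is your caffe.Net with freshly computed diffs; the learning rate and any schedule are up to you:

lr = 0.01  # your choice of learning rate
for name in net.params:
    for blob in net.params[name]:  # [0] is the weights, [1] the bias (when present)
        blob.data[...] -= lr * blob.diff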

> I'm aware of the SGDSolver but am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial

I have a feeling you'll like #1703: it improves the Python interface and lets nets incorporate layers developed in Python. See the Python layer example in #1020 (comment) (replaced by #1703) too. If you hack your loss as a Python layer, you can do the solving as usual.

Defining your loss as a Python layer is the simplest in my opinion. See for example the Euclidean loss as a Python layer.

We're working on merging and documenting the new pycaffe shortly!

@timothy-shields (Author)

Thank you for the quick support! The following code appears to be working for me, where the assumption is that I've already properly scaled my diff at the network output layer and done net.backward(...).

for layer in net.layers:
    for blob in layer.blobs:         # the layer's learnable parameter blobs
        blob.data[...] -= blob.diff  # plain gradient step (learning rate already folded into the diff)

@mczmy commented Feb 21, 2015

@timothy-shields I had problems with step 3. Could you please briefly explain how you achieved it? Thanks.

@timothy-shields (Author)

@mczmy If your output layer name is, for example, fc9, then you can set the output diff and do the backward pass (steps 3 and 4) as follows, where diff is a 4D numpy array of the appropriate shape.

net.backward(fc9=diff)

This will run backpropagation and populate the diffs throughout the network.
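As a sketch of the shape requirement and where the resulting gradients end up (net is your caffe.Net, diff your computed gradient):

# the diff passed to backward must match the output blob's shape
assert diff.shape == net.blobs['fc9'].data.shape
net.backward(fc9=diff)
# afterwards the gradients are available, e.g. for the fc9 weights:
w_grad = net.params['fc9'][0].diff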

@arunmallya

@timothy-shields That doesn't quite allow you to use momentum or weight decay, does it? Those are calculated in the solver's step() function. I assume you have to implement the ComputeUpdateValue() function in Python. Is that the case?

@timothy-shields (Author)

@arunmallya Yes, you need to implement your own solver if you do this.

@peerajak

Hi @timothy-shields,

I would like to do the same thing. Can you please show us how to do it?

@mahdaneh commented Oct 1, 2015

I want to compute the gradient of a new loss function with respect to the last fully connected layer. If I set the .diff of the last layer to my computed gradient, will the backward pass propagate my gradient through the network? In that case, how many iterations should the optimization (forward and backward) run? Should we control the number of iterations simply with a for loop in Python?
Any help is appreciated.

Thank you guys!

@timothy-shields (Author)

@peerajak @mahdaneh If you take this route, you need to implement the stochastic gradient descent solver yourself. Section 5 of ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al.) tells you how to do this. You will learn a lot by doing it yourself.
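For reference, the update rule from that section looks roughly like this in pycaffe terms; a sketch assuming net is a caffe.Net whose diffs were just filled by a backward pass, with the paper's hyperparameters:

import numpy as np

lr, momentum, weight_decay = 0.01, 0.9, 0.0005

# one momentum buffer per parameter blob, kept across iterations
v = {name: [np.zeros_like(b.data) for b in blobs]
     for name, blobs in net.params.items()}

# after each forward/backward pair:
for name, blobs in net.params.items():
    for i, b in enumerate(blobs):
        v[name][i] = momentum * v[name][i] - weight_decay * lr * b.data - lr * b.diff
        b.data[...] += v[name][i]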

@mahdaneh commented Oct 1, 2015

Thank you for your answer, but I don't want to reimplement SGD myself. @shelhamer said in the comment above: "you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them with net.params['layer_name'].data[...] += weight_update or the like". What I would like to do is set the diff of the parameters of the last inner product layer to the computed gradient of the loss function with respect to those parameters, and then run the backpropagation algorithm by calling net.backward() based on this. Can't I do that?

@timothy-shields (Author)

@mahdaneh That is of course what you would do. net.backward(...) will set the diff blobs throughout the network, but then you need to actually apply those diffs to update the parameters. I don't believe you'll be able to use Caffe as the SGD solver here, so you'll have to implement it yourself or find a general-purpose library that supplies the SGD algorithm.

@mahdaneh commented Oct 1, 2015

Since my question is exactly the first question you asked, I would like to know: did you implement your solver in Caffe? Did you benefit from using the provided functions (forward, backward) of the (SGD) solver to optimize your deep network?

In fact, when I follow all the steps you indicated above, I see that my parameters don't update through the network. I have the same problem you had, and I would appreciate it if you can help me.

@mczmy commented Oct 1, 2015

Hey @mahdaneh,

Do you have an email address I can reach you at?


@shelhamer (Member)

@mahdaneh @timothy-shields Define your loss as a Python layer to make use of the rest of the Caffe machinery, like solvers, without giving up the convenience of Python. See for example the Euclidean loss as a Python layer.
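That example boils down to something like the following condensed sketch (the full version lives at examples/pycaffe/layers/pyloss.py in the Caffe repo):

import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    """Euclidean loss written entirely in Python."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # the difference is cached here in forward for reuse in backward
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)  # the loss output is a scalar

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num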

@mahdaneh commented Oct 2, 2015

mabbulv@yahoo.com

@mahdaneh commented Oct 4, 2015

Thank you for your help @timothy-shields and @shelhamer. Since I am only using Caffe's layers, not its solver, I should implement my own solver rather than use the SGDSolver.

@peerajak

@timothy-shields Thanks for your reply. Where do you store your weights? In the Python environment?

Hi @mczmy, @mahdaneh. Could you please send me the solver code? peerajak@gmail.com

Hi @shelhamer, I am trying to use a Python layer but I have a big problem: Python layers do not support weights, by which I mean they do not have net.params['layer_name']. Therefore, I have to store the weights in the Python environment, or save them to disk myself. If I want to use the Caffe mechanism, I need to be able to save the weights to disk. But then again, SGD would not update these weights, so I have to update them myself during the backward pass. Can I do it this way? Is there another way around this?

@kchhhk commented Nov 10, 2015

Hi guys. @mczmy, @mahdaneh, I've been facing a lot of problems implementing my own solver: I can't figure out how to update my parameters after a net.backward() pass. Could you please send me your solver code too? Thanks.
kabir.chhabra12@gmail.com

@zeakey commented Sep 22, 2016

Hi guys,
I feed image data to my network and get the output with net.forward(input_data).

Then I compute the gradient manually and use net.backward(gradient) to backpropagate it.

But I find that this never updates the net parameters.

And I cannot find an update method on caffe.Net or caffe.Solver; I just want a standard SGD update.

I'm using the latest Caffe version at the time of writing.

Can someone help me?

I have tried adding force_backward to my network, and it doesn't work.

@shelhamer @arunmallya @mczmy

@Soumali13

Can anyone tell me how to calculate the gradient manually for the last layer in Caffe? I have the same problem as @zeakey; force_backward does not work.

@peerajak

I did it successfully. I wrote my own classifier layer in Python on Caffe. I will write a how-to when I have time.


@Soumali13

Thanks a lot.

I am loading a net from the prototxt file in a C++ program. Then I am calling forward() and backward() on the inputs. On every input and every iteration the forward() works correctly, but the backward() does not. Can you tell me why? The gradients from cpu_diff() are always 0.


@peerajak commented Feb 20, 2017

A Python layer can set up its own weight parameter blobs and backpropagate the gradient to those blobs as well as to the input layer. This Caffe update landed around a year ago, so there is no need to write your own C++ solver. Here is how (the some_* names are placeholders for your own values):

def setup(self, bottom, top):
    # initialize this layer's weight blob
    self.blobs.add_blob(some_int_len_value)
    # this is how to initialize it from an array
    self.blobs[0].data[...] = some_np_array_of_correct_size

def reshape(self, bottom, top):
    self.diff = np.zeros_like(bottom[0].data)  # set up the diff
    top[0].reshape(1)  # in my case it is a loss layer

def forward(self, bottom, top):
    w = self.blobs[0].data  # this is how to read the parameter values

def backward(self, top, propagate_down, bottom):
    # this is how to backpropagate w.r.t. the input
    bottom[0].diff[...] = some_nparray_of_size_bottom
    # this is how to backpropagate w.r.t. this layer's weights
    self.blobs[0].diff[...] = some_nparray_of_size_w
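To plug such a layer into a net, declare it in the prototxt with type "Python" (Caffe must be built with WITH_PYTHON_LAYER enabled); the module and layer names below are placeholders for your own file and class:

layer {
  name: "loss"
  type: "Python"
  bottom: "fc9"
  bottom: "label"
  top: "loss"
  python_param {
    module: "my_loss_module"  # your .py file on PYTHONPATH
    layer: "MyLossLayer"      # your class name
  }
  loss_weight: 1
}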

@zuowang commented Feb 20, 2017

@peerajak I can't follow your code above. Could you give more explanation?
Currently I do the manual update as follows; could you tell me how to make it work your way?
Thanks a lot!

import numpy as np
import caffe

# Manual SGD
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1  # per-layer multiplier for weights
lr_b_mult = 2  # per-layer multiplier for biases
gamma = 0.1
stepsize = 5000
niter = 10000  # total iterations (pick to taste)

momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for it in range(1, niter + 1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    # step-decayed learning rate, computed from the fixed base rate
    lr = base_lr * np.power(gamma, np.floor(it / stepsize))
    # *manually update*
    for layer in solver.net.params:
        momentum_hist[layer][0] = (momentum_hist[layer][0] * momentum +
            (solver.net.params[layer][0].diff +
             weight_decay * solver.net.params[layer][0].data) * lr * lr_w_mult)
        momentum_hist[layer][1] = (momentum_hist[layer][1] * momentum +
            (solver.net.params[layer][1].diff +
             weight_decay * solver.net.params[layer][1].data) * lr * lr_b_mult)
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        solver.net.params[layer][0].diff[...] *= 0
        solver.net.params[layer][1].diff[...] *= 0

@peerajak commented Feb 20, 2017

@zuowang,

Please see.

  1. http://chrischoy.github.io/research/caffe-python-layer/
  2. Give the python layer parameter/weight blobs. #2944
    and you will recognize my code above.

@VasLem commented Jun 16, 2017

@peerajak will (1) work with minibatch learning? Shouldn't all computation of the diff happen inside the backward() method for it to work with every kind of training? If I read bottom[0].data from inside backward(), say, what data will I get during minibatch training: the mean of all the data in the batch, or the last blob of the batch? If it is the latter, the training will not be correct. Am I missing something?

@peerajak

@VasLem

  1. Yes, it works for minibatches.
  2. Normally, yes: you calculate the diff during backward(). I found, however, that I can save some matrices computed during forward() in class variables and later calculate the diff from those saved matrices. In my case my layer is the loss layer; I am not sure whether this trick can be applied to non-loss layers.
  3. I never read bottom[0].data during backward(), so I am not sure.
