
How to update layer parameters from python? #1855

Closed
timothy-shields opened this issue Feb 10, 2015 · 28 comments
@timothy-shields

I have a sequence of Caffe layers with no loss layer in a Caffe network. In my python code, I want to repeatedly take the following steps:

  1. Do a forward pass through the network.
  2. Compute my own losses and gradients based on the output data blob of the final layer of the network.
  3. Set the diff blob of the final layer of the network.
  4. Do a backward pass through the network, updating layer parameters.
  5. (Occasionally) Save the network to persist the learned parameters.

The problem I've encountered is that the backward pass (step 4) is not actually updating layer parameters. How can I modify my approach to get Caffe to update the layer parameters when I perform the backward pass?

(I'm aware of the SGDSolver but am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial, such that fitting them into that framework seems difficult.)
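Concretely, the loop I'm trying to write looks something like this; a rough sketch where the model path, the output blob name fc9, and my_loss_gradient are placeholders:

import numpy as np
import caffe

def my_loss_gradient(output):
    # placeholder: gradient of my custom loss w.r.t. the network output
    return np.ones_like(output)

net = caffe.Net('net.prototxt', caffe.TRAIN)  # placeholder model definition

for it in range(1000):
    out = net.forward()                    # 1. forward pass
    grad = my_loss_gradient(out['fc9'])    # 2. my own loss/gradient on the final output
    net.blobs['fc9'].diff[...] = grad      # 3. set the diff of the final blob
    net.backward()                         # 4. backward pass (where I expected the parameters to update)
    if it % 100 == 0:
        net.save('snapshot.caffemodel')    # 5. occasionally persist the parameters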

@timothy-shields (Author)

After further inspection, it appears that the Update method on the Net is what needs to be called in my Step 4. The problem is that this is not available on the Python Net object and is only callable via the SGDSolver. Would it be possible to expose the Update method on the Net directly, so that a custom solver in Python is made possible?

@shelhamer (Member)

> Do a backward pass through the network, updating layer parameters.

The backward pass computes the gradients, not the updates, which are left as the responsibility of the solver. However, the parameters are exposed on a Python caffe.Net as the net.params dictionary. These are mutable; you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them with net.params['layer_name'].data[...] += weight_update or the like.
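For instance, a bare-bones gradient step over every parameter blob could look like this, assuming net is your caffe.Net with freshly computed diffs; the learning rate and any schedule are up to you:

lr = 0.01  # your choice of learning rate
for name in net.params:
    for blob in net.params[name]:  # [0] is the weights, [1] the bias (when present)
        blob.data[...] -= lr * blob.diff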

> I'm aware of the SGDSolver but am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial

I have a feeling you'll like #1703: it improves the Python interface and lets nets incorporate layers developed in Python. See the Python layer example in #1020 (comment) (replaced by #1703) too. If you hack your loss as a Python layer, you can do the solving as usual.

Defining your loss as a Python layer is the simplest in my opinion. See for example the Euclidean loss as a Python layer.

We're working on merging and documenting the new pycaffe shortly!

@timothy-shields (Author)

Thank you for the quick support! The following code appears to be working for me, where the assumption is that I've already properly scaled my diff at the network output layer and done net.backward(...).

for layer in net.layers:
    for blob in layer.blobs:         # the layer's learnable parameter blobs
        blob.data[...] -= blob.diff  # plain gradient step (learning rate already folded into the diff)

@mczmy commented Feb 21, 2015

@timothy-shields I had problems with step 3. Could you please briefly explain how you achieved it? Thanks.

@timothy-shields (Author)

@mczmy If your output layer name is, for example, fc9, then you can set the output diff and do the backward pass (steps 3 and 4) as follows, where diff is a 4D numpy array of the appropriate shape.

net.backward(fc9=diff)

This will run backpropagation and populate the diffs throughout the network.
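As a sketch of the shape requirement and where the resulting gradients end up (net is your caffe.Net, diff your computed gradient):

# the diff passed to backward must match the output blob's shape
assert diff.shape == net.blobs['fc9'].data.shape
net.backward(fc9=diff)
# afterwards the gradients are available, e.g. for the fc9 weights:
w_grad = net.params['fc9'][0].diff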

@arunmallya

@timothy-shields That doesn't quite allow you to use momentum or weight decay, does it? Those are calculated in the solver's step() function. I assume you have to implement the ComputeUpdateValue() function in Python. Is that the case?

@timothy-shields (Author)

@arunmallya Yes, you need to implement your own solver if you do this.

@peerajak

Hi @timothy-shields,

I would like to do the same thing. Can you please show us how to do it?

@mahdaneh commented Oct 1, 2015

I want to compute the gradient of a new loss function with respect to the last fully connected layer. If I set the .diff of the last layer to my computed gradient, will the backward pass propagate my gradient through the network? In that case, how many iterations should the optimization (forward and backward) run? Should we control the number of iterations simply with a for loop in Python?
Any help is appreciated.

Thank you guys!

@timothy-shields (Author)

@peerajak @mahdaneh If you take this route, you need to implement the stochastic gradient descent solver yourself. Section 5 of ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al.) tells you how to do this. You will learn a lot by doing it yourself.
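For reference, the update rule from that section looks roughly like this in pycaffe terms; a sketch assuming net is a caffe.Net whose diffs were just filled by a backward pass, with the paper's hyperparameters:

import numpy as np

lr, momentum, weight_decay = 0.01, 0.9, 0.0005

# one momentum buffer per parameter blob, kept across iterations
v = {name: [np.zeros_like(b.data) for b in blobs]
     for name, blobs in net.params.items()}

# after each forward/backward pair:
for name, blobs in net.params.items():
    for i, b in enumerate(blobs):
        v[name][i] = momentum * v[name][i] - weight_decay * lr * b.data - lr * b.diff
        b.data[...] += v[name][i]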

@mahdaneh commented Oct 1, 2015

Thank you for your answer, but I don't want to reimplement SGD myself. @shelhamer said in the comment above: "you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them with net.params['layer_name'].data[...] += weight_update or the like". What I would like to do is set the diff of the parameters of the last inner product layer to the computed gradient of the loss function with respect to those parameters, and then run the backpropagation algorithm by calling net.backward() based on this. Can't I do that?

@timothy-shields (Author)

@mahdaneh That is of course what you would do. net.backward(...) will set the diff blobs throughout the network, but then you need to actually apply those diffs to update the parameters. I don't believe you'll be able to use Caffe as the SGD solver here, so you'll have to implement it yourself or find a general-purpose library that supplies the SGD algorithm.

@mahdaneh commented Oct 1, 2015

Since my question is exactly the first question you asked, I would like to know: did you implement your solver in Caffe? Did you benefit from using the provided functions (forward, backward) of the (SGD) solver to optimize your deep network?

In fact, when I follow all the steps you indicated above, I see that my parameters don't update through the network. I have the same problem you had, and I would appreciate it if you can help me.

@mczmy commented Oct 1, 2015

Hey @mahdaneh,

Do you have an email address I can reach you at?


@shelhamer (Member)

@mahdaneh @timothy-shields Define your loss as a Python layer to make use of the rest of the Caffe machinery, like solvers, without giving up the convenience of Python. See for example the Euclidean loss as a Python layer.
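That example boils down to something like the following condensed sketch (the full version lives at examples/pycaffe/layers/pyloss.py in the Caffe repo):

import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    """Euclidean loss written entirely in Python."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # the difference is cached here in forward for reuse in backward
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)  # the loss output is a scalar

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num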

@mahdaneh commented Oct 2, 2015

mabbulv@yahoo.com

@mahdaneh commented Oct 4, 2015

Thank you for your help @timothy-shields and @shelhamer. Since I am only using Caffe's layers, not its solver, I should implement my own solver rather than use the SGDSolver.

@peerajak

@timothy-shields Thanks for your reply. Where do you store your weights? In the Python environment?

Hi @mczmy, @mahdaneh. Could you please send me the solver code? peerajak@gmail.com

Hi @shelhamer, I am trying to use a Python layer but I have a big problem: Python layers do not support weights, by which I mean they do not have net.params['layer_name']. Therefore, I have to store the weights in the Python environment, or save them to disk myself. If I want to use the Caffe mechanism, I need to be able to save the weights to disk. But then again, SGD would not update these weights, so I have to update them myself during the backward pass. Can I do it this way? Is there another way around this?

@kchhhk commented Nov 10, 2015

Hi guys. @mczmy, @mahdaneh, I've been facing a lot of problems implementing my own solver: I can't figure out how to update my parameters after a net.backward() pass. Could you please send me your solver code too? Thanks.
kabir.chhabra12@gmail.com

@zeakey commented Sep 22, 2016

Hi guys,
I feed image data to my network and get the output with net.forward(input_data).

Then I compute the gradient manually and use net.backward(gradient) to backpropagate it.

But I find that this never updates the net parameters.

And I cannot find an update method on caffe.Net or caffe.Solver; I just want a standard SGD update.

I'm using the latest Caffe version at the time of writing.

Can someone help me?

I have tried adding force_backward to my network, and it doesn't work.

@shelhamer @arunmallya @mczmy

@Soumali13

Can anyone tell me how to calculate the gradient manually for the last layer in Caffe? I have the same problem as @zeakey; force_backward does not work.

@peerajak

I did it successfully. I wrote my own classifier layer in Python on Caffe. I will write a how-to when I have time.


@Soumali13

Thanks a lot.

I am loading a net from the prototxt file in a C++ program. Then I am calling forward() and backward() on the inputs. On every input and every iteration the forward() works correctly, but the backward() does not. Can you tell me why? The gradients from cpu_diff() are always 0.


@peerajak commented Feb 20, 2017

A Python layer can set up its own weight parameter blobs and backpropagate the gradient to those blobs as well as to the input layer. This Caffe update landed around a year ago, so there is no need to write your own C++ solver. Here is how (the some_* names are placeholders for your own values):

def setup(self, bottom, top):
    # initialize this layer's weight blob
    self.blobs.add_blob(some_int_len_value)
    # this is how to initialize it from an array
    self.blobs[0].data[...] = some_np_array_of_correct_size

def reshape(self, bottom, top):
    self.diff = np.zeros_like(bottom[0].data)  # set up the diff
    top[0].reshape(1)  # in my case it is a loss layer

def forward(self, bottom, top):
    w = self.blobs[0].data  # this is how to read the parameter values

def backward(self, top, propagate_down, bottom):
    # this is how to backpropagate w.r.t. the input
    bottom[0].diff[...] = some_nparray_of_size_bottom
    # this is how to backpropagate w.r.t. this layer's weights
    self.blobs[0].diff[...] = some_nparray_of_size_w
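To plug such a layer into a net, declare it in the prototxt with type "Python" (Caffe must be built with WITH_PYTHON_LAYER enabled); the module and layer names below are placeholders for your own file and class:

layer {
  name: "loss"
  type: "Python"
  bottom: "fc9"
  bottom: "label"
  top: "loss"
  python_param {
    module: "my_loss_module"  # your .py file on PYTHONPATH
    layer: "MyLossLayer"      # your class name
  }
  loss_weight: 1
}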

@zuowang commented Feb 20, 2017

@peerajak I can't follow your code above. Could you give more explanation?
Currently I do the manual update as follows; could you tell me how to make it work your way?
Thanks a lot!

import numpy as np
import caffe

# Manual SGD
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1  # per-layer multiplier for weights
lr_b_mult = 2  # per-layer multiplier for biases
gamma = 0.1
stepsize = 5000
niter = 10000  # total iterations (pick to taste)

momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for it in range(1, niter + 1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    # step-decayed learning rate, computed from the fixed base rate
    lr = base_lr * np.power(gamma, np.floor(it / stepsize))
    # *manually update*
    for layer in solver.net.params:
        momentum_hist[layer][0] = (momentum_hist[layer][0] * momentum +
            (solver.net.params[layer][0].diff +
             weight_decay * solver.net.params[layer][0].data) * lr * lr_w_mult)
        momentum_hist[layer][1] = (momentum_hist[layer][1] * momentum +
            (solver.net.params[layer][1].diff +
             weight_decay * solver.net.params[layer][1].data) * lr * lr_b_mult)
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        solver.net.params[layer][0].diff[...] *= 0
        solver.net.params[layer][1].diff[...] *= 0

@peerajak commented Feb 20, 2017

@zuowang,

Please see.

  1. http://chrischoy.github.io/research/caffe-python-layer/
  2. Give the python layer parameter/weight blobs. #2944
    and you will recognize my code above.

@VasLem commented Jun 16, 2017

@peerajak will (1) work with minibatch learning? Shouldn't all computation of the diff happen inside the backward() method for it to work with every kind of training? If I read bottom[0].data from inside backward(), say, what data will I get during minibatch training: the mean of all the data in the batch, or the last blob of the batch? If it is the latter, the training will not be correct. Am I missing something?

@peerajak

@VasLem

  1. Yes, it works for minibatches.
  2. Normally, yes: you calculate the diff during backward(). I found, however, that I can save some matrices computed during forward() in class variables and later calculate the diff from those saved matrices. In my case my layer is the loss layer; I am not sure whether this trick can be applied to non-loss layers.
  3. I never read bottom[0].data during backward(), so I am not sure.
