How to update layer parameters from python? #1855
After further inspection, it appears that the
The backward pass computes the gradients, not the updates, which are left as the responsibility of the solver. However, the parameters are exposed in Python as net.params, so you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them by net.params['layer_name'].data[...] += weight_update or the like.
I have a feeling you'll like #1703: it improves the Python interface and lets nets incorporate layers developed in Python. See the Python layer example in #1020 (comment) (replaced by #1703) too. If you hack your loss as a Python layer, you can do the solving as usual. Defining your loss as a Python layer is the simplest in my opinion. See for example the Euclidean loss as a Python layer. We're working on merging and documenting the new pycaffe shortly!
Thank you for the quick support! The following approach appears to be working for me, where the assumption is that I've already properly scaled my diff at the network output layer.
@timothy-shields I had problems with step 3. Could you please give me a brief introduction on how you achieved this? Thanks.
@mczmy If your output layer name is, for example, 'output', you can assign your manually computed gradient to net.blobs['output'].diff and then call net.backward(). This will perform backpropagation to update diffs throughout the network.
@timothy-shields That doesn't quite allow you to use momentum or weight decay, does it? Those are calculated in the solver's step() function. I assume you have to implement the ComputeUpdateValue() function in Python. Is that the case?
@arunmallya Yes, you need to implement your own solver if you do this. |
Hi @timothy-shields, I would like to do the same thing. Can you please show us how to do it?
I want to compute the gradient of a new loss function with respect to the last fully connected layer. If I change the .diff of the last layer to my computed gradient, will the backward pass apply my computed gradient through the network? In that case, how many iterations should the optimization (forward and backward) run? Should we control the number of iterations simply with a for loop in Python? Thank you, guys!
@peerajak @mahdaneh If you take this route, you need to implement the stochastic gradient descent solver yourself. Section 5 of ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al.) tells you how to do this. You will learn a lot by doing it yourself.
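For reference, the update rule from Section 5 of that paper (momentum plus weight decay) is only a few lines; the sketch below uses the paper's default hyperparameter values, and the function name is illustrative. It works elementwise on scalars or NumPy arrays alike:

```python
def momentum_sgd_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One Krizhevsky-style SGD step:
        v <- momentum * v - weight_decay * lr * w - lr * grad
        w <- w + v
    Returns the updated (w, v); the inputs are not modified."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v
```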
Thank you for your answer. But I don't want to reimplement SGD myself. @shelhamer said in the comment above: "you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them by net.params['layer_name'].data[...] += weight_update or the like". What I would like to do is change the diff of the parameters of the last inner product layer to the computed gradient of the loss function with respect to those parameters, and then have backpropagation run by calling net.backward() based on this. Can I not do that?
@mahdaneh That is of course what you would do. |
Since my question is exactly the first question that you asked, I would like to know whether you implemented your solver in Caffe. Did you make use of the provided functions (forward, backward) in the (SGD) solver to optimize your deep network? In fact, having followed all 4 steps you indicated above, I see my parameters don't update through the network. I have the same problem as you had. I would appreciate it if you could help me.
Do you have an email address I can reach you at? Sent from my iPhone
@mahdaneh @timothy-shields Define your loss as Python layer to make use of the rest of the Caffe machinery, like solvers, without giving up the convenience of Python. See for example the Euclidean loss as a Python layer. |
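The referenced example lives in examples/pyloss.py in the Caffe repo; the version below is essentially that layer, with a guarded import added only so its arithmetic can be exercised where Caffe is not installed:

```python
import numpy as np

try:
    import caffe
    _Base = caffe.Layer
except ImportError:   # fallback so the math below runs without Caffe
    _Base = object

class EuclideanLossLayer(_Base):
    """Loss = sum((pred - label)^2) / (2 * batch size), like Caffe's
    built-in EuclideanLossLayer, but implemented as a Python layer."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)   # loss output is a scalar

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.0

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
```

Because the loss is an ordinary layer, the standard solvers (SGD with momentum, weight decay, etc.) handle all parameter updates.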
Thank you for your help @timothy-shields and @shelhamer. Since I am only using Caffe's layers and not its solver, I should implement my own solver rather than use SGDSolver.
@timothy-shields Thanks for your reply. Where do you store your weights? In the Python environment?

Hi @mczmy, @mahdaneh. Could you please send me the solver code? peerajak@gmail.com

Hi @shelhamer, I am trying to use a Python layer but I have a big problem: Python layers do not support weights, i.e. they do not appear in net.params['layer_name']. Therefore, I have to store the weights in the Python environment, or save them to disk. If I want to use the Caffe mechanism, I need to be able to save the weights to disk. But then again, SGD would not update these weights, so I would have to update them myself during the backward pass. Can I do it this way? Is there another way around this?
Hi guys. @mczmy, @mahdaneh, I've been facing a lot of problems implementing my own solver, and I can't figure out how to update my parameters after a net.backward() pass. Could you please send me your solver code too? Thanks
Hi guys, I compute the gradient manually and then run the backward pass, but I find that this never updates the net parameters, and I cannot find an update method in the Python interface. I'm using the latest Caffe version at the time of writing. I have tried adding 'force_backward' to my network, and it doesn't work. Can someone help me?
Can anyone tell me how to calculate the gradient manually for the last layer in Caffe? I have the same problem as @zeakey; force_backward does not work.
I did it successfully. I wrote my own classifier layer in Python.
Thanks a lot. I am loading a net from the prototxt file in a C++ program. Soumali Roychowdhury
A Python layer can set up its own weight parameter blobs, and can backpropagate the gradient to those blobs as well as to the bottom layer. This Caffe update happened around a year ago. There is no need to write your own C++ solver. Here is how.
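A minimal sketch of such a layer: self.blobs.add_blob(...) is the pycaffe call that allocates parameter blobs in setup, and the solver then updates them like any other layer's weights. The layer itself is a toy example, and the guarded import exists only so the arithmetic can run where Caffe is absent:

```python
import numpy as np

try:
    import caffe
    _Base = caffe.Layer
except ImportError:   # fallback so the math below runs without Caffe
    _Base = object

class ScaleLayer(_Base):
    """Toy layer: top = w * bottom, with w a learnable scalar blob."""

    def setup(self, bottom, top):
        # Allocate one parameter blob holding a single scalar weight.
        # The solver sees it via net.params and applies updates to it.
        self.blobs.add_blob(1)
        self.blobs[0].data[...] = 1.0

    def reshape(self, bottom, top):
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        top[0].data[...] = self.blobs[0].data[0] * bottom[0].data

    def backward(self, top, propagate_down, bottom):
        # Gradient w.r.t. the weight: accumulate over the whole batch.
        self.blobs[0].diff[0] = np.sum(top[0].diff * bottom[0].data)
        if propagate_down[0]:
            bottom[0].diff[...] = self.blobs[0].data[0] * top[0].diff
```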
@peerajak I can't follow your code above. Could you give more explanation?
zuowang, Please see.
@peerajak Will (1) work with mini-batch learning? Shouldn't all computation of the diff be inside the backward() method for it to work with every kind of training? If I read bottom[0].data from inside backward(), what data will I get during mini-batch training: the mean of all the data in the batch, or the last blob of the batch? If the latter, the training will not be correct. Am I missing something?
I have a sequence of Caffe layers with no loss layer in a Caffe network. In my Python code, I want to repeatedly take the following steps:
The problem I've encountered is that the backward pass (step 4) is not actually updating layer parameters. How can I modify my approach to get Caffe to update the layer parameters when I perform the backward pass?
(I'm aware of the SGDSolver, but I am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial, such that fitting them into that framework seems difficult.)