Python manual sgd #3959

Closed
aniketvartak opened this issue Apr 8, 2016 · 10 comments · Fixed by #6238
Comments

@aniketvartak

I am trying to implement the SGD weight update manually in pycaffe, instead of using the solver.step() function. The goal is for the weights obtained after solver.step() to match the weights obtained by updating them manually.

The setup is as follows: use MNIST data. Set the random seed in solver.prototxt as random_seed: 52. Make sure momentum: 0.0, weight_decay: 0.0, base_lr: 0.01, and lr_policy: "fixed". This is done so that I can implement the plain SGD update equation (without momentum, regularization, etc.), which is simply: W_t+1 = W_t - mu * W_t_diff

Following are the two tests:

Test1: Use pycaffe's forward() and backward() to compute the forward and backward propagation. For each layer that contains weights I do:

for k in weight_layer_idx:
    solver.net.layers[k].blobs[0].diff[...] *= lr # weights
    solver.net.layers[k].blobs[1].diff[...] *= lr # biases

Next, I update the weights/biases as:

    solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
    solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

I run this for 5 iterations.

Test2: Run caffe's solver.step(5).

Now, what I expect is that the two tests should yield exactly the same weights after these iterations.

I save the weight values after each of the above tests and compute the norm of the difference between the weight vectors produced by the two tests, and I see that they are not bit-exact. Can someone spot something that I might be missing?

Following is the entire code for reference:

import caffe
import numpy as np
from copy import copy  # needed for the copy() calls below

caffe.set_device(0)
caffe.set_mode_gpu()

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01

# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)

# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx,l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]

for it in range(1, niter+1):
    solver.net.forward()  # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Compare
print("after iter", niter, ": weight diff:", np.linalg.norm(w_solver_step - w_fwdbwd_update),
      "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update))

The last line, which compares the weights from the two tests, produces:

after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05

whereas I expect this difference to be 0.0.

Any ideas?


Also note: if I run these two tests for only 1 iteration, I get exactly matching weight vectors from all layers, but not in subsequent iterations.

seanbell commented Apr 24, 2016

It looks like you're not clearing the diff in each blob. If you want to match the C++ code, you need to clear the diffs manually before each forward/backward pass (set them all to 0): https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp#L203

It's not cleared automatically in order to support gradient accumulation (the iter_size solver parameter).
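
For example, applied to the loop from the original post (reusing niter, lr, and weight_layer_idx from there), a minimal sketch of the fix looks like this: zero every parameter diff before the pass so gradients from the previous iteration are not accumulated.

for it in range(niter):
    # clear the accumulated gradients before the new pass
    for k in weight_layer_idx:
        for blob in solver.net.layers[k].blobs:
            blob.diff[...] = 0.0
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    # plain SGD update (no momentum, no weight decay)
    for k in weight_layer_idx:
        for blob in solver.net.layers[k].blobs:
            blob.data[...] -= lr * blob.diff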

@nathanin

@aniketvartak Hi, I'm trying to do something similar: I want to perform a forward pass, update some layers, run backpropagation, and then update the weights. My data is manually loaded MNIST images and their labels; manually loading the data is important for my eventual application. I use an Input layer for this: n.data, n.labels = L.Input(shape=[dict(dim=[64,1,28,28]), dict(dim=[64])], transform_param=dict(scale=1./255), ntop=2)
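
For context, here is a rough sketch of how such a net can be written out with caffe.NetSpec; the LeNet-style layers and their names below are just illustrative assumptions, not the exact prototxt used here:

from caffe import layers as L, params as P
import caffe

n = caffe.NetSpec()
# manually fed data and labels come in through Input layers
n.data, n.labels = L.Input(shape=[dict(dim=[64, 1, 28, 28]), dict(dim=[64])],
                           transform_param=dict(scale=1./255), ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                        weight_filler=dict(type='xavier'))
n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.ip1 = L.InnerProduct(n.pool1, num_output=500, weight_filler=dict(type='xavier'))
n.relu1 = L.ReLU(n.ip1, in_place=True)
n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.ip2, n.labels)

with open('train.prototxt', 'w') as f:
    f.write(str(n.to_proto()))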

I am trying to perform the learning without calling solver.step():

for it in range(100):
    # Manually load data - returns batch = ndarray(64,1,28,28), labels = ndarray(64)
    batch, labels = zip(*(get_random_digit() for _ in range(64)))

    # Set data into network
    solver.net.blobs['data'].data[...] = batch
    solver.net.blobs['labels'].data[...] = labels

    solver.net.forward()
    solver.net.backward()

    ## Test 2 code or Test 1 code ....

My problem is that even when I run for many iterations, no learning takes place. On the first iteration the images in the mini-batch are at least assigned different classes; after the first iteration, every image seems to get some random class.

At first I thought it was a problem with my inputs, so I dropped the update snippet into the LeNet example (here). Even there, accuracy stays the same over 1000 iterations, so no learning occurs.

@seanbell Any ideas?

automan000 commented May 12, 2017

@nathanin I think you should update the weights in each blob manually; net.backward() doesn't do this for you.

net.backward()
# manually update
for layer in net.layers:
    for blob in layer.blobs:
        blob.data[...] -= current_lr * blob.diff
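
Putting that together with @seanbell's earlier point about clearing the diffs, a rough sketch of one full manual iteration (assuming net and current_lr as above; this plain update ignores momentum, weight decay and per-blob lr_mult) would be:

# clear leftover gradients so they don't accumulate across iterations
for layer in net.layers:
    for blob in layer.blobs:
        blob.diff[...] = 0.0

net.forward()
net.backward()

# plain SGD step on every parameter blob
for layer in net.layers:
    for blob in layer.blobs:
        blob.data[...] -= current_lr * blob.diff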

@nathanin

@automan000

Hi, thanks for the input. Manually updating like this, I ran into many problems. There were a couple of ways to get around them; the one I chose was to expose Solver::ApplyUpdate() in the Python interface. This way, the equivalent of solver.step(1) is:

solver.net.forward()
solver.net.backward()
solver.update()

I had to move the iter_ increment to inside sgd_solver, but I haven't really found a problem with doing that.

@automan000

@nathanin Thanks for sharing.

@xiao7199

@nathanin
Hi, I tried to expose ApplyUpdate in _caffe.cpp, but the method is protected. How did you solve that?

@nathanin

@xiao7199 I moved it to public in include/caffe/sgd_solvers.hpp. Be careful: I don't know whether this has other consequences. If you find an issue with this solution, please let me know. Good luck :)

mitar commented Oct 28, 2017

I made the changes @nathanin described above in this fork: https://github.com/mitar/caffe

Noiredd commented Feb 14, 2018

Closing as this is not related to Caffe development; also the original question seems to have been answered.

mitar commented Feb 14, 2018

I have opened this pull request with a fix for this issue: #6238
