
Using pycaffe backward() #583

Closed
to3i opened this issue Jul 2, 2014 · 12 comments

@to3i
Contributor

to3i commented Jul 2, 2014

In the case of the ImageNet CaffeNet example, I want to use the Python wrapper to compute a single forward pass (similar to the predict method in classifier.py) followed by a single backward pass.

Classify

    caffe_in = np.asarray([self.preprocess(self.inputs[0], in_)
                for in_ in inputs])
    out = self.forward_all(**{self.inputs[0]: caffe_in})

Compute backward pass

    bottom_diff = self.backward(**{self.outputs[0]: out['prob']})

My issue is that the backward method returns a bottom_diff['data'] that is all zeros. I would expect to see some gradient information passed back through the network. For debugging I am using the cat example, and the classify part of the code outputs the probabilities as featured in the IPython notebook example. Most likely I missed something important going from forward to backward. Do I have to set a learning rate somehow? It would be great if someone could give me a pointer. Thanks!

@shelhamer
Member

Try adding force_backward: true to your model definition. By default Caffe does not backpropagate to the data since it has no parameters. Likewise, if there is no loss then there will be no gradients to compute.

name: "CaffeNet"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
force_backward: true
...

P.S. Note that bottom_diff is the bottom derivative for the entire network. The gradients along the way are available through the blobs interface, e.g. net.blobs['conv1'].diff (and that is without force_backward).
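
For reference, a minimal end-to-end sketch of the above (the file paths, the caffe.TEST phase argument, and the one-hot seeding of the top diff are illustrative assumptions, not part of the original comment):

    import numpy as np
    import caffe

    # Assumes force_backward: true is set in the prototxt as shown above, and
    # that the net has a 'data' input and a 'prob' output as in CaffeNet.
    net = caffe.Net('forceback_deploy.prototxt',           # hypothetical path
                    'bvlc_reference_caffenet.caffemodel',  # hypothetical path
                    caffe.TEST)

    # Stand-in for a preprocessed input batch.
    net.blobs['data'].data[...] = np.random.rand(*net.blobs['data'].data.shape)
    out = net.forward()

    # With no loss layer in the deploy net, the top diff must be seeded by hand
    # before backpropagating; here a one-hot vector at the predicted class.
    diff = np.zeros_like(net.blobs['prob'].data)
    diff[0, out['prob'][0].argmax()] = 1
    bottom_diff = net.backward(**{net.outputs[0]: diff})

    print(bottom_diff['data'].shape)       # gradient w.r.t. the input
    print(net.blobs['conv1'].diff.shape)   # intermediate gradients via the blobs interface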

@to3i
Contributor Author

to3i commented Jul 3, 2014

@shelhamer Thanks, that solved the issue! I did not consider that computing bottom_diff is unnecessary in most cases.

While looking into classifier.py I might have spotted a bug in the predict() method:

    # Scale to standardize input dimensions.
    inputs = np.asarray([caffe.io.resize_image(im, self.image_dims)
                         for im in inputs])

    if oversample:
        # Generate center, corner, and mirrored crops.
        inputs = caffe.io.oversample(inputs, self.crop_dims)

In the case of imagenet_deploy.prototxt, self.image_dims is set to 227. Before the oversampling takes place the image is resized to 227x227, and since self.crop_dims is also 227, only the mirroring works as expected. I solved this by passing a separate resize-size variable instead of using self.image_dims.

@to3i
Contributor Author

to3i commented Jul 3, 2014

False alarm. Looking at the code again I see now that image_dims can be passed using the Classifier init. In order to use oversampling, you just need to init the Classifier correctly and everything is fine:

net = caffe.Classifier(MODEL_FILE, PRETRAINED, image_dims=(256, 256))

This should probably be added to imagenet_classification.ipynb.

@shelhamer
Member

@to3i thanks for catching that image_dims should have been set in the example. Fixed in e4dece5.

@shuangao

Hi @shelhamer @to3i, I have a similar problem while using @to3i's code. After adding the statement force_backward: true to my model file, it says

    if diff.shape[0] != self.blobs[top].num:
        raise Exception('Diff is not batch sized')
    self.blobs[top].diff[...] = diff

Exception: Diff is not batch sized

Then I turned to

bottom_diff = net.backward(**{net.outputs[0]: net.blobs['prob'].diff})

I checked net.blobs['prob'].diff; all its values are 1. I also commented out these two lines in pycaffe, which force the diff to be a 4-d array:

 if diff.ndim != 4:
     raise Exception('{} diff is not 4-d'.format(top))

I still get all zeros in bottom_diff['data']. Did I miss anything? Thanks!

PS: I use the deploy model file to initialize the net.
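
For what it's worth, a minimal sketch of building a batch-sized diff directly from the blob shape (net and target_class are placeholder names assumed for illustration):

    import numpy as np

    # Shape the top diff like the 'prob' blob, so it is batch sized (and 4-d,
    # if the blob is) by construction; seed only the class of interest.
    diff = np.zeros_like(net.blobs['prob'].data)
    diff[0, target_class] = 1  # target_class: a hypothetical int class index
    bottom_diff = net.backward(**{net.outputs[0]: diff})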

@shuangao

I solved it by replacing net.blobs['prob'].diff with net.blobs['prob'].data:

bottom_diff = net.backward(**{net.outputs[0]: net.blobs['prob'].data})

@NightFury13

Hi,
I'm facing a similar issue. While using the feature_extraction IPython notebook sample provided by Caffe, I set 'force_backward: true' in my deploy.prototxt file; however, the backward pass still gives an array full of 0's. Can someone help me figure out what I'm doing wrong?

CODE:

caffe.set_mode_cpu()
net = caffe.Net(caffe_root + 'models/bvlc_reference_caffenet/forceback_deploy.prototxt',
                caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)) # mean pixel
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2,1,0))

net.blobs['data'].reshape(1,3,227,227)
net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(caffe_root + 'examples/images/fish-bike.jpg'))
out = net.forward()
print("Predicted class is #{}.".format(out['prob'].argmax()))
back = net.backward()
j = back['data'].copy()
print j
^^^^^^^^^^^^^^^^^^^^^^^ THIS GIVES ME AN ARRAY OF ZEROS ^^^^^^^^^^^^^^^^^
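
(A possible missing step, sketched under the assumption that net.backward() here starts from an all-zero 'prob' diff after the plain forward pass; the one-hot seeding mirrors the suggestion later in this thread and is not a confirmed fix:)

net.blobs['prob'].diff[...] = 0
net.blobs['prob'].diff[0, out['prob'].argmax()] = 1   # seed the diff at the predicted class
back = net.backward()
j = back['data'].copy()                               # gradient w.r.t. the input image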

forceback_deploy.prototxt

name: "CaffeNet"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
force_backward: true
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
inner_product_param {
num_output: 4096
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
inner_product_param {
num_output: 1000
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc8"
top: "prob"
}

@saukymo

saukymo commented Feb 4, 2016

I want to run backward on the model in /examples/mnist.
I get all 0s too. In fact, it didn't run _backward at all.

I added "force_backward: true" to both "lenet.prototxt" and "lenet_train_test.prototxt".

@NightFury13

Set the diff of the target class to 1. To be precise:

net.blobs['last-layer'].diff[0][target_class] = 1
back_pass = net.backward()

This should give you the required gradients. :)

Regards,
Mohit


@saukymo

saukymo commented Feb 5, 2016

@NightFury13 Thank you for your reply, but it's still all zeros.

In fact, I used similar code.

probs = np.zeros_like(net.blobs['prob'].data)
probs[0][intended_outcome] = 1
gradient = net.backward(prob=probs)
return gradient['data'].copy()

This ran well on GoogLeNet but gives all zeros on the LeNet from the mnist tutorial.

Is it possible that I missed some setting in the train phase?

One more thing: I am on OS X El Capitan (10.11.13), so I hit the
dyld: Library not loaded: libcaffe.so.1.0.0-rc3 error
and had to reset back to commit a97300c.
Are there any related changes after this commit?

Thank you for your prompt reply again.

@NightFury13

You seem to be setting the data values to 1 in the code snippet you've written above. Set the 'diff' to 1. I asked this question on the caffe-users group and Evan confirmed my approach:
https://groups.google.com/forum/#!searchin/caffe-users/mohit$20jain/caffe-users/e9QDkCcZsDw/UVJhVuOQEAAJ

As for the libcaffe.so error, maybe you can expand a bit more on when/how you get that error and what the error trace is? It really shouldn't be a commit-version issue. I might not be able to get you through the problem, though (try opening a new issue and tagging the contributors), or you may ask on the caffe-users group. The GitHub issues are primarily for bug reports, while the Google group is a good place for 'how-to-do-x' questions.

Regards,
Mohit


@saukymo

saukymo commented Feb 5, 2016

@NightFury13 Thank you again.
For the zero problem: my code did set the 'diff' to 1, and I'm pretty sure about that point. But after reading the source code of the softmax layer and its backward algorithm, I found that it also involves the 'data' values. So I tried setting the data all to 1 and the 'diff' all to 0 with the intended class set to 1, and the result seems really good, although I'm not sure it is the real diff. In any case, I can finally put this problem behind me.
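
(As context for the remark above, a small sketch of the softmax backward as I read it in Caffe's SoftmaxLayer: bottom_diff = (top_diff - <top_diff, top_data>) * top_data per sample, which vanishes when the prediction is saturated; this is an illustration, not a confirmed diagnosis of the LeNet case:)

import numpy as np

def softmax_backward(top_data, top_diff):
    # Jacobian-vector product of softmax for one sample:
    # bottom_diff_i = (top_diff_i - sum_j top_diff_j * top_data_j) * top_data_i
    return (top_diff - np.dot(top_diff, top_data)) * top_data

one_hot = np.array([1.0, 0.0, 0.0])                          # diff seeded at the intended class
print(softmax_backward(np.array([1.0, 0.0, 0.0]), one_hot))  # saturated prediction -> [0. 0. 0.]
print(softmax_backward(np.array([0.7, 0.2, 0.1]), one_hot))  # softer prediction -> [0.21 -0.14 -0.07]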

As for libcaffe.so: I found that lots of people have had the exact same problem these days, so I just mentioned it without details.

Thank you for the advice about the Google group. I have not asked questions there before because of my poor English, and this time I saw the related issue here, so I just added my problem here. :) I will try the Google group later, of course.

Really thank you!
