
Fix diff sharing in FlattenLayer #6488

Open
wants to merge 2 commits into base: master
Conversation

mike-shvets

Make FlattenLayer behave like ReshapeLayer: set the diff
pointer of the top blob to the diff pointer of the bottom blob
in Reshape, instead of sharing the other way around in Backward.

This prevents backpropagation from breaking when the previous
layer shares its top blob diff (i.e. this layer's bottom blob diff).
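
For reference, a minimal sketch of what this looks like in flatten_layer.cpp, mirroring ReshapeLayer and assuming the usual Blob::ShareData / Blob::ShareDiff calls. This is an illustration of the approach, not the exact diff in this PR:

#include <vector>

#include "caffe/layers/flatten_layer.hpp"

namespace caffe {

template <typename Dtype>
void FlattenLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  CHECK_NE(top[0], bottom[0]) << this->type() << " Layer does not "
      "allow in-place computation.";
  // Compute the flattened top shape from flatten_param's axis/end_axis,
  // exactly as FlattenLayer already does.
  const int start_axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.flatten_param().axis());
  const int end_axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.flatten_param().end_axis());
  vector<int> top_shape;
  for (int i = 0; i < start_axis; ++i) {
    top_shape.push_back(bottom[0]->shape(i));
  }
  top_shape.push_back(bottom[0]->count(start_axis, end_axis + 1));
  for (int i = end_axis + 1; i < bottom[0]->num_axes(); ++i) {
    top_shape.push_back(bottom[0]->shape(i));
  }
  top[0]->Reshape(top_shape);
  CHECK_EQ(top[0]->count(), bottom[0]->count());
  // New: share data and diff once here, the way ReshapeLayer does, so the
  // top blob simply aliases the bottom blob's buffers.
  top[0]->ShareData(*bottom[0]);
  top[0]->ShareDiff(*bottom[0]);
}

// Forward and Backward then have nothing left to do; in particular Backward
// no longer calls bottom[0]->ShareDiff(*top[0]), which is what currently
// re-points the bottom diff and breaks sharing set up by an earlier layer
// (e.g. RecurrentLayer's unrolled net).
template <typename Dtype>
void FlattenLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {}

template <typename Dtype>
void FlattenLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {}

}  // namespace caffe

Doing the sharing in Reshape also matches where ReshapeLayer sets it up, so nets that mix Flatten and Reshape behave consistently.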

Below is a case where backpropagation breaks in current master:
RecurrentLayer (LSTMLayer) shares its top blob diff with its unrolled net, but
a following Flatten layer breaks this connection during the backward pass.

import numpy as np
import tempfile
import caffe
from caffe import layers as L
from caffe import params as P

TESTCASE = 'flatten'

spec = caffe.NetSpec()
spec.inp = L.Input(shape=dict(dim=[2, 3, 4]))
spec.cont = L.Input(shape=dict(dim=[2, 3]))
spec.out = L.Input(shape=dict(dim=[2 * 3, 5]))
recurrent_param = dict(
    num_output=5,
    weight_filler=dict(type='xavier'),
    bias_filler=dict(type='constant', value=0.),
)
spec.lstm = L.LSTM(spec.inp, spec.cont, recurrent_param=recurrent_param)
if TESTCASE == 'flatten':
    spec.flat = L.Flatten(spec.lstm, flatten_param=dict(axis=0, end_axis=1))
else:
    # do the same with Reshape
    spec.flat = L.Reshape(spec.lstm, reshape_param=dict(shape=dict(dim=[-1, 5])))
spec.loss = L.EuclideanLoss(spec.flat, spec.out, loss_weight=1.)

model_name = None
# open in text mode so writing the prototxt string works on Python 3 as well
with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
    f.write(str(spec.to_proto()))
    model_name = f.name
net = caffe.Net(model_name, caffe.TRAIN)

net.blobs['inp'].data[...] = np.random.rand(2, 3, 4)
net.blobs['cont'].data[...] = np.array([[0., 0., 0.], [1., 1., 1.]])
net.blobs['out'].data[...] = np.random.rand(2 * 3, 5)

net.clear_param_diffs()
net.forward()
net.backward()

# LSTM parameter gradients:
# all zeros in current master for the 'flatten' test case,
# non-zero for the ReshapeLayer test case.
print(net.params['lstm'][0].diff)
