
Fix diff sharing in FlattenLayer #6488

Open
wants to merge 2 commits into base: master
Conversation

mike-shvets

Make FlattenLayer behave like ReshapeLayer: set the diff
pointer of the top blob to the diff pointer of the bottom blob
in Reshape, instead of sharing the other way around in Backward.

This prevents backpropagation from breaking when the previous
layer shares its top blob diff (i.e. this layer's bottom blob diff).
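
For reference, a minimal sketch of what this looks like in flatten_layer.cpp, mirroring ReshapeLayer and assuming the usual Blob::ShareData / Blob::ShareDiff calls. This is an illustration of the approach, not the exact diff in this PR:

#include <vector>

#include "caffe/layers/flatten_layer.hpp"

namespace caffe {

template <typename Dtype>
void FlattenLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  CHECK_NE(top[0], bottom[0]) << this->type() << " Layer does not "
      "allow in-place computation.";
  // Compute the flattened top shape from flatten_param's axis/end_axis,
  // exactly as FlattenLayer already does.
  const int start_axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.flatten_param().axis());
  const int end_axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.flatten_param().end_axis());
  vector<int> top_shape;
  for (int i = 0; i < start_axis; ++i) {
    top_shape.push_back(bottom[0]->shape(i));
  }
  top_shape.push_back(bottom[0]->count(start_axis, end_axis + 1));
  for (int i = end_axis + 1; i < bottom[0]->num_axes(); ++i) {
    top_shape.push_back(bottom[0]->shape(i));
  }
  top[0]->Reshape(top_shape);
  CHECK_EQ(top[0]->count(), bottom[0]->count());
  // New: share data and diff once here, the way ReshapeLayer does, so the
  // top blob simply aliases the bottom blob's buffers.
  top[0]->ShareData(*bottom[0]);
  top[0]->ShareDiff(*bottom[0]);
}

// Forward and Backward then have nothing left to do; in particular Backward
// no longer calls bottom[0]->ShareDiff(*top[0]), which is what currently
// re-points the bottom diff and breaks sharing set up by an earlier layer
// (e.g. RecurrentLayer's unrolled net).
template <typename Dtype>
void FlattenLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {}

template <typename Dtype>
void FlattenLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {}

}  // namespace caffe

Doing the sharing in Reshape also matches where ReshapeLayer sets it up, so nets that mix Flatten and Reshape behave consistently.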

Below is a case where backpropagation breaks in current master:
RecurrentLayer (LSTMLayer) shares its top blob diff with its unrolled net, but
a following Flatten layer breaks this connection during the backward pass.

import numpy as np
import tempfile
import caffe
from caffe import layers as L
from caffe import params as P

TESTCASE = 'flatten'

spec = caffe.NetSpec()
spec.inp = L.Input(shape=dict(dim=[2, 3, 4]))
spec.cont = L.Input(shape=dict(dim=[2, 3]))
spec.out = L.Input(shape=dict(dim=[2 * 3, 5]))
recurrent_param = dict(
    num_output=5,
    weight_filler=dict(type='xavier'),
    bias_filler=dict(type='constant', value=0.),
)
spec.lstm = L.LSTM(spec.inp, spec.cont, recurrent_param=recurrent_param)
if TESTCASE == 'flatten':
    spec.flat = L.Flatten(spec.lstm, flatten_param=dict(axis=0, end_axis=1))
else:
    # do the same with Reshape
    spec.flat = L.Reshape(spec.lstm, reshape_param=dict(shape=dict(dim=[-1, 5])))
spec.loss = L.EuclideanLoss(spec.flat, spec.out, loss_weight=1.)

model_name = None
# open in text mode so writing the prototxt string works on Python 3 as well
with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
    f.write(str(spec.to_proto()))
    model_name = f.name
net = caffe.Net(model_name, caffe.TRAIN)

net.blobs['inp'].data[...] = np.random.rand(2, 3, 4)
net.blobs['cont'].data[...] = np.array([[0., 0., 0.], [1., 1., 1.]])
net.blobs['out'].data[...] = np.random.rand(2 * 3, 5)

net.clear_param_diffs()
net.forward()
net.backward()

# LSTM parameter gradients:
# all zeros in current master for the 'flatten' test case,
# non-zero for the ReshapeLayer test case.
print(net.params['lstm'][0].diff)
