
Bug: gradient of convolution in newest theano from master #3763

Closed

matthias-k opened this issue Dec 10, 2015 · 4 comments

@matthias-k
Contributor

I found a strange problem with the newest Theano from master. I implemented a Gaussian convolution. In Theano 0.7.0, taking the gradient of the output with respect to the kernel size works perfectly; however, with the newest master (0.7.0.dev-e521b20e578c033d51e548181bd1edd24af64427) I get an exception: ValueError: ('You cannot drop a non-broadcastable dimension.', ((True, False, False, False), (2, 3)))

Here is a minimal example to reproduce the problem:

import numpy as np
import theano
import theano.tensor as T

def gaussian_filter_theano_1d(input, sigma, window_radius=10):
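    # 1-D Gaussian kernel over [-window_radius, window_radius], normalized to sum to 1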
    filter_1d = T.arange(-window_radius, window_radius+1)
    filter_1d = filter_1d.astype(theano.config.floatX)
    filter_1d = T.exp(-0.5*filter_1d**2/sigma**2)
    filter_1d = filter_1d / filter_1d.sum()

    filter_W = filter_1d.dimshuffle(['x', 'x', 0, 'x'])
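    # filter_W has shape (1, 1, 2*window_radius+1, 1); the 'x' axes are broadcastable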

    blur_op = T.nnet.conv2d(input, filter_W, border_mode='full', filter_shape=[1, 1, None, None])
    return blur_op

x1  = T.tensor4('x')
x1_data = np.random.randn(1, 1, 300, 300)
sigma = T.scalar('sigma')
sigma_data = 20

y = gaussian_filter_theano_1d(x1, sigma, window_radius=3)
print(y.eval({x1: x1_data, sigma: sigma_data}))
T.grad(y.sum(), sigma)

It fails (with both Python 2.7 and Python 3.4) in the last line as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-2e2d8647467e> in <module>()
     23 y = gaussian_filter_theano_1d(x1, sigma, window_radius=3)
     24 print(y.eval({x1: x1_data, sigma: sigma_data}))
---> 25 T.grad(y.sum(), sigma)

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in grad(cost, wrt, consider_constant, disconnected_inputs, add_names, known_grads, return_disconnected, null_gradients)
    559 
    560     rval = _populate_grad_dict(var_to_app_to_idx,
--> 561                                grad_dict, wrt, cost_name)
    562 
    563     for i in xrange(len(rval)):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in _populate_grad_dict(var_to_app_to_idx, grad_dict, wrt, cost_name)
   1322         return grad_dict[var]
   1323 
-> 1324     rval = [access_grad_cache(elem) for elem in wrt]
   1325 
   1326     return rval

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in <listcomp>(.0)
   1322         return grad_dict[var]
   1323 
-> 1324     rval = [access_grad_cache(elem) for elem in wrt]
   1325 
   1326     return rval

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_grad_cache(var)
   1277                     for idx in node_to_idx[node]:
   1278 
-> 1279                         term = access_term_cache(node)[idx]
   1280 
   1281                         if not isinstance(term, gof.Variable):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_term_cache(node)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in <listcomp>(.0)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_grad_cache(var)
   1277                     for idx in node_to_idx[node]:
   1278 
-> 1279                         term = access_term_cache(node)[idx]
   1280 
   1281                         if not isinstance(term, gof.Variable):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_term_cache(node)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in <listcomp>(.0)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_grad_cache(var)
   1277                     for idx in node_to_idx[node]:
   1278 
-> 1279                         term = access_term_cache(node)[idx]
   1280 
   1281                         if not isinstance(term, gof.Variable):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_term_cache(node)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in <listcomp>(.0)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_grad_cache(var)
   1277                     for idx in node_to_idx[node]:
   1278 
-> 1279                         term = access_term_cache(node)[idx]
   1280 
   1281                         if not isinstance(term, gof.Variable):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_term_cache(node)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in <listcomp>(.0)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_grad_cache(var)
   1277                     for idx in node_to_idx[node]:
   1278 
-> 1279                         term = access_term_cache(node)[idx]
   1280 
   1281                         if not isinstance(term, gof.Variable):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_term_cache(node)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in <listcomp>(.0)
    971             inputs = node.inputs
    972 
--> 973             output_grads = [access_grad_cache(var) for var in node.outputs]
    974 
    975             # list of bools indicating if each output is connected to the cost

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_grad_cache(var)
   1277                     for idx in node_to_idx[node]:
   1278 
-> 1279                         term = access_term_cache(node)[idx]
   1280 
   1281                         if not isinstance(term, gof.Variable):

/usr/local/lib/python3.4/dist-packages/theano/gradient.py in access_term_cache(node)
   1111                                 str(g_shape))
   1112 
-> 1113                 input_grads = node.op.grad(inputs, new_output_grads)
   1114 
   1115                 if input_grads is None:

/usr/local/lib/python3.4/dist-packages/theano/tensor/elemwise.py in grad(self, inp, grads)
    410             return [inp[0].zeros_like(dtype=theano.config.floatX)]
    411         else:
--> 412             return [DimShuffle(gz.type.broadcastable, grad_order)(
    413                 Elemwise(scalar.identity)(gz))]
    414 

/usr/local/lib/python3.4/dist-packages/theano/tensor/elemwise.py in __init__(self, input_broadcastable, new_order, inplace)
    162                     raise ValueError(
    163                         "You cannot drop a non-broadcastable dimension.",
--> 164                         (input_broadcastable, new_order))
    165 
    166         # this is the list of the original dimensions that we keep

ValueError: ('You cannot drop a non-broadcastable dimension.', ((True, False, False, False), (2,)))
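
For context, the traceback shows the error coming from the gradient of a DimShuffle: a dimshuffle may only drop dimensions that are broadcastable. A minimal sketch of that rule (the variable names and axes here are just for illustration):

import theano.tensor as T

v = T.tensor4('v')           # broadcastable = (False, False, False, False)
# v.dimshuffle(0, 1)         # would raise: you cannot drop a non-broadcastable dimension
b = T.addbroadcast(v, 2, 3)  # assert that dims 2 and 3 have length 1
w = b.dimshuffle(0, 1)       # dropping broadcastable dims is allowed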
@matthias-k
Contributor Author

I ran git bisect from 0.7.0 to HEAD and found that the bug was introduced in bcc9336, i.e. with the switch to abstract_conv_2d.

@nouiz
Member

nouiz commented Dec 11, 2015

I think you can use theano.tensor.nnet.conv.conv2d until we fix this.

Thanks for the report.
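
For reference, a minimal sketch of that workaround applied to the example above (untested; it assumes the legacy interface accepts the same inputs here, and drops the optional filter_shape hint):

from theano.tensor.nnet import conv

# inside gaussian_filter_theano_1d, replace the T.nnet.conv2d call with:
blur_op = conv.conv2d(input, filter_W, border_mode='full')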


@nouiz
Member

nouiz commented Dec 15, 2015

Just to let you know that we merged the fix.

Thanks for the report.
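
With the fix merged, the original example should run end to end. A quick check, assuming the names from the repro snippet above are still in scope:

g = T.grad(y.sum(), sigma)                       # no longer raises
print(g.eval({x1: x1_data, sigma: sigma_data}))  # scalar gradient w.r.t. sigma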

@matthias-k
Contributor Author

Great, thanks for the quick bugfix!
