TypeError: An update must have the same type as the original shared variable #728

rubenvereecken opened this issue Jul 29, 2016 · 12 comments

@rubenvereecken
Contributor

First off, this is not a usage question. This is, I believe, a bug report.

I got this issue using any 2D convolutional network with a channel depth of 1. Any higher number is fine and does not give me the following exception:

TypeError: ('An update must have the same type as the original shared variable (shared_var=<CudaNdarrayType(float32, (False, True, False, False))>, shared_var.type=CudaNdarrayType(float32, (False, True, False, False)), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float32, 4D)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

I think the exception is indeed about the broadcastable dimension at index=1.

Printing out the variable that gave me the error, it's apparently the weight tensor W of the convolutional layer. It looks like this: <CudaNdarrayType(float32, (False, True, False, False))>.

If I change the channel depth (first dimension) to 2 and look at W again, it is simply <CudaNdarrayType(float32, 4D)>.

Now, I believe the seasoned Theano user will immediately think something along the lines of "yep, it's definitely that broadcastable dimension messing things up", but that doesn't change the fact that this is unexpected and probably unwanted, though I might be missing a use case for this behavior.

During my struggles yesterday I worked around this by adding a reshape layer that did absolutely nothing: it simply reshaped to exactly the same shape. My next attempt will be to follow the advice in the exception and unbroadcast that dimension.
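
For reference, a minimal sketch of the kind of mismatch I mean, independent of Lasagne (the shapes and the learning rate here are made up, and on GPU the type would be a CudaNdarrayType rather than a TensorType):

import numpy as np
import theano
import theano.tensor as T

# weight tensor with a single input channel: axis 1 is marked broadcastable
W = theano.shared(np.zeros((8, 1, 3, 3), dtype='float32'),
                  broadcastable=(False, True, False, False))
grad = T.tensor4('grad')        # plain 4D tensor, no broadcastable axes
update = W - 0.01 * grad        # the result has lost the broadcast pattern

# this raises the TypeError quoted above:
# theano.function([grad], [], updates=[(W, update)])

# restoring the pattern on the update makes it compile:
update = T.patternbroadcast(update, W.broadcastable)
f = theano.function([grad], [], updates=[(W, update)])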

@rubenvereecken
Contributor Author

I am currently supplying my own weights W of exactly the same shape (checked by calling get_W_shape on the original) and that works fine too; the result is also a <CudaNdarrayType(float32, 4D)>.

@ebenolson
Member

Can you post a minimal script to replicate the error?

@f0k
Member

f0k commented Jul 29, 2016

That's annoying... it's possible this was introduced with #715, but #143 should allow this. A minimal script would be very helpful!

@rubenvereecken
Contributor Author

It took me a bit to slim it down sufficiently, but I hope it's now good enough to demonstrate the bug: https://gist.github.com/rubenvereecken/d9277cf861d1c297d953d4c27edcdd02

Change num_frames to 2 and you get a different bug related to the contrived example, but the fact remains that for a value of 1 you'll see the exception mentioned above.

@ebenolson
Member

looks like the pylearn2 cuda_convnet wrapper might need a fix like this one? Theano/Theano#3774

@f0k
Member

f0k commented Jul 29, 2016

looks like the pylearn2 cuda_convnet wrapper might need a fix like this one?

Very good, that seems spot on! So #715 didn't do anything wrong, but it uncovered a bug in pylearn2.

@rubenvereecken: You can file a PR to pylearn2 and hope they merge it (it's not in active development any more). The method in question is https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/sandbox/cuda_convnet/img_acts.py#L157, and the same in weight_acts.py and filter_acts.py.

A workaround in Lasagne would be to enforce the broadcast pattern on the update, but this might hide other bugs, so we shouldn't implement it. Anyway, if you want to do this in your own code, it would be as simple as:

import theano.tensor as T

def fix_update_bcasts(updates):
    # make each update's broadcast pattern match its shared variable
    for param, update in updates.items():
        if param.broadcastable != update.broadcastable:
            updates[param] = T.patternbroadcast(update, param.broadcastable)
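
For example, you would apply it to the updates dictionary right before compiling (loss, params, input_var and target_var are placeholders for whatever you already have):

updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)
fix_update_bcasts(updates)
train_fn = theano.function([input_var, target_var], loss, updates=updates)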

@nouiz

nouiz commented Aug 19, 2016

Just bumping this again as I saw this problem elsewhere too.

@f0k
Member

f0k commented Aug 22, 2016

Just bumping this again as I saw this problem elsewhere too.

Elsewhere in pylearn2 or elsewhere altogether? I believe the best solution would be to fix this particular instance in pylearn2. We probably don't want to ignore this kind of problem in either Theano or Lasagne.

@nouiz

nouiz commented Aug 22, 2016

I don't know anyone using Pylearn2 anymore. It was with Lasagne:

https://github.com/MarcCote/sb_resnet/blob/master/sb/sb_resnet.py#L133

The fix in that repo:

MarcCote/sb_resnet@41074fa

@f0k
Member

f0k commented Aug 22, 2016

I don't know anyone using Pylearn2 anymore.

Yes, the OP does, for cuda-convnet. The wrapper doesn't return the correct broadcast pattern for the gradient; that's why the problem crept up.

The fix in that repo:

MarcCote/sb_resnet@41074fa

That was a mistake in their ADAM implementation, then; Lasagne does it correctly:
https://github.com/Lasagne/Lasagne/blob/master/lasagne/updates.py#L595
Again, this was only uncovered because we have been setting a broadcast pattern on the network weights since #715. I'm sorry this introduced complications, but network parameters were never guaranteed to be fully unbroadcastable, so I didn't expect any negative implications from #715. We have specifically supported broadcastable parameters since #143; we just never created them in Lasagne ourselves.

If this turns out to affect many users, the easiest solution may be an extra keyword argument to theano.function that allows this error to be turned into a warning or ignored.

@chandanmishra-03

TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

import numpy
import theano
import theano.tensor as tensor

def adam(lr, tparams, grads, inp, cost):
    gshared = [theano.shared(p.get_value() * 0., name='%s_grad' % k)
               for k, p in tparams.iteritems()]
    gsup = [(gs, g) for gs, g in zip(gshared, grads)]

    f_grad_shared = theano.function(inp, cost, updates=gsup, profile=False)

    b1 = 0.1
    b2 = 0.001
    e = 1e-8

    updates = []

    i = theano.shared(numpy.float32(0.))
    i_t = i + 1.
    fix1 = 1. - b1**(i_t)
    fix2 = 1. - b2**(i_t)
    lr_t = lr * (tensor.sqrt(fix2) / fix1)

    for p, g in zip(tparams.values(), gshared):
        m = theano.shared(p.get_value() * 0.)
        v = theano.shared(p.get_value() * 0.)
        m_t = (b1 * g) + ((1. - b1) * m)
        v_t = (b2 * tensor.sqr(g)) + ((1. - b2) * v)
        g_t = m_t / (tensor.sqrt(v_t) + e)
        p_t = p - (lr_t * g_t)
        updates.append((m, m_t))
        updates.append((v, v_t))
        updates.append((p, p_t))
    updates.append((i, i_t))

    f_update = theano.function([lr], [], updates=updates,
                               on_unused_input='raise', profile=False)

    return f_grad_shared, f_update

I'm getting the error on the line "f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)". Please help me fix it.

@f0k
Member

f0k commented Jul 30, 2019

If you look closely at the error message, the first type is "TensorType(float32, matrix)" and the second is "TensorType(float64, matrix)". You can't tell from the message, but "matrix" is shorthand for a tensor with two non-broadcastable dimensions -- so the broadcast pattern is the same for both. (Theano could actually check this and omit the hint about the broadcast pattern; it is misleading here.) What's different, though, is the dtype: float32 vs. float64. Somewhere you are either starting with a float64, or a float32 is upcast to a float64. A possible source of upcasting is an operation that involves a float32 and a numpy integer or float64.
To find the problem more quickly, you can add an assert for every update you register:

assert m.dtype == m_t.dtype
assert v.dtype == v_t.dtype
assert p.dtype == p_t.dtype
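
If one of those asserts fires, a blunt workaround (just a sketch; it hides the upcast rather than removing it) is to cast each new value back to the dtype of its shared variable before registering the update:

# force each new value back to the dtype of its shared variable
m_t = tensor.cast(m_t, m.dtype)
v_t = tensor.cast(v_t, v.dtype)
p_t = tensor.cast(p_t, p.dtype)

It is still better to track down the actual float64 source (for example, an lr input declared as a float64 scalar) and fix it there.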

Any reason why you cannot just use lasagne.updates.adam(gshared, list(tparams.values()))?
