Pix2Story: Model won't build - type mismatch when building optimizers. #83

Closed
TomReidNZ opened this issue Feb 7, 2019 · 8 comments

@TomReidNZ

On a new clone of your repo, I can't get the model to train. There's a type mismatch in the updates when building the optimizer.

Running on conda on macOS (using CPU). I haven't modified any files; I only added the .txt file. I tried updating n_words and changing various settings in the config file, but no luck.

Any help would be much appreciated. Thanks, Tom

Error message:

Building optimizers...
Traceback (most recent call last):
File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/pfunc.py", line 193, in rebuild_collect_shared
allow_convert=False)
File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/tensor/type.py", line 234, in filter_variable
self=self))
TypeError: Cannot convert Type TensorType(float64, matrix) (of Variable Elemwise{add,no_inplace}.0) into Type TensorType(float32, matrix). You can try to manually convert Elemwise{add,no_inplace}.0 into a TensorType(float32, matrix).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "training.py", line 6, in <module>
EncTrainer.train()
File "/Users/tom/Documents/development/ailab/Pix2Story/source/training/train_encoder.py", line 40, in train
trainer(self.text, self.training_options)
File "/Users/tom/Documents/development/ailab/Pix2Story/source/skipthoughts_vectors/training/train.py", line 128, in trainer
f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
File "/Users/tom/Documents/development/ailab/Pix2Story/source/skipthoughts_vectors/encdec_functs/optim.py", line 40, in adam
f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/function.py", line 317, in function
output_keys=output_keys)
File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/pfunc.py", line 449, in pfunc
no_default_updates=no_default_updates)
File "/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

@TomReidNZ TomReidNZ changed the title Model won't build - type mismatch when building optimizers. Pix2Story: Model won't build - type mismatch when building optimizers. Feb 7, 2019
@ericmcmc
Contributor

Hello,

I think this could be caused by Theano's configuration. Judging by the error, the floatX parameter in Theano's config is probably set to float64 when it should be float32.

To check theano's config you can do:

import theano

print(theano.config)

To override the config for a single run, you can set the flag on the command line like this:

THEANO_FLAGS='floatX=float32' python training.py

More information here: http://deeplearning.net/software/theano/library/config.html
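
If setting the flag from the shell is inconvenient, the same override can probably be applied from Python, as long as it happens before Theano is imported (Theano reads THEANO_FLAGS at import time). The sketch below is only an illustration: `with_theano_flag` is a hypothetical helper, not part of Pix2Story or Theano, and it assumes THEANO_FLAGS is a plain comma-separated list of key=value pairs.

```python
import os


def with_theano_flag(flags, key, value):
    """Hypothetical helper: return a THEANO_FLAGS string with key=value set.

    Any earlier setting of the same key is dropped; other flags are kept.
    """
    prefix = key + "="
    kept = [part for part in flags.split(",") if part and not part.startswith(prefix)]
    kept.append(prefix + value)
    return ",".join(kept)


# Merge floatX=float32 into whatever THEANO_FLAGS already contains.
os.environ["THEANO_FLAGS"] = with_theano_flag(
    os.environ.get("THEANO_FLAGS", ""), "floatX", "float32"
)

# Import Theano only after the environment variable is set, for example:
#   import theano
#   print(theano.config.floatX)
```

Placing these lines at the very top of the entry script, before any module that imports Theano, should have the same effect as the command-line form above.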

If the problem persists, could you send more detailed information about Theano's config and the sentences you are passing to the net?

Regards

@ericmcmc
Contributor

Hello,
Did my answer solve your issue?
Thanks!

@mrcabellom

Can we close this issue? @TomReidNZ did you solve the problem?

@TomReidNZ
Author

TomReidNZ commented Mar 4, 2019

Now I get this error:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "training.py", line 6, in <module>
    EncTrainer.train()
  File "/home/zarmada/pix2story/Lab/source/training/train_encoder.py", line 40, in train
    trainer(self.text, self.training_options)
  File "/home/zarmada/pix2story/Lab/source/skipthoughts_vectors/training/train.py", line 150, in trainer
    cost = f_grad_shared(x, x_mask, y, y_mask, z, z_mask)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/vm.py", line 305, in __call__
    link.raise_with_op(node, thunk)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/vm.py", line 301, in __call__
    thunk()
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/gof/op.py", line 892, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/tensor/elemwise.py", line 790, in perform
    variables = ufunc(*ufunc_args, **ufunc_kwargs)
  File "/home/zarmada/anaconda3/envs/storytelling/lib/python3.5/site-packages/theano/scalar/basic.py", line 4023, in impl
    output_storage = [[None] for i in xrange(self.nout)]
SystemError: <class 'range'> returned a result with an error set
Apply node that caused the error: Elemwise{Composite{Switch(i0, ((i1 * i2) / i3), i2)}}[(0, 2)](InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of 5.0}, Elemwise{Add}[(0, 1)].0, InplaceDimShuffle{x,x}.0)
Toposort index: 741
Inputs types: [TensorType(bool, (True, True)), TensorType(float32, (True, True)), TensorType(float32, matrix), TensorType(float32, (True, True))]
Inputs shapes: [(1, 1), (1, 1), (4800, 20000), (1, 1)]
Inputs strides: [(1, 1), (4, 4), (80000, 4), (4, 4)]
Inputs values: [array([[ True]], dtype=bool), array([[ 5.]], dtype=float32), 'not shown', array([[ 44.09825897]], dtype=float32)]
Outputs clients: [['output']]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

@ericmcmc
Contributor

ericmcmc commented Mar 4, 2019

Hello @TomReidNZ,

Could you post the whole output you get when you execute training.py, along with some samples of the list of sentences you are passing to the net?

Regards

@gsegares
Contributor

gsegares commented Mar 5, 2019

Hi @ericmcmc, this is the output of the error. We created this text model using these books for the test.

@ericmcmc
Contributor

ericmcmc commented Mar 5, 2019

Hi @TomReidNZ and @gsegares,

Judging by the output, this could be caused by the version of Theano you are using. Could you try updating Theano to version 1.0.3?

Please let me know whether you can train the models using the original code with Theano 1.0.3.
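
To confirm which Theano version the environment actually picks up, a quick check along these lines may help. It is a rough sketch rather than anything from the repo: `version_tuple`, `is_at_least` and `report_theano_version` are hypothetical names, and the comparison only looks at the leading dotted-number part of the version string.

```python
import re


def version_tuple(version):
    """Turn '1.0.3' (or '1.0.3+something') into (1, 0, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, required="1.0.3"):
    """True if the installed version is the required one or newer."""
    return version_tuple(installed) >= version_tuple(required)


def report_theano_version(required="1.0.3"):
    """Print the Theano version found in the active environment."""
    import theano  # imported here so the helpers above work without Theano

    ok = is_at_least(theano.__version__, required)
    print("Theano %s (%s)" % (theano.__version__, "ok" if ok else "older than " + required))
```

Calling `report_theano_version()` inside the storytelling conda environment should show whether the update took effect.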

Regards

@gsegares
Contributor

gsegares commented Mar 6, 2019

Hi @ericmcmc, we switched the NC6 VM we were using in Azure to a deep learning template with all the GPU packages included, and we no longer get that error. We also changed the conda file to pin the specific version of Theano. We are still having issues with some components missing from the repo (like paths['v_expansion'] = '../models/GoogleNews-vectors-negative300.bin'), but that's a different problem. Your suggestion of using the Theano flag to fix the data type mismatch worked. I think we can close this issue. Thanks.
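
For the missing components, a small pre-flight check before training might make the gaps easier to spot. This is only a sketch: `find_missing` is a hypothetical helper, and the `paths` dictionary below contains just the one entry quoted above, not the repo's full configuration.

```python
import os


def find_missing(paths):
    """Return {key: path} for every configured path that does not exist on disk."""
    return {key: path for key, path in paths.items() if not os.path.exists(path)}


# Only this entry is taken from the comment above; the real config has more keys.
paths = {"v_expansion": "../models/GoogleNews-vectors-negative300.bin"}

for key, path in sorted(find_missing(paths).items()):
    print("missing %s: %s" % (key, os.path.abspath(path)))
```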

@fpelaez fpelaez closed this as completed May 30, 2019