
Error running gpt2_generate_main.py #147

Closed
Akella17 opened this issue May 5, 2019 · 7 comments

Comments

Akella17 commented May 5, 2019

When I try to run gpt2_generate_main.py, I get the following error:

ValueError: The shape for transformer_decoder_1/transformer_decoder/while/Merge_27:0 is not an invariant for the loop. It enters the loop with shape (1, 768), but has shape (?, 768) after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.

Also, how can I use this model for conditioned text generation tasks? I am working on a reading comprehension task that takes a single input stream (Passage + ": " + Question + "? " + Answer), and I use a custom mask so that the loss is computed only between the answer-start index and the sequence length. Is there a more elegant way to do this?

Here is the full traceback:

Traceback (most recent call last):
  File "gpt2_generate_main.py", line 210, in <module>
    tf.app.run()
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "gpt2_generate_main.py", line 144, in main
    mode=tf.estimator.ModeKeys.PREDICT)
  File "/home1/deepak/RaviTej/texar/texar/module_base.py", line 116, in __call__
    return self._template(*args, **kwargs)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 455, in __call__
    result = self._call_func(args, kwargs)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 406, in _call_func
    result = self._func(*args, **kwargs)
  File "/home1/deepak/RaviTej/texar/texar/modules/decoders/transformer_decoders.py", line 569, in _build
    scope=self.variable_scope)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 309, in dynamic_decode
    swap_memory=swap_memory)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3202, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2940, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2914, in _BuildLoop
    next_vars.append(_AddNextAndBackEdge(m, v))
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 688, in _AddNextAndBackEdge
    _EnforceShapeInvariant(m, v)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 632, in _EnforceShapeInvariant
    (merge_var.name, m_shape, n_shape))
ValueError: The shape for transformer_decoder_1/transformer_decoder/while/Merge_27:0 is not an invariant for the loop. It enters the loop with shape (1, 768), but has shape (?, 768) after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.

originally defined at:
  File "gpt2_generate_main.py", line 133, in main
    hparams=gpt2_config.decoder)
  File "/home1/deepak/RaviTej/texar/texar/modules/decoders/transformer_decoders.py", line 98, in __init__
    ModuleBase.__init__(self, hparams)
  File "/home1/deepak/RaviTej/texar/texar/module_base.py", line 73, in __init__
    create_scope_now_=True)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 153, in make_template
    **kwargs)
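
For context, the shape_invariants mechanism the error message points to works roughly as in the following sketch. This is only an illustration of the TF API, not the actual Texar/seq2seq internals:

    import tensorflow as tf

    x = tf.zeros([1, 768])
    i = tf.constant(0)

    def cond(i, x):
        return i < 3

    def body(i, x):
        # Concatenation changes the leading dimension on every iteration,
        # so the shape of this loop variable is not invariant.
        return i + 1, tf.concat([x, x], axis=0)

    # Declaring TensorShape([None, 768]) tells the loop that the first
    # dimension is allowed to vary across iterations.
    _, result = tf.while_loop(
        cond, body, loop_vars=[i, x],
        shape_invariants=[i.get_shape(), tf.TensorShape([None, 768])])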

ZhitingHu (Member) commented:

Which TF version are you using?

To train conditional generation, you'd need a custom mask.

Akella17 (Author) commented May 5, 2019

tensorflow-gpu 1.7.0

So when you say custom mask, do you mean selectively masking the loss so that it is computed only over the segments of the input that the model is expected to learn to predict?

ZhitingHu (Member) commented:

Could you upgrade to tf>=1.12 and try?

Yes. You may want to use the mask function reduce_with_weights (in a forked repo). Just set weights to your mask.

ZhitingHu (Member) commented:

Here is a reference code snippet to mask the loss:

    # `ids`: token ids of the full sequence; `full_len`: its length;
    # `prefix_len`: length of the conditioning prefix to exclude from the loss.
    # Per-position (unreduced) cross-entropy over the shifted sequence.
    loss = tx.losses.sequence_sparse_softmax_cross_entropy(
        labels=ids[:, 1:],
        logits=logits[:, :-1, :],
        sequence_length=full_len-1,
        average_across_timesteps=False,
        sum_over_timesteps=False,
        average_across_batch=False,
        sum_over_batch=False)
    # Mask out padding positions beyond each sequence's length ...
    mask = tf.sequence_mask(
        full_len-1,
        dtype=tf.float32)
    # ... and zero out the prefix positions, so that only the target segment
    # (e.g., the answer) contributes to the loss.
    mask_prefix = 1 - tf.sequence_mask(
        prefix_len-1,
        maxlen=tf.reduce_max(full_len)-1,
        dtype=tf.float32)
    mask = mask * mask_prefix
    # Average the masked loss over the remaining (unmasked) positions.
    loss = tx.utils.reduce_with_weights(
        tensor=loss,
        weights=mask,
        average_across_remaining=True,
        sum_over_remaining=False)
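
In the reading-comprehension setup described above, prefix_len would be the number of tokens in the conditioning part (Passage + ": " + Question + "? ") and full_len the length of the whole stream, so that only the answer tokens contribute to the loss. A hypothetical per-example sketch (the whitespace tokenizer and names below are illustrative, not from the repo):

    # Toy example: derive the two lengths that feed the mask above.
    passage = "Paris is the capital of France"
    question = "What is the capital of France"
    answer = "Paris"

    prefix_tokens = (passage + ": " + question + "? ").split()
    answer_tokens = answer.split()

    prefix_len = len(prefix_tokens)                      # masked out of the loss
    full_len = len(prefix_tokens) + len(answer_tokens)   # total sequence length
    # Batched versions of these values would be fed as the `prefix_len`
    # and `full_len` tensors used in the snippet above.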

Akella17 (Author) commented May 5, 2019

Hey, updating TF solved this issue. Thanks for sharing the code snippet for masking!

Akella17 closed this as completed May 5, 2019
Akella17 reopened this May 6, 2019
Akella17 (Author) commented May 6, 2019

I came across a new error after upgrading to TF 1.12. The _map_tensor_names() function in utils/model_utils.py raises the following error when I load a fine-tuned GPT-2 checkpoint:

ValueError: invalid literal for int() with base 10: 'ayer_9'

The input argument original_tensor_name = "transformer_decoder/layer_9/self_attention/multihead_attention/value/kernel" is what causes the error.

Traceback (most recent call last):
  File "gpt2_generate_main.py", line 210, in <module>
    tf.app.run()
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "gpt2_generate_main.py", line 147, in main
    model_utils.init_gpt2_checkpoint(sess, ckpt_path)
  File "/home1/deepak/RaviTej/gpt-2/utils/model_utils.py", line 192, in init_gpt2_checkpoint
    init_checkpoint)
  File "/home1/deepak/RaviTej/gpt-2/utils/model_utils.py", line 176, in _get_assignment_map_from_checkpoint
    local_tensor_name = _map_tensor_names(ckpt_tensor_name)
  File "/home1/deepak/RaviTej/gpt-2/utils/model_utils.py", line 99, in _map_tensor_names
    layer_num = int(original_tensor_name_split[1][1:])
ValueError: invalid literal for int() with base 10: 'ayer_9'

This problem does not occur when I load the original GPT-2 checkpoint, so the models saved by gpt2_train_main.py cannot be loaded by gpt2_generate_main.py.
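
For what it's worth, the failure can be reproduced just from the name splitting shown in the traceback. This is an illustrative reconstruction, not the exact model_utils.py code:

    # The mapping apparently expects OpenAI-style variable names such as
    # "model/h9/...", where parts[1][1:] would be the layer number "9".
    name = "transformer_decoder/layer_9/self_attention/multihead_attention/value/kernel"
    parts = name.split("/")
    print(parts[1])      # 'layer_9'
    print(parts[1][1:])  # 'ayer_9'  -> int('ayer_9') raises the ValueError above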

ZhitingHu (Member) commented:

gpt2_generate_main.py has been updated. You can now load a saved checkpoint by specifying --checkpoint, or the original OpenAI checkpoint by specifying --pretrain_checkpoint.
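
For example (the paths below are placeholders, not paths from the repo):

    # Placeholder paths, for illustration only.
    python gpt2_generate_main.py --checkpoint=output/model.ckpt
    python gpt2_generate_main.py --pretrain_checkpoint=gpt2_pretrained_models/model_117M/model.ckpt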
