
Error running gpt2_generate_main.py #147

Closed
Akella17 opened this issue May 5, 2019 · 7 comments

Comments

Akella17 commented May 5, 2019

When I try to run gpt2_generate_main.py, I get the following error:

ValueError: The shape for transformer_decoder_1/transformer_decoder/while/Merge_27:0 is not an invariant for the loop. It enters the loop with shape (1, 768), but has shape (?, 768) after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.

Also, how can I use this model for conditioned text generation tasks? I am working on a reading comprehension task that takes a single input stream (Passage + ": " + Question + "? " + Answer), and I use a custom mask so that the loss is computed only between the answer-start index and the sequence length. Is there a more elegant way to do this?

Here is the full traceback:

Traceback (most recent call last):
  File "gpt2_generate_main.py", line 210, in <module>
    tf.app.run()
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "gpt2_generate_main.py", line 144, in main
    mode=tf.estimator.ModeKeys.PREDICT)
  File "/home1/deepak/RaviTej/texar/texar/module_base.py", line 116, in __call__
    return self._template(*args, **kwargs)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 455, in __call__
    result = self._call_func(args, kwargs)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 406, in _call_func
    result = self._func(*args, **kwargs)
  File "/home1/deepak/RaviTej/texar/texar/modules/decoders/transformer_decoders.py", line 569, in _build
    scope=self.variable_scope)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 309, in dynamic_decode
    swap_memory=swap_memory)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3202, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2940, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2914, in _BuildLoop
    next_vars.append(_AddNextAndBackEdge(m, v))
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 688, in _AddNextAndBackEdge
    _EnforceShapeInvariant(m, v)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 632, in _EnforceShapeInvariant
    (merge_var.name, m_shape, n_shape))
ValueError: The shape for transformer_decoder_1/transformer_decoder/while/Merge_27:0 is not an invariant for the loop. It enters the loop with shape (1, 768), but has shape (?, 768) after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.

originally defined at:
  File "gpt2_generate_main.py", line 133, in main
    hparams=gpt2_config.decoder)
  File "/home1/deepak/RaviTej/texar/texar/modules/decoders/transformer_decoders.py", line 98, in __init__
    ModuleBase.__init__(self, hparams)
  File "/home1/deepak/RaviTej/texar/texar/module_base.py", line 73, in __init__
    create_scope_now_=True)
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 153, in make_template
    **kwargs)
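
For context, the shape_invariants mechanism the error message points to works roughly as in the following sketch. This is only an illustration of the TF API, not the actual Texar/seq2seq internals:

    import tensorflow as tf

    x = tf.zeros([1, 768])
    i = tf.constant(0)

    def cond(i, x):
        return i < 3

    def body(i, x):
        # Concatenation changes the leading dimension on every iteration,
        # so the shape of this loop variable is not invariant.
        return i + 1, tf.concat([x, x], axis=0)

    # Declaring TensorShape([None, 768]) tells the loop that the first
    # dimension is allowed to vary across iterations.
    _, result = tf.while_loop(
        cond, body, loop_vars=[i, x],
        shape_invariants=[i.get_shape(), tf.TensorShape([None, 768])])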

ZhitingHu (Member) commented:

Which TF version are you using?

To train conditional generation, you'd need a custom mask.

Akella17 (Author) commented May 5, 2019

tensorflow-gpu 1.7.0

So when you say custom mask, do you mean selectively masking the loss so that it is computed only over the segments of the input that the model is expected to learn to predict?

ZhitingHu (Member) commented:

Could you upgrade to tf>=1.12 and try?

Yes. You may want to use the mask function reduce_with_weights (in a forked repo). Just set weights to your mask.

ZhitingHu (Member) commented:

Here is a reference code snippet to mask the loss:

    # `ids`: token ids of the full sequence; `full_len`: its length;
    # `prefix_len`: length of the conditioning prefix to exclude from the loss.
    # Per-position (unreduced) cross-entropy over the shifted sequence.
    loss = tx.losses.sequence_sparse_softmax_cross_entropy(
        labels=ids[:, 1:],
        logits=logits[:, :-1, :],
        sequence_length=full_len-1,
        average_across_timesteps=False,
        sum_over_timesteps=False,
        average_across_batch=False,
        sum_over_batch=False)
    # Mask out padding positions beyond each sequence's length ...
    mask = tf.sequence_mask(
        full_len-1,
        dtype=tf.float32)
    # ... and zero out the prefix positions, so that only the target segment
    # (e.g., the answer) contributes to the loss.
    mask_prefix = 1 - tf.sequence_mask(
        prefix_len-1,
        maxlen=tf.reduce_max(full_len)-1,
        dtype=tf.float32)
    mask = mask * mask_prefix
    # Average the masked loss over the remaining (unmasked) positions.
    loss = tx.utils.reduce_with_weights(
        tensor=loss,
        weights=mask,
        average_across_remaining=True,
        sum_over_remaining=False)
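
In the reading-comprehension setup described above, prefix_len would be the number of tokens in the conditioning part (Passage + ": " + Question + "? ") and full_len the length of the whole stream, so that only the answer tokens contribute to the loss. A hypothetical per-example sketch (the whitespace tokenizer and names below are illustrative, not from the repo):

    # Toy example: derive the two lengths that feed the mask above.
    passage = "Paris is the capital of France"
    question = "What is the capital of France"
    answer = "Paris"

    prefix_tokens = (passage + ": " + question + "? ").split()
    answer_tokens = answer.split()

    prefix_len = len(prefix_tokens)                      # masked out of the loss
    full_len = len(prefix_tokens) + len(answer_tokens)   # total sequence length
    # Batched versions of these values would be fed as the `prefix_len`
    # and `full_len` tensors used in the snippet above.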

Akella17 (Author) commented May 5, 2019

Hey, updating TF solved this issue. Thanks for sharing the code snippet for masking!

Akella17 closed this as completed May 5, 2019
Akella17 reopened this May 6, 2019
Akella17 (Author) commented May 6, 2019

I came across a new error after upgrading to TF 1.12. The _map_tensor_names() function in utils/model_utils.py raises the following error when I load a fine-tuned GPT-2 checkpoint:

ValueError: invalid literal for int() with base 10: 'ayer_9'

The input argument original_tensor_name = "transformer_decoder/layer_9/self_attention/multihead_attention/value/kernel" is what causes the error.

Traceback (most recent call last):
  File "gpt2_generate_main.py", line 210, in <module>
    tf.app.run()
  File "/home1/deepak/anaconda/envs/nlp_proj/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "gpt2_generate_main.py", line 147, in main
    model_utils.init_gpt2_checkpoint(sess, ckpt_path)
  File "/home1/deepak/RaviTej/gpt-2/utils/model_utils.py", line 192, in init_gpt2_checkpoint
    init_checkpoint)
  File "/home1/deepak/RaviTej/gpt-2/utils/model_utils.py", line 176, in _get_assignment_map_from_checkpoint
    local_tensor_name = _map_tensor_names(ckpt_tensor_name)
  File "/home1/deepak/RaviTej/gpt-2/utils/model_utils.py", line 99, in _map_tensor_names
    layer_num = int(original_tensor_name_split[1][1:])
ValueError: invalid literal for int() with base 10: 'ayer_9'

This problem does not occur when I load the original GPT-2 checkpoint, so the models saved by gpt2_train_main.py cannot be loaded by gpt2_generate_main.py.
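
For what it's worth, the failure can be reproduced just from the name splitting shown in the traceback. This is an illustrative reconstruction, not the exact model_utils.py code:

    # The mapping apparently expects OpenAI-style variable names such as
    # "model/h9/...", where parts[1][1:] would be the layer number "9".
    name = "transformer_decoder/layer_9/self_attention/multihead_attention/value/kernel"
    parts = name.split("/")
    print(parts[1])      # 'layer_9'
    print(parts[1][1:])  # 'ayer_9'  -> int('ayer_9') raises the ValueError above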

ZhitingHu (Member) commented:

gpt2_generate_main.py has been updated. You can now load a saved checkpoint by specifying --checkpoint, or the original OpenAI checkpoint by specifying --pretrain_checkpoint.
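
For example (the paths below are placeholders, not paths from the repo):

    # Placeholder paths, for illustration only.
    python gpt2_generate_main.py --checkpoint=output/model.ckpt
    python gpt2_generate_main.py --pretrain_checkpoint=gpt2_pretrained_models/model_117M/model.ckpt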
