Issue with embeddings when using pre-trained models #1387

Closed
Rushab1 opened this issue Apr 6, 2019 · 5 comments

Rushab1 commented Apr 6, 2019

I get the following error when using the pre-trained model sum_transformer_model_acc_57.25_ppl_9.22_e16.pt: The size of tensor a (9515) must match the size of tensor b (5000) at non-singleton dimension 0. It does not happen consistently; translation works on certain datasets but not on others.

[2019-04-06 15:43:33,234 INFO] Translating shard 0.
/home1/r/rushab/.local/lib/python2.7/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  var = torch.tensor(arr, dtype=self.dtype, device=device)
Traceback (most recent call last):
  File "translate.py", line 48, in <module>
    main(opt)
  File "translate.py", line 32, in main
    attn_debug=opt.attn_debug
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/translate/translator.py", line 330, in translate
    batch, data.src_vocabs, attn_debug
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/translate/translator.py", line 520, in translate_batch
    return_attention=attn_debug or self.replace_unk)
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/translate/translator.py", line 612, in _translate_batch
    src, enc_states, memory_bank, src_lengths = self._run_encoder(batch)
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/translate/translator.py", line 527, in _run_encoder
    src, src_lengths)
  File "/home1/r/rushab/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/encoders/transformer.py", line 113, in forward
    emb = self.embeddings(src)
  File "/home1/r/rushab/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/modules/embeddings.py", line 241, in forward
    source = module(source, step=step)
  File "/home1/r/rushab/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nlpgridio2/nlp/users/rushab/PTSD/Science_NYT/Summarizer/OpenNMT-py/onmt/modules/embeddings.py", line 50, in forward
    emb = emb + self.pe[:emb.size(0)]
RuntimeError: The size of tensor a (9515) must match the size of tensor b (5000) at non-singleton dimension 0

Rushab1 commented Apr 9, 2019

Is there a fix for this? I looked at similar issues, but the suggested fixes didn't work for me.

@shane-carroll

You may have a very long sentence in your source file (9515 tokens, apparently). The positional encodings default to only 5000 positions.
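For context, here is a minimal sketch of how a sinusoidal positional encoding table with a fixed maximum length is typically precomputed and added to the embeddings, matching the emb + self.pe[:emb.size(0)] line in the traceback above. The class and argument names are illustrative rather than the exact OpenNMT-py implementation, and the model dimension is assumed to be even:

import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Illustrative sinusoidal positional encoding with a fixed table size."""

    def __init__(self, dim, max_len=5000):
        super().__init__()
        # Precompute a (max_len, 1, dim) table of sin/cos values once, at
        # construction time; positions beyond max_len simply do not exist.
        pe = torch.zeros(max_len, dim)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(1))

    def forward(self, emb):
        # emb: (seq_len, batch, dim). If seq_len > max_len, the slice below is
        # shorter than emb and the addition raises the size-mismatch error.
        return emb + self.pe[: emb.size(0)]

# A 9515-token input overruns the 5000-position table and reproduces the error:
# PositionalEncoding(dim=512)(torch.zeros(9515, 1, 512))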

@vince62s
Member

reopen if needed

@nick11roberts

Did anyone come up with a good way to deal with this aside from just skipping those sentences? I'm seeing the same issue with the English-German Transformer from http://opennmt.net/Models-py/ and the provided tokenized WMT data.

I'm running it as follows:

python3 translate.py --model averaged-10-epoch.pt --src train.en

and I get

RuntimeError: The size of tensor a (12131) must match the size of tensor b (5000) at non-singleton dimension 0
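In case it helps to confirm the cause, here is a quick illustrative check (not part of OpenNMT-py) for the longest whitespace-tokenized line in the source file:

# Report the longest (whitespace-tokenized) line in the source file.
with open("train.en", encoding="utf-8") as f:
    longest = max(len(line.split()) for line in f)
print("longest source line:", longest, "tokens")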

@1-800-BAD-CODE

The maximum length (5000) is a parameter fixed at training time; see the constructor of the positional encoding module in onmt/modules/embeddings.py that appears in the traceback above.

Short of retraining, you need to drop or break up any inputs longer than 5000 (sub)words; a sketch of that kind of preprocessing follows.
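A minimal sketch of such a preprocessing step, assuming whitespace-tokenized input; the 5000 limit and train.en come from this thread, while the output filename and the naive splitting strategy are just illustrative choices:

# Illustrative preprocessing: drop or chunk source lines longer than the
# positional-encoding limit before running translate.py.
MAX_LEN = 5000  # positions available in the pre-trained model

def chunks(tokens, size):
    """Split a token list into consecutive pieces of at most `size` tokens."""
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]

with open("train.en", encoding="utf-8") as src, \
        open("train.filtered.en", "w", encoding="utf-8") as out:
    for line in src:
        tokens = line.split()
        if len(tokens) <= MAX_LEN:
            out.write(line)
        else:
            # Alternatively, `continue` here to drop over-long lines entirely.
            for piece in chunks(tokens, MAX_LEN):
                out.write(" ".join(piece) + "\n")

Then point --src at the filtered file instead, e.g. python3 translate.py --model averaged-10-epoch.pt --src train.filtered.en. Note that splitting (rather than dropping) changes the line alignment with the original file, so you would need to keep track of the split lines if you want one translation per original sentence.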
