Issue with embeddings when using pre-trained models #1387
Comments
Is there a fix for this? I saw similar issues, but their suggestions didn't really work for me.
You may have a very long sentence in your source file (9515 tokens, apparently). The positional encodings default to only 5000 positions.
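For context, the failure mode looks roughly like the sketch below. This is a simplified stand-in, not the actual OpenNMT-py constructor: a sinusoidal position table is built once with a fixed number of rows (5000 here, matching the default mentioned above), so an input with more tokens than that cannot be added to it.

```python
import math
import torch

def build_positional_encoding(dim, max_len=5000):
    # Precompute a fixed table of sinusoidal encodings, one row per position.
    pe = torch.zeros(max_len, dim)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # shape: (max_len, dim)

pe = build_positional_encoding(dim=512)   # table covers positions 0..4999 only
emb = torch.randn(9515, 512)              # embeddings for a 9515-token sentence
# pe[:emb.size(0)] still has only 5000 rows, so this addition raises:
# "The size of tensor a (9515) must match the size of tensor b (5000)
#  at non-singleton dimension 0"
out = emb + pe[:emb.size(0)]
```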
Reopen if needed.
Did anyone come up with a good way to deal with this aside from just skipping said sentences? I'm experiencing the same issue when using the English-German Transformer from http://opennmt.net/Models-py/ with the provided tokenized WMT data. I'm running it as follows:
and I get
The maximum length (5000) is a parameter fixed at training time. See this constructor. Short of retraining, you need to drop or break up the inputs longer than 5000 (sub)words.
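Short of retraining, the over-long lines can be handled before translation. Below is a minimal preprocessing sketch under assumed file names (src-test.txt and src-test.split.txt are placeholders) that breaks any source line longer than the limit into separate chunks rather than dropping it; the translated chunks would then need to be rejoined afterwards if alignment with the original lines matters.

```python
MAX_LEN = 5000  # must not exceed the positional-encoding size the model was trained with

with open("src-test.txt") as fin, open("src-test.split.txt", "w") as fout:
    for line in fin:
        tokens = line.split()
        if len(tokens) <= MAX_LEN:
            fout.write(line)
        else:
            # Emit each chunk on its own line so every line fits within the model's limit.
            for i in range(0, len(tokens), MAX_LEN):
                fout.write(" ".join(tokens[i:i + MAX_LEN]) + "\n")
```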
I get the following error
The size of tensor a (9515) must match the size of tensor b (5000) at non-singleton dimension 0
when using the following pre-trained model: sum_transformer_model_acc_57.25_ppl_9.22_e16.pt. It doesn't always happen, though; it works on certain datasets but not on others.