Update to agree with new TorchText #189
Comments
Hmm, looking into this; I wonder if torchtext changed.
Torchtext changed on master (our development branch), but the version installed by your requirements.txt file should still be fixed, since we haven't tagged a new version for PyPI.
Discussion here: pytorch/text#95. I'm going to take a look at it again soon.
Thanks @jekbradbury. We're live with the pinned version for now.
The current plan is to release an updated tag in a couple of weeks with this change, one or two other minor breaking changes, new built-in datasets, and (backwards-compatible) support for fields that can also reverse their tokenization/preprocessing steps (including a simple reversible tokenizer I'm calling revtok that optionally supports subwords). We'll let you know when it's ready and won't release it to PyPI until it's confirmed non-breaking for you.
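For readers landing here later, a minimal sketch of what that reversible-field workflow looks like; this assumes the `ReversibleField` API as it later shipped in torchtext, and the `SRC`/`train_iter` names are illustrative, not from this thread:

```python
# Hedged sketch: reversible tokenization via torchtext's ReversibleField
# (assuming the 0.2.x-era API; SRC and train_iter are illustrative names).
from torchtext import data

# ReversibleField defaults to the revtok tokenizer, which records enough
# information to undo tokenization later.
SRC = data.ReversibleField(lower=True)

# ... after SRC.build_vocab(train) and constructing an iterator:
# for batch in train_iter:
#     strings = SRC.reverse(batch.src)  # list of detokenized sentences
```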
Awesome. While we have you, there are two other random features that I think we would love; I can make issues. 1) Auto-padding of non-vocab fields (we do it now with post-processing). 2) The ability to have non-tensor objects associated with a batch. The second comes up because we are using dynamic dictionaries associated with each source sentence. We hack around it now, but I would love to have a list of dicts in the batch.
Yes, both of those sound like good ideas in general! The second one is definitely a missing feature, and it should be pretty easy to achieve; there's a PR from our intern Alexander that would have used a similar mechanism. The first sounds like something I intended to work already: glancing at the Batch/Field implementation, non-vocab-having Fields should have their values padded like any other sequential field.
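A hedged sketch of the non-vocab padding being discussed, assuming torchtext's classic `Field` keyword arguments; the numeric pad value and the `FEATS` name are choices made for this example:

```python
# Hedged sketch: a sequential field without a vocabulary. Padding is driven
# by pad_token, so for use_vocab=False you supply a numeric pad value.
from torchtext import data

# Each example is already a list of ints, e.g. per-token feature ids.
FEATS = data.Field(sequential=True, use_vocab=False, pad_token=0)

# Field.pad() right-pads every example in a batch with 0 up to a common
# length, and numericalize() turns the padded lists into a LongTensor
# without any vocab lookup.
```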
Finally released v0.2.0 (also on PyPI), which resolves the original problem from this thread and introduces the changes described above.
Thanks for the fix @jekbradbury!
Is this issue fixed? I'm getting the same error with torchtext 0.2.0 (the install log shows Processing torchtext-0.2.0-py3.6.egg). Thanks.
While running train.py with the data in the data/ directory and following the Quickstart procedure, it quickly fails with the error below:

```
File "/home/myhome/data/code/OpenNMT-py/onmt/Models.py", line 206, in forward
    packed_emb = pack(emb, lengths)
ValueError: lengths array has to be sorted in decreasing order
```
pack_padded_sequence requires the lengths argument to be sorted in decreasing order.
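The usual fix is to sort the batch by length before packing. A minimal, illustrative sketch (not OpenNMT-py's actual code; `pack_sorted` is a made-up name):

```python
# Hedged sketch: sort a padded batch by decreasing length before packing,
# which is what pack_padded_sequence requires.
import torch
from torch.nn.utils.rnn import pack_padded_sequence

def pack_sorted(emb, lengths):
    # emb: (seq_len, batch, dim); lengths: 1-D tensor of sequence lengths
    lengths, perm = torch.sort(lengths, descending=True)
    packed = pack_padded_sequence(emb[:, perm], lengths.tolist())
    return packed, perm  # keep perm so the original order can be restored
```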