Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to agree with new TorchText #189

Closed
JianyuZhan opened this issue Aug 24, 2017 · 10 comments
Closed

Update to agree with new TorchText #189

JianyuZhan opened this issue Aug 24, 2017 · 10 comments

Comments

@JianyuZhan
Copy link
Contributor

JianyuZhan commented Aug 24, 2017

While running train.py, using the data in data/ directory, and following the Quickstart procedure, it quickly fails with below error:

File "/home/myhome/data/code/OpenNMT-py/onmt/Models.py", line 206, in forward
packed_emb = pack(emb, lengths)
ValueError: lengths array has to be sorted in decreasing order

pack_padded_sequence requires the lengths argument to be sorted by decreasing order.

@srush
Copy link
Contributor

srush commented Aug 24, 2017

Hmm, looking into this, wonder if torchtext changed.

@jekbradbury
Copy link

Torchtext changed on master (our development branch), but the version installed by your requirements.txt file should still be fixed since we haven't tagged a new version for pypi.

@jekbradbury
Copy link

Discussion here pytorch/text#95; I'm going to take a look at it again soon

@srush
Copy link
Contributor

srush commented Aug 26, 2017

thanks @jekbradbury . We're live with torchtext now, it's been great. Let us know when we should fix things.

@jekbradbury
Copy link

The current plan is to release an updated tag in a couple weeks with this change, one or two other minor breaking changes, new built-in datasets, and (backwards-compatible) support for fields that can also reverse their tokenization/preprocessing steps (including a simple reversible tokenizer I'm calling revtok that optionally supports subwords). We'll let you know when it's ready and won't release it to pypi until it's confirmed non-breaking for you.

@srush
Copy link
Contributor

srush commented Aug 26, 2017

Awesome.

While we have you, there are two other random feature that I think we would love, I can make issues. 1) Auto-padding of non-vocab fields (we do it now with post-processing). 2) Ability to have non-tensor objects associated with a batch. The second comes up because we are using dynamic dictionaries associated with each source sentence. We hack around it now, but I would love to have a list of dicts in the batch.

@jekbradbury
Copy link

jekbradbury commented Aug 26, 2017

Yes, both of those sound like good ideas in general!

The second one is definitely a missing feature, which should be pretty easy to achieve. There's this PR from our intern Alexander, which would have used field=None to represent data that should be attached to a batch as a raw list of objects. We didn't merge it because it would have broken the existing use of field=None to represent "ignore this column in CSV/TSV source data," but we can replace it with field=RawField() or the like.

The first sounds like something I intended to work (glancing at the Batch/Field implementation, non-vocab-having Fields should have pad called like normal and simply skip the stoi lookup in numericalize), so something else must be missing/not working as intended. Maybe open an issue there? I'll make sure to add the second feature you mention for the next tag and find a fix for the first one.

@srush srush changed the title lengths argument in Encoder.forward() is not sorted by decreasing order Update to agree with new TorchText Aug 29, 2017
@jekbradbury
Copy link

Finally released v0.2.0 (also on PyPI) that rationalizes the original problem from this thread and introduces RawField while also fixing use of floats/ints as data in a sequential Field.

@JianyuZhan
Copy link
Contributor Author

JianyuZhan commented Oct 20, 2017

Thanks for the fix @jekbradbury!

@raviolli
Copy link

raviolli commented Nov 2, 2017

Is this issue fixed? I'm getting the same error:
File "/home/ravi/anaconda3/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 79, in pack_padded_sequence
raise ValueError("lengths array has to be sorted in decreasing order")
ValueError: lengths array has to be sorted in decreasing order

Processing torchtext-0.2.0-py3.6.egg
Copying torchtext-0.2.0-py3.6.egg to /home/ravi/anaconda3/lib/python3.6/site-packages
Adding torchtext 0.2.0 to easy-install.pth file

Thanks

@srush srush closed this as completed Dec 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants