Update to agree with new TorchText #189

JianyuZhan · 2017-08-24T13:53:28Z

While running train.py, using the data in data/ directory, and following the Quickstart procedure, it quickly fails with below error:

File "/home/myhome/data/code/OpenNMT-py/onmt/Models.py", line 206, in forward
packed_emb = pack(emb, lengths)
ValueError: lengths array has to be sorted in decreasing order

pack_padded_sequence requires the lengths argument to be sorted by decreasing order.

srush · 2017-08-24T14:49:54Z

Hmm, looking into this, wonder if torchtext changed.

jekbradbury · 2017-08-26T19:34:21Z

Torchtext changed on master (our development branch), but the version installed by your requirements.txt file should still be fixed since we haven't tagged a new version for pypi.

jekbradbury · 2017-08-26T19:35:31Z

Discussion here pytorch/text#95; I'm going to take a look at it again soon

srush · 2017-08-26T22:26:28Z

thanks @jekbradbury . We're live with torchtext now, it's been great. Let us know when we should fix things.

jekbradbury · 2017-08-26T22:31:35Z

The current plan is to release an updated tag in a couple weeks with this change, one or two other minor breaking changes, new built-in datasets, and (backwards-compatible) support for fields that can also reverse their tokenization/preprocessing steps (including a simple reversible tokenizer I'm calling revtok that optionally supports subwords). We'll let you know when it's ready and won't release it to pypi until it's confirmed non-breaking for you.

srush · 2017-08-26T22:40:37Z

Awesome.

While we have you, there are two other random feature that I think we would love, I can make issues. 1) Auto-padding of non-vocab fields (we do it now with post-processing). 2) Ability to have non-tensor objects associated with a batch. The second comes up because we are using dynamic dictionaries associated with each source sentence. We hack around it now, but I would love to have a list of dicts in the batch.

jekbradbury · 2017-08-26T23:14:32Z

Yes, both of those sound like good ideas in general!

The second one is definitely a missing feature, which should be pretty easy to achieve. There's this PR from our intern Alexander, which would have used field=None to represent data that should be attached to a batch as a raw list of objects. We didn't merge it because it would have broken the existing use of field=None to represent "ignore this column in CSV/TSV source data," but we can replace it with field=RawField() or the like.

The first sounds like something I intended to work (glancing at the Batch/Field implementation, non-vocab-having Fields should have pad called like normal and simply skip the stoi lookup in numericalize), so something else must be missing/not working as intended. Maybe open an issue there? I'll make sure to add the second feature you mention for the next tag and find a fix for the first one.

jekbradbury · 2017-10-20T04:56:11Z

Finally released v0.2.0 (also on PyPI) that rationalizes the original problem from this thread and introduces RawField while also fixing use of floats/ints as data in a sequential Field.

JianyuZhan · 2017-10-20T08:57:14Z

Thanks for the fix @jekbradbury!

raviolli · 2017-11-02T20:41:47Z

Is this issue fixed? I'm getting the same error:
File "/home/ravi/anaconda3/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 79, in pack_padded_sequence
raise ValueError("lengths array has to be sorted in decreasing order")
ValueError: lengths array has to be sorted in decreasing order

Processing torchtext-0.2.0-py3.6.egg
Copying torchtext-0.2.0-py3.6.egg to /home/ravi/anaconda3/lib/python3.6/site-packages
Adding torchtext 0.2.0 to easy-install.pth file

Thanks

JianyuZhan mentioned this issue Aug 24, 2017

Model/Encoder: make sure lengths are sorted in decreasing order #190

Closed

srush added the type:bug label Aug 24, 2017

JianyuZhan mentioned this issue Aug 26, 2017

Batch should be sorted by decreasing size. pytorch/text#95

Merged

srush added type:support and removed type:bug labels Aug 27, 2017

wjbianjason mentioned this issue Aug 28, 2017

torchtext.data.iterator have add reverse #200

Closed

srush changed the title ~~lengths argument in Encoder.forward() is not sorted by decreasing order~~ Update to agree with new TorchText Aug 29, 2017

JianyuZhan mentioned this issue Oct 20, 2017

Torchtext v0.2 issue #345

Closed

srush closed this as completed Dec 27, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update to agree with new TorchText #189

Update to agree with new TorchText #189

JianyuZhan commented Aug 24, 2017 •

edited

Loading

srush commented Aug 24, 2017

jekbradbury commented Aug 26, 2017

jekbradbury commented Aug 26, 2017

srush commented Aug 26, 2017

jekbradbury commented Aug 26, 2017

srush commented Aug 26, 2017

jekbradbury commented Aug 26, 2017 •

edited

Loading

jekbradbury commented Oct 20, 2017

JianyuZhan commented Oct 20, 2017 •

edited

Loading

raviolli commented Nov 2, 2017

Update to agree with new TorchText #189

Update to agree with new TorchText #189

Comments

JianyuZhan commented Aug 24, 2017 • edited Loading

srush commented Aug 24, 2017

jekbradbury commented Aug 26, 2017

jekbradbury commented Aug 26, 2017

srush commented Aug 26, 2017

jekbradbury commented Aug 26, 2017

srush commented Aug 26, 2017

jekbradbury commented Aug 26, 2017 • edited Loading

jekbradbury commented Oct 20, 2017

JianyuZhan commented Oct 20, 2017 • edited Loading

raviolli commented Nov 2, 2017

JianyuZhan commented Aug 24, 2017 •

edited

Loading

jekbradbury commented Aug 26, 2017 •

edited

Loading

JianyuZhan commented Oct 20, 2017 •

edited

Loading