
torchtext.data.iterator have add reverse #200

Closed · wants to merge 1 commit into base: master

Conversation

@wjbianjason
Contributor

wjbianjason commented Aug 28, 2017

As a new starter: the latest torchtext has added a reverse operation, which causes the error "lengths argument in Encoder.forward() is not sorted by decreasing order" in the RNN.
So the sign of self.sort_key() may need to be updated.
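The interaction can be sketched in plain Python (no torchtext required; `sort_key` here is just a stand-in for the iterator's length-based key): `pack_padded_sequence` wants decreasing lengths, so flipping the sign of the sort key turns an ascending sort into the required descending one.

```python
# Illustrative sketch only, not torchtext code: how the sign of a sort key
# determines whether a batch meets pack_padded_sequence's decreasing-length
# requirement.
examples = [["a"], ["a", "b", "c"], ["a", "b"]]  # toy "sentences"

sort_key = len  # stand-in for a torchtext-style sort key: sentence length

# Ascending sort -> lengths [1, 2, 3], which pack_padded_sequence rejects
ascending = sorted(examples, key=sort_key)

# Flipping the sign of the key -> lengths [3, 2, 1], as required
descending = sorted(examples, key=lambda ex: -sort_key(ex))

print([len(ex) for ex in descending])  # [3, 2, 1]
```

The same effect can be had with `sorted(..., key=sort_key, reverse=True)`; negating the key is just the smallest change to an existing `sort_key`.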

@wjbianjason

Contributor

wjbianjason commented Aug 28, 2017

@srush

@srush
Contributor

srush commented Aug 28, 2017

I think we will need to do this somehow without breaking old code, at least until they update their code on pip.

@jekbradbury

jekbradbury commented Aug 28, 2017

Yeah, we definitely had the wrong default before. Would you rather we update pip ASAP to the new default, or add a parameter to make reversal optional?

@srush

Contributor

srush commented Aug 29, 2017

Just let us know when you update pip.

@wjbianjason

Contributor

wjbianjason commented Aug 31, 2017

I agree with your thinking, but there should be a comment at the error line to remind anyone who installs torchtext via git from the original repo.
Also, I have implemented conv2conv (https://arxiv.org/abs/1705.03122) and conv2lstm (https://arxiv.org/abs/1611.02344) based on an older version of OpenNMT-py.
If you think this would be a useful module, I will update it to fit the newest version and upload it.

@srush

Contributor

srush commented Aug 31, 2017

Will do.

Yes, we would love to have contributions of conv2conv, particularly if you were able to replicate the results. You can look at how we implemented "Attention is all you need" for a guide. Just add encoder_type=cnn and decoder_type=cnn, add a new CNNDecoderState, and send us a PR. We'll add you as a contributor.

@wjbianjason wjbianjason closed this Sep 9, 2017

marcotcr pushed a commit to marcotcr/OpenNMT-py that referenced this pull request Sep 20, 2017

@ZijiaLewisLu

ZijiaLewisLu commented Oct 21, 2017

Hi, I think the bug occurs again when running valid_data_iter.

    def make_valid_data_iter(valid_data, opt):
        return onmt.IO.OrderedIterator(
                    dataset=valid_data, batch_size=opt.batch_size,
                    device=opt.gpuid[0] if opt.gpuid else -1,
                    train=False, sort=True)

For this valid_data_iter, self.batches is already sorted in decreasing order before self.__iter__ is called.
However, torchtext.data.iterator.__iter__() executes the following code:

                if self.sort_within_batch:
                    # NOTE: `rnn.pack_padded_sequence` requires that a minibatch
                    # be sorted by decreasing order, which requires reversing
                    # relative to typical sort keys
                    if self.sort:
                        minibatch.reverse()
                    else:
                        minibatch.sort(key=self.sort_key, reverse=True)

valid_data_iter has self.sort == self.sort_within_batch == True, so the data is reversed again, back to increasing order!

For train_data_iter, it works fine.

I am using code from the master branch.

@jekbradbury

jekbradbury commented Oct 21, 2017

Yes, if your data is already sorted in an order that's acceptable to pack_padded_sequence, you need to set sort_within_batch=False to avoid it being reversed.
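To make the double-reversal concrete, here is a tiny pure-Python model of the quoted `__iter__` branch (names mirror the snippet, but this is an illustrative stand-in, not the real torchtext code):

```python
# Minimal model of the sort/sort_within_batch interaction described above.
# `iter_batch` is a hypothetical helper mirroring torchtext's __iter__ branch.
def iter_batch(minibatch, sort, sort_within_batch, sort_key=len):
    if sort_within_batch:
        if sort:
            # Assumes the global sort produced *increasing* key order, so a
            # reverse yields decreasing order. If the caller pre-sorted
            # decreasing, this flips it back to increasing -- the bug.
            minibatch = list(reversed(minibatch))
        else:
            minibatch = sorted(minibatch, key=sort_key, reverse=True)
    return minibatch

batch = [[1, 2, 3], [1, 2], [1]]  # already in decreasing-length order

# sort=True and sort_within_batch=True: reversed again -> increasing lengths
buggy = iter_batch(batch, sort=True, sort_within_batch=True)

# sort_within_batch=False leaves the pre-sorted order intact
fixed = iter_batch(batch, sort=True, sort_within_batch=False)

print([len(x) for x in buggy])  # [1, 2, 3] -- wrong for pack_padded_sequence
print([len(x) for x in fixed])  # [3, 2, 1] -- acceptable
```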

@ZijiaLewisLu

ZijiaLewisLu commented Oct 21, 2017

I guess it would be good to add sort_within_batch=False to make_valid_data_iter()? The demo in the README currently throws this error.
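A sketch of what that patch might look like, assuming the `make_valid_data_iter` shown earlier in the thread (`OrderedIterator` is stubbed out here so the snippet is self-contained; the real class is `onmt.IO.OrderedIterator`):

```python
# Hypothetical patch sketch: add sort_within_batch=False so the pre-sorted
# (decreasing-length) validation batches are not reversed back to increasing
# order. The stub below only records its keyword arguments.
class OrderedIterator:  # stand-in for onmt.IO.OrderedIterator
    def __init__(self, **kwargs):
        self.kwargs = kwargs

def make_valid_data_iter(valid_data, opt):
    return OrderedIterator(
        dataset=valid_data, batch_size=opt.batch_size,
        device=opt.gpuid[0] if opt.gpuid else -1,
        train=False, sort=True,
        sort_within_batch=False)  # proposed fix: skip the in-batch reverse

class Opt:  # stub options object for the sketch
    batch_size, gpuid = 64, None

it = make_valid_data_iter([], Opt)
print(it.kwargs["sort_within_batch"])  # False
```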

@vonboettichert

vonboettichert commented Dec 2, 2017

Hi all,

Has this issue been addressed already? I just pulled the latest OpenNMT-py and torchtext, and I'm still getting this error.
Could anyone advise on how to fix it?

Thank you!
Thorsten

@JianyuZhan

Collaborator

JianyuZhan commented Dec 2, 2017

@vonboettichert, not yet; please stick with torchtext 0.1.1 for now, until the fix is available on PyPI.

@vonboettichert

vonboettichert commented Dec 2, 2017

Thank you @JianyuZhan! Could you please advise me on how to install version 0.1.1 on Windows?
I installed it using
pip install https://github.com/pytorch/text/archive/master.zip
but I'm not sure how to get a specific version (0.1.1).

Thank you!
Thorsten

@JianyuZhan

Collaborator

JianyuZhan commented Dec 2, 2017

@vonboettichert, uninstall the torchtext version currently on your system, then run pip install 'torchtext==0.1.1'. This works on Linux, and I assume it works on Windows too.

@vonboettichert

vonboettichert commented Dec 2, 2017

Hi again, just for future reference: I cloned https://github.com/pytorch/text/tree/release-0.1.1/torchtext and ran 'python setup.py install'. Thank you for your great work!
