
Doubt on "pytorch-seq2seq/seq2seq/models/EncoderRNN.py" #159

Closed
caozhen-alex opened this issue Aug 31, 2018 · 5 comments

@caozhen-alex

if self.variable_lengths:
    embedded = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths, batch_first=True)
output, hidden = self.rnn(embedded)
if self.variable_lengths:
    output, _ = nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
Hi,
Why are there two if self.variable_lengths checks here?

Also, this code doesn't specify h0 and c0. Does that mean they default to zero?

Looking forward to your response. @kylegao91


pskrunner14 commented Aug 31, 2018

@caozhen-alex we use nn.utils.rnn.pack_padded_sequence here when the input sequences have variable lengths and need to be fed to an RNN, but you still need the output at the right time step for each sequence. Without packing, you have to unroll the RNN to a fixed length, which gives you a fixed-length output for every sequence; since the desired outputs have different lengths, you would have to mask the padded positions yourself.

For example:

>>> import torch
>>> from torch.nn.utils.rnn import pad_packed_sequence, pack_padded_sequence
>>> x = torch.LongTensor([[1, 2, 3], [7, 8, 0]])
>>> x = x.view(2, 3, 1)
>>> x
tensor([[[1],
         [2],
         [3]],

        [[7],
         [8],
         [0]]])
>>> lens = [3, 2]
>>> y = pack_padded_sequence(x, lens, batch_first=True)
>>> y       # tensor after packing 
PackedSequence(data=tensor([[1],
        [7],
        [2],
        [8],
        [3]]), batch_sizes=tensor([2, 2, 1]))
>>> z = pad_packed_sequence(y, batch_first=True)
>>> z        # the original tensor
(tensor([[[1],
         [2],
         [3]],

        [[7],
         [8],
         [0]]]), tensor([3, 2]))
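
Going one step further (just a rough sketch with a toy nn.LSTM, sizes picked arbitrarily, not code from this repo): if you feed the packed batch through the RNN, the padded step of the shorter sequence is never processed, so the final hidden state is taken at each sequence's true last step and the padded output position stays zero:

>>> import torch
>>> import torch.nn as nn
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> x = torch.FloatTensor([[[1], [2], [3]], [[7], [8], [0]]])   # (batch=2, max_len=3, features=1)
>>> lens = [3, 2]
>>> rnn = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
>>> packed = pack_padded_sequence(x, lens, batch_first=True)
>>> packed_out, (h_n, c_n) = rnn(packed)       # the padded step of the second sequence is skipped
>>> out, out_lens = pad_packed_sequence(packed_out, batch_first=True)
>>> out.size()           # (batch, max_len, hidden_size); out[1, 2] is all zeros (padding)
torch.Size([2, 3, 4])
>>> h_n.size()           # final hidden state, taken at each sequence's true last step
torch.Size([1, 2, 4])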

And as for the h0, c0: the LSTM handles the hidden and cell states on its own, which means the hidden state returned by the LSTM is actually the tuple (h0, c0), so we don't need to specify them manually.

For example:

>>> import seq2seq
>>> from seq2seq.models import EncoderRNN, DecoderRNN, Seq2seq
>>> vocab_length = 10
>>> max_len = 5
>>> hidden_size = 7
>>> encoder = EncoderRNN(vocab_length, max_len, hidden_size, rnn_cell='lstm', bidirectional=True, variable_lengths=True)
>>> x = torch.LongTensor([[1, 2, 3, 0, 0]])
>>> output, hidden = encoder(x, input_lengths=[3])
>>> hidden
(tensor([[[-0.1805, -0.0206, -0.0415, -0.0419, -0.1093, -0.1006,  0.3364]],

        [[ 0.0262, -0.1567,  0.0960,  0.0525,  0.0513,  0.0613,  0.0911]]],
       grad_fn=<StackBackward>), tensor([[[-0.3646, -0.0378, -0.0712, -0.3447, -0.1835, -0.4023,  0.4809]],

        [[ 0.0828, -0.3054,  0.2106,  0.2864,  0.1017,  0.0886,  0.1711]]],
       grad_fn=<StackBackward>))
>>> h0, c0 = hidden
>>> h0
tensor([[[-0.1805, -0.0206, -0.0415, -0.0419, -0.1093, -0.1006,  0.3364]],

        [[ 0.0262, -0.1567,  0.0960,  0.0525,  0.0513,  0.0613,  0.0911]]],
       grad_fn=<StackBackward>)
>>> c0
tensor([[[-0.3646, -0.0378, -0.0712, -0.3447, -0.1835, -0.4023,  0.4809]],

        [[ 0.0828, -0.3054,  0.2106,  0.2864,  0.1017,  0.0886,  0.1711]]],
       grad_fn=<StackBackward>)

>>> hidden[0].size()   # since it's a bidirectional lstm, the first dim is 2 (one hidden_size tensor per direction)
torch.Size([2, 1, 7])
>>> hidden[1].size()
torch.Size([2, 1, 7])

I'm not sure the maintainer is still active on the project. I hope this helped :)

@pskrunner14

Closing this for now

@caozhen-alex

@pskrunner14 Hi, thank you very much for your explanation. I understand the first problem now. As for the second one, why did you write h0, c0 = hidden? I thought it should be h3, c3, or h5, c5.

@pskrunner14

@caozhen-alex it's just an arbitrary naming convention I've used. The important part is that the hidden of the LSTM is a tuple consisting of the hidden and cell states respectively.
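
For instance, with a plain nn.LSTM (just a toy sketch, shapes chosen arbitrarily), the second return value is always that (hidden state, cell state) tuple, and the names you unpack it into are entirely up to you:

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
>>> x = torch.randn(1, 3, 1)          # (batch, seq_len, input_size)
>>> output, hidden = rnn(x)
>>> isinstance(hidden, tuple)         # hidden is (hidden state, cell state)
True
>>> h_n, c_n = hidden                 # plain tuple unpacking; any names would work
>>> h_n.size(), c_n.size()
(torch.Size([1, 1, 4]), torch.Size([1, 1, 4]))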

@caozhen-alex

@pskrunner14 I see your point. Thank you very much for your clear explanation. Btw, how can I have a look at h0 and c0? I saw the PyTorch documentation claiming they are set to 0 if they are not specified.
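
E.g. with a plain nn.LSTM (just a rough sketch of what I mean, not this repo's encoder), I'd expect passing explicit zero states to give the same result as passing nothing:

>>> import torch
>>> import torch.nn as nn
>>> rnn = nn.LSTM(input_size=1, hidden_size=4, num_layers=1, batch_first=True)
>>> x = torch.randn(2, 3, 1)
>>> out_default, _ = rnn(x)                # no (h0, c0) given: both default to zeros
>>> h0 = torch.zeros(1, 2, 4)              # (num_layers * num_directions, batch, hidden_size)
>>> c0 = torch.zeros(1, 2, 4)
>>> out_explicit, _ = rnn(x, (h0, c0))     # explicit zero initial states
>>> torch.equal(out_default, out_explicit)
True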
