

The answers above explain why very well. I just want to add an example that makes the use of pack_padded_sequence easier to understand.

    Note: by default (enforce_sorted=True), pack_padded_sequence requires the sequences in the batch to be sorted in descending order of length; passing enforce_sorted=False lets PyTorch sort them for you. In the example below, the batch is already sorted to reduce clutter. Visit this gist link for the full implementation.
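For an unsorted batch, here is a minimal sketch of the enforce_sorted=False route. It assumes a made-up two-sequence batch with the shorter sequence first. PyTorch records the sorting permutation in sorted_indices / unsorted_indices, and pad_packed_sequence uses it to restore the original order.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Hypothetical batch: the shorter sequence comes first, so it is NOT sorted by length
seqs = [torch.tensor([[10., 10.], [20., 20.]]),
        torch.tensor([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]])]
lens = [2, 5]

padded = pad_sequence(seqs, batch_first=True)  # shape (2, 5, 2)

# enforce_sorted=False makes PyTorch sort the batch internally
packed = pack_padded_sequence(padded, lengths=lens, batch_first=True,
                              enforce_sorted=False)
print(packed.batch_sizes)       # tensor([2, 2, 1, 1, 1])
print(packed.sorted_indices)    # tensor([1, 0])
print(packed.unsorted_indices)  # tensor([1, 0])

# Unpacking restores the original (unsorted) order and lengths
unpacked, out_lens = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(unpacked, padded), out_lens)  # True tensor([2, 5])
```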

First, we create a batch of 2 sequences with different lengths. In total, the batch contains 7 elements (time steps):

    Each sequence has an embedding size of 2.
    The first sequence has length 5.
    The second sequence has length 2.


In [15]:
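import torch
from torch.nn.utils.rnn import pad_sequence
from torch import nn

# Two sequences of 2-dimensional embeddings, already sorted by length (longest first)
seq_batch = [torch.tensor([[1, 1],
                           [2, 2],
                           [3, 3],
                           [4, 4],
                           [5, 5]]),
             torch.tensor([[10, 10],
                           [20, 20]])]

seq_lens = [5, 2]  # number of real time steps in each sequence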

In [6]:
seq_batch[0].shape


torch.Size([5, 2])

We pad seq_batch so that all sequences have the same length, 5 (the maximum length in the batch). The padded batch now has 10 elements in total, 3 of which are padding.
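To see what pad_sequence does, here is a rough hand-written equivalent. It is only a sketch that assumes the default padding_value of 0; the manual loop is for illustration and is not how PyTorch implements it.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seq_batch = [torch.tensor([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]),
             torch.tensor([[10, 10], [20, 20]])]

# Manual padding: allocate zeros of shape (batch, max_len, features)
max_len = max(s.size(0) for s in seq_batch)
manual = torch.zeros(len(seq_batch), max_len, seq_batch[0].size(1),
                     dtype=seq_batch[0].dtype)
for i, s in enumerate(seq_batch):
    manual[i, :s.size(0)] = s  # copy the real steps; the remaining steps stay 0

padded = pad_sequence(seq_batch, batch_first=True)
print(torch.equal(manual, padded))  # True
# Time steps after padding vs. real time steps
print(padded.shape[0] * padded.shape[1], sum(s.size(0) for s in seq_batch))  # 10 7
```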

In [12]:
# pad the seq_batch

padded_seq_batch = torch.nn.utils.rnn.pad_sequence(seq_batch, batch_first=True)

print(padded_seq_batch)
print(padded_seq_batch.shape)

tensor([[[ 1,  1],
         [ 2,  2],
         [ 3,  3],
         [ 4,  4],
         [ 5,  5]],

        [[10, 10],
         [20, 20],
         [ 0,  0],
         [ 0,  0],
         [ 0,  0]]])
torch.Size([2, 5, 2])


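Then, we pack padded_seq_batch. This returns a PackedSequence, a named tuple whose two main fields are listed below (sorted_indices and unsorted_indices are None here because we used the default enforce_sorted=True):

    data: all 7 real elements of the batch, interleaved time step by time step.
    batch_sizes: the number of sequences still active at each time step, which tells the RNN how to split data into steps.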
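Both fields can be rebuilt by hand, which may make the packed layout clearer. This is a sketch assuming a batch already sorted by descending length; it reproduces the values, not PyTorch's internal implementation.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

seq_batch = [torch.tensor([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]),
             torch.tensor([[10, 10], [20, 20]])]
seq_lens = [5, 2]  # sorted in descending order
padded = pad_sequence(seq_batch, batch_first=True)

# batch_sizes[t] = number of sequences that are longer than t
batch_sizes = [sum(1 for n in seq_lens if n > t) for t in range(max(seq_lens))]

# data = the real elements, taken step by step across the still-active sequences
data = torch.cat([padded[:b, t] for t, b in enumerate(batch_sizes)])

packed = pack_padded_sequence(padded, lengths=seq_lens, batch_first=True)
print(batch_sizes)                     # [2, 2, 1, 1, 1]
print(torch.equal(data, packed.data))  # True
```

Because the batch is sorted, the active sequences at step t are always the first batch_sizes[t] rows, which is why padded[:b, t] is enough.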


In [13]:
# pack the padded_seq_batch
packed_seq_batch = torch.nn.utils.rnn.pack_padded_sequence(padded_seq_batch, lengths=seq_lens, batch_first=True)
packed_seq_batch

PackedSequence(data=tensor([[ 1,  1],
        [10, 10],
        [ 2,  2],
        [20, 20],
        [ 3,  3],
        [ 4,  4],
        [ 5,  5]]), batch_sizes=tensor([2, 2, 1, 1, 1]), sorted_indices=None, unsorted_indices=None)

Now, we pass packed_seq_batch to a recurrent module in PyTorch, such as RNN or LSTM. The module only performs 5 + 2 = 7 time-step computations instead of the 10 it would need for the padded batch.
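One way to check that packing skips the padding is to compare the batched result with running each sequence through the same LSTM on its own, with no padding at all. This is a sketch with a fixed random seed; the tolerance of 1e-5 is an assumption to absorb floating-point rounding differences.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
seq_batch = [torch.tensor([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]),
             torch.tensor([[10., 10.], [20., 20.]])]
seq_lens = [5, 2]
lstm = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)

# Batched run on the packed sequence (7 time steps in total)
packed = pack_padded_sequence(pad_sequence(seq_batch, batch_first=True),
                              lengths=seq_lens, batch_first=True)
packed_out, (hn, cn) = lstm(packed)
padded_out, _ = pad_packed_sequence(packed_out, batch_first=True)

# Reference: run each sequence alone as a batch of one
matches = []
for i, s in enumerate(seq_batch):
    out_i, (hn_i, _) = lstm(s.unsqueeze(0))
    n = seq_lens[i]
    same_out = torch.allclose(padded_out[i, :n], out_i[0], atol=1e-5)
    same_hn = torch.allclose(hn[:, i], hn_i[:, 0], atol=1e-5)
    matches.append(same_out and same_hn)
print(matches)
```

Both entries should be True: the packed run gives the same outputs and final hidden states as if the padding had never existed.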

In [17]:
lstm = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
output, (hn, cn) = lstm(packed_seq_batch.float())  # LSTM expects floating-point input, so convert the int64 data
output

PackedSequence(data=tensor([[-7.0842e-02,  8.0181e-02, -2.9316e-02],
        [-5.8671e-01,  2.2299e-03, -1.4920e-04],
        [-2.6730e-01,  8.7460e-02, -3.7573e-02],
        [-9.1874e-01,  1.2688e-05, -6.9973e-05],
        [-5.0493e-01,  6.0212e-02, -3.5421e-02],
        [-6.8296e-01,  3.7609e-02, -2.8522e-02],
        [-7.8116e-01,  2.3925e-02, -2.0730e-02]], grad_fn=<CatBackward>), batch_sizes=tensor([2, 2, 1, 1, 1]), sorted_indices=None, unsorted_indices=None)

Finally, we convert output back into a padded batch with pad_packed_sequence:

In [18]:
padded_output, output_lens = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True, total_length=5)
padded_output

tensor([[[-7.0842e-02,  8.0181e-02, -2.9316e-02],
         [-2.6730e-01,  8.7460e-02, -3.7573e-02],
         [-5.0493e-01,  6.0212e-02, -3.5421e-02],
         [-6.8296e-01,  3.7609e-02, -2.8522e-02],
         [-7.8116e-01,  2.3925e-02, -2.0730e-02]],

        [[-5.8671e-01,  2.2299e-03, -1.4920e-04],
         [-9.1874e-01,  1.2688e-05, -6.9973e-05],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00]]],
       grad_fn=<TransposeBackward0>)

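As you can see, the LSTM produced no outputs for the padding positions: pad_packed_sequence simply fills them with zeros (its default padding_value), so the padding elements contribute nothing to the computation or to the gradients.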
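The gradient claim can be checked directly. This sketch (fixed seed, and a dummy loss that simply sums the outputs, both assumptions for illustration) makes the padded input require gradients and backpropagates through pack, LSTM, and unpack. It also shows that h_n holds each sequence's state at its last real step.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
seq_batch = [torch.tensor([[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]),
             torch.tensor([[10., 10.], [20., 20.]])]
seq_lens = [5, 2]
lstm = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)

# Track gradients with respect to the padded input
padded = pad_sequence(seq_batch, batch_first=True).requires_grad_()
packed = pack_padded_sequence(padded, lengths=seq_lens, batch_first=True)
out, (hn, cn) = lstm(packed)
padded_out, _ = pad_packed_sequence(out, batch_first=True)

# Dummy loss over every output position, padding included
padded_out.sum().backward()

# The padding steps of the 2nd sequence (steps 2..4) receive no gradient
print(padded.grad[1, 2:].abs().sum().item())  # 0.0
# h_n of the 2nd sequence equals its output at step 1, its last real step
print(torch.allclose(hn[0, 1], padded_out[1, 1]))  # True
```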