# Sequence-to-sequence RNN
In this exercise, we implement a sequence-to-sequence RNN (without attention).

In [1]:
import torch
import torch.nn as nn

We first define our hyperparameters.

In [2]:
embedding_dim = 10
hidden_dim = 20
num_layers = 2
bidirectional = True
sequence_length = 5
batch_size = 3

Create a bidirectional [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) with 2 layers.

In [3]:
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers, batch_first=False, bidirectional=bidirectional)

We create an example input `x`.

In [4]:
x = torch.randn(sequence_length, batch_size, embedding_dim)

What should the initial hidden and cell state be?

In [12]:
num_directions = 2 if bidirectional else 1
h0 = torch.zeros(num_directions * num_layers, batch_size, hidden_dim)
c0 = torch.zeros(num_directions * num_layers, batch_size, hidden_dim)

print(h0.shape)
print(c0.shape)

torch.Size([4, 3, 20])
torch.Size([4, 3, 20])
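
As a side check, zero-initialized states are also what `nn.LSTM` uses when the state tuple is omitted. The following minimal sketch compares both calls; the names with a `_demo` suffix are illustrative stand-ins and not part of the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm_demo = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
x_demo = torch.randn(5, 3, 10)  # sequence length x batch size x embedding dim

# Explicit zero states: (num_layers * num_directions) x batch size x hidden dim
h0_demo = torch.zeros(4, 3, 20)
c0_demo = torch.zeros(4, 3, 20)

out_explicit, _ = lstm_demo(x_demo, (h0_demo, c0_demo))
out_default, _ = lstm_demo(x_demo)  # states omitted: PyTorch falls back to zeros

same_output = torch.allclose(out_explicit, out_default)
print(same_output)
```

Passing the states explicitly is still useful here, because the decoder later needs a non-zero initial hidden state.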


Now we run our LSTM. Look at the output. Explain each dimension of the output.

In [None]:
output, (hn, cn) = lstm(x, (h0, c0))

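# output holds the last layer's hidden state at every time step
print("output shape", output.shape) # sequence length x batch size x (num_directions * hidden_dim)
# hn and cn hold the final hidden and cell state of every layer and direction
print("hidden shape", hn.shape) # (num_layers * num_directions) x batch size x hidden_dim
print("cell state shape", cn.shape) # (num_layers * num_directions) x batch size x hidden_dim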

output shape torch.Size([5, 3, 40])
hidden shape torch.Size([4, 3, 20])
cell state shape torch.Size([4, 3, 20])


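`output` contains the hidden states of the last (2nd) layer only, for every time step. `hn` and `cn` contain the final states of every layer and direction, but no intermediate time steps. If we want access to the hidden states of layer 1 at every time step as well, we have to run the `LSTMCell`s ourselves.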
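
The relationship between `output` and `hn` can be checked directly. This is a minimal sketch with illustrative names (`lstm_demo`, `hn_by_layer`) and the same sizes as above; it splits the first axis of `hn` into layers and directions and compares the last layer's final states with the matching slices of `output`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, emb, hid, layers = 5, 3, 10, 20, 2
lstm_demo = nn.LSTM(emb, hid, num_layers=layers, bidirectional=True)
out, (hn_demo, cn_demo) = lstm_demo(torch.randn(seq_len, batch, emb))

# Separate the layer and direction axes: layers x directions x batch x hidden
hn_by_layer = hn_demo.reshape(layers, 2, batch, hid)

# Last layer, forward direction: its final state is at the last time step
fwd_matches = torch.allclose(out[-1, :, :hid], hn_by_layer[-1, 0])
# Last layer, backward direction: its final state is at the first time step
bwd_matches = torch.allclose(out[0, :, hid:], hn_by_layer[-1, 1])
print(fwd_matches, bwd_matches)
```

Note that the backward direction finishes reading at time step 0, so its final state does not appear in `output[-1]`.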

If we use the above LSTM as the encoder, which part of its output serves as the input to the decoder?

In [14]:
encoder = lstm

# hn is ordered by layer, then direction: hn[2] is layer 2 forward, hn[3] is layer 2 backward
# dim of hn[2]: 3 x 20
# dim of hn[3]: 3 x 20
# concatenate along the hidden dimension (= last dimension)
encoder_output = torch.cat([hn[2], hn[3]], dim=-1) # concatenated final hidden states of second layer => 3 x 40 shape
print(encoder_output.shape)

torch.Size([3, 40])


Create a decoder LSTM with 2 layers. Why can't it be bidirectional as well? What is the hidden dimension of the decoder LSTM when you want to initialize it with the encoder output?

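=> The decoder cannot be bidirectional because we don't know the whole output sequence when we start decoding. We generate the output one token at a time, and we need to know the previous token to generate the next one. Since the decoder is initialized with the concatenated forward and backward encoder states, its hidden dimension must be `num_directions * hidden_dim` (here 2 x 20 = 40).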

In [21]:
decoder_hidden_dim = num_directions * hidden_dim
decoder = nn.LSTM(input_size=embedding_dim, hidden_size=decoder_hidden_dim, num_layers=num_layers, batch_first=False, bidirectional=False)
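
To illustrate why decoding is inherently left-to-right, here is a minimal sketch of step-by-step generation. It is not part of the exercise: the projection `to_embedding`, the all-zero start input, and the fixed length `max_len` are hypothetical choices made only so that each prediction can be fed back as the next input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb, dec_hid, layers, batch, max_len = 10, 40, 2, 3, 8

dec = nn.LSTM(input_size=emb, hidden_size=dec_hid, num_layers=layers)
to_embedding = nn.Linear(dec_hid, emb)  # hypothetical projection back to the input space

state = (torch.zeros(layers, batch, dec_hid), torch.zeros(layers, batch, dec_hid))
prev = torch.zeros(1, batch, emb)       # hypothetical start-of-sequence input
generated = []
with torch.no_grad():
    for _ in range(max_len):
        out, state = dec(prev, state)   # run a single time step
        prev = to_embedding(out)        # the prediction becomes the next input
        generated.append(prev)
generated = torch.cat(generated, dim=0) # max_len x batch x emb
print(generated.shape)
```

At step `t` only the inputs up to `t - 1` exist, so a backward direction would have nothing to read.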

Run your decoder LSTM on an example sequence. Condition it with the encoder representation of the sequence. How do we get the correct shape for the initial hidden state?

**Hint:** Take a look at [Torch's tensor operations](https://pytorch.org/docs/stable/tensors.html) and compare `Tensor.repeat`, `torch.repeat_interleave` and `Tensor.expand`.

In [None]:
output_seq_len = 8
y = torch.randn(output_seq_len, batch_size, embedding_dim)
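# 3 x 40 => 2 x 3 x 40: every decoder layer starts from the same encoder state;
# .contiguous() materializes the expanded view as a regular tensor
h0_dec = encoder_output.unsqueeze(0).expand(num_layers, -1, -1).contiguous()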
c0_dec = torch.zeros(num_layers, batch_size, decoder_hidden_dim)
decoder_output, (hn_dec, cn_dec) = decoder(y, (h0_dec, c0_dec))

print(decoder_output.shape)

torch.Size([8, 3, 40])
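
The three operations named in the hint can be compared on a small tensor. In this sketch, `enc` is an illustrative stand-in for the 3 x 40 encoder output (shrunk to 3 x 2 for readability).

```python
import torch

enc = torch.arange(6.0).reshape(3, 2)  # stand-in for the encoder output
stacked = enc.unsqueeze(0)             # 1 x 3 x 2

expanded = stacked.expand(2, -1, -1)               # a view, no data is copied
repeated = stacked.repeat(2, 1, 1)                 # a real copy
interleaved = stacked.repeat_interleave(2, dim=0)  # a real copy

# On a size-1 axis, all three give the same values
all_equal = torch.equal(expanded, repeated) and torch.equal(repeated, interleaved)
shares_memory = expanded.data_ptr() == stacked.data_ptr()
print(all_equal, shares_memory, expanded.is_contiguous(), repeated.is_contiguous())

# repeat and repeat_interleave differ on an axis with more than one element
v = torch.tensor([1, 2, 3])
tiled = v.repeat(2)                    # tensor([1, 2, 3, 1, 2, 3])
per_element = v.repeat_interleave(2)   # tensor([1, 1, 2, 2, 3, 3])
```

`expand` avoids a copy but returns a non-contiguous view, which is why calling `.contiguous()` on it before handing it to an LSTM is a safe habit.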


In most sequence-to-sequence RNNs, the final encoder hidden state is used as the first hidden state of the decoder RNN. In some variants, it is additionally concatenated with the hidden state of the previous time step at every decoder time step. PyTorch's `nn.LSTM` does not expose the hidden state between time steps, so we cannot easily do that and would have to resort to the lower-level `nn.LSTMCell` class again.
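
A minimal sketch of such a variant with `nn.LSTMCell`, under one simplifying assumption: the encoder vector (`context`, a random stand-in here) is appended to the step input instead of to the hidden state. Both enter the same gate pre-activations through a linear map, so this is one way to realize the idea, not necessarily the formulation used in any particular paper. The decoder is kept to a single layer for brevity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb, ctx_dim, batch, out_len = 10, 40, 3, 8

# Hypothetical single-layer decoder cell whose input is [token embedding; context]
cell = nn.LSTMCell(input_size=emb + ctx_dim, hidden_size=ctx_dim)

context = torch.randn(batch, ctx_dim)      # stand-in for the encoder output
y_demo = torch.randn(out_len, batch, emb)  # stand-in for the target sequence

h, c = context, torch.zeros(batch, ctx_dim)  # condition the first state as before
steps = []
for y_t in y_demo:                         # iterate over time steps
    step_input = torch.cat([y_t, context], dim=-1)  # context is visible at every step
    h, c = cell(step_input, (h, c))
    steps.append(h)
decoder_states = torch.stack(steps)        # out_len x batch x ctx_dim
print(decoder_states.shape)
```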

Put it all together in a seq2seq LSTM model.

In [None]:
class Seq2seqLSTM(nn.Module):
    """ Sequence-to-sequence LSTM. """
    
    def __init__(self, embedding_dim, hidden_dim, num_encoder_layers, num_decoder_layers, bidirectional):
        super().__init__()
        
        self.num_directions = 2 if bidirectional else 1
        self.num_encoder_layers = num_encoder_layers
        self.num_decoder_layers = num_decoder_layers
        self.hidden_dim = hidden_dim
        self.decoder_hidden_dim = self.num_directions * hidden_dim

        self.encoder = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_encoder_layers, batch_first=False, bidirectional=bidirectional)
        self.decoder = nn.LSTM(input_size=embedding_dim, hidden_size=self.decoder_hidden_dim, num_layers=num_decoder_layers, batch_first=False, bidirectional=False)
    
    def forward(self, x, y):
        assert x.dim() == 3, "Expected input of shape [sequence length, batch size, embedding dim]"
        batch_size = x.size(1)

        # encoder forward
        h0_enc = torch.zeros(self.num_directions * self.num_encoder_layers, batch_size, self.hidden_dim)
        c0_enc = torch.zeros(self.num_directions * self.num_encoder_layers, batch_size, self.hidden_dim)
        encoder_output, (hn_enc, cn_enc) = self.encoder(x, (h0_enc, c0_enc))

        # decoder forward
        # every decoder layer starts from the last encoder layer's final hidden state
        h0_dec = torch.cat([hn_enc[-2], hn_enc[-1]], dim=-1) if self.num_directions == 2 else hn_enc[-1]
        h0_dec = h0_dec.unsqueeze(0).expand(self.num_decoder_layers, -1, -1).contiguous()
        c0_dec = torch.zeros(self.num_decoder_layers, batch_size, self.decoder_hidden_dim)
        decoder_output, (hn_dec, cn_dec) = self.decoder(y, (h0_dec, c0_dec))

        return decoder_output

Test your seq2seq LSTM with an input sequence `x` and a ground truth output sequence `y` that the decoder tries to predict.

In [40]:
num_directions = 2 if bidirectional else 1
decoder_hidden_dim = num_directions * hidden_dim
seq2seq_lstm = Seq2seqLSTM(embedding_dim, hidden_dim, num_layers, num_layers, bidirectional)
x = torch.randn(10, 23, embedding_dim)
y = torch.randn(9, 23, embedding_dim)
outputs = seq2seq_lstm(x, y)
assert outputs.dim() == 3 and list(outputs.size()) == [9, 23, decoder_hidden_dim], "Wrong output shape"