# Sequence-to-sequence RNN
In this exercise, we implement a sequence-to-sequence RNN (without attention).

In [11]:
import torch
import torch.nn as nn

We first define our hyperparameters.

In [12]:
embedding_dim = 10
hidden_dim = 20
num_layers = 2
bidirectional = True
sequence_length = 5
batch_size = 3

Create a bidirectional [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) with 2 layers.

In [13]:
model = nn.LSTM(embedding_dim, hidden_dim, num_layers, bidirectional=bidirectional)

We create an example input `x`.

In [14]:
x = torch.randn(sequence_length, batch_size, embedding_dim)

What should the initial hidden and cell state be?

In [15]:
h0 = torch.zeros(num_layers * 2, batch_size, hidden_dim)
c0 = torch.zeros(num_layers * 2, batch_size, hidden_dim)
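The first dimension of the initial states is `num_layers * num_directions`, which is 4 for a bidirectional 2-layer LSTM. As a minimal sketch (the names `lstm`, `out_explicit`, `out_default` and `same` are chosen here for illustration), the check below compares explicit zero states with omitting the states entirely, in which case `nn.LSTM` defaults them to zeros:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, num_layers = 10, 20, 2
sequence_length, batch_size = 5, 3
num_directions = 2  # bidirectional

lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, bidirectional=True)
x = torch.randn(sequence_length, batch_size, embedding_dim)

# One initial state per layer and per direction.
h0 = torch.zeros(num_layers * num_directions, batch_size, hidden_dim)
c0 = torch.zeros(num_layers * num_directions, batch_size, hidden_dim)

with torch.no_grad():
    out_explicit, _ = lstm(x, (h0, c0))
    out_default, _ = lstm(x)  # states omitted: zeros are used

same = torch.allclose(out_explicit, out_default)
print(same)
```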

Now we run our LSTM. Look at the output. Explain each dimension of the output.

In [17]:
output, (hn, cn) = model(x, (h0, c0))

In [18]:
print(output.shape)
print(hn.shape)
print(cn.shape)

torch.Size([5, 3, 40])
torch.Size([4, 3, 20])
torch.Size([4, 3, 20])
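
To make the dimensions concrete, here is a small sketch (variable names such as `forward_last` and `backward_last` are illustrative) that relates `output` to `hn`. The last dimension of `output` concatenates the forward and backward hidden states of the top layer. The forward direction finishes at the last time step, the backward direction at the first:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, num_layers = 10, 20, 2
sequence_length, batch_size = 5, 3

lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, bidirectional=True)
x = torch.randn(sequence_length, batch_size, embedding_dim)

with torch.no_grad():
    output, (hn, cn) = lstm(x)

# output: [sequence length, batch size, 2 * hidden dim]
forward_last = output[-1, :, :hidden_dim]   # forward direction, last time step
backward_last = output[0, :, hidden_dim:]   # backward direction, first time step

# hn: [num layers * 2, batch size, hidden dim]; the last two entries
# belong to the top layer (forward, then backward).
fwd_ok = torch.allclose(forward_last, hn[-2])
bwd_ok = torch.allclose(backward_last, hn[-1])
print(fwd_ok, bwd_ok)
```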


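`output` holds the hidden states of the last (2nd) layer at every time step, with the forward and backward directions concatenated along the last dimension. `hn` and `cn` hold the final hidden and cell states of every layer and direction. If we also want access to the per-time-step hidden states of layer 1, we have to run the `LSTMCell`s ourselves.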
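
A minimal sketch of running the cells by hand, assuming a unidirectional 2-layer stack for brevity (the names `cell1`, `cell2`, `layer1_states` and `layer2_states` are illustrative and not part of the exercise):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim = 10, 20
sequence_length, batch_size = 5, 3

# One cell per layer; layer 2 consumes the hidden state of layer 1.
cell1 = nn.LSTMCell(embedding_dim, hidden_dim)
cell2 = nn.LSTMCell(hidden_dim, hidden_dim)

x = torch.randn(sequence_length, batch_size, embedding_dim)
h1 = torch.zeros(batch_size, hidden_dim)
c1 = torch.zeros(batch_size, hidden_dim)
h2 = torch.zeros(batch_size, hidden_dim)
c2 = torch.zeros(batch_size, hidden_dim)

layer1_states, layer2_states = [], []
for t in range(sequence_length):
    h1, c1 = cell1(x[t], (h1, c1))
    h2, c2 = cell2(h1, (h2, c2))
    layer1_states.append(h1)
    layer2_states.append(h2)

# Per-time-step hidden states of both layers: [sequence length, batch size, hidden dim]
layer1_states = torch.stack(layer1_states)
layer2_states = torch.stack(layer2_states)
print(layer1_states.shape, layer2_states.shape)
```

A bidirectional version would need a second pair of cells that walks the sequence in reverse.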

If we use the LSTM above as the encoder, which part of its output serves as the input to the decoder?

In [19]:
encoder = model

encoder_output = torch.cat([hn[2], hn[3]], dim=-1)
print(encoder_output.shape)

torch.Size([3, 40])
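
The indices `hn[2]` and `hn[3]` are specific to two layers. As a sketch that does not hard-code the layer count, `hn` can be reshaped to separate layers from directions (the names `hn_by_layer`, `top` and `encoder_output_general` are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, num_layers = 10, 20, 2
sequence_length, batch_size = 5, 3
num_directions = 2

lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, bidirectional=True)
x = torch.randn(sequence_length, batch_size, embedding_dim)
with torch.no_grad():
    _, (hn, _) = lstm(x)

# [num layers, num directions, batch size, hidden dim]
hn_by_layer = hn.reshape(num_layers, num_directions, batch_size, hidden_dim)
top = hn_by_layer[-1]  # top layer: [num directions, batch size, hidden dim]

# Put the batch first, then concatenate the two directions per example.
encoder_output_general = top.permute(1, 0, 2).reshape(batch_size, -1)

# Same result as indexing the last two entries directly.
encoder_output = torch.cat([hn[-2], hn[-1]], dim=-1)
matches = torch.allclose(encoder_output_general, encoder_output)
print(encoder_output_general.shape, matches)
```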


Create a decoder LSTM with 2 layers. Why can't it be bidirectional as well? What must the hidden dimension of the decoder LSTM be if you want to initialize it with the encoder output?

In [21]:
decoder_hidden_dim = hidden_dim * 2
decoder = nn.LSTM(embedding_dim, decoder_hidden_dim, num_layers)

Run your decoder LSTM on an example sequence. Condition it with the encoder representation of the sequence. How do we get the correct shape for the initial hidden state?

**Hint:** Take a look at [Torch's tensor operations](https://pytorch.org/docs/stable/tensors.html) and compare `Tensor.repeat`, `Tensor.repeat_interleave` and `Tensor.expand`.

In [38]:
output_seq_length = 7
y = torch.randn(output_seq_length, batch_size, embedding_dim)
h0_dec = encoder_output.unsqueeze(0).expand(num_layers, -1, -1)  # expand only creates a view; it does not copy the tensor
# Alternative that copies the data: h0_dec = encoder_output.repeat(num_layers, 1, 1)
c0_dec = torch.zeros(num_layers, batch_size, decoder_hidden_dim)

decoder_output, (hn_dec, cn_dec) = decoder(y, (h0_dec, c0_dec))

print(decoder_output.shape)
print(hn_dec.shape)
print(cn_dec.shape)

torch.Size([7, 3, 40])
torch.Size([2, 3, 40])
torch.Size([2, 3, 40])
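
The three operations from the hint can be compared on small tensors. This is an illustrative sketch (the tensor `enc` is a random stand-in for the encoder output): `repeat` tiles the whole tensor, `repeat_interleave` repeats each element in place, and `expand` returns a view without copying.

```python
import torch

a = torch.tensor([1, 2, 3])
tiled = a.repeat(2)                   # whole tensor tiled twice
interleaved = a.repeat_interleave(2)  # each element repeated twice
print(tiled.tolist())        # [1, 2, 3, 1, 2, 3]
print(interleaved.tolist())  # [1, 1, 2, 2, 3, 3]

enc = torch.randn(3, 40)  # stand-in for the encoder output
expanded = enc.unsqueeze(0).expand(2, -1, -1)  # view, no copy
repeated = enc.repeat(2, 1, 1)                 # new tensor with copied data

# Both have the shape the decoder expects: [num layers, batch size, hidden dim]
print(expanded.shape, repeated.shape)

# expand shares storage with enc and uses stride 0 along the new dimension.
shares_memory = expanded.data_ptr() == enc.data_ptr()
copied = repeated.data_ptr() != enc.data_ptr()
print(shares_memory, copied, expanded.stride())
```

Because the expanded tensor shares memory across layers, in-place writes to it are not safe; `repeat` is the choice when an independent copy is needed.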


In most sequence-to-sequence RNNs, the final encoder hidden state is used as the initial hidden state of the decoder RNN. In some variants, it is additionally concatenated with the hidden state of the previous time step at every decoder time step. PyTorch's `nn.LSTM` processes the whole sequence in a single call, so we cannot easily do that and would have to resort to the lower-level `nn.LSTMCell` class again.
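
One possible way to sketch such a variant with `nn.LSTMCell` is shown below. This is only an illustration under assumptions: the linear layer `mix`, which projects the concatenation `[h_{t-1}; context]` back to the hidden size, is a hypothetical design choice made here so that the cell receives a state of the right dimension. The `context` tensor is a random stand-in for the encoder output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, context_dim = 10, 40, 40
output_seq_length, batch_size = 7, 3

cell = nn.LSTMCell(embedding_dim, hidden_dim)
# Hypothetical projection: maps [h_{t-1}; context] back to hidden_dim.
mix = nn.Linear(hidden_dim + context_dim, hidden_dim)

y = torch.randn(output_seq_length, batch_size, embedding_dim)
context = torch.randn(batch_size, context_dim)  # stand-in for the encoder output

h = context.clone()  # initial hidden state; requires context_dim == hidden_dim
c = torch.zeros(batch_size, hidden_dim)

step_outputs = []
for t in range(output_seq_length):
    # Re-inject the encoder context into the previous hidden state at every step.
    h_in = torch.tanh(mix(torch.cat([h, context], dim=-1)))
    h, c = cell(y[t], (h_in, c))
    step_outputs.append(h)

decoder_outputs = torch.stack(step_outputs)  # [output sequence length, batch size, hidden dim]
print(decoder_outputs.shape)
```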

Put it all together in a seq2seq LSTM model.

In [47]:
class Seq2seqLSTM(nn.Module):
    """ Sequence-to-sequence LSTM. """
    
    def __init__(self, embedding_dim, hidden_dim, num_encoder_layers, num_decoder_layers, bidirectional):
        super().__init__()
        self.num_directions = 2 if bidirectional else 1
        self.bidirectional = bidirectional
        self.encoder = nn.LSTM(embedding_dim, hidden_dim, num_encoder_layers, bidirectional=bidirectional)
        self.decoder = nn.LSTM(embedding_dim, self.num_directions * hidden_dim, num_decoder_layers)
    
    def forward(self, x, y):
        assert x.dim() == 3, "Expected input of shape [sequence length, batch size, embedding dim]"
        batch_size = x.size(1)
        
        # Encoder: zero initial states, created on the same device as the input
        h0_en = torch.zeros(self.num_directions * self.encoder.num_layers, batch_size, self.encoder.hidden_size, device=x.device)
        c0_en = torch.zeros(self.num_directions * self.encoder.num_layers, batch_size, self.encoder.hidden_size, device=x.device)
        encoder_outputs, (hn_en, cn_en) = self.encoder(x, (h0_en, c0_en))
        
        # Decoder: condition every layer on the top-layer encoder state
        encoder_output = torch.cat((hn_en[-2], hn_en[-1]), dim=-1) if self.bidirectional else hn_en[-1]
        # expand creates a view; contiguous() materializes it, since some backends (e.g. cuDNN) expect contiguous states
        h0_dec = encoder_output.unsqueeze(0).expand(self.decoder.num_layers, -1, -1).contiguous()
        c0_dec = torch.zeros(self.decoder.num_layers, batch_size, self.decoder.hidden_size, device=x.device)
        decoder_outputs, _ = self.decoder(y, (h0_dec, c0_dec))
        return decoder_outputs

Test your seq2seq LSTM with an input sequence `x` and a ground truth output sequence `y` that the decoder tries to predict.

In [48]:
num_directions = 2 if bidirectional else 1
decoder_hidden_dim = num_directions * hidden_dim
seq2seq_lstm = Seq2seqLSTM(embedding_dim, hidden_dim, num_layers, num_layers, bidirectional)
x = torch.randn(10, 2, embedding_dim)
y = torch.randn(9, 2, embedding_dim)
outputs = seq2seq_lstm(x, y)
assert outputs.dim() == 3 and list(outputs.size()) == [9, 2, decoder_hidden_dim], "Wrong output shape"