# Sequence-to-sequence RNN
In this exercise, we implement a sequence-to-sequence RNN (without attention).

In [None]:
import torch
import torch.nn as nn

We first define our hyperparameters.

In [None]:
embedding_dim = 10
hidden_dim = 20
num_layers = 2
bidirectional = True
sequence_length = 5
batch_size = 3

Create a bidirectional [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) with 2 layers.
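One possible way to set this up is sketched below. It repeats the relevant hyperparameters so the cell runs on its own, and the variable name `lstm` is just a placeholder chosen here.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim, num_layers = 10, 20, 2

# batch_first keeps its default (False), so inputs are expected as
# [sequence length, batch size, embedding dim].
lstm = nn.LSTM(
    input_size=embedding_dim,
    hidden_size=hidden_dim,
    num_layers=num_layers,
    bidirectional=True,
)
print(lstm)
```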

We create an example input `x`.

In [None]:
x = torch.randn(sequence_length, batch_size, embedding_dim)

What should the initial hidden and cell state be?
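A common choice is to start from all-zero states, which is also what `nn.LSTM` uses when no initial state is passed. The sketch below only illustrates the expected shape: one state per layer and direction. The names `h_0` and `c_0` follow the PyTorch documentation.

```python
import torch

hidden_dim, num_layers, batch_size = 20, 2, 3
num_directions = 2  # the LSTM is bidirectional

# One hidden and one cell state per (layer, direction) pair.
h_0 = torch.zeros(num_layers * num_directions, batch_size, hidden_dim)
c_0 = torch.zeros_like(h_0)
print(h_0.shape)  # torch.Size([4, 3, 20])
```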

Now we run our LSTM. Look at the output. Explain each dimension of the output.
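As a rough sketch of this step (re-creating the LSTM and input so the cell is self-contained), the forward pass returns the per-time-step outputs and a tuple with the final hidden and cell states:

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim, num_layers = 10, 20, 2
sequence_length, batch_size = 5, 3

lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, bidirectional=True)
x = torch.randn(sequence_length, batch_size, embedding_dim)
h_0 = torch.zeros(num_layers * 2, batch_size, hidden_dim)
c_0 = torch.zeros_like(h_0)

output, (h_n, c_n) = lstm(x, (h_0, c_0))

# output: [sequence length, batch size, num_directions * hidden_dim]
print(output.shape)  # torch.Size([5, 3, 40])
# h_n and c_n: [num_layers * num_directions, batch size, hidden_dim]
print(h_n.shape)  # torch.Size([4, 3, 20])
```

The last dimension of `output` concatenates the forward and backward directions. The forward half at the last time step matches `h_n[-2]`, and the backward half at the first time step matches `h_n[-1]`.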

The per-time-step `output` comes only from the last (2nd) layer of the LSTM, while `h_n` and `c_n` hold the final states of every layer. If we want access to the hidden states of layer 1 at every time step as well, we have to run the `LSTMCell`s ourselves.
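To illustrate what running the cells ourselves could look like, here is a minimal sketch with two stacked `nn.LSTMCell`s. For brevity it is unidirectional, which is a simplification relative to the LSTM above, and the names `cell_1`, `cell_2` and `layer_1_states` are placeholders.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim = 10, 20
sequence_length, batch_size = 5, 3
x = torch.randn(sequence_length, batch_size, embedding_dim)

cell_1 = nn.LSTMCell(embedding_dim, hidden_dim)  # layer 1 reads the input
cell_2 = nn.LSTMCell(hidden_dim, hidden_dim)     # layer 2 reads layer 1

# All states start at zero; they are rebound (not modified in place) below.
h_1 = c_1 = h_2 = c_2 = torch.zeros(batch_size, hidden_dim)
layer_1_states, layer_2_states = [], []
for x_t in x:  # iterate over time steps
    h_1, c_1 = cell_1(x_t, (h_1, c_1))
    h_2, c_2 = cell_2(h_1, (h_2, c_2))
    layer_1_states.append(h_1)
    layer_2_states.append(h_2)

# Stack the per-step states into [sequence length, batch size, hidden_dim].
layer_1_states = torch.stack(layer_1_states)
layer_2_states = torch.stack(layer_2_states)
print(layer_1_states.shape)  # torch.Size([5, 3, 20])
```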

When we take the above LSTM as the encoder, what is its output that serves as the input to the decoder?
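One plausible answer is the final hidden (and cell) state, with the forward and backward directions concatenated per layer. The sketch below assumes that reading. The name `encoder_state` is introduced here for illustration.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim, num_layers, num_directions = 10, 20, 2, 2
sequence_length, batch_size = 5, 3

encoder = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, bidirectional=True)
x = torch.randn(sequence_length, batch_size, embedding_dim)
output, (h_n, c_n) = encoder(x)

# Separate the layer and direction axes that nn.LSTM packs into dim 0.
h_n = h_n.reshape(num_layers, num_directions, batch_size, hidden_dim)

# Concatenate forward (index 0) and backward (index 1) states per layer.
encoder_state = torch.cat([h_n[:, 0], h_n[:, 1]], dim=-1)
print(encoder_state.shape)  # torch.Size([2, 3, 40])
```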

Create a decoder LSTM with 2 layers. Why can't it be bidirectional as well? What is the hidden dimension of the decoder LSTM when you want to initialize it with the encoder output?
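A possible decoder is sketched below, assuming the encoder's two directions are concatenated so that the decoder hidden size becomes `2 * hidden_dim`. It is unidirectional because, at prediction time, the decoder generates the sequence step by step and has no future outputs to read backwards.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim, num_layers = 10, 20, 2
num_directions = 2  # directions of the (bidirectional) encoder

# The decoder hidden size matches the concatenated encoder directions.
decoder = nn.LSTM(
    input_size=embedding_dim,
    hidden_size=num_directions * hidden_dim,
    num_layers=num_layers,
    bidirectional=False,
)
print(decoder.hidden_size)  # 40
```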

Run your decoder LSTM on an example sequence. Condition it with the encoder representation of the sequence. How do we get the correct shape for the initial hidden state?

**Hint:** Take a look at [PyTorch's tensor operations](https://pytorch.org/docs/stable/tensors.html) and compare `Tensor.repeat`, `Tensor.repeat_interleave` and `Tensor.expand`.
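The following sketch first contrasts the three operations on a tiny tensor and then uses `Tensor.expand` to build the decoder's initial state. It assumes one particular design: only the last encoder layer's final state is used, and it is copied to every decoder layer. The helper `summarize` is a hypothetical name, not part of PyTorch.

```python
import torch
import torch.nn as nn

t = torch.tensor([[1, 2], [3, 4]])
tiled = t.repeat(2, 1)                       # rows: [1,2], [3,4], [1,2], [3,4]
interleaved = t.repeat_interleave(2, dim=0)  # rows: [1,2], [1,2], [3,4], [3,4]
expanded = t.unsqueeze(0).expand(3, -1, -1)  # shape [3, 2, 2], no data copied

embedding_dim, hidden_dim, num_layers, batch_size = 10, 20, 2, 3
encoder = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers, bidirectional=True)
decoder = nn.LSTM(embedding_dim, 2 * hidden_dim, num_layers=num_layers)

x = torch.randn(5, batch_size, embedding_dim)
y = torch.randn(4, batch_size, embedding_dim)
_, (h_n, c_n) = encoder(x)


def summarize(state):
    """Hypothetical helper: last encoder layer's state for every decoder layer."""
    # state[-2] is the last layer's forward state, state[-1] its backward state.
    last = torch.cat([state[-2], state[-1]], dim=-1)  # [batch, 2 * hidden_dim]
    # expand returns a non-contiguous view, so make it contiguous for the LSTM.
    return last.unsqueeze(0).expand(num_layers, -1, -1).contiguous()


outputs, _ = decoder(y, (summarize(h_n), summarize(c_n)))
print(outputs.shape)  # torch.Size([4, 3, 40])
```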

In most sequence-to-sequence RNNs, the final encoder hidden state is used as the initial hidden state of the decoder RNN. In some variants, it is also concatenated with the hidden state of the previous time step at each decoder time step. PyTorch's `nn.LSTM` implementation does not easily support this, so we would have to resort to the lower-level `nn.LSTMCell` class again.
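As a rough illustration of such a variant, the sketch below feeds a fixed encoder summary into an `nn.LSTMCell` at every decoder step by concatenating it with the step input. This is one way to expose the summary at each step and is an assumption made here, not necessarily the exact formulation meant above. The tensor `context` is a random stand-in for a real encoder output.

```python
import torch
import torch.nn as nn

embedding_dim, context_dim, batch_size = 10, 40, 3
y = torch.randn(4, batch_size, embedding_dim)
context = torch.randn(batch_size, context_dim)  # stand-in for the encoder summary

# The cell input is the current token embedding plus the encoder summary.
cell = nn.LSTMCell(embedding_dim + context_dim, context_dim)

h, c = context, torch.zeros_like(context)  # initialize the hidden state from the encoder
decoder_states = []
for y_t in y:  # iterate over decoder time steps
    step_input = torch.cat([y_t, context], dim=-1)
    h, c = cell(step_input, (h, c))
    decoder_states.append(h)

decoder_states = torch.stack(decoder_states)
print(decoder_states.shape)  # torch.Size([4, 3, 40])
```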

Put it all together in a seq2seq LSTM model.

In [None]:
class Seq2seqLSTM(nn.Module):
    """ Sequence-to-sequence LSTM. """
    
    def __init__(self, embedding_dim, hidden_dim, num_encoder_layers, num_decoder_layers, bidirectional):
        super().__init__()
        
        # TODO: initialize encoder and decoder
    
    def forward(self, x, y):
        assert x.dim() == 3, "Expected input of shape [sequence length, batch size, embedding dim]"
        batch_size = x.size(1)
        
        # TODO: implement encoder and decoder forward passes

Test your seq2seq LSTM with an input sequence `x` and a ground truth output sequence `y` that the decoder tries to predict.

In [None]:
num_directions = 2 if bidirectional else 1
decoder_hidden_dim = num_directions * hidden_dim
seq2seq_lstm = Seq2seqLSTM(embedding_dim, hidden_dim, num_layers, num_layers, bidirectional)
x = torch.randn(10, 2, embedding_dim)
y = torch.randn(9, 2, embedding_dim)
outputs = seq2seq_lstm(x, y)
assert outputs.dim() == 3 and list(outputs.size()) == [9, 2, decoder_hidden_dim], "Wrong output shape"
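
For comparison, here is one possible way to fill in the TODOs, written as a separate class so the exercise skeleton stays untouched. It is only a sketch under the assumption that the decoder is initialized from the last encoder layer's final states, copied to every decoder layer. The class name `Seq2seqLSTMSketch` and the helper `_init_state` are hypothetical.

```python
import torch
import torch.nn as nn


class Seq2seqLSTMSketch(nn.Module):
    """One possible sequence-to-sequence LSTM (illustrative sketch)."""

    def __init__(self, embedding_dim, hidden_dim, num_encoder_layers, num_decoder_layers, bidirectional):
        super().__init__()
        self.num_directions = 2 if bidirectional else 1
        self.num_decoder_layers = num_decoder_layers
        self.encoder = nn.LSTM(embedding_dim, hidden_dim,
                               num_layers=num_encoder_layers, bidirectional=bidirectional)
        # The decoder hidden size matches the concatenated encoder directions.
        self.decoder = nn.LSTM(embedding_dim, self.num_directions * hidden_dim,
                               num_layers=num_decoder_layers)

    def _init_state(self, state):
        # The last num_directions entries belong to the last encoder layer.
        last = torch.cat(list(state[-self.num_directions:]), dim=-1)
        # Copy the summary to every decoder layer: [layers, batch, dirs * hidden].
        return last.unsqueeze(0).expand(self.num_decoder_layers, -1, -1).contiguous()

    def forward(self, x, y):
        assert x.dim() == 3, "Expected input of shape [sequence length, batch size, embedding dim]"
        _, (h_n, c_n) = self.encoder(x)
        outputs, _ = self.decoder(y, (self._init_state(h_n), self._init_state(c_n)))
        return outputs


embedding_dim, hidden_dim, num_layers = 10, 20, 2
model = Seq2seqLSTMSketch(embedding_dim, hidden_dim, num_layers, num_layers, bidirectional=True)
x = torch.randn(10, 2, embedding_dim)
y = torch.randn(9, 2, embedding_dim)
outputs = model(x, y)
print(outputs.shape)  # torch.Size([9, 2, 40])
```

Because `y` is fed to the decoder in full, this corresponds to teacher forcing during training. Step-by-step generation at inference time would need a loop over decoder steps.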