# Sequence-to-sequence RNN
In this exercise, we implement a sequence-to-sequence RNN (without attention).

In [2]:
import torch
import torch.nn as nn

We first define our hyperparameters.

In [3]:
embedding_dim = 10
hidden_dim = 20
num_layers = 2
bidirectional = True
sequence_length = 5
batch_size = 3

Create a bidirectional [`nn.LSTM`](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) with 2 layers.

In [4]:
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers, bidirectional=bidirectional)

We create an example input `x`.

In [5]:
x = torch.randn(sequence_length, batch_size, embedding_dim)

What shape should the initial hidden and cell states have?

In [9]:
num_directions = 2 if bidirectional else 1
h0 = torch.randn(num_layers * num_directions, batch_size, hidden_dim)
c0 = torch.randn(num_layers * num_directions, batch_size, hidden_dim)

Now we run our LSTM. Look at the output. Explain each dimension of the output.

In [None]:
output, (hn, cn) = lstm(x, (h0, c0))

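print(output.shape) #shape: (sequence_length, batch_size, num_directions * hidden_dim). Last-layer hidden states at every time step, both directions concatenated
print(hn.shape) #shape: (num_layers * num_directions, batch_size, hidden_dim). Final hidden state of every layer and direction
print(cn.shape) #shape: same as hn. Final cell state of every layer and direction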

torch.Size([5, 3, 40])
torch.Size([4, 3, 20])
torch.Size([4, 3, 20])
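The relationship between `output` and `hn` can be checked directly. The sketch below rebuilds a small bidirectional LSTM with the same hyperparameters (the names `hn_by_layer`, `last_forward` and `last_backward` are introduced here for illustration) and assumes the layer-major, direction-minor ordering of `hn` described in the PyTorch documentation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, num_layers = 10, 20, 2
sequence_length, batch_size = 5, 3

lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim,
               num_layers=num_layers, bidirectional=True)
x = torch.randn(sequence_length, batch_size, embedding_dim)
output, (hn, cn) = lstm(x)  # initial states default to zeros

# Separate the layer axis from the direction axis of hn.
hn_by_layer = hn.reshape(num_layers, 2, batch_size, hidden_dim)
last_forward, last_backward = hn_by_layer[-1]

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
forward_matches = torch.allclose(output[-1, :, :hidden_dim], last_forward)
backward_matches = torch.allclose(output[0, :, hidden_dim:], last_backward)
print(forward_matches, backward_matches)
```

Both checks should hold: the first half of the feature axis of `output` belongs to the forward direction and the second half to the backward direction.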


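`output` contains only the hidden states of the last (2nd) layer, for every time step. `hn` and `cn` hold the final hidden and cell states of every layer and direction, but not the intermediate time steps. If we want access to the per-time-step hidden states of layer 1 as well, we have to run the `LSTMCell`s ourselves.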
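A minimal sketch of running the cells manually, assuming a unidirectional 2-layer stack for brevity (the names `cells` and `per_layer_states` are hypothetical and not part of the exercise):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim, sequence_length, batch_size = 10, 20, 5, 3

# Two stacked cells play the role of a unidirectional 2-layer LSTM.
cells = nn.ModuleList([
    nn.LSTMCell(embedding_dim, hidden_dim),
    nn.LSTMCell(hidden_dim, hidden_dim),
])

x = torch.randn(sequence_length, batch_size, embedding_dim)
h = [torch.zeros(batch_size, hidden_dim) for _ in cells]
c = [torch.zeros(batch_size, hidden_dim) for _ in cells]

per_layer_states = [[] for _ in cells]
for t in range(sequence_length):
    layer_input = x[t]
    for i, cell in enumerate(cells):
        h[i], c[i] = cell(layer_input, (h[i], c[i]))
        per_layer_states[i].append(h[i])
        layer_input = h[i]  # the next layer reads this layer's hidden state

layer1_states = torch.stack(per_layer_states[0])  # (sequence_length, batch_size, hidden_dim)
layer2_states = torch.stack(per_layer_states[1])
print(layer1_states.shape, layer2_states.shape)
```

Here `layer1_states` is exactly what `nn.LSTM` does not return: the hidden state of the first layer at every time step.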

When we take the above LSTM as the encoder, what is its output that serves as the input to the decoder?

In [11]:
encoder = lstm

# hn[2] and hn[3] are the final forward and backward hidden states of the last layer
encoder_output = torch.cat((hn[2], hn[3]), dim=1) #shape: 3x40
#torch.stack((hn[2], hn[3]), dim=1) #shape: 3x2x20

print(encoder_output.shape)
print(encoder_output)

torch.Size([3, 40])
tensor([[-0.0540, -0.1253,  0.0244,  0.0167, -0.1227,  0.1365, -0.0936,  0.0492,
         -0.0372, -0.0079, -0.1196, -0.0604, -0.0784,  0.0433,  0.0422, -0.0247,
         -0.0988, -0.0530, -0.0968,  0.1095,  0.0709, -0.2326,  0.0149, -0.0769,
         -0.0119, -0.0247,  0.1440,  0.2190,  0.0541, -0.0787,  0.1827,  0.0479,
          0.0584,  0.0388, -0.1106, -0.0130, -0.1142,  0.1593,  0.1667,  0.0448],
        [-0.1083, -0.1941, -0.0036,  0.0275, -0.0780,  0.1821, -0.1785,  0.0010,
         -0.0091, -0.0912, -0.0746, -0.0709, -0.1513,  0.0474,  0.0148,  0.0684,
         -0.1067, -0.0838, -0.0310,  0.2014, -0.0151, -0.0247, -0.0462, -0.0524,
          0.0555, -0.1418,  0.1199,  0.0928,  0.0257, -0.1636,  0.1063,  0.0492,
          0.1344,  0.0449,  0.0735, -0.1845, -0.0890,  0.2883,  0.1243, -0.0300],
        [-0.0495, -0.1347, -0.0264, -0.0526, -0.0854,  0.1002, -0.1687, -0.0203,
          0.0117,  0.0149, -0.1350, -0.1023, -0.1300, -0.0378,  0.0242,  0.0178,
      

Create a decoder LSTM with 2 layers. Why can't it be bidirectional as well? What is the hidden dimension of the decoder LSTM when you want to initialize it with the encoder output?
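One way to see why the decoder must be unidirectional: at generation time the future tokens do not exist yet, so no output may depend on them. The sketch below uses small stand-in LSTMs (`uni` and `bi`, with sizes chosen only for illustration), perturbs the last time step of a sequence, and checks whether the earlier outputs change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
uni = nn.LSTM(input_size=4, hidden_size=6)
bi = nn.LSTM(input_size=4, hidden_size=6, bidirectional=True)

seq = torch.randn(5, 1, 4)
changed = seq.clone()
changed[-1] += 1.0  # perturb only the last time step

with torch.no_grad():
    # Compare the outputs at all time steps before the perturbed one.
    uni_same = torch.allclose(uni(seq)[0][:-1], uni(changed)[0][:-1])
    bi_same = torch.allclose(bi(seq)[0][:-1], bi(changed)[0][:-1])

print(uni_same)  # earlier outputs of the unidirectional LSTM are unaffected
print(bi_same)   # the backward direction leaks the future into earlier outputs
```

The unidirectional outputs before the perturbed step stay identical, while the bidirectional ones change because the backward pass reads the sequence from the end.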

In [20]:
decoder_hidden_dim = num_directions * hidden_dim
decoder = nn.LSTM(input_size=embedding_dim, hidden_size=decoder_hidden_dim, num_layers=num_layers)

Run your decoder LSTM on an example sequence. Condition it on the encoder representation of the sequence. How do we get the correct shape for the initial hidden state?

**Hint:** Take a look at [Torch's tensor operations](https://pytorch.org/docs/stable/tensors.html) and compare `Tensor.repeat`, `Tensor.repeat_interleave` and `Tensor.expand`.
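The three operations can be compared on a small tensor. This is only an illustration with a toy `encoder_output` of shape `(3, 2)`; the variable names are chosen for this sketch.

```python
import torch

encoder_output = torch.arange(6.0).reshape(3, 2)  # toy (batch_size=3, features=2)
num_layers = 2

# expand: adds a view along the new leading axis without copying memory.
expanded = encoder_output.unsqueeze(0).expand(num_layers, -1, -1)
# repeat: tiles the tensor and copies the data.
repeated = encoder_output.unsqueeze(0).repeat(num_layers, 1, 1)
# repeat_interleave: repeats each row in place along an existing axis.
interleaved = encoder_output.repeat_interleave(num_layers, dim=0)

print(expanded.shape)     # torch.Size([2, 3, 2])
print(repeated.shape)     # torch.Size([2, 3, 2])
print(interleaved.shape)  # torch.Size([6, 2])
print(torch.equal(expanded, repeated))                     # True
print(expanded.data_ptr() == encoder_output.data_ptr())    # True: shared memory
print(expanded.is_contiguous(), repeated.is_contiguous())  # False True
```

`expand` and `repeat` give the same values in the layout `(num_layers, batch_size, features)` that the LSTM expects for its initial state, while `repeat_interleave` along the batch axis produces a different layout that would need reshaping and transposing first.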

In [21]:
output_seq_length = 8

decoder_input = torch.randn(output_seq_length, batch_size, embedding_dim)
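h0_decoder = encoder_output.unsqueeze(0).expand(num_layers, -1, -1).contiguous() #same encoder state for every decoder layer; expand returns a non-contiguous view, so we copy it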
c0_decoder = torch.zeros(num_layers, batch_size, decoder_hidden_dim)

decoder_output, (hn_decoder, cn_decoder) = decoder(decoder_input, (h0_decoder, c0_decoder))

print(decoder_output.shape)
print(hn_decoder.shape)
print(cn_decoder.shape)


torch.Size([8, 3, 40])
torch.Size([2, 3, 40])
torch.Size([2, 3, 40])


In most sequence-to-sequence RNNs, the final encoder hidden state is used as the initial hidden state of the decoder RNN. In some variants, it is additionally concatenated with the hidden state of the previous time step at each decoder time step. PyTorch's `nn.LSTM` runs the whole sequence in one call and does not expose the hidden state between time steps, so for that variant we would have to resort to the lower-level `nn.LSTMCell` class again.
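A rough sketch of such a variant with `nn.LSTMCell` follows. It is one possible reading of the idea, not a reference implementation: the `merge` layer that maps the concatenation back to the hidden size, the `tanh` nonlinearity, and the random stand-in `context` are all assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, context_dim, batch_size, steps = 10, 40, 3, 8

cell = nn.LSTMCell(embedding_dim, context_dim)
# Hypothetical layer: maps [previous hidden state; encoder context] back to the hidden size.
merge = nn.Linear(2 * context_dim, context_dim)

context = torch.randn(batch_size, context_dim)  # stand-in for the encoder output
decoder_input = torch.randn(steps, batch_size, embedding_dim)

h = context  # the first hidden state is the encoder representation
c = torch.zeros(batch_size, context_dim)
outputs = []
for t in range(steps):
    # Re-inject the encoder context at every time step.
    h_in = torch.tanh(merge(torch.cat((h, context), dim=-1)))
    h, c = cell(decoder_input[t], (h_in, c))
    outputs.append(h)

outputs = torch.stack(outputs)  # (steps, batch_size, context_dim)
print(outputs.shape)
```

The explicit loop is what makes the per-step modification possible; with `nn.LSTM` the hidden state between time steps is never visible to user code.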

Put it all together in a seq2seq LSTM model.

In [42]:
class Seq2seqLSTM(nn.Module):
    """ Sequence-to-sequence LSTM. """
    
    def __init__(self, embedding_dim, hidden_dim, num_encoder_layers, num_decoder_layers, bidirectional):
        super().__init__()
        
        #initialize encoder and decoder
        self.encoder = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_encoder_layers, bidirectional=bidirectional)
        num_directions = 2 if bidirectional else 1
        self.decoder = nn.LSTM(input_size=embedding_dim, hidden_size=num_directions * hidden_dim, num_layers=num_decoder_layers)
    
    def forward(self, x, y):
        assert x.dim() == 3, "Expected input of shape [sequence length, batch size, embedding dim]"
        batch_size = x.size(1) # x shape: [sequence length, batch size, embedding dim]
        
        #encoder forward
        num_directions = 2 if self.encoder.bidirectional else 1 #read from the module instead of relying on a global
        h0_encoder = torch.zeros(self.encoder.num_layers * num_directions, batch_size, self.encoder.hidden_size, device=x.device)
        c0_encoder = torch.zeros_like(h0_encoder)

        encoder_output, (hn_encoder, cn_encoder) = self.encoder(x, (h0_encoder, c0_encoder))

        #decoder forward
        #final hidden state of the last encoder layer; both directions are concatenated if bidirectional
        encoder_output = torch.cat((hn_encoder[-2], hn_encoder[-1]), dim=-1) if self.encoder.bidirectional else hn_encoder[-1]
        h0_decoder = encoder_output.unsqueeze(0).expand(self.decoder.num_layers, -1, -1).contiguous()
        c0_decoder = torch.zeros(self.decoder.num_layers, batch_size, self.decoder.hidden_size, device=x.device)

        decoder_output, (hn_decoder, cn_decoder) = self.decoder(y, (h0_decoder, c0_decoder))

        return decoder_output

Test your seq2seq LSTM with an input sequence `x` and a ground truth output sequence `y` that the decoder tries to predict.

In [43]:
num_directions = 2 if bidirectional else 1
decoder_hidden_dim = num_directions * hidden_dim
seq2seq_lstm = Seq2seqLSTM(embedding_dim, hidden_dim, num_layers, num_layers, bidirectional)
x = torch.randn(10, 2, embedding_dim)
y = torch.randn(9, 2, embedding_dim)
outputs = seq2seq_lstm(x, y)
assert outputs.dim() == 3 and list(outputs.size()) == [9, 2, decoder_hidden_dim], "Wrong output shape"
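For training, the decoder is typically fed the ground truth shifted by one step (teacher forcing), and its hidden states are mapped back to the target dimension. The sketch below shows only that part, on a standalone decoder without encoder conditioning; the `project` layer and the mean squared error loss are assumptions made for illustration, since the exercise does not specify an output layer or a loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, decoder_hidden_dim = 10, 40

decoder = nn.LSTM(input_size=embedding_dim, hidden_size=decoder_hidden_dim)
project = nn.Linear(decoder_hidden_dim, embedding_dim)  # hypothetical output layer

y = torch.randn(9, 2, embedding_dim)   # ground truth output sequence
decoder_input, target = y[:-1], y[1:]  # predict step t+1 from steps up to t

decoder_output, _ = decoder(decoder_input)  # initial states default to zeros here
prediction = project(decoder_output)        # (8, 2, embedding_dim)

loss = nn.functional.mse_loss(prediction, target)
loss.backward()
print(prediction.shape, loss.item())
```

In the full model, the initial decoder states would come from the encoder as in `Seq2seqLSTM.forward`, so the gradient of the loss would also reach the encoder parameters.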