# RNN Toy example of text generation with PyTorch

- We know the basics of RNNs.
- Now we'll look at a toy example of character-level text generation using RNNs.
- Recall that, given a sequence of characters, character-level text generation is the task of modeling the probability distribution of the next character in the sequence.

### A toy "hello" RNN 
- Suppose we want to train a character-level RNN on the sequence "hello".
- The vocabulary size is 4 (the unique characters "h", "e", "l", and "o"), and we want our model to learn the following:
    - "e" should be likely given "h" 
    - "l" should be likely given "he" 
    - "l" should be likely given "hel" 
    - "o" should be likely given "hell"     
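
The four targets above are simply every proper prefix of "hello" paired with the character that follows it. A minimal sketch of how such pairs could be built (the variable names are illustrative and not part of the original notebook):

```python
text = "hello"

# Vocabulary: unique characters in order of first appearance
vocab = list(dict.fromkeys(text))

# Pair every proper prefix with the character that follows it
pairs = [(text[:i], text[i]) for i in range(1, len(text))]

for context, target in pairs:
    print(f"{target!r} should be likely given {context!r}")
```

The RNN never sees the whole prefix at once; it reads one character at a time and has to carry the context in its hidden state.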

![](img/RNN_char_generation_train.png)


[Source](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

### Shapes of input, hidden, and output weight matrices
- Shape of $W_{xh}$ ($W$) is going to be: $4 \times 3$
- Shape of $W_{hh}$ ($U$) is going to be: $3 \times 3$
- Shape of $W_{hy}$ ($V$) is going to be: $3 \times 4$

$$
\begin{aligned}
h_t &= g(h_{t-1}U + x_tW + b_1)\\
\hat{y}_t &= \text{softmax}(h_tV + b_2)
\end{aligned}
$$
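
To check that these shapes fit together, here is a sketch of a single time step computed by hand. It assumes $g = \tanh$ (the default nonlinearity of `nn.RNN`), treats $x_t$ as a $1 \times 4$ row vector, and uses randomly initialised weights, so the numbers themselves are meaningless; only the shapes matter.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, used only for this sketch

vocab_size, hidden_size = 4, 3

# Randomly initialised parameters (illustrative values, not trained weights)
W = torch.randn(vocab_size, hidden_size)   # W_xh: 4 x 3
U = torch.randn(hidden_size, hidden_size)  # W_hh: 3 x 3
V = torch.randn(hidden_size, vocab_size)   # W_hy: 3 x 4
b1 = torch.zeros(hidden_size)
b2 = torch.zeros(vocab_size)

x_t = torch.tensor([[1.0, 0.0, 0.0, 0.0]])  # one-hot "h" as a 1 x 4 row vector
h_prev = torch.zeros(1, hidden_size)        # initial hidden state, 1 x 3

h_t = torch.tanh(h_prev @ U + x_t @ W + b1)  # assumption: g is tanh; result is 1 x 3
y_hat = torch.softmax(h_t @ V + b2, dim=1)   # 1 x 4, entries sum to 1

print(h_t.shape, y_hat.shape)
```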

![](img/RNN_char_generation_train.png)


[Source](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

Let's build a simple RNN for this using `PyTorch`. 

In [17]:
import torch
import torch.nn as nn

torch.manual_seed(123)

<torch._C.Generator at 0x108150e50>

- Let's define a mapping between indices and characters. 

In [18]:
idx2char = ["h", "e", "l", "o"]

We need some representation for the input. Let's use a one-hot representation.

In [19]:
one_hot_lookup = [
    [1, 0, 0, 0],  # h
    [0, 1, 0, 0],  # e
    [0, 0, 1, 0],  # l
    [0, 0, 0, 1],  # o
]

Next, let's create the one-hot representation of the input `X`.

In [20]:
X = [0, 1, 2, 2]  # indices for the input "hell"
X_one_hot = [one_hot_lookup[x] for x in X]
inputs = torch.tensor(X_one_hot, dtype=torch.float32)
inputs

tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 1., 0.]])
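
As an alternative to the hand-written lookup table, the same tensor can be produced with `torch.nn.functional.one_hot`. This is a sketch of an equivalent route, not what the notebook uses; `inputs_alt` is an illustrative name.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([0, 1, 2, 2])  # indices for the input "hell"

# F.one_hot returns an integer tensor, so cast to float before feeding an RNN
inputs_alt = F.one_hot(X, num_classes=4).float()
print(inputs_alt)
```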

In [21]:
y = [1, 2, 2, 3]  # indices for the target "ello" (the input shifted by one character)
labels = torch.tensor(y)
labels

tensor([1, 2, 2, 3])

### Defining some variables 

In [22]:
num_classes = 4  # size of vocab
EPOCHS = 10  # number of epochs
input_size = 4  # size of vocab or one-hot size
hidden_size = 3  # dimensionality of the RNN hidden state
batch_size = 1  # we are not batching in this toy example.
sequence_length = 1  # we are processing characters one by one in this toy example
num_layers = 1  # one-layer rnn

In [23]:
class ToyRNN(nn.Module):
    """One-layer RNN followed by a linear layer that scores each character."""

    def __init__(self, debug=False):
        super().__init__()

        # PyTorch core RNN module
        self.rnn = nn.RNN(
            input_size=input_size, hidden_size=hidden_size, batch_first=True
        )

        # Fully connected layer for the output
        self.fc = nn.Linear(hidden_size, num_classes)

        # Debugging flag
        self.debug = debug

    def forward(self, hidden, x):
        x = x.view(batch_size, sequence_length, input_size)  # reshape the input
        if self.debug:
            print("\n\n")
            print("Input shape = ", x.size())

        out, hidden = self.rnn(x, hidden)
        if self.debug:
            print("out shape = ", out.size())
            print("Hidden shape = ", hidden.size())

        out = out.reshape(out.shape[0], -1)  # flatten to (batch_size, hidden_size) before the output layer
        if self.debug:
            print("out shape after reshaping = ", out.size())

        out = self.fc(out)
        if self.debug:
            print("out shape after passing through fc = ", out.size())

        return hidden, out

    def init_hidden(self):
        return torch.zeros(num_layers, batch_size, hidden_size)
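
It can help to compare the parameters PyTorch creates with the $W$, $U$, and $V$ shapes listed earlier. The sketch below builds the same two layers outside the class and prints their parameter shapes. Note that PyTorch stores each weight as (output size, input size), i.e. transposed relative to the row-vector convention of the equations, and that `nn.RNN` keeps two bias vectors for the hidden update.

```python
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
fc = nn.Linear(3, 4)

# Collect parameter shapes from both layers
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
shapes.update({f"fc.{name}": tuple(p.shape) for name, p in fc.named_parameters()})

for name, shape in shapes.items():
    print(f"{name:15s} {shape}")

# Total number of trainable parameters across both layers
n_params = sum(p.numel() for layer in (rnn, fc) for p in layer.parameters())
print("total parameters:", n_params)
```

Here `weight_ih_l0` plays the role of $W$, `weight_hh_l0` the role of $U$, and `fc.weight` the role of $V$.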

### Instantiate the model

In [24]:
model = ToyRNN()
print(model)

# Set the loss function and the optimizer
# Cross-entropy loss grows as the predicted probability of the true label shrinks.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

ToyRNN(
  (rnn): RNN(4, 3, batch_first=True)
  (fc): Linear(in_features=3, out_features=4, bias=True)
)
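
Notice that `ToyRNN.forward` returns raw scores from the linear layer and never applies a softmax. That is because `nn.CrossEntropyLoss` expects unnormalised logits and applies log-softmax internally. A small check with made-up scores for a single time step:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0, 0.0]])  # made-up scores for one time step
target = torch.tensor([0])                      # index of the correct character

loss = nn.CrossEntropyLoss()(logits, target)

# Same quantity by hand: negative log of the softmax probability of the target
manual = -torch.log_softmax(logits, dim=1)[0, target.item()]

print(loss.item(), manual.item())
```

The two printed values should agree up to floating-point error.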


### Train the model

In [25]:
for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()  # reset the hidden state at the start of each epoch

    pred = ""
    for inp, label in zip(inputs, labels):
        hidden, output = model(hidden, inp)
        val, idx = output.max(1)  # index of the highest-scoring character
        pred += idx2char[idx.item()]
        loss += criterion(output, label.unsqueeze(0))  # sum the loss over time steps
    print("Epoch: %d, loss: %1.3f, predicted: %s" % (epoch + 1, loss.item(), pred))

    loss.backward()
    optimizer.step()

Epoch: 1, loss: 6.082, predicted: oeoe
Epoch: 2, loss: 5.008, predicted: olol
Epoch: 3, loss: 4.393, predicted: llll
Epoch: 4, loss: 4.155, predicted: llll
Epoch: 5, loss: 3.991, predicted: llll
Epoch: 6, loss: 3.697, predicted: llll
Epoch: 7, loss: 3.280, predicted: llll
Epoch: 8, loss: 2.864, predicted: ello
Epoch: 9, loss: 2.459, predicted: ello
Epoch: 10, loss: 2.020, predicted: ello
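
During training the model always receives the correct previous character. To actually generate text, we would feed each prediction back in as the next input. The following is a self-contained sketch of that idea under several assumptions: it re-implements the model compactly instead of reusing `ToyRNN`, processes the whole sequence in one call, averages rather than sums the loss over time steps, trains for 100 steps, and decodes greedily with argmax. The `generate` helper is hypothetical and not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

idx2char = ["h", "e", "l", "o"]
eye = torch.eye(4)                        # row i is the one-hot vector for index i
inputs = eye[[0, 1, 2, 2]].unsqueeze(0)   # "hell", shape (1, 4, 4)
labels = torch.tensor([1, 2, 2, 3])       # "ello"

rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
fc = nn.Linear(3, 4)
params = list(rnn.parameters()) + list(fc.parameters())
optimizer = torch.optim.Adam(params, lr=0.1)
criterion = nn.CrossEntropyLoss()

for _ in range(100):
    optimizer.zero_grad()
    out, _ = rnn(inputs)                  # out: (1, 4, 3); initial hidden state defaults to zeros
    loss = criterion(fc(out[0]), labels)  # mean loss over the four time steps
    loss.backward()
    optimizer.step()


def generate(start_idx, n_chars):
    """Greedy decoding: feed each predicted character back in as the next input."""
    idx, hidden, chars = start_idx, None, [idx2char[start_idx]]
    with torch.no_grad():
        for _ in range(n_chars):
            x = eye[idx].view(1, 1, 4)    # (batch, sequence length, input size)
            out, hidden = rnn(x, hidden)
            idx = fc(out[:, -1]).argmax(dim=1).item()
            chars.append(idx2char[idx])
    return "".join(chars)


print(generate(start_idx=0, n_chars=4))
```

Sampling from the softmax distribution instead of taking the argmax is a common alternative when more varied output is wanted.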


![](img/RNN_char_generation_train.png)


[Source](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

- We have our toy RNN for text generation! 
- For realistic text generation, RNN variants such as LSTMs or GRUs are usually trained on large corpora such as Wikipedia.
- Note that training such models requires significant computational resources, including powerful GPUs or TPUs and substantial memory. So don't try it on your laptop.
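
In PyTorch, moving from a plain RNN to one of these variants is mostly a matter of swapping the layer. The sketch below uses a dummy all-zeros input just to show the interfaces; the main difference is that `nn.LSTM` returns a tuple of hidden state and cell state, whereas `nn.GRU` returns a single hidden state like `nn.RNN`.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 4, 4)  # dummy input: (batch, sequence length, input size)

gru = nn.GRU(input_size=4, hidden_size=3, batch_first=True)
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)

gru_out, gru_h = gru(x)                # same interface as nn.RNN
lstm_out, (lstm_h, lstm_c) = lstm(x)   # LSTM also carries a cell state

print(gru_out.shape, gru_h.shape)
print(lstm_out.shape, lstm_h.shape, lstm_c.shape)
```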