![](https://i.imgur.com/eBRPvWB.png)

# Practical PyTorch: Generating Shakespeare with a Character-Level RNN

[In the RNN classification tutorial](https://github.com/spro/practical-pytorch/blob/master/char-rnn-classification/char-rnn-classification.ipynb) we used an RNN to classify text one character at a time. This time we'll generate text one character at a time.

```
> python generate.py -n 500

PAOLTREDN:
Let, yil exter shis owrach we so sain, fleas,
Be wast the shall deas, puty sonse my sheete.

BAUFIO:
Sirh carrow out with the knonuot my comest sifard queences
O all a man unterd.

PROMENSJO:
Ay, I to Heron, I sack, againous; bepear, Butch,
An as shalp will of that seal think.

NUKINUS:
And house it to thee word off hee:
And thou charrota the son hange of that shall denthand
For the say hor you are of I folles muth me?
```

This one might make you question the series title: "is that really practical?" However, these sorts of generative models form the basis of machine translation, image captioning, question answering and more. See the [Sequence to Sequence Translation tutorial](https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb) for more on that topic.

# Recommended Reading

I assume you have at least installed PyTorch, know Python, and understand Tensors:

* http://pytorch.org/ for installation instructions
* [Deep Learning with PyTorch: A 60-minute Blitz](https://github.com/pytorch/tutorials/blob/master/Deep%20Learning%20with%20PyTorch.ipynb) to get started with PyTorch in general
* [jcjohnson's PyTorch examples](https://github.com/jcjohnson/pytorch-examples) for an in-depth overview
* [Introduction to PyTorch for former Torchies](https://github.com/pytorch/tutorials/blob/master/Introduction%20to%20PyTorch%20for%20former%20Torchies.ipynb) if you are a former Lua Torch user

It would also be useful to know about RNNs and how they work:

* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) shows a bunch of real life examples
* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) is about LSTMs specifically but also informative about RNNs in general

Also see these related tutorials from the series:

* [Classifying Names with a Character-Level RNN](https://github.com/spro/practical-pytorch/blob/master/char-rnn-classification/char-rnn-classification.ipynb) uses an RNN for classification
* [Generating Names with a Conditional Character-Level RNN](https://github.com/spro/practical-pytorch/blob/master/conditional-char-rnn/conditional-char-rnn.ipynb) builds on this model to add a category as input

In [1]:
!pip install unidecode

Collecting unidecode
  Downloading Unidecode-1.3.8-py3-none-any.whl (235 kB)
Installing collected packages: unidecode
Successfully installed unidecode-1.3.8


# Prepare data

The file we are using is a plain text file. We convert any Unicode characters to plain ASCII with the `unidecode` package (which you can install via `pip` or `conda`).

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [16]:
import unidecode
import string
import random
import re

all_characters = string.printable
n_characters = len(all_characters)

file = unidecode.unidecode(open('/content/Luceafarul.txt').read())
file_len = len(file)
print('file_len =', file_len)

file_len = 9868


To make inputs out of this big string of data, we will be splitting it into chunks.

In [17]:
chunk_len = 200

def random_chunk():
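    # randint is inclusive at both ends; stop early enough that chunk_len + 1 characters always fit
    start_index = random.randint(0, file_len - chunk_len - 1)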
    end_index = start_index + chunk_len + 1
    return file[start_index:end_index]

print(random_chunk())

ma,
Si soarele e tatal meu,
Iar noaptea-mi este muma;

O, vin', odorul meu nespus,
Si lumea ta o lasa;
Eu sunt luceafarul de sus,
Iar tu sa-mi fii mireasa.

O, vin', in parul tau balai
S-anin cununi de


# Build the Model

This model takes as input the character at step $t-1$ and is expected to output the next character, at step $t$. There are three layers: an embedding layer that encodes the input character into an internal vector, a recurrent layer (`nn.RNN` here, which may itself be stacked via `n_layers`) that operates on that vector and a hidden state, and a linear decoder layer that outputs a score for each possible next character. The loss function and the sampling step turn those scores into a probability distribution.
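To make the recurrent layer concrete, here is a minimal pure-Python sketch of one step of a single-layer Elman RNN, the update that `nn.RNN` applies with its default `tanh` nonlinearity: $h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$. The helper names (`matvec`, `rnn_step`) and the tiny hand-picked weights are assumptions made for illustration; they are not part of the tutorial's model.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(x, h_prev, w_ih, w_hh, b_ih, b_hh):
    """One Elman RNN step: h_t = tanh(W_ih x + b_ih + W_hh h_prev + b_hh)."""
    from_input = matvec(w_ih, x)
    from_hidden = matvec(w_hh, h_prev)
    return [
        math.tanh(a + b + c + d)
        for a, b, c, d in zip(from_input, b_ih, from_hidden, b_hh)
    ]

# Toy sizes: a 2-dimensional embedded character and a 2-dimensional hidden state.
w_ih = [[0.5, -0.5], [0.25, 0.75]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
b_ih = [0.0, 0.0]
b_hh = [0.0, 0.0]

hidden = [0.0, 0.0]  # plays the role of init_hidden(): start from zeros
for embedded_char in ([1.0, 0.0], [0.0, 1.0]):
    # The new hidden state depends on the current input and the previous state.
    hidden = rnn_step(embedded_char, hidden, w_ih, w_hh, b_ih, b_hh)
    print(hidden)
```

The hidden state is the only thing carried from one character to the next, which is why the model below returns it alongside the output at every step.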

In [18]:
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers

        self.encoder = nn.Embedding(input_size, hidden_size) # We use embeddings instead of one-hot encoding
        self.rnn = nn.RNN(hidden_size, hidden_size, n_layers) # RNN network
        self.decoder = nn.Linear(hidden_size, output_size) # Linear output layer

    def forward(self, input, hidden):
        input = self.encoder(input.view(1, -1))
        output, hidden = self.rnn(input.view(1, 1, -1), hidden)
        output = self.decoder(output.view(1, -1))
        return output, hidden

    def init_hidden(self):
        return torch.zeros(self.n_layers, 1, self.hidden_size) # we initialize the first hidden state with zeros

# Inputs and Targets

Each chunk will be turned into a tensor, specifically a `LongTensor` (used for integer values), by looping through the characters of the string and looking up the index of each character in `all_characters`.

In [19]:
# Turn string into list of longs
def char_tensor(string):
    tensor = torch.zeros(len(string)).long()
    for c in range(len(string)):
        tensor[c] = all_characters.index(string[c])
    return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])


Finally we can assemble a pair of input and target tensors for training, from a random chunk. The input will be all characters *except the last*, and the target will be all characters *from the second onward*, so each input character is paired with the character that follows it. If our chunk is "abc", the input will correspond to "ab" while the target is "bc".
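The shift can be checked with plain Python lists before any tensors are involved. This is only a sketch: `to_indices` is a hypothetical stand-in for `char_tensor` that returns a list instead of a `LongTensor`.

```python
import string

all_characters = string.printable  # same lookup table as the tutorial

def to_indices(text):
    """Map each character to its position in all_characters."""
    return [all_characters.index(ch) for ch in text]

chunk = "abc"
inp = to_indices(chunk[:-1])    # drop the last character
target = to_indices(chunk[1:])  # drop the first character

# Each input character lines up with the character that follows it.
for i, t in zip(inp, target):
    print(all_characters[i], "->", all_characters[t])
```

Input and target always have the same length, one less than the chunk, which is why `random_chunk` returns `chunk_len + 1` characters.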

In [20]:
def random_training_set():
    chunk = random_chunk()
    inp = char_tensor(chunk[:-1])
    target = char_tensor(chunk[1:])
    return inp, target

In [21]:
random_training_set()

(tensor([96, 38, 24, 21, 24, 74, 23, 94, 25, 10, 21, 10, 29, 14, 94, 13, 14, 94,
         22, 10, 27, 16, 14, 10, 23, 96, 55, 14, 74, 24, 18, 94, 13, 30, 12, 14,
         94, 31, 14, 10, 12, 30, 27, 18, 94, 22, 30, 21, 29, 14, 73, 96, 54, 18,
         94, 29, 24, 10, 29, 10, 94, 21, 30, 22, 14, 10, 74, 23, 94, 24, 12, 14,
         10, 23, 96, 39, 14, 94, 29, 18, 23, 14, 94, 24, 94, 28, 74, 10, 28, 12,
         30, 21, 29, 14, 75, 63, 96, 96, 74, 94, 73, 73, 50, 73, 94, 14, 28, 29,
         18, 94, 15, 27, 30, 22, 24, 28, 73, 94, 12, 30, 22, 94, 23, 30, 22, 10,
         74, 23, 94, 31, 18, 28, 96, 56, 23, 94, 18, 23, 16, 14, 27, 94, 28, 14,
         94, 10, 27, 10, 29, 10, 73, 96, 39, 10, 27, 10, 94, 25, 14, 94, 12, 10,
         21, 14, 10, 94, 12, 14, 74, 10, 18, 94, 13, 14, 28, 12, 17, 18, 28, 96,
         49, 74, 24, 18, 94, 22, 14, 27, 16, 14, 94, 23, 18, 12, 18, 24, 13, 10,
         29, 10]),
 tensor([38, 24, 21, 24, 74, 23, 94, 25, 10, 21, 10, 29, 14, 94, 13, 14, 94, 22,
         

# Evaluating

To evaluate the network we will feed one character at a time, use the outputs of the network as a probability distribution for the next character, and repeat. To start generation we pass a priming string to start building up the hidden state, from which we then generate one character at a time.
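The sampling step can be sketched without PyTorch. The `evaluate` function exponentiates the scaled outputs and hands them to `torch.multinomial`, which accepts non-negative weights that do not need to sum to one; `random.choices` from the standard library treats its `weights` argument the same way. The three-character vocabulary, the scores, and the helper `sample_index` are made up for illustration.

```python
import math
import random

def sample_index(scores, temperature=0.8, rng=random):
    """Draw one index with probability proportional to exp(score / temperature)."""
    weights = [math.exp(s / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

vocab = ["a", "b", "c"]   # hypothetical 3-character vocabulary
scores = [2.0, 1.0, 0.1]  # hypothetical network outputs for one step

rng = random.Random(0)    # seeded so the run is repeatable
draws = [vocab[sample_index(scores, rng=rng)] for _ in range(1000)]
counts = {ch: draws.count(ch) for ch in vocab}
print(counts)
```

Characters with higher scores are drawn more often, but lower-scoring ones still appear. Sampling rather than always taking the top score is what keeps the generated text from getting stuck in a loop.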

In [22]:
# Temperature is a variable that controls the randomness of the selection process from the multinomial distribution

def evaluate(prime_str='A', predict_len=100, temperature=0.8):
    hidden = decoder.init_hidden()
    prime_input = char_tensor(prime_str)
    predicted = prime_str

    # Use priming string to "build up" hidden state
    for p in range(len(prime_str) - 1):
        _, hidden = decoder(prime_input[p], hidden)
    inp = prime_input[-1]

    for p in range(predict_len):
        output, hidden = decoder(inp, hidden)

        # Sample from the network as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = torch.multinomial(output_dist, 1)[0]

        # Add predicted character to string and use as next input
        predicted_char = all_characters[top_i]
        predicted += predicted_char
        inp = char_tensor(predicted_char)

    return predicted

# Training

A helper to print the amount of time passed:

In [23]:
import time, math

def time_since(since):
    s = time.time() - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

The main training function:

In [24]:
def train(inp, target):
    hidden = decoder.init_hidden()
    decoder.zero_grad()
    loss = 0

    for c in range(chunk_len):
        output, hidden = decoder(inp[c], hidden)
        loss += criterion(output, target[c].unsqueeze(0))

    loss.backward()
    decoder_optimizer.step()

    return loss.data.item() / chunk_len

Then we define the training parameters, instantiate the model, and start training:

In [None]:
n_epochs = 20000
print_every = 100
plot_every = 10
hidden_size = 100
n_layers = 1
lr = 0.005

decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

start = time.time()
all_losses = []
loss_avg = 0

for epoch in range(1, n_epochs + 1):
    loss = train(*random_training_set())
    loss_avg += loss

    if epoch % print_every == 0:
        print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
        print(evaluate('Wh', 100), '\n')

    if epoch % plot_every == 0:
        all_losses.append(loss_avg / plot_every)
        loss_avg = 0

[0m 13s (100 0%) 2.0272]
Wh,
Un astii measa-t ni ae n pume vina starini soste ni voi dea cate aralesa s-umatasa ce si de pe de  

[0m 26s (200 1%) 1.9493]
Wheaiariga , sarii ti nat luminda.

Da mostrati
Ma vula
Pare intalun lunesa vorasareaga-u ,,Caci nea t 

[0m 39s (300 1%) 1.9518]
Whi se truntor sire,
Cupoaste lotos,
Si se primisi dor luma,
Eu nesa o sapa.


O, nele a ademi fare,
S 

[0m 52s (400 2%) 1.7209]
Whascum osus,
Caci ocheme,
Eu got osumoar cu vrea,
Nrepati
Si padatati
Si vinil fi schii s de sa serbi 

[1m 4s (500 2%) 1.6612]
Whivele ea de aprinici in adea de-nde.

O, vin', orizodri, ca sunt pe mine prepte-ntreapteapari vie co 

[1m 17s (600 3%) 1.7564]
Whe, ma privesc sunta sa vina trivescat pe canca meorm sin soari su-mi poamne
Si miepus,
Iar cu gremur 

[1m 30s (700 3%) 1.1545]
Whii tinici mai camant
Si vrei trecul unde si treguri nu goate nicitos, lunemi de-oi tremuritor,
Si do 

[1m 43s (800 4%) 1.5856]
Whipzii
Ei mariand num si ga-n codat
Ce cu mea fere
Sub dorbe.

Da

# Plotting the Training Losses

Plotting the historical loss from `all_losses` shows the network learning:

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline

plt.figure()
plt.plot(all_losses)

# Evaluating at different "temperatures"

In the `evaluate` function above, every time a prediction is made the outputs are divided by the `temperature` argument. A higher value flattens the distribution, making all characters more equally likely, and thus gives us "more random" outputs. A lower value (less than 1) sharpens it, so that high-scoring characters dominate. As the temperature approaches zero, we choose only the most likely output.
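A small worked example makes the effect visible. The sketch below normalizes `exp(score / temperature)` into probabilities for a few temperatures; the three scores and the helper name `softmax_with_temperature` are assumptions for illustration, not values from the trained network.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Return probabilities proportional to exp(score / temperature)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

scores = [2.0, 1.0, 0.1]  # hypothetical outputs for three characters

for temperature in (0.2, 0.8, 1.0, 5.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At a temperature of 0.2 almost all of the probability mass sits on the highest-scoring character, while at 5.0 the three probabilities are much closer to each other.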

We can see the effects of this by adjusting the `temperature` argument.

In [None]:
print(evaluate('Pe sufletele ', 200, temperature=0.5))

Lower temperatures are less varied, choosing only the more probable outputs:

In [None]:
print(evaluate('Th', 200, temperature=0.2))

Higher temperatures are more varied, choosing less probable outputs:

In [None]:
print(evaluate('Th', 200, temperature=1))

# Exercises

* Train with your own dataset, e.g.
    * Text from another author
    * Blog posts
    * Code
* Increase number of layers and network size to get better results

**Next**: [Generating Names with a Conditional Character-Level RNN](https://github.com/spro/practical-pytorch/blob/master/conditional-char-rnn/conditional-char-rnn.ipynb)