Generate text using a character-based RNN. Given a sequence of characters, train an RNN model to predict the most
probable next character in the sequence.

While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but here are some things to consider:
- The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.
- The structure of the output resembles a play: blocks of text generally begin with a speaker name in all capital letters, as in the dataset.
- As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.
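
To make the training setup concrete, here is a minimal plain-Python sketch of how a text could be cut into fixed-length training examples, where each target is the input shifted one character to the left. The helper names `split_input_target` and `make_examples` are illustrative assumptions, not functions defined by this notebook, and the non-overlapping chunking is one possible choice.

```python
def split_input_target(chunk):
    """Return (input, target), where target is the input shifted one character left."""
    return chunk[:-1], chunk[1:]


def make_examples(text, seq_length=100):
    """Cut text into non-overlapping chunks of seq_length + 1 characters.

    Each chunk yields one (input, target) pair of length seq_length.
    A trailing remainder shorter than a full chunk is dropped.
    """
    step = seq_length + 1
    chunks = [text[i:i + step] for i in range(0, len(text) - step + 1, step)]
    return [split_input_target(chunk) for chunk in chunks]


# Short stand-in text; the notebook would use the full Shakespeare corpus.
sample = "First Citizen:\nBefore we proceed any further, hear me speak."
examples = make_examples(sample, seq_length=10)
inp, tgt = examples[0]
print(repr(inp), "->", repr(tgt))
```

At every position the model sees the input characters up to that point and is asked to predict the character at the same position in the target.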

In [2]:
import os
import time
import numpy as np
import tensorflow as tf

Download the Shakespeare dataset.

In [3]:
path = tf.keras.utils.get_file(
    "shakespeare.txt",
    origin="https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt",
)

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1115394/1115394 ━━━━━━━━━━━━━━━━━━━━ 1s 1us/step


In [10]:
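# Read the file as bytes and decode it into a Python string
with open(path, "rb") as f:
    text = f.read().decode("utf-8")
# Length of the text in characters
print(len(text))
# Print the first 250 characters
print(text[:250])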

1115394
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



Check how many unique characters the corpus contains.

In [13]:
vocab = sorted(set(text))
print(f"Number of unique chars: {len(vocab)}")

Number of unique chars: 65


## Process the text
### Vectorize the text
Before training, you need to convert the strings to a numerical representation.

The tf.keras.layers.StringLookup layer can convert each character into a numeric ID. It only needs the text to be split into tokens first.
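
As a rough illustration of this step, the sketch below splits a string into characters with tf.strings.unicode_split and maps them to IDs with a StringLookup layer. The names `sample_text`, `sample_vocab`, and `ids_from_chars` are assumptions made for this example; a short stand-in corpus is used so the snippet runs on its own, whereas the notebook would build the vocabulary from the full text.

```python
import tensorflow as tf

# Stand-in corpus (assumption for this sketch); the notebook would use `text`.
sample_text = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
sample_vocab = sorted(set(sample_text))

# Split a string into individual characters; these are the tokens.
chars = tf.strings.unicode_split("First", input_encoding="UTF-8")

# Map each character to an integer ID. With mask_token=None, ID 0 is reserved
# for out-of-vocabulary characters, so known characters start at ID 1.
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=sample_vocab, mask_token=None
)
ids = ids_from_chars(chars)

print(chars.numpy())
print(ids.numpy())
```

Because one extra ID is reserved for unknown characters, the layer's vocabulary size is one larger than the number of unique characters.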