# Text Generation with Recurrent Neural Networks (RNN)

In this notebook, we create a character-based Recurrent Neural Network (RNN) using TensorFlow. The RNN is trained on a portion of the Shakespeare dataset.

The main steps of the project are:

1. **Data Preparation:** 
    - Download the Shakespeare text file with `tf.keras.utils.get_file`.
    - Preprocess the data by mapping each character to an integer ID, then slice the ID stream into fixed-length input and target sequences.

2. **Model Definition:** 
    - Define a custom model class `MyModel` which inherits from `tf.keras.Model`. 
    - The model consists of three layers: an Embedding layer, a GRU layer, and a Dense layer.

3. **Model Training:** 
    - Compile the model with the Adam optimizer and a sparse categorical cross-entropy loss, then train it with the `fit` method.
    - A `ModelCheckpoint` callback saves the model weights during training.

4. **Text Generation:** 
    - After training, the model is used to generate new text.
    - A `OneStep` wrapper model samples one character at a time and feeds it back in as the next input.


In [1]:
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing
import numpy as np
import os
import time

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

In [3]:
# Read the data
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

# Length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [4]:
# Have a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [5]:
# The unique characters in the file
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


In [6]:
example_texts = ['abcdefg', 'xyz']

chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
ids_from_chars = preprocessing.StringLookup(vocabulary=list(vocab), mask_token=None)

ids = ids_from_chars(chars)

In [7]:
chars_from_ids = preprocessing.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

chars = chars_from_ids(ids)

In [8]:
tf.strings.reduce_join(chars, axis=-1).numpy()

array([b'abcdefg', b'xyz'], dtype=object)
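
The two `StringLookup` layers behave like a pair of inverse lookup tables. The following is a minimal pure-Python sketch of that idea, not the layers' actual implementation. It assumes that index 0 is reserved for the `[UNK]` out-of-vocabulary token, which is consistent with the model later reporting 66 tokens for 65 characters. The names `toy_vocab`, `encode`, and `decode` are hypothetical and do not appear in the notebook.

```python
# Toy vocabulary standing in for the 65 Shakespeare characters.
toy_vocab = sorted(set("abcdefgxyz"))

# Assumption: index 0 is kept for unknown characters, so real characters start at 1.
UNK = "[UNK]"
id_to_char = [UNK] + toy_vocab
char_to_id = {ch: i for i, ch in enumerate(id_to_char)}

def encode(text):
    """Map each character to its integer ID (0 for characters outside the vocabulary)."""
    return [char_to_id.get(ch, 0) for ch in text]

def decode(ids):
    """Map integer IDs back to characters and join them into one string."""
    return "".join(id_to_char[i] for i in ids)

toy_ids = encode("abcdefg")
print(toy_ids)          # [1, 2, 3, 4, 5, 6, 7]
print(decode(toy_ids))  # abcdefg
print(encode("a?"))     # [1, 0]
```

Encoding followed by decoding returns the original string as long as every character is in the vocabulary.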

In [9]:
def text_from_ids(ids):
  return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

In [10]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)
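
As a rough check on the dataset size, the arithmetic behind `batch(seq_length+1, drop_remainder=True)` can be worked out by hand. This sketch assumes one ID per character (the text length printed earlier) and uses the batch size of 64 that the notebook sets in a later cell. The variable names are illustrative.

```python
text_length = 1115394  # number of characters reported when the file was read
seq_length = 100
batch_size = 64        # value of BATCH_SIZE used later in the notebook

chunk = seq_length + 1                       # 100 inputs plus 1 extra for the shifted target
num_sequences = text_length // chunk         # full chunks kept by drop_remainder=True
leftover_chars = text_length - num_sequences * chunk
steps_per_epoch = num_sequences // batch_size

print(num_sequences, leftover_chars, steps_per_epoch)  # 11043 51 172
```

Under these assumptions, the last 51 characters are dropped and each epoch consists of 172 batches.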


In [11]:
# For each sequence, duplicate and shift it to form the input and target text
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
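
To see what the shift does, the same function can be applied to a plain Python list. This is an illustrative sketch: the function is repeated here so the snippet runs on its own, and the sample word is arbitrary.

```python
def split_input_target(sequence):
    # Input is everything except the last element; target is everything except the first.
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

inputs, targets = split_input_target(list("First"))
print(inputs)   # ['F', 'i', 'r', 's']
print(targets)  # ['i', 'r', 's', 't']

# At every position, the target is the character that follows the input character.
for inp, tgt in zip(inputs, targets):
    print(f"input {inp!r} -> target {tgt!r}")
```

A sequence of 101 IDs therefore yields 100 input positions and 100 target positions.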

In [13]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
BUFFER_SIZE = 10000

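# Number of unique characters in the text. The `StringLookup` vocabulary
# used to size the model has one extra entry for the `[UNK]` token.
vocab_size = len(vocab)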

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>

In [14]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True, 
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else: 
      return x

model = MyModel(
    # Be sure the vocabulary size matches the `StringLookup` layers.
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [15]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 66) # (batch_size, sequence_length, vocab_size)


In [16]:
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  16896     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  67650     
                                                                 
Total params: 4,022,850
Trainable params: 4,022,850
Non-trainable params: 0
_________________________________________________________________
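
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the default Keras GRU configuration (`reset_after=True`, which keeps two bias vectors per gate); with a different configuration the GRU count would differ.

```python
vocab_size = 66      # 65 characters plus the [UNK] token
embedding_dim = 256
rnn_units = 1024

# Embedding: one vector of size embedding_dim per token.
embedding_params = vocab_size * embedding_dim

# GRU: 3 gates, each with an input kernel, a recurrent kernel, and
# (assuming reset_after=True) two bias vectors.
gru_params = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + 2 * rnn_units)

# Dense: a weight matrix plus one bias per output logit.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
# 16896 3938304 67650 4022850
```

Almost all of the parameters sit in the GRU layer, so `rnn_units` is the setting that dominates model size.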


In [17]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss)
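
A useful sanity check before training: if an untrained model spreads its probability roughly evenly over all tokens, the cross-entropy loss should be close to the natural log of the vocabulary size. The following pure-Python sketch of cross-entropy from logits illustrates this; it is not the Keras implementation, and the function name is hypothetical.

```python
import math

def cross_entropy_from_logits(logits, target_id):
    """Negative log-probability of target_id under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_id]

vocab_size = 66
uniform_logits = [0.0] * vocab_size  # every token equally likely
initial_loss = cross_entropy_from_logits(uniform_logits, target_id=5)

print(initial_loss)          # about 4.19
print(math.log(vocab_size))  # same value: ln(66)
```

A starting loss far above this value would suggest badly scaled initial logits.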

In [18]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
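
The `{epoch}` placeholder in the checkpoint prefix is an ordinary Python format field. This sketch shows how such a template expands, assuming the callback substitutes a 1-based epoch number; the loop and the `paths` list are only for illustration.

```python
import os

checkpoint_dir = "./training_checkpoints"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Assumption: the placeholder is filled with the epoch number, starting at 1.
paths = [checkpoint_prefix.format(epoch=epoch) for epoch in (1, 2, 20)]
for path in paths:
    print(path)
```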

In [19]:
EPOCHS = 20
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [22]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    predicted_logits, states = self.model(inputs=input_ids, states=states, 
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return the characters and model state.
    return predicted_chars, states

In [23]:
one_step_model = OneStep(model, chars_from_ids, ids_from_chars)
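
The prediction mask works because adding negative infinity to a logit drives its softmax probability to exactly zero, so that token can never be sampled. This is a minimal pure-Python sketch with made-up logits, assuming the masked token sits at index 0.

```python
import math

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]       # hypothetical scores; index 0 plays the role of [UNK]
mask = [-math.inf, 0.0, 0.0, 0.0]   # -inf at the forbidden index, 0 elsewhere

masked_logits = [z + m for z, m in zip(logits, mask)]
probs = softmax(masked_logits)

print(probs[0])    # 0.0
print(sum(probs))  # the remaining probabilities still sum to 1 (up to rounding)
```

Without the mask, index 0 would have been the most likely token in this example.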

In [25]:
start = time.time()
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()

print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)

print(f"\nRun time: {end - start}")

ROMEO:
O the king's covil, speaking so: we
will not prove a trunk return. But the argrument
To use the glory of thy absence.
What, will I be head?

MIRANDA:
O, let me live. I prithee, do you know.

GLOUCESTER:
My lord?

KING RICHARD III:
Ay, you rascless there! he's mutionman?
O bowes nothing but from my soul awhile,
And that be middow be much up, I pray,
Anony on her: this cheer he knew the
rest.

JOHN OF GAUNT:
Would then far from hence their loss.

First Murderer:
I part the Capitol; who set the lipsion,
And then be it as the comforts from Lord
Angelo poison any hereafter.

MENENIUS:
Dingle--
Displace my raged with him to crave by thee:
If I could speak again, or I'll keep him company.
O, but not smile't one that shouldst thou fly!
Where is the land the meat! Horse! Some durst on Antimble
Writ in your authority command!'
The suit we lie aside, and set down-sons'd.

CLAUDIO:
And come; what work is the forth and whereby he
The fountast of them here and strays for vain.

YORK:
O easy, 

In [26]:
start_string = "HAMLET:"
states = None
next_char = tf.constant([start_string])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)

HAMLET:
I joy with maid how the ground the forfeits house:
And soother he that have you braced bedembrown.

Second Gentleman:
No, my lord.

KING RICHARD III:
Stanley, let's attend these news: the prepent device
We shall be sworn trust, for he may call them;
For there were none afone with thee and me.

JULIET:
I'll do thee jeason; that I leave yourself
Lord march-take to my body to the queen;
And yet no more than wives, these mother
Hath brain'd the field in justice: then we see,
I mean no memeed it theirs; 'the penitent drum
Of our sortly, hath more confession absend;
Such news, that you are in arms!
If you do he, that our general: good night!
Lead not your worship, for all thy edged,
That form. Come hither, Master
Froth do, and that thou wouldst disguised kisses; friend,
Which the trim 'priam Lady? Happy stand; alack,
That I, being gold their fears: is this folly in;
When he is banish'd, I'll give him to the speech:
One Pish false and your deceit, secure, free
Of our own part, he hath

In [27]:
start_string = "To be or not to be, that is the question:"
states = None
next_char = tf.constant([start_string])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)

To be or not to be, that is the question:
Nothing the muchinard. Hese are the crown too: whether you lie!

HORTENSIO:
You're passing grandam too:
To see her, fiery when I sin--for solicy?

JULIET:
But that you not pass to the Duke of Clifford?
Happy and father, for thy lord,
That thou go along with me that he doth enough.

LORD WILLOUGHBY:
Have done your lordship?

ANTONIO:
My injucal tames are we leave us to't.

BRUTUS:
Yes, but he cannot cheque him on
A way as is the justice of a fomblice:
I am so brief of mercy of the ministers,
Repitation should be happite for thy hand,
It is as happy by the secret mighty,
Ere yea obey'dow, and took your honour,
I'll prove a bark of janes but Bringbroke
Fifth winds, two youth will show
What's yet ungovern'd in puts base.

OXFORD:
Where doth he straight? and doth not, think no more: away!

GONZALO:
I'll wait upon your wanton wish: is pain'd
The admired often person advanceful coals;
For this is calved for the duke with graces, all the
spigest, with 

> **Note:**
>
> The generated text is rarely fully coherent, but it reproduces the structure of a play (speaker names, line breaks, punctuation) and contains many plausible phrases, even though the model only ever predicts one character at a time.
>
> Generation is stochastic: each character is sampled from the predicted distribution, so every run produces different text. You can experiment with different seed strings, change the `temperature` argument of `OneStep` (higher values give more random output, lower values give more predictable output), or train the model for more epochs to see how the generated output changes.
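
The effect of temperature can be illustrated without the trained model. The sketch below divides a set of made-up logits by the temperature before the softmax, mirroring the division in `generate_one_step`; the helper name and the logit values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then convert them to probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters

# Probability of the most likely character at each temperature.
sharpness = {}
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    sharpness[t] = max(probs)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring character, while higher temperatures flatten the distribution and make unlikely characters more probable.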
