# Play Generator using RNN

## Munsif Raza

This project shows how to generate text with a character-based RNN. We work with a dataset of Shakespeare's writing from Andrej Karpathy's "The Unreasonable Effectiveness of Recurrent Neural Networks". Given a sequence of characters from this data ("Shakespear"), we train a model to predict the next character in the sequence ("e"). Longer sequences of text can then be generated by calling the model repeatedly and feeding each predicted character back in as input.

In [1]:
# Importing the dependencies (only the modules this notebook actually uses)
import os

import numpy as np
import tensorflow as tf

# Loading the dataset

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# Read, then decode.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [3]:
# Let's take a look at the first 200 characters.
print(text[:200])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you


# Encoding the data
Since our data is text, we have to encode it as numbers before the model can use it. We do this by mapping each unique character to an integer index.
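As a minimal sketch of the same idea without TensorFlow, the snippet below builds a sorted character vocabulary for a short toy string (the string and the `toy_*` names are illustrative assumptions, not part of the notebook) and checks that encoding followed by decoding returns the original text.

```python
# Toy string used only for illustration; the notebook uses the full Shakespeare text.
toy_text = "hello world"

# Sorted unique characters, as in the notebook's `vocab = sorted(set(text))`.
toy_vocab = sorted(set(toy_text))

# Character -> index and index -> character lookups.
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_idx2char = {i: ch for ch, i in toy_char2idx.items()}

encoded = [toy_char2idx[ch] for ch in toy_text]
decoded = "".join(toy_idx2char[i] for i in encoded)

print(toy_vocab)  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(encoded)    # [3, 2, 4, 4, 5, 0, 7, 5, 6, 4, 1]
print(decoded)    # hello world
```

Sorting the vocabulary makes the mapping deterministic, so the same text always produces the same indices.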

In [4]:
vocab = sorted(set(text))
# creating a mapping from unique characters to indices.
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

def text_2_int(text):
    return np.array([char2idx[c] for c in text])

text_as_int = text_2_int(text)

In [5]:
# Let's look at how part of our text is encoded
print("Text: ", text[:13])
print("Encoded: ", text_2_int(text[:13]))

Text:  First Citizen
Encoded:  [18 47 56 57 58  1 15 47 58 47 64 43 52]


In [6]:
# Let's create a function to convert numeric values back to text.

def int_to_text(ints):
    # Accept either a tf.Tensor (convert it to NumPy) or a NumPy array.
    try:
        ints = ints.numpy()
    except AttributeError:
        pass
    return ''.join(idx2char[ints])

print(int_to_text(text_as_int[:13]))

First Citizen


# Create training examples and targets
Our task is to feed the model a sequence and have it return to us the next character. This means we need to split our text data from above into many shorter sequences that we can pass to the model as training examples.

Each training example uses a sequence of seq_length characters as input and a sequence of seq_length characters as output, where the output is the input shifted forward by one character: each target character is the character that follows the corresponding input character.

For example:
input: Hell  | output: ello
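The shift works on plain Python strings too. In this sketch, `split_input_target_toy` is a hypothetical helper that mirrors the notebook's `split_input_target`; it shows that a chunk of length n yields n - 1 (input character, next character) pairs, so a chunk of seq_length + 1 characters gives seq_length training targets.

```python
def split_input_target_toy(chunk):
    """Return (input, target), where target is input shifted forward by one character."""
    return chunk[:-1], chunk[1:]

inp, tgt = split_input_target_toy("Hello")
print(inp, "->", tgt)  # Hell -> ello

# Each position i of the input is paired with the character that follows it.
pairs = list(zip(inp, tgt))
print(pairs)  # [('H', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
```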

In [7]:
seq_length = 100 # length of sequence for a training example
examples_per_epoch = len(text)//(seq_length+1)

# Turn the array of character indices into a dataset that yields one character index at a time
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

In [8]:
# Now we use the batch method to turn the stream of characters above into sequences of the desired length (seq_length + 1).
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

Next, we split each of these sequences of length 101 into an input and a target.

In [9]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
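For intuition, the chunk-then-split step can be reproduced with NumPy on a small integer array. This is a sketch, not what `tf.data` does internally; `make_examples` is a hypothetical helper and `np.arange(23)` stands in for `text_as_int`.

```python
import numpy as np

def make_examples(ids, seq_length):
    """Cut `ids` into non-overlapping chunks of seq_length + 1 and split each
    chunk into (input, target), dropping any incomplete final chunk."""
    chunk = seq_length + 1
    n_chunks = len(ids) // chunk                      # like drop_remainder=True
    chunks = np.asarray(ids[: n_chunks * chunk]).reshape(n_chunks, chunk)
    return chunks[:, :-1], chunks[:, 1:]              # inputs, targets

ids = np.arange(23)  # stand-in for text_as_int
inputs, targets = make_examples(ids, seq_length=4)
print(inputs.shape, targets.shape)  # (4, 4) (4, 4)
print(inputs[0], targets[0])        # [0 1 2 3] [1 2 3 4]
```

The last 3 of the 23 values do not fill a complete chunk of 5 and are dropped, just as `drop_remainder=True` drops the incomplete final sequence.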

# Making Training batches

In [10]:
# Batch size
BATCH_SIZE = 64
# Buffer size to shuffle the dataset
BUFFER_SIZE = 10000
# Length of the vocabulary in chars
vocab_size = len(vocab)
# The embedding dimension
embedding_dim = 256
# Number of RNN units
rnn_units = 1024

data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
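Using the text length printed earlier, a quick calculation shows how many sequences and batches survive the two `drop_remainder=True` calls. This assumes the full text is used, as in the cells above.

```python
text_length = 1115394  # printed earlier: "Length of text: 1115394 characters"
seq_length = 100
BATCH_SIZE = 64

# Chunks of seq_length + 1 characters kept by batch(..., drop_remainder=True).
num_sequences = text_length // (seq_length + 1)
# Full batches of 64 sequences kept by the second drop_remainder=True.
steps_per_epoch = num_sequences // BATCH_SIZE

print(num_sequences)    # 11043
print(steps_per_epoch)  # 172
```

So each epoch runs 172 training steps; the 51 trailing characters and the 35 sequences left over after batching are not used in that epoch.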

# Building the Model
Now it is time to build the model. We will use an embedding layer, an LSTM layer, and one dense layer that contains a node for each unique character in our training data. The dense layer outputs one logit (an unnormalized score) per character; applying a softmax to these logits gives a probability distribution over the vocabulary.

In [11]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size,None]),
        tf.keras.layers.LSTM(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (64, None, 256)           16640     
                                                                 
 lstm (LSTM)                 (64, None, 1024)          5246976   
                                                                 
 dense (Dense)               (64, None, 65)            66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________
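The parameter counts in the summary can be checked by hand. The sketch below assumes the standard Keras LSTM layout (four gates, each with an input kernel, a recurrent kernel and a bias); `vocab_size = 65` is read from the summary's output shape.

```python
vocab_size = 65
embedding_dim = 256
rnn_units = 1024

# Embedding: one embedding_dim vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)

# Dense: one weight per (LSTM unit, character) pair plus one bias per character.
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
# 16640 5246976 66625 5330241
```

Almost all of the model's capacity sits in the LSTM's recurrent weights, which grow with the square of `rnn_units`.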


# Creating a Loss function
We shall create our own loss function for this problem, because our model outputs a tensor of shape (64, sequence_length, 65): for every sequence in the batch and every timestep, a vector of 65 logits that defines a probability distribution over the next character.

But before doing so let's have a look at a sample input and the output from our untrained model. This is so we can understand what the model is actually giving us.

In [12]:
for input_example_batch, target_example_batch in data.take(1):
    example_batch_predictions = model(input_example_batch) # ask our model for a prediction on our first batch of training data.
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)") #print out the output shape.

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [13]:
# we can see that the prediction is an array of 64 arrays, one for each entry in the batch.
print(len(example_batch_predictions))
print(example_batch_predictions)

64
tf.Tensor(
[[[ 3.3030570e-03 -1.1071606e-03 -3.6277538e-03 ... -1.7492869e-03
   -1.7463093e-03  3.8825851e-03]
  [ 4.1547064e-03 -2.1383921e-03 -1.9592862e-03 ... -1.7663521e-03
   -5.3195129e-03 -7.7475584e-04]
  [ 2.8423558e-03  4.9129506e-03 -4.7926572e-03 ... -4.4284805e-04
    8.6221844e-04  4.8089800e-03]
  ...
  [ 1.5304250e-03 -3.9841887e-03 -3.5096523e-03 ... -8.6496510e-03
   -3.1791902e-03  7.0173256e-03]
  [-5.4921478e-04 -6.2407050e-03 -1.1929049e-03 ... -8.2275979e-03
   -6.1346870e-03  3.3686613e-03]
  [-7.5821229e-04  3.7215091e-04 -4.6457332e-03 ... -5.2503920e-03
   -3.6886916e-04  8.3410284e-03]]

 [[ 2.8432768e-03 -3.1085783e-03 -3.2435048e-03 ... -2.3282261e-03
   -1.2451167e-03  5.3842328e-03]
  [ 1.3363342e-03  3.0851564e-03 -5.2842554e-03 ...  4.2375585e-05
    3.6941355e-03  8.9971442e-03]
  [-3.0351104e-05  5.1136743e-03 -3.2478152e-04 ...  5.3707685e-04
    7.8762211e-03  7.3518092e-03]
  ...
  [ 4.0474633e-04  2.1822997e-03 -1.0723732e-03 ... -3.6382393e

In [15]:
# Let's examine one prediction
pred = example_batch_predictions[0]
print(len(pred))
print(pred)

100
tf.Tensor(
[[ 0.00330306 -0.00110716 -0.00362775 ... -0.00174929 -0.00174631
   0.00388259]
 [ 0.00415471 -0.00213839 -0.00195929 ... -0.00176635 -0.00531951
  -0.00077476]
 [ 0.00284236  0.00491295 -0.00479266 ... -0.00044285  0.00086222
   0.00480898]
 ...
 [ 0.00153043 -0.00398419 -0.00350965 ... -0.00864965 -0.00317919
   0.00701733]
 [-0.00054921 -0.00624071 -0.0011929  ... -0.0082276  -0.00613469
   0.00336866]
 [-0.00075821  0.00037215 -0.00464573 ... -0.00525039 -0.00036887
   0.00834103]], shape=(100, 65), dtype=float32)


In [16]:
# finally, we shall look at a prediction at the first timestep
time_pred = pred[0]
print(len(time_pred))
print(time_pred)

65
tf.Tensor(
[ 3.3030570e-03 -1.1071606e-03 -3.6277538e-03  2.1965443e-03
  3.6413874e-03 -6.0924434e-04  2.0925887e-04  7.2735753e-03
 -5.6942571e-03  4.9264589e-04 -5.2659428e-03  3.8392788e-03
 -2.0124977e-03  1.2391595e-03 -2.4042481e-03  3.4154661e-03
 -4.4815401e-03  1.2579446e-03  2.3076476e-03 -1.9140950e-03
  3.5879072e-03  1.5083029e-03  7.4051775e-04 -2.0926362e-03
  4.0329928e-03 -8.7317417e-04  2.5907182e-03 -2.2937590e-03
  1.4039350e-04  1.8625897e-03 -7.5759101e-03 -3.0768879e-03
 -4.0437742e-03  6.7974702e-03  1.3974275e-03  6.0083410e-03
  2.2615853e-04  3.1840504e-04 -6.2402785e-03  1.6571288e-03
 -5.5779249e-04 -2.6716462e-03 -1.1738047e-03  3.6628258e-03
 -1.5419591e-03  3.0133503e-03  8.4048463e-04  4.1450942e-03
  3.0540922e-03  1.6578543e-03 -5.6817527e-03  4.0602414e-03
  5.2390620e-05 -1.3360474e-03  6.0387654e-05  3.2074514e-03
 -1.3509847e-03  1.3782713e-03 -6.4101079e-03 -2.6534023e-03
  2.8328458e-04 -3.4323118e-03 -1.7492869e-03 -1.7463093e-03
  3.882585

In [17]:
# To determine the predicted characters, we need to sample from the output distribution (pick a value based on the predicted probabilities).
sampled_indices = tf.random.categorical(pred, num_samples=1)

# Now we can reshape that array and convert the integers back to characters to see the actual text.
sampled_indices = np.reshape(sampled_indices, (1,-1))[0]
predicted_chars = int_to_text(sampled_indices)

predicted_chars

'X!mBWzO$3,uJtMa;hwFIJs;vmNCSbJHd$hzlH?:.owk&YwbjSiJoYK?aMyMa\nvzCNJC iMHsTLS,:LpRKAV$CCuGBnQq,HEkQAX:'
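`tf.random.categorical` treats each row of `pred` as unnormalized log-probabilities (logits) and draws one index per row; because the untrained model's logits are nearly uniform, the sampled characters look random. A NumPy equivalent, written as a hypothetical helper `sample_from_logits` for illustration, makes the softmax step explicit.

```python
import numpy as np

def sample_from_logits(logits, rng):
    """Draw one index per row from the categorical distribution softmax(logits)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

rng = np.random.default_rng(0)
# Two timesteps over a 3-character vocabulary: the first row is uniform,
# the second strongly prefers index 2.
logits = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 20.0]]
print(sample_from_logits(logits, rng))  # second sampled index is almost surely 2
```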

So now we need to create a loss function that can compare that output to the expected output and give us a numeric value representing how close the two were.

In [18]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
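With `from_logits=True`, sparse categorical cross-entropy at each timestep is the negative log of the softmax probability assigned to the true next character, which simplifies to logsumexp(logits) minus the logit of the true label. The NumPy sketch below (the helper name `sparse_ce_from_logits` is hypothetical) also shows why an untrained model's loss should start near ln(65) ≈ 4.17: near-zero logits mean a near-uniform guess over the 65 characters.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Per-position cross-entropy: logsumexp(logits) - logits[label]."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-sum-exp over the vocabulary axis.
    m = logits.max(axis=-1, keepdims=True)
    log_sum_exp = (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))[..., 0]
    # Logit of the true label at every position.
    picked = np.take_along_axis(logits, labels[..., None], axis=-1)[..., 0]
    return log_sum_exp - picked

vocab_size = 65
# An untrained model emits logits close to zero, i.e. a near-uniform distribution.
uniform_logits = np.zeros((2, 100, vocab_size))
labels = np.zeros((2, 100), dtype=np.int64)
losses = sparse_ce_from_logits(labels, uniform_logits)
print(losses.shape, losses.mean(), np.log(vocab_size))  # (2, 100), both values about 4.174
```

Keras then averages these per-position losses into the single number reported during training.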

# Compiling the Model
Now we can think of our problem as a classification problem where the model predicts the probability of each unique letter coming next.

In [19]:
model.compile(optimizer='adam', loss=loss)

# Creating Checkpoints
Now we are going to set up our model to save checkpoints as it trains. This will allow us to load the model from a checkpoint later, either to continue training it or to generate text with it.

In [20]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath = checkpoint_prefix,
    save_weights_only = True)

# Training the model
Finally it's time to train our model.

In [21]:
history = model.fit(data, epochs=30, callbacks=[checkpoint_callback])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


# Loading the model
We shall rebuild the model from a checkpoint using a batch_size of 1, so that we can feed one piece of text to the model and have it make a prediction. The LSTM is stateful, so its batch size is fixed when the model is built.

In [22]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

Once the model has finished training, we can find the latest checkpoint that stores the model's weights using the following line.

In [23]:
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

We can load any checkpoint we want by specifying the exact file to load.

In [None]:
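checkpoint_num = 10
# load_weights expects the checkpoint path prefix (e.g. ./training_checkpoints/ckpt_10),
# not the reader object returned by tf.train.load_checkpoint.
model.load_weights(os.path.join(checkpoint_dir, "ckpt_" + str(checkpoint_num)))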
model.build(tf.TensorShape([1, None]))

# Generating Text
Now we can use the following function, adapted from the TensorFlow text-generation tutorial, to generate text from any starting string we like.

In [25]:
def generate_text(model, start_string):
    # Number of characters to generate
    num_generate = 800
    # Converting our start string into numbers
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    
    # Empty list to store our results
    text_generated = []
    
    # Low temperatures result in more predictable text.
    # Higher temperatures result in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0
    
    # Here batch size == 1
    model.reset_states()
    for _ in range(num_generate):
        predictions = model(input_eval)
        # Remove the batch dimension
        predictions = tf.squeeze(predictions, 0)
        
        # Sample from a categorical distribution over the temperature-scaled logits
        # and keep the sample for the last timestep.
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        
        # Pass the predicted character as the next input to the model;
        # the stateful LSTM carries the previous hidden state forward.
        input_eval = tf.expand_dims([predicted_id], 0)
        
        text_generated.append(idx2char[predicted_id])
        
    return start_string + ''.join(text_generated)
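The `temperature` variable in `generate_text` divides the logits before sampling. A small NumPy sketch (the function name `softmax_with_temperature` and the three example logits are illustrative assumptions) shows the effect: values below 1.0 concentrate probability on the most likely character, and values above 1.0 flatten the distribution.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # numerical stability; does not change the result
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # The probability of the top character shrinks as the temperature rises.
    print(t, np.round(probs, 3))
```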

In [26]:
inp = input("Type a starting string: ")
print(generate_text(model, inp))

Type a starting string: QUEEN
QUEENLIVBRLY:
This no of such de, castle is for pot,
And make me wip the county moody pirst, are they powder.

MENENIUS:
Now at once.

MENENIUS:
You'll know, Come; est alone.

Second Messenger:
Say their safety makes them se strokes. For pleasure!
Po honest windous trial of him:
Should in the Capulets abroad--

MENENIUS:
Consider full of the sun.

Second Gentleman:
I thought it would wish him?

LORD FITZWATER:
How far he must know me and to hear of certain;
Repartis number midnight!

ANGELO:
I will not do't.

ISABELLA:
O prince, I conjure thee, as now a joy again,
And undertake to wait upon thy cheek-march'd fortune,
Which grieves my tongue?

ANTONIO:
No; forsward, I will never die to-morrow:
Tell me the queen, repeal'd to the observaver:
And yet the nevil peace to Mantua,
Where the devil is th


# Conclusion
We loaded the dataset and transformed it into a form the model can use. Then we built a model, trained it, and used it to generate text.