# RNN Play Generator

The task is to build a model to predict the next character in a sequence.

We will train the model on sequences from the play Romeo and Juliet. The model's output is the next character; we append it to the input and feed the result back into the model, repeating as many times as we like until we have generated a whole play.
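
The feedback loop described above can be sketched without any model at all. This is only an illustration: `generate` and `toy_next_char` are hypothetical names, and `toy_next_char` is a stand-in for the trained network that simply cycles through three letters.

```python
def generate(next_char, seed, num_chars):
    """Repeatedly append the predicted next character to the running text."""
    text = seed
    for _ in range(num_chars):
        # the prediction becomes part of the input for the next step
        text += next_char(text)
    return text


def toy_next_char(text):
    """Hypothetical stand-in for the model: follow a fixed a -> b -> c cycle."""
    cycle = "abc"
    return cycle[(cycle.index(text[-1]) + 1) % len(cycle)]


sample = generate(toy_next_char, "a", 5)
print(sample)
```

The real model replaces `toy_next_char` with a sampled prediction, but the loop structure is the same.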

In [None]:
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

### Downloading the dataset

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [4]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters


In [5]:
# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



### Encoding

The raw text has not been processed yet, so we need to encode each character as an integer before the model can use it.

In [6]:
vocab = sorted(set(text)) # Sort all the unique characters in the text
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)} # the output: letter:index
idx2char = np.array(vocab) # reverse mapping, index:character

# Defining a function to encode the text: convert to integer representation
def text_to_int(text):
  return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)

In [7]:
# let's look at how part of our text is encoded
print("Text:", text[:13])
print("Encoded:", text_to_int(text[:13]))

Text: First Citizen
Encoded: [18 47 56 57 58  1 15 47 58 47 64 43 52]


### Decoding

In [8]:
def int_to_text(ints):
  # convert it to a numpy array if it is a tensor; numpy arrays have no .numpy() method
  try:
    ints = ints.numpy()
  except AttributeError:
    pass
  return ''.join(idx2char[ints])

print(int_to_text(text_as_int[:13]))

First Citizen


### Creating training examples

It is not feasible to pass all the data to our model at once, so we need to split it into meaningful training examples.

We will pass a sequence of length 100 to the model, and the target will be the same sequence shifted one character to the right.

*Example:* Input: Hell, Output: ello

The output is the input without its first character, plus one new last character (which is the prediction).
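
The shift can be checked on the example word with plain string slicing. This is a small sketch that mirrors the slicing used for the real sequences; the variable names are chosen here for illustration.

```python
chunk = "Hello"

input_text = chunk[:-1]   # everything except the last character
target_text = chunk[1:]   # everything except the first character

# At each position, the model sees the input character and must predict the target one
pairs = list(zip(input_text, target_text))

print(input_text, target_text)
print(pairs)
```

Each pair is one prediction task, so a single sequence of 100 characters gives the model 100 targets to learn from.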

In [9]:
seq_length = 100  # length of the input sequence for each training example
# Every training example needs a 100-character input and a 100-character
# target shifted by one character, so each example uses 101 characters of text
examples_per_epoch = len(text)//(seq_length+1)  

# Create training examples / targets
# Build a dataset that yields the encoded text one character at a time
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

In [10]:
# group the characters into sequences of seq_length+1 (101) characters
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

In [11]:
# split each sequence into an input and a target
def split_input_target(chunk):  # for the example: hello
    input_text = chunk[:-1]  # hell
    target_text = chunk[1:]  # ello
    return input_text, target_text  # hell, ello

dataset = sequences.map(split_input_target)  # we use map to apply the above function to every entry

In [12]:
for x, y in dataset.take(2):
  print("\n\nEXAMPLE\n")
  print("INPUT")
  print(int_to_text(x))
  print("\nOUTPUT")
  print(int_to_text(y))



EXAMPLE

INPUT
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You

OUTPUT
irst Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You 


EXAMPLE

INPUT
are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you 

OUTPUT
re all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you k


In [13]:
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab)  # vocab is number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

# Data is Batched and Shuffled
data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
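
As a quick sanity check on the pipeline, the number of sequences and batches can be worked out by hand. The sketch below only repeats the arithmetic with the values printed earlier in this notebook; `num_sequences` and `steps_per_epoch` are names introduced here, and both divisions round down because `drop_remainder=True` is used at both batching steps.

```python
text_length = 1115394   # characters in the text, as printed above
seq_length = 100
batch_size = 64

# each training example consumes seq_length + 1 characters
num_sequences = text_length // (seq_length + 1)

# sequences are then grouped into batches of 64; the incomplete last batch is dropped
steps_per_epoch = num_sequences // batch_size

print(num_sequences, steps_per_epoch)
```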

### Building the model

In [14]:
# define a function to create the model
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
                              # None is the sequence length: training sequences have 100 characters,
                              # but the length is not known in advance at prediction time
    tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        # return the output of every time step;
                        # with False the layer would return only the output of the last step
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),  # glorot_uniform: a good default to start with
    tf.keras.layers.Dense(vocab_size)  # output layer: one node per character in the vocabulary, holding that character's logit (unnormalized score)
  ])
  return model

# we build the model with a batch size of 64, so it takes 64 training examples at a time and returns 64 outputs
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (64, None, 256)           16640     
                                                                 
 lstm (LSTM)                 (64, None, 1024)          5246976   
                                                                 
 dense (Dense)               (64, None, 65)            66625     
                                                                 
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________
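
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for an embedding table, an LSTM layer (four gates, each with input weights, recurrent weights and a bias) and a dense layer; the variable names are introduced here for illustration.

```python
vocab_size = 65
embedding_dim = 256
rnn_units = 1024

# embedding: one vector of size embedding_dim per character
embedding_params = vocab_size * embedding_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias vector
lstm_params = 4 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)

# dense: a weight matrix plus one bias per output node
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

None of these counts depend on the batch size, which is why the trained weights can later be loaded into a model built with a different batch size.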


### Loss function

#### Applying the model without training on an example of the training data

In [15]:
for input_example_batch, target_example_batch in data.take(1):
  example_batch_predictions = model(input_example_batch)  # ask our model for a prediction on our first batch of training data (64 entries)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")  # print out the output shape

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [16]:
# we can see that the prediction is an array of 64 arrays, one for each entry in the batch
print(len(example_batch_predictions))
print(example_batch_predictions)

64
tf.Tensor(
[[[ 4.20983182e-03  8.02002288e-03 -4.81667346e-04 ... -3.38387676e-03
   -6.78760058e-04  4.20863414e-03]
  [-3.42880696e-04  1.03519168e-02 -6.09154766e-03 ...  1.52908126e-03
    3.00851557e-03  7.17437733e-03]
  [ 4.40114969e-03  8.14041123e-03 -6.80756150e-03 ...  3.58796376e-03
    4.90041124e-03  2.43758271e-03]
  ...
  [ 6.23601722e-03  6.69120764e-03 -2.35187399e-05 ...  9.47748590e-03
    5.31150587e-03 -1.73910556e-03]
  [ 2.00737570e-03  3.78023344e-03  8.57006852e-03 ...  7.02480972e-03
   -3.66263418e-03 -6.08335808e-03]
  [ 4.57030442e-03  3.75389750e-03  6.86121685e-03 ...  3.13449861e-03
   -1.88101374e-03 -9.51782917e-04]]

 [[-2.32595345e-03  9.12027142e-04  9.17745114e-04 ... -5.35471551e-03
    5.01336763e-03  2.92073772e-03]
  [ 2.75188126e-03  6.12949952e-03 -4.08854662e-03 ... -1.53548535e-05
    2.80959811e-03  4.04890088e-05]
  [ 2.93527450e-03 -8.84137291e-04 -5.94140089e-04 ... -4.25737031e-04
    3.45557788e-03 -4.42400668e-03]
  ...
  [-2.590

In [17]:
# let's examine one prediction at all the timesteps
pred = example_batch_predictions[0]
print(len(pred))
print(pred)
# notice this is a 2d array of length 100, where each interior array is the prediction for the next character at each time step

100
tf.Tensor(
[[ 4.2098318e-03  8.0200229e-03 -4.8166735e-04 ... -3.3838768e-03
  -6.7876006e-04  4.2086341e-03]
 [-3.4288070e-04  1.0351917e-02 -6.0915477e-03 ...  1.5290813e-03
   3.0085156e-03  7.1743773e-03]
 [ 4.4011497e-03  8.1404112e-03 -6.8075615e-03 ...  3.5879638e-03
   4.9004112e-03  2.4375827e-03]
 ...
 [ 6.2360172e-03  6.6912076e-03 -2.3518740e-05 ...  9.4774859e-03
   5.3115059e-03 -1.7391056e-03]
 [ 2.0073757e-03  3.7802334e-03  8.5700685e-03 ...  7.0248097e-03
  -3.6626342e-03 -6.0833581e-03]
 [ 4.5703044e-03  3.7538975e-03  6.8612169e-03 ...  3.1344986e-03
  -1.8810137e-03 -9.5178292e-04]], shape=(100, 65), dtype=float32)


For every training example we get an output of the same length as the training example.

In [18]:
# and finally we'll look at a prediction at the first timestep
# it contains a score (logit) for every character in the vocabulary at the first time step.
time_pred = pred[0]
print(len(time_pred))
print(time_pred)
# and of course it's 65 values, one score for each character that could occur next

65
tf.Tensor(
[ 0.00420983  0.00802002 -0.00048167 -0.00579414  0.00295614  0.00603179
 -0.00179811 -0.00146926  0.0032317   0.00214365 -0.00529083 -0.0031162
 -0.00213845  0.00276268  0.007439    0.00456556  0.00602576  0.00366999
  0.00468932 -0.00479632 -0.00343378  0.00349788 -0.00104607 -0.00250326
  0.00560063 -0.00284219 -0.00445524 -0.00041743  0.00503827 -0.00266773
  0.00115404  0.0003496   0.00012188  0.00230692  0.00024695 -0.00245997
 -0.00106231  0.00094773 -0.00039294  0.00021231  0.00286791 -0.00435733
  0.0013087   0.00200386  0.00222673 -0.00573003 -0.00290429  0.0002569
  0.00172754  0.00458243 -0.00210922 -0.00088042  0.00163365  0.00128896
 -0.00465017 -0.00178163  0.00445363  0.00352368  0.0051429   0.00147067
 -0.0022966  -0.00439372 -0.00338388 -0.00067876  0.00420863], shape=(65,), dtype=float32)
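
Note that some of the 65 values above are negative, so they are not probabilities yet: the Dense layer outputs raw scores (logits). A softmax turns them into probabilities. The sketch below is a framework-free illustration using the first four values from the output above, rounded; `softmax` is a helper written here, not part of the notebook.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# first four scores from the output above, rounded (a real call would use all 65)
logits = [0.0042, 0.0080, -0.0005, -0.0058]
probs = softmax(logits)
print(probs)
```

Because the untrained scores are all close to zero, the resulting probabilities are nearly uniform, which is what we expect before training.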


In [None]:
# To determine the predicted character we need to sample from the output distribution
# sampling picks a value at random according to the distribution, so it does not guarantee that we pick the most likely character
sampled_indices = tf.random.categorical(pred, num_samples=1)

# now we can reshape that array and convert all the integers to characters to see the actual text
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
predicted_chars = int_to_text(sampled_indices)

predicted_chars  # and this is what the model predicted for training sequence 1
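
The difference between sampling and always taking the most likely character can be illustrated with the standard library. This is a sketch with a made-up three-character distribution; note that it samples from probabilities, whereas `tf.random.categorical` takes logits as input.

```python
import random

# hypothetical probabilities for a three-character vocabulary
probs = [0.1, 0.6, 0.3]
indices = range(len(probs))

# greedy choice: always the single most likely index
greedy = max(indices, key=lambda i: probs[i])

# sampling: each index is drawn in proportion to its probability
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rng.choices(indices, weights=probs, k=1000)
counts = [samples.count(i) for i in indices]

print(greedy, counts)
```

Sampling keeps less likely characters in play, which helps the generated text avoid getting stuck in repetitive loops.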

In [20]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
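
To see what this loss measures, here is a framework-free sketch of sparse categorical cross-entropy computed from logits for a single time step: the negative log of the softmax probability assigned to the correct character. The helper name is introduced here for illustration.

```python
import math


def sparse_cross_entropy_from_logits(label, logits):
    """Return -log(softmax(logits)[label]) using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


# an untrained model gives nearly equal scores to all 65 characters
uniform_logits = [0.0] * 65
initial_loss = sparse_cross_entropy_from_logits(3, uniform_logits)

# a model that strongly favours the correct character has a loss near zero
confident_logits = [0.0] * 65
confident_logits[3] = 20.0
confident_loss = sparse_cross_entropy_from_logits(3, confident_logits)

print(initial_loss, confident_loss)
```

With 65 equally likely characters the loss is ln(65), roughly 4.17, so a loss near that value at the start of training is a sign that the setup is behaving as expected.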

### Compiling the model

In [21]:
model.compile(optimizer='adam', loss=loss)

### Creating Checkpoint

In [22]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

### Training

In [23]:
history = model.fit(data, epochs=50, callbacks=[checkpoint_callback])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


### Rebuilding the model
After training, we need to rebuild the model with a different batch size. We built it with a batch size of 64, which means it expects 64 entries at a time. Now we rebuild it with batch_size = 1 and load the weights saved during training.

In [24]:
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)

In [25]:
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)) # load the weights from the latest checkpoint
model.build(tf.TensorShape([1, None])) # batch size = 1 with unknown length

In [None]:
# The following code loads a specific checkpoint instead of the latest one
# checkpoint_num = 10
# model.load_weights("./training_checkpoints/ckpt_" + str(checkpoint_num))
# model.build(tf.TensorShape([1, None]))

### Generating Text

In [None]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 800

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)
  # we add a batch dimension because the model expects input of shape (batch_size, sequence_length)

  # Empty list to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)

      # remove the batch dimension
      # so that predictions has shape (sequence_length, vocab_size)
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      # convert the predicted index back to a character and store it
      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))
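
The effect of `temperature` can be seen by dividing a small set of logits by different values before the softmax, as `generate_text` does before sampling. This is a framework-free sketch with made-up logits; `softmax_with_temperature` is a helper written here for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Scale the logits by 1/temperature, then apply a stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three characters

cold = softmax_with_temperature(logits, 0.5)     # sharper: favours the top character
neutral = softmax_with_temperature(logits, 1.0)  # unchanged distribution
hot = softmax_with_temperature(logits, 2.0)      # flatter: closer to uniform

print(cold, neutral, hot)
```

A low temperature concentrates probability on the most likely character, while a high temperature spreads it out, which is why the generated text becomes more surprising.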

In [None]:
inp = input("Type a starting string: ")
print(generate_text(model, inp))