# Generating Shakespearean Text with Character-Based RNNs

Problem Statement:
Given a character or a sequence of characters, we want to predict the next character at each time step. The model is trained on the works of Shakespeare so that the text it generates follows a similar style. The tinyshakespeare dataset is used for training.

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import nltk
import os
import time

## Loading the Data

In [None]:
from google.colab import files
uploaded = files.upload()
# Dataset source: https://github.com/karpathy/char-rnn/tree/master/data/tinyshakespeare
# The original Stanford dataset (https://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt) was not used: it is too long and increases processing time.

Saving shakespeare.txt to shakespeare.txt


In [None]:
# Read the file as UTF-8 text; the context manager closes the file afterwards
with open('shakespeare.txt', 'r', encoding='utf-8') as f:
    text = f.read()
print(text[:200])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you


In [None]:
#Find Vocabulary (set of characters)
vocabulary = sorted(set(text))
print('No. of unique characters: {}'.format(len(vocabulary)))

No. of unique characters: 65


## Preprocessing Text

In [None]:
#character to index mapping
char2index = {c:i for i,c in enumerate(vocabulary)}
int_text = np.array([char2index[i] for i in text])

#Index to character mapping
index2char = np.array(vocabulary)
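
The two mappings are inverses of each other, which a quick round trip can confirm. The sketch below is self-contained: it uses a short stand-in string (`sample_text`, an assumption for illustration) instead of the full dataset and a plain list instead of a NumPy array. `encode` and `decode` are hypothetical helper names, not part of the notebook.

```python
# Stand-in corpus for illustration; the notebook uses the full Shakespeare text.
sample_text = "First Citizen:\nBefore we proceed"

vocabulary = sorted(set(sample_text))
char2index = {c: i for i, c in enumerate(vocabulary)}
index2char = vocabulary  # list lookup by index (the notebook uses np.array)

def encode(s):
    """Map each character to its integer index."""
    return [char2index[c] for c in s]

def decode(ids):
    """Map integer indices back to a string."""
    return "".join(index2char[i] for i in ids)

ids = encode("First")
print(ids)
print(decode(ids))
```

Because the vocabulary is sorted, the indices depend on which characters occur in the corpus, so the numbers printed here differ from those of the full dataset.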


In [None]:
#Testing
print("Character to Index: \n")
for char in char2index:
    print('  {:4s}: {:3d}'.format(repr(char), char2index[char]))

print("\nInput text to Integer: \n")
print('{} mapped to {}'.format(repr(text[:20]),int_text[:20])) #use repr() for debugging

Character to Index: 

  '\n':   0
  ' ' :   1
  '!' :   2
  '$' :   3
  '&' :   4
  "'" :   5
  ',' :   6
  '-' :   7
  '.' :   8
  '3' :   9
  ':' :  10
  ';' :  11
  '?' :  12
  'A' :  13
  'B' :  14
  'C' :  15
  'D' :  16
  'E' :  17
  'F' :  18
  'G' :  19
  'H' :  20
  'I' :  21
  'J' :  22
  'K' :  23
  'L' :  24
  'M' :  25
  'N' :  26
  'O' :  27
  'P' :  28
  'Q' :  29
  'R' :  30
  'S' :  31
  'T' :  32
  'U' :  33
  'V' :  34
  'W' :  35
  'X' :  36
  'Y' :  37
  'Z' :  38
  'a' :  39
  'b' :  40
  'c' :  41
  'd' :  42
  'e' :  43
  'f' :  44
  'g' :  45
  'h' :  46
  'i' :  47
  'j' :  48
  'k' :  49
  'l' :  50
  'm' :  51
  'n' :  52
  'o' :  53
  'p' :  54
  'q' :  55
  'r' :  56
  's' :  57
  't' :  58
  'u' :  59
  'v' :  60
  'w' :  61
  'x' :  62
  'y' :  63
  'z' :  64

Input text to Integer: 

'First Citizen:\nBefor' mapped to [18 47 56 57 58  1 15 47 58 47 64 43 52 10  0 14 43 44 53 56]


## Create Training Data

In [None]:
seq_length = 150  # number of characters in each input sequence
examples_per_epoch = len(text) // (seq_length + 1)  # number of (seq_length + 1)-character chunks in the text

#converts text (vector) into character index stream
#Reference: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
char_dataset = tf.data.Dataset.from_tensor_slices(int_text) 

In [None]:
# Create sequences from the individual characters. Each chunk holds seq_length + 1 characters
# so that it can later be split into an input and a target of seq_length characters each.
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
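
As a rough illustration of what batching with `drop_remainder=True` does to the character stream, the following sketch chunks a short list in plain Python. The helper `chunk` and the toy `seq_length` of 4 are assumptions for illustration; the notebook itself relies on `tf.data` and a `seq_length` of 150.

```python
def chunk(stream, size):
    """Split a sequence into consecutive chunks of `size`, dropping any short tail."""
    n_full = len(stream) // size
    return [stream[i * size:(i + 1) * size] for i in range(n_full)]

seq_length = 4                    # toy value; the notebook uses 150
stream = list("First Citizen")    # 13 characters
chunks = chunk(stream, seq_length + 1)
for c in chunks:
    print(repr("".join(c)))
```

The trailing characters that do not fill a whole chunk are discarded, which keeps every sequence the same length.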

In [None]:
#Testing
print("Character Stream: \n")
for i in char_dataset.take(10):
  print(index2char[i.numpy()])  

print("\nSequence: \n")
for i in sequences.take(10):
  print(repr(''.join(index2char[i.numpy()])))  # repr() shows newlines as \n instead of rendering them

Character Stream: 

F
i
r
s
t
 
C
i
t
i

Sequence: 

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAl'
"l:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you know Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us k"
"ill him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be done: away, away!\n\nSecond Citizen:\nOne word, good "
'citizens.\n\nFirst Citizen:\nWe are accounted poor citizens, the patricians good.\nWhat authority surfeits on would relieve us: if they\nwould yield us but '
'the superfluity, while it were\nwholesome, we might guess they relieved us humanely;\nbut they think we are too dear: the leanness that\nafflicts us, the '
'object of our misery, is as an\ninventory to particularise their abundance; our\nsufferance is a gain to them Let us revenge this with\nour pi

Target value: for each input sequence, the target is the same sequence shifted one position to the right, so that it ends with the character that follows the input.

To create training examples of (input, target) pairs, we split each sequence of seq_length + 1 characters. The input is the sequence with its last character removed. The target is the sequence with its first character removed.
Example: 
sequence: abc d ef
input: abc d e
target: bc d ef
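
The same split can be tried on the example string directly. This is a minimal sketch using ordinary Python strings; `split_input_target` is a hypothetical name that mirrors the slicing used on tensors in the next cell.

```python
def split_input_target(chunk):
    """Return (input, target): the chunk without its last / first character."""
    return chunk[:-1], chunk[1:]

sequence = "abc d ef"
input_text, target_text = split_input_target(sequence)

# At position t the model has seen input_text[: t + 1]
# and should predict target_text[t].
for t, (x, y) in enumerate(zip(input_text, target_text)):
    print(t, repr(x), "->", repr(y))
```

Each position therefore yields one training signal, which is why a single chunk of seq_length + 1 characters provides seq_length predictions.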

In [None]:
def create_input_target_pair(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(create_input_target_pair)

In [None]:
#Testing
for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(index2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(index2char[target_example.numpy()])))

Input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nA'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou are all resolved rather to die than to famish?\n\nAl'


In [None]:
#Creating batches

BATCH_SIZE = 64

# Buffer used to shuffle the dataset 
# Reference: https://stackoverflow.com/questions/46444018/meaning-of-buffer-size-in-dataset-map-dataset-prefetch-and-dataset-shuffle
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 150), (64, 150)), types: (tf.int64, tf.int64)>

# GRU

## Building the Model- GRU

In [None]:
vocab_size = len(vocabulary)
embedding_dim = 256
rnn_units= 1024 

3 Layers used:
1. Embedding Layer (input): Maps each character index to a 256-dimensional vector
2. GRU Layer: RNN with 1024 units
3. Dense Layer: Output layer with one logit per character in the vocabulary

Since every layer has a single input and a single output, we can use the keras.Sequential model.

In [None]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units, 
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

# Reference for GRU: https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU
# Reference for theory: https://jhui.github.io/2017/03/15/RNN-LSTM-GRU/

In [None]:
model = build_model(
  vocab_size = vocab_size,
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

In [None]:
#Testing: shape
for input_example_batch, target_example_batch in dataset.take(1):
    example_prediction = model(input_example_batch)
    assert (example_prediction.shape == (BATCH_SIZE, seq_length, vocab_size)), "Shape error"
    #print(example_prediction.shape)

In [None]:
#model.summary() 
#check shapes if necessary

In [None]:
# Sample from the output distribution: gives a predicted next character at every timestep (untrained model)
sampled_indices = tf.random.categorical(example_prediction[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()


## Model Training

In [None]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

#Loss Function reference: https://www.dlology.com/blog/how-to-use-keras-sparse_categorical_crossentropy/

example_loss  = loss(target_example_batch, example_prediction)
print("Prediction shape: ", example_prediction.shape)
print("Loss:      ", example_loss.numpy().mean())

Prediction shape:  (64, 150, 65)
Loss:       4.174315
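
A useful sanity check on this number: a model that spreads its probability evenly over all 65 characters has a cross-entropy of ln(65), roughly 4.174, so an untrained network should start close to that value. The sketch below computes the loss for a single label in plain Python; `sparse_crossentropy_from_logits` is a hypothetical helper written for illustration, not the Keras function.

```python
import math

def sparse_crossentropy_from_logits(label, logits):
    """Cross-entropy of one integer label against unnormalised logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

vocab_size = 65
uniform_logits = [0.0] * vocab_size   # a model with no preference at all
baseline = sparse_crossentropy_from_logits(label=18, logits=uniform_logits)
print(baseline, math.log(vocab_size))
```

A starting loss far above this baseline would suggest a problem such as mismatched labels or logits being treated as probabilities.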


In [None]:
model.compile(optimizer='adam', loss=loss)

In [None]:
# Save the model weights after every epoch (training was time consuming).
#Reference: https://medium.com/@italojs/saving-your-weights-for-each-epoch-keras-callbacks-b494d9648202

dir_checkpoints= './training_checkpoints'
checkpoint_prefix = os.path.join(dir_checkpoints, "checkpt_{epoch}") #name
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix,save_weights_only=True)
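
The `{epoch}` placeholder in the checkpoint name is filled in with the epoch number when each file is written. The following sketch imitates that substitution with `str.format`; `checkpoint_path` is a hypothetical helper, and `posixpath` is used here only so the result looks the same on every operating system.

```python
import posixpath

dir_checkpoints = "./training_checkpoints"
checkpoint_prefix = posixpath.join(dir_checkpoints, "checkpt_{epoch}")

def checkpoint_path(epoch):
    """Hypothetical helper: fill the {epoch} placeholder via str.format."""
    return checkpoint_prefix.format(epoch=epoch)

print(checkpoint_path(1))
print(checkpoint_path(50))
```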

In [None]:
EPOCHS=50 #increase number of epochs for better results

In [None]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [None]:
tf.train.latest_checkpoint(dir_checkpoints) #check last checkpoint

'./training_checkpoints/checkpt_50'

## Prediction- GRU
(Using batch size=1)

In [None]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(dir_checkpoints))
model.build(tf.TensorShape([1, None]))

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_1 (Dense)              (1, None, 65)             66625     
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
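
The parameter counts in this summary can be reproduced by hand from the three hyperparameters. The sketch below is plain arithmetic; the GRU formula assumes two bias vectors per weight block, which is what the reported count is consistent with (it matches the Keras default `reset_after=True`).

```python
vocab_size, embedding_dim, rnn_units = 65, 256, 1024

# Embedding: one embedding_dim-sized vector per character.
embedding_params = vocab_size * embedding_dim

# GRU: 3 weight blocks (update gate, reset gate, candidate state), each with
# input weights, recurrent weights and two bias vectors (assumption, see above).
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: weights from every GRU unit to every character, plus one bias each.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```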


In [None]:
#Generate text from model
def generate_text(model, start_string):
  num_generate = 1000 #Number of characters to be generated

  input_eval = [char2index[s] for s in start_string] #vectorising input
  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 0.5

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # scale the logits by the temperature, then sample the next character from the resulting categorical distribution
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(index2char[predicted_id])

  return (start_string + ''.join(text_generated))
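
To see why the temperature changes how predictable the output is, it helps to look at the probabilities that result from dividing the logits before sampling. The sketch below is a plain-Python illustration with made-up logits for three characters; `softmax_with_temperature` is a hypothetical helper, whereas the function above samples directly from the scaled logits with `tf.random.categorical`.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three characters
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1 concentrates probability on the highest-scoring character, while a temperature above 1 flattens the distribution and makes unlikely characters more common.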

In [None]:
#Testing
#print(generate_text(model, start_string=u"ROMEO: "))

In [None]:
#test = input("Enter your starting string: ")
#print(generate_text(model, start_string=test))

In [None]:
#Prediction with User Input
gru_test = input("Enter your starting string: ")
print(generate_text(model, start_string=gru_test))

Enter your starting string: The 
The outward war with him!

Third Servingman:
Why, then the widow of the senate, that
in some store of law to him and Margaret:
But if you have a helple suitors holds from her,
Betroth'd and labourinader, have frown crafts
To Sly man and her a secret country's crown,
Which I do last pronounce, it is not mortal to
him.

Clown:
Is there a man whose bolder dried blood that lies
From many a gentleman that bear the war;
Bear her most sweet work, you are the measure
As you intended to thy crown; come, take my mind,
And in my houses of the news then the world
Were foundage in me.

FRIAR LAURENCE:
That's my good son: he is coming hither.

PETRUCHIO:
Then thus: I have fed upon me: I have reason;
And there I am committed.

Provost:
I cannot bloody tride or no rage doth grow.

CATESBY:
Fie, and but I will marry you. Go you see, I think,
Thou mayst thine ears and restluction of a guilty house!
Mean me in prothecount that you love my brother.

LUCIO:

First Senator:


# LSTM

## Building the Model- LSTM

In [None]:
vocab_size = len(vocabulary)
embedding_dim = 256
rnn_units= 1024 

In [None]:
def build_model_lstm(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units, 
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [None]:
lstm_model = build_model_lstm(
  vocab_size = vocab_size,
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)



In [None]:
#Testing: shape
for input_example_batch, target_example_batch in dataset.take(1):
    example_prediction = lstm_model(input_example_batch)
    assert (example_prediction.shape == (BATCH_SIZE, seq_length, vocab_size)), "Shape error"
    #print(example_prediction.shape)

In [None]:
sampled_indices = tf.random.categorical(example_prediction[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

In [None]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

#Loss Function reference: https://www.dlology.com/blog/how-to-use-keras-sparse_categorical_crossentropy/

example_loss  = loss(target_example_batch, example_prediction)
print("Prediction shape: ", example_prediction.shape)
print("Loss:      ", example_loss.numpy().mean())

Prediction shape:  (64, 150, 65)
Loss:       4.1740937


In [None]:
lstm_model.compile(optimizer='adam', loss=loss)

In [None]:
lstm_dir_checkpoints= './training_checkpoints_LSTM'
checkpoint_prefix = os.path.join(lstm_dir_checkpoints, "checkpt_{epoch}") #name
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix,save_weights_only=True)

In [None]:
EPOCHS=60 #increase number of epochs for better results (lower loss)

In [None]:
history = lstm_model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/60
Epoch 2/60
Epoch 3/60
Epoch 4/60
Epoch 5/60
Epoch 6/60
Epoch 7/60
Epoch 8/60
Epoch 9/60
Epoch 10/60
Epoch 11/60
Epoch 12/60
Epoch 13/60
Epoch 14/60
Epoch 15/60
Epoch 16/60
Epoch 17/60
Epoch 18/60
Epoch 19/60
Epoch 20/60
Epoch 21/60
Epoch 22/60
Epoch 23/60
Epoch 24/60
Epoch 25/60
Epoch 26/60
Epoch 27/60
Epoch 28/60
Epoch 29/60
Epoch 30/60
Epoch 31/60
Epoch 32/60
Epoch 33/60
Epoch 34/60
Epoch 35/60
Epoch 36/60
Epoch 37/60
Epoch 38/60
Epoch 39/60
Epoch 40/60
Epoch 41/60
Epoch 42/60
Epoch 43/60
Epoch 44/60
Epoch 45/60
Epoch 46/60
Epoch 47/60
Epoch 48/60
Epoch 49/60
Epoch 50/60
Epoch 51/60
Epoch 52/60
Epoch 53/60
Epoch 54/60
Epoch 55/60
Epoch 56/60
Epoch 57/60
Epoch 58/60
Epoch 59/60
Epoch 60/60


In [None]:
tf.train.latest_checkpoint(lstm_dir_checkpoints)

'./training_checkpoints_LSTM/checkpt_60'

## Prediction- LSTM

In [None]:
lstm_model = build_model_lstm(vocab_size, embedding_dim, rnn_units, batch_size=1)
lstm_model.load_weights(tf.train.latest_checkpoint(lstm_dir_checkpoints))
lstm_model.build(tf.TensorShape([1, None]))

lstm_model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (1, None, 256)            16640     
_________________________________________________________________
lstm_1 (LSTM)                (1, None, 1024)           5246976   
_________________________________________________________________
dense_3 (Dense)              (1, None, 65)             66625     
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________
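
The larger recurrent layer in this summary follows from the LSTM having four weight blocks where the GRU has three. The sketch below is plain arithmetic under the assumption of one bias vector per block, which is consistent with the count reported above; `lstm_params` is a hypothetical helper name.

```python
vocab_size, embedding_dim, rnn_units = 65, 256, 1024

def lstm_params(input_dim, units):
    """4 weight blocks (input, forget and output gates, cell candidate),
    each with input weights, recurrent weights and one bias vector."""
    return 4 * (units * (input_dim + units) + units)

lstm_layer = lstm_params(embedding_dim, rnn_units)
embedding_params = vocab_size * embedding_dim
dense_params = rnn_units * vocab_size + vocab_size
total_params = embedding_params + lstm_layer + dense_params
print(lstm_layer, total_params)
```

Only the recurrent layer differs between the two models, so the whole gap in total parameters comes from that extra block and the different bias layout.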


In [None]:
def generate_text(model, start_string):
  num_generate = 1000 #Number of characters to be generated

  input_eval = [char2index[s] for s in start_string] #vectorising input
  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 0.5

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # scale the logits by the temperature, then sample the next character from the resulting categorical distribution
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(index2char[predicted_id])

  return (start_string + ''.join(text_generated))

In [None]:
#Testing
#print(generate_text(lstm_model, start_string=u"ROMEO: "))

In [None]:
#test = input("Enter your starting string: ")
#print(generate_text(lstm_model, start_string=test))

In [None]:
#Prediction with User Input
lstm_test = input("Enter your starting string: ")
print(generate_text(lstm_model, start_string=lstm_test))

Enter your starting string: The
The are thy hearts to me as a face o' the year of the
shepherd; it may be so highness and his dishonour dies,
Or my shamed life in quarrel of his. I awain
That would be to be earthe nothing but one of your not venturous.

ANGELO:
Believe me, on mine honour,
I'll undertake to past her friends: you have
not be so loud.

BENVOLIO:
At this same authority: if you refuse
Why, then both proud and not answer to his majesty.
Farewell: she shall not seem to be thought upon
The people are inclined to his character, much in praters.

AUTOLYCUS:

Clown:
I will to your court, she had as lief
Than ever the vantage of his looks I give the lord.

BRAKENBURY:
What says he?

NORTHUMBERLAND:
Nay, mark mad you mean no herself approaches.

MENENIUS:
An e'er speak with you.

PROSPERO:
This better enter, to make a lusty women
Ere he the blood of MEnour prodigalities,
And say 'tis noble and my promise
That long shut thee, to the whole body. The king shall do it.
The Eath is brot