# Text generation using NLP for Shakespeare plays

#### For open-source text corpora, https://www.gutenberg.org/ is a good reference
#### For more details about RNNs, refer to this blog post: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

## Steps 

* STEP 1 : Load text
    - Load the text corpus. Ideally the corpus should contain millions of characters for reliable predictions
    - Understand the characters and style of the text
    - Find out the vocabulary length of the input text
    
* STEP 2 : Text processing
    - Vectorize the character symbols (vocabulary). In this example there are 84 distinct characters
    - Encode the entire text
    
* STEP 3 : Sequence/Batch generation
    - Identify any pattern in the text and choose a sequence length (here 120 characters)
    - Convert the text into TensorFlow dataset sequences (batch_size of 128; each input batch has shape (128, 120) and the model output has shape (128, 120, 84))
    - Shuffle the sequences so that consecutive batches are generated from different parts of the text
    
* STEP 4 : Model creation
    - Define the loss function: sparse_categorical_crossentropy, since the labels are integers and not one-hot encoded
    - Create a model with Embedding layer -> GRU -> Dense
    - The embedding vector size can be chosen based on the vocabulary size (here an embedding size of 64)
    - Choose a large number of GRU units (around 1024 for this type of example; 1026 is used here)
    - The Dense layer output size should equal the vocab size (84)
    - (Around 30 epochs in this example)
 
* STEP 5 : Model training
    - Train the model and save its weights for re-use
    - Since the number of trainable parameters in this example is large (~3.5 million), using a GPU or Google Colab is recommended

* STEP 6 : Text generator
    - Load the model weights and create a model which takes a single text sequence at a time (batch_size of 1)
    - Predict the model output and choose a single output character from the possible outputs (in this case 84), based on the predicted scores and the temperature
    - Iterate and keep appending the newly predicted character until the required size is reached
    
    
    

In [705]:
import numpy as np
import pandas as pd

In [706]:
import tensorflow as tf

## STEP 1 : Load text

In [707]:
# Use a context manager so the file is closed after reading
with open('01_nlp_text_gen_shakespeare_data.txt', 'r') as f:
    text = f.read()

In [708]:
type(text)

str

In [709]:
#text

In [710]:
#sample text
print(text[:1000])


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else this glutton be,
    To eat the world's due, by the grave and thee.


                     2
  When forty winters shall besiege thy brow,
  And dig deep trenches in thy beauty's field,
  Thy youth's proud livery so gazed on now,
  Will be a tattered weed of small worth held:  
  Then being asked, where all thy beauty lies,
  Where all the treasure of thy lusty days;
  To say within thine own deep su

#### Number of distinct characters (vocabulary size)

In [711]:
vocab = sorted(set(text))

In [712]:
vocab_len = len(vocab)

In [713]:
vocab_len

84

## STEP 2: Text processing - vectorization

#### Vectorizing text

In [714]:
char_to_ind = {char: ind for ind, char in enumerate(vocab)}

In [715]:
ind_to_char = np.array(vocab)

In [716]:
char_to_ind

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 '&': 4,
 "'": 5,
 '(': 6,
 ')': 7,
 ',': 8,
 '-': 9,
 '.': 10,
 '0': 11,
 '1': 12,
 '2': 13,
 '3': 14,
 '4': 15,
 '5': 16,
 '6': 17,
 '7': 18,
 '8': 19,
 '9': 20,
 ':': 21,
 ';': 22,
 '<': 23,
 '>': 24,
 '?': 25,
 'A': 26,
 'B': 27,
 'C': 28,
 'D': 29,
 'E': 30,
 'F': 31,
 'G': 32,
 'H': 33,
 'I': 34,
 'J': 35,
 'K': 36,
 'L': 37,
 'M': 38,
 'N': 39,
 'O': 40,
 'P': 41,
 'Q': 42,
 'R': 43,
 'S': 44,
 'T': 45,
 'U': 46,
 'V': 47,
 'W': 48,
 'X': 49,
 'Y': 50,
 'Z': 51,
 '[': 52,
 ']': 53,
 '_': 54,
 '`': 55,
 'a': 56,
 'b': 57,
 'c': 58,
 'd': 59,
 'e': 60,
 'f': 61,
 'g': 62,
 'h': 63,
 'i': 64,
 'j': 65,
 'k': 66,
 'l': 67,
 'm': 68,
 'n': 69,
 'o': 70,
 'p': 71,
 'q': 72,
 'r': 73,
 's': 74,
 't': 75,
 'u': 76,
 'v': 77,
 'w': 78,
 'x': 79,
 'y': 80,
 'z': 81,
 '|': 82,
 '}': 83}

In [717]:
ind_to_char

array(['\n', ' ', '!', '"', '&', "'", '(', ')', ',', '-', '.', '0', '1',
       '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '>', '?',
       'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
       'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
       '[', ']', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
       'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
       'w', 'x', 'y', 'z', '|', '}'], dtype='<U1')

In [718]:
char_to_ind['s'], ind_to_char[74]

(74, 's')
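
The two lookups above are inverses of each other. As a minimal sketch of that round trip, the snippet below rebuilds the same kind of mapping on a short hypothetical sample string (the names `sample`, `vocab_demo`, `encode` and `decode` are illustrative and not part of the notebook) and checks that decoding the encoded text returns the original.

```python
# Hypothetical sample string; the notebook uses the full Shakespeare corpus instead.
sample = "From fairest creatures"

# Same construction as in the notebook: sorted unique characters form the vocabulary.
vocab_demo = sorted(set(sample))
char_to_ind_demo = {ch: i for i, ch in enumerate(vocab_demo)}
ind_to_char_demo = vocab_demo  # a plain list is enough for index -> character lookups


def encode(s):
    """Map each character to its integer index."""
    return [char_to_ind_demo[ch] for ch in s]


def decode(ids):
    """Map integer indices back to characters and join them."""
    return "".join(ind_to_char_demo[i] for i in ids)


encoded = encode(sample)
print(encoded[:5])
print(decode(encoded) == sample)  # the mapping is lossless
```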

#### Encode entire text

In [719]:
encoded_txt = np.array([char_to_ind[ch] for ch in text])

In [720]:
encoded_txt

array([ 0,  1,  1, ..., 30, 39, 29])

#### Note: There is a '\n' character at the beginning of the text block, so the useful length can be 1 character less

In [721]:
encoded_txt_len = len(encoded_txt)
encoded_txt_len

5445609

In [722]:
encoded_txt[:100]

array([ 0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1, 12,  0,  1,  1, 31, 73, 70, 68,  1, 61, 56, 64,
       73, 60, 74, 75,  1, 58, 73, 60, 56, 75, 76, 73, 60, 74,  1, 78, 60,
        1, 59, 60, 74, 64, 73, 60,  1, 64, 69, 58, 73, 60, 56, 74, 60,  8,
        0,  1,  1, 45, 63, 56, 75,  1, 75, 63, 60, 73, 60, 57, 80,  1, 57,
       60, 56, 76, 75, 80,  5, 74,  1, 73, 70, 74, 60,  1, 68, 64])

## Step 3: Creating batches

#### Observe pattern for choosing a sequence length

In [723]:
print(text[:500])


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bu


#### There is a rhyming pattern in the ending words of alternate lines
* increase-decease, eyes-lies, fuel-cruel
* Taking three lines as a sequence

In [724]:
stanza = """From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,"""

In [725]:
len(stanza)

131

#### Taking a sequence of 120 characters for predicting the next character

In [726]:
seq_len = 120

#### Create a TensorFlow dataset from the encoded text, to be split into sequences of 120 characters

In [727]:
dataset = tf.data.Dataset.from_tensor_slices(encoded_txt)

In [728]:
dataset

<TensorSliceDataset shapes: (), types: tf.int32>

In [729]:
# Take the first 4 elements of the encoded text using the TensorFlow dataset
for element in dataset.take(4):
    print(element)

tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)


#### Create input and target text sequences
* Input and target sequences of size 120
* To create sequences of size 120 we need 121 characters, of which characters 1-120 form the input and characters 2-121 form the target
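
To make the one-character shift concrete, here is a small sketch on a hypothetical 11-character chunk (so the sequence length is 10 rather than 120). The variable names are illustrative only; the slicing is the same idea as described in the bullets above.

```python
# Hypothetical chunk of seq_len + 1 = 11 characters (seq_len = 10 for this demo).
chunk = list("Shakespeare")

ip_seq = chunk[:-1]      # characters 1..10 -> model input
target_seq = chunk[1:]   # characters 2..11 -> expected output

# At every position the target is simply the next character of the input.
for x, y in zip(ip_seq, target_seq):
    print(f"input {x!r} -> target {y!r}")
```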
     

In [730]:
#create sequences of size 121 (seq_len + 1), dropping the last few characters
#batch() combines consecutive elements of this dataset into batches.
seq_dataset = dataset.batch(batch_size = seq_len +1 , drop_remainder=True)

In [731]:
num_seq = encoded_txt_len//(seq_len+1)
num_seq

45005

In [732]:
seq_dataset.take(1)

<TakeDataset shapes: (121,), types: tf.int32>

#### Create input and target sequences from the extracted dataset

In [733]:
def create_seq(dataset):
    ip_seq = dataset[:-1]
    target_seq = dataset[1:]
    return ip_seq, target_seq

In [734]:
sequences = seq_dataset.map(create_seq)

#### Display two consecutive sequences

In [735]:
for ip, op in sequences.take(2):
    #ip
    print(ip.numpy())
    print('\n')
    print("".join(ind_to_char[ip.numpy()]))
    print('\n')

    #op
    print(op.numpy())
    print('\n')
    print("".join(ind_to_char[op.numpy()]))
    print('\n')

[ 0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 12  0
  1  1 31 73 70 68  1 61 56 64 73 60 74 75  1 58 73 60 56 75 76 73 60 74
  1 78 60  1 59 60 74 64 73 60  1 64 69 58 73 60 56 74 60  8  0  1  1 45
 63 56 75  1 75 63 60 73 60 57 80  1 57 60 56 76 75 80  5 74  1 73 70 74
 60  1 68 64 62 63 75  1 69 60 77 60 73  1 59 64 60  8  0  1  1 27 76 75]



                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But


[ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 12  0  1
  1 31 73 70 68  1 61 56 64 73 60 74 75  1 58 73 60 56 75 76 73 60 74  1
 78 60  1 59 60 74 64 73 60  1 64 69 58 73 60 56 74 60  8  0  1  1 45 63
 56 75  1 75 63 60 73 60 57 80  1 57 60 56 76 75 80  5 74  1 73 70 74 60
  1 68 64 62 63 75  1 69 60 77 60 73  1 59 64 60  8  0  1  1 27 76 75  1]


                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But 


[56 74  1 75 63 60

#### Shuffle dataset sequences 

In [736]:
buffer_size = 10000 # number of sequences held in the shuffle buffer
sequences = sequences.shuffle(buffer_size = buffer_size)
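
As a rough intuition for what `buffer_size` controls, the sketch below is a simplified pure-Python model of a buffered shuffle: it keeps a fixed-size buffer, emits a random buffered element, and refills the slot from the incoming stream. This is an illustration under that assumption, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper name.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)       # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]             # emit a random buffered element
        buffer[idx] = item            # and replace it with the incoming one
    rng.shuffle(buffer)               # drain whatever is left at the end
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
```

With a buffer smaller than the dataset (here 10000 against roughly 45000 sequences), an element can only move a limited distance from its original position, so the shuffle is approximate rather than fully uniform.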

In [737]:
for ip, op in sequences.take(1):
    #ip
    print(ip.numpy())
    print('\n')
    print("".join(ind_to_char[ip.numpy()]))
    print('\n')

    #op
    print(op.numpy())
    print('\n')
    print("".join(ind_to_char[op.numpy()]))
    print('\n')

[ 1 45 63 56 75  5 74  1 68 80  1 57 73 56 77 60  1 57 70 80 10  0  1  1
 47 40 37 46 38 39 34 26 10  1 30 77 60 69  1 63 60  8  1 80 70 76 73  1
 78 64 61 60  8  1 75 63 64 74  1 67 56 59 80  8  1 56 69 59  1 68 80 74
 60 67 61  8  0  1  1  1  1 26 73 60  1 74 76 64 75 70 73 74  1 75 70  1
 80 70 76 10  0  1  1 28 40 43 34 40 37 26 39 46 44 10  1 34  1 57 60 74]


 That's my brave boy.
  VOLUMNIA. Even he, your wife, this lady, and myself,
    Are suitors to you.
  CORIOLANUS. I bes


[45 63 56 75  5 74  1 68 80  1 57 73 56 77 60  1 57 70 80 10  0  1  1 47
 40 37 46 38 39 34 26 10  1 30 77 60 69  1 63 60  8  1 80 70 76 73  1 78
 64 61 60  8  1 75 63 64 74  1 67 56 59 80  8  1 56 69 59  1 68 80 74 60
 67 61  8  0  1  1  1  1 26 73 60  1 74 76 64 75 70 73 74  1 75 70  1 80
 70 76 10  0  1  1 28 40 43 34 40 37 26 39 46 44 10  1 34  1 57 60 74 60]


That's my brave boy.
  VOLUMNIA. Even he, your wife, this lady, and myself,
    Are suitors to you.
  CORIOLANUS. I bese




#### Combine sequences into batches for training
* Taking batches of batch_size = 128 examples for training
* There are num_seq = 45005 sequences in total
* Each sequence is of size seq_len = 120

In [738]:
#Combines consecutive elements of this dataset into batches.
batch_size = 128 # Number of examples used for each weight update during training
sequences = sequences.batch(batch_size, drop_remainder=True)

In [739]:
sequences

<BatchDataset shapes: ((128, 120), (128, 120)), types: (tf.int32, tf.int32)>
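
The sizes reported so far can be cross-checked with plain integer arithmetic. The sketch below uses only the numbers shown above (text length, sequence length and batch size) and counts what `drop_remainder=True` discards at each stage; the variable names for the leftovers are illustrative.

```python
encoded_txt_len = 5445609  # length of the encoded text reported above
seq_len = 120
batch_size = 128

chunk_len = seq_len + 1  # each chunk holds the input plus one extra target character
num_seq, leftover_chars = divmod(encoded_txt_len, chunk_len)
num_batches, leftover_seqs = divmod(num_seq, batch_size)

print(num_seq, leftover_chars)     # 45005 4
print(num_batches, leftover_seqs)  # 351 77
```

So one epoch consists of 351 batches, and 77 sequences are dropped when forming the batches.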

## Step 4 : Creating the model
#### Training on a GPU or on Google Colab is recommended
* Embedding layer with embedding dimension 64 (input vocab = 84, output embedding vector size = 64)
    * Turns positive integers (indexes) into dense vectors of fixed size.
* GRU with 1026 units
* Dense layer of size vocab = 84
* Loss function 
    * sparse_categorical_crossentropy because the labels are integers and not one-hot encoded
    * Use the parameter "from_logits = True" because the Dense layer has no softmax activation, so the model outputs raw scores (logits) rather than a probability distribution
* Epochs = 30


#### Define loss function

In [740]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

In [741]:
def loss_sparse_cat(y_true, y_pred):
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
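
To illustrate what a sparse categorical cross-entropy computed from logits means, here is a minimal pure-Python sketch for a single prediction. It is a hand-written illustration of the formula, not the Keras implementation, and the function name and the three-element logits are hypothetical.

```python
import math


def sparse_cce_from_logits(label, logits):
    """Cross-entropy for one integer label given unnormalised scores (logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]  # equals -log(softmax(logits)[label])


logits = [2.0, 0.5, -1.0]  # hypothetical scores over a 3-character vocabulary
print(sparse_cce_from_logits(0, logits))  # small loss: label 0 has the highest score
print(sparse_cce_from_logits(2, logits))  # larger loss: label 2 has the lowest score
```

The label is passed as a plain integer index, which is why no one-hot encoding of the targets is needed.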

#### Model creation function (used here with a batch size of 128)

In [742]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

In [743]:
#vocab_len = 84
#seq_len = 120
#batch_size = 128
#embed_dim = 64
#rnn_units = 1024

In [744]:
def create_model(vocab_len, embed_dim, rnn_units, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_len, embed_dim, batch_input_shape = [batch_size, None]))
    # return_sequences=True: return the output at every time step instead of only the last one.
    # stateful=True: the final state of each sample in a batch is reused as the initial state of the sample at the same index in the next batch.
    model.add(GRU(rnn_units, return_sequences = True, stateful = True, recurrent_initializer = 'glorot_uniform'))
    model.add(Dense(vocab_len))
    model.compile(loss = loss_sparse_cat, optimizer = 'adam')
    return model

In [745]:
#vocab_len = 84
#seq_len = 120
#batch_size = 128
embed_dim = 64
rnn_units = 1026
model = create_model(vocab_len, embed_dim, rnn_units, batch_size)

In [746]:
model.summary()

Model: "sequential_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_23 (Embedding)     (128, None, 64)           5376      
_________________________________________________________________
gru_23 (GRU)                 (128, None, 1026)         3361176   
_________________________________________________________________
dense_23 (Dense)             (128, None, 84)           86268     
Total params: 3,452,820
Trainable params: 3,452,820
Non-trainable params: 0
_________________________________________________________________
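
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the GRU uses the TF 2.x default `reset_after=True`, which gives each of the three gates two bias vectors; under that assumption the arithmetic matches the numbers printed above.

```python
vocab_len, embed_dim, rnn_units = 84, 64, 1026

# Embedding: one embed_dim-sized vector per vocabulary entry.
embedding_params = vocab_len * embed_dim

# GRU: three gates (update, reset, candidate), each with input weights,
# recurrent weights and two bias vectors (assumes reset_after=True).
gru_params = 3 * (rnn_units * (embed_dim + rnn_units) + 2 * rnn_units)

# Dense: weight matrix plus one bias per output unit.
dense_params = rnn_units * vocab_len + vocab_len

print(embedding_params, gru_params, dense_params)    # 5376 3361176 86268
print(embedding_params + gru_params + dense_params)  # 3452820
```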


## Step 5 : Training the model
* Since there are more than 3.4 M parameters, using a GPU or Google Colab is recommended
* Using Google Colab for training:
    * upload notebook: open Google Colab, upload notebook
    * upload data file: Files -> Upload -> data file
    * at the beginning of the notebook use: %tensorflow_version 2.x
    * make sure GPU is enabled: Edit -> Notebook settings -> GPU
    * run all

In [747]:
#model.fit(sequences, epochs = 30)

#### save model

In [748]:
#model.save('01_nlp_text_gen_shakespeare.h5')

### Check how the model is performing on a single sequence

#### Load model weights and create a model which accepts a single input sequence (batch size of 1)

In [749]:
from tensorflow.keras.models import load_model

In [750]:
#create a model instance
#vocab_len = 84
#seq_len = 120
batch_size = 1
#embed_dim = 64
#rnn_units = 1026
model = create_model(vocab_len, embed_dim, rnn_units, batch_size)

In [751]:
#Load weights from the pre-trained model
model.load_weights('01_nlp_text_gen_shakespeare.h5')

In [752]:
#Builds the model based on input shapes received.
model.build(tf.TensorShape([batch_size, None]))

In [753]:
model.summary()

Model: "sequential_24"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_24 (Embedding)     (1, None, 64)             5376      
_________________________________________________________________
gru_24 (GRU)                 (1, None, 1026)           3361176   
_________________________________________________________________
dense_24 (Dense)             (1, None, 84)             86268     
Total params: 3,452,820
Trainable params: 3,452,820
Non-trainable params: 0
_________________________________________________________________


#### Prepare single input sequence

In [754]:
for ip_batch, op_batch in sequences.take(1):
    ex_ip = ip_batch[0]
    ex_op = op_batch[0]
    
ex_ip.shape

TensorShape([120])

In [755]:
#Input text 
print(ind_to_char[ex_ip])

['e' 'b' 'e' 'l' 'l' 'i' 'o' 'u' 's' ' ' 't' 'o' ' ' 'h' 'i' 's' ' ' 'a'
 'r' 'm' ',' ' ' 'l' 'i' 'e' 's' ' ' 'w' 'h' 'e' 'r' 'e' ' ' 'i' 't' ' '
 'f' 'a' 'l' 'l' 's' ',' '\n' ' ' ' ' ' ' ' ' ' ' ' ' 'R' 'e' 'p' 'u' 'g'
 'n' 'a' 'n' 't' ' ' 't' 'o' ' ' 'c' 'o' 'm' 'm' 'a' 'n' 'd' '.' ' ' 'U'
 'n' 'e' 'q' 'u' 'a' 'l' ' ' 'm' 'a' 't' 'c' 'h' "'" 'd' ',' '\n' ' ' ' '
 ' ' ' ' ' ' ' ' 'P' 'y' 'r' 'r' 'h' 'u' 's' ' ' 'a' 't' ' ' 'P' 'r' 'i'
 'a' 'm' ' ' 'd' 'r' 'i' 'v' 'e' 's' ',' ' ' 'i']


In [756]:
#True output
print(ind_to_char[ex_op])

['b' 'e' 'l' 'l' 'i' 'o' 'u' 's' ' ' 't' 'o' ' ' 'h' 'i' 's' ' ' 'a' 'r'
 'm' ',' ' ' 'l' 'i' 'e' 's' ' ' 'w' 'h' 'e' 'r' 'e' ' ' 'i' 't' ' ' 'f'
 'a' 'l' 'l' 's' ',' '\n' ' ' ' ' ' ' ' ' ' ' ' ' 'R' 'e' 'p' 'u' 'g' 'n'
 'a' 'n' 't' ' ' 't' 'o' ' ' 'c' 'o' 'm' 'm' 'a' 'n' 'd' '.' ' ' 'U' 'n'
 'e' 'q' 'u' 'a' 'l' ' ' 'm' 'a' 't' 'c' 'h' "'" 'd' ',' '\n' ' ' ' ' ' '
 ' ' ' ' ' ' 'P' 'y' 'r' 'r' 'h' 'u' 's' ' ' 'a' 't' ' ' 'P' 'r' 'i' 'a'
 'm' ' ' 'd' 'r' 'i' 'v' 'e' 's' ',' ' ' 'i' 'n']


#### Predict output for the given sequence

In [757]:
#reshape the input sequence as per expected model input shape
ex_ip_model = tf.expand_dims(ex_ip, axis = 0)
ex_ip_model

<tf.Tensor: id=713166, shape=(1, 120), dtype=int32, numpy=
array([[60, 57, 60, 67, 67, 64, 70, 76, 74,  1, 75, 70,  1, 63, 64, 74,
         1, 56, 73, 68,  8,  1, 67, 64, 60, 74,  1, 78, 63, 60, 73, 60,
         1, 64, 75,  1, 61, 56, 67, 67, 74,  8,  0,  1,  1,  1,  1,  1,
         1, 43, 60, 71, 76, 62, 69, 56, 69, 75,  1, 75, 70,  1, 58, 70,
        68, 68, 56, 69, 59, 10,  1, 46, 69, 60, 72, 76, 56, 67,  1, 68,
        56, 75, 58, 63,  5, 59,  8,  0,  1,  1,  1,  1,  1,  1, 41, 80,
        73, 73, 63, 76, 74,  1, 56, 75,  1, 41, 73, 64, 56, 68,  1, 59,
        73, 64, 77, 60, 74,  8,  1, 64]])>

In [758]:
ex_pred = model.predict(ex_ip_model)
ex_pred

array([[[  0.35623282,   0.42526245,   1.0287836 , ...,   1.6154257 ,
          -1.814107  ,  -2.450402  ],
        [  1.1557689 ,   2.2304497 ,   4.386735  , ...,  -0.47502446,
         -10.679666  ,  -5.18043   ],
        [  3.753955  ,   5.29618   ,   1.3525339 , ...,  -4.1217804 ,
         -10.454757  ,  -9.871591  ],
        ...,
        [ 10.282167  ,  14.723573  ,   2.896478  , ...,  -7.7263856 ,
          -5.9493713 ,  -7.7705812 ],
        [ -5.94896   ,   2.662288  ,  -6.838647  , ...,  -5.8988333 ,
         -13.018035  , -12.699368  ],
        [ -7.7277865 ,  -3.082776  ,  -4.9345417 , ...,  -3.2850256 ,
         -14.239203  ,  -8.754406  ]]], dtype=float32)

#### Sample a character for each position and form the output sequence

In [759]:
ex_pred.shape

(1, 120, 84)

In [760]:
# reshape the predicted output sequence for label selection
ex_pred = tf.squeeze(ex_pred)
ex_pred.shape

TensorShape([120, 84])

In [761]:
#Draws samples from a categorical distribution.
ex_pred = tf.random.categorical(ex_pred, num_samples=1)
ex_pred

<tf.Tensor: id=713685, shape=(120, 1), dtype=int64, numpy=
array([[77],
       [ 2],
       [74],
       [ 9],
       [ 5],
       [69],
       [76],
       [74],
       [ 0],
       [59],
       [56],
       [ 1],
       [75],
       [60],
       [74],
       [ 1],
       [38],
       [59],
       [68],
       [10],
       [ 0],
       [67],
       [70],
       [66],
       [74],
       [74],
       [75],
       [64],
       [60],
       [73],
       [60],
       [ 1],
       [75],
       [75],
       [ 1],
       [58],
       [64],
       [73],
       [67],
       [74],
       [ 1],
       [ 0],
       [ 1],
       [ 1],
       [ 1],
       [ 1],
       [26],
       [ 1],
       [ 1],
       [60],
       [56],
       [67],
       [74],
       [64],
       [64],
       [69],
       [58],
       [67],
       [75],
       [73],
       [ 1],
       [68],
       [70],
       [69],
       [68],
       [64],
       [69],
       [59],
       [ 8],
       [ 0],
       [27],
       [71],
     

In [762]:
ex_pred.shape

TensorShape([120, 1])

In [763]:
ex_pred = tf.squeeze(ex_pred).numpy()

In [764]:
ex_pred

array([77,  2, 74,  9,  5, 69, 76, 74,  0, 59, 56,  1, 75, 60, 74,  1, 38,
       59, 68, 10,  0, 67, 70, 66, 74, 74, 75, 64, 60, 73, 60,  1, 75, 75,
        1, 58, 64, 73, 67, 74,  1,  0,  1,  1,  1,  1, 26,  1,  1, 60, 56,
       67, 74, 64, 64, 69, 58, 67, 75, 73,  1, 68, 70, 69, 68, 64, 69, 59,
        8,  0, 27, 71, 74, 73, 76, 64, 67,  1, 69, 60, 69, 75, 60,  8, 59,
        8,  0,  1,  1,  1,  1, 33, 26, 38, 70, 73, 64, 80, 76, 74,  1, 56,
       74,  1, 28, 64, 70, 56, 68,  5, 61, 70, 70, 77, 56, 69,  8,  1, 68,
        5], dtype=int64)

In [765]:
ex_op.numpy()

array([57, 60, 67, 67, 64, 70, 76, 74,  1, 75, 70,  1, 63, 64, 74,  1, 56,
       73, 68,  8,  1, 67, 64, 60, 74,  1, 78, 63, 60, 73, 60,  1, 64, 75,
        1, 61, 56, 67, 67, 74,  8,  0,  1,  1,  1,  1,  1,  1, 43, 60, 71,
       76, 62, 69, 56, 69, 75,  1, 75, 70,  1, 58, 70, 68, 68, 56, 69, 59,
       10,  1, 46, 69, 60, 72, 76, 56, 67,  1, 68, 56, 75, 58, 63,  5, 59,
        8,  0,  1,  1,  1,  1,  1,  1, 41, 80, 73, 73, 63, 76, 74,  1, 56,
       75,  1, 41, 73, 64, 56, 68,  1, 59, 73, 64, 77, 60, 74,  8,  1, 64,
       69])

In [766]:
#predicted output
print("".join(ind_to_char[ex_pred]))

v!s-'nus
da tes Mdm.
loksstiere tt cirls 
    A  ealsiincltr monmind,
Bpsruil nente,d,
    HAMoriyus as Cioam'foovan, m'


In [767]:
#expected output
print("".join(ind_to_char[ex_op]))

bellious to his arm, lies where it falls,
      Repugnant to command. Unequal match'd,
      Pyrrhus at Priam drives, in


## Step 6 : Text generation
* Load model weights and create a model that accepts a single input sequence
* Keep generating the next character in the sequence and appending it to the result text

In [768]:
#Instantiate model
model = create_model(vocab_len, embed_dim, rnn_units, batch_size = 1)
model.load_weights('01_nlp_text_gen_shakespeare.h5')
model.build(input_shape = tf.TensorShape([1, None]))
model.summary()

Model: "sequential_25"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_25 (Embedding)     (1, None, 64)             5376      
_________________________________________________________________
gru_25 (GRU)                 (1, None, 1026)           3361176   
_________________________________________________________________
dense_25 (Dense)             (1, None, 84)             86268     
Total params: 3,452,820
Trainable params: 3,452,820
Non-trainable params: 0
_________________________________________________________________


#### Generating one character at a time

In [769]:
def generate_text(model, start_seed, gen_size = 500, temperature = 1.0):
    """ 
    model: Trained model used to generate text
    start_seed: Initial seed text in string form
    gen_size: Number of characters to generate
    temperature: Scales the logits before sampling and so affects the randomness of the
    resulting text (lower values give more predictable text, higher values more random text)

    The basic idea behind this function is to take in some seed text, format it so
    that it is in the correct shape for our network, then loop, feeding each
    predicted character back in as the next input. Similar to our work in the RNN
    time series problems.
    """
    text_generated = []
    input_eval = [char_to_ind[ch] for ch in start_seed] 
    input_eval = tf.expand_dims(input_eval, axis = 0) # add a batch dimension, as the model expects
    
    model.reset_states() # clear hidden states of the network 
    for i in range(gen_size):
        predictions = model.predict(input_eval) # logits for every position in the input
        predictions = tf.squeeze( predictions, axis = 0) # since batch size is 1, remove this dimension
        predictions = predictions/temperature # scale the logits by the temperature
        predicted_id = tf.random.categorical(predictions, num_samples = 1) # sample one character id per position
        predicted_id = predicted_id[-1, 0].numpy() # keep the sample for the last position
        #print(predicted_id)
        input_eval = tf.expand_dims([predicted_id], axis = 0)  # pass the predicted character as the next input
        text_generated.append(ind_to_char[predicted_id])
        #print(text_generated)
    
    return (start_seed + "".join(text_generated))
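
The effect of dividing the logits by a temperature before sampling can be seen in a small pure-Python sketch. It is an illustration only: the helper names and the three logits are hypothetical, and the standard library's `random.choices` stands in for `tf.random.categorical`.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index at random according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters
for t in (0.2, 1.0, 5.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

A low temperature concentrates the probability on the highest-scoring character, while a high temperature flattens the distribution and makes the generated text more random.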

In [770]:
print(generate_text(model, "flower", gen_size = 1000))

flowers.
  LEONER. My duty in the nige,
    Th' other has I his tongue- stopp'd her moved,
    As blenting lappingly lies under the
    gone with this plaina lik'd, Caesar will so quickly turn, good friend will she take each one that
    drown'd-and washes.
  SPEED. how thou didst swear to luck growantly
    Would stand me diare their bosoms to inherit.
  SHRLONT. And thus:
    Brow up, and fear no blushing. Let me know her wrath
    And say she workn in thee.
    Who hath done this
    Will speak your skith what he was
    As by a forfeit of this, girl; she my sost ink.
  SILVIA. There, take them from a ucause of his love,
  And flourish'd from me, and more worthiness,
    Direction of that lord whereof thou mightst
    That I men draw a well-deeplexion.
  CLOWN. Good ev'n fooling, who shall shepherd see the jump away

                                   Re-enter CHERPITEN

  CHIRON. Meen the Clarence to thee; good Master Touch.
    The which he fix upon this present,
         Into the

#### Good Bye !!