<pre>
QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m
</pre>

In [1]:
import tensorflow as tf

import numpy as np
import os
import time

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [3]:
# Read the text file and decode it as UTF-8
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [4]:
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


In [5]:
# Create two StringLookup layers: characters -> IDs, and IDs -> characters (invert=True)
chars_to_ids = tf.keras.layers.StringLookup(vocabulary=vocab, mask_token=None)
ids_to_chars = tf.keras.layers.StringLookup(vocabulary=chars_to_ids.get_vocabulary(), invert=True, mask_token=None)

all_ids = chars_to_ids(tf.strings.unicode_split(text, 'UTF-8'))
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

# At this point, ids_dataset is a tf.data.Dataset of character IDs

for ids in ids_dataset.take(10):
    print(ids_to_chars(ids).numpy().decode('UTF-8'))

F
i
r
s
t
 
C
i
t
i
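The two `StringLookup` layers above behave like a pair of lookup tables. The pure-Python sketch below imitates them under an assumption about the defaults: with `mask_token=None` and a single out-of-vocabulary slot, index 0 is reserved for `[UNK]`, so the 65 characters receive IDs 1 to 65 and the lookup vocabulary has 66 entries. It is not the Keras implementation, and the helpers `build_lookup`, `encode` and `decode` are hypothetical.

```python
def build_lookup(vocab):
    """Imitate StringLookup(mask_token=None) with one OOV slot at index 0."""
    table = ["[UNK]"] + sorted(vocab)          # index 0 is the OOV token
    char_to_id = {ch: i for i, ch in enumerate(table)}
    return table, char_to_id

def encode(text, char_to_id):
    # Characters missing from the vocabulary fall back to the OOV index 0.
    return [char_to_id.get(ch, 0) for ch in text]

def decode(ids, table):
    # Inverse lookup, like StringLookup(..., invert=True) followed by a join.
    return "".join(table[i] for i in ids)

text = "First Citizen"
table, char_to_id = build_lookup(set(text))
ids = encode(text, char_to_id)
print(len(table))                    # vocabulary size, including "[UNK]"
print(decode(ids, table) == text)    # encoding then decoding round-trips
print(encode("Fq", char_to_id))      # 'q' is not in this tiny vocabulary
```

The extra `[UNK]` entry is why the model is built with `len(chars_to_ids.get_vocabulary())` (66) rather than `len(vocab)` (65).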


In [6]:
# Define the sequence length and the number of examples (sequences) per epoch
seq_length = 100
example_per_epoch = len(text)//(seq_length+1)

sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for sequence in sequences.take(1):
    print(ids_to_chars(sequence))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)
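`ids_dataset.batch(seq_length+1, drop_remainder=True)` cuts the ID stream into consecutive, non-overlapping chunks of 101 IDs and discards the incomplete tail. A plain-Python sketch of the same slicing (the helper `chunk` is hypothetical) also works out how many sequences the 1,115,394-character text yields, which is the value stored in `example_per_epoch`.

```python
def chunk(ids, size):
    """Split ids into consecutive chunks of `size`, dropping the incomplete
    tail (the effect of Dataset.batch(size, drop_remainder=True))."""
    n_full = len(ids) // size
    return [ids[i * size:(i + 1) * size] for i in range(n_full)]

seq_length = 100
text_length = 1115394                           # length reported in cell [3]
n_sequences = text_length // (seq_length + 1)   # same formula as example_per_epoch
dropped = text_length - n_sequences * (seq_length + 1)
print(n_sequences, dropped)

print(chunk(list(range(10)), 4))                # the last two IDs are dropped
```

So the text provides 11,043 sequences of 101 characters, and the final 51 characters are never used for training.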


In [7]:
# Join a sequence of character IDs back into a single string
def text_from_ids(ids):
    return tf.strings.reduce_join(ids_to_chars(ids), axis=-1)

for sequence in sequences.take(1):
    print(text_from_ids(sequence))

# Split each sequence into an input and a target; the target is the input shifted one character ahead
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

tf.Tensor(b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou ', shape=(), dtype=string)


In [8]:
# Print the first input/target pair
for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target: b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
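Each 101-character sequence therefore gives a 100-step input and a 100-step target, and at every time step the model is trained to predict the character that comes next. A minimal sketch on a plain string, reusing the notebook's slicing, makes the step-by-step pairing explicit:

```python
def split_input_target(sequence):
    # Same slicing as the notebook's function, applied to a plain string.
    return sequence[:-1], sequence[1:]

inp, tgt = split_input_target("Hello")
for step, (x, y) in enumerate(zip(inp, tgt)):
    # At each position the target is simply the next character of the sequence.
    print(f"step {step}: input {x!r} -> target {y!r}")
```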


In [9]:
BATCH_SIZE = 64
BUFFER_SIZE = 10000

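# Shuffle with a 10,000-element buffer, group into batches of 64 (dropping the last partial batch), and prefetch
dataset_prefetch = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True).prefetch(tf.data.AUTOTUNE)

# Data preparation is complete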
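`shuffle(BUFFER_SIZE)` does not shuffle the whole dataset at once: according to the `tf.data` documentation it fills a buffer of `BUFFER_SIZE` elements, picks one at random, and refills that slot from the input. The generator below is a pure-Python sketch of that behaviour, not the TensorFlow implementation; `buffered_shuffle` and `batch` are hypothetical names. It also counts the full batches per epoch: with 11,043 sequences, a batch size of 64 and `drop_remainder=True`, that is 172.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Emit a random element from the full buffer; the next input refills it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain the remaining elements once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer

def batch(items, batch_size):
    """Group items into full batches, dropping the incomplete last batch."""
    items = list(items)
    return [items[i:i + batch_size]
            for i in range(0, len(items) - batch_size + 1, batch_size)]

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(sorted(shuffled) == list(range(20)))       # every element appears exactly once
print(list(buffered_shuffle(range(5), buffer_size=1)))  # buffer of 1: no shuffling
print(len(batch(range(11043), 64)))              # full batches per epoch
```

The buffer-of-one case shows why the buffer size matters: the first element emitted can only come from the first `buffer_size` inputs, so a small buffer only mixes nearby sequences.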

In [10]:
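# Some hyperparameters
# The StringLookup vocabulary has 66 entries: the 65 characters plus "[UNK]"
vocab_size = len(chars_to_ids.get_vocabulary())
embedding_dim = 256
rnn_units = 1024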

# Create the model: Embedding -> GRU -> Dense (one logit per vocabulary entry)
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

model = MyModel(len(chars_to_ids.get_vocabulary()), embedding_dim, rnn_units)
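For a batch of shape `(64, 100)`, the layers map IDs to `(64, 100, 256)` embeddings, then `(64, 100, 1024)` GRU outputs, then `(64, 100, 66)` logits. The parameter counts can be checked by hand. This sketch assumes the TF 2.x Keras defaults for `GRU` (`reset_after=True`, which keeps separate input and recurrent bias vectors) and the 66-entry lookup vocabulary; the helper functions are hypothetical, and the totals are the counts one would expect `model.summary()` to report for this configuration.

```python
def embedding_params(vocab_size, embedding_dim):
    return vocab_size * embedding_dim                  # one vector per token ID

def gru_params(input_dim, units, reset_after=True):
    # Three gates (update, reset, candidate), each with an input kernel and a
    # recurrent kernel; reset_after=True doubles the number of bias vectors.
    kernels = 3 * units * (input_dim + units)
    biases = 3 * units * (2 if reset_after else 1)
    return kernels + biases

def dense_params(input_dim, output_dim):
    return input_dim * output_dim + output_dim         # weights + bias

vocab_size, embedding_dim, rnn_units = 66, 256, 1024
counts = {
    "embedding": embedding_params(vocab_size, embedding_dim),
    "gru": gru_params(embedding_dim, rnn_units),
    "dense": dense_params(rnn_units, vocab_size),
}
for name, n in counts.items():
    print(f"{name:>9}: {n:,}")
print(f"{'total':>9}: {sum(counts.values()):,}")
```

Nearly all of the roughly four million parameters sit in the GRU's recurrent kernel, which grows with the square of `rnn_units`.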
    

In [11]:
# Run the untrained model on one batch to check the output shape
for input_batch, target_batch in dataset_prefetch.take(1):
    example_batch_prediction = model(input_batch)
    print(example_batch_prediction)
    print(example_batch_prediction.shape)  # (batch_size, sequence_length, vocab_size)
    # Sample one ID per time step from the first example's logits
    sampled_indices = tf.random.categorical(example_batch_prediction[0], num_samples=1)
    sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
    print(sampled_indices)
    print("Input: ", text_from_ids(input_batch[0]).numpy(), "\n")
    print("Output:", text_from_ids(sampled_indices).numpy())

model.summary()

# The model outputs raw logits, so the loss is created with from_logits=True
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss)

tf.Tensor(
[[[ 0.00488965  0.00068259 -0.00153353 ...  0.00733591 -0.00974749
   -0.0007874 ]
  [ 0.00972322 -0.00383652  0.00700272 ...  0.0055703  -0.00410091
    0.00991068]
  [ 0.01258326 -0.01383366  0.0041693  ...  0.00134813 -0.00333804
    0.01888752]
  ...
  [ 0.01108962 -0.00550548 -0.00387047 ...  0.00599941 -0.0080848
   -0.00770336]
  [ 0.01093642 -0.00764689 -0.01293551 ...  0.00728874  0.00206466
   -0.00856296]
  [ 0.00045355 -0.00731023  0.00479691 ...  0.01368865  0.0095529
    0.00959322]]

 [[ 0.00126451  0.00193415 -0.00446218 ...  0.00039899 -0.00154395
    0.01256501]
  [ 0.01072659  0.00094822 -0.01116775 ...  0.00160091 -0.00388591
    0.00408083]
  [-0.00493678  0.00144258 -0.01021042 ...  0.01258428  0.0064707
   -0.0105223 ]
  ...
  [ 0.00743714  0.00641606 -0.00200697 ... -0.00436386 -0.0149189
    0.01352312]
  [-0.00624997  0.00427362 -0.00547133 ...  0.00881577  0.00075311
   -0.00655191]
  [ 0.00418323 -0.01199234 -0.00256209 ...  0.00600938  0.00248669
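Because the model is untrained, its logits are all close to zero (as in the printout above), so the predicted distribution is nearly uniform over the 66 IDs and the initial loss should be close to ln(66) ≈ 4.19. The pure-Python sketch below computes the per-position loss the way `from_logits=True` implies (log-softmax, then the negative log-probability of the target ID). `sparse_ce_from_logits` is a hypothetical helper, not the Keras function.

```python
import math

def sparse_ce_from_logits(logits, target):
    """Cross-entropy for one position: -log(softmax(logits)[target])."""
    m = max(logits)                                     # subtract the max for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

vocab_size = 66
uniform_logits = [0.0] * vocab_size
print(sparse_ce_from_logits(uniform_logits, target=5), math.log(vocab_size))

# A confident, correct prediction gives a much smaller loss.
confident = [0.0] * vocab_size
confident[5] = 10.0
print(sparse_ce_from_logits(confident, target=5))
```

A first-epoch loss far above ln(66) would suggest a problem such as passing probabilities to a loss configured with `from_logits=True`.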

In [12]:
# Save the weights at the end of every epoch as checkpoint_dir/ckpt_<epoch>
checkpoint_dir = "/content/sample_data"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
ckpt_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)

In [13]:
# Train the model
EPOCH = 5
model.fit(dataset_prefetch, epochs=EPOCH, callbacks=[ckpt_callback])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fa4c32dd110>

In [15]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init__()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    # Create a mask to prevent "[UNK]" from being generated.
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        # Put a -inf at each bad index.
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        # Match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    # Convert strings to token IDs.
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    # Run the model.
    # predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits, states = self.model(inputs=input_ids, states=states,
                                          return_state=True)
    # Only use the last prediction.
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    # Apply the prediction mask: prevent "[UNK]" from being generated.
    predicted_logits = predicted_logits + self.prediction_mask

    # Sample the output logits to generate token IDs.
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    # Convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    # Return the characters and model state.
    return predicted_chars, states

one_step_model = OneStep(model, ids_to_chars, chars_to_ids)
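`generate_one_step` keeps only the last time step's logits, divides them by `temperature`, adds a mask that is `-inf` at the `[UNK]` index, and samples with `tf.random.categorical`. The sketch below repeats those three steps in pure Python to show their effect: a `-inf` logit gets probability exactly 0, and a temperature below 1 sharpens the distribution. The function names and the four-entry toy vocabulary are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(logits, mask, temperature=1.0):
    # Scale by temperature, then add the mask (0.0 = allowed, -inf = banned).
    scaled = [z / temperature + b for z, b in zip(logits, mask)]
    return softmax(scaled)

def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, 0.0]          # index 0 plays the role of "[UNK]"
mask = [-math.inf, 0.0, 0.0, 0.0]

p_default = next_token_probs(logits, mask, temperature=1.0)
p_cold = next_token_probs(logits, mask, temperature=0.5)
print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_cold])   # lower temperature: more peaked

rng = random.Random(0)
draws = [sample(p_default, rng) for _ in range(1000)]
print(0 in draws)                      # the masked index is never drawn
```

Raising `temperature` above 1.0 flattens the distribution instead, which tends to produce more varied but less coherent text.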

In [16]:
# Generate 1000 characters, feeding each sampled character and the GRU state back in
start = time.time()
states = None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n' + '_'*80)
print('\nRun time:', end - start)

ROMEO:
And as their bed verce, why hear, thus join'd at ome
were intent! these I amside the eors'
Hast too bewn to me father
Nurseing thee both swift foolish man one.

BUPTINTA:
I know me you! no tears thy head, and he will
Holding in a bering tanets;
Now on it for him and being connoration
Censunath, there is death!

EDWARD:
Pardon my love, put my brother, thou know'tt that hast to ease,
If they have so done.
Therefore; in thy daughter, but nifer him was,
Here's no bitest hate, no, Even the fime
Coriole, but intembed to appleen to prize
Ay sit expect auried speer of it; looks underseaven here
avacken princely shake longet to here to peril; the dogh shall pley you
Do not fardwell, let the pial for ever see
A please gar end-mannest to-death.

ROMEO:
Farsheen stoods? For on heaven call
To at Lancaster words in heart,
Romeo? die anried, the ending truagh
Upon the deaths behed: for nothing on thy strecks,
But I am no morin, where thee
For Kate, canster horses,
To be the stuffician so shoul
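The generation loop in cell [16] is autoregressive: each call takes the previously sampled character and the carried `states`, and returns the next character and the updated state. The toy sketch below keeps the same loop shape but replaces the trained GRU with a hypothetical bigram table built from a short string, whose "state" is only the last character. It illustrates the control flow, not the quality of the trained model.

```python
import random
from collections import defaultdict

def build_bigrams(text):
    """Map each character to the list of characters that follow it."""
    table = defaultdict(list)
    for a, b in zip(text, text[1:]):
        table[a].append(b)
    return table

def generate_one_step(table, next_char, state, rng):
    # Hypothetical stand-in for OneStep.generate_one_step: the "state" is just
    # the last character seen, not a GRU hidden state.
    prev = next_char[-1] if state is None else state
    choices = table.get(prev) or list(table)   # fall back if prev has no successor
    char = rng.choice(choices)
    return char, char

corpus = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
table = build_bigrams(corpus)
rng = random.Random(42)

states = None
next_char = "First"
result = [next_char]
for _ in range(40):
    next_char, states = generate_one_step(table, next_char, states, rng)
    result.append(next_char)
generated = "".join(result)
print(generated)
```

As in the notebook, the prompt is only consumed on the first call; afterwards, the carried state is what links each step to the text generated so far.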