<a href="https://colab.research.google.com/github/banjodayo39/quotes-generation/blob/master/quote_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Import Libraries

In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

import numpy as np
import os
import io
import time
import json

TensorFlow 2.x selected.


Upload the quotes dataset

In [4]:
from google.colab import files
uploaded = files.upload()

Saving quotes.json to quotes.json


In [5]:

file_name = "quotes.json"
# Decode the uploaded bytes and parse them as JSON to preview the records
json.loads(uploaded[file_name].decode("utf-8"))

[{'Author': 'Dr. Seuss',
  'Category': 'life',
  'Popularity': 0.15566615566615566,
  'Quote': "Don't cry because it's over, smile because it happened.",
  'Tags': ['attributed-no-source',
   'cry',
   'crying',
   'experience',
   'happiness',
   'joy',
   'life',
   'misattributed-dr-seuss',
   'optimism',
   'sadness',
   'smile',
   'smiling ']},
 {'Author': 'Dr. Seuss',
  'Category': 'happiness',
  'Popularity': 0.15566615566615566,
  'Quote': "Don't cry because it's over, smile because it happened.",
  'Tags': ['attributed-no-source',
   'cry',
   'crying',
   'experience',
   'happiness',
   'joy',
   'life',
   'misattributed-dr-seuss',
   'optimism',
   'sadness',
   'smile',
   'smiling ']},
 {'Author': 'Marilyn Monroe',
  'Category': 'love',
  'Popularity': 0.12912212912212911,
  'Quote': "I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve 

In [0]:
dataset = json.loads(uploaded[file_name].decode("utf-8"))

In [0]:
text_list = []
for item in dataset:
  text_list.append(item['Quote'])

In [8]:
text_list[:10]

["Don't cry because it's over, smile because it happened.",
 "Don't cry because it's over, smile because it happened.",
 "I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.",
 "I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.",
 "I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.",
 'Be yourself; everyone else is already taken.',
 "Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.",
 "Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.",
 "Two thin

In [0]:
# Join all quotes into a single string, separated by spaces
text = ' '.join(str(elem) for elem in text_list)

In [10]:
# Take a look at the first 250 characters in text
print(text[:250])

Don't cry because it's over, smile because it happened. Don't cry because it's over, smile because it happened. I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me


In [11]:
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

550 unique characters


## Process the text

### Vectorize the text

Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.

In [0]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])
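As a quick sanity check of the two lookup tables, the sketch below rebuilds them on a toy string and confirms that encoding followed by decoding returns the original text. The names `toy_text`, `toy_vocab`, `toy_char2idx` and `toy_idx2char` are illustrative and not part of the notebook, and a plain list stands in for the NumPy array so the snippet runs without dependencies.

```python
# Toy input (hypothetical, chosen only for illustration)
toy_text = "hello world"

# Same construction as in the notebook: sorted unique characters
toy_vocab = sorted(set(toy_text))
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_idx2char = toy_vocab  # a list indexed by integer id

# Encode characters to integers, then decode back
encoded = [toy_char2idx[ch] for ch in toy_text]
decoded = "".join(toy_idx2char[i] for i in encoded)

print(toy_vocab)
print(encoded)
print(decoded)
```

Because the vocabulary is sorted, the integer ids follow code-point order, which is why the space character maps to 0 in the notebook's own table.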

In [13]:
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  ' ' :   0,
  '!' :   1,
  '"' :   2,
  '#' :   3,
  '$' :   4,
  '%' :   5,
  '&' :   6,
  "'" :   7,
  '(' :   8,
  ')' :   9,
  '*' :  10,
  '+' :  11,
  ',' :  12,
  '-' :  13,
  '.' :  14,
  '/' :  15,
  '0' :  16,
  '1' :  17,
  '2' :  18,
  '3' :  19,
  ...
}


In [14]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

"Don't cry bec" ---- characters mapped to int ---- > [36 78 77  7 83  0 66 81 88  0 65 68 66]


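### Prediction Task

Given a sequence of characters, the model is trained to predict the most probable next character at each position.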

### Create training examples and targets

Next, divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.
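To make the shift concrete, here is a minimal pure-Python sketch of the same idea on a short string. The helper `make_examples` is hypothetical and only mirrors the logic described above (chunks of `seq_length + 1` characters, with an incomplete final chunk dropped); the notebook itself does this with `tf.data`.

```python
def make_examples(text, seq_length):
    """Split text into (input, target) pairs of length seq_length.

    Each chunk holds seq_length + 1 characters; the input drops the
    last character and the target drops the first one.
    """
    chunk_len = seq_length + 1
    n_chunks = len(text) // chunk_len  # an incomplete final chunk is dropped
    examples = []
    for k in range(n_chunks):
        chunk = text[k * chunk_len:(k + 1) * chunk_len]
        examples.append((chunk[:-1], chunk[1:]))
    return examples

pairs = make_examples("Hello world!", seq_length=4)
for inp, tgt in pairs:
    print(repr(inp), "->", repr(tgt))
```

With `seq_length=4`, the chunk `"Hello"` yields the input `"Hell"` and the target `"ello"`.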

In [15]:
# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

D
o
n
'
t


In [16]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

"Don't cry because it's over, smile because it happened. Don't cry because it's over, smile because it"
" happened. I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at "
"times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me "
"at my best. I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at"
" times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me"


For each sequence, duplicate and shift it to form the input and target text by using the map method to apply a simple function to each batch:

In [0]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

In [18]:
for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  "Don't cry because it's over, smile because it happened. Don't cry because it's over, smile because i"
Target data: "on't cry because it's over, smile because it happened. Don't cry because it's over, smile because it"


In [19]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 36 ('D')
  expected output: 78 ('o')
Step    1
  input: 78 ('o')
  expected output: 77 ('n')
Step    2
  input: 77 ('n')
  expected output: 7 ("'")
Step    3
  input: 7 ("'")
  expected output: 83 ('t')
Step    4
  input: 83 ('t')
  expected output: 0 (' ')


Create training batches



In [20]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
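The buffered shuffling mentioned in the comment above can be illustrated with a small pure-Python generator. This is a conceptual sketch only, not how `tf.data` is implemented internally; the function name `buffered_shuffle` and its `seed` parameter are assumptions made for the example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit what is left in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
print(shuffled)
```

An element can only be emitted once it has entered the buffer, so a small buffer gives a weak, local shuffle. A buffer at least as large as the dataset gives a full shuffle.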

Build the model

In [0]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [0]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [0]:
model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

Run the model to see how it behaves

In [24]:
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 550) # (batch_size, sequence_length, vocab_size)


In [25]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           140800    
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3938304   
_________________________________________________________________
dense (Dense)                (64, None, 550)           563750    
Total params: 4,642,854
Trainable params: 4,642,854
Non-trainable params: 0
_________________________________________________________________
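The parameter counts in the summary can be reproduced by hand. The sketch below assumes the GRU layer uses the TensorFlow 2 default `reset_after=True`, which keeps two bias vectors per gate. Under that assumption the arithmetic matches the numbers printed above.

```python
vocab_size = 550
embedding_dim = 256
rnn_units = 1024

# Embedding: one vector of size embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim

# GRU: 3 gates, each with an input kernel, a recurrent kernel,
# and (assuming reset_after=True) two bias vectors
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: weight matrix plus one bias per output unit
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```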


In [0]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

In [27]:
sampled_indices

array([375, 309, 278, 479, 438, 431, 377,   8, 121, 503,   1,  67, 459,
       277, 110, 185, 398,  67, 153, 295, 167, 396, 284, 503, 284,  10,
       543, 210, 185, 133, 478,  78, 511, 431, 549, 397, 426, 442, 395,
       154, 251, 186, 237, 300,  93, 418, 341, 235, 219, 520, 433,  23,
        59, 205,  11, 360, 520, 229, 329, 531, 459, 451, 122, 337, 476,
       218, 344,  37, 510,  90, 418, 158, 328, 360,  23, 229, 408, 342,
       184, 429, 153, 133, 260,  71, 433, 339, 456, 150, 185, 164, 181,
       215, 519,  11, 179, 297, 532, 145, 248, 176])

In [28]:
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 've everyone. I hate no one. Regardless of their race, religion, their proclivities, the desire of th'

Next Char Predictions: 
 'एאգლদছख(îέ!dৃբáΛऱdƃշɯयիέի*♫τΛăკo‘ছﬁरখফमƆрΤвռ~ঃغаВ′ঞ7[ο+ْ′Сخ◡ৃসïضთБقE―{ঃƸحْ7СीـΔঙƃăщhঞظীŻΛɟ̵ό\u202c+́չ☜œнˈ'


Train the model

In [29]:
def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 550)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       6.309878
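A useful sanity check on this initial loss: an untrained model should be roughly as uncertain as a uniform guess over the vocabulary, whose cross-entropy is the natural log of the vocabulary size. The short sketch below computes that baseline for the 550 characters found above.

```python
import math

vocab_size = 550

# Cross-entropy of a uniform distribution over vocab_size classes
uniform_loss = math.log(vocab_size)
print(uniform_loss)
```

The result is about 6.31, close to the `scalar_loss` printed above, which suggests the freshly initialized model starts out near a uniform prediction.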


In [0]:
model.compile(optimizer='adam', loss=loss)

In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
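The `{epoch}` placeholder in `checkpoint_prefix` is an ordinary Python format field. As a small illustration, assuming Keras substitutes one-based epoch numbers into it, the file prefixes for the first few epochs would look like this:

```python
import os

checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Fill in the placeholder the same way str.format would
names = [checkpoint_prefix.format(epoch=e) for e in range(1, 4)]
for name in names:
    print(name)
```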

In [32]:
EPOCHS=10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Generate Text

To keep the prediction step simple, we use a `batch_size` of 1. Because the batch size is fixed when the stateful model is built, running with a different `batch_size` requires rebuilding the model and restoring the weights from the checkpoint.

Restore the latest checkpoint

In [33]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_10'

In [0]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [35]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            140800    
_________________________________________________________________
gru_1 (GRU)                  (1, None, 1024)           3938304   
_________________________________________________________________
dense_1 (Dense)              (1, None, 550)            563750    
Total params: 4,642,854
Trainable params: 4,642,854
Non-trainable params: 0
_________________________________________________________________


The Prediction

In [0]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 700

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 0.27

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # use a categorical distribution to sample the next character from the model's output
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))
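To see what dividing the logits by `temperature` does, the sketch below applies a softmax to a toy logit vector at two temperatures. The helper `softmax_with_temperature` and the values in `toy_logits` are made up for illustration. Sampling from temperature-scaled logits with `tf.random.categorical` corresponds to sampling from these probabilities.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three characters

probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_low = softmax_with_temperature(toy_logits, 0.27)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_low])
```

At the lower temperature almost all of the probability mass moves to the highest-scoring character, which is why the generated text becomes more predictable and repetitive.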

In [72]:
print(generate_text(model, start_string=u"Strenth comes from with the mind, reach out from the strength in me"))

Strenth comes from with the mind, reach out from the strength in mercy. I don't want to be a person who does not attempt to be a professional consciousness. I will not be able to be a professional world. I want you to know that I am not a single beginning and a free will. I want to be a professional writer and then the only thing that is the best of all things to be happy. I want you to know that I am not a bad memory of the most powerful thing in the world. Be a profound fact. I will be a person who can say is to be a possession of the earth. In a world where there is no such thing as a poet. The secret of life is the power of love and the universe is the way to get the charges. Do you think I have a promises. I think we can change the world of empty consc
