##### Copyright 2018 The TensorFlow Hub Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [0]:
# Copyright 2018 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

## Produce Lyrics with Cloud TPUs and Keras

## Overview

This example uses [tf.keras](https://www.tensorflow.org/guide/keras) to build a *language model* and train it on a Cloud TPU. This language model predicts the next character of text given the text so far. The trained model can generate new snippets of text that read in a similar style to the training text.

The model trains for 10 epochs and completes in approximately 5 minutes.

This notebook is hosted on GitHub. To view it in its original repository, after opening the notebook, select **File > View on GitHub**.

## Learning objectives

In this Colab, you will learn how to:
*   Build a two-layer, forward-LSTM model.
*   Convert a `tf.keras` model to an equivalent TPU version and then use the standard Keras methods to train: `fit`, `predict`, and `evaluate`.
*   Use the trained model to make predictions and generate your own song lyrics.






## Instructions

<h3>  &nbsp;&nbsp;Train on TPU&nbsp;&nbsp; <a href="https://cloud.google.com/tpu/"><img valign="middle" src="https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png" width="50"></a></h3>

   1. On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.
   1. Click Runtime again and select **Runtime > Run All**. You can also run the cells manually with Shift-ENTER. 

TPUs are located in Google Cloud. For optimal performance, they read data directly from Google Cloud Storage (GCS).

## Data, model, and training

### Download data

Download the lyrics corpus as a single text file from Google Cloud Storage. You use snippets from this file as the *training data* for the model. The *target* snippet is offset by one character.
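To make the one-character offset concrete, here is a minimal sketch that uses a short stand-in string instead of the downloaded file. The variable names are illustrative and are not part of the notebook.

```python
# Illustrative only: a short stand-in string, not the actual training file.
text = "baby im yours"
seq_len = 5

# The source is a window of seq_len characters ...
source = text[0:seq_len]
# ... and the target is the same window shifted right by one character,
# so target[i] is the character that follows source[i] in the text.
target = text[1:seq_len + 1]

print(repr(source))  # 'baby '
print(repr(target))  # 'aby i'
```

At every position the model is asked to predict the target character from the source characters seen up to that point.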

In [0]:
!wget --show-progress --continue -O /content/lyrics.txt https://storage.googleapis.com/mr-lyrics-autocomplete-data/lyrics.txt

--2019-01-23 18:14:07--  https://storage.googleapis.com/mr-lyrics-autocomplete-data/lyrics.txt
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.202.128, 2607:f8b0:4001:c01::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.202.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7894384 (7.5M) [text/plain]
Saving to: ‘/content/lyrics.txt’


2019-01-23 18:14:08 (67.5 MB/s) - ‘/content/lyrics.txt’ saved [7894384/7894384]



### Build the data generator

In [0]:
import numpy as np
import six
import tensorflow as tf
import time
import os

# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

SHAKESPEARE_TXT = '/content/lyrics.txt'

tf.logging.set_verbosity(tf.logging.INFO)

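def transform(txt, pad_to=None):
  """Encodes text as an int32 array of character codes, optionally left-padded."""
  # Keep only characters whose code point is below 255, so that every value
  # is a valid index into the 256-entry embedding table.
  output = np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)
  if pad_to is not None:
    output = output[:pad_to]
    # Left-pad with zeros. The pad length is computed from the filtered,
    # truncated array, not from the raw text, so it can never be negative.
    output = np.concatenate([
        np.zeros([pad_to - len(output)], dtype=np.int32),
        output,
    ])
  return output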

def training_generator(seq_len=100, batch_size=1024):
  """A generator that yields (source, target) arrays for training."""
  with tf.gfile.GFile(SHAKESPEARE_TXT, 'r') as f:
    txt = f.read()

  tf.logging.info('Input text [%d] %s', len(txt), txt[:50])
  source = transform(txt)
  while True:
    offsets = np.random.randint(0, len(source) - seq_len, batch_size)

    # Our model uses sparse crossentropy loss, but Keras requires labels
    # to have the same rank as the input logits.  We add an empty final
    # dimension to account for this.
    yield (
        np.stack([source[idx:idx + seq_len] for idx in offsets]),
        np.expand_dims(
            np.stack([source[idx + 1:idx + seq_len + 1] for idx in offsets]),
            -1),
    )

six.next(training_generator(seq_len=10, batch_size=1))

INFO:tensorflow:Input text [7893815] 

 baby im yours and ill be yours until the stars 


(array([[101,  32, 121,  97, 108, 108,  32, 104, 101,  97]], dtype=int32),
 array([[[ 32],
         [121],
         [ 97],
         [108],
         [108],
         [ 32],
         [104],
         [101],
         [ 97],
         [114]]], dtype=int32))
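The arrays above are easier to read once decoded back into text. The following sketch assumes a hypothetical `decode` helper (not defined in the notebook) that reverses the character-code mapping used by `transform`.

```python
# Character codes copied from the sample output above.
source_codes = [101, 32, 121, 97, 108, 108, 32, 104, 101, 97]
target_codes = [32, 121, 97, 108, 108, 32, 104, 101, 97, 114]

def decode(codes):
  """Hypothetical inverse of transform(): character codes back to text."""
  return ''.join(chr(c) for c in codes)

source_text = decode(source_codes)
target_text = decode(target_codes)

print(repr(source_text))  # 'e yall hea'
print(repr(target_text))  # ' yall hear'
```

The target is the source shifted by one character: it drops the leading `e` and gains the trailing `r`.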

### Build the model

The model is defined as a two-layer, forward LSTM, with two changes from the standard `tf.keras` LSTM definition:

1. Define the input `shape` of the model to comply with the [XLA compiler](https://www.tensorflow.org/performance/xla/)'s static shape requirement.
2. Use `tf.train.Optimizer` instead of a standard Keras optimizer (Keras optimizer support is still experimental).

In [0]:
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
  """Language model: predict the next character given the characters so far."""
  source = tf.keras.Input(
      name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

  embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
  lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
  lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
  predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)
  model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
#   model_optimizer = tf.train.RMSPropOptimizer(learning_rate=0.01)
  model_optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)
  model.compile(
      optimizer=model_optimizer,
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])
  return model
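As a rough sanity check on the size of this model, the trainable parameters can be counted by hand. The sketch below assumes the default Keras LSTM configuration (four gates, each with an input kernel, a recurrent kernel, and a bias); the helper name `lstm_params` is illustrative. Compare the result against `model.summary()` rather than treating it as authoritative.

```python
EMBEDDING_DIM = 512  # embedding width and LSTM units, as in the notebook
VOCAB_SIZE = 256     # one entry per single-byte character code

def lstm_params(input_dim, units):
  # Four gates, each with an input kernel, a recurrent kernel, and a bias.
  return 4 * (units * (input_dim + units) + units)

embedding_params = VOCAB_SIZE * EMBEDDING_DIM
lstm_1_params = lstm_params(EMBEDDING_DIM, EMBEDDING_DIM)
lstm_2_params = lstm_params(EMBEDDING_DIM, EMBEDDING_DIM)
dense_params = EMBEDDING_DIM * VOCAB_SIZE + VOCAB_SIZE  # weights plus biases

total = embedding_params + lstm_1_params + lstm_2_params + dense_params
print(total)  # 4460800
```

Under these assumptions the two LSTM layers account for most of the roughly 4.5 million parameters.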

### Train the model

The `tf.contrib.tpu.keras_to_tpu_model` function converts a `tf.keras` model to an equivalent TPU version. You then use the standard Keras methods to train: `fit`, `predict`, and `evaluate`.
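The training cell builds the model with `batch_size=128` while the generator yields batches of 1024. One reading consistent with the logs in this notebook (8 TPU cores, compiled input shape `(128, 100)`) is that each core receives one eighth of every batch. The sketch below works through that arithmetic, along with the `steps_per_epoch` formula from the code comment; the constant names are illustrative.

```python
NUM_TPU_CORES = 8         # reported in the TPU system metadata log
GLOBAL_BATCH_SIZE = 1024  # batch size yielded by training_generator
SEQ_LEN = 100
DATA_LEN = 7893815        # corpus length from the 'Input text' log line

# Each core would process an equal slice of the global batch.
per_core_batch = GLOBAL_BATCH_SIZE // NUM_TPU_CORES
print(per_core_batch)  # 128

# Steps needed to draw about one corpus worth of characters.
steps_for_one_pass = DATA_LEN / (SEQ_LEN * GLOBAL_BATCH_SIZE)
print(round(steps_for_one_pass, 1))  # 77.1
```

Because the generator samples random offsets, `steps_per_epoch=100` does not correspond to an exact pass over the data; it simply draws somewhat more characters per epoch than the corpus contains.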

In [0]:
import time

start = time.time()

tf.keras.backend.clear_session()

training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))

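# Resume from an earlier checkpoint if one exists; on a first run there is
# no weights file yet, and loading it unconditionally would fail.
if os.path.exists('/content/lyrics_model.h5'):
  tpu_model.load_weights('/content/lyrics_model.h5')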

# steps_per_epoch = DATA_LEN / (seq_len * batch_size)

tpu_model.fit_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps_per_epoch=100,
    epochs=10,
)
tpu_model.save_weights('/content/lyrics_model.h5', overwrite=True)

elapsed = time.time() - start
print(' {:.3f} minutes'.format(elapsed / 60))

INFO:tensorflow:Querying Tensorflow master (b'grpc://10.8.222.162:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 7746463823360997788)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 5895810441155439380)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 8970486844155138041)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 1196240473684309929)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 7420280826846883607)
INFO:tensorflow:*** Available Device: _DeviceAt

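### Evaluate the model

This cell draws batches from the same `training_generator`, so the reported metrics are measured on training data rather than on a held-out test set.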

In [0]:
loss, sparse_categorical_accuracy = tpu_model.evaluate_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps=4
)

print('Testing set metrics:')
print("\tLoss: {:5.2f} value".format(loss))
print("\tAccuracy: {:.2%} value".format(sparse_categorical_accuracy))

INFO:tensorflow:Input text [7893815] 

 baby im yours and ill be yours until the stars 
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_10'), TensorSpec(shape=(128, 100), dtype=tf.int32, name='seed_10'), TensorSpec(shape=(128, 100, 1), dtype=tf.float32, name='time_distributed_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for seed
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 6.172021150588989 secs
Testing set metrics:
	Loss:  0.74 value
	Accuracy: 77.84% value
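One way to interpret the loss figure is to convert it to per-character perplexity. This sketch assumes the reported loss is the mean cross-entropy per character measured in nats (natural logarithm), which is the usual convention for Keras cross-entropy losses.

```python
import math

loss = 0.74  # mean cross-entropy per character, from the output above

perplexity = math.exp(loss)         # effective number of equally likely choices
bits_per_char = loss / math.log(2)  # the same quantity expressed in bits

print('{:.2f}'.format(perplexity))     # 2.10
print('{:.2f}'.format(bits_per_char))  # 1.07
```

Read this way, the model is on average about as uncertain as if it were choosing between two equally likely characters at each step.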


### Make predictions with the model

Use the trained model to make predictions and generate your own lyrics.
Start the model off with a *seed* string, then generate `PREDICT_LEN` (450) characters from it. The model makes five independent predictions from the same seed.
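The five predictions start from the same seed yet diverge, most likely because each next character is sampled from the predicted distribution instead of being chosen greedily. The notebook does this with `np.random.choice(256, p=...)`; the sketch below uses the standard library's `random.choices` to stay dependency-free, and the distribution in it is made up for illustration.

```python
import random

# Hypothetical output distribution over character codes:
# all probability mass on 'a' (0.7) and 'b' (0.3).
probs = [0.0] * 256
probs[ord('a')] = 0.7
probs[ord('b')] = 0.3

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_next(probs, rng):
  """Draws one character code in proportion to its predicted probability."""
  return rng.choices(range(256), weights=probs, k=1)[0]

samples = [sample_next(probs, rng) for _ in range(1000)]

# Greedy decoding would always pick the single most likely character.
greedy = max(range(256), key=lambda i: probs[i])

print(chr(greedy))
print(sorted(set(chr(s) for s in samples)))
```

Sampling only ever returns characters with non-zero probability, but it can return the less likely one, which is what lets repeated runs produce different text.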

In [0]:
BATCH_SIZE = 5
PREDICT_LEN = 450

# Keras requires the batch size be specified ahead of time for stateful models.
# We use a sequence length of 1, as we will be feeding in one character at a 
# time and predicting the next character.
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
prediction_model.load_weights('/content/lyrics_model.h5')

# We seed the model with our initial string, copied BATCH_SIZE times

# seed_txt = 'Looks it not like the king?  Verily, we must go! '
# seed_txt = 'I am blind; the truth is screaming at me'
seed_txt = 'Go.'
# seed_txt = "Cause everybody knows"
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
  prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
  last_char = predictions[-1]
  next_probits = prediction_model.predict(last_char)[:, 0, :]
  
  # sample from our output distribution
  next_idx = [
      np.random.choice(256, p=next_probits[i])
      for i in range(BATCH_SIZE)
  ]
  predictions.append(np.asarray(next_idx, dtype=np.int32))
  

for i in range(BATCH_SIZE):
  print('PREDICTION %d\n\n' % i)
  p = [predictions[j][i] for j in range(PREDICT_LEN)]
  generated = ''.join([chr(c) for c in p])
  print(generated)
  print()
  assert len(generated) == PREDICT_LEN, 'Generated text too short'

PREDICTION 0


.
Hit the time
Do you know I bet they suckary torn between 
Hear the clothes, she goes out to school knew kany

And I tried to death rather
Free (One more)
(Christmas)
For you 


Ludal
Letters (Hey)
Don't get jack like a vegebr
Why can't you be the one with our hearts believed
Touchane talkin' need a Thing
Jottle

I want you
I'll taste, you see, you lead your aim
I've draw this
Seek and found in you, I see my reflect body 

 i surre among 

PREDICTION 1


.


And do me down, too, boot, ring my room to get up, happiest that a willie
Elis and patché

Says "ouns little in Junent

Crazy, you oughta know that body sweet, and you should be
and I need you
I'd like to go gotta get rid
But I take time, sing
I fucked they 'cause i aim buh and I got for what Knoconsits but I feel the rain
You think you mean the Summertimatized, sit up, I called, shut your hair down

I still remember now, misled why do I

PREDICTION 2


.
Gotta make it rain on the floor
Tough the light