## Predict AIGramp text with Cloud TPUs and Keras

### Download data

Download the text corpus from AIGramp. Snippets from this file serve as the *training data* for the model. The *target* for each snippet is the same text shifted forward by one character.
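To make the one-character offset concrete, here is a minimal sketch on a short stand-in string. The string and the `make_pair` helper are illustrative assumptions, not part of the notebook.

```python
def make_pair(text, start, seq_len):
    """Return (source, target) snippets; target is source shifted by one."""
    source = text[start:start + seq_len]
    target = text[start + 1:start + seq_len + 1]
    return source, target

source, target = make_pair("hello world", start=0, seq_len=5)
print(repr(source))  # 'hello'
print(repr(target))  # 'ello '
```

At every position, the target holds the character that follows the corresponding source character.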

In [6]:
!wget --show-progress  -O /content/merged_492books.txt http://aigramp.com/texts/merged_492books.txt

--2019-03-21 01:02:41--  http://aigramp.com/texts/merged_492books.txt
Resolving aigramp.com (aigramp.com)... 68.183.102.88
Connecting to aigramp.com (aigramp.com)|68.183.102.88|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 220270191 (210M) [text/plain]
Saving to: ‘/content/merged_492books.txt’


2019-03-21 01:02:43 (86.0 MB/s) - ‘/content/merged_492books.txt’ saved [220270191/220270191]



### Imports

In [0]:
import sys
import numpy as np
import six
import tensorflow as tf
import time
import os


### Build the data generator

In [8]:
# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

THE_TEXT = '/content/merged_492books.txt'

tf.logging.set_verbosity(tf.logging.INFO)

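def transform(txt, pad_to=None):
  # Keep only characters with a code point below 255 so that every index
  # fits the 256-entry embedding; everything else is dropped.
  output = np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)
  if pad_to is not None:
    # Truncate, then left-pad with zeros up to exactly pad_to entries.
    # The pad width uses len(output), since characters may have been dropped.
    output = output[:pad_to]
    output = np.concatenate([
        np.zeros([pad_to - len(output)], dtype=np.int32),
        output,
    ])
  return output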

def training_generator(seq_len=100, batch_size=1024):
  """A generator that yields (source, target) arrays for training."""
  # Read raw bytes and decode leniently so that malformed UTF-8 sequences
  # are skipped instead of raising an error.
  with open(THE_TEXT, 'rb') as f:
    txt = f.read().decode('utf8', errors='ignore')

  tf.logging.info('Input text [%d] %s', len(txt), txt[:50])
  source = transform(txt)
  while True:
    offsets = np.random.randint(0, len(source) - seq_len, batch_size)

    # Our model uses sparse crossentropy loss, but Keras requires labels
    # to have the same rank as the input logits.  We add an empty final
    # dimension to account for this.
    yield (
        np.stack([source[idx:idx + seq_len] for idx in offsets]),
        np.expand_dims(
            np.stack([source[idx + 1:idx + seq_len + 1] for idx in offsets]),
            -1),
    )

six.next(training_generator(seq_len=10, batch_size=1))

INFO:tensorflow:Input text [220270163] James S. A. Corey
THE VITAL ABYSS
An Expanse Nov


(array([[115,  46,  32,  66, 114, 111, 116, 104, 101, 114]], dtype=int32),
 array([[[ 46],
         [ 32],
         [ 66],
         [114],
         [111],
         [116],
         [104],
         [101],
         [114],
         [ 32]]], dtype=int32))
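As a quick sanity check, the integer arrays printed above can be decoded back into characters. The sketch below copies the values from that output; the `decode` helper is a hypothetical name introduced only for illustration.

```python
# Values copied from the generator output shown above.
source_ids = [115, 46, 32, 66, 114, 111, 116, 104, 101, 114]
target_ids = [46, 32, 66, 114, 111, 116, 104, 101, 114, 32]

def decode(ids):
    """Map integer code points back to characters."""
    return ''.join(chr(i) for i in ids)

print(repr(decode(source_ids)))  # 's. Brother'
print(repr(decode(target_ids)))  # '. Brother '
```

The target is the source shifted by one character, which is what the model is trained to predict.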

### Build the model

The model is a stack of forward LSTM layers (the output is built from the fourth, `lstm_4`), with two changes from the standard `tf.keras` LSTM definition:

1. Define the input `shape` of the model to comply with the [XLA compiler](https://www.tensorflow.org/performance/xla/)'s static shape requirement.
2. Use `tf.train.Optimizer` instead of a standard Keras optimizer (Keras optimizer support is still experimental).

In [0]:
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
  """Language model: predict the next character given the current character."""
  source = tf.keras.Input(
      name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

  embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
  lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
  lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
  lstm_3 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_2)
  lstm_4 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_3)
  lstm_5 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_4)
  lstm_6 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_5)
  lstm_7 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_6)
  lstm_8 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_7)
  
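  # Note: the output is built from lstm_4, so lstm_5 through lstm_8 are not
  # part of the resulting model.
  predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_4)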
  model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
  model.compile(
      optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])
  return model
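To get a feel for the size of this model, the sketch below estimates its trainable parameter count by hand. It assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and one bias vector) and counts only the four LSTM layers on the path to the output, since `predicted_char` is built from `lstm_4`. The helper names are illustrative.

```python
def embedding_params(vocab, dim):
    return vocab * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units

EMBEDDING_DIM = 512
VOCAB = 256
CONNECTED_LSTM_LAYERS = 4  # assumption: only layers on the path to the output

total = (embedding_params(VOCAB, EMBEDDING_DIM)
         + CONNECTED_LSTM_LAYERS * lstm_params(EMBEDDING_DIM, EMBEDDING_DIM)
         + dense_params(EMBEDDING_DIM, VOCAB))
print(total)  # 8659200
```

Under these assumptions each LSTM layer contributes about 2.1 million parameters, so the recurrent stack dominates the total.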

### Train the model

The `tf.contrib.tpu.keras_to_tpu_model` function converts a `tf.keras` model to an equivalent TPU version. You can then call the standard Keras methods (`fit`, `predict`, and `evaluate`) on the converted model.

In [10]:
tf.keras.backend.clear_session()

training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))


# Continue training from previously saved weights, if a file exists.
if os.path.exists('/tmp/bard.h5'):
  print("loading network weights from file")
  tpu_model.load_weights('/tmp/bard.h5')
else:
  print("no network weights file is found")


tpu_model.fit_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps_per_epoch=100,
    epochs=20,
)
print("saving network weights file")
tpu_model.save_weights('/tmp/bard.h5', overwrite=True)


Instructions for updating:
Colocations handled automatically by placer.

For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:Querying Tensorflow master (grpc://10.32.182.226:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 4793861663647305805)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 13953338554209488010)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 10115911813233287846)
INFO:tensorflow:*** Ava

KeyboardInterrupt: ignored
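The batch sizes in the training cell relate to the eight TPU cores reported in the log. The sketch below works through the arithmetic. It assumes the generator's batch is split evenly across cores, which would explain why the model is built with `batch_size=128` while the generator yields 1024 examples. It also assumes training runs to completion, unlike the interrupted run above.

```python
NUM_TPU_CORES = 8         # from the "Num TPU Cores" log line
GENERATOR_BATCH = 1024    # batch_size passed to training_generator
SEQ_LEN = 100
STEPS_PER_EPOCH = 100
EPOCHS = 20
CORPUS_CHARS = 220270163  # length reported in the generator's log line

per_core_batch = GENERATOR_BATCH // NUM_TPU_CORES
chars_per_step = GENERATOR_BATCH * SEQ_LEN
chars_total = chars_per_step * STEPS_PER_EPOCH * EPOCHS

print(per_core_batch)                        # 128
print(chars_total)                           # 204800000
print(round(chars_total / CORPUS_CHARS, 2))  # 0.93
```

Because snippets are drawn at random offsets and can overlap, the last figure is a rough scale comparison, not a measure of how much of the corpus the model has seen.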

In [11]:
# Uncomment and run this block after interrupting training, to save the
# weights learned so far.
#print("saving network weights file Interruption happened")
#tpu_model.save_weights('/tmp/bard.h5', overwrite=True)


saving network weights file Interruption happened
INFO:tensorflow:Copying TPU weights to the CPU


### Make predictions with the model

Use the trained model to make predictions and generate text.
Start the model off with a *seed* sentence, then generate 300 characters (`PREDICT_LEN`) from it. The seed is first fed through the model one character at a time to prime its state, and generation continues from there.
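Each generated character is drawn by sampling from the model's predicted distribution, not by always taking the most likely character. Here is a minimal standard-library sketch of that sampling step. The three-symbol distribution and the `sample_index` helper are illustrative assumptions; the notebook itself uses `np.random.choice` over 256 symbols.

```python
import random

def sample_index(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
probs = [0.1, 0.7, 0.2]  # toy distribution over three symbols
draws = [sample_index(probs, rng) for _ in range(1000)]
counts = [draws.count(i) for i in range(len(probs))]
print(counts)  # counts come out roughly proportional to probs
```

Sampling keeps the generated text varied: each of the `BATCH_SIZE` copies of the seed can continue differently even though they start from the same state.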

In [12]:
BATCH_SIZE = 10
PREDICT_LEN = 300

# Keras requires the batch size be specified ahead of time for stateful models.
# We use a sequence length of 1, as we will be feeding in one character at a 
# time and predicting the next character.
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
prediction_model.load_weights('/tmp/bard.h5')

# We seed the model with our initial string, copied BATCH_SIZE times

seed_txt = 'Tom bought an apartment and a car'
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
  prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
  last_char = predictions[-1]
  next_probs = prediction_model.predict(last_char)[:, 0, :]

  # Sample the next character index from the output distribution.
  next_idx = [
      np.random.choice(256, p=next_probs[b])
      for b in range(BATCH_SIZE)
  ]
  # Keep the (BATCH_SIZE, 1) shape that the model expects as input.
  predictions.append(np.asarray(next_idx, dtype=np.int32).reshape(-1, 1))


for i in range(BATCH_SIZE):
  print('Generated text %d\n\n:' % i)
  # The first entry is the last seed character, so each text starts with it.
  p = [predictions[j][i, 0] for j in range(PREDICT_LEN)]
  generated = ''.join([chr(c) for c in p])
  print(generated)
  print()
  assert len(generated) == PREDICT_LEN, 'Generated text too short'

Generated text 0

:
r he ar untice. He taking of byse bated hour fire. Af he ase the ride walking of lain.
fhe car oncow on any a vome on own of mowe. He asked that no, he hearted stilled to dexing in their hising betceen tele other aw weeg.
So alfo to conside so there was not or holding the facato of the shell bayne

Generated text 1

:
re. You kointed the race-and Edelise me were bockle to meaning? 'You, Irred in the "ame on the medmy been on, infuil of white how and wise tio, "then, well of it. The rest lack at that it kowerbs with the pleases. A saw for Dusie of the desmnol sing floced, his a cared him wollies it make ten me? Th

Generated text 2

:
rrowker qame eyes a remaining.
Ethe choiu. The been soon fammied, grimed amoney put ber, beshey is for a foun to see the sone away?"
"Ned afrain. I do's now believe to beard, that body to prodness you, yet cheet homed man, no ucked upervood had learn to than oxed.
Uf duesticy of well, reeded from

Generated text 3

:
re the chamber