# TPUEstimator

[Korean description](https://cloud.google.com/tpu/docs/using-estimator-api)

[Colab](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/classification_iris_data_with_tpuestimator.ipynb#scrollTo=phzyD8iCAzcp)

TPUEstimator converts the global batch size into a per-shard batch size when it calls `input_fn` and `model_fn`. Specify the global batch size in the constructor (`train_batch_size`, `eval_batch_size`, `predict_batch_size`), then read the per-shard batch size inside `input_fn` and `model_fn` from `params['batch_size']`.

* For training, `model_fn` gets the per-core batch size; `input_fn` gets either the per-core or the per-host batch size, depending on `per_host_input_for_training` in `TPUConfig` (see the `TPUConfig` docstring for details).

* For evaluation and prediction, `model_fn` gets the per-core batch size and `input_fn` gets the per-host batch size.
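The per-core and per-host values are plain division of the global batch size. The sketch below illustrates that arithmetic, assuming TPU execution and an even split; `shard_batch_sizes` is a hypothetical helper, not part of the TPUEstimator API, and its divisibility check only mirrors the requirement that the global batch size split evenly across cores.

```python
def shard_batch_sizes(global_batch_size, num_cores, num_hosts):
    """Hypothetical helper: split a global batch size evenly across cores and hosts."""
    if global_batch_size % num_cores != 0:
        raise ValueError('global batch size must be divisible by the number of cores')
    per_core = global_batch_size // num_cores   # what model_fn would see
    per_host = global_batch_size // num_hosts   # what a per-host input_fn would see
    return per_core, per_host

# A single host with 8 TPU cores, global batch size 128.
print(shard_batch_sizes(128, num_cores=8, num_hosts=1))   # (16, 128)
```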

## Evaluation

`model_fn` should return a `TPUEstimatorSpec`, which expects `eval_metrics` for TPU evaluation. If `eval_on_tpu` is False, evaluation runs on the CPU (or GPU) instead, and the following discussion of TPU evaluation does not apply.

`TPUEstimatorSpec.eval_metrics` is a tuple of `metric_fn` and `tensors`, where `tensors` can be a list of any nested structure of `Tensor`s or a dict from argument name to `Tensor` (see `TPUEstimatorSpec` for details). `metric_fn` takes the `tensors` as its arguments and returns a dict from metric name (a string) to the result of calling a metric function, namely a `(metric_tensor, update_op)` tuple.
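To make the `(metric_fn, tensors)` convention concrete, here is a pure-NumPy sketch of how such a tuple can be unpacked: a list is passed positionally, a dict as keyword arguments. `call_metric_fn` is hypothetical and only simulates the calling convention; the stand-in `metric_fn` returns a plain number where a real one would return a `(metric_tensor, update_op)` tuple.

```python
import numpy as np

def call_metric_fn(eval_metrics):
    """Hypothetical: apply metric_fn to tensors as the eval_metrics tuple describes."""
    metric_fn, tensors = eval_metrics
    if isinstance(tensors, dict):
        return metric_fn(**tensors)   # dict keys name metric_fn's arguments
    return metric_fn(*tensors)        # list entries are passed positionally

def metric_fn(labels, logits):
    # Stand-in for tf.metrics.accuracy; a real metric_fn returns (value, update_op).
    accuracy = float(np.mean(np.argmax(logits, axis=1) == labels))
    return {'accuracy': accuracy}

labels = np.array([0, 1])
logits = np.array([[3., 1., 0.], [0., 1., 2.]])   # predicted classes: [0, 2]
print(call_metric_fn((metric_fn, [labels, logits])))                   # {'accuracy': 0.5}
print(call_metric_fn((metric_fn, {'logits': logits, 'labels': labels})))  # {'accuracy': 0.5}
```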



## Example (Iris)

In [4]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import json
import os
import pandas as pd
import pprint
import tensorflow as tf
import time

In [5]:
tf.__version__

'1.15.0'

In [7]:
tf.enable_eager_execution()
tf.executing_eagerly()

True

In [8]:
with tf.Session() as session:
    print('List of devices:')
    pprint.pprint(session.list_devices())

List of devices:
[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 17389461837660832726),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 8343187236624609562),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 282632860156132489),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 5809045504, 4996483312309324838)]


In [9]:
# Model specific parameters
use_tpu = False
# TPU address
# tpu_address = TF_MASTER

# Estimator's model_dir
# model_dir = MODEL_DIR
# model_dir = 'gs://swyoo_bucket/tpuestimator-dnn/2020-01-20-11-10-12'
model_dir = './models/'

# This is the global batch size, not the per-shard batch size.
batch_size = 128

# Total number of training steps.
train_steps = 1000

# Number of evaluation steps (batches) to run in `evaluate`.
eval_steps = 4

# Number of iterations per TPU training loop
iterations = 500
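`iterations_per_loop` is the number of training steps run on the TPU before control returns to the host for each `Session.run`, so the values above imply a small amount of arithmetic. This sketch assumes `use_tpu` is True and that the last loop may be shorter so training stops at exactly `train_steps`; `num_host_loops` is a hypothetical helper.

```python
import math

def num_host_loops(total_steps, iterations_per_loop):
    """Hypothetical: host round trips needed if each loop runs up to iterations_per_loop steps."""
    return math.ceil(total_steps / iterations_per_loop)

# Values from the cell above: train_steps=1000, iterations=500, batch_size=128, eval_steps=4.
print(num_host_loops(1000, 500))   # 2
print(1000 * 128)                  # 128000 training examples processed
print(4 * 128)                     # 512 examples processed per evaluate call
```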

## Get input data and define input functions

In [10]:
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Example inputs for prediction; the expected species are in PREDICTION_OUTPUT_DATA.
PREDICTION_INPUT_DATA = {
    'SepalLength': [6.9, 5.1, 5.9],
    'SepalWidth': [3.1, 3.3, 3.0],
    'PetalLength': [5.4, 1.7, 4.2],
    'PetalWidth': [2.1, 0.5, 1.5],
}

PREDICTION_OUTPUT_DATA = ['Virginica', 'Setosa', 'Versicolor']

In [11]:
import numpy as np

def maybe_download():
    """Downloads the iris CSV files (cached by Keras) and returns their local paths."""
    train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
    test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)

    return train_path, test_path

# Column dtypes. `pd.np` is deprecated (removed in pandas 2.0), so use NumPy directly.
CSV_DTYPES = {'SepalLength': np.float32, 'SepalWidth': np.float32,
              'PetalLength': np.float32, 'PetalWidth': np.float32,
              'Species': np.int32}

def load_data(y_name='Species'):
    """Returns the iris dataset as (train_x, train_y), (test_x, test_y)."""
    train_path, test_path = maybe_download()

    # header=0 skips the file's own header row; CSV_COLUMN_NAMES is used instead.
    train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0, dtype=CSV_DTYPES)
    train_x, train_y = train, train.pop(y_name)

    test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0, dtype=CSV_DTYPES)
    test_x, test_y = test, test.pop(y_name)

    return (train_x, train_y), (test_x, test_y)

In [12]:
# example
(train_x, train_y), (test_x, test_y) = load_data(y_name='Species')
train_x.shape, train_y.shape, test_x.shape, test_y.shape

((120, 4), (120,), (30, 4), (30,))

In [13]:
train_x[:2]

| | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 |


In [14]:
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    print(key)
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
my_feature_columns

SepalLength
SepalWidth
PetalLength
PetalWidth


[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

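* The `shuffle` function shuffles data through a fixed-size buffer; for the data to be shuffled completely at random, the buffer size must be at least as large as the number of input examples.
* The `repeat` function starts over from the beginning of the dataset once reading reaches the end. The `batch` function sets how many examples are read at a time (the batch size).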
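The buffer behaviour described above can be sketched in pure Python. `buffered_shuffle` is a hypothetical helper that approximates a fixed-size shuffle buffer (fill the buffer, then repeatedly emit a random buffered element and replace it with the next input); it is an illustration, not tf.data's actual implementation.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Hypothetical approximation of shuffling through a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]                # emit a random buffered element...
        buffer[idx] = item               # ...and put the new element in its slot
    rng.shuffle(buffer)                  # input exhausted: drain the rest randomly
    yield from buffer

# buffer_size=1 cannot reorder anything.
print(list(buffered_shuffle(range(8), buffer_size=1)))   # [0, 1, 2, 3, 4, 5, 6, 7]
# With a buffer of 3, the first element out always comes from the first 3 inputs.
print(next(buffered_shuffle(range(120), buffer_size=3, seed=0)) in (0, 1, 2))   # True
```

With `buffer_size=1000` and only 120 training rows, every row fits in the buffer, so the whole epoch is shuffled uniformly.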

In [15]:
def train_input_fn(features, labels, batch_size):
    """An input function for training"""

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(buffer_size=1000).repeat()

    # drop_remainder=True keeps every batch at exactly `batch_size` examples (static shape).
    dataset = dataset.batch(batch_size, drop_remainder=True)

    # Return the dataset.
    return dataset

In [16]:
input_fn_obj = train_input_fn(features=train_x, labels=train_y, batch_size=batch_size)
input_fn_obj

<DatasetV1Adapter shapes: ({SepalLength: (128,), SepalWidth: (128,), PetalLength: (128,), PetalWidth: (128,)}, (128,)), types: ({SepalLength: tf.float32, SepalWidth: tf.float32, PetalLength: tf.float32, PetalWidth: tf.float32}, tf.int32)>

In [19]:
feat_dict, label = next(iter(input_fn_obj))

In [31]:
for k, v in feat_dict.items():
    print(k, v.shape, end=' ')
print('label(Species)', label.shape)
print(label)

SepalLength (128,) SepalWidth (128,) PetalLength (128,) PetalWidth (128,) label(Species) (128,)
tf.Tensor(
[1 0 2 0 1 1 1 0 2 2 0 0 0 1 0 2 0 1 2 0 0 2 1 2 0 2 2 2 1 0 0 0 1 2 0 0 2
 0 0 1 0 2 2 0 1 0 0 1 2 0 1 0 2 2 1 0 0 1 1 1 1 2 0 2 2 2 2 1 2 0 2 2 1 1
 1 2 1 2 0 0 1 2 1 0 0 2 2 2 2 2 1 1 2 0 1 0 2 1 2 2 2 0 2 1 0 0 0 1 0 0 0
 2 1 1 2 1 1 2 1 0 0 2 0 1 0 1 1 2], shape=(128,), dtype=int32)


In [32]:
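def eval_input_fn(features, labels, batch_size):
    """An input function for evaluation"""
    features = dict(features)
    inputs = (features, labels)
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    # The test set has only 30 examples, fewer than one batch of 128, so repeat()
    # is needed to produce `steps` full batches; evaluation therefore reuses examples.
    dataset = dataset.shuffle(1000).repeat()
    # drop_remainder=True keeps every batch at exactly `batch_size` examples (static shape).
    dataset = dataset.batch(batch_size, drop_remainder=True)
    # Return the dataset.
    return dataset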

def predict_input_fn(features, batch_size):
    """An input function for prediction""" 
    # there are no labels
    dataset = tf.data.Dataset.from_tensor_slices(features)
    dataset = dataset.batch(batch_size)
    return dataset
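Because TPU computations need fixed batch shapes, the training and evaluation input functions drop any short final batch, and that interacts with the tiny test set. The pure-Python sketch below uses hypothetical `batch` and `repeat` helpers that mimic `Dataset.batch` and `Dataset.repeat()` (not their implementations) to show why `repeat()` is required. It is also consistent with the evaluation log further down: the reported accuracy of 0.9667969 equals 495/512, i.e. it is computed over 4 × 128 = 512 (repeated) examples rather than the 30 distinct test rows.

```python
import itertools

def batch(iterable, batch_size, drop_remainder=False):
    """Hypothetical analogue of Dataset.batch: yield lists of batch_size items."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk or (drop_remainder and len(chunk) < batch_size):
            return  # input exhausted, or a short final batch is dropped
        yield chunk

def repeat(iterable):
    """Hypothetical analogue of Dataset.repeat() with no count: cycle forever."""
    return itertools.cycle(iterable)

test_examples = list(range(30))   # stands in for the 30 test rows

# Without repeat(), a batch size of 128 yields no full batch at all.
print(len(list(batch(test_examples, 128, drop_remainder=True))))   # 0

# With repeat(), eval_steps=4 full batches can be taken.
eval_batches = list(itertools.islice(
    batch(repeat(test_examples), 128, drop_remainder=True), 4))
print([len(b) for b in eval_batches])   # [128, 128, 128, 128]
```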

## Model and metric function

In [35]:
params = {'feature_columns': my_feature_columns, 'hidden_units': [10, 10], 'n_classes': 3}

In [47]:
net = tf.feature_column.input_layer(feat_dict, params['feature_columns'])
print(net.shape)
net = tf.layers.dense(net, units=10, activation=tf.nn.relu)
print(net.shape)

(128, 4)
(128, 10)


In [48]:
def metric_fn(labels, logits):
    """Function to return metrics for evaluation"""

    predicted_classes = tf.argmax(logits, 1)
    accuracy = tf.metrics.accuracy(labels=labels,
                                   predictions=predicted_classes,
                                   name='acc_op')
    return {'accuracy': accuracy}
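`tf.metrics.accuracy` returns a `(value, update_op)` pair backed by running `total` and `count` variables, so the reported accuracy accumulates over every evaluated batch rather than describing only the last one. `StreamingAccuracy` below is a hypothetical NumPy analogue of that behaviour, not TensorFlow code.

```python
import numpy as np

class StreamingAccuracy:
    """Hypothetical NumPy analogue of the (value, update_op) pair from tf.metrics.accuracy."""

    def __init__(self):
        self.total = 0.0  # correct predictions seen so far
        self.count = 0.0  # examples seen so far

    def update(self, labels, logits):
        """Like running update_op: fold one batch into the running totals."""
        predicted_classes = np.argmax(logits, axis=1)
        self.total += float(np.sum(predicted_classes == np.asarray(labels)))
        self.count += len(labels)
        return self.result()

    def result(self):
        """Like evaluating metric_tensor: accuracy over every batch seen so far."""
        return self.total / self.count if self.count else 0.0

acc = StreamingAccuracy()
acc.update([0, 1], np.array([[2., 1., 0.], [0., 3., 1.]]))   # 2 of 2 correct
acc.update([2, 2], np.array([[1., 0., 0.], [0., 0., 5.]]))   # 1 of 2 correct
print(acc.result())   # 0.75
```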

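Next, define `model_fn`, which builds the graph for one mini-batch. It returns a `TPUEstimatorSpec`: the ops and objects that `model_fn` passes back to `TPUEstimator` for the current mode (`TRAIN`, `EVAL`, or `PREDICT`).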

In [33]:
def my_model(features, labels, mode, params):
    """DNN classifier with one fully connected ReLU layer per entry in params['hidden_units']."""

    # Create one fully connected ReLU layer per entry in params['hidden_units']
    # (two layers of 10 units in this notebook); no dropout is applied.
    net = tf.feature_column.input_layer(features, params['feature_columns'])
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)

    # Compute logits (1 per class).
    logits = tf.layers.dense(net, params['n_classes'], activation=None)

    # Compute predictions.
    predicted_classes = tf.argmax(logits, 1)
    
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class_ids': predicted_classes[:, tf.newaxis],
            'probabilities': tf.nn.softmax(logits),
            'logits': logits,
        }
        return tf.contrib.tpu.TPUEstimatorSpec(mode, predictions=predictions)

    # Compute loss.
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels,
                                                  logits=logits)

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.contrib.tpu.TPUEstimatorSpec(
            mode=mode, loss=loss, eval_metrics=(metric_fn, [labels, logits]))

    # Create training op.
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
        if use_tpu:
            optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.contrib.tpu.TPUEstimatorSpec(mode, loss=loss, train_op=train_op)
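For intuition about what `my_model` computes, here is a NumPy sketch of the same forward pass (ReLU hidden layers of 10 and 10 units, then 3 linear logits) and of the loss, which with default arguments is the batch mean of `-log(softmax(logits)[label])`. The random weights and the helper names `dense`, `forward`, and `sparse_softmax_cross_entropy` are assumptions for illustration; this is not the TensorFlow implementation.

```python
import numpy as np

def dense(x, w, b, relu=True):
    """Fully connected layer: x @ w + b, optionally followed by ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

def sparse_softmax_cross_entropy(labels, logits):
    """Mean over the batch of -log(softmax(logits)[label])."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def forward(features, hidden_units=(10, 10), n_classes=3, seed=0):
    """Random-weight forward pass with the same layer sizes as my_model."""
    rng = np.random.default_rng(seed)
    net = features
    for units in hidden_units:
        w = rng.normal(scale=0.1, size=(net.shape[1], units))
        net = dense(net, w, np.zeros(units))
    w_out = rng.normal(scale=0.1, size=(net.shape[1], n_classes))
    return dense(net, w_out, np.zeros(n_classes), relu=False)   # logits, no activation

batch = np.array([[6.4, 2.8, 5.6, 2.2], [5.0, 2.3, 3.3, 1.0]], dtype=np.float32)
logits = forward(batch)
print(logits.shape)                   # (2, 3)
print(np.argmax(logits, axis=1))      # predicted_classes for the two rows
```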

## Main

In [49]:
tf.logging.set_verbosity(tf.logging.INFO)
# tf.logging.set_verbosity(tf.logging.WARN)

* `log_device_placement`

> To find out which devices your operations and tensors are assigned to, create the session with `log_device_placement` configuration option set to True.

This is helpful for debugging: for each node of your graph, you will see the device it was assigned to.

* `allow_soft_placement`

> If you would like TensorFlow to automatically choose an existing and supported device to run the operations in case the specified one doesn't exist, you can set `allow_soft_placement` to True in the configuration option when creating the session.

This helps if you accidentally specify the wrong device, or a device that does not support a particular op. It is also useful when writing code that may run in environments you do not know in advance: you can still provide sensible device defaults, with a graceful fallback if they fail.

In [51]:
# Resolve TPU cluster and runconfig for this.
#     tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
#             tpu_address)
tpu_cluster_resolver = None

run_config = tf.contrib.tpu.RunConfig(
        model_dir=model_dir,
        cluster=tpu_cluster_resolver,
        session_config=tf.ConfigProto(allow_soft_placement=True, 
                                      log_device_placement=False),
        tpu_config=tf.contrib.tpu.TPUConfig(iterations),
        )

# Build a DNN with two hidden layers of 10 units each.
classifier = tf.contrib.tpu.TPUEstimator(
    model_fn=my_model,
    use_tpu=use_tpu,
    train_batch_size=batch_size,
    eval_batch_size=batch_size,
    predict_batch_size=batch_size,
    config=run_config,
    params={
        'feature_columns': my_feature_columns,
        # Two hidden layers of 10 nodes each.
        'hidden_units': [10, 10],
        # The model must choose between 3 classes.
        'n_classes': 3,
        'use_tpu': use_tpu,
    })

INFO:tensorflow:Using config: {'_model_dir': './models/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb6c799ad10>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=500, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, 

In [52]:
# Train the model.
classifier.train(
        input_fn = lambda params: train_input_fn(train_x, train_y, params["batch_size"]),
        max_steps=train_steps)

INFO:tensorflow:Skipping training since max_steps has already saved.
INFO:tensorflow:training_loop marked as finished


<tensorflow_estimator.python.estimator.tpu.tpu_estimator.TPUEstimator at 0x7fb6c3b8e0d0>

In [53]:
# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn = lambda params: eval_input_fn(
        test_x, test_y, params["batch_size"]),
    steps=eval_steps)

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

# Generate predictions from the model
predictions = classifier.predict(
    input_fn = lambda params: predict_input_fn(
        PREDICTION_INPUT_DATA, params["batch_size"]))

for pred_dict, expec in zip(predictions, PREDICTION_OUTPUT_DATA):
    template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(SPECIES[class_id],
                          100 * probability, expec))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running eval on CPU
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-01-20T23:09:07Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./models/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/4]
INFO:tensorflow:Evaluation [2/4]
INFO:tensorflow:Evaluation [3/4]
INFO:tensorflow:Evaluation [4/4]
INFO:tensorflow:Finished evaluation at 2020-01-20-23:09:07
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.9667969, global_step = 1000, loss = 0.057405293
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: ./models/model.ckpt-1000
INFO:tensorflow:evaluation_loop marked as finished

Test set accuracy: 