# using tpus

following along with [this](https://www.tensorflow.org/programmers_guide/using_tpu)

```python
import tensorflow as tf

import utils
```

the guide refers specifically to cloud `tpu`s, as if non-cloud tpus (the real deal) don't exist. maybe that's true, not sure.

## `TPUEstimator`

all standard estimator objects are implemented on `cpu` and `gpu` *only*. if you want to use `tpu`s, you have to convert those estimators to an entirely different object: `tf.contrib.tpu.TPUEstimator`

why is that? seems like it has to be on the roadmap for them to make this entirely equivalent to the cpu/gpu paradigm, right? confusing.

```python
help(tf.contrib.tpu.TPUEstimator)  # same as `tf.contrib.tpu.TPUEstimator?` in IPython
```

the difference between this class and the basic estimators is significant enough that they suggest an architectural solution: if you want a model to be runnable under both cpu/gpu and tpu frameworks, make a fundamental abstraction / conceptual separation:

> define the model's inference phase (from inputs to predictions) outside of the `model_fn`. Then maintain separate implementations of the Estimator setup and `model_fn`, both wrapping this inference step

for what it's worth, I looked at their example of "how to do this" and it didn't mean anything to me.
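here's my own stripped-down version of that idea. it's a plain-Python sketch, not their example: `inference`, `cpu_model_fn`, and `tpu_model_fn` are hypothetical names, and the two namedtuples are fake stand-ins for `tf.estimator.EstimatorSpec` and `tf.contrib.tpu.TPUEstimatorSpec` so it runs without TensorFlow. both wrappers use the `(features, labels, mode, params)` signature and call the same `inference` step; only the packaging differs.

```python
from collections import namedtuple

# Fake stand-ins for tf.estimator.EstimatorSpec and
# tf.contrib.tpu.TPUEstimatorSpec (hypothetical, so this runs without TF).
FakeEstimatorSpec = namedtuple("FakeEstimatorSpec", ["mode", "predictions"])
FakeTPUEstimatorSpec = namedtuple("FakeTPUEstimatorSpec", ["mode", "predictions"])


def inference(features, weights):
    """Shared inference step (inputs -> predictions): a toy linear model."""
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def cpu_model_fn(features, labels, mode, params):
    # cpu/gpu wrapper around the shared inference step
    predictions = inference(features, params["weights"])
    return FakeEstimatorSpec(mode=mode, predictions=predictions)


def tpu_model_fn(features, labels, mode, params):
    # tpu wrapper: identical inference call, tpu-flavored spec
    predictions = inference(features, params["weights"])
    return FakeTPUEstimatorSpec(mode=mode, predictions=predictions)


features = [[1.0, 2.0], [3.0, 4.0]]
params = {"weights": [0.5, -1.0]}
cpu_spec = cpu_model_fn(features, None, "predict", params)
tpu_spec = tpu_model_fn(features, None, "predict", params)
```

in a real project, the two wrappers are where the tpu-specific differences covered below (optimizer wrapping, metrics, scaffold) would live.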

### running a `TPUEstimator` locally

it would suck if you had to have a tpu to develop a tpu project; fortunately for development's sake you can "turn off" tpus for the `TPUEstimator` class by setting `use_tpu=False` and creating a config:

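```python
import tensorflow as tf

my_tpu_estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=my_model_fn,               # your own model function
    config=tf.contrib.tpu.RunConfig(),  # default tpu run configuration
    use_tpu=False)                      # run on the local cpu/gpu instead
```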

### building a `tpu.RunConfig`

speaking of that configuration, `tf.contrib.tpu.RunConfig()` is a bare-bones configuration object for tpu estimator sessions. let's check it out:

```python
config = tf.contrib.tpu.RunConfig()
# list attributes, skipping dunder methods
[_ for _ in dir(config) if _[:2] != '__']
```

```
['_cluster',
 '_cluster_spec',
 '_evaluation_master',
 '_global_id_in_cluster',
 '_init_distributed_setting_from_environment_var',
 '_init_distributed_setting_from_environment_var_with_master',
 '_is_chief',
 '_keep_checkpoint_every_n_hours',
 '_keep_checkpoint_max',
 '_log_step_count_steps',
 '_master',
 '_model_dir',
 '_num_ps_replicas',
 '_num_worker_replicas',
 '_replace',
 '_save_checkpoints_secs',
 '_save_checkpoints_steps',
 '_save_summary_steps',
 '_service',
 '_session_config',
 '_task_id',
 '_task_type',
 '_tf_api_names',
 '_tf_random_seed',
 '_tpu_config',
 '_train_distribute',
 'cluster',
 'cluster_spec',
 'evaluation_master',
 'global_id_in_cluster',
 'is_chief',
 'keep_checkpoint_every_n_hours',
 'keep_checkpoint_max',
 'log_step_count_steps',
 'master',
 'model_dir',
 'num_ps_replicas',
 'num_worker_replicas',
 'replace',
 'save_checkpoints_secs',
 'save_checkpoints_steps',
 'save_summary_steps',
 'service',
 'session_config',
 'task_id',
 'task_type',
 'tf_random_seed',
```

```python
config.tpu_config
```

```
TPUConfig(iterations_per_loop=2, num_shards=None, computation_shape=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None)
```

it is possible, and generally necessary, to update the configuration of these tpu estimator objects. the docs walk through how to create a custom `FLAGS` object, set its attributes at runtime from the command line, and pass them in as a semi-automated parameterization of the most important parts of a `tf.contrib.tpu.RunConfig`. it's simple stuff, but so, so, so over-engineered. I have to imagine they could have defined a `yaml` or `conf` file that would have taken care of all of this, which raises the question: why didn't they?
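for comparison, here's a sketch of that flags-to-config plumbing using the standard library's `argparse` instead of `tf.flags`. the flag names, defaults, and the `gs://my-bucket/model` path are hypothetical; the keyword names (`model_dir`, `iterations_per_loop`, `num_shards`) match attributes visible in the `RunConfig` and `TPUConfig` output above. it stops at building keyword dictionaries, so it runs without TensorFlow.

```python
import argparse


def parse_flags(argv=None):
    """Parse hypothetical command-line flags for a tpu run."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", default="/tmp/model")  # hypothetical default
    parser.add_argument("--use_tpu", action="store_true")
    parser.add_argument("--iterations_per_loop", type=int, default=100)
    parser.add_argument("--num_shards", type=int, default=8)
    return parser.parse_args(argv)


def config_kwargs(flags):
    """Split the parsed flags into RunConfig and TPUConfig keyword arguments."""
    tpu_kwargs = {"iterations_per_loop": flags.iterations_per_loop,
                  "num_shards": flags.num_shards}
    run_kwargs = {"model_dir": flags.model_dir}
    return run_kwargs, tpu_kwargs


flags = parse_flags(["--model_dir", "gs://my-bucket/model", "--num_shards", "8"])
run_kwargs, tpu_kwargs = config_kwargs(flags)
# With TensorFlow 1.x installed, these would be passed along roughly as:
#   tf.contrib.tpu.RunConfig(
#       tpu_config=tf.contrib.tpu.TPUConfig(**tpu_kwargs), **run_kwargs)
```

swapping `parse_flags` for a `json.load` of a config file would get you most of the way to the `yaml`/`conf` approach.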

## optimizer

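the built-in optimizers have to be wrapped in `tf.contrib.tpu.CrossShardOptimizer` to run on cloud tpus, and that wrapper doesn't work for local (cpu/gpu) training. so a common pattern is to wrap the optimizer only when running on a tpu: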

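```python
import tensorflow as tf

# `learning_rate` and `FLAGS` are assumed to be defined elsewhere in your script
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
if FLAGS.use_tpu:
    # aggregate gradients across tpu shards; not usable for local training
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
```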
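conceptually, each tpu core (shard) computes gradients on its own slice of the batch, and the wrapper combines them before the update. the plain-Python sketch below is not TensorFlow's implementation; it uses a hypothetical one-parameter least-squares model to show why averaging per-shard gradients is sound: with equal-sized shards, the mean of the per-shard mean gradients equals the full-batch gradient.

```python
import math


def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of mean((w * x - y) ** 2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def cross_shard_grad(w, xs, ys, num_shards):
    """Split the batch into equal shards and average the per-shard gradients."""
    assert len(xs) % num_shards == 0, "shards must be equal-sized"
    size = len(xs) // num_shards
    shard_grads = [
        grad_mse(w, xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
        for i in range(num_shards)
    ]
    return sum(shard_grads) / num_shards


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 1.0
full = grad_mse(w, xs, ys)                          # -15.0
sharded = cross_shard_grad(w, xs, ys, num_shards=2)  # also -15.0
assert math.isclose(full, sharded)
```

with unequal shards the simple average would over-weight the smaller shard, which is one more reason the tpu world wants fixed batch shapes.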

## model function

you're not done yet. your pretty little `model_fn` has to change, too:

### static shapes

cloud tpus use XLA (Accelerated Linear Algebra), an alpha technology fwiw, to do their linear algebra calculations.

XLA needs to know tensor shapes at compile time, so your code must use statically shaped inputs and outputs. bummer.

### summaries

remove all uses of `tf.summary` from your model. it's not supported yet.

### metrics

the two estimator spec types accept metrics differently, so your metric code has to be packaged differently in the two paradigms.

in regular cpu/gpu estimators, your `model_fn` returns an `EstimatorSpec`, and you hand it a dictionary of metric operations as `eval_metric_ops`:

```python
my_metrics = {'accuracy': tf.metrics.accuracy(labels, predictions)}

return tf.estimator.EstimatorSpec(
    ...
    eval_metric_ops=my_metrics
)
```

for tpu estimators, the spec object is a `tf.contrib.tpu.TPUEstimatorSpec`. instead of requesting a dictionary of operations `eval_metric_ops` as in the regular estimator case, it requests a *function* for creating that dictionary, and an iterable of the tensor arguments to that function. to generalize the piece above:

```python
def my_metric_fn(labels, predictions):
    return {'accuracy': tf.metrics.accuracy(labels, predictions)}

return tf.contrib.tpu.TPUEstimatorSpec(
    ...
    eval_metrics=(my_metric_fn, [labels, predictions])
)
```

this seems extremely fucking silly. why not just have the same interface? this is bonkers.
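for what it's worth, the `TPUEstimatorSpec` docs do give a reason: `metric_fn` runs on the cpu host, and `tensors` are the values transferred there from the tpu, so the function has to be handed over uncalled. the upside is that one metric function can serve both paths. a plain-Python sketch (the toy accuracy stands in for `tf.metrics.accuracy`, which actually returns a `(value, update_op)` pair; the labels and predictions are made up):

```python
def my_metric_fn(labels, predictions):
    """Toy accuracy; a plain-Python stand-in for tf.metrics.accuracy."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return {"accuracy": correct / len(labels)}


labels = [1, 0, 1, 1]
predictions = [1, 0, 0, 1]

# tpu style: a (function, arguments) pair, to be called later on the host
eval_metrics = (my_metric_fn, [labels, predictions])

# cpu/gpu style: the dictionary itself; calling the pair gives the same thing
metric_fn, tensors = eval_metrics
eval_metric_ops = metric_fn(*tensors)  # {"accuracy": 0.75}
```

so a shared model can build `eval_metrics` once and, on the cpu/gpu path, just call it to get `eval_metric_ops`.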

### use `TPUEstimatorSpec`

one of the big differences between `TPUEstimatorSpec` and the cpu/gpu `EstimatorSpec` is how it expects metric operations to be defined (cf. the previous section). others include:

1. `hooks` (haven't covered those yet) are not supported
1. `scaffold` is converted in much the same way as metrics: instead of passing a `tf.train.Scaffold` object, you pass `scaffold_fn`, a function that builds and returns one

## input functions

all of the changes above are major modifications to how the code behaves when it runs on a TPU. the *input* process usually happens on the host computer, so not much has to change, right? well, there's still some shit:

### params argument

`Estimator` `input_fn`s *can* have a `params` argument; `TPUEstimator` `input_fn`s *must* have a `params` argument
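`TPUEstimator` fills in `params["batch_size"]` itself (it treats that key as reserved), so the `input_fn` should read its batch size from there rather than hard-coding it. a minimal sketch of that shape, with plain Python lists standing in for a `tf.data` pipeline and made-up data:

```python
def input_fn(params):
    """Hypothetical input_fn yielding (features, labels) batches.

    The batch size comes from params, not from a global or a default argument.
    """
    batch_size = params["batch_size"]
    features = list(range(10))          # made-up data
    labels = [x % 2 for x in features]
    # only full batches, so every batch has the same static shape
    for start in range(0, len(features) - batch_size + 1, batch_size):
        yield features[start:start + batch_size], labels[start:start + batch_size]


# roughly what TPUEstimator does: call input_fn with its own params dict
batches = list(input_fn({"batch_size": 4}))
```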

### static shapes and batch size

as mentioned above, XLA requires known shapes. if the shape inference of your input pipeline / `input_fn` fails to resolve the shape of input tensors, you can set it explicitly with the tensor's `set_shape` method.

for batch sizes it's trickier: your dataset size might not be an even multiple of your batch size. in these instances, you are advised to use `tf.contrib.data.batch_and_drop_remainder` and deal with it. if you can't deal with it, pad the final batch (and make sure the padding doesn't count toward your loss or metrics).
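the two options, sketched in plain Python on a made-up list of examples. `batch_drop_remainder` and `batch_with_padding` are hypothetical helpers, not library functions; the first mimics what `batch_and_drop_remainder` does to a `tf.data` pipeline (later TF 1.x releases fold this into `Dataset.batch(..., drop_remainder=True)`). the padding version also returns a 0/1 mask so the filler examples can be excluded from the loss and metrics.

```python
def batch_drop_remainder(examples, batch_size):
    """Keep only full batches; the leftover examples are discarded."""
    n_full = len(examples) // batch_size
    return [examples[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]


def batch_with_padding(examples, batch_size, pad_value=0):
    """Pad the last batch to full size; return batches and a 0/1 mask of real examples."""
    batches, masks = [], []
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        n_pad = batch_size - len(batch)
        batches.append(batch + [pad_value] * n_pad)
        masks.append([1] * len(batch) + [0] * n_pad)
    return batches, masks


examples = [10, 11, 12, 13, 14, 15, 16]
dropped = batch_drop_remainder(examples, 3)     # [[10, 11, 12], [13, 14, 15]]
padded, mask = batch_with_padding(examples, 3)  # padded[-1] == [16, 0, 0], mask[-1] == [1, 0, 0]
```

dropping loses example `16` entirely; padding keeps it but costs a mask you have to carry through the model.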

## datasets

you have to jump through hoops to get data to this cloud service: feeding it from your local machine is too slow to keep the TPU busy (bandwidth is the bottleneck). so upload your data in `TFRecord` format to a google cloud storage bucket and read it from there.

## what next

extra documentation

## summary

+ if you want to use google cloud tpu, you probably better make sure
+ you have to make changes to the implementation of your code
+ the interfaces are *not* interchangeable, so the changes are not trivial: each one is easy, but together they require some hacking.