## TensorFlow Distribute

`tf.distribute` is a library for running a computation across multiple devices. `tf.distribute.Strategy` is a TensorFlow API for distributing training across multiple GPUs, multiple machines, or TPUs. Using this API, developers can distribute existing models and training code with minimal code changes. The goals of this API are to:

* Serve multiple personas, such as developers and researchers.
* Provide good performance out of the box.
* Make it easy to switch between strategies.
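
The point about switching strategies with few code changes can be illustrated with a small sketch. The helper `scaled_sum` is a hypothetical name, not part of TensorFlow. It relies only on `strategy.scope()`, `strategy.run`, and `strategy.reduce`, so the same body should run under whichever strategy object is passed in.

```python
import tensorflow as tf


def scaled_sum(strategy):
    """Hypothetical helper: runs the same computation under any strategy."""
    with strategy.scope():
        # Variables created inside the scope are placed according to the strategy.
        weight = tf.Variable(2.0)

    @tf.function
    def step():
        # Each replica computes weight * 3.
        return weight * 3.0

    per_replica = strategy.run(step)
    # Average the per-replica results into a single tensor.
    return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica, axis=None)


# Only the strategy object changes between the two calls.
default_result = scaled_sum(tf.distribute.get_strategy())
one_device_result = scaled_sum(tf.distribute.OneDeviceStrategy("/cpu:0"))
print(float(default_result), float(one_device_result))
```

Passing the strategy in as an argument keeps the model and step code independent of the hardware layout, which is the property the list above describes.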

In [1]:
import tensorflow as tf

### MirroredStrategy

`tf.distribute.MirroredStrategy` performs synchronous training across multiple replicas on one machine. It extends `tf.distribute.Strategy`.

This strategy is typically used for training on one machine with multiple GPUs. For TPUs, use `tf.distribute.TPUStrategy`. To use `MirroredStrategy` with multiple workers, refer to `tf.distribute.experimental.MultiWorkerMirroredStrategy`.

Variables created under a `MirroredStrategy` are mirrored on every replica; such a variable is a `MirroredVariable`. If no devices are specified in the constructor argument of the strategy, it uses all available GPUs. If no GPUs are found, it uses the available CPUs. Note that TensorFlow treats all CPUs on a machine as a single device and uses threads internally for parallelism.
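
As a minimal sketch of the behaviour described above, the snippet below creates a variable inside the strategy scope and reads the replica count. The per-replica batch size of 64 is an assumed value for illustration only, and the commented-out `devices` list assumes a machine that actually has two GPUs.

```python
import tensorflow as tf

# With no `devices` argument, all visible GPUs are used (or the CPU if none).
strategy = tf.distribute.MirroredStrategy()
# To pin specific devices instead (assumes two GPUs are present):
#   strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Created under the scope, so one copy is kept per replica.
    v = tf.Variable(1.0)

print(type(v).__name__)

# A common pattern: scale the global batch size with the replica count.
per_replica_batch_size = 64  # assumed value for illustration
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync
print("global batch size:", global_batch_size)
```

On a CPU-only machine this should report a single replica, which matches the log output in the cells that follow.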


In [2]:
mirrored_strategy = tf.distribute.MirroredStrategy()

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)


In [3]:
mirrored_strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)


In [4]:
multiworker_strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

INFO:tensorflow:Using MirroredStrategy with devices ('/device:CPU:0',)
INFO:tensorflow:Single-worker MultiWorkerMirroredStrategy with local_devices = ('/device:CPU:0',), communication = CollectiveCommunication.AUTO


#### MultiWorkerMirroredStrategy
A distribution strategy for synchronous training on multiple workers. It inherits from `tf.distribute.Strategy`.
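
The log output above reports a single-worker setup because no cluster was configured. Multi-worker runs are usually described to TensorFlow through the `TF_CONFIG` environment variable. The sketch below only builds that JSON string. The helper `make_tf_config` and the host names are hypothetical, and the strategy construction is left commented out because it would wait for the other workers to come up.

```python
import json


def make_tf_config(worker_hosts, task_index):
    """Build the JSON string expected in the TF_CONFIG environment variable."""
    return json.dumps({
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    })


# Hypothetical two-worker cluster; replace with real host:port pairs.
hosts = ["worker0.example.com:12345", "worker1.example.com:12345"]
tf_config = make_tf_config(hosts, task_index=0)
print(tf_config)

# On each worker, before creating the strategy:
#   os.environ["TF_CONFIG"] = tf_config
#   strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
```

Each worker would use the same `cluster` entry but its own `task` index.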


##### Collective Communication
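`tf.distribute.experimental.CollectiveCommunication` is an enum that selects the communication implementation used for collective operations:

* `AUTO`: the runtime chooses the implementation.
* `NCCL`: NVIDIA's NCCL library for all-reduce.
* `RING`: TensorFlow's ring algorithms for all-reduce and all-gather.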


In [5]:
multiworker_strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    tf.distribute.experimental.CollectiveCommunication.NCCL)

INFO:tensorflow:Using MirroredStrategy with devices ('/device:CPU:0',)
INFO:tensorflow:Single-worker MultiWorkerMirroredStrategy with local_devices = ('/device:CPU:0',), communication = CollectiveCommunication.NCCL


#### Central Storage Strategy

The cell below creates a `CentralStorageStrategy` instance that uses all visible GPUs and the CPU. Updates to variables from all replicas are aggregated before being applied to the variables.

In [6]:
central_storage_strategy = tf.distribute.experimental.CentralStorageStrategy()

INFO:tensorflow:ParameterServerStrategy (CentralStorageStrategy if you are using a single machine) with compute_devices = ['/job:localhost/replica:0/task:0/device:CPU:0'], variable_device = '/job:localhost/replica:0/task:0/device:CPU:0'


In [9]:
strategy = tf.distribute.experimental.CentralStorageStrategy()
# Create a dataset
ds = tf.data.Dataset.range(5).batch(2)
# Distribute that dataset
dist_dataset = strategy.experimental_distribute_dataset(ds)

with strategy.scope():
  @tf.function
  def train_step(val):
    return val + 1

  # Iterate over the distributed dataset
  for x in dist_dataset:
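    # Print the batch assigned to this step
    print(x)
    # Note: this prints the tf.function object itself, not its result
    print(train_step)
    # Run train_step on every replica; the returned value is discarded here
    strategy.run(train_step, args=(x,))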

INFO:tensorflow:ParameterServerStrategy (CentralStorageStrategy if you are using a single machine) with compute_devices = ['/job:localhost/replica:0/task:0/device:CPU:0'], variable_device = '/job:localhost/replica:0/task:0/device:CPU:0'
tf.Tensor([0 1], shape=(2,), dtype=int64)
<tensorflow.python.eager.def_function.Function object at 0x14dc43ac8>
tf.Tensor([2 3], shape=(2,), dtype=int64)
<tensorflow.python.eager.def_function.Function object at 0x14dc43ac8>
tf.Tensor([4], shape=(1,), dtype=int64)
<tensorflow.python.eager.def_function.Function object at 0x14dc43ac8>
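
The cell above discards the value returned by `strategy.run`, so the output never shows what `train_step` computed. A possible variation, sketched below, collects the per-replica results with `strategy.experimental_local_results`. The list name `collected` is introduced here for illustration only.

```python
import tensorflow as tf

strategy = tf.distribute.experimental.CentralStorageStrategy()
ds = tf.data.Dataset.range(5).batch(2)
dist_dataset = strategy.experimental_distribute_dataset(ds)


@tf.function
def train_step(val):
    return val + 1


collected = []
for x in dist_dataset:
    per_replica = strategy.run(train_step, args=(x,))
    # One tensor per local replica; a one-element tuple on a single device.
    for t in strategy.experimental_local_results(per_replica):
        collected.extend(t.numpy().tolist())

print(collected)
```

With several replicas each batch is split across devices, so the number of tensors per step depends on the hardware.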
