# TPU Strategy

**TPUStrategy**

tf.distribute.TPUStrategy lets you run your TensorFlow training on Tensor Processing Units (TPUs). TPUs are Google's specialized ASICs designed to dramatically accelerate machine learning workloads. They are available on Google Colab, the TPU Research Cloud, and Cloud TPU.

In terms of distributed training architecture, TPUStrategy is the same as MirroredStrategy: it implements synchronous distributed training. TPUs provide their own implementation of efficient all-reduce and other collective operations across multiple TPU cores, which TPUStrategy uses.
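To make "synchronous training with all-reduce" concrete, here is a minimal pure-Python sketch. It is not how TPUStrategy is implemented: the scalar model, the `all_reduce_mean` helper, and the two-replica setup are all assumptions chosen for illustration. Each replica computes a gradient on its own data shard, the gradients are averaged across replicas, and every replica applies the same update, so the copies of the weight never drift apart.

```python
# Toy model: one scalar weight w with loss mean((w * x - y)^2) on a shard.
def gradient(w, xs, ys):
    """Gradient of the mean squared error with respect to w on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Hypothetical stand-in for an all-reduce: every replica gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def sync_train_step(weights, shards, lr):
    """One synchronous step: local gradients, all-reduce, identical updates."""
    local_grads = [gradient(w, xs, ys) for w, (xs, ys) in zip(weights, shards)]
    reduced = all_reduce_mean(local_grads)
    return [w - lr * g for w, g in zip(weights, reduced)]


# Two replicas start from the same weight and see different shards of y = 2x.
replica_weights = [0.0, 0.0]
shards = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
for _ in range(20):
    replica_weights = sync_train_step(replica_weights, shards, lr=0.05)

print(replica_weights)  # both entries are equal and close to 2.0
```

Because every replica receives the same reduced gradient, the weights stay identical after every step, which is the defining property of the synchronous approach.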


# Import Libraries ⚒️ ⚙️

In [2]:
# Import the required libraries, including TensorFlow and TensorFlow Datasets

import warnings
warnings.filterwarnings('ignore')


import os, time
import numpy as np
from IPython.display import HTML,display

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow_datasets as tfds
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt

tfds.disable_progress_bar()

# Load Data ⌛

`TFDS` provides a collection of ready-to-use datasets for use with TensorFlow and other machine learning frameworks.

It handles downloading and preparing the data deterministically and constructing a `tf.data.Dataset` (or `np.array`).

We are going to use the `fashion_mnist` dataset. In this example it is loaded with `tf.keras.datasets.fashion_mnist.load_data()`, which returns the data already split into two sets:
- Train data
- Test data

In [3]:
# Load the Fashion-MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()


# Explore & visualize Data  🔍 📊 👀

In [4]:
# Print the number of examples in each set.
print('Total Number of Training Images: {}'.format(len(train_images)))
print('Total Number of Test Images: {} \n'.format(len(test_images)))

Total Number of Training Images: 60000
Total Number of Test Images: 10000 



In [5]:
# Get the class names from the dataset
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
num_classes = len(class_names)

print(class_names)
print('Total Number of Classes: {}'.format(num_classes))

['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Total Number of Classes: 10


# Process Data  👀 🧐

In [6]:
# Add a channel dimension (28x28x1) and scale pixel values from [0, 255] to [0, 1]
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0
test_images = test_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Initialize Strategy 👀 🧐

Now, define `strategy` using the `TPUStrategy` class.

**Note:**
- If you are running this on Kaggle or Colab, make sure the `Accelerator` is set to `TPU` so that it can be detected.
- 8 devices are available.


- **TPUStrategy**
- **Running in TPU mode on Kaggle**
- **Providing `8 Accelerators`**

In [21]:
# Define the strategy to use and print the number of devices found
try:
    # Detect and initialize the TPU
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    # Instantiate the distribution strategy (tf.distribute.experimental.TPUStrategy is deprecated)
    strategy = tf.distribute.TPUStrategy(tpu)
    # print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
    print('Number of devices/accelerators: {}'.format(strategy.num_replicas_in_sync))

except ValueError:
    print('TPU failed to initialize.')

INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.
INFO:tensorflow:Initializing the TPU system: local
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)

Number of devices/accelerators: 8


**P.S.💡** Next, we define `BATCH_SIZE_PER_REPLICA`, the number of examples each available device processes per step, derive the global batch size from it, and then create the training and test datasets. ⌛

In [8]:
BATCH_SIZE_PER_REPLICA = 128
# Use with TPUStrategy: one per-replica batch for every TPU core
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

# Use with no strategy, when running on a single device
# BATCH_SIZE = BATCH_SIZE_PER_REPLICA * 1

print('BATCH_SIZE_PER_REPLICA', BATCH_SIZE_PER_REPLICA)
print('GLOBAL_BATCH_SIZE on the machine (BATCH_SIZE_PER_REPLICA * No. of TPU cores) =', GLOBAL_BATCH_SIZE)

BATCH_SIZE_PER_REPLICA 128
GLOBAL_BATCH_SIZE on the machine (BATCH_SIZE_PER_REPLICA * No. of TPU cores) = 1024
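
The relationship between the per-replica and global batch sizes can be sketched in plain Python. In the real notebook the distribution strategy does the splitting; the `split_global_batch` helper and its contiguous slicing are assumptions made only to illustrate the arithmetic.

```python
def split_global_batch(global_batch, num_replicas):
    """Split one global batch into equal, contiguous per-replica slices."""
    if len(global_batch) % num_replicas != 0:
        raise ValueError("global batch must divide evenly across replicas")
    per_replica = len(global_batch) // num_replicas
    return [global_batch[i * per_replica:(i + 1) * per_replica]
            for i in range(num_replicas)]


BATCH_SIZE_PER_REPLICA = 128
NUM_REPLICAS = 8  # matches the 8 TPU cores reported in the output above
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * NUM_REPLICAS

# Stand-in for one global batch: 1024 example indices instead of images.
example_indices = list(range(GLOBAL_BATCH_SIZE))
slices = split_global_batch(example_indices, NUM_REPLICAS)

print(GLOBAL_BATCH_SIZE, len(slices), len(slices[0]))
```

Each of the 8 slices holds 128 examples, so every core does the same amount of work per step.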


In [9]:
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(60000).batch(GLOBAL_BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)
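
As a quick check on what this batching produces, the sketch below computes the number of steps per epoch and the size of the final batch for the dataset sizes printed earlier. The `epoch_batching` helper is hypothetical and assumes the default `batch()` behaviour, which keeps a smaller final batch.

```python
import math


def epoch_batching(num_examples, global_batch_size):
    """Return (steps per epoch, size of the final batch) when the remainder is kept."""
    steps = math.ceil(num_examples / global_batch_size)
    last_batch = num_examples - (steps - 1) * global_batch_size
    return steps, last_batch


train_steps, train_last = epoch_batching(60000, 1024)
test_steps, test_last = epoch_batching(10000, 1024)

print(train_steps, train_last)  # 59 608
print(test_steps, test_last)    # 10 784
```

The last training batch holds 608 examples rather than 1024. If fixed batch shapes are preferred, `batch(GLOBAL_BATCH_SIZE, drop_remainder=True)` discards that partial batch at the cost of skipping a few examples per epoch.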

# Build Model  ⚙️🏗️ 

- For the model to follow the strategy, we need to define it within the strategy's scope using `with strategy.scope():` 

- The important thing to notice and compare is the time taken for each `epoch` to complete.

In [11]:
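with strategy.scope():
    model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10)  # no activation: outputs raw logits
    ])

    model.compile(
        optimizer='adam',
        # The last layer emits logits, so the loss must apply softmax itself
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
        # Run 32 training steps per call to reduce host overhead
        steps_per_execution=32)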
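
The final `Dense(10)` layer has no activation, so the model outputs raw logits rather than probabilities. A sparse categorical cross-entropy loss on logits therefore has to apply softmax first (in Keras this is what `from_logits=True` on `tf.keras.losses.SparseCategoricalCrossentropy` selects). A minimal pure-Python sketch of that computation, with hypothetical helper names and a made-up three-class example:

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example with an integer class label."""
    return -math.log(softmax(logits)[label])


logits = [2.0, 1.0, 0.1]  # illustrative values, not model output
probs = softmax(logits)
loss = sparse_ce_from_logits(logits, label=0)

print([round(p, 3) for p in probs], round(loss, 3))
```

Treating logits as if they were already probabilities would take the logarithm of values that are not in [0, 1] and do not sum to 1, which is why the loss and the output layer need to agree.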


# Train Model 🔥 🌡️

Let the magic begin! 🔮

In [19]:

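# The model was built and compiled inside strategy.scope(), so fit() can be called directly
EPOCHS = 10
start = time.time()  # wall-clock time before training
model.fit(train_dataset, epochs=EPOCHS)
end = time.time()    # wall-clock time after training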

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


**P.S.💡** The `time` library is used to measure how long the model takes to train. The number of `epochs` is set to `10`.

In [20]:
duration = round(end - start, 2)
display(HTML(f"<h5><b >The duration required for the model to train using TPUStrategy : </b> <b style='color:red'>{duration} Seconds 🧐  ✨. </b></h5>"))
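
To compare runs across strategies, the measured duration can be turned into time per epoch and throughput. The helpers below are a small sketch; `example_duration` is a hypothetical placeholder, not a measured result, and should be replaced with the `duration` computed above.

```python
def seconds_per_epoch(duration_seconds, epochs):
    """Average wall-clock time of one epoch."""
    return duration_seconds / epochs


def images_per_second(num_examples, epochs, duration_seconds):
    """Average training throughput over the whole run."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return num_examples * epochs / duration_seconds


example_duration = 30.0  # hypothetical value for illustration only
print(seconds_per_epoch(example_duration, epochs=10))
print(images_per_second(60000, epochs=10, duration_seconds=example_duration))
```

Keep in mind that the total includes one-time startup work such as graph compilation, so the average per-epoch figure tends to overstate the cost of later epochs.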