**XLA (Accelerated Linear Algebra) is a compiler for TensorFlow. It uses JIT compilation to analyze the TensorFlow graph the user builds at runtime, specialize it for the actual runtime dimensions and types, fuse multiple ops together, and emit efficient native machine code for devices such as CPUs, GPUs, and custom accelerators (e.g. Google’s TPU).**

In [None]:
# Check versions so that version-appropriate functionality can be used.
# Use the TPU runtime version that matches the TensorFlow version the model was written with.
# See https://cloud.google.com/tpu/docs/supported-tpu-versions for compatibility.

import tensorflow as tf
print ("TensorFlow Version",tf.__version__)

# Checking Python Version
! python --version

# Checking TPU Version
import os
from tensorflow.python.profiler import profiler_client

tpu_profile_service_address = os.environ['COLAB_TPU_ADDR'].replace('8470', '8466')
print(profiler_client.monitor(tpu_profile_service_address, 100, 2))

TensorFlow Version 2.8.0
Python 3.7.12
  Timestamp: 15:48:56
  TPU type: TPU v2
  Utilization of TPU Matrix Units (higher is better): 0.000%


time: 761 ms (started: 2022-03-03 15:48:55 +00:00)
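
The profiler address in the cell above is derived from `COLAB_TPU_ADDR` by swapping port 8470 for port 8466. A plain `str.replace` would also rewrite `8470` if it happened to occur in the host part of the address. The following is a minimal sketch of a slightly more defensive variant; the helper name `profiler_address` is hypothetical, and the port numbers are simply the ones used in the cell above.

```python
def profiler_address(tpu_addr, profiler_port=8466):
    """Return 'host:profiler_port' for a 'host:port' TPU address."""
    # rpartition splits on the last ':' so only the port is replaced.
    host, sep, _port = tpu_addr.rpartition(":")
    if not sep:
        raise ValueError(f"expected 'host:port', got {tpu_addr!r}")
    return f"{host}:{profiler_port}"


print(profiler_address("10.110.171.18:8470"))
```

In the notebook, the result would be passed to `profiler_client.monitor` in place of the string built with `.replace`.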


In [None]:
!pip install ipython-autotime # For automatic Time Display
%load_ext autotime

Collecting ipython-autotime
  Downloading ipython_autotime-0.3.1-py2.py3-none-any.whl (6.8 kB)
Installing collected packages: ipython-autotime
Successfully installed ipython-autotime-0.3.1
time: 223 µs (started: 2022-03-02 14:58:04 +00:00)


In [None]:
# Checking if TPU is available - Change Runtime Type to TPU in Google Colab

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='') # For Google Colab
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)
device_name = tf.config.list_logical_devices('TPU')
print('Found TPU at: {}'.format(device_name))

INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.
INFO:tensorflow:Initializing the TPU system: grpc://10.110.171.18:8470
INFO:tensorflow:Finished initializing TPU system.


Found TPU at: [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]
time: 12.2 s (started: 2022-03-02 14:58:15 +00:00)


In [None]:
@tf.function(jit_compile=True)
def my_func_xla(a,b,c):
  return tf.reduce_sum(a + b * c)

time: 2.36 ms (started: 2022-03-02 14:58:30 +00:00)


In [None]:
# TF 2.x executes eagerly, so the compiled function can be called directly;
# a tf.compat.v1.Session is not needed.
with tf.device('/TPU:0'):
  result = my_func_xla(tf.ones([4,4]), tf.ones([4,4]), tf.ones([4,4]))
print(result.numpy())

32.0
time: 424 ms (started: 2022-03-02 14:58:34 +00:00)
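
To make "fusing ops" concrete, the sketch below computes the same `reduce_sum(a + b * c)` as `my_func_xla` in two ways in plain Python: once as three separate passes with temporaries, and once as a single loop. It is only a conceptual illustration of fusion, not the code XLA actually emits; the names `unfused` and `fused` are made up for this example.

```python
def unfused(a, b, c):
    """Three separate passes, each materializing a temporary list."""
    prod = [y * z for y, z in zip(b, c)]        # b * c
    total = [x + p for x, p in zip(a, prod)]    # a + (b * c)
    return sum(total)                           # reduce_sum


def fused(a, b, c):
    """One pass with no temporaries: the shape of a fused kernel."""
    acc = 0.0
    for x, y, z in zip(a, b, c):
        acc += x + y * z
    return acc


# 4x4 tensors of ones, flattened to 16 elements.
ones = [1.0] * 16
print(unfused(ones, ones, ones), fused(ones, ones, ones))
```

Both variants return 32.0 for the all-ones inputs, matching the output of the cell above; the fused version simply avoids writing intermediate results to memory.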


In [None]:
strategy = tf.distribute.TPUStrategy(resolver)

INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)


time: 225 ms (started: 2022-03-02 14:58:38 +00:00)


In [None]:
@tf.function(jit_compile=True)
def my_func_xla(a,b,c):
  return tf.reduce_sum(a + b * c)

time: 2.2 ms (started: 2022-03-02 14:58:41 +00:00)


In [None]:
z = strategy.run(my_func_xla, args=(tf.ones([4,4]),tf.ones([4,4]),tf.ones([4,4])))
print(z)

PerReplica:{
  0: tf.Tensor(32.0, shape=(), dtype=float32),
  1: tf.Tensor(32.0, shape=(), dtype=float32),
  2: tf.Tensor(32.0, shape=(), dtype=float32),
  3: tf.Tensor(32.0, shape=(), dtype=float32),
  4: tf.Tensor(32.0, shape=(), dtype=float32),
  5: tf.Tensor(32.0, shape=(), dtype=float32),
  6: tf.Tensor(32.0, shape=(), dtype=float32),
  7: tf.Tensor(32.0, shape=(), dtype=float32)
}
time: 1.11 s (started: 2022-03-02 14:58:43 +00:00)
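
`strategy.run` returns one result per replica, which is why the output above lists eight identical values. To get a single number, the per-replica values are usually combined, for example with `strategy.reduce(tf.distribute.ReduceOp.SUM, z, axis=None)`. The sketch below simulates that reduction in plain Python so it can run without a TPU; `replica_fn` is a hypothetical stand-in for `my_func_xla`.

```python
NUM_REPLICAS = 8  # matches "Num TPU Cores: 8" in the log above


def replica_fn(a, b, c):
    """Plain-Python stand-in for my_func_xla on flattened inputs."""
    return sum(x + y * z for x, y, z in zip(a, b, c))


ones = [1.0] * 16  # a 4x4 tensor of ones, flattened
# Every replica receives the same arguments, so every replica returns 32.0.
per_replica = [replica_fn(ones, ones, ones) for _ in range(NUM_REPLICAS)]

reduced_sum = sum(per_replica)             # analogous to ReduceOp.SUM
reduced_mean = reduced_sum / NUM_REPLICAS  # analogous to ReduceOp.MEAN
print(per_replica, reduced_sum, reduced_mean)
```

Because each replica here sees identical inputs, the mean equals the per-replica value; with a distributed dataset each replica would see a different shard of the batch.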


# **MNIST MODEL WITHOUT TPU AND WITHOUT XLA**

In [None]:
# Checking Versions

import tensorflow as tf
tf.__version__

# Checking Python Version
! python --version

Python 3.7.12
time: 127 ms (started: 2022-03-02 14:58:50 +00:00)


In [None]:
!pip install ipython-autotime # For automatic Time Display
%load_ext autotime

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime
time: 2.94 s (started: 2022-03-02 14:58:52 +00:00)


In [None]:
# Import Necessary libraries

import tensorflow as tf
import tensorflow_datasets as tfds # TENSORFLOW DATASETS

time: 1.7 ms (started: 2022-03-02 15:06:21 +00:00)


In [None]:
# Data Required
# Preparing Data

# Size of each input image, 28 x 28 pixels
IMAGE_SIZE = 28 * 28
# Number of distinct number labels, [0..9]
NUM_CLASSES = 10

time: 4.53 ms (started: 2022-03-02 15:51:43 +00:00)


In [None]:
# Loads MNIST dataset.

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
    try_gcs=True,  # Prefer the public GCS copy of the dataset (needed for TPU)
)

time: 672 ms (started: 2022-03-02 15:51:44 +00:00)


In [None]:
# Build a Training Pipeline
# REF: https://www.tensorflow.org/datasets/keras_example

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
# map applies map_func (here normalize_img) to each element and returns a new dataset
# containing the transformed elements, in the same order as they appeared in the input.
# num_parallel_calls lets map process elements on multiple threads.
# tf.data.AUTOTUNE lets the tf.data runtime tune that value dynamically at runtime.
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(200)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)
# prefetch uses a background thread and an internal buffer to fetch elements from the input dataset before they are requested.
# The number of elements to prefetch should be equal to (or possibly greater than) the number of batches consumed by a single training step.
# You can tune this value manually or set it to tf.data.AUTOTUNE, which lets the tf.data runtime tune it dynamically.

time: 59.2 ms (started: 2022-03-02 15:51:47 +00:00)
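
The pipeline above passes the full training-set size to `shuffle`, which matters because `tf.data` shuffles through a fixed-size buffer: it fills the buffer, then repeatedly emits a random buffered element and replaces it with the next input element. The sketch below imitates that behaviour in plain Python to show why a buffer as large as the dataset gives a full shuffle while a small buffer only shuffles locally. `buffered_shuffle` is a hypothetical helper, not a `tf.data` API.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in the order a shuffle buffer of this size could produce."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element, put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=1, seed=0)))   # order unchanged
print(list(buffered_shuffle(range(10), buffer_size=10, seed=0)))  # full shuffle
```

With `buffer_size=1` the input order is preserved, and with a buffer smaller than the dataset the first emitted element can only come from the first `buffer_size` inputs.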


In [None]:
# Build an evaluation pipeline

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(200)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

time: 17.9 ms (started: 2022-03-02 15:51:49 +00:00)


In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # Flatten reshapes each (28, 28, 1) image into a 1-D vector of 784 values; the batch dimension is kept
  tf.keras.layers.Dense(128, activation='relu'), # Dense computes output = activation(dot(input, kernel) + bias); output size = 128
  tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])

model.summary()
# Input image size = 28 * 28
# Flatten layer output = 28 * 28 = 784
# Dense layer:
#   Input = 784
#   Output = 128 (as set in the code)
#   Param # = 784 * 128 + 128 (bias) = 100480
# Output layer, likewise: 128 * 10 + 10 (bias) = 1290


Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_7 (Flatten)         (None, 784)               0         
                                                                 
 dense_14 (Dense)            (None, 128)               100480    
                                                                 
 dense_15 (Dense)            (None, 10)                1290      
                                                                 
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
time: 79.6 ms (started: 2022-03-02 15:51:51 +00:00)
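
The parameter counts in the summary above follow directly from the layer sizes: a Dense layer has one weight per input/unit pair plus one bias per unit. A small plain-Python check of that arithmetic (the helper name `dense_params` is made up for this example):

```python
def dense_params(inputs, units):
    """Weights (inputs x units) plus one bias per unit."""
    return inputs * units + units


flatten_out = 28 * 28 * 1                  # Flatten has no parameters
hidden = dense_params(flatten_out, 128)    # first Dense layer
output = dense_params(128, 10)             # output Dense layer
print(hidden, output, hidden + output)
```

The three printed values correspond to the two Dense rows and the "Total params" line of the summary.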


In [None]:
# Training without XLA Compiler

# To train a model with fit(), you need to specify a loss function, an optimizer, and optionally some metrics to monitor.
# Regression models commonly use mean squared error as the loss,
# while classification models that predict class probabilities most commonly use cross-entropy.
# sparse_categorical_crossentropy: loss for multi-class classification where each label is an integer class index (0, 1, 2, 3, ...).
# A metric is a function used to judge the performance of your model. Metric functions are similar to loss functions,
# except that the results from evaluating a metric are not used when training the model.

tf.config.optimizer.set_jit(False) # Start with XLA disabled.

# Wrapping model.fit() in @tf.function raises:
#   RuntimeError: Detected a call to `Model.fit` inside a `tf.function`. `Model.fit is a high-level endpoint that manages its own `tf.function`.
#   Please move the call to `Model.fit` outside of all enclosing `tf.function`s.
#   Note that you can call a `Model` directly on `Tensor`s inside a `tf.function` like: `model(x)`.
# Using @tf.function(jit_compile=True) therefore requires a custom training loop.

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
)

time: 17.8 ms (started: 2022-03-02 15:51:55 +00:00)
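
As a rough illustration of what `sparse_categorical_crossentropy` measures for a single example: given the softmax probabilities and an integer label, the loss is the negative log of the probability assigned to the true class. This is a plain-Python sketch of the textbook formula, not the Keras implementation; the probability values are invented.

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: -log of the probability of the true class."""
    return -math.log(probs[label])


# Hypothetical softmax output over 3 classes (sums to 1).
probs = [0.1, 0.7, 0.2]
print(sparse_categorical_crossentropy(1, probs))  # true class got 0.7: small loss
print(sparse_categorical_crossentropy(0, probs))  # true class got 0.1: larger loss
```

The "sparse" part only refers to the label format: an integer index is used directly instead of a one-hot vector.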


In [None]:
model.fit(ds_train,
          epochs = 2,
          validation_data=ds_test)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f33f61275d0>

time: 8.76 s (started: 2022-03-02 15:51:57 +00:00)


# **MNIST MODEL WITH TPU AND XLA ENABLED**
**WITH ONE TPU DEVICE**

**USE tf.config.optimizer.set_jit(True) TO ENABLE XLA**

**@tf.function(jit_compile=True) CANNOT BE USED WITH model.fit()**


In [None]:
# Checking Versions

import tensorflow as tf
print ("TensorFlow Version",tf.__version__)

# Checking Python Version
! python --version

TensorFlow Version 2.8.0
Python 3.7.12


In [None]:
!pip install ipython-autotime # For automatic Time Display
%load_ext autotime

time: 360 µs (started: 2022-03-03 15:14:47 +00:00)


In [None]:
# Import Necessary libraries

import tensorflow as tf
import tensorflow_datasets as tfds

time: 337 ms (started: 2022-03-03 15:14:50 +00:00)


In [None]:
# Checking if TPU is available

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)

device_name = tf.config.list_logical_devices('TPU')
print('Found TPU at: {}'.format(device_name))

INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.
INFO:tensorflow:Initializing the TPU system: grpc://10.101.138.122:8470
INFO:tensorflow:Finished initializing TPU system.


Found TPU at: [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]
time: 11.4 s (started: 2022-03-03 15:14:52 +00:00)


In [None]:
# Data Required

# REF: https://developers.googleblog.com/2017/03/xla-tensorflow-compiled.html - SOFTMAX FUNCTION
# REF: https://www.tensorflow.org/xla - XLA
# REF: https://www.tensorflow.org/xla/tutorials/jit_compile - TUTORIAL

# Eager execution is a powerful execution environment that evaluates operations immediately.
# It does not build graphs, and the operations return actual values instead of computational graphs to run later.

# Preparing Data

# Size of each input image, 28 x 28 pixels
IMAGE_SIZE = 28 * 28
# Number of distinct number labels, [0..9]
NUM_CLASSES = 10

time: 3.51 ms (started: 2022-03-03 15:15:06 +00:00)


In [None]:
# Loads MNIST dataset.

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
    try_gcs=True,  # Prefer the public GCS copy of the dataset (needed for TPU)
)

time: 1.28 s (started: 2022-03-03 15:15:09 +00:00)


In [None]:
print(ds_info.splits['train'].num_examples) # Size of Train Data
print(ds_info.splits['test'].num_examples) # Size of Test Data

60000
10000
time: 6.42 ms (started: 2022-03-03 15:15:13 +00:00)


In [None]:
# Build a Training Pipeline
# REF: https://www.tensorflow.org/datasets/keras_example

# Only shuffle and repeat the dataset in training. An infinite training dataset
# avoids a potential partial last batch in each epoch, so you don't need to
# think about scaling the gradients based on the actual batch size.

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.repeat()
ds_train = ds_train.batch(200)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

time: 68.3 ms (started: 2022-03-03 15:15:15 +00:00)


In [None]:
# Build an evaluation pipeline

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.batch(200)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

time: 30.6 ms (started: 2022-03-03 15:15:17 +00:00)


In [None]:
with tf.device('/TPU:0'):
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # Flatten reshapes each (28, 28, 1) image into a 1-D vector of 784 values; the batch dimension is kept
    tf.keras.layers.Dense(128, activation='relu'), # Dense computes output = activation(dot(input, kernel) + bias); output size = 128
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
  ])

  model.summary()
# Input image size = 28 * 28
# Flatten layer output = 28 * 28 = 784
# Dense layer:
#   Input = 784
#   Output = 128 (as set in the code)
#   Param # = 784 * 128 + 128 (bias) = 100480
# Output layer, likewise: 128 * 10 + 10 (bias) = 1290

# Training with XLA Compiler

# To train a model with fit(), you need to specify a loss function, an optimizer, and optionally some metrics to monitor.
# Regression models commonly use mean squared error as the loss,
# while classification models that predict class probabilities most commonly use cross-entropy.
# sparse_categorical_crossentropy: loss for multi-class classification where each label is an integer class index (0, 1, 2, 3, ...).
# A metric is a function used to judge the performance of your model. Metric functions are similar to loss functions,
# except that the results from evaluating a metric are not used when training the model.

  tf.config.optimizer.set_jit(True) # Enable XLA.

# Wrapping model.fit() in @tf.function raises:
#   RuntimeError: Detected a call to `Model.fit` inside a `tf.function`. `Model.fit is a high-level endpoint that manages its own `tf.function`.
#   Please move the call to `Model.fit` outside of all enclosing `tf.function`s.
#   Note that you can call a `Model` directly on `Tensor`s inside a `tf.function` like: `model(x)`.
# Using @tf.function(jit_compile=True) therefore requires a custom training loop.

  model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
  )


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 128)               100480    
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
time: 161 ms (started: 2022-03-03 15:15:20 +00:00)
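
The comment on the Dense layer above states that it computes `activation(dot(input, kernel) + bias)`. The sketch below spells that formula out for a single input vector in plain Python. The toy sizes and values are invented; the real layer maps 784 inputs to 128 units and processes whole batches at once.

```python
def relu(x):
    return max(0.0, x)


def dense(inputs, kernel, bias, activation):
    """output[j] = activation(sum_i inputs[i] * kernel[i][j] + bias[j])"""
    units = len(bias)
    return [
        activation(sum(x * row[j] for x, row in zip(inputs, kernel)) + bias[j])
        for j in range(units)
    ]


# Hypothetical toy layer: 3 inputs -> 2 units.
x = [1.0, 2.0, 3.0]
kernel = [[1.0, -1.0],
          [0.0, -1.0],
          [1.0, 0.0]]
bias = [0.5, 0.5]
print(dense(x, kernel, bias, relu))
```

The second unit's pre-activation value is negative, so ReLU clamps it to zero.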


In [None]:
# YOU CAN USE THIS CODE SNIPPET FOR DISTRIBUTION OF TRAINING ACROSS AVAILABLE TPUs

strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope(): # creating the model in the TPUStrategy scope means we will train the model on the TPUs
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # Flatten reshapes each (28, 28, 1) image into a 1-D vector of 784 values; the batch dimension is kept
    tf.keras.layers.Dense(128, activation='relu'), # Dense computes output = activation(dot(input, kernel) + bias); output size = 128
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
  ])

  model.summary()
# Input image size = 28 * 28
# Flatten layer output = 28 * 28 = 784
# Dense layer:
#   Input = 784
#   Output = 128 (as set in the code)
#   Param # = 784 * 128 + 128 (bias) = 100480
# Output layer, likewise: 128 * 10 + 10 (bias) = 1290

# Training with XLA Compiler

# To train a model with fit(), you need to specify a loss function, an optimizer, and optionally some metrics to monitor.
# Regression models commonly use mean squared error as the loss,
# while classification models that predict class probabilities most commonly use cross-entropy.
# sparse_categorical_crossentropy: loss for multi-class classification where each label is an integer class index (0, 1, 2, 3, ...).
# A metric is a function used to judge the performance of your model. Metric functions are similar to loss functions,
# except that the results from evaluating a metric are not used when training the model.

  tf.config.optimizer.set_jit(True) # Enable XLA.

# Wrapping model.fit() in @tf.function raises:
#   RuntimeError: Detected a call to `Model.fit` inside a `tf.function`. `Model.fit is a high-level endpoint that manages its own `tf.function`.
#   Please move the call to `Model.fit` outside of all enclosing `tf.function`s.
#   Note that you can call a `Model` directly on `Tensor`s inside a `tf.function` like: `model(x)`.
# Using @tf.function(jit_compile=True) therefore requires a custom training loop.

  model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
  )


INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 784)               0         
                                                                 
 dense_2 (Dense)             (None, 128)               100480    
                                                                 
 dense_3 (Dense)             (None, 10)                1290      
                                                                 
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
time: 412 ms (started: 2022-03-03 15:15:48 +00:00)


In [None]:
# Train size = 60000, batch size = 200 -> steps per epoch = 60000 // 200 = 300
# Test size = 10000, batch size = 200 -> validation steps = 10000 // 200 = 50
# Epochs = 2
# ds_train repeats indefinitely, so steps_per_epoch must be passed explicitly.

model.fit(ds_train,
          epochs = 2,
          steps_per_epoch=300,
          validation_data=ds_test,
          validation_steps=50)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f130572a490>

time: 16.8 s (started: 2022-03-03 15:15:53 +00:00)
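
Because the training dataset in this section is repeated indefinitely, `fit()` cannot infer where an epoch ends, which is why `steps_per_epoch` is passed above. The step counts are plain integer division of dataset size by batch size, as this small sketch shows (the helper name `steps_for` is made up):

```python
def steps_for(num_examples, batch_size):
    """Full batches in one pass over the data (any remainder is dropped)."""
    return num_examples // batch_size


BATCH_SIZE = 200
train_steps = steps_for(60000, BATCH_SIZE)  # value for steps_per_epoch
val_steps = steps_for(10000, BATCH_SIZE)    # value for validation_steps
print(train_steps, val_steps)
```

In the notebook, the sizes could be read from `ds_info.splits['train'].num_examples` and `ds_info.splits['test'].num_examples` instead of being hard-coded.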


# **MNIST MODEL WITH TPU AND XLA ENABLED**
**WITH ONE TPU DEVICE**

**USE @tf.function(jit_compile = True) TO ENABLE XLA**

**USING CUSTOM TRAINING**

In [None]:
# Checking Versions

import tensorflow as tf
print ("TensorFlow Version",tf.__version__)

# Checking Python Version
! python --version

TensorFlow Version 2.8.0
Python 3.7.12
time: 143 ms (started: 2022-03-03 14:57:19 +00:00)


In [None]:
!pip install ipython-autotime # For automatic Time Display
%load_ext autotime

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime
time: 6.81 s (started: 2022-03-03 14:57:24 +00:00)


In [None]:
# Import Necessary libraries

import tensorflow as tf
import tensorflow_datasets as tfds

time: 1.23 ms (started: 2022-03-03 14:57:35 +00:00)


In [None]:
# Checking if TPU is available

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)

device_name = tf.config.list_logical_devices('TPU')
print('Found TPU at: {}'.format(device_name))

INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.
INFO:tensorflow:Initializing the TPU system: grpc://10.101.138.122:8470
INFO:tensorflow:Finished initializing TPU system.


Found TPU at: [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]
time: 15.8 s (started: 2022-03-03 14:57:37 +00:00)


In [None]:
# Data Required

# REF: https://developers.googleblog.com/2017/03/xla-tensorflow-compiled.html - SOFTMAX FUNCTION
# REF: https://www.tensorflow.org/xla - XLA
# REF: https://www.tensorflow.org/xla/tutorials/jit_compile - TUTORIAL

# Eager execution is a powerful execution environment that evaluates operations immediately.
# It does not build graphs, and the operations return actual values instead of computational graphs to run later.

# Preparing Data

# Size of each input image, 28 x 28 pixels
IMAGE_SIZE = 28 * 28
# Number of distinct number labels, [0..9]
NUM_CLASSES = 10

time: 3.21 ms (started: 2022-03-03 14:57:55 +00:00)


In [None]:
# Loads MNIST dataset.

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
    try_gcs=True,  # Prefer the public GCS copy of the dataset (needed for TPU)
)

time: 680 ms (started: 2022-03-03 14:57:59 +00:00)


In [None]:
print(ds_info)

tfds.core.DatasetInfo(
    name='mnist',
    version=3.0.1,
    description='The MNIST database of handwritten digits.',
    homepage='http://yann.lecun.com/exdb/mnist/',
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=70000,
    splits={
        'test': 10000,
        'train': 60000,
    },
    supervised_keys=('image', 'label'),
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
    redistribution_info=,
)

time: 1.8 ms (started: 2022-03-03 14:43:36 +00:00)


In [None]:
# Viewing the Train dataset

df = tfds.as_dataframe(ds_train.take(5), ds_info)
df

Unnamed: 0,image,label
0,,4
1,,1
2,,0
3,,7
4,,8


time: 1.21 s (started: 2022-03-03 14:43:38 +00:00)


In [None]:
# Viewing the Test dataset

df = tfds.as_dataframe(ds_test.take(5), ds_info)
df

Unnamed: 0,image,label
0,,2
1,,0
2,,4
3,,8
4,,7


time: 1.21 s (started: 2022-03-03 14:43:42 +00:00)


In [None]:
# Build a Training Pipeline
# REF: https://www.tensorflow.org/datasets/keras_example

# This dataset is shuffled but not repeated: the custom training loop
# iterates over the finite dataset once per epoch.
# 60000 examples divide evenly into batches of 200, so there is no partial last batch.

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)

ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(200)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

time: 29.6 ms (started: 2022-03-03 14:58:05 +00:00)


In [None]:
# Build an evaluation pipeline

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)

ds_test = ds_test.cache()
ds_test = ds_test.batch(200)
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)

time: 59 ms (started: 2022-03-03 14:58:07 +00:00)


In [None]:
with tf.device('/TPU:0'):
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # Flatten reshapes each (28, 28, 1) image into a 1-D vector of 784 values; the batch dimension is kept
    tf.keras.layers.Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(1e-4)), # Dense computes output = activation(dot(input, kernel) + bias); output size = 128
    tf.keras.layers.Dense(NUM_CLASSES, kernel_regularizer=tf.keras.regularizers.l2(1e-4)) # No softmax here: this layer outputs logits
  ])

  model.summary()


Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_15 (Flatten)        (None, 784)               0         
                                                                 
 dense_30 (Dense)            (None, 128)               100480    
                                                                 
 dense_31 (Dense)            (None, 10)                1290      
                                                                 
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
time: 127 ms (started: 2022-03-03 14:58:11 +00:00)
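
The `kernel_regularizer` arguments in the model above attach an L2 penalty to each Dense kernel: the regularization factor times the sum of squared weights. Keras collects one such term per regularized layer in `model.losses`. The sketch below reproduces that arithmetic in plain Python with invented weight values; `l2_penalty` is a hypothetical helper, and the factor 1e-4 is the one used in the model.

```python
def l2_penalty(weights, factor=1e-4):
    """factor * sum of squared weights, as an L2 regularizer computes."""
    return factor * sum(w * w for w in weights)


# Hypothetical kernel values for two layers.
kernel_1 = [0.5, -0.5, 1.0]
kernel_2 = [2.0, -1.0]

# One penalty per regularized layer, summed into a single scalar.
regularization_loss = l2_penalty(kernel_1) + l2_penalty(kernel_2)
print(regularization_loss)
```

Biases are not penalized here because only `kernel_regularizer` is set on the layers.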


In [None]:
optimizer = tf.keras.optimizers.Adam(0.001)
# The last Dense layer has no softmax, so the model outputs logits.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function(jit_compile=True)  # Compile the whole training step with XLA.
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    # model.losses holds the L2 penalties added by the kernel regularizers.
    regularization_loss = tf.math.add_n(model.losses)
    pred_loss = loss_fn(labels, predictions)
    total_loss = pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

time: 6.12 ms (started: 2022-03-03 14:58:16 +00:00)
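
With `from_logits=True`, the loss takes raw scores rather than softmax probabilities. Mathematically it is still the negative log of the softmax probability of the true class, but it can be computed directly from the logits with a log-sum-exp, which is more stable for large values. The following plain-Python sketch compares the two routes for one example; it illustrates the formula only and is not the TensorFlow implementation, and the logit values are invented.

```python
import math


def crossentropy_from_logits(label, logits):
    """-log softmax(logits)[label], computed via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 3-class problem.
logits = [2.0, 1.0, 0.1]
direct = crossentropy_from_logits(0, logits)
via_softmax = -math.log(softmax(logits)[0])
print(direct, via_softmax)
```

Both values agree up to floating-point rounding, which is why the softmax activation can be dropped from the last layer when the loss is told to expect logits.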


In [None]:
with tf.device('/TPU:0'):
  for epoch in range(2):
    for inputs, labels in ds_train:
      train_step(inputs, labels)
    print("Finished epoch", epoch)

Finished epoch 0
Finished epoch 1
time: 4.33 s (started: 2022-03-03 14:58:19 +00:00)
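
The wall-clock time reported for the loop above includes one-time work: the first call to a `tf.function` traces the Python function, and with `jit_compile=True` it also compiles it with XLA. When comparing runs it can help to time the first call separately from later calls. Below is a minimal, framework-free sketch of such a measurement; `time_calls` is a hypothetical helper and the workload is a stand-in.

```python
import time


def time_calls(fn, num_calls=5):
    """Return (first_call_seconds, mean_seconds_of_the_remaining_calls)."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(num_calls - 1):
        fn()
    rest = (time.perf_counter() - start) / max(num_calls - 1, 1)
    return first, rest


# Stand-in workload; in the notebook this would be a lambda that calls
# train_step on one batch.
first, rest = time_calls(lambda: sum(range(10_000)))
print(f"first call: {first:.6f}s, later calls: {rest:.6f}s")
```

For very short training runs such as the two epochs used here, the one-time cost can make up a noticeable share of the total, so the timings in this notebook are best read as rough indications.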
