<a href="https://colab.research.google.com/github/ashikshafi08/Learning_Tensorflow/blob/main/tpu_nbs/Learning_to_use_TPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [10]:
import tensorflow as tf 
import os 
import json 
import pprint
import tensorflow_datasets as tfds 

# Creating TF Records 


- TFRecord format is a simple format for storing a sequence of binary records. 
- Protocol messages are defined by .proto files, these are often 

In [11]:
# Bunch of data processing 
root_dir = "datasets"
tfrecords_dir = "tfrecords"
images_dir = os.path.join(root_dir, "val2017")
annotations_dir = os.path.join(root_dir, "annotations")
annotation_file = os.path.join(annotations_dir, "instances_val2017.json")
images_url = "http://images.cocodataset.org/zips/val2017.zip"
annotations_url = (
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip"
)

# Download image files
if not os.path.exists(images_dir):
    image_zip = tf.keras.utils.get_file(
        "images.zip", cache_dir=os.path.abspath("."), origin=images_url, extract=True,
    )
    os.remove(image_zip)

# Download caption annotation files
if not os.path.exists(annotations_dir):
    annotation_zip = tf.keras.utils.get_file(
        "captions.zip",
        cache_dir=os.path.abspath("."),
        origin=annotations_url,
        extract=True,
    )
    os.remove(annotation_zip)

print("The COCO dataset has been downloaded and extracted successfully.")

with open(annotation_file, "r") as f:
    annotations = json.load(f)["annotations"]

print(f"Number of images: {len(annotations)}")

The COCO dataset has been downloaded and extracted successfully.
Number of images: 36781


In [18]:
pprint.pprint(annotations[60])

{'area': 367.89710000000014,
 'bbox': [265.67, 222.31, 26.48, 14.71],
 'category_id': 72,
 'id': 34096,
 'image_id': 525083,
 'iscrowd': 0,
 'segmentation': [[267.51,
                   222.31,
                   292.15,
                   222.31,
                   291.05,
                   237.02,
                   265.67,
                   237.02]]}


In [19]:
len_annotations = len(annotations)  # number of samples in the dataset
 
# Number of data samples on each tf records 
num_samples = 4096 

# Total number of tfrecords we will create 
num_tfrecords = len_annotations // num_samples
print(f'Total number of tfrecords we will be creating {num_tfrecords}')

Total number of tfrecords we will be creating 8


In [20]:
# If any samples missing create a tf record for it 
if annot % num_samples: 
  num_tfrecords += 1 # add one record if there are remaining samples left 

num_tfrecords

9

In [21]:
import os 
tfrecords_dir = "tfrecords"

# If there is no directory of tfrecords, thenn create it 
if not os.path.exists(tfrecords_dir):
  os.makedirs(tfrecords_dir)

## Writing out TFRecords Helper Functions 

- Our data should be serialized (encoded as byte string) before bering written with a TFRecord. 
- The most convenient way of serializing our data is to wrap them with `tf.Example`.
- Its more or less like a `dict` with some type of annotations. 


### Serialization 

- TFRecord -> is a kind of file that TensorFlow uses to store binary data. 
- TFRecords contain sequences of byte-strings. 


In [24]:
# Specifying the path for tfrecords 
path = 'data/data.tfrecord'



with tf.io.TFRecordWriter(path = path) as f:
   f.write(b'123') # write one record 
   f.write(b'xyzb123') # another record 


# Opening the file we've just written 
with open(path , 'rb') as f:
  print(f.read())

b'\x03\x00\x00\x00\x00\x00\x00\x00\xb0\x99I\x0e123\xce\x0b\xe7\x01\x07\x00\x00\x00\x00\x00\x00\x00\xbb\xd7\x9f\x11xyzb123\xee\x88\x11v'


# Training with TPU's 

#### TPU Initilization 

- TPU's are typically cloud TPU workers. 
- To work with TPU first we gotta initialize and connect to the remote cluster. We can do this by using `tf.distribute.cluster_resolver.TPUClusterResolver`

In [9]:
# Setting up the TPU 
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='') # responsible for making TPU clusters with GCP
tf.config.experimental_connect_to_cluster(resolver) # connecting to the cloud instance

# Initializing the TPU 
tf.tpu.experimental.initialize_tpu_system(resolver)

# Listing out all the TPU's available
print(f"All Devices: {tf.config.list_logical_devices('TPU')}")

INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.


INFO:tensorflow:Deallocate tpu buffers before initializing tpu system.






INFO:tensorflow:Initializing the TPU system: grpc://10.125.18.82:8470


INFO:tensorflow:Initializing the TPU system: grpc://10.125.18.82:8470


INFO:tensorflow:Finished initializing TPU system.


INFO:tensorflow:Finished initializing TPU system.


All Devices: [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]


Using manual placement and using one tpu for our computation. There are 8 cores available. 

In [10]:
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])


with tf.device('/TPU:0'):
  c = tf.matmul(a , b)

print(f'C Device: {c.device}')
print(c)

C Device: /job:worker/replica:0/task:0/device:TPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)


###  Distribution Strategies 

- While training the model we run it on multiple TPU's in a data-parallel way. 
- We got 8 cores of TPU and we can run our computation on each of them parallely. 
- TensorFlow offers several distribution strategies. We will use `tf.distribute.TPUStrategy` that will lets us run tensors on TPUs. 



In [11]:
# Creating an object of the distributive strategy 
strategy = tf.distribute.TPUStrategy(resolver)


INFO:tensorflow:Found TPU system:


INFO:tensorflow:Found TPU system:


INFO:tensorflow:*** Num TPU Cores: 8


INFO:tensorflow:*** Num TPU Cores: 8


INFO:tensorflow:*** Num TPU Workers: 1


INFO:tensorflow:*** Num TPU Workers: 1


INFO:tensorflow:*** Num TPU Cores Per Worker: 8


INFO:tensorflow:*** Num TPU Cores Per Worker: 8


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)


INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)


During the distribution strategies, the variable created within the strategy's scope will be replicated across all the replicas and can kept in sync during all-reduce algo. 

In [13]:
# It should print out 8 times 

@tf.function 
def matmul_fn(x , y):
  return tf.matmul(x , y)


# Running on the tpu strategy where it replicate's the function 8 times 
# Pass in the function  + arguments for the function
z = strategy.run(fn = matmul_fn , args = (a , b))
print(z)

PerReplica:{
  0: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  1: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  2: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  3: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  4: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  5: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  6: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32),
  7: tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
}


We can see that after counting the output we get to know that it has been ran 8 times, because of 8 cores TPU we're running the function on each single TPU's 


### Classification on TPU's 


We will use the above strategy to train a Keras Classification model. 

> **Note**: Keras **model creation** needs to be inside the `strategy.scope` so that the variable can be created on each TPU device. Its more enabling the switch for our model to let it use the TPU services. 

Rest code can be outside of the `strategy.scope`

In [15]:
# Simple Sequential model

def create_model():
  return tf.keras.Sequential(
      [tf.keras.layers.Conv2D(256, 3, activation='relu', input_shape=(28, 28, 1)),
       tf.keras.layers.Conv2D(256, 3, activation='relu'),
       tf.keras.layers.Flatten(),
       tf.keras.layers.Dense(256, activation='relu'),
       tf.keras.layers.Dense(128, activation='relu'),
       tf.keras.layers.Dense(10)])

### Loading the dataset 

- Wbile using the Cloud TPU we gotta be efficient while making a dataset with `tf.data.Dataset`, its impossible for the Cloud TPU's to work unless if we feed the data into tf.data.Dataset API real qucik. 
- 