<a href="https://colab.research.google.com/github/HagarIbrahiem/Distributed-Training-Strategies-with-TensorFlow/blob/main/Tensorflow_%7C_Basic_Mirrored_Strategy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mirrored Strategy: Basic

Train the model using [`tf.distribute.MirroredStrategy`](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy) on a single GPU.


 **MirroredStrategy**

`tf.distribute.MirroredStrategy` supports synchronous distributed training on multiple GPUs on one machine. It creates one replica per GPU device. Each variable in the model is mirrored across all the replicas. Together, these variables form a single conceptual variable called `MirroredVariable`. These variables are kept in sync with each other by applying identical updates.

Efficient all-reduce algorithms are used to communicate the variable updates across the devices. All-reduce aggregates tensors across all the devices by adding them up, and makes the result available on each device. It is a fused algorithm that is very efficient and can reduce the overhead of synchronization significantly. There are many all-reduce algorithms and implementations available, depending on the type of communication available between devices. By default, `MirroredStrategy` uses the NVIDIA Collective Communication Library ([NCCL](https://developer.nvidia.com/nccl)) as the all-reduce implementation. You can choose from a few other options or write your own.
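To make the idea concrete, here is a minimal pure-Python sketch of one synchronous training step with a sum all-reduce. It is only an illustration of the concept, not how TensorFlow or NCCL implement it: the helper names `all_reduce_sum` and `apply_update`, the two-replica setup, and the gradient values are all assumptions made up for this example.

```python
# Pure-Python simulation of a synchronous update with a sum all-reduce.
# Replicas, variables, and gradients are plain lists; no TensorFlow is involved.

def all_reduce_sum(per_replica_tensors):
    """Sum tensors element-wise across replicas and hand one copy to each replica."""
    summed = [sum(values) for values in zip(*per_replica_tensors)]
    # Every replica receives its own copy of the same aggregated tensor.
    return [list(summed) for _ in per_replica_tensors]


def apply_update(variable, gradient, learning_rate):
    """Plain SGD step applied to one replica's copy of the variable."""
    return [w - learning_rate * g for w, g in zip(variable, gradient)]


# Two hypothetical replicas start with identical copies of the same variable.
replica_variables = [[1.0, 2.0], [1.0, 2.0]]

# Each replica computes a different gradient from its own slice of the batch.
local_gradients = [[0.5, 1.0], [1.5, -1.0]]

# After all-reduce, every replica holds the same summed gradient.
reduced = all_reduce_sum(local_gradients)

# Applying the identical update on every replica keeps the copies in sync.
replica_variables = [
    apply_update(variable, gradient, learning_rate=0.5)
    for variable, gradient in zip(replica_variables, reduced)
]

print(reduced)
print(replica_variables)
```

Because every replica applies the same aggregated gradient to the same starting value, the mirrored copies remain identical after the step, which is the property the description above relies on.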



## Import Libraries ⚒️ ⚙️

In [2]:
# Import required libraries, TensorFlow, and TensorFlow Datasets

import warnings
warnings.filterwarnings('ignore')


import os, time
import numpy as np
from IPython.display import HTML,display

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow_datasets as tfds
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt

tfds.disable_progress_bar()




Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.


In [3]:
# Constants

MODULE_HANDLE = 'https://tfhub.dev/tensorflow/resnet_50/feature_vector/1'
IMAGE_SIZE = 224
NEW_IMAGE_SIZE = (IMAGE_SIZE, IMAGE_SIZE)
BATCH_SIZE = 32
BUFFER_SIZE = 10000




# Load Data ⌛

`TFDS` provides a collection of ready-to-use datasets for use with TensorFlow, and other Machine Learning frameworks.

It handles downloading and preparing the data deterministically and constructing a tf.data.Dataset (or np.array).

We are going to use the `fashion_mnist` dataset, which TFDS provides already split into a `train` set and a `test` set, so no manual splitting is needed.

In this example the splits contain:
- 60,000 images in the `train` split
- 10,000 images in the `test` split

In [4]:
# Load the Fashion-MNIST dataset

datasets, info = tfds.load(name='fashion_mnist', with_info=True, as_supervised=True, data_dir='./data')



Downloading and preparing dataset 29.45 MiB (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to ./data/fashion_mnist/3.0.1...
Dataset fashion_mnist downloaded and prepared to ./data/fashion_mnist/3.0.1. Subsequent calls will reuse this data.


In [5]:
fashion_mnist_train, fashion_mnist_test = datasets['train'], datasets['test']

# Explore & visualize Data  🔍 📊 👀

In [6]:
# Get the number of examples in each split.
print('Total Number of Training Images: {}'.format(len(fashion_mnist_train)))
print('Total Number of Test Images: {} \n'.format(len(fashion_mnist_test)))

Total Number of Training Images: 60000
Total Number of Test Images: 10000 



In [7]:

# Class names for Fashion-MNIST, listed in label order (0-9)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
num_classes = len(class_names)

print(class_names)
print('Total Number of Classes: {}'.format(num_classes))

['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
Total Number of Classes: 10


# Process Data  👀 🧐

In [8]:
# Function for normalizing the image
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label

**P.S.💡** Next, you create the training and test datasets: the training data is shuffled using the chosen buffer size, and both sets are batched with the chosen batch size. ⌛

# Initialize  Strategy   👀 🧐

In [9]:

# Define the strategy to use and print the number of devices found.
# HierarchicalCopyAllReduce is passed here in place of the default NCCL all-reduce.
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

Number of devices: 1


In [10]:
BUFFER_SIZE = 10000

BATCH_SIZE_PER_REPLICA = 64
# Use for Mirrored Strategy
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
# Use for No Strategy
# BATCH_SIZE = BATCH_SIZE_PER_REPLICA * 1
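The batch-size arithmetic above can be worked through without TensorFlow. The sketch below assumes the 60,000 training images reported earlier and tries a few hypothetical replica counts (1, 2, and 4); the helper names `global_batch_size` and `steps_per_epoch` are made up for this illustration.

```python
import math

NUM_TRAIN_EXAMPLES = 60000   # Fashion-MNIST training set size reported earlier
BATCH_SIZE_PER_REPLICA = 64  # same per-replica value as in the notebook


def global_batch_size(per_replica, num_replicas):
    """Each replica processes per_replica examples, so one step consumes this many."""
    return per_replica * num_replicas


def steps_per_epoch(num_examples, batch_size):
    """Round up: a final partial batch still counts as a step when it is not dropped."""
    return math.ceil(num_examples / batch_size)


for num_replicas in (1, 2, 4):  # hypothetical device counts
    batch = global_batch_size(BATCH_SIZE_PER_REPLICA, num_replicas)
    steps = steps_per_epoch(NUM_TRAIN_EXAMPLES, batch)
    print(f"replicas={num_replicas} global_batch={batch} steps_per_epoch={steps}")
```

With a single device the global batch size equals the per-replica batch size, which is why the "Mirrored Strategy" and "No Strategy" settings give the same value in this notebook.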

In [11]:
# Set up the train and eval data set
train_dataset = fashion_mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
eval_dataset = fashion_mnist_test.map(scale).batch(BATCH_SIZE)
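To see what `shuffle(BUFFER_SIZE)` followed by `batch(BATCH_SIZE)` does conceptually, here is a small pure-Python sketch of buffer-based shuffling and batching. It mimics the documented behaviour of `tf.data` (fill a buffer, then repeatedly emit a random buffered element and replace it with the next input element), but it is not the actual implementation; `buffered_shuffle` and `make_batches` are hypothetical helper names.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in randomized order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
        else:
            # Buffer is full: emit a random element and put the new item in its slot.
            index = rng.randrange(buffer_size)
            yield buffer[index]
            buffer[index] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def make_batches(items, batch_size):
    """Group items into lists of batch_size; the last batch may be smaller."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


examples = list(range(10))
shuffled = list(buffered_shuffle(examples, buffer_size=4, seed=0))
batches = list(make_batches(shuffled, batch_size=4))

print(shuffled)
print([len(batch) for batch in batches])
```

A buffer smaller than the dataset only shuffles approximately, since an element can never be emitted before it has entered the buffer. With `BUFFER_SIZE = 10000` and 60,000 training images, the shuffle in this notebook is therefore not perfectly uniform.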

# Build Model  ⚙️🏗️ 

Now define the strategy with the `MirroredStrategy()` class and print the number of devices it found.

Note:

- If you are running this on a machine without GPUs, TensorFlow logs a `warning` that no GPU devices are present.
- If you are running this in Colab, make sure the Runtime type is set to `GPU` so that the device is detected; only 1 device will be available.

- **Basic Mirrored Strategy**
- **Running [[ GPU P100  ]] Mode on Kaggle**
- **Providing `Only One GPU`**

In [12]:
# Running [[ ONE GPU ]] on Kaggle
# Define the strategy to use and print the number of devices found
strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

Number of devices: 1


In [13]:
BUFFER_SIZE = 10000

BATCH_SIZE_PER_REPLICA = 64
# Use for Mirrored Strategy
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
# Use for No Strategy
# BATCH_SIZE = BATCH_SIZE_PER_REPLICA * 1

In [14]:

with strategy.scope():
    model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10)
    ])
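The last `Dense(10)` layer in the model has no activation, so the model outputs raw scores (logits) rather than probabilities. As a rough sketch in plain Python (not the Keras implementation), the relationship between logits, softmax probabilities, and a sparse categorical cross-entropy computed directly from logits looks like this; the function names and the example logits are assumptions made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_ce_from_logits(logits, label):
    """Cross-entropy for an integer label, computed directly from logits."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 0.0, -1.0]  # hypothetical scores for a 3-class problem
probs = softmax(logits)
loss = sparse_ce_from_logits(logits, label=0)

print(probs)
print(loss)
```

The loss computed from logits equals the negative log of the softmax probability of the true class, so a loss that expects probabilities would give different values if it were fed these raw scores.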




In [15]:
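# Configure the optimizer, loss, and metrics.
# The final Dense layer has no activation, so the model outputs logits
# and the loss must be told so via from_logits=True.
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])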

# display summary
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 5408)              0         
                                                                 
 dense (Dense)               (None, 64)                346176    
                                                                 
 dense_1 (Dense)             (None, 10)                650       
                                                                 
Total params: 347,146
Trainable params: 347,146
Non-trainable params: 0
__________________________________________________

**P.S.💡**
The model has `347,146` parameters in total, all of them trainable (`0` non-trainable), so it is far smaller than a typical pretrained image model. Training it on a `CPU` is feasible, but it will generally take longer than on a GPU.
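The output shapes and parameter counts in the model summary can be reproduced by hand. The sketch below assumes the Keras defaults used by the model definition (a 3x3 convolution with `valid` padding and stride 1, and 2x2 max pooling); the helper names `conv2d_params` and `dense_params` are made up for this example.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights of a square-kernel Conv2D layer plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights of a fully connected layer plus one bias per unit."""
    return inputs * units + units


# Spatial sizes for a 28x28x1 input.
conv_out = 28 - 3 + 1                    # 26: 3x3 kernel, no padding, stride 1
pool_out = conv_out // 2                 # 13: 2x2 max pooling
flat_size = pool_out * pool_out * 32     # 5408 values after Flatten

# Parameters per layer.
conv_count = conv2d_params(3, 1, 32)     # 320
dense_count = dense_params(flat_size, 64)  # 346176
output_count = dense_params(64, 10)      # 650

total = conv_count + dense_count + output_count
print(conv_out, pool_out, flat_size)
print(conv_count, dense_count, output_count, total)
```

Almost all of the parameters sit in the first `Dense` layer, because it connects every one of the 5,408 flattened values to each of its 64 units.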

# Train Model  🔥 🌡️ 

Let the magic begin! 🔮

In [20]:
with strategy.scope():
    EPOCHS = 10
    start = time.time()
    model.fit(train_dataset, epochs=EPOCHS)
    end = time.time()



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


**P.S.💡** The `time` library is used to measure how long training takes, with the number of `epochs` set to `10`.

In [21]:
duration = round(end - start, 2)

display(HTML(f"<h5><b >The duration required for the model to train using Basic Mirrored Strategy : </b> <b style='color:red'>{duration} Seconds 🧐  ✨. </b></h5>"))
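If you want to time several runs, for example to compare strategies, the start/end pattern used above can be wrapped in a small helper. This is only a sketch: the `timed` context manager and the `durations` dictionary are hypothetical names, and `time.sleep` stands in for the call to `model.fit`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(results, key):
    """Store the wall-clock duration of the enclosed block in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the block raises an exception.
        results[key] = round(time.perf_counter() - start, 2)


durations = {}

with timed(durations, 'basic_mirrored'):
    time.sleep(0.05)  # placeholder for model.fit(train_dataset, epochs=EPOCHS)

print(durations)
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic clock intended for measuring elapsed intervals.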