<a href="https://colab.research.google.com/github/Anjasfedo/Learning-TensorFlow/blob/main/eat_tensorflow2_in_30_days/Chapter6_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6-4 Model Training Using Multiple GPUs

We recommend the built-in `fit` method for training on multiple GPUs; it requires only two additional lines of code.

In a Colab notebook, select a GPU under Edit -> Notebook Settings -> Hardware Accelerator.

Introduction to MirroredStrategy:
- Before training starts, the strategy places a complete copy of the model on each of the N computing devices.
- When a batch of training data arrives, it is divided into N portions, and one portion is sent to each device (data parallelism).
- Each device uses its local variables (mirrored variables) to compute the gradient for the portion of data it received.
- An all-reduce operation exchanges the gradients across the devices in parallel and sums them, so that every device obtains the gradient sum from all the devices.
- Each device updates its local variables (mirrored variables) using the gradient sum.
- The next round of training starts only after all the devices have updated their local variables (this is a fully synchronous strategy).
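
The steps above can be sketched as a toy, pure-Python simulation. This is only an illustration under simplifying assumptions, not how `tf.distribute.MirroredStrategy` is implemented: the "devices" are list entries, the model is a single weight `w` fitted to hypothetical data, and the helper names (`split_batch`, `local_gradient`, `all_reduce_sum`, `train_step`) are invented for this example.

```python
# Toy simulation of one synchronous data-parallel training loop.
# Pure Python, no TensorFlow: each "device" is just an entry in a list.

def split_batch(batch, n_devices):
    """Divide one global batch into n_devices contiguous, equally sized shards."""
    shard = len(batch) // n_devices
    return [batch[i * shard:(i + 1) * shard] for i in range(n_devices)]

def local_gradient(w, shard):
    """Summed gradient of the squared error (w*x - y)^2 over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard)

def all_reduce_sum(values):
    """After the all-reduce, every device holds the sum of all per-device values."""
    total = sum(values)
    return [total] * len(values)

def train_step(replica_weights, batch, lr):
    """One fully synchronized step; returns the updated weight on each device."""
    shards = split_batch(batch, len(replica_weights))
    grads = [local_gradient(w, s) for w, s in zip(replica_weights, shards)]
    summed = all_reduce_sum(grads)
    # Every device applies the same averaged gradient to its own copy.
    return [w - lr * g / len(batch) for w, g in zip(replica_weights, summed)]

# Hypothetical data following y = 3x, trained on two simulated devices.
batch = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
weights = [0.0, 0.0]  # identical copies before training
for _ in range(50):
    weights = train_step(weights, batch, lr=0.05)
print(weights)
```

Because every replica applies the same summed gradient, the copies of `w` stay identical after each step, which is what keeps the mirrored variables in sync.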

In [2]:
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)
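# Import only the Keras submodules used in this notebook
from tensorflow.keras import datasets, layers, losses, metrics, models, optimizers, preprocessing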

Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.
2.17.0


In [3]:
# Simulate two logical GPUs on one physical GPU
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Split the first physical GPU into two logical GPUs with 1024 MB of memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Logical devices must be configured before the GPU has been initialized
    print(e)

1 Physical GPU, 2 Logical GPUs


## 1. Data Preparation

In [4]:
MAX_LEN = 300
BATCH_SIZE = 32

(x_train, y_train), (x_test, y_test) = datasets.reuters.load_data()
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=MAX_LEN)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=MAX_LEN)

MAX_WORDS = x_train.max() + 1
CAT_NUM = y_train.max() + 1

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/reuters.npz
2110848/2110848 ━━━━━━━━━━━━━━━━━━━━ 1s 1us/step


In [5]:
# Cache the raw examples first so that shuffling is redone every epoch,
# and prefetch last so that input preparation overlaps with training.
ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
          .cache().shuffle(buffer_size=1000).batch(BATCH_SIZE) \
          .prefetch(tf.data.AUTOTUNE)

# The test set is only evaluated, so it does not need to be shuffled.
ds_test = tf.data.Dataset.from_tensor_slices((x_test, y_test)) \
          .batch(BATCH_SIZE).cache() \
          .prefetch(tf.data.AUTOTUNE)
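
As a rough check on how the batches are distributed, the arithmetic can be worked out by hand. The sketch below assumes that the default Keras split of the Reuters data yields 8,982 training samples, that `BATCH_SIZE` acts as the global batch size, and that `MirroredStrategy` divides each global batch evenly across the two logical GPUs; `batch_plan` is a hypothetical helper, not a TensorFlow function.

```python
import math

def batch_plan(num_samples, global_batch_size, num_replicas):
    """Return (steps per epoch, samples per replica per step, size of the final batch)."""
    steps_per_epoch = math.ceil(num_samples / global_batch_size)
    per_replica = global_batch_size // num_replicas
    last_batch = num_samples - (steps_per_epoch - 1) * global_batch_size
    return steps_per_epoch, per_replica, last_batch

# Assumed values: 8,982 training samples, BATCH_SIZE = 32, two logical GPUs.
plan = batch_plan(num_samples=8982, global_batch_size=32, num_replicas=2)
print(plan)
```

Under these assumptions each epoch has 281 steps, which matches the `281/281` progress counter in the training log in section 3. Each replica processes 16 samples per step, and the final batch of an epoch holds the remaining 22 samples.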

## 2. Model Definition

In [9]:
tf.keras.backend.clear_session()

def create_model():
  model = models.Sequential()

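  # Declare the input shape explicitly; Embedding's `input_length` argument is deprecated in Keras 3
  model.add(layers.Input(shape=(MAX_LEN,)))
  model.add(layers.Embedding(MAX_WORDS, 7))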
  model.add(layers.Conv1D(filters=64, kernel_size=5, activation='relu'))
  model.add(layers.MaxPool1D(2))
  model.add(layers.Conv1D(filters=32, kernel_size=2, activation='relu'))
  model.add(layers.MaxPool1D(2))
  model.add(layers.Flatten())
  model.add(layers.Dense(CAT_NUM, activation='softmax'))

  return model
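
To see what the layer stack does to a padded sequence, the output lengths and parameter counts can be traced by hand. The sketch below is only a sanity check under stated assumptions: the Keras defaults for `Conv1D` (`padding='valid'`, `strides=1`) and `MaxPool1D` (stride equal to the pool size), and 46 output classes for the Reuters labels. The helper names are hypothetical.

```python
def conv1d_out(length, kernel_size):
    """Output length of a Conv1D layer with padding='valid' and strides=1."""
    return length - kernel_size + 1

def pool1d_out(length, pool_size):
    """Output length of a MaxPool1D layer whose stride equals its pool size."""
    return length // pool_size

def conv1d_params(kernel_size, in_channels, filters):
    """Number of weights plus biases in a Conv1D layer."""
    return kernel_size * in_channels * filters + filters

length = 300                         # MAX_LEN
length = conv1d_out(length, 5)       # Conv1D(filters=64, kernel_size=5)
length = pool1d_out(length, 2)       # MaxPool1D(2)
length = conv1d_out(length, 2)       # Conv1D(filters=32, kernel_size=2)
length = pool1d_out(length, 2)       # MaxPool1D(2)
flat_features = length * 32          # Flatten over 32 channels

conv1_params = conv1d_params(5, 7, 64)     # embedding dimension 7 -> 64 filters
conv2_params = conv1d_params(2, 64, 32)    # 64 channels -> 32 filters
num_classes = 46                           # assumed value of CAT_NUM
dense_params = flat_features * num_classes + num_classes

print(length, flat_features, conv1_params, conv2_params, dense_params)
```

The embedding layer is left out of the count because its size depends on `MAX_WORDS`, which is computed from the data at run time.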

In [10]:
def compile_model(model):
  model.compile(optimizer=optimizers.Nadam(),
                loss=losses.SparseCategoricalCrossentropy(),
                metrics=[metrics.SparseCategoricalAccuracy(),
                         metrics.SparseTopKCategoricalAccuracy(5)])
  return model
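
To make the second metric concrete: top-k accuracy counts a prediction as correct when the true label is among the k classes with the highest predicted probability. The following is a minimal pure-Python sketch of that idea, not the Keras implementation (tie handling, for example, may differ); `sparse_top_k_accuracy` and the sample data are hypothetical.

```python
def sparse_top_k_accuracy(y_true, y_prob, k):
    """Fraction of samples whose integer label is among the k most probable classes."""
    hits = 0
    for label, probs in zip(y_true, y_prob):
        # Class indices sorted from most to least probable, truncated to the top k.
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(y_true)

# Three hypothetical samples over three classes.
y_true = [2, 0, 1]
y_prob = [
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
top1 = sparse_top_k_accuracy(y_true, y_prob, k=1)
top2 = sparse_top_k_accuracy(y_true, y_prob, k=2)
print(top1, top2)
```

With `k=1` the metric reduces to ordinary accuracy, so the top-5 value reported during training can never be lower than the plain accuracy on the same data.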

## 3. Model Training

In [12]:
# The only two lines added for multi-GPU training: create the strategy,
# then build and compile the model inside its scope
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
  model = create_model()
  model.summary()
  model = compile_model(model)

In [13]:
history = model.fit(ds_train, validation_data=ds_test, epochs=10)

Epoch 1/10
281/281 ━━━━━━━━━━━━━━━━━━━━ 10s 19ms/step - loss: 2.4580 - sparse_categorical_accuracy: 0.3721 - sparse_top_k_categorical_accuracy: 0.7160 - val_loss: 1.7239 - val_sparse_categorical_accuracy: 0.5494 - val_sparse_top_k_categorical_accuracy: 0.7640
Epoch 2/10
281/281 ━━━━━━━━━━━━━━━━━━━━ 9s 15ms/step - loss: 1.6087 - sparse_categorical_accuracy: 0.5834 - sparse_top_k_categorical_accuracy: 0.7738 - val_loss: 1.6252 - val_sparse_categorical_accuracy: 0.5797 - val_sparse_top_k_categorical_accuracy: 0.7809
Epoch 3/10
281/281 ━━━━━━━━━━━━━━━━━━━━ 6s 20ms/step - loss: 1.2615 - sparse_categorical_accuracy: 0.6634 - sparse_top_k_categorical_accuracy: 0.8286 - val_loss: 1.7759 - val_sparse_categorical_accuracy: 0.5779 - val_sparse_top_k_categorical_accuracy: 0.7916
Epoch 4/10
281/281 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - loss: 0.9073 - sparse_categoric