
## Step 1: Install dependencies and set up the environment

In [2]:
# Optional: uncomment to pin a TensorFlow version (the run recorded below used 2.4.1).
#!pip install tensorflow-gpu==2.3.0

## Step 2: Import the project dependencies

In [3]:
import time
import numpy as np
import tensorflow as tf

In [4]:
tf.__version__

'2.4.1'

## Step 3: Preprocess the dataset

### Load the MNIST dataset

In [5]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


### Normalize the images

In [6]:
X_train = X_train / 255.
X_test = X_test / 255.

In [7]:
X_train.shape

(60000, 28, 28)
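As a minimal sketch of what the division above does, the snippet below applies the same scaling to a small synthetic array instead of the real MNIST data. The array `images` and the optional `float32` cast are assumptions added for illustration; the notebook itself keeps the result of the division unchanged.

```python
import numpy as np

# Synthetic stand-in for a few MNIST images (uint8 pixels in 0..255).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by the float 255. promotes uint8 to float64 and maps pixels into [0, 1].
scaled = images / 255.

# Optional: float32 uses half the memory of float64.
scaled32 = scaled.astype(np.float32)

print(scaled.dtype, scaled32.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Keeping inputs in a small, fixed range such as [0, 1] generally makes gradient-based training better behaved than feeding raw 0..255 values.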

### Reshape the dataset

In [8]:
X_train = X_train.reshape(-1, 28*28)
X_test = X_test.reshape(-1, 28*28)

In [9]:
X_train.shape

(60000, 784)
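The reshape above flattens each 28x28 image into a 784-element row so it can feed a `Dense` layer. Here is a small sketch of the same operation on a hypothetical three-image batch (`batch` is made up for illustration), showing where each pixel lands and that no information is lost.

```python
import numpy as np

# Hypothetical mini-batch: 3 images of 28x28, filled with distinct values.
batch = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# -1 lets NumPy infer the first dimension (number of images) from the total size.
flat = batch.reshape(-1, 28 * 28)

# Rows are laid out in C order: pixel (r, c) lands at column r * 28 + c.
r, c = 5, 7
same_pixel = flat[1, r * 28 + c] == batch[1, r, c]

# The operation is lossless: reshaping back recovers the original images.
restored = flat.reshape(-1, 28, 28)

print(flat.shape, bool(same_pixel), np.array_equal(restored, batch))
```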

## Step 4: Distributed training

### Define a regular (non-distributed) model

In [10]:
model_normal = tf.keras.models.Sequential()

In [11]:
model_normal.add(tf.keras.layers.Dense(units=128, activation='relu', input_shape=(784,)))

In [12]:
model_normal.add(tf.keras.layers.Dropout(rate=0.2))

In [13]:
model_normal.add(tf.keras.layers.Dense(units=10, activation='softmax'))
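To get a feel for the size of this network, the trainable parameters can be counted by hand. The helper `dense_params` below is a hypothetical name introduced for this sketch. It relies only on the fact that a dense layer holds one weight per input-unit pair plus one bias per unit, and that dropout has no parameters.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 128)   # Dense(128) on the flattened 784-pixel input
dropout = 0                       # Dropout only masks activations
output = dense_params(128, 10)    # Dense(10), one unit per digit class

total = hidden + dropout + output
print(hidden, output, total)  # 100480 1290 101770
```

The total can be compared against the figure reported by `model_normal.summary()`.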

In [14]:
model_normal.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
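The `sparse_` prefix in the loss and metric names means the labels are plain integers (0 to 9 here) rather than one-hot vectors. The following is a simplified NumPy sketch of both quantities on made-up probabilities. It is not the Keras implementation, which among other things clips probabilities to avoid `log(0)`.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=np.float64)
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

def sparse_categorical_accuracy(y_true, y_prob):
    """Fraction of rows whose most probable class equals the integer label."""
    return float(np.mean(np.argmax(y_prob, axis=1) == np.asarray(y_true)))

# Illustrative values only: two examples, three classes.
labels = [2, 0]
probs = [[0.1, 0.2, 0.7],
         [0.5, 0.3, 0.2]]

loss = sparse_categorical_crossentropy(labels, probs)
acc = sparse_categorical_accuracy(labels, probs)
print(round(loss, 4), acc)
```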

### Define a distribution strategy

In [18]:
distribute = tf.distribute.MirroredStrategy()

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
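The log line above lists a single device, so this strategy runs with one replica. When Keras `fit` runs under a `tf.distribute` strategy, `batch_size` is the global batch, which is divided among the replicas. The sketch below works through that arithmetic. `per_replica_batch_size` is a hypothetical helper, and the five-device case is an invented scenario for comparison.

```python
import math

def per_replica_batch_size(global_batch_size: int, num_replicas: int) -> int:
    """Hypothetical helper: split a global batch evenly across replicas."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    if global_batch_size % num_replicas:
        # Restriction of this sketch only, to keep the arithmetic exact.
        raise ValueError("global batch must divide evenly in this sketch")
    return global_batch_size // num_replicas

num_examples = 60000   # training images, from X_train.shape
global_batch = 25      # the batch_size passed to fit in this notebook

# Steps per epoch depend on the global batch, not on the replica count.
steps_per_epoch = math.ceil(num_examples / global_batch)

one_device = per_replica_batch_size(global_batch, 1)    # the single-device run logged above
five_devices = per_replica_batch_size(global_batch, 5)  # hypothetical 5-GPU machine
print(steps_per_epoch, one_device, five_devices)  # 2400 25 5
```

In a live session the replica count is available as `distribute.num_replicas_in_sync`. With only one replica, each step processes the same 25 examples as the non-distributed model.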


### Define a distributed model

In [19]:
with distribute.scope():
  model_distributed = tf.keras.models.Sequential()
  model_distributed.add(tf.keras.layers.Dense(units=128, activation='relu', input_shape=(784,)))
  model_distributed.add(tf.keras.layers.Dropout(rate=0.2))
  model_distributed.add(tf.keras.layers.Dense(units=10, activation='softmax'))
  model_distributed.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])

INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
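`MirroredStrategy` performs synchronous data-parallel training: every replica holds a copy of the variables, computes gradients on its share of the batch, and the gradients are combined across replicas so that all copies apply the same update. The "Reduce ... then broadcast" messages above appear to report that combine-and-share step. The NumPy sketch below imitates the idea with a toy linear model and two simulated replicas. All names and data are invented for illustration, and it does not reproduce TensorFlow internals.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # one global batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)               # weights, mirrored on every replica

def grad_sum(Xs, ys, w):
    """Sum over examples of the gradient of 0.5 * (x.w - y)**2."""
    return Xs.T @ (Xs @ w - ys)

global_batch = len(X)
full_grad = grad_sum(X, y, w) / global_batch   # single-device reference

# Each replica sees a shard and scales by the GLOBAL batch size.
num_replicas = 2
shards = zip(np.array_split(X, num_replicas), np.array_split(y, num_replicas))
per_replica = [grad_sum(Xs, ys, w) / global_batch for Xs, ys in shards]

# "Reduce": sum the per-replica gradients. "Broadcast": every replica
# applies the same reduced gradient, so the weight copies stay identical.
reduced = np.sum(per_replica, axis=0)
w_replicas = [w - 0.1 * reduced for _ in range(num_replicas)]

print(np.allclose(reduced, full_grad))
```

Because the reduced gradient equals the single-device gradient for the same global batch, adding replicas changes how the work is divided, not the update itself.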


### Compare regular vs. distributed training speed

In [20]:
start_time = time.time()
model_distributed.fit(X_train, y_train, epochs=10, batch_size=25)
print("Distributed training took: {} s".format(time.time() - start_time))

Epoch 1/10
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Distributed training took: 81.56268 s

In [21]:
start_time = time.time()
model_normal.fit(X_train, y_train, epochs=10, batch_size=25)
print("Regular training took: {} s".format(time.time() - start_time))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Regular training took: 54.373366832733154 s
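Both timing cells repeat the same start/stop pattern. One possible way to factor it out is a small context manager, sketched below. `stopwatch` and `timings` are hypothetical names, `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic clock intended for measuring intervals, and the loop is only a stand-in for a `fit` call.

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label, results):
    """Store the wall-clock seconds spent inside the with-block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with stopwatch("demo", timings):
    # Stand-in workload; in the notebook this would be a model.fit(...) call.
    sum(i * i for i in range(100_000))

print(timings)
```

In the run recorded above, the distributed model took about 1.5 times as long as the regular one (roughly 81.6 s versus 54.4 s). Since the strategy log lists a single device, there is no second replica to share the work, so no speedup would be expected. The extra time is consistent with coordination overhead, although a single run is not a rigorous benchmark.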
