
Using Keras with MXNet in Multi GPU mode

Sandeep Krishnamurthy edited this page Jul 4, 2017 · 9 revisions

Table of contents

  1. Overview
  2. Objective
  3. Prerequisites
  4. Prepare the Data
  5. Build the Model
  6. Compile the Model with Multi-GPU support
  7. Train the Model
  8. References

Overview

In this tutorial, we use Keras with the MXNet backend on a multi-GPU machine to train a Convolutional Neural Network (CNN) on the CIFAR10 small images dataset.

Objective

The main objective of this tutorial is to show how to train a neural network on multiple GPUs using Keras with the MXNet backend.

If you already use Keras and only want to know how to train on multiple GPUs with the MXNet backend, it is as simple as passing a list of GPU IDs when you compile the model.

For example, to train with 4 GPUs, compile the model with context set to the list ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]:

# Compile the model by specifying context with the list of GPU IDs to use during training.
# (opt is an optimizer instance, e.g. the RMSprop optimizer created later in this tutorial.)
gpu_list = ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'], context=gpu_list)

That's it! The MXNet backend will use the 4 GPUs to train your model.

The rest of this page is a more detailed tutorial that trains a CNN on the CIFAR10 small images dataset using Keras with the MXNet backend.

Prerequisites

  1. GPU machine with CUDA and cuDNN installed
  2. Keras
  3. MXNet with GPU support

Follow the step-by-step installation instructions here to set up Keras with the MXNet backend on your machine.

Prepare the Data

CIFAR10 is a dataset of 50,000 32x32 color training images and 10,000 test images, labeled over 10 categories. Load the CIFAR10 dataset using Keras's dataset utility.

We will use categorical cross entropy to compute the loss during training, so convert the integer class labels (0 to 9) in the training and test sets to a one-hot (binary class matrix) representation using the to_categorical function.

import keras
from keras.datasets import cifar10

num_classes = 10

# The data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# Convert class vectors to binary class matrices.
Y_train = keras.utils.np_utils.to_categorical(y_train, num_classes)
Y_test = keras.utils.np_utils.to_categorical(y_test, num_classes)
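To show what the conversion produces, the sketch below reimplements one-hot encoding in NumPy. It is an illustrative equivalent, not Keras's own to_categorical, and the helper name one_hot is hypothetical. Note that cifar10.load_data() returns labels with shape (N, 1), so the sketch flattens them first.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: convert integer class labels to a one-hot matrix.

    labels: integer array of shape (N,) or (N, 1), values in [0, num_classes).
    Returns a float32 array of shape (N, num_classes).
    """
    flat = np.asarray(labels, dtype=np.int64).ravel()
    encoded = np.zeros((flat.shape[0], num_classes), dtype=np.float32)
    # Put a single 1 in each row, in the column given by that row's label
    encoded[np.arange(flat.shape[0]), flat] = 1.0
    return encoded

# Labels shaped like CIFAR10's y_train: one integer per row
y = np.array([[6], [9], [0]])
print(one_hot(y, 10))
```

Each row of the result contains exactly one 1, which is the form categorical cross entropy expects for its targets.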

Build the Model

Build a Sequential CNN with two convolutional blocks (each with two 3x3 convolutions, max pooling, and dropout) followed by a fully connected classifier with a 10-way softmax output. We do not dive deep into the architectural details of the network and instead focus on the objective of this tutorial: showcasing how to use multiple GPUs in Keras with the MXNet backend.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D

model = Sequential()

model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
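To make the architecture concrete, the sketch below walks a 32x32x3 CIFAR10 image through the layers above and counts trainable parameters by hand. It assumes stride-1 3x3 kernels, that border_mode='same' preserves the spatial size while the default 'valid' mode shrinks it by 2, and that 2x2 max pooling floors odd sizes. The helper names are hypothetical.

```python
def conv_output_size(size, kernel=3, padding="valid"):
    # Stride-1 convolution: 'same' keeps the size, 'valid' loses (kernel - 1) pixels
    return size if padding == "same" else size - kernel + 1

def conv_params(in_channels, filters, kernel=3):
    # One kernel x kernel x in_channels weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # Full weight matrix plus one bias per unit
    return inputs * units + units

size, channels, total = 32, 3, 0
# (filters, padding) for the four Convolution2D layers; pooling follows every second one
conv_layers = [(32, "same"), (32, "valid"), (64, "same"), (64, "valid")]
for i, (filters, padding) in enumerate(conv_layers):
    total += conv_params(channels, filters)
    size = conv_output_size(size, padding=padding)
    channels = filters
    if i % 2 == 1:
        size //= 2  # MaxPooling2D(pool_size=(2, 2)) halves the size, flooring odd values

# Activation and Dropout layers have no trainable parameters
flattened = size * size * channels
total += dense_params(flattened, 512)  # Dense(512)
total += dense_params(512, 10)         # Dense(num_classes)

print("spatial size before Flatten:", size)
print("flattened features:", flattened)
print("trainable parameters:", total)
```

Under these assumptions the script reports a 6x6x64 feature map, 2,304 flattened features, and 1,250,858 trainable parameters; most of them sit in the first Dense layer.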

Compile the Model with Multi-GPU support

To train on multiple GPUs with the MXNet backend, pass a list of GPU IDs as the context parameter when compiling the model. MXNet reads this context and uses the specified GPUs while training the model.

For example, to train with 4 GPUs, compile the model with context set to the list ["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"]. The code below builds this list for NUM_GPU GPUs:

# Initiate RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
    
# Prepare GPU list to be used in training
NUM_GPU = 4
gpu_list = []
for i in range(NUM_GPU):
    gpu_list.append('gpu(%d)' % i)

# Let's train the model using RMSprop. Specify context with list of GPU IDs to be used during training.
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'], context=gpu_list)
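With a list of GPUs as its context, MXNet trains in a data-parallel fashion: each GPU runs a copy of the model on a slice of every batch, and the gradients are aggregated before the weights are updated. The sketch below only illustrates the slicing arithmetic. It assumes an even split with any remainder given to the first devices; split_batch is a hypothetical helper, not MXNet's internal code.

```python
def split_batch(batch_size, num_devices):
    """Hypothetical helper: return (start, stop) index ranges, one per device.

    Samples are divided as evenly as possible; if batch_size is not divisible
    by num_devices, the first (batch_size % num_devices) devices get one extra.
    """
    if num_devices < 1:
        raise ValueError("need at least one device")
    base, extra = divmod(batch_size, num_devices)
    slices, start = [], 0
    for device in range(num_devices):
        stop = start + base + (1 if device < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices

# batch_size = 32 (used later in this tutorial) across the 4 GPUs in gpu_list
for device, (start, stop) in enumerate(split_batch(32, 4)):
    print("gpu(%d): samples %d to %d" % (device, start, stop - 1))
```

One practical consequence: each GPU sees only about batch_size / NUM_GPU samples per step, so with batch_size = 32 and 4 GPUs each device processes 8 images. You may want to raise batch_size to keep every GPU busy.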

Train the Model

Most deep learning models for images benefit from some form of image augmentation (randomly modifying the training images), which can improve accuracy and convergence. Keras provides a powerful collection of image augmentation techniques through the ImageDataGenerator class. ImageDataGenerator augments images during training, i.e., it performs just-in-time augmentation and feeds the augmented images to the network.

We first create an ImageDataGenerator object, datagen, specifying the augmentations to apply to the CIFAR10 training images, for example width_shift_range, height_shift_range, and horizontal_flip.

The data generator is then fit to the training data. During training, its flow() method iterates over the training data and yields augmented batches of batch_size images, which fit_generator consumes.

from keras.preprocessing.image import ImageDataGenerator

batch_size = 32
epochs = 200

# This will do preprocessing and realtime data augmentation:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # do not flip images vertically

# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(X_train)

# Fit the model on the batches generated by datagen.flow().
history = model.fit_generator(datagen.flow(X_train, Y_train,
                                           batch_size=batch_size),
                              samples_per_epoch=X_train.shape[0],
                              nb_epoch=epochs,
                              validation_data=(X_test, Y_test))
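The just-in-time augmentation that ImageDataGenerator performs can be approximated in a few lines of NumPy. The sketch below is a simplified stand-in, not Keras's implementation: it shifts by whole pixels and fills the exposed border by repeating edge pixels (close in spirit to the default fill_mode='nearest'), it assumes channels-last (height, width, channels) images, and the helper names augment and flow are hypothetical.

```python
import numpy as np

def augment(image, rng, width_shift_range=0.1, height_shift_range=0.1,
            horizontal_flip=True):
    """Randomly shift and flip one channels-last image of shape (h, w, c)."""
    h, w = image.shape[:2]
    max_dy = int(round(height_shift_range * h))
    max_dx = int(round(width_shift_range * w))
    # Pad by the maximum shift, repeating edge pixels, then crop at a random offset
    padded = np.pad(image, ((max_dy, max_dy), (max_dx, max_dx), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * max_dy + 1)
    left = rng.integers(0, 2 * max_dx + 1)
    out = padded[top:top + h, left:left + w]
    # Mirror left-right with probability 0.5
    if horizontal_flip and rng.random() < 0.5:
        out = out[:, ::-1]
    return out

def flow(x, y, batch_size, rng):
    """Yield augmented (x_batch, y_batch) pairs forever, reshuffling each pass."""
    n = len(x)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Augmentation happens here, only when the batch is requested
            yield np.stack([augment(x[i], rng) for i in idx]), y[idx]

# Tiny random stand-in for CIFAR10 (shapes only; not real data)
rng = np.random.default_rng(0)
x_demo = rng.random((10, 32, 32, 3), dtype=np.float32)
y_demo = np.eye(10, dtype=np.float32)
batches = flow(x_demo, y_demo, batch_size=4, rng=rng)
x_batch, y_batch = next(batches)
print(x_batch.shape, y_batch.shape)
```

Like this sketch, the Keras generator loops indefinitely; in the Keras 1 API used here, fit_generator ends each epoch after samples_per_epoch samples.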

References