<h1><center>TensorFlow Large Model Support (TFLMS) Tutorial - 02</center></h1>

## Large Model Support

TensorFlow Large Model Support (TFLMS) is a Python module for training large models and data that would not normally fit into GPU memory. It takes a user-defined computational graph and automatically adds swap-in and swap-out nodes that transfer tensors from the GPU to the host and back. During training and inference, this makes graph execution behave like operating system memory paging: system memory is effectively treated as a paging cache for GPU memory, and tensors are swapped back and forth between GPU memory and CPU memory.
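
To make the paging analogy concrete, here is a deliberately simplified toy model. It is not how TFLMS is implemented; the function name `peak_resident`, the layer sizes, and the rule "only the current layer's input and output stay on the GPU" are all assumptions made for illustration. It compares the peak amount of activation memory resident on the GPU during the forward pass of a simple layer chain, with and without swapping activations out to the host.

```python
def peak_resident(activation_sizes, swap=False):
    """Peak 'GPU' memory (arbitrary units) during the forward pass of a layer chain.

    activation_sizes[i] is the size of the i-th activation tensor in the chain.
    This is a toy model for illustration only, not the TFLMS algorithm.
    """
    if not swap:
        # Without swapping, every activation stays on the GPU because the
        # backward pass will need it later, so the peak is the total.
        return sum(activation_sizes)
    # With swapping (assumed rule): an activation is moved to host memory as
    # soon as the next layer has consumed it, so at most one layer's input
    # and output are resident at the same time.
    return max(a + b for a, b in zip(activation_sizes, activation_sizes[1:]))


# Hypothetical activation sizes for a four-tensor chain.
sizes = [4, 8, 8, 2]
print("peak without swapping:", peak_resident(sizes))
print("peak with swapping:   ", peak_resident(sizes, swap=True))
```

In this toy setting the swapped run needs less resident memory at the cost of extra host transfers, which is the same trade-off the paragraph above describes.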

Click <a href= https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.4/navigation/pai_tflms.html>here</a> for further information on how to use TFLMS and best practices.

TFLMS source code is publicly available as a pull request in the <a href= https://github.com/tensorflow/tensorflow/pull/19845>TensorFlow repository</a>.

Here are links to blog posts, papers, and videos that describe TensorFlow Large Model support, use cases, and performance characteristics.

1) <a href = https://arxiv.org/pdf/1807.02037.pdf> TFLMS: Large Model Support in TensorFlow by Graph Rewriting </a>

2) <a href = https://www.youtube.com/watch?v=wdVPh3tUQ5A> 4-minute introduction to TensorFlow Large Model Support </a> – This video is a good, quick introduction to TensorFlow Large Model Support. Note that the performance numbers at the end of the video are now outdated. See the performance links below for updated numbers.

3) <a href = https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9426-using+tensor+swapping+and+nvlink+to+overcome+gpu+memory+limits+with+tensorflow/> NVIDIA GPU Technology Conference 2019 presentation </a> – A 40-minute presentation on using TFLMS to overcome GPU memory limits, and on the performance characteristics of TFLMS.

4) <a href = https://developer.ibm.com/linuxonpower/2019/05/17/performance-results-with-tensorflow-large-model-support-v2/> Performance results with TensorFlow Large Model Support v2 </a>

5) <a href = https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/> A case study using TensorFlow Large Model Support with 3D U-Net for 3D image segmentation </a>

6) <a href = https://arxiv.org/abs/1812.07816> Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method </a> – This paper compares TFLMS with the patching method for large images, and with gradient checkpointing.

7) <a href = https://arxiv.org/abs/1811.12174> Data-parallel distributed training of very large models beyond GPU capacity </a> – This paper describes a real-world use case of TFLMS with IBM Distributed Deep Learning.

8) <a href = https://developer.ibm.com/linuxonpower/2018/12/19/performance-of-3dunet-multi-gpu-model-for-medical-image-segmentation-using-tensorflow-large-model-support/> Performance of 3DUnet Multi GPU Model for Medical Image Segmentation using TensorFlow Large Model Support </a> – This blog post contains performance comparisons of whole system training using TFLMS with IBM Distributed Deep Learning and Horovod on x86 and IBM AC922 servers.

## MNIST Classification using TFLMS and Keras

In this example we will train a CNN on the MNIST dataset, using TensorFlow Large Model Support together with Keras.

<b>Note: This example only demonstrates how to deploy LMS in Keras applications. The program runs just as well without TFLMS, because the model itself is not big enough to need it.</b>

In [1]:
from __future__ import print_function
import tensorflow as tf
from tensorflow.python import keras
from tensorflow.python.keras.datasets import mnist
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Dropout, Flatten
from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from tensorflow.python.keras import backend as K

from tensorflow_large_model_support import LMS
tf.logging.set_verbosity(tf.logging.INFO)





<br><br>
Load the MNIST dataset and split it into a training set and a test set.


## MNIST Dataset Overview

This example uses the MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 255.

In this example, each image will be converted to float32 and normalized to [0, 1].
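
As a small standalone illustration of that conversion, the sketch below applies the same two steps to a made-up 2x2 array instead of the real MNIST data. The array `raw` and its values are hypothetical. The cast has to come first: dividing a `uint8` NumPy array in place with `/=` raises a casting error, because the result of the division cannot be stored back as `uint8`.

```python
import numpy as np

# Hypothetical 2x2 "image" with raw uint8 pixel intensities in [0, 255].
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Cast to float32 first, then scale so every value lies in [0, 1].
scaled = raw.astype("float32")
scaled /= 255

print("dtype:", scaled.dtype)
print("min:", scaled.min(), "max:", scaled.max())
```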

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

More info: http://yann.lecun.com/exdb/mnist/

In [2]:
# input image dimensions
img_rows, img_cols = 28, 28
num_classes = 10
epochs = 12
batch_size = 128

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print('x_test shape:', x_test.shape)
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print('y_train shape:', y_train.shape)
print(y_train.shape[0], 'train labels')
print('y_test shape:', y_test.shape)
print(y_test.shape[0], 'test labels')

x_train shape: (60000, 28, 28, 1)
60000 train samples
x_test shape: (10000, 28, 28, 1)
10000 test samples
y_train shape: (60000, 10)
60000 train labels
y_test shape: (10000, 10)
10000 test labels
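
The label shapes change from `(60000,)` to `(60000, 10)` because `to_categorical` turns each integer class label into a one-hot row. The following is a minimal NumPy sketch of that idea, not the Keras implementation; the helper name `one_hot` and the sample labels are made up for illustration.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set column `labels[i]` of row `i` to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


# Three hypothetical digit labels, encoded over 10 classes.
example = one_hot([3, 0, 9], num_classes=10)
print(example.shape)
print(example[0])
```

Taking `argmax` along the last axis recovers the original integer labels, which is how predictions from the softmax layer are usually mapped back to digits.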



## CNN Overview

![CNN](http://personal.ie.cuhk.edu.hk/~ccloy/project_target_code/images/fig3.png)

In [3]:
def createModel():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
    return model
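
To see how small this model is, the sketch below traces the tensor shapes and trainable parameter counts through the layers defined in `createModel()`, assuming the default `channels_last` input shape of `(28, 28, 1)` and the Keras defaults for the layers (stride 1 and `'valid'` padding for `Conv2D`, stride equal to the pool size for `MaxPooling2D`). The helper functions are hypothetical and written only for this illustration; `model.summary()` gives the same information directly.

```python
def conv2d_valid(h, w, c_in, filters, k):
    """Shape and parameter count of a stride-1, 'valid' k x k convolution."""
    params = (k * k * c_in + 1) * filters  # +1 for the bias of each filter
    return (h - k + 1, w - k + 1, filters), params


def max_pool(h, w, c, pool):
    """Shape after non-overlapping pool x pool max pooling (no parameters)."""
    return (h // pool, w // pool, c)


def dense(units_in, units_out):
    """Output width and parameter count of a fully connected layer."""
    return units_out, (units_in + 1) * units_out  # +1 for the bias


total = 0
shape = (28, 28, 1)                                   # input image
shape, p = conv2d_valid(*shape, filters=32, k=3)      # Conv2D(32, (3, 3))
total += p
shape, p = conv2d_valid(*shape, filters=64, k=3)      # Conv2D(64, (3, 3))
total += p
shape = max_pool(*shape, pool=2)                      # MaxPooling2D((2, 2))
flat = shape[0] * shape[1] * shape[2]                 # Flatten (Dropout keeps shapes)
width, p = dense(flat, 128)                           # Dense(128)
total += p
width, p = dense(width, 10)                           # Dense(num_classes)
total += p

print("shape before Flatten:", shape)
print("flattened size:", flat)
print("trainable parameters:", total)
```

Roughly 1.2 million parameters is far below what GPU memory can hold, which is consistent with the note above that this model does not need TFLMS.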

LMS() is the Keras callback that activates Large Model Support. If no tuning parameters are passed to LMS(), auto-tuning will determine that TFLMS is not needed for this model and disable it, so the cell below passes explicit values.

In [None]:
model = createModel()
lms_callback = LMS(swapout_threshold=40, swapin_ahead=3, swapin_groupby=2)

# Run "watch -n0.1 nvidia-smi" in another terminal to see the GPU utilization
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=2,
          validation_data=(x_test, y_test),
          callbacks=[lms_callback])
score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])
