# 6-3 Model Training Using Single GPU

Training a deep learning model is usually time-consuming. Runs lasting hours or days are common, and some models take tens of days to train.

The time is mainly spent in two stages: data preparation and parameter iteration.

If data preparation takes most of the time, we can alleviate the problem by using more processes to prepare the data.
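As a rough illustration of parallel data preparation, the sketch below uses the `num_parallel_calls` argument of `tf.data.Dataset.map` together with `prefetch`. Note that `tf.data` parallelizes with threads inside the input pipeline rather than with separate OS processes. The `preprocess` function and the toy dataset are hypothetical placeholders, not part of this tutorial's pipeline.

```python
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def preprocess(x):
    # Hypothetical per-example transformation standing in for real preprocessing
    return tf.cast(x, tf.float32) / 255.0

ds_demo = (
    tf.data.Dataset.range(256)
    .map(preprocess, num_parallel_calls=AUTOTUNE)  # run preprocess calls in parallel
    .batch(32)
    .prefetch(AUTOTUNE)  # prepare the next batches while the current one is consumed
)

first_batch = next(iter(ds_demo))
print(first_batch.shape, first_batch.dtype)
```

Whether this helps depends on how expensive the per-example work is; for a trivial function like the one here the gain is likely negligible.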

However, if parameter iteration takes most of the time, we need a GPU or a Google TPU for acceleration.

You may refer to this article for further details: ["GPU acceleration for Keras Models - How to Use Free Colab GPUs (in Chinese)"](https://zhuanlan.zhihu.com/p/68509398)

No source code changes are needed to switch from CPU to GPU, whether you use the pre-defined `fit` method or a customized training loop. When a GPU is available and no device is specified, TensorFlow automatically uses the GPU to create tensors and run computations.
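One way to see which device TensorFlow picked is to inspect the `device` attribute of an eager tensor. The following is a minimal sketch; the tensor shapes are arbitrary, and the printed device string depends on the machine it runs on.

```python
import tensorflow as tf

gpus_found = tf.config.list_physical_devices("GPU")

# No tf.device scope is used here, so TensorFlow chooses the device itself
x = tf.random.uniform((2, 3))
y = tf.matmul(x, x, transpose_b=True)

print("GPUs found:", len(gpus_found))
print("x is on:", x.device)
print("y is on:", y.device)
```

On a machine without a GPU both tensors should report a CPU device; with a visible GPU they would normally report `GPU:0`.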

However, when a GPU is shared among multiple users, such as on a company or lab server, we need to add the following code to specify the GPU ID and the amount of GPU memory we are going to use. By default TensorFlow claims all visible GPUs and nearly all of their memory, so without this restriction a single user would occupy the whole GPU and prevent other users from training on it.


In a Colab notebook, select GPU under Edit -> Notebook Settings -> Hardware Accelerator.

Note: the following code runs only on Colab.

You may use the following link for testing (tf_singleGPU, in Chinese):

https://colab.research.google.com/drive/1r5dLoeJq5z01sU72BX2M5UiNSkuxsEFe


In [3]:
# %tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)

2.2.0


In [4]:
from tensorflow.keras import * 

# Time stamp
@tf.function
def printbar():
    ts = tf.timestamp()
    today_ts = ts%(24*60*60)

    # Shift from UTC to UTC+8 and wrap around at 24 hours
    hour = tf.cast(today_ts//3600+8,tf.int32)%tf.constant(24)
    minute = tf.cast((today_ts%3600)//60,tf.int32)
    second = tf.cast(tf.floor(today_ts%60),tf.int32)
    
    def timeformat(m):
        # Left-pad single-digit values with a zero
        if tf.strings.length(tf.strings.format("{}",m))==1:
            return(tf.strings.format("0{}",m))
        else:
            return(tf.strings.format("{}",m))
    
    timestring = tf.strings.join([timeformat(hour),timeformat(minute),
                timeformat(second)],separator = ":")
    tf.print("=========="*8,end = "")
    tf.print(timestring)
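The arithmetic inside `printbar` can be checked in plain Python. The sketch below mirrors it with a hypothetical helper, `format_hms`, which is not part of the tutorial; the default offset of 8 hours reflects the `+8` in `printbar`, which we assume is meant to convert UTC to a UTC+8 local time.

```python
def format_hms(unix_seconds, utc_offset_hours=8):
    """Format a UNIX timestamp as HH:MM:SS at a fixed UTC offset."""
    seconds_today = int(unix_seconds) % (24 * 60 * 60)
    hour = (seconds_today // 3600 + utc_offset_hours) % 24  # wrap past midnight
    minute = (seconds_today % 3600) // 60
    second = seconds_today % 60
    return "{:02d}:{:02d}:{:02d}".format(hour, minute, second)

print(format_hms(0))                           # midnight UTC shown in UTC+8
print(format_hms(3661, utc_offset_hours=0))    # 1 h, 1 min, 1 s after midnight UTC
print(format_hms(23 * 3600 + 30 * 60))         # 23:30 UTC wraps into the next day
```

These three calls print `08:00:00`, `01:01:01`, and `07:30:00` respectively.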

### 1. GPU Configuration


In [5]:
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0] # Use only GPU 0 when multiple GPUs are present
    tf.config.experimental.set_memory_growth(gpu0, True) # Allocate GPU memory on demand
    # Alternatively, cap the GPU memory at a fixed size (e.g. 4GB)
    #tf.config.experimental.set_virtual_device_configuration(gpu0,
    #    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)]) 
    tf.config.set_visible_devices([gpu0],"GPU") 
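More recent TensorFlow 2.x releases also expose the fixed memory cap without the `experimental` prefix. The sketch below assumes such a release; these names may not be available on older versions such as the 2.2.0 printed above, where the `experimental` calls are the ones to use. The 4096 MB limit is only an example value, and the configuration has to run before the GPU is first used, which is why a `RuntimeError` is caught.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

if gpus:
    try:
        # Expose only the first GPU to this process
        tf.config.set_visible_devices(gpus[:1], "GPU")
        # Cap the memory TensorFlow may allocate on it (value in MB, example only)
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=4096)],
        )
    except RuntimeError as err:
        # Raised when the devices have already been initialized
        print("GPU configuration skipped:", err)

visible_gpus = tf.config.get_visible_devices("GPU")
print("Visible GPUs:", visible_gpus)
```

A fixed cap and on-demand memory growth are alternative strategies for the same device, so this sketch deliberately does not combine the two.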

Compare the computing speed of the GPU and the CPU.


In [6]:
printbar()
with tf.device("/gpu:0"):
    tf.random.set_seed(0)
    a = tf.random.uniform((10000,100),minval = 0,maxval = 3.0)
    b = tf.random.uniform((100,100000),minval = 0,maxval = 3.0)
    c = a@b
    tf.print(tf.reduce_sum(tf.reduce_sum(c,axis = 0),axis=0))
printbar()

2.24953795e+11


In [7]:
printbar()
with tf.device("/cpu:0"):
    tf.random.set_seed(0)
    a = tf.random.uniform((10000,100),minval = 0,maxval = 3.0)
    b = tf.random.uniform((100,100000),minval = 0,maxval = 3.0)
    c = a@b
    tf.print(tf.reduce_sum(tf.reduce_sum(c,axis = 0),axis=0))
printbar()

2.24953795e+11
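Since `printbar` only prints wall-clock timestamps with one-second resolution, a direct measurement can make the comparison easier to read. The following is a sketch under a few assumptions: the helper `mean_matmul_seconds` is hypothetical, the matrices are smaller than in the cells above to keep the run short, and the GPU is timed only when one is detected.

```python
import time
import tensorflow as tf

def mean_matmul_seconds(device, repeats=3):
    """Average wall-clock time of one matmul plus reduction on the given device."""
    with tf.device(device):
        a = tf.random.uniform((2000, 100), minval=0, maxval=3.0)
        b = tf.random.uniform((100, 2000), minval=0, maxval=3.0)
        tf.reduce_sum(a @ b).numpy()  # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            # .numpy() waits for the result, so pending device work is included
            tf.reduce_sum(a @ b).numpy()
        return (time.perf_counter() - start) / repeats

devices = ["/cpu:0"]
if tf.config.list_physical_devices("GPU"):
    devices.append("/gpu:0")

timings = {d: mean_matmul_seconds(d) for d in devices}
for d, seconds in timings.items():
    print(d, "{:.6f} s".format(seconds))
```

For small matrices the GPU may not be faster than the CPU, because launch and transfer overhead dominates; the gap usually widens as the matrices grow.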


### 2. Data Preparation


In [8]:
MAX_LEN = 300
BATCH_SIZE = 32
(x_train,y_train),(x_test,y_test) = datasets.reuters.load_data()
x_train = preprocessing.sequence.pad_sequences(x_train,maxlen=MAX_LEN)
x_test = preprocessing.sequence.pad_sequences(x_test,maxlen=MAX_LEN)

MAX_WORDS = x_train.max()+1
CAT_NUM = y_train.max()+1

# cache() is placed before shuffle() so that every epoch is reshuffled
# instead of replaying the batch order cached during the first epoch
ds_train = tf.data.Dataset.from_tensor_slices((x_train,y_train)) \
          .cache().shuffle(buffer_size = 1000).batch(BATCH_SIZE) \
          .prefetch(tf.data.experimental.AUTOTUNE)
   
ds_test = tf.data.Dataset.from_tensor_slices((x_test,y_test)) \
          .cache().shuffle(buffer_size = 1000).batch(BATCH_SIZE) \
          .prefetch(tf.data.experimental.AUTOTUNE)
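To make the effect of `pad_sequences` with `maxlen` concrete, here is a small sketch on made-up sequences (the values are not taken from the Reuters data). With the default settings, shorter sequences are padded with zeros at the front and longer ones are truncated from the front.

```python
import tensorflow as tf

# Two toy sequences: one shorter and one longer than maxlen
toy_sequences = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]

padded = tf.keras.preprocessing.sequence.pad_sequences(toy_sequences, maxlen=4)
print(padded)
```

The first row becomes `[0, 1, 2, 3]` and the second `[6, 7, 8, 9]`, so every sample ends up with the same length, as the `Embedding` layer with `input_length=MAX_LEN` expects.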
          

### 3. Model Definition


In [9]:
tf.keras.backend.clear_session()

def create_model():
    
    model = models.Sequential()

    model.add(layers.Embedding(MAX_WORDS,7,input_length=MAX_LEN))
    model.add(layers.Conv1D(filters = 64,kernel_size = 5,activation = "relu"))
    model.add(layers.MaxPool1D(2))
    model.add(layers.Conv1D(filters = 32,kernel_size = 3,activation = "relu"))
    model.add(layers.MaxPool1D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(CAT_NUM,activation = "softmax"))
    return(model)

model = create_model()
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 300, 7)            216874    
_________________________________________________________________
conv1d (Conv1D)              (None, 296, 64)           2304      
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 148, 64)           0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 146, 32)           6176      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 73, 32)            0         
_________________________________________________________________
flatten (Flatten)            (None, 2336)              0         
_________________________________________________________________
dense (Dense)                (None, 46)                107502    
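The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults used by `create_model` (stride 1, "valid" padding, a bias term in every layer); the helper names are hypothetical, and the vocabulary size is inferred from the embedding row rather than read from the dataset.

```python
MAX_LEN, EMBED_DIM, CAT_NUM = 300, 7, 46
MAX_WORDS = 216874 // EMBED_DIM  # vocabulary size implied by the embedding row

def conv1d_params(kernel_size, in_channels, filters):
    # One kernel of size kernel_size x in_channels per filter, plus one bias each
    return kernel_size * in_channels * filters + filters

def conv1d_length(length, kernel_size):
    # Output length for "valid" padding and stride 1
    return length - kernel_size + 1

embedding_params = MAX_WORDS * EMBED_DIM
conv1_params = conv1d_params(5, EMBED_DIM, 64)
len_after_pool1 = conv1d_length(MAX_LEN, 5) // 2           # MaxPool1D(2)
conv2_params = conv1d_params(3, 64, 32)
len_after_pool2 = conv1d_length(len_after_pool1, 3) // 2   # MaxPool1D(2)
flat_units = len_after_pool2 * 32
dense_params = flat_units * CAT_NUM + CAT_NUM

total_params = embedding_params + conv1_params + conv2_params + dense_params
print(embedding_params, conv1_params, conv2_params, dense_params, total_params)
```

This gives 216874, 2304, and 6176 for the embedding and the two convolution layers, 2336 flattened units, 107502 parameters for the dense layer, and 332856 parameters in total.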

### 4. Model Training


In [10]:
optimizer = optimizers.Nadam()
loss_func = losses.SparseCategoricalCrossentropy()

train_loss = metrics.Mean(name='train_loss')
train_metric = metrics.SparseCategoricalAccuracy(name='train_accuracy')

valid_loss = metrics.Mean(name='valid_loss')
valid_metric = metrics.SparseCategoricalAccuracy(name='valid_accuracy')

@tf.function
def train_step(model, features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features,training = True)
        loss = loss_func(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss.update_state(loss)
    train_metric.update_state(labels, predictions)
    
@tf.function
def valid_step(model, features, labels):
    predictions = model(features)
    batch_loss = loss_func(labels, predictions)
    valid_loss.update_state(batch_loss)
    valid_metric.update_state(labels, predictions)
    

def train_model(model,ds_train,ds_valid,epochs):
    for epoch in tf.range(1,epochs+1):
        
        for features, labels in ds_train:
            train_step(model,features,labels)

        for features, labels in ds_valid:
            valid_step(model,features,labels)

        logs = 'Epoch={},Loss:{},Accuracy:{},Valid Loss:{},Valid Accuracy:{}'
        
        if epoch%1 ==0:
            printbar()
            tf.print(tf.strings.format(logs,
            (epoch,train_loss.result(),train_metric.result(),valid_loss.result(),valid_metric.result())))
            tf.print("")
            
        train_loss.reset_states()
        valid_loss.reset_states()
        train_metric.reset_states()
        valid_metric.reset_states()

train_model(model,ds_train,ds_test,10)

Epoch=1,Loss:1.99834275,Accuracy:0.477733254,Valid Loss:1.65038812,Valid Accuracy:0.574799657

Epoch=2,Loss:1.45043743,Accuracy:0.62391448,Valid Loss:1.49970984,Valid Accuracy:0.631789863

Epoch=3,Loss:1.14211118,Accuracy:0.705299497,Valid Loss:1.49788928,Valid Accuracy:0.650044501

Epoch=4,Loss:0.851342082,Accuracy:0.778779805,Valid Loss:1.65619576,Valid Accuracy:0.650489748

Epoch=5,Loss:0.615026534,Accuracy:0.847138703,Valid Loss:1.90303969,Valid Accuracy:0.651380241

Epoch=6,Loss:0.459995359,Accuracy:0.890002251,Valid Loss:2.13435435,Valid Accuracy:0.653606415

Epoch=7,Loss:0.368590444,Accuracy:0.912937,Valid Loss:2.25343442,Valid Accuracy:0.650489748

Epoch=8,Loss:0.31209,Accuracy:0.925406396,Valid Loss:2.35734868,Valid Accuracy:0.651825488

Epoch=9,Loss:0.2767061,Accuracy:0.933533728,Valid Loss:2.45272017,Valid Accuracy:0.651825488

Epoch=10,Loss:0.249817386,Accuracy:0.937653065,Valid Loss:2.54394341,Valid Accuracy:0.644256473
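The `Mean` metrics in the loop above accumulate per-batch values during an epoch and are cleared at its end. The following minimal sketch shows that behaviour on made-up loss values. The `getattr` fallback is an assumption added here to cover TensorFlow releases in which the method is named `reset_state` rather than `reset_states`.

```python
import tensorflow as tf

loss_tracker = tf.keras.metrics.Mean(name="train_loss")

# Accumulate three made-up batch losses, as train_step does once per batch
for batch_loss in [2.0, 1.0, 0.6]:
    loss_tracker.update_state(batch_loss)
epoch_mean = float(loss_tracker.result())
print("epoch mean:", epoch_mean)

# Clear the accumulator before the next epoch; the method name depends on the release
reset = getattr(loss_tracker, "reset_state", None) or loss_tracker.reset_states
reset()
after_reset = float(loss_tracker.result())
print("after reset:", after_reset)
```

Without the reset, the value reported for each epoch would be a running average over all epochs so far instead of the average over that epoch alone.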

