Hi everyone! In this notebook we will learn how to train a model in TensorFlow 2.0's eager mode. For teaching purposes, we focus only on implementing the training phase. A more comprehensive training procedure is given in KerasTraining.ipynb.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import utils
import matplotlib.pyplot as plt
print(tf.__version__)

In [None]:
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

Preparing Data

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32').reshape((-1,28,28,1))
x_test = x_test.astype('float32').reshape((-1,28,28,1))
x_train /= 255.0
x_test /= 255.0
trainDataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))

Preparing a simple model

In [None]:
cnnModel = keras.Sequential()
cnnModel.add(layers.Conv2D(input_shape=[28,28,1],filters = 32, kernel_size = 3, strides = 1,
                       activation = 'relu'))
cnnModel.add(layers.Conv2D(filters = 32, kernel_size = 3, strides = 1,
                       activation = 'relu'))
cnnModel.add(layers.MaxPool2D(pool_size = 2, strides = 2))
cnnModel.add(layers.Conv2D(filters = 64, kernel_size = 3, strides = 1,
                       activation = 'relu'))
cnnModel.add(layers.Conv2D(filters = 64, kernel_size = 3, strides = 1,
                       activation = 'relu'))
cnnModel.add(layers.MaxPool2D(pool_size = 2, strides = 2))
cnnModel.add(layers.Flatten())
cnnModel.add(layers.Dense(units = 512, activation = 'relu'))
cnnModel.add(layers.Dense(units = 10, activation = 'softmax'))
cnnModel.summary()
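
The output shapes reported by `cnnModel.summary()` can be checked by hand. The sketch below assumes the Keras default `padding='valid'` for both `Conv2D` and `MaxPool2D`, which the cell above does not override. The helper name `valid_out` is ours, for illustration only.

```python
def valid_out(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 28
trace = [size]
# (kernel, stride) of each spatial layer, in the order used by cnnModel:
# conv, conv, pool, conv, conv, pool
for kernel, stride in [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]:
    size = valid_out(size, kernel, stride)
    trace.append(size)

# The last convolution has 64 filters, so Flatten yields size * size * 64 values
flat_features = size * size * 64

print(trace)          # [28, 26, 24, 12, 10, 8, 4]
print(flat_features)  # 1024
```

Under that assumption, the first `Dense` layer receives 1024 features per image.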

The training procedure needs three components: an <b>optimizer</b>, a <b>loss</b> and <b>metrics</b>.

In [None]:
optimizer = tf.keras.optimizers.RMSprop()

compute_loss = tf.keras.losses.SparseCategoricalCrossentropy()

compute_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
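
To make the loss and the metric concrete, here is a NumPy sketch of what they compute from integer labels and per-class probabilities. The model above ends in a softmax, and `SparseCategoricalCrossentropy` defaults to `from_logits=False`, so it expects probabilities. The function names and the toy batch are illustrative, and the sketch leaves out the clipping Keras applies for numerical stability.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    # Mean negative log-probability assigned to the true class of each sample
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

def sparse_categorical_accuracy(y_true, probs):
    # Fraction of samples whose most probable class equals the label
    return float(np.mean(np.argmax(probs, axis=1) == y_true))

# Toy batch: two samples, three classes
y_true = np.array([0, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])

loss = sparse_categorical_crossentropy(y_true, probs)  # (-ln 0.7 - ln 0.3) / 2
acc = sparse_categorical_accuracy(y_true, probs)       # first sample right, second wrong
print(loss, acc)
```

Unlike this stateless sketch, the Keras metric object accumulates over calls, so `compute_accuracy.result()` reports a running average over all batches seen so far.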

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations for computing gradients later.

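All forward-pass operations executed inside the tf.GradientTape context are recorded on a "tape". To compute gradients, the tape is played backwards and then discarded. By default, a tf.GradientTape can therefore compute gradients only once; a second call to tape.gradient raises a RuntimeError unless the tape was created with persistent=True.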
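
A minimal sketch of this behaviour on a scalar function, separate from the model in this notebook. The variable names are illustrative.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x * x                      # recorded on the tape

dy_dx = tape.gradient(y, x)        # d(x^2)/dx = 2x

# A second call on the same non-persistent tape raises RuntimeError
try:
    tape.gradient(y, x)
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# persistent=True keeps the recording, so several gradients can be taken
with tf.GradientTape(persistent=True) as ptape:
    y = x * x
    z = y * y                      # z = x^4

dz_dx = ptape.gradient(z, x)       # 4x^3
dy_dx_again = ptape.gradient(y, x)
del ptape                          # release the resources held by the tape

print(float(dy_dx), second_call_failed, float(dz_dx))
```

A persistent tape holds on to its intermediate results until it is deleted, which is why the default is a single-use tape.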

In [None]:
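def train_one_step(model, optimizer, x, y):
    with tf.GradientTape() as tape:
        # The model ends in a softmax, so it returns class probabilities
        # (not logits), matching the loss default from_logits=False.
        predictions = model(x, training=True)
        loss = compute_loss(y, predictions)

    # Backpropagate and apply one optimizer update
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Update the running accuracy with this batch
    compute_accuracy(y, predictions)
    return loss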

The tf.function decorator traces the Python function and compiles its operations into a TensorFlow graph, which may reduce the running time. The first call is slower, however, because the graph has to be built.
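
The one-off cost of building the graph can be observed with a Python side effect, which runs only while the function is being traced. This is a small sketch, independent of the training code. `trace_log` and `square` are illustrative names.

```python
import tensorflow as tf

trace_log = []   # Python-side record of how often the body is traced

@tf.function
def square(x):
    trace_log.append(1)   # Python side effect: executed only during tracing
    return x * x

a = square(tf.constant(2.0))   # first call: traces the body and builds the graph
n_after_first = len(trace_log)

b = square(tf.constant(3.0))   # same dtype and shape: the graph is reused
n_after_second = len(trace_log)

c = square(tf.constant(4))     # new dtype (int32): the body is traced again
n_after_int = len(trace_log)

print(n_after_first, n_after_second, n_after_int)
```

The same holds for the training loop: Python code inside the decorated function runs while tracing, and later calls execute only the recorded TensorFlow operations. That is why `tf.print` is used for logging there rather than `print`.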

In [None]:
subTrainDataset = trainDataset.batch(64).take(1000)
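
The effect of `batch` followed by `take` can be checked on a toy dataset. The sketch below uses a hypothetical 10-element dataset, then applies the same arithmetic to the 60000 MNIST training images.

```python
import tensorflow as tf

# Toy dataset: 10 elements, batches of 4, at most 5 batches requested
toy = tf.data.Dataset.from_tensor_slices(tf.range(10))
batch_sizes = [int(b.shape[0]) for b in toy.batch(4).take(5)]
print(batch_sizes)   # [4, 4, 2]: only 3 batches exist, the last one is partial

# Same arithmetic for MNIST: 60000 training images, batch size 64
num_batches = -(-60000 // 64)           # ceiling division
last_batch = 60000 - (num_batches - 1) * 64
print(num_batches, last_batch)          # 938 32
```

Since one pass over MNIST has only 938 batches of 64, `take(1000)` here selects the whole training set, that is, one full epoch.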

In [None]:
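@tf.function
def train(model, optimizer, dataset):
    step = 0
    # Iterate over the dataset passed as an argument, not over a global
    for x, y in dataset:
        step += 1
        loss = train_one_step(model, optimizer, x, y)
        # Log every 50 steps; tf.print also works inside the compiled graph
        if tf.equal(step % 50, 0):
            tf.print("loss: ", loss, " accuracy :", compute_accuracy.result())

In [None]:
# Pass the batched dataset; the unbatched trainDataset would feed single images
train(cnnModel, optimizer, subTrainDataset)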